1
0
mirror of https://github.com/pdf2htmlEX/pdf2htmlEX.git synced 2024-10-05 19:41:40 +00:00
pdf2htmlEX/src/CoveredTextDetector.h

69 lines
1.7 KiB
C
Raw Normal View History

2014-06-14 19:44:28 +00:00
/*
* CoveredTextDetector.h
2014-06-14 19:44:28 +00:00
*
* Created on: 2014-6-14
* Author: duanyao
*/
#ifndef COVEREDTEXTDETECTOR_H__
#define COVEREDTEXTDETECTOR_H__
2014-06-14 19:44:28 +00:00
#include <vector>
New master (#2) * Show header in font map files * fix a usage of unique_ptr with array * Added '--quiet' argument to hide progress messages (resolves #503) * Revert cout messages to cerr (see #622) * bump version * fix build; fix some coverity warnings * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Rationlise DPI to single number. Implement actual_dpi - clamp maximum background image size in cases of huge PDF pages * DPI fixes - increase DPI when partially covered text to covered-text-dpi Add font-style italic for oblique fonts Reduce char bbox for occlusion tests * Don't shrink bbox - not required if zoom=25 used * Ignore occlusion from stroke/fill with opacity < 0.5 Better compute char bbox for occlusion Use 10% inset for char bbox for occlusion Back out adding font-weight: bold to potentially bold fonts Fix bug to ensure CID ascent/descent matches subfont values * Removed zero char logging * Remove forced italic - missing italic is due to fontforge bug which needs fixing * Typos fixed, readme updated * Typos * Increase maximum background image width Fix private use range to avoid stupid mobile safari switching to emoji font * included -pthread switch to link included 3rdparty poppler files. * Updated files from poppler 0.59.0 and adjusted includes. * Support updated "Object" class from poppler 0.59.0
2018-01-10 19:31:38 +00:00
#include "Param.h"
#include <cairo.h>
2014-06-14 19:44:28 +00:00
namespace pdf2htmlEX {
/**
* Detect characters that are covered by non-char graphics on a page.
*/
class CoveredTextDetector
2014-06-14 19:44:28 +00:00
{
public:
New master (#2) * Show header in font map files * fix a usage of unique_ptr with array * Added '--quiet' argument to hide progress messages (resolves #503) * Revert cout messages to cerr (see #622) * bump version * fix build; fix some coverity warnings * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Rationlise DPI to single number. Implement actual_dpi - clamp maximum background image size in cases of huge PDF pages * DPI fixes - increase DPI when partially covered text to covered-text-dpi Add font-style italic for oblique fonts Reduce char bbox for occlusion tests * Don't shrink bbox - not required if zoom=25 used * Ignore occlusion from stroke/fill with opacity < 0.5 Better compute char bbox for occlusion Use 10% inset for char bbox for occlusion Back out adding font-weight: bold to potentially bold fonts Fix bug to ensure CID ascent/descent matches subfont values * Removed zero char logging * Remove forced italic - missing italic is due to fontforge bug which needs fixing * Typos fixed, readme updated * Typos * Increase maximum background image width Fix private use range to avoid stupid mobile safari switching to emoji font * included -pthread switch to link included 3rdparty poppler files. * Updated files from poppler 0.59.0 and adjusted includes. * Support updated "Object" class from poppler 0.59.0
2018-01-10 19:31:38 +00:00
CoveredTextDetector(Param & param);
2014-06-14 19:44:28 +00:00
/**
* Reset to initial state. Should be called when start drawing a page.
*/
void reset();
/**
* Add a drawn character's bounding box.
* @param bbox (x0, y0, x1, y1)
*/
New master (#2) * Show header in font map files * fix a usage of unique_ptr with array * Added '--quiet' argument to hide progress messages (resolves #503) * Revert cout messages to cerr (see #622) * bump version * fix build; fix some coverity warnings * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Rationlise DPI to single number. Implement actual_dpi - clamp maximum background image size in cases of huge PDF pages * DPI fixes - increase DPI when partially covered text to covered-text-dpi Add font-style italic for oblique fonts Reduce char bbox for occlusion tests * Don't shrink bbox - not required if zoom=25 used * Ignore occlusion from stroke/fill with opacity < 0.5 Better compute char bbox for occlusion Use 10% inset for char bbox for occlusion Back out adding font-weight: bold to potentially bold fonts Fix bug to ensure CID ascent/descent matches subfont values * Removed zero char logging * Remove forced italic - missing italic is due to fontforge bug which needs fixing * Typos fixed, readme updated * Typos * Increase maximum background image width Fix private use range to avoid stupid mobile safari switching to emoji font * included -pthread switch to link included 3rdparty poppler files. * Updated files from poppler 0.59.0 and adjusted includes. * Support updated "Object" class from poppler 0.59.0
2018-01-10 19:31:38 +00:00
void add_char_bbox(cairo_t *, double * bbox);
2014-06-14 19:44:28 +00:00
New master (#2) * Show header in font map files * fix a usage of unique_ptr with array * Added '--quiet' argument to hide progress messages (resolves #503) * Revert cout messages to cerr (see #622) * bump version * fix build; fix some coverity warnings * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Rationlise DPI to single number. Implement actual_dpi - clamp maximum background image size in cases of huge PDF pages * DPI fixes - increase DPI when partially covered text to covered-text-dpi Add font-style italic for oblique fonts Reduce char bbox for occlusion tests * Don't shrink bbox - not required if zoom=25 used * Ignore occlusion from stroke/fill with opacity < 0.5 Better compute char bbox for occlusion Use 10% inset for char bbox for occlusion Back out adding font-weight: bold to potentially bold fonts Fix bug to ensure CID ascent/descent matches subfont values * Removed zero char logging * Remove forced italic - missing italic is due to fontforge bug which needs fixing * Typos fixed, readme updated * Typos * Increase maximum background image width Fix private use range to avoid stupid mobile safari switching to emoji font * included -pthread switch to link included 3rdparty poppler files. * Updated files from poppler 0.59.0 and adjusted includes. * Support updated "Object" class from poppler 0.59.0
2018-01-10 19:31:38 +00:00
void add_char_bbox_clipped(cairo_t *,double * bbox, int pts_covered);
2014-06-14 19:44:28 +00:00
/**
* Add a drawn non-char graphics' bounding box.
* If it intersects any previously drawn char's bbox, the char is marked as covered
* and treated as an non-char.
* @param bbox (x0, y0, x1, y1)
* @param index this graphics' drawing order: assume it is drawn after (index-1)th
* char. -1 means after the last char.
*/
New master (#2) * Show header in font map files * fix a usage of unique_ptr with array * Added '--quiet' argument to hide progress messages (resolves #503) * Revert cout messages to cerr (see #622) * bump version * fix build; fix some coverity warnings * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Rationlise DPI to single number. Implement actual_dpi - clamp maximum background image size in cases of huge PDF pages * DPI fixes - increase DPI when partially covered text to covered-text-dpi Add font-style italic for oblique fonts Reduce char bbox for occlusion tests * Don't shrink bbox - not required if zoom=25 used * Ignore occlusion from stroke/fill with opacity < 0.5 Better compute char bbox for occlusion Use 10% inset for char bbox for occlusion Back out adding font-weight: bold to potentially bold fonts Fix bug to ensure CID ascent/descent matches subfont values * Removed zero char logging * Remove forced italic - missing italic is due to fontforge bug which needs fixing * Typos fixed, readme updated * Typos * Increase maximum background image width Fix private use range to avoid stupid mobile safari switching to emoji font * included -pthread switch to link included 3rdparty poppler files. * Updated files from poppler 0.59.0 and adjusted includes. * Support updated "Object" class from poppler 0.59.0
2018-01-10 19:31:38 +00:00
void add_non_char_bbox(cairo_t *cairo, double * bbox, int what);
2014-06-14 19:44:28 +00:00
/**
* An array of flags indicating whether a char is covered by any non-char graphics.
* Index by the order that these chars are added.
* This vector grows as add_char_bbox() is called, so its size is the count
* of currently drawn chars.
*/
const std::vector<bool> & get_chars_covered() { return chars_covered; }
private:
std::vector<bool> chars_covered;
// x00, y00, x01, y01; x10, y10, x11, y11;...
std::vector<double> char_bboxes;
New master (#2) * Show header in font map files * fix a usage of unique_ptr with array * Added '--quiet' argument to hide progress messages (resolves #503) * Revert cout messages to cerr (see #622) * bump version * fix build; fix some coverity warnings * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Many bug fixes and improvements, including: - Incorporated latest Cairo files from cairo-0.15.2 - Moved build to out-of-source - Added clean script - Rewritten correct_text_visibility option to improve accuracy - Transparent characters drawn on background layer - Improved bad unicode detection * Rationlise DPI to single number. Implement actual_dpi - clamp maximum background image size in cases of huge PDF pages * DPI fixes - increase DPI when partially covered text to covered-text-dpi Add font-style italic for oblique fonts Reduce char bbox for occlusion tests * Don't shrink bbox - not required if zoom=25 used * Ignore occlusion from stroke/fill with opacity < 0.5 Better compute char bbox for occlusion Use 10% inset for char bbox for occlusion Back out adding font-weight: bold to potentially bold fonts Fix bug to ensure CID ascent/descent matches subfont values * Removed zero char logging * Remove forced italic - missing italic is due to fontforge bug which needs fixing * Typos fixed, readme updated * Typos * Increase maximum background image width Fix private use range to avoid stupid mobile safari switching to emoji font * included -pthread switch to link included 3rdparty poppler files. * Updated files from poppler 0.59.0 and adjusted includes. * Support updated "Object" class from poppler 0.59.0
2018-01-10 19:31:38 +00:00
std::vector<int> char_pts_visible;
Param & param;
2014-06-14 19:44:28 +00:00
};
}
#endif /* COVEREDTEXTDETECTOR_H__ */