# ![](https://pdf2htmlEX.github.io/pdf2htmlEX/images/pdf2htmlEX-64x64.png) pdf2htmlEX
[![Build Status](https://travis-ci.org/pdf2htmlEX/pdf2htmlEX.svg?branch=master)](https://travis-ci.org/pdf2htmlEX/pdf2htmlEX)
# Differences from upstream pdf2htmlEX:
This is my branch of pdf2htmlEX, which aims to enable open collaboration and help keep the project active. A number of changes and improvements have been incorporated from other forks:
* Many bug fixes, mostly for edge cases
* Integration of the latest Cairo code
* Out-of-source builds
* Rewritten handling of obscured and partially obscured text, which is now much more accurate
* Some support for transparent text
* Improved DPI settings: the DPI is clamped so that the output graphic does not become too large

`--correct-text-visibility` determines whether a character is visible by tracking four sample points per character (currently the four corners of the character's bounding box, inset slightly).

It now has two modes:

* `1`: fully occluded text is handled (i.e. it is not put into the HTML layer).
* `2`: partially occluded text is handled.

The default is now `1`, so fully occluded text should no longer show through. If `2` is selected, a partially occluded character is drawn in the background layer. In that case the rendered DPI of the page is automatically increased to `--covered-text-dpi` (default: 300) to reduce the impact of rasterized text.
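
The four-point test described above can be illustrated with a small Python sketch. This is not the code pdf2htmlEX uses: the 10% inset, the `is_covered` callback and all function names are assumptions made for illustration, and the sketch also assumes that mode `2` keeps the mode `1` behaviour for fully occluded text.

```python
from typing import Callable, List, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def sample_points(bbox: BBox, inset: float = 0.1) -> List[Point]:
    """Return the four bbox corners, pulled inward by `inset` of the bbox size.

    The 10% default is an assumed value, not taken from pdf2htmlEX.
    """
    x0, y0, x1, y1 = bbox
    dx = (x1 - x0) * inset
    dy = (y1 - y0) * inset
    return [(x0 + dx, y0 + dy), (x1 - dx, y0 + dy),
            (x0 + dx, y1 - dy), (x1 - dx, y1 - dy)]


def classify(bbox: BBox, is_covered: Callable[[Point], bool],
             inset: float = 0.1) -> str:
    """Classify a character as 'visible', 'partial' or 'occluded'.

    `is_covered` is a hypothetical callback that reports whether some
    later-drawn graphic covers the given point.
    """
    covered = sum(1 for p in sample_points(bbox, inset) if is_covered(p))
    if covered == 0:
        return "visible"
    if covered == 4:
        return "occluded"
    return "partial"


def in_html_layer(state: str, mode: int) -> bool:
    """Decide whether a character stays in the HTML text layer.

    Mode 1 removes fully occluded characters from the HTML layer;
    mode 2 (assumed to include mode 1) also moves partially occluded ones.
    """
    if state == "occluded" and mode >= 1:
        return False
    if state == "partial" and mode >= 2:
        return False
    return True
```

For example, a character whose right half is covered would be classified as `partial`. It stays in the HTML layer under mode `1` and leaves it under mode `2`.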

For maximum accuracy, I strongly recommend the output options `--font-size-multiplier 1 --zoom 25`, which circumvent rounding errors inside web browsers. You will then have to scale the resulting HTML page back down with an appropriate CSS `scale` transform.
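
As a rough illustration of the scale-down step, the following Python sketch computes the inverse of the zoom factor and emits a CSS rule for it. The `#page-container` selector and the function name are assumptions; adjust the selector to whatever element wraps the pages in your output.

```python
def scale_back_css(zoom: float, selector: str = "#page-container") -> str:
    """Build a CSS rule that shrinks a page rendered at `zoom` back to 1x.

    `selector` is an assumed wrapper element, not a documented guarantee.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    factor = 1.0 / zoom  # e.g. --zoom 25 needs scale(0.04)
    return (f"{selector} {{ transform: scale({factor:g}); "
            f"transform-origin: 0 0; }}")


print(scale_back_css(25))
```

Setting `transform-origin: 0 0` keeps the top-left corner fixed while scaling, so the page does not drift away from its original position.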

If you are concerned about the file size of the resulting HTML, I recommend patching FontForge so that it does not write the current time into the dumped fonts, and then post-processing the pdf2htmlEX output to remove duplicate files. There will usually be many duplicate background images and fonts.
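
One possible way to find such duplicates is to group the output files by a hash of their content. The Python sketch below only reports the groups, and the function name and directory layout are assumptions. Rewriting the references in the HTML and CSS before deleting any file is left out.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicates(out_dir: str) -> Dict[str, List[Path]]:
    """Group files under `out_dir` by the SHA-256 of their content.

    Only groups with more than one file are returned. Fonts compare equal
    only if they are byte-identical, which is why an embedded timestamp
    defeats this check.
    """
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(out_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

Within each group, one file can be kept and the references to the others pointed at it.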
>一图胜千言<br>A beautiful demo is worth a thousand words
- **Bible de Genève, 1564** (fonts and typography): [HTML](https://pdf2htmlEX.github.io/pdf2htmlEX/demo/geneve.html) / [PDF](https://github.com/raphink/geneve_1564/releases/download/2015-07-08_01/geneve_1564.pdf)
- **Cheat Sheet** (math formulas): [HTML](https://pdf2htmlEX.github.io/pdf2htmlEX/demo/cheat.html) / [PDF](http://www.tug.org/texshowcase/cheat.pdf)
- **Scientific Paper** (text and figures): [HTML](https://pdf2htmlEX.github.io/pdf2htmlEX/demo/demo.html) / [PDF](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.148.349&rep=rep1&type=pdf)
- **Full Circle Magazine** (read while downloading): [HTML](https://pdf2htmlEX.github.io/pdf2htmlEX/demo/issue65_en.html) / [PDF](http://dl.fullcirclemagazine.org/issue65_en.pdf)
- **Git Manual** (CJK support): [HTML](https://pdf2htmlEX.github.io/pdf2htmlEX/demo/chn.html) / [PDF](http://files.cnblogs.com/phphuaibei/git%E6%90%AD%E5%BB%BA.pdf)

pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies.
Academic papers with lots of formulas and figures? Magazines with complicated layouts? No problem!

pdf2htmlEX is also an [online publishing tool](https://pdf2htmlEX.github.io/pdf2htmlEX/doc/tb108wang.html) which is flexible for many different use cases.

Learn more about [who](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Use-Cases) should use pdf2htmlEX and [why](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Introduction).

### Features
* Native HTML text with precise font and location.
* Flexible output: all-in-one HTML or on demand page loading (needs JavaScript).
* Moderate file size, sometimes even smaller than PDF.
* Support for links, outlines (bookmarks), printing, SVG backgrounds, Type 3 fonts and [more...](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Feature-List)

[Compare to others](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Comparison)

### Portals
* [:house:Wiki Home](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki)
* [Download](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Download) & [Building](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Building)
* [Quick Start](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Quick-Start)
* [Report Issues / Ask for Help](https://github.com/pdf2htmlEX/pdf2htmlEX/blob/master/CONTRIBUTING.md#guidance)
* [:question:FAQ](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/FAQ)
* [:envelope:Mailing List](https://groups.google.com/forum/#!forum/pdf2htmlex)
* [:mahjong:Chinese Mailing List (中文邮件列表)](https://groups.google.com/forum/#!forum/pdf2htmlex-cn)
### LICENSE
pdf2htmlEX, as a whole package, is licensed under GPLv3+.
Some resource files are released under more permissive licenses; see `LICENSE` for details.
### Acknowledgements
pdf2htmlEX is made possible thanks to the following projects:
* [poppler](http://poppler.freedesktop.org/)
* [FontForge](http://fontforge.org/)

[![Testing Powered By SauceLabs](https://saucelabs.github.io/images/opensauce/powered-by-saucelabs-badge-gray.png?sanitize=true "Testing Powered By SauceLabs")](https://saucelabs.com)

pdf2htmlEX is inspired by the following projects:

* pdftohtml from poppler
* MuPDF
* PDF.js
* Crocodoc
* Google Docs

#### Special Thanks
* Hongliang Tian
* Wanmin Liu