From 1f5ebb1756c20d0cf409e6ae5e490358633cf0e0 Mon Sep 17 00:00:00 2001
From: Lu Wang
Date: Tue, 11 Sep 2012 01:21:08 +0800
Subject: [PATCH] ..

---
 pdf2htmlEX.1 | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 pdf2htmlEX.1

diff --git a/pdf2htmlEX.1 b/pdf2htmlEX.1
new file mode 100644
index 0000000..292c3bb
--- /dev/null
+++ b/pdf2htmlEX.1
@@ -0,0 +1,136 @@
+.TH pdf2htmlEX 1 "Aug 31, 2012" "pdf2htmlEX 0.1"
+.SH NAME
+.PP
+.nf
+ pdf2htmlEX \- converts PDF to HTML without losing text and format.
+.fi
+
+.SH USAGE
+.PP
+.nf
+ pdf2htmlEX [options] []
+.fi
+
+.SH DESCRIPTION
+.PP
+pdf2htmlEX is a utility that converts PDF files to HTML files.
+
+pdf2htmlEX tries its best to render the PDF precisely and maintain proper styling, while retaining text and optimizing for the Web.
+
+Fonts are extracted from PDF and then embedded into HTML (Type 3 fonts are not supported). Text in the converted HTML file is usually selectable and copyable.
+
+Other objects are rendered as images and also embedded.
+
+.SH OPTIONS
+.TP
+.B --help
+Show all options
+.TP
+.B -v, --version
+Show copyright and version
+.TP
+.B -o, --owner-password
+Specify owner password
+.TP
+.B -u, --user-password
+Specify user password
+.TP
+.B --dest-dir (Default: ".")
+Specify destination folder
+.TP
+.B -f, --first-page (Default: 1)
+Specify the first page to process
+.TP
+.B -l, --last-page (Default: last page)
+Specify the last page to process
+.TP
+.B --zoom (Default: 1.0)
+Specify the zoom ratio of the HTML file
+.TP
+.B --hdpi , --vdpi (Default: 144)
+Specify the horizontal and vertical DPI for images
+.TP
+.B --process-nontext <0|1> (Default: 1)
+Whether to process non-text objects (as images)
+.TP
+.B --single-html <0|1> (Default: 1)
+Whether to embed everything into one HTML file.
+
+If switched off, several files are generated along with the HTML file, including files for fonts, css, images.
+.TP
+.B --embed-base-font <0|1> (Default: 1)
+Whether to embed base 14 fonts.
+
+There are several base fonts defined in the PDF standards, which are supposed to be provided by the PDF reader.
+
+If this switch is on, locally matched fonts will be used and embedded; otherwise only font names are exported, so that web browsers may try to find proper fonts themselves.
+.TP
+.B --embed-external-font <0|1> (Default: 0)
+Same as above, but for non-base fonts.
+.TP
+.B --decompose-ligature <0|1> (Default: 0)
+Decompose ligatures. For example 'fi' -> 'f''i'.
+.TP
+.B --heps , --veps (Default: 1)
+Specify the maximum tolerable horizontal/vertical offset (in pixels).
+
+pdf2htmlEX would try to optimize the generated HTML file by moving text within this distance.
+.TP
+.B --space-threshold (Default: 1.0/6)
+pdf2htmlEX would insert a whitespace character ' ' if the distance between two consecutive letters in the same line is wider than ratio * font_size.
+.TP
+.B --font-size-multiplier (Default: 10)
+Many web browsers limit the minimum font size, and many would round the given font size, which results in incorrect rendering.
+
+Specifying a ratio greater than 1 would resolve this issue.
+
+For some versions of Firefox, however, there will be a problem when the font size is too large, in which case a smaller value should be specified here.
+.TP
+.B --tounicode <-1|0|1> (Default: 0)
+A ToUnicode map may be provided for each font in PDF which indicates the 'meaning' of the characters. However, there is often better "ToUnicode" info in Type 0/1 fonts, and sometimes the ToUnicode map provided is wrong.
+
+If this value is set to 1, the ToUnicode Map is always applied, if provided in PDF, and characters may not render correctly in HTML if there are collisions.
+
+If set to -1, a customized map is used such that rendering will be correct in HTML (visually the same), but you may not get correct characters by select & copy & paste.
+
+If set to 0, pdf2htmlEX would try its best to balance the two methods above.
+.TP
+.B --space-as-offset <0|1> (Default: 0)
+Treat space characters as offsets, which may increase the size of the output.
+
+Turn it on if space characters are not displayed correctly, or you want to remove positional spaces.
+.TP
+.B --font-suffix (Default: ".ttf"), --font-format (Default: "truetype")
+Specify the suffix and format of fonts extracted from the PDF file. They should be consistent.
+.TP
+.B --debug <0|1> (Default: 0)
+Show debug information.
+.TP
+.B --clean-tmp <0|1> (Default: 1)
+If switched off, intermediate files won't be cleaned up at the end.
+
+.SH EXAMPLE
+.TP
+.B pdf2htmlEX /path/to/file.pdf
+Convert file.pdf into file.html
+.TP
+.B pdf2htmlEX --tmp-dir tmp --clean-tmp 0 --debug 1 /path/to/file.pdf
+Convert file.pdf and leave all intermediate files.
+.TP
+.B pdf2htmlEX --dest-dir out --single-html 0 --debug 1 /path/to/file.pdf
+Convert file.pdf into out/file.html and leave font/image files separated.
+
+.SH COPYRIGHT
+.PP
+Copyright 2012 Lu Wang
+
+pdf2htmlEX is GPLv2 & GPLv3 dual licensed
+
+.SH AUTHOR
+.PP
+pdf2htmlEX is written by Lu Wang
+
+.SH SEE ALSO
+.TP
+Home page
+http://github.com/coolwanglu/pdf2htmlEX
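
Note on --space-threshold: the rule stated in the man page (insert ' ' when the gap between two consecutive letters on a line exceeds ratio * font_size) can be illustrated with a small Python sketch. This is only an illustration of the documented rule, not pdf2htmlEX's actual code. The function name join_glyphs and the (char, x_start, x_end) tuple layout are hypothetical, chosen for this example.

```python
def join_glyphs(glyphs, font_size, ratio=1.0 / 6):
    """Join glyphs of one text line, inserting ' ' across wide gaps.

    glyphs: list of (char, x_start, x_end) tuples, ordered left to right.
    This tuple layout is an assumption made for the sketch.
    """
    threshold = ratio * font_size
    out = []
    prev_end = None
    for ch, x_start, x_end in glyphs:
        # Gap between the previous glyph's right edge and this glyph's left edge.
        if prev_end is not None and (x_start - prev_end) > threshold:
            out.append(" ")
        out.append(ch)
        prev_end = x_end
    return "".join(out)


# With font_size=12 and the default ratio, the threshold is 2 units.
line = [("a", 0.0, 5.0), ("b", 5.5, 10.0), ("c", 14.0, 19.0)]
print(join_glyphs(line, font_size=12))
```

A larger ratio makes the rule more conservative (fewer inserted spaces); a smaller one inserts spaces more eagerly.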