diff --git a/TODO b/TODO index 9a3eee1..0184f18 100644 --- a/TODO +++ b/TODO @@ -1,4 +1,4 @@ -bug found in baidu & github +bug found in baidu & github & mail argument auto-completion diff --git a/pdf2htmlEX.1 b/pdf2htmlEX.1 deleted file mode 100644 index 292c3bb..0000000 --- a/pdf2htmlEX.1 +++ /dev/null @@ -1,136 +0,0 @@ -.TH pdf2htmlEX 1 "Aug 31, 2012" "pdf2htmlEX 0.1" -.SH NAME -.PP -.nf - pdf2htmlEX \- converts PDF to HTML without losing text and format. -.fi - -.SH USAGE -.PP -.nf - pdf2htmlEX [options] [] -.fi - -.SH DESCRIPTION -.PP -pdf2htmlEX is a utility that converts PDF files to HTML files. - -pdf2htmlEX tries its best to render the PDF precisely, maintain proper styling, while retaining text and optmizing for Web. - -Fonts are extracted form PDF and then embedded into HTML (Type 3 fonts are not supported). Text in the converted HTML file is usually selectable and copyable. - -Other objects are rendered as images and also embedded. - -.SH OPTIONS -.TP -.B --help -Show all options -.TP -.B -v, --version -Show copyright and version -.TP -.B -o, --owner-password -Specify owner password -.TP -.B -u, --user-password -Specify user password -.TP -.B --dest-dir (Default: ".") -Specify destination folder -.TP -.B -f, --first-page (Default: 1) -Specify the first page to process -.TP -.B -l, --last-page (Default: last page) -Specify the last page to process -.TP -.B --zoom (Default: 1.0) -Specify the zoom ratio of the HTML file -.TP -.B --hpdi , --vpdi (Default: 144) -Specify the horizontal and vertical DPI for images -.TP -.B --process-nontext <0|1> (Default: 1) -Whether to process non-text objects (as images) -.TP -.B --single-html <0|1> (Default: 1) -Whether to embed everything into one HTML file. - -If switched out, there will be several files generated along with the HTML file including files for fonts, css, images. -.TP -.B --embed-base-font <0|1> (Default: 1) -Whether to embed base 14 fonts. - -There are several base font defined in PDF standards, which are supposed to be provided by the PDF reader. - -If this switch is on, local matched font will be used and embedded; otherwise only font names are exported such that web browsers may try to find proper fonts themselves. -.TP -.B --embed-external-font <0|1> (Default: 0) -Similar as above but for non-base fonts. -.TP -.B --decompose-ligature <0|1> (Default: 0) -Decompose ligatures. For example 'fi' -> 'f''i'. -.TP -.B --heps , --veps (Default: 1) -Specify the maximum tolerable horizontal/vertical offset (in pixels). - -pdf2htmlEX would try to optimize the generated HTML file moving Text within this distance. -.TP -.B --space-threshold (Default: 1.0/6) -pdf2htmlEX would insert a whitespace character ' ' if the distance between two consecutive letters in the same line is wider than ratio * font_size. -.TP -.B --font-size-multiplier (Default: 10) -Many web browsers limit the minimum font size, and many would round the given font size, which results in incorrect rendering. - -Specify a ratio greater than 1 would resolve this issue. - -For some versions of Firefox, however, there will be a problem when the font size is too large, in which case a smaller value should be specified here. -.TP -.B --tounicode <-1|0|1> (Default: 0) -A ToUnicode map may be provided for each font in PDF which indicates the 'meaning' of the characters. However often there is better "ToUnicode" info in Type 0/1 fonts, and sometimes the ToUnicode map provided is wrong. - -If this value is set to 1, the ToUnicode Map is always applied, if provided in PDF, and characters may not render correctly in HTML if there are collisions. - -If set to -1, a customized map is used such that rendering will be correct in HTML (visually the same), but you may not get correct characters by select & copy & paste. - -If set to 0, pdf2htmlEX would try it best to balance the two methods above. -.TP -.B --space-as-offset <0|1> (Default: 0) -Treat space characters as offsets, which may increase the size of the output. - -Turn it on if space characters are not displayed correctly, or you want to remove positional spaces. -.TP -.B --font-suffix (Default: ".ttf"), --font-format (Default: "truetype") -Specify the suffix and format of fonts extracted from the PDF file. They should be consistent. -.TP -.B --debug <0|1> (Default: 0) -Show debug information. -.TP -.B --clean-tmp <0|1> (Default: 1) -If switched off, intermediate files won't be cleaned in the end. - -.SH EXAMPLE -.TP -.B pdf2htmlEX /path/to/file.pdf -Convert file.pdf into file.html -.TP -.B pdf2htmlEX --tmp-dir tmp --clean-tmp 0 --debug 1 /path/to/file.pdf -Convert file.pdf and leave all intermediate files. -.TP -.B pdf2htmlEX --dest-dir out --single-html 0 --debug 1 /path/to/file.pdf -Convert file.pdf into out/file.html and leave font/image files separated. - -.SH COPYRIGHT -.PP -Copyright 2012 Lu Wang - -pdf2htmlEX is GPLv2 & GPLv3 dual licensed - -.SH AUTHOR -.PP -pdf2htmlEX is written by Lu Wang - -.SH SEE ALSO -.TP -Home page -http://github.com/coolwanglu/pdf2htmlEX