mirror of
https://github.com/pdf2htmlEX/pdf2htmlEX.git
synced 2024-12-21 20:50:07 +00:00
..
This commit is contained in:
parent
d49dbc3deb
commit
6961e040c6
2
TODO
2
TODO
@ -1,4 +1,4 @@
|
||||
bug found in baidu & github
|
||||
bug found in baidu & github & mail
|
||||
|
||||
argument auto-completion
|
||||
|
||||
|
136
pdf2htmlEX.1
136
pdf2htmlEX.1
@ -1,136 +0,0 @@
|
||||
.TH pdf2htmlEX 1 "Aug 31, 2012" "pdf2htmlEX 0.1"
|
||||
.SH NAME
|
||||
.PP
|
||||
.nf
|
||||
pdf2htmlEX \- converts PDF to HTML without losing text and format.
|
||||
.fi
|
||||
|
||||
.SH USAGE
|
||||
.PP
|
||||
.nf
|
||||
pdf2htmlEX [options] <input\-filename> [<output\-filename>]
|
||||
.fi
|
||||
|
||||
.SH DESCRIPTION
|
||||
.PP
|
||||
pdf2htmlEX is a utility that converts PDF files to HTML files.
|
||||
|
||||
pdf2htmlEX tries its best to render the PDF precisely, maintain proper styling, while retaining text and optmizing for Web.
|
||||
|
||||
Fonts are extracted form PDF and then embedded into HTML (Type 3 fonts are not supported). Text in the converted HTML file is usually selectable and copyable.
|
||||
|
||||
Other objects are rendered as images and also embedded.
|
||||
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
.B --help
|
||||
Show all options
|
||||
.TP
|
||||
.B -v, --version
|
||||
Show copyright and version
|
||||
.TP
|
||||
.B -o, --owner-password <password>
|
||||
Specify owner password
|
||||
.TP
|
||||
.B -u, --user-password <password>
|
||||
Specify user password
|
||||
.TP
|
||||
.B --dest-dir <dir> (Default: ".")
|
||||
Specify destination folder
|
||||
.TP
|
||||
.B -f, --first-page <num> (Default: 1)
|
||||
Specify the first page to process
|
||||
.TP
|
||||
.B -l, --last-page <num> (Default: last page)
|
||||
Specify the last page to process
|
||||
.TP
|
||||
.B --zoom <ratio> (Default: 1.0)
|
||||
Specify the zoom ratio of the HTML file
|
||||
.TP
|
||||
.B --hpdi <dpi>, --vpdi <dpi> (Default: 144)
|
||||
Specify the horizontal and vertical DPI for images
|
||||
.TP
|
||||
.B --process-nontext <0|1> (Default: 1)
|
||||
Whether to process non-text objects (as images)
|
||||
.TP
|
||||
.B --single-html <0|1> (Default: 1)
|
||||
Whether to embed everything into one HTML file.
|
||||
|
||||
If switched out, there will be several files generated along with the HTML file including files for fonts, css, images.
|
||||
.TP
|
||||
.B --embed-base-font <0|1> (Default: 1)
|
||||
Whether to embed base 14 fonts.
|
||||
|
||||
There are several base font defined in PDF standards, which are supposed to be provided by the PDF reader.
|
||||
|
||||
If this switch is on, local matched font will be used and embedded; otherwise only font names are exported such that web browsers may try to find proper fonts themselves.
|
||||
.TP
|
||||
.B --embed-external-font <0|1> (Default: 0)
|
||||
Similar as above but for non-base fonts.
|
||||
.TP
|
||||
.B --decompose-ligature <0|1> (Default: 0)
|
||||
Decompose ligatures. For example 'fi' -> 'f''i'.
|
||||
.TP
|
||||
.B --heps <len>, --veps <len> (Default: 1)
|
||||
Specify the maximum tolerable horizontal/vertical offset (in pixels).
|
||||
|
||||
pdf2htmlEX would try to optimize the generated HTML file moving Text within this distance.
|
||||
.TP
|
||||
.B --space-threshold <ratio> (Default: 1.0/6)
|
||||
pdf2htmlEX would insert a whitespace character ' ' if the distance between two consecutive letters in the same line is wider than ratio * font_size.
|
||||
.TP
|
||||
.B --font-size-multiplier <ratio> (Default: 10)
|
||||
Many web browsers limit the minimum font size, and many would round the given font size, which results in incorrect rendering.
|
||||
|
||||
Specify a ratio greater than 1 would resolve this issue.
|
||||
|
||||
For some versions of Firefox, however, there will be a problem when the font size is too large, in which case a smaller value should be specified here.
|
||||
.TP
|
||||
.B --tounicode <-1|0|1> (Default: 0)
|
||||
A ToUnicode map may be provided for each font in PDF which indicates the 'meaning' of the characters. However often there is better "ToUnicode" info in Type 0/1 fonts, and sometimes the ToUnicode map provided is wrong.
|
||||
|
||||
If this value is set to 1, the ToUnicode Map is always applied, if provided in PDF, and characters may not render correctly in HTML if there are collisions.
|
||||
|
||||
If set to -1, a customized map is used such that rendering will be correct in HTML (visually the same), but you may not get correct characters by select & copy & paste.
|
||||
|
||||
If set to 0, pdf2htmlEX would try it best to balance the two methods above.
|
||||
.TP
|
||||
.B --space-as-offset <0|1> (Default: 0)
|
||||
Treat space characters as offsets, which may increase the size of the output.
|
||||
|
||||
Turn it on if space characters are not displayed correctly, or you want to remove positional spaces.
|
||||
.TP
|
||||
.B --font-suffix <suffix> (Default: ".ttf"), --font-format <format> (Default: "truetype")
|
||||
Specify the suffix and format of fonts extracted from the PDF file. They should be consistent.
|
||||
.TP
|
||||
.B --debug <0|1> (Default: 0)
|
||||
Show debug information.
|
||||
.TP
|
||||
.B --clean-tmp <0|1> (Default: 1)
|
||||
If switched off, intermediate files won't be cleaned in the end.
|
||||
|
||||
.SH EXAMPLE
|
||||
.TP
|
||||
.B pdf2htmlEX /path/to/file.pdf
|
||||
Convert file.pdf into file.html
|
||||
.TP
|
||||
.B pdf2htmlEX --tmp-dir tmp --clean-tmp 0 --debug 1 /path/to/file.pdf
|
||||
Convert file.pdf and leave all intermediate files.
|
||||
.TP
|
||||
.B pdf2htmlEX --dest-dir out --single-html 0 --debug 1 /path/to/file.pdf
|
||||
Convert file.pdf into out/file.html and leave font/image files separated.
|
||||
|
||||
.SH COPYRIGHT
|
||||
.PP
|
||||
Copyright 2012 Lu Wang <coolwanglu@gmail.com>
|
||||
|
||||
pdf2htmlEX is GPLv2 & GPLv3 dual licensed
|
||||
|
||||
.SH AUTHOR
|
||||
.PP
|
||||
pdf2htmlEX is written by Lu Wang <coolwanglu@gmail.com>
|
||||
|
||||
.SH SEE ALSO
|
||||
.TP
|
||||
Home page
|
||||
http://github.com/coolwanglu/pdf2htmlEX
|
Loading…
Reference in New Issue
Block a user