.TH pdf2htmlEX 1 "Aug 31, 2012" "pdf2htmlEX 0.1"
.SH NAME
.PP
.nf
  pdf2htmlEX \- convert PDF to HTML without losing text or formatting
.fi

.SH USAGE
.PP
.nf
  pdf2htmlEX [options] <input\-filename> [<output\-filename>]
.fi

.SH DESCRIPTION
.PP
pdf2htmlEX is a utility that converts PDF files to HTML files.

pdf2htmlEX tries to render the PDF precisely and maintain proper styling, while retaining text and optimizing for the Web.

Fonts are extracted from the PDF and then embedded into the HTML (Type 3 fonts are not supported). Text in the converted HTML file is usually selectable and copyable.

Other objects are rendered as images and also embedded.

.SH OPTIONS
.TP
.B --help
Show all options
.TP
.B -v, --version
Show copyright and version
.TP
.B -o, --owner-password <password>
Specify owner password
.TP
.B -u, --user-password <password>
Specify user password
.TP
.B --dest-dir <dir> (Default: ".")
Specify destination folder
.TP
.B --tmp-dir <dir> (Default: "/tmp/pdf2htmlEX")
Specify a folder for intermediate files
.TP
.B -f, --first-page <num> (Default: 1)
Specify the first page to process
.TP
.B -l, --last-page <num> (Default: last page)
Specify the last page to process
.TP
.B --zoom <ratio> (Default: 1.0)
Specify the zoom ratio of the HTML file
.TP
.B --hdpi <dpi>, --vdpi <dpi> (Default: 144)
Specify the horizontal and vertical DPI for images
.TP
.B --process-nontext <0|1> (Default: 1)
Whether to process non-text objects (as images)
.TP
.B --single-html <0|1> (Default: 1)
Whether to embed everything into one HTML file.

If switched off, several files are generated alongside the HTML file, including files for fonts, CSS, and images.
.TP
.B --embed-base-font <0|1> (Default: 1)
Whether to embed base 14 fonts.

The PDF standard defines 14 base fonts, which the PDF reader is expected to provide.

If this switch is on, matching local fonts are used and embedded; otherwise only font names are exported, so that web browsers can try to find suitable fonts themselves.
.TP
.B --embed-external-font <0|1> (Default: 0)
Similar to --embed-base-font, but for non-base fonts.
.TP
.B --heps <len>, --veps <len> (Default: 1)
Specify the maximum tolerable horizontal/vertical offset (in pixels).

pdf2htmlEX tries to optimize the generated HTML file by moving text within this distance.
.TP
.B --space-threshold <ratio> (Default: 1.0/6)
pdf2htmlEX inserts a space character (' ') if the distance between two consecutive letters on the same line is wider than ratio * font_size.
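
As a worked illustration, assuming a font size of 12 (a hypothetical value, in whatever unit font_size is measured) and the default ratio:
.nf
  threshold = ratio * font_size = (1.0/6) * 12 = 2
.fi
Under that assumption, a gap wider than 2 between two consecutive letters would produce an inserted space, while a narrower gap would not.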
.TP
.B --font-size-multiplier <ratio> (Default: 10)
Many web browsers enforce a minimum font size, and many round the given font size, which results in incorrect rendering.

Specifying a ratio greater than 1 resolves this issue.
.TP
.B --always-apply-tounicode <0|1> (Default: 0)
A PDF may provide a ToUnicode map for a font, which indicates the 'meaning' of its characters.

However, Type 1 fonts often contain better Unicode information, and the provided ToUnicode map is sometimes wrong. By default, pdf2htmlEX therefore takes the Unicode values directly from the fonts instead of from the ToUnicode map. Turn on this switch to change that behavior.
.TP
.B --font-suffix <suffix> (Default: ".ttf"), --font-format <format> (Default: "truetype")
Specify the suffix and format of fonts extracted from the PDF file. They should be consistent.
.TP
.B --debug <0|1> (Default: 0)
Show debug information.
.TP
.B --clean-tmp <0|1> (Default: 1)
If switched off, intermediate files are not removed when the conversion finishes.

.SH EXAMPLE
.TP
.B pdf2htmlEX /path/to/file.pdf
Convert file.pdf into file.html
.TP
.B pdf2htmlEX --tmp-dir tmp --clean-tmp 0 --debug 1 /path/to/file.pdf
Convert file.pdf and keep all intermediate files in tmp
.TP
.B pdf2htmlEX --dest-dir out --single-html 0 --debug 1 /path/to/file.pdf
Convert file.pdf into out/file.html and keep font and image files separate
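.TP
.B pdf2htmlEX -f 3 -l 5 /path/to/file.pdf
An illustrative sketch using the page options documented above (the page numbers are arbitrary): convert only pages 3 through 5 of file.pdf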

.SH COPYRIGHT
.PP
Copyright 2012 Lu Wang <coolwanglu@gmail.com>

pdf2htmlEX is dual-licensed under GPLv2 and GPLv3

.SH AUTHOR
.PP
pdf2htmlEX is written by Lu Wang <coolwanglu@gmail.com>

.SH SEE ALSO
.TP
Home page
http://github.com/coolwanglu/pdf2htmlEX