.\" generated with Ronn-NG/v0.9.1
.\" http://github.com/apjanke/ronn-ng/tree/0.9.1
.TH "DPSPREP" "1" "July 2024" ""
.SH "NAME"
\fBdpsprep\fR \- a DjVu to PDF converter
.SH "SYNOPSIS"
\fBdpsprep\fR [\fIoptions\fR] \fIsrc\fR [\fIdest\fR]
.SH "DESCRIPTION"
This tool, originally written for use with Sony\'s Digital Paper System (DPS), is now a general\-purpose DjVu to PDF converter that focuses on small output size and on preserving document outlines (e\.g\. a table of contents) and text layers (e\.g\. OCR)\.
.SH "OPTIONS"
.IP "\[ci]" 4
\fB\-q\fR, \fB\-\-quality\fR: Quality of images in the output\. Used only for JPEG compression, i\.e\. for RGB and grayscale images\. Passed directly to Pillow and to OCRmyPDF\'s optimizer\.
.IP "\[ci]" 4
\fB\-p\fR, \fB\-\-pool\-size\fR: Size of the multiprocessing pool used for page\-by\-page operations\.
.IP "\[ci]" 4
\fB\-v\fR, \fB\-\-verbose\fR: Display debug messages\.
.IP "\[ci]" 4
\fB\-o\fR, \fB\-\-overwrite\fR: Overwrite destination file\.
.IP "\[ci]" 4
\fB\-w\fR, \fB\-\-preserve\-working\fR: Preserve the working directory after script termination\.
.IP "\[ci]" 4
\fB\-d\fR, \fB\-\-delete\-working\fR: Delete any existing files in the working directory prior to writing to it\.
.IP "\[ci]" 4
\fB\-t\fR, \fB\-\-no\-text\fR: Disable the generation of text layers\. Implied by \fB\-\-ocr\fR\.
.IP "\[ci]" 4
\fB\-m\fR, \fB\-\-mode\fR: Image mode\. The default is to ask libdjvu for the image mode of every page\. It sometimes makes sense to force bitonal images since they compress well\.
.IP "\[ci]" 4
\fB\-\-ocr\fR: Perform OCR via OCRmyPDF rather than trying to convert the text layer\. If this parameter has a value, it should be a JSON dictionary of options to be passed to OCRmyPDF\.
.IP "\[ci]" 4
\fB\-O1\fR: Use the lossless PDF image optimization from OCRmyPDF (without performing OCR)\.
.IP "\[ci]" 4
\fB\-O2\fR: Use the lossy PDF image optimization from OCRmyPDF\.
.IP "\[ci]" 4
\fB\-O3\fR: Use the aggressive lossy PDF image optimization from OCRmyPDF\.
.IP "\[ci]" 4
\fB\-\-help\fR: Show this message and exit\.
.IP "" 0
.SH "EXAMPLES"
Produce \fBfile\.pdf\fR in the current directory:
.IP "" 4
.nf
dpsprep /wherever/file\.djvu
.fi
.IP "" 0
.P
Produce \fBoutput\.pdf\fR with reduced image quality and aggressive PDF image optimizations:
.IP "" 4
.nf
dpsprep \-\-quality=30 \-O3 input\.djvu output\.pdf
.fi
.IP "" 0
.P
Produce an output file using a large pool of workers:
.IP "" 4
.nf
dpsprep \-\-pool\-size=16 input\.djvu
.fi
.IP "" 0
.P
Force bitonal images:
.IP "" 4
.nf
dpsprep \-\-mode bitonal input\.djvu
.fi
.IP "" 0
.P
Produce an output file by disregarding the text layer and running OCRmyPDF instead:
.IP "" 4
.nf
dpsprep \-\-ocr \'{"language": ["rus", "eng"]}\' input\.djvu
.fi
.IP "" 0
.P
Or simply disregard the text layer without OCR:
.IP "" 4
.nf
dpsprep \-\-no\-text input\.djvu
.fi
.IP "" 0
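.P
Overwrite an existing destination file while displaying debug messages (a sketch that combines flags from the OPTIONS section; the file names are placeholders):
.IP "" 4
.nf
dpsprep \-\-overwrite \-\-verbose input\.djvu output\.pdf
.fi
.IP "" 0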
.SH "NOTE REGARDING COMPRESSION"
We perform compression in two stages:
.IP "\[ci]" 4
The first stage is the default compression provided by Pillow\. For bitonal images, Pillow\'s PDF generation code indicates that \fBgroup4\fR compression is used if \fBlibtiff\fR is available\.
.IP "\[ci]" 4
If OCRmyPDF is installed, its PDF optimization can be used via the flags \fB\-O1\fR to \fB\-O3\fR (this involves no OCR)\. This allows us to use advanced techniques, including JBIG2 compression via \fBjbig2enc\fR\.
.IP "" 0
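.P
As a sketch of the second stage, assuming OCRmyPDF is installed and using a placeholder file name, the following requests only the lossless optimization, without OCR:
.IP "" 4
.nf
dpsprep \-O1 input\.djvu
.fi
.IP "" 0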
.P
If you run OCRmyPDF manually, note that the optimization command suggested in its documentation (setting \fB\-\-tesseract\-timeout\fR to \fB0\fR) may ruin existing text layers\. To perform only PDF optimization, you can use the following undocumented tool instead:
.IP "" 4
.nf
python \-m ocrmypdf\.optimize <input_file> <level> <output_file>
.fi
.IP "" 0