
A set of utility classes and functions to process documents with Python



reclamador/document_clipper


document-clipper



Installation

The document_clipper package uses libraries that rely on several command-line tools included in the poppler-utils package, such as:

  • pdftohtml
  • pdfimages
  • pdftocairo

Before attempting to use document_clipper, please install the poppler-utils package.

For instance, on Ubuntu, you can do so by running the following command:

$ sudo apt-get install poppler-utils
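After installing poppler-utils, it can help to confirm that the command-line tools listed above are actually reachable on your PATH. The following is a minimal sketch using only the standard library's `shutil.which`. The function name `missing_poppler_tools` is hypothetical and is not part of document_clipper.

```python
import shutil

# Command-line tools from poppler-utils mentioned in the installation notes.
POPPLER_TOOLS = ("pdftohtml", "pdfimages", "pdftocairo")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names of the given tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Missing poppler-utils tools: " + ", ".join(missing))
    else:
        print("All required poppler-utils tools are available.")
```

An empty list means every tool was found. Otherwise, the printed names point to what still needs to be installed.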

Then, install document_clipper as usual with a Python package manager such as pip:

$ pip install document_clipper

Features

  • Fetch the number of pages in a PDF file.
  • Extract the coordinates and dimensions of a given text located in a PDF file.
  • Combine multiple PDFs into a single PDF.
  • Combine multiple PDF and image files into a single PDF.
  • Generate a new PDF file containing a subset of a provided source PDF file's pages. Rotations can be applied to each page individually.
  • Optionally repair the document(s) involved in the slicing or merging process beforehand.
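The first two features can be built on poppler-utils output. The sketch below shows one possible approach, not document_clipper's own implementation or API. It reads the page count from the text printed by `pdfinfo` (another poppler-utils tool, which the README does not say document_clipper uses). It also locates a piece of text in the XML produced by `pdftohtml -xml`, where each `<page>` has a `number` attribute and each `<text>` element carries `top`, `left`, `width` and `height` attributes. The helper names `parse_page_count` and `find_text_boxes` are hypothetical. Both functions take the tools' output as strings, so running the tools is left to the caller.

```python
import re
import xml.etree.ElementTree as ET


def parse_page_count(pdfinfo_output):
    """Extract the page count from the text printed by `pdfinfo file.pdf`."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def find_text_boxes(pdftohtml_xml, needle):
    """Return (page, left, top, width, height) for each <text> element containing needle.

    Assumes the XML layout produced by `pdftohtml -xml`; coordinates are
    converted to float so that both integer and decimal values are accepted.
    """
    root = ET.fromstring(pdftohtml_xml)
    boxes = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for text in page.iter("text"):
            # itertext() also collects content nested in <b> or <i> tags.
            content = "".join(text.itertext())
            if needle in content:
                boxes.append((
                    page_number,
                    float(text.get("left")),
                    float(text.get("top")),
                    float(text.get("width")),
                    float(text.get("height")),
                ))
    return boxes
```

In practice, the input strings could come from something like `subprocess.run(["pdftohtml", "-xml", "-stdout", "file.pdf"], capture_output=True, text=True).stdout`. Keeping the parsing separate from the subprocess calls makes these helpers easy to test without a real PDF.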