Python tools for performing various operations on ALTO XML files
You can install from PyPI by running

`pip install alto-tools`

or clone the repository, enter it and run

`pip install .`
`alto-tools <INPUT> [OPTION]`

`INPUT` should be the path to an ALTO XML file or to a directory containing ALTO XML files.
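One way to resolve `INPUT` into a list of files is sketched below. The helper `collect_alto_files` is hypothetical and is not taken from alto-tools. It assumes that ALTO files use the `.xml` extension (in any letter case) and that a directory is searched non-recursively; the real tool may behave differently.

```python
from pathlib import Path


def collect_alto_files(input_path):
    """Hypothetical helper: turn INPUT into a sorted list of ALTO file paths.

    Assumptions (not from alto-tools): ALTO files end in .xml (any case),
    and directories are scanned without descending into subdirectories.
    """
    path = Path(input_path)
    if path.is_file():
        # A single file is used as-is.
        return [path]
    if path.is_dir():
        # Keep only regular files whose suffix is .xml, case-insensitively.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".xml"
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```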
The following `OPTIONS` are currently supported:

| OPTION | Description |
|---|---|
| `-t`, `--text` | Extract UTF-8 encoded text content |
| `-c`, `--confidence` | Extract mean OCR word confidence score |
| `-i`, `--illustrations` | Extract bounding box coordinates of `<Illustration>` elements |
| `-g`, `--graphics` | Extract bounding box coordinates of `<GraphicalElement>` elements |
| `-s`, `--statistics` | Extract statistical info (number of text lines, words, glyphs, etc.) |
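To make the options above more concrete, here is a minimal sketch of the same kinds of extraction using only Python's standard `xml.etree.ElementTree`. It is not the alto-tools implementation, and the function names (`extract_text`, `mean_word_confidence`, `bounding_boxes`, `element_counts`) are hypothetical. It relies on the ALTO schema conventions that words are `<String>` elements with a `CONTENT` attribute, that word confidence is stored in the `WC` attribute (0 to 1), and that block positions are given by `HPOS`, `VPOS`, `WIDTH` and `HEIGHT`. For simplicity it ignores hyphenation (`<HYP>`) and joins words with single spaces.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical helpers for illustration only; not part of alto-tools.


def _local(tag):
    """Strip the '{namespace}' prefix so any ALTO version (v2, v3, v4) matches."""
    return tag.rsplit("}", 1)[-1]


def _elements(root, name):
    """Return a list of all elements whose local tag name equals `name`."""
    return [el for el in root.iter() if _local(el.tag) == name]


def extract_text(root):
    """One output line per <TextLine>, built from its child <String CONTENT>."""
    lines = []
    for line in _elements(root, "TextLine"):
        words = [s.get("CONTENT", "") for s in line if _local(s.tag) == "String"]
        lines.append(" ".join(words))
    return "\n".join(lines)


def mean_word_confidence(root):
    """Mean of the WC attributes on <String> elements, or None if none exist."""
    scores = [float(s.get("WC")) for s in _elements(root, "String")
              if s.get("WC") is not None]
    return mean(scores) if scores else None


def bounding_boxes(root, name):
    """(HPOS, VPOS, WIDTH, HEIGHT) for each element named e.g. 'Illustration'."""
    return [tuple(float(el.get(a)) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            for el in _elements(root, name)]


def element_counts(root):
    """Count text lines, words, glyphs and non-text blocks."""
    names = ("TextLine", "String", "Glyph", "Illustration", "GraphicalElement")
    return {name: len(_elements(root, name)) for name in names}


SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock>
      <TextLine>
        <String CONTENT="Hello" WC="1.0"/><SP/><String CONTENT="ALTO" WC="0.5"/>
      </TextLine>
    </TextBlock>
    <Illustration HPOS="10" VPOS="20" WIDTH="300" HEIGHT="150"/>
  </PrintSpace></Page></Layout>
</alto>"""

root = ET.fromstring(SAMPLE)
print(extract_text(root))                    # Hello ALTO
print(mean_word_confidence(root))            # 0.75
print(bounding_boxes(root, "Illustration"))  # [(10.0, 20.0, 300.0, 150.0)]
print(element_counts(root))
```

Matching on local tag names instead of a fixed namespace URI lets the same code read files from different ALTO versions. A stricter tool might check the namespace explicitly instead.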
All output is sent to stdout.