PDF-shape

PDF-shape is a Rust library for analysing XML files produced by pdf2xml.

Features

Implemented:

Alignment extraction
Coordinates extraction
Shape extraction
Spacing extraction
Style extraction
Blocks extraction (get all the block elements of a given document)
Texts extraction (get all the text elements of a given document)
Tokens extraction (get all the token elements of a given document)

Not implemented yet:

Line detection
Column detection
Paragraph detection
Blocks detection

Examples

You can run the example with:

cargo run --example=main

Documentation

You can build the documentation with:

cargo doc --open --lib --no-deps

Layout detection

Shape (width/height) and spacing (vertical/horizontal)

The following diagram represents the shape of an object (or a set of objects) and the spacing between objects.
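In code terms, shape and spacing can be sketched as below. This is only an illustration: the `BBox` type, its field names, and its methods are hypothetical and are not taken from the PDF-shape API. It also assumes a top-left origin with `y` growing downward.

```rust
/// Hypothetical bounding box (not a PDF-shape type).
/// Assumes a top-left origin with y growing downward.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

impl BBox {
    /// Smallest box enclosing both boxes: the shape of a set of objects.
    pub fn union(&self, other: &BBox) -> BBox {
        let x = self.x.min(other.x);
        let y = self.y.min(other.y);
        let right = (self.x + self.width).max(other.x + other.width);
        let bottom = (self.y + self.height).max(other.y + other.height);
        BBox { x, y, width: right - x, height: bottom - y }
    }

    /// Horizontal spacing: gap between the right edge of `self`
    /// and the left edge of `next`.
    pub fn horizontal_spacing(&self, next: &BBox) -> f64 {
        next.x - (self.x + self.width)
    }

    /// Vertical spacing: gap between the bottom edge of `self`
    /// and the top edge of `below`.
    pub fn vertical_spacing(&self, below: &BBox) -> f64 {
        below.y - (self.y + self.height)
    }
}
```

A negative spacing value means the two boxes overlap along that axis.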

Line detection

A line is a set of objects that share the same base or are horizontally aligned. Within a line, the horizontal spacing between two adjacent objects should not exceed the horizontal spacing mode of the document, i.e. its most frequent horizontal spacing.
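Since line detection is not implemented yet, the rule above can only be sketched. The example below is one possible reading, not the library's method: `Token`, `spacing_mode`, and `detect_lines` are hypothetical names, the tolerance `base_tol` is an added assumption to absorb floating-point noise, and the mode is computed after rounding gaps to one decimal place.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Hypothetical token box (not a PDF-shape type).
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub x: f64,
    pub width: f64,
    pub base: f64,
}

/// Most frequent gap, after rounding to one decimal place so that
/// near-equal gaps are counted together. Ties go to the smallest gap.
pub fn spacing_mode(gaps: &[f64]) -> Option<f64> {
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for g in gaps {
        *counts.entry((g * 10.0).round() as i64).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by_key(|&(key, n)| (n, Reverse(key)))
        .map(|(key, _)| key as f64 / 10.0)
}

/// Groups tokens into lines: same base (within `base_tol`) and a gap
/// to the previous token of at most `max_gap`.
pub fn detect_lines(tokens: &[Token], base_tol: f64, max_gap: f64) -> Vec<Vec<Token>> {
    let mut sorted = tokens.to_vec();
    sorted.sort_by(|a, b| a.base.total_cmp(&b.base));

    // Step 1: bucket tokens whose base is close to the bucket's first token.
    let mut rows: Vec<Vec<Token>> = Vec::new();
    for tok in sorted {
        let start_new = rows
            .last()
            .map_or(true, |row| (tok.base - row[0].base).abs() > base_tol);
        if start_new {
            rows.push(vec![tok]);
        } else {
            rows.last_mut().unwrap().push(tok);
        }
    }

    // Step 2: order each row left to right and split it on wide gaps.
    let mut lines: Vec<Vec<Token>> = Vec::new();
    for mut row in rows {
        row.sort_by(|a, b| a.x.total_cmp(&b.x));
        let mut current: Vec<Token> = Vec::new();
        for tok in row {
            let split = current
                .last()
                .map_or(false, |prev| tok.x - (prev.x + prev.width) > max_gap);
            if split {
                lines.push(std::mem::take(&mut current));
            }
            current.push(tok);
        }
        lines.push(current);
    }
    lines
}
```

In this sketch, `max_gap` would typically be the value returned by `spacing_mode` over all horizontal gaps in the document, possibly with a small margin added.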

Column detection

Paragraph detection

A paragraph is a set of lines that are equally spaced vertically. In most cases, the spacing between paragraphs is greater than the document's line spacing. All lines of a paragraph must be vertically aligned.
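Paragraph detection is not implemented yet either, so the following is only a sketch of the rule above under stated assumptions. `Line` and `detect_paragraphs` are hypothetical names, "vertically aligned" is read here as "sharing the same left edge", and `tol` is an added tolerance. A line joins the current paragraph when its gap to the previous line matches the document line spacing; a larger gap starts a new paragraph.

```rust
/// Hypothetical line box (not a PDF-shape type).
/// Assumes a top-left origin with y growing downward.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Line {
    pub x: f64,
    pub y: f64,
    pub height: f64,
}

/// Groups lines into paragraphs. A line joins the previous one when
/// the vertical gap equals `line_spacing` and the left edges match,
/// both within `tol`.
pub fn detect_paragraphs(lines: &[Line], line_spacing: f64, tol: f64) -> Vec<Vec<Line>> {
    let mut sorted = lines.to_vec();
    // Process lines from top to bottom.
    sorted.sort_by(|a, b| a.y.total_cmp(&b.y));

    let mut paragraphs: Vec<Vec<Line>> = Vec::new();
    for line in sorted {
        let joins = paragraphs
            .last()
            .and_then(|p| p.last())
            .map_or(false, |prev| {
                let gap = line.y - (prev.y + prev.height);
                (gap - line_spacing).abs() <= tol && (line.x - prev.x).abs() <= tol
            });
        if joins {
            paragraphs.last_mut().unwrap().push(line);
        } else {
            paragraphs.push(vec![line]);
        }
    }
    paragraphs
}
```

Because this sketch compares left edges only, an indented first line would be split into its own paragraph; a fuller version would need to handle indentation and other alignments.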
