This implementation uses the Hugging Face Transformers library, which keeps the code for transformer-based models concise. It currently relies on the pre-trained "layoutlm-document-qa" pipeline model for question answering over document layouts; the model can be fine-tuned further for better performance. "pix2struct-docvqa-large" is another option for structured queries.
The model takes both an image and text as input: the image provides the visual context, and the OCR-extracted text is correlated with the question. Support for PDFs and images is currently included, and the design is easy to extend to other document types such as text files, spreadsheets, and custom document classes (see the dispatch sketch at the end of this section).
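A minimal sketch of the underlying call, assuming the widely used impira/layoutlm-document-qa checkpoint on the Hub (the file path and question below are illustrative):

```python
from transformers import pipeline

# Load the pre-trained document-QA pipeline; the checkpoint is pulled
# from the Hugging Face Hub on first use and cached locally.
qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")

# The pipeline runs OCR on the image internally, then correlates the
# extracted words and their bounding boxes with the question.
result = qa(image="data/invoice.png", question="What is the invoice number?")
print(result)  # a list of {'score', 'answer', 'start', 'end'} dicts
```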
- The first time you run this script, it downloads the model checkpoints from Hugging Face, so the first run may take longer to execute. To keep the project small, checkpoint files are not included in the repository.
- Code is tested with Python 3.10.
- Set up a Python virtual environment and activate it.
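  For example (the environment name .venv is just a convention):

  python -m venv .venv
  source .venv/bin/activate   (on Windows: .venv\Scripts\activate)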
- Install the Python packages:

  python -m pip install -r requirements.txt
- Install Tesseract-OCR and add it to PATH. (https://github.com/tesseract-ocr/tesseract)
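  If you would rather not modify PATH, the binary location can usually be set in code instead; this assumes OCR goes through pytesseract (which the Transformers document-QA pipeline uses) and an illustrative install path:

  ```python
  import pytesseract

  # Hypothetical install location; point this at your actual tesseract binary.
  pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
  ```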
- Download and unzip Poppler, then add it to PATH. (Only required for PDF files)
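  Assuming PDF-to-image conversion goes through pdf2image (the usual Poppler front end in Python), this sketch shows pointing it at an unzipped Poppler when it is not on PATH (the path below is illustrative):

  ```python
  from pdf2image import convert_from_path

  # poppler_path is only needed when Poppler's bin directory is not on PATH.
  pages = convert_from_path("data/2210.03347v2.pdf", poppler_path=r"C:\poppler\Library\bin")
  first_page = pages[0]  # one PIL.Image per PDF page
  ```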
- Run the examples:

  python src/main.py "What is the invoice number?" data/invoice.png
  python src/main.py "Who authored this paper?" data/2210.03347v2.pdf
  python src/main.py "What is the purchase amount?" data/purchase.jpg
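How src/main.py routes the two supported input types is not shown in this section; the following is a hypothetical sketch of that dispatch, which also shows where additional document types would slot in:

```python
import sys
from pathlib import Path

from pdf2image import convert_from_path
from PIL import Image
from transformers import pipeline

def load_document(path: str) -> Image.Image:
    """Rasterize PDFs to an image; open image files directly.
    Extend this branch for text files, spreadsheets, or custom classes."""
    if Path(path).suffix.lower() == ".pdf":
        return convert_from_path(path)[0]  # first page only, for brevity
    return Image.open(path)

if __name__ == "__main__":
    question, doc_path = sys.argv[1], sys.argv[2]
    qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
    print(qa(image=load_document(doc_path), question=question))
```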