This project uses vision-language models to recognize text in PDF files. Currently, only GOT_OCR2.0 is supported. The goal is to extract the text from the images and convert it into a formatted text file (LaTeX, HTML, or Markdown).
- Python 3.10+
- PyTorch 2.4+
- Create a Python virtual environment:
python -m venv venv
- Install the required packages:
pip install -r requirements.txt
- Start the server:
python server.py
- Process a file:
python main.py docs/test1.pdf
The converted files are saved in the docs directory.
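
To process several PDFs in one go, the single-file command above can be wrapped in a small script. The following is a minimal sketch, not part of the project: the helper names `build_commands` and `process_all` are hypothetical, and it assumes the server is already running, that the script is run from the repository root, and that `main.py` accepts one PDF path per invocation and exits non-zero on failure.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(pdf_dir, script="main.py"):
    """Build one `python main.py <pdf>` command per PDF in pdf_dir.

    Hypothetical helper: it relies only on the documented usage
    `python main.py docs/test1.pdf`. Files are sorted by name, and the
    "*.pdf" pattern is case-sensitive on most Linux filesystems.
    """
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    # sys.executable keeps the call inside the active virtual environment.
    return [[sys.executable, script, str(pdf)] for pdf in pdfs]


def process_all(pdf_dir, script="main.py"):
    """Run each command in turn and return the PDFs whose run failed.

    Assumption: main.py signals failure with a non-zero exit code.
    """
    failed = []
    for cmd in build_commands(pdf_dir, script):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Calling `process_all("docs")` would run the converter once for each PDF in the docs directory and return the paths that did not finish cleanly. Running the files one at a time keeps the sketch simple; whether the server can handle parallel requests is not stated here, so the sketch does not attempt it.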