Image-text OCR

This project uses vision-text models to recognize the text in PDF files. Currently, only GOT_OCR2.0 is supported. The goal is to extract the text from the page images and convert it into a formatted text file (LaTeX, HTML, or Markdown).
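The pipeline described above (page images in, one formatted document out) can be pictured with the minimal sketch below. It is not the project's code: `OcrBackend`, `wrap_document`, and `convert` are hypothetical names. The sketch assumes the PDF pages have already been rendered to image bytes and that the model returns each page's text already in the requested format, so only a document skeleton is added. Keeping the model behind a small interface is one way a second model could later sit next to GOT_OCR2.0.

```python
from __future__ import annotations

from typing import Protocol

# Hypothetical list of output formats, mirroring LaTeX, HTML, and Markdown.
FORMATS = ("markdown", "html", "latex")


class OcrBackend(Protocol):
    """Hypothetical interface for a vision-text model wrapper (e.g. one around GOT_OCR2.0)."""

    def recognize(self, image: bytes) -> str:
        """Return the recognized text of one page image, already in the target format."""
        ...


def wrap_document(pages: list[str], fmt: str) -> str:
    """Join per-page results and add a minimal document skeleton for the chosen format."""
    body = "\n\n".join(pages)
    if fmt == "markdown":
        return body + "\n"
    if fmt == "html":
        return "<!DOCTYPE html>\n<html>\n<body>\n" + body + "\n</body>\n</html>\n"
    if fmt == "latex":
        return "\\documentclass{article}\n\\begin{document}\n" + body + "\n\\end{document}\n"
    raise ValueError(f"unsupported format: {fmt!r}")


def convert(page_images: list[bytes], backend: OcrBackend, fmt: str = "markdown") -> str:
    """Run OCR page by page, then assemble one formatted document."""
    if fmt not in FORMATS:
        # Fail before spending time on (slow) model inference.
        raise ValueError(f"unsupported format: {fmt!r}")
    return wrap_document([backend.recognize(img) for img in page_images], fmt)
```

Any object with a `recognize(image: bytes) -> str` method works as a backend, so a trivial stub that decodes the bytes is enough to exercise the assembly step without loading a model.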

Requirements

  • Python 3.10+
  • PyTorch 2.4+
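The two version requirements can be checked before installing anything else. This is a hypothetical helper, not part of the repository; it reads the installed `torch` version through the standard `importlib.metadata` module and compares only the numeric release part, so local build labels such as `+cu121` are ignored.

```python
from __future__ import annotations

import re
import sys
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """'2.4.1+cu121' -> (2, 4, 1): keep the leading numeric release segments only."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """Compare numerically, so 2.10 correctly counts as newer than 2.4."""
    return version_tuple(version) >= minimum


def check_requirements() -> list[str]:
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch (package 'torch') is not installed")
    else:
        if not meets_minimum(torch_version, (2, 4)):
            problems.append(f"PyTorch 2.4+ required, found {torch_version}")
    return problems
```

Comparing integer tuples rather than raw strings avoids the classic mistake where `"2.10"` sorts before `"2.4"`.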

Usage

  1. Create and activate a Python virtual environment: python -m venv venv, then source venv/bin/activate (on Windows: venv\Scripts\activate)
  2. Install the required packages: pip install -r requirements.txt
  3. Start the server: python server.py
  4. Process a file: python main.py docs/test1.pdf
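When several PDFs need converting, steps 3 and 4 can be scripted. The sketch below is an assumption-laden helper, not part of the project: it presumes `server.py` is already running, that it is run from the repository root, and that `main.py` accepts one PDF path per call exactly as in step 4. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_commands(pdf_dir: str) -> list[list[str]]:
    """Return one `python main.py <file>` command per PDF in pdf_dir, sorted by name."""
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    # sys.executable reuses the interpreter of the active virtual environment.
    return [[sys.executable, "main.py", str(pdf)] for pdf in pdfs]


def process_all(pdf_dir: str = "docs") -> None:
    """Convert every PDF in pdf_dir; stops at the first failing file (check=True)."""
    for cmd in build_commands(pdf_dir):
        print("Processing", cmd[-1])
        subprocess.run(cmd, check=True)
```

Calling `process_all("docs")` processes the files one at a time, which keeps a single model server from being hit by concurrent requests.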

The converted files will be saved in the docs directory.
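This README does not document how the output files are named. The sketch below assumes a `<input stem>.<format extension>` scheme inside `docs` (so `docs/test1.pdf` would become `docs/test1.md` for Markdown); both the naming scheme and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping from output format to file extension.
EXTENSIONS = {"latex": ".tex", "html": ".html", "markdown": ".md"}


def output_path(pdf_path: str, fmt: str, out_dir: str = "docs") -> Path:
    """Build e.g. docs/test1.md from docs/test1.pdf when fmt is 'markdown'."""
    try:
        suffix = EXTENSIONS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return Path(out_dir) / (Path(pdf_path).stem + suffix)


def save_output(text: str, pdf_path: str, fmt: str, out_dir: str = "docs") -> Path:
    """Write the converted text as UTF-8, creating the output directory if needed."""
    path = output_path(pdf_path, fmt, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Writing with an explicit UTF-8 encoding matters here because OCR output often contains non-ASCII characters that the platform default encoding may not handle.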