Containerised version of tesseract v4 tools required for training a new font

artdevgame/tesseract-trainer


Tesseract Trainer

A containerised version of the tools required to train or fine-tune Tesseract for a new font.

Based on: https://www.youtube.com/watch?v=TpD76k2HYms

How to use

  1. Clone this repo (git clone https://github.com/artdevgame/tesseract-trainer.git)
  2. Copy your selected font into the src/fonts directory
  3. Configure docker-compose.yml with your preferences (see below)
  4. Download and install Docker for your OS (https://www.docker.com/products/docker-desktop)
  5. From the project root directory, run docker-compose up
  6. When the process finishes, final.traineddata is written to the src/output directory. Use this file in your Tesseract project.

Configuration

Change the following environment values in docker-compose.yml:

| Property | Example | Description |
| --- | --- | --- |
| TESSTRAIN_FONT | Agency FB Condensed | The name of the font (not the filename) |
| TESSTRAIN_LANG | eng | The language of the training data |
| TESSTRAIN_MAX_PAGES | 10 | Training text size |
| TESSTRAIN_MAX_ITERATIONS | 400 | Number of training iterations for the neural network. More iterations generally give a better result, but too many can lead to overfitting |
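
As a rough illustration of where these values live, the fragment below sketches the environment block of docker-compose.yml using the example values from the table. The service name `trainer` is a hypothetical placeholder, and the surrounding layout is an assumption rather than a copy of the repository's file; keep the service definition that ships with the repo and change only the four values.

```yaml
# Illustrative fragment only, not the complete docker-compose.yml.
# ASSUMPTION: "trainer" is a placeholder service name; use the name that
# already exists in the repository's file and leave its other keys untouched.
services:
  trainer:
    environment:
      TESSTRAIN_FONT: "Agency FB Condensed"  # font name, not the filename
      TESSTRAIN_LANG: "eng"                  # language of the training data
      TESSTRAIN_MAX_PAGES: "10"              # training text size
      TESSTRAIN_MAX_ITERATIONS: "400"        # too many iterations can overfit
```

The values are quoted so that they reach the container as plain strings. The font name in particular contains spaces and should match the name of the font copied into src/fonts, not its filename.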
