Generate text images for training deep learning OCR models (e.g. CRNN).
- Modular design. You can easily add Corpus, Effect, Layout.
- Supports generating lmdb datasets that are compatible with PaddleOCR; see Dataset
- Supports rendering multiple corpora on one image with different fonts, font sizes, or font colors. Layout is responsible for arranging the multiple corpora
- Generate vertical text
- Corpus sampler: helps balance how often each character appears
To use text_renderer, you should prepare:
- Font file: .ttf or...
- Background image
- Text: Optional. Depends on the corpus you use.
- Character set: Optional. Depends on the corpus you use.
Run the following commands to generate images using the example data:
git clone https://github.com/oh-my-ocr/text_renderer
cd text_renderer
python3 setup.py develop
pip3 install -r docker/requirements.txt
python3 main.py \
--config example_data/example.py \
--dataset img \
--num_processes 2 \
--log_period 10
The data is generated in the example_data/output directory.
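To check what a run produced, a small script along these lines could list the image files under the output directory. This is only a sketch: the helper name `list_generated_images` and the set of file extensions are assumptions, and it makes no claim about how text_renderer lays out its output.

```python
from pathlib import Path
from typing import List

# Assumed image extensions; adjust to match what your config writes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_generated_images(output_dir: str) -> List[Path]:
    """Return all image files found anywhere under output_dir, sorted by path."""
    root = Path(output_dir)
    if not root.is_dir():
        # Nothing has been generated yet (or the path is wrong).
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


images = list_generated_images("example_data/output")
print(f"found {len(images)} image(s)")
```

The search is recursive, so it works whether images are written directly into the directory or into subdirectories.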
The main.py script has only 4 arguments:
- config: Python config file path
- dataset: Dataset format: img or lmdb
- num_processes: Number of processes used
- log_period: Period of log printing. (0, 100)
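For illustration, the four arguments above could be declared with argparse roughly as follows. This is a minimal sketch, not the actual contents of main.py: the function name `build_parser`, the default values, the `choices` restriction, and the argument types are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four documented arguments."""
    parser = argparse.ArgumentParser(description="Generate text images")
    # Path to the Python config file, e.g. example_data/example.py
    parser.add_argument("--config", required=True, help="Python config file path")
    # Dataset format; the list above names img and lmdb
    parser.add_argument("--dataset", default="img", choices=["img", "lmdb"])
    # Number of worker processes (default is an assumption)
    parser.add_argument("--num_processes", type=int, default=2)
    # Period of log printing (type and default are assumptions)
    parser.add_argument("--log_period", type=int, default=10)
    return parser


# Parse the same values used in the example command.
args = build_parser().parse_args([
    "--config", "example_data/example.py",
    "--dataset", "img",
    "--num_processes", "2",
    "--log_period", "10",
])
print(args.config, args.dataset, args.num_processes, args.log_period)
```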
All parameters related to the example image generation process are configured in example.py.
Learn more in the documentation.
Build image
docker build -f docker/Dockerfile -t text_renderer .
The config file is specified by the CONFIG environment variable.
In example.py, data is generated in the example_data/output directory, so we map this directory to the host.
docker run --rm \
-v `pwd`/example_data/docker_output/:/app/example_data/output \
--env CONFIG=/app/example_data/example.py \
--env DATASET=img \
--env NUM_PROCESSES=2 \
--env LOG_PERIOD=10 \
text_renderer
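If you launch the container from a script, the same command can be assembled in Python. The sketch below only builds the argument list from the values shown above; the helper name `docker_run_command` and its parameters are hypothetical and not part of text_renderer.

```python
import os
import shlex
from typing import List


def docker_run_command(
    host_output_dir: str,
    config: str = "/app/example_data/example.py",
    dataset: str = "img",
    num_processes: int = 2,
    log_period: int = 10,
    image: str = "text_renderer",
) -> List[str]:
    """Build the argv list equivalent to the docker run command above."""
    # Docker volume mounts need an absolute host path.
    host_dir = os.path.abspath(host_output_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/app/example_data/output",
        "--env", f"CONFIG={config}",
        "--env", f"DATASET={dataset}",
        "--env", f"NUM_PROCESSES={num_processes}",
        "--env", f"LOG_PERIOD={log_period}",
        image,
    ]


cmd = docker_run_command("example_data/docker_output")
print(shlex.join(cmd))
# To actually start the container: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues when the host path contains spaces.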
cd docs
make html
Open _build/html/index.html
If you use text_renderer in your research, please consider citing it with the following BibTeX entry.
@misc{text_renderer,
author = {weiqing.chu},
title = {text_renderer},
howpublished = {\url{https://github.com/oh-my-ocr/text_renderer}},
year = {2021}
}