My personal implementation of the SVTR model for handwritten OCR

trinhtuanvubk/handwritten-ocr

Vietnamese Handwritten OCR (Top5 Kalapa Challenge 2023)

Problem Statements

  • Problem: Build a lightweight model, suitable for mobile devices, that performs Vietnamese handwritten OCR on Vietnamese addresses

  • Input: a raw image containing a single line of text

  • Output: the text in the input image

  • Metric: a custom edit-distance score between the output and the label

  • Requirements:

    • Model size <= 50 MB
    • Inference time <= 2 s
    • No models pretrained on OCR tasks or on handwriting datasets
  • Some issues with data:

    • White space at the end of the image.
    • Short text lacking linguistic context.
    • Excessive use of colors.
    • Two lines of text.
    • Text not fully visible.
    • Empty images.
  • Ideas:

    • Choose a very lightweight OCR model: SVTR
    • Pretrain the model on generated data
    • Fine-tune the pretrained model on the real dataset
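
The challenge metric is described above only as a custom edit-distance variant, and its exact formula is not given here. As a rough illustration, the sketch below computes the plain Levenshtein distance and a normalized score. `normalized_score` is a hypothetical helper, and dividing by the longer string's length is an assumption, not the official Kalapa formula.

```python
def edit_distance(pred: str, label: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(label) + 1))
    for i, pred_char in enumerate(pred, start=1):
        curr = [i]
        for j, label_char in enumerate(label, start=1):
            cost = 0 if pred_char == label_char else 1
            curr.append(min(
                prev[j] + 1,         # delete pred_char
                curr[j - 1] + 1,     # insert label_char
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalized_score(pred: str, label: str) -> float:
    """Hypothetical score in [0, 1] (assumed normalization, not the official one)."""
    longest = max(len(pred), len(label))
    if longest == 0:
        return 1.0  # two empty strings match exactly
    return 1.0 - edit_distance(pred, label) / longest
```

A score of 1.0 means the prediction matches the label exactly; lower values mean more character-level edits are needed.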

Prepare data:

|___data
|    |___train
|    |    |___images
|    |    |    |___0.jpg
|    |    |    |___...
|    |    |___labels
|    |    |    |___0.txt
|    |    |    |___...
|    |___val
|    |    |___images
|    |    |    |___0.jpg
|    |    |    |___...
|    |    |___labels
|    |    |    |___0.txt
|    |    |    |___...
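
As a minimal sketch of how this layout can be read, the function below pairs each image with its label. It assumes that every `N.jpg` under `images` has a matching `N.txt` under `labels` holding the text of that line, which the tree suggests but does not state. `load_pairs` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path


def load_pairs(split_dir):
    """Return (image_path, label_text) pairs for one split, e.g. data/train."""
    split_dir = Path(split_dir)
    pairs = []
    for image_path in sorted((split_dir / "images").glob("*.jpg")):
        # Assumption: the label file shares the image's stem (0.jpg -> 0.txt).
        label_path = split_dir / "labels" / (image_path.stem + ".txt")
        if not label_path.exists():
            continue  # skip images that have no label file
        text = label_path.read_text(encoding="utf-8").strip()
        pairs.append((image_path, text))
    return pairs
```

Calling `load_pairs("data/train")` and `load_pairs("data/val")` would then give the two splits as plain Python lists.
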
Pretrained
  • Collect address text:

    • Extract data from an Excel file provided by the government.
    • Take text labels from other OCR datasets.
    • Crawl information on villages from Google.
  • To generate data, render the text corpus with several handwritten fonts using my repo OCR-Handwritten-Text-Generator

  • Then apply the augmentations provided in that repository

  • Total: 250k - 350k images

Finetuned
  • Manually check the images, crop those that contain two lines of text, and correct their labels

  • Crop each image to remove the white space at the end; this step also helps handle empty images:

python3 main.py --scenario preprocess \
--raw_data_path "./path/to/raw/data/"

  • Then create the LMDB data from the raw data:
python3 main.py --scenario create_lmdb_data \
--raw_data_path "./data/OCR/training_data" \
--raw_data_type "folder" \
--data_mode "train" \
--lmdb_data_path "./data/kalapa_lmdb/"
  • Flags:
    • raw_data_path: path to the raw data
    • raw_data_type: one of three values:
      • json: a directory containing the images and a JSON file in which each line holds an image path and its text label.
      • folder: a directory containing image subdirectories and a directory containing the .txt label files.
      • other: the second gen type from my repo.
    • data_mode: whether the data is for training or evaluation
    • lmdb_data_path: output path for the LMDB data
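
The white-space cropping step in this section can be pictured with the sketch below. It is not the code behind `--scenario preprocess`. It is a pure-Python illustration that assumes a grayscale image stored as a list of rows of 0-255 values; `crop_trailing_white`, `white_threshold`, and `margin` are hypothetical names and values. A real pipeline would more likely use an image library.

```python
def crop_trailing_white(image, white_threshold=240, margin=2):
    """Crop blank columns from the right edge of a grayscale image.

    image: list of rows, each a list of ints in 0..255.
    Returns None when no column contains ink (treated as an empty image).
    """
    if not image or not image[0]:
        return None
    width = len(image[0])
    last_ink = -1
    # Scan columns from right to left until one holds a dark pixel.
    for x in range(width - 1, -1, -1):
        if any(row[x] < white_threshold for row in image):
            last_ink = x
            break
    if last_ink < 0:
        return None  # every pixel is white: flag the image as empty
    right = min(width, last_ink + 1 + margin)  # keep a small white margin
    return [row[:right] for row in image]
```

Returning `None` for an all-white input is one way to flag empty images so they can be handled separately from images that contain text.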

Training

  • To run training:
python3 main.py --scenario train \
--model SVTR \
--lmdb_data_path "./data/kalapa_lmdb/" \
--batch_size 16 \
--num_epoch 1000
  • To run inference test:
python3 main.py --scenario infer --image_test_path "path/to/image.jpg"

Postprocess

  • To handle cases where the text is not fully visible or is very messy, or where the model's prediction is wrong, decode using beam search with an n-gram language model
  • To build the n-gram model from the text file generated in the preprocessing step, see https://github.com/kmario23/KenLM-training
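
To make the decoding idea concrete, here is a simplified sketch of CTC prefix beam search with an optional language-model weight. It assumes the recognizer emits per-frame CTC probabilities with the blank symbol at index 0. The `lm` callable is a hypothetical stand-in for an n-gram model such as one built with KenLM, and `alpha` is an assumed weighting parameter. This is not the repository's decoder.

```python
from collections import defaultdict


def ctc_beam_search(probs, alphabet, beam_width=5, lm=None, alpha=0.5, blank=0):
    """Decode per-frame CTC probabilities with prefix beam search.

    probs: list of frames, each a list of probabilities over the alphabet.
    lm: optional callable (prefix_text, next_char) -> probability in (0, 1].
    """
    # Each prefix keeps two scores: paths ending in blank / in a non-blank.
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            total = p_blank + p_non_blank
            for idx, p in enumerate(frame):
                if idx == blank:
                    candidates[prefix][0] += total * p
                    continue
                extended = prefix + (idx,)
                weight = 1.0
                if lm is not None:
                    text = "".join(alphabet[i] for i in prefix)
                    weight = lm(text, alphabet[idx]) ** alpha
                if prefix and prefix[-1] == idx:
                    # A repeated symbol extends the prefix only after a blank;
                    # otherwise it collapses into the existing last symbol.
                    candidates[extended][1] += p_blank * p * weight
                    candidates[prefix][1] += p_non_blank * p
                else:
                    candidates[extended][1] += total * p * weight
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[i] for i in best)
```

With `lm=None` this reduces to plain CTC beam search. Passing a callable that returns n-gram probabilities biases the search toward character sequences that are likely in the address corpus, which is the effect the post-processing step aims for.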

Export ONNX

  • To export the model to ONNX (optional):
python3 export_onnx.py

Submission

  • To run inference on a folder:
    • run in batch:
      python3 submission.py
    • run one image at a time:
      python3 torch_submission.py
    • run one image at a time with ONNX:
      python3 onnx_submission.py
