
4. Model Setup

Agam Dwivedi edited this page May 25, 2020 · 4 revisions

The Model Setup section is divided into the following subsections:

1. Attention LSTM model

2. CRNN model

1. Attention LSTM model (Sanskrit AOCR)

The model used for Attention LSTM is referenced from attention ocr. Note that this link refers to the older version of the repo that we have used, not the latest one.

The model architecture (figure: Attention-LSTM OCR model):

A CNN is used on the encoder side to encode the features of a line image. The CNN-encoded features are passed into a single BLSTM layer, and the final hidden-state features of the BLSTM are passed to a 2-layer LSTM with an attention module on the decoder side.
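The attention step at the decoder can be sketched in plain numpy. This is an illustrative Bahdanau-style attention, not the actual aocr implementation; all names and shapes here are our assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(encoder_states, decoder_hidden, W_enc, W_dec, v):
    """One decoding step: score each encoder position against the
    decoder state, normalize, and take a weighted sum of features."""
    # encoder_states: (T, D) BLSTM outputs, one per horizontal position
    # decoder_hidden: (D,) current LSTM decoder state
    scores = np.tanh(encoder_states @ W_enc + decoder_hidden @ W_dec) @ v  # (T,)
    weights = softmax(scores)            # attention over image positions
    context = weights @ encoder_states   # (D,) context vector fed to the LSTM
    return context, weights

# Tiny demo with random weights
T, D = 5, 8
rng = np.random.default_rng(0)
enc = rng.normal(size=(T, D))
h = rng.normal(size=D)
context, weights = attention_step(enc, h, rng.normal(size=(D, D)),
                                  rng.normal(size=(D, D)), rng.normal(size=D))
```

At each output character, the decoder re-attends over all horizontal positions of the line image, which is what lets the model align predictions with image regions.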

1.1 Installation

To install the attention-lstm module, run the following from the repo's root (sanskrit-ocr/).

Make sure you have the Anaconda distribution installed on your system. The code has been tested with CUDA v10.2 and cuDNN v7.1.2.

cd model/attention-lstm

conda create --name ocr --file requirements.txt

conda activate ocr

python setup.py install

1.2 Writing TfRecords

In the parent directory, run the following:

mkdir datasets

aocr dataset ./label_data/annot_mixed.txt ./datasets/training.tfrecords

aocr dataset ./label_data/annot_realTest.txt ./datasets/testing.tfrecords

aocr dataset ./label_data/annot_realValidation.txt ./datasets/validation.tfrecords
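For reference, annotation files like the ones above typically pair an image path with its transcription on each line. The exact separator used by aocr is an assumption here, so treat this parser as a sketch only:

```python
# Hypothetical sketch of an annotation-file parser. Assumes each
# non-empty line is "image_path<whitespace>transcription", where the
# transcription may itself contain spaces.
def parse_annotations(lines):
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the first whitespace run only, keep the label intact
        path, label = line.split(None, 1)
        pairs.append((path, label))
    return pairs

sample = ["images/line_001.png श्रीगणेशाय नमः", ""]
print(parse_annotations(sample))
```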

1.3 Training

From the repository's root directory, run

bash ./model/evaluate/copyfiles.sh

to ensure that all checkpoints (saved every 1000 steps) are copied to the ./modelss directory.

Open a new terminal and run the following command from the root directory (sanskrit-ocr/):

CUDA_VISIBLE_DEVICES=0 aocr train ./datasets/training.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --num-epoch <num-epoch>

In our case, run the above with the following arguments:

CUDA_VISIBLE_DEVICES=0 aocr train ./datasets/training.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --gpu-id=0

Have a look at the original repo to learn more about the other arguments.

1.4 Validation

To get the validation-data predictions for each saved model, run the evaluate/attention_predictions.py file. You need to pass the initial checkpoint step (the first iteration at which a model is saved) and the final checkpoint step (the last iteration at which a model is saved).

python ./model/evaluate/attention_predictions.py <initial_step> <final_step> <steps_per_checkpoint>

For example:

python ./model/evaluate/attention_predictions.py 1000 30100 1000

steps_per_checkpoint denotes the interval, in iterations, at which the model is saved.

After running the above command, the ground-truth labels with their corresponding predicted outputs, character accuracies, and validation loss for each line will be stored in model/attention-lstm/logs/val_preds.txt. To get per-model word error rates, character error rates, and loss, run:

python ./model/evaluate/get_errorrates.py

This will also report the best model, i.e. the one with the lowest WER and CER on the validation data. The best model and its WER and CER are saved in model/attention-lstm/logs/error_rates.txt.
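For intuition, error rates of the kind get_errorrates.py reports are conventionally computed as edit distance normalized by reference length. The sketch below is our own illustration, not the repo's code:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # character error rate: edits per reference character
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    # word error rate: edits per reference word
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)
```

The "best model" is then simply the checkpoint whose predictions minimize WER (and CER) over the validation lines.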

1.5 Testing

For testing, we take the best model (the one with the lowest WER) and evaluate it on the testing data. Make sure to change the first line of the checkpoint file in the ./modelss directory to the best model's number. Also delete val_preds.txt, or copy its contents to another file, before proceeding: the command below will overwrite val_preds.txt with the test predictions.
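Editing the checkpoint file can be scripted. The sketch below assumes the TensorFlow-style layout where the first line reads model_checkpoint_path: "<prefix>-<step>"; the prefix translate.ckpt is a placeholder of ours, so check your ./modelss directory for the actual checkpoint name:

```python
from pathlib import Path

def set_best_checkpoint(ckpt_file, best_step, prefix="translate.ckpt"):
    """Rewrite the first line of a TF checkpoint index file so that it
    points at the checkpoint saved at `best_step`."""
    lines = Path(ckpt_file).read_text().splitlines()
    # TF reads the model to restore from the first line only
    lines[0] = f'model_checkpoint_path: "{prefix}-{best_step}"'
    Path(ckpt_file).write_text("\n".join(lines) + "\n")
```

A one-line `sed` edit achieves the same thing; the helper just makes the step explicit.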

CUDA_VISIBLE_DEVICES=0 aocr test ./datasets/testing.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --model-dir ./modelss --gpu-id 1

Run get_errorrates.py again to get the WER and CER on the test data:

python ./model/evaluate/get_errorrates.py

The files test_gt.txt and test_pred.txt in the model/attention-lstm/logs/ folder store the ground-truth and predicted texts, respectively.

1.6 Fine-tuning

For fine-tuning, take the best model obtained in Section 1.4 and train it further, but now on only the real training data.

Create only real data tfrecords:

aocr dataset ./label_data/annot_realTrain.txt ./datasets/training_real.tfrecords

Training (or fine-tuning):

aocr train ./datasets/training_real.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --model-dir ./modelss

Please ensure that the first line of the checkpoint file in the ./modelss directory points to the best model.

Keep checking the validation loss in parallel so that you can stop training once the model is no longer learning (i.e., the loss has stopped decreasing). The loss can be obtained by running the attention_predictions.py file as described above.
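The stopping rule described above amounts to a simple patience check on the per-checkpoint validation losses; a hypothetical sketch:

```python
def should_stop(losses, patience=3):
    """Stop once the validation loss has failed to improve on its
    earlier best value for `patience` consecutive checkpoints."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return all(l >= best_before for l in losses[-patience:])

# still improving -> keep training
print(should_stop([5.0, 4.0, 3.5, 3.4]))
# plateaued for 3 checkpoints -> stop
print(should_stop([5.0, 3.0, 3.1, 3.2, 3.05], patience=3))
```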

Now, repeat the steps from Section 1.4.

2. CRNN Model

The model used for CRNN is referenced from crnn. Note that this link refers to the older version of the repo that we have used, not the latest one. We have used CRNN as our baseline model.

2.1 Installation

Make sure you are in the repo's root. Then run the following:

cd model/CRNN

conda create --name ocr_blstm --file requirements.txt

conda activate ocr_blstm

2.2 Writing TfRecords

Run from the repo's root:

python ./model/CRNN/create_tfrecords.py ./label_data/annot_mixed.txt ./model/CRNN/data/tfReal/training.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTest.txt ./model/CRNN/data/tfReal/testing.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realValidation.txt ./model/CRNN/data/tfReal/validating.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_synthetic_only.txt ./model/CRNN/data/tfReal/synthetic.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords

2.3 Training

python ./model/CRNN/train.py <training tfrecords filename> <train_epochs> <path_to_previous_saved_model> <steps-per_checkpoint>

For example,

python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 200

path_to_previous_saved_model can be set to 0 if training is to be started from scratch. steps_per_checkpoint indicates after how many iterations the model is saved; it can be set to 0 to save after every iteration.
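In other words, the save condition behaves roughly like the sketch below, with 0 meaning "save every iteration". This is an illustration of the argument's semantics, not the repo's exact code:

```python
def should_save(step, steps_per_checkpoint):
    """Decide whether a checkpoint is written at this training step."""
    if steps_per_checkpoint == 0:
        return True  # 0 means checkpoint after every iteration
    return step % steps_per_checkpoint == 0

print(should_save(400, 200))   # on a checkpoint boundary
print(should_save(401, 200))   # between boundaries
print(should_save(7, 0))       # save-every-step mode
```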

2.4 Validation

To get the validation-data predictions for each saved model, run the evaluate/crnn_predictions.py file. You need to pass the initial checkpoint step (the first iteration at which a model is saved) and the final checkpoint step (the last iteration at which a model is saved).

python ./model/evaluate/crnn_predictions.py <initial_step> <final_step> <steps_per_checkpoint>

For example:

python ./model/evaluate/crnn_predictions.py 1000 30100 1000

steps_per_checkpoint denotes the interval, in iterations, at which the model is saved.

After running the above command, the ground-truth labels with their corresponding predicted outputs, character accuracies, and validation loss for each line will be stored in model/CRNN/logs/val_preds.txt. To get per-model word error rates, character error rates, and loss, run:

python ./model/evaluate/get_errorrates_crnn.py validate

This will also report the best model, i.e. the one with the lowest WER and CER on the validation data. The best model and its WER and CER are saved in model/CRNN/logs/error_rates.txt.

2.5 Testing

For testing, we take the best model (the one with the lowest WER) and evaluate it on the testing data. The text predictions are saved in the model/CRNN/logs/test_preds.txt file.

python model/CRNN/test.py <testing_tfrecord_filename> <weights_path_for_best_model>

For example:

python model/CRNN/test.py testing.tfrecords model/CRNN/model/shadownet/shadownet_-40

Run get_errorrates_crnn.py again to get the WER and CER on the test data:

python ./model/evaluate/get_errorrates_crnn.py test

The files crnn_gt.txt and crnn_pred.txt in the model/CRNN/logs/ folder store the ground-truth and predicted texts, respectively.

2.6 Fine-tuning

For fine-tuning, take the best model obtained in Section 2.4 and train it further, but now on only the real training data.

Create only real data tfrecords:

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords

Training (or fine-tuning):

python ./model/CRNN/train.py train_feature.tfrecords 20 <path_to_best_model> <steps_per_checkpoint>

For example:

python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 50

Keep checking the validation loss in parallel so that you can stop training once the model is no longer learning (i.e., the loss has stopped decreasing). The loss can be obtained by running the crnn_predictions.py file as described above.

Now, repeat the steps from Section 2.4.