
4. Model Setup

nimishasri edited this page Jan 16, 2021 · 4 revisions

We have two models: the attention-LSTM model and the CRNN model.

1. Attention LSTM model

2. CRNN model

Sanskrit AOCR model

The model used for Attention LSTM is referenced from attention ocr. Note that this link refers to the older version of the repo that we have used, not the latest one.

Model Architecture:

Attention-LSTM OCR model

The encoder uses a CNN to extract features from a line image. The CNN-encoded features are passed into a single BLSTM layer, and the final hidden-state features of the BLSTM are passed to a 2-layer LSTM decoder with an attention module.

1.1 Installation

To install the attention-lstm module, run the following in the repository's root (sanskrit-ocr/):

Pre-requisite: You should have anaconda installed on your local desktop.

The code was tested on CUDA v10.2 and cuDNN v7.1.2.

To setup the aocr model, run the following commands:

conda create --name ocr

conda activate ocr

conda install pip

cd model/attention-lstm

pip install -r requirements.txt

python setup.py install

1.2 Writing tfrecords

To create tfrecords for each of the train, validation and test sets, run the following commands in the parent directory:

aocr dataset ./label_data/annot_mixed.txt /path/to/tfrecords/folder/training.tfrecords

aocr dataset ./label_data/annot_realTest.txt /path/to/tfrecords/folder/testing.tfrecords

aocr dataset ./label_data/annot_realValidation.txt /path/to/tfrecords/folder/validation.tfrecords
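Each annotation file passed to `aocr dataset` is expected to contain one sample per line: an image path followed by the ground-truth transcription. The file name, paths and labels below are made up for illustration; check them against your own annotation files:

```shell
# Create a toy annotation file in the assumed `aocr dataset` input format:
# <image path><whitespace><label>, one sample per line.
cat > /tmp/annot_example.txt <<'EOF'
./images/line_0001.png संस्कृतम्
./images/line_0002.png अथ योगानुशासनम्
EOF

# Sanity-check: every line should have at least an image path and a label.
awk 'NF < 2 { print "bad line: " NR; exit 1 }' /tmp/annot_example.txt \
  && echo "annotation file looks OK"
```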

1.3 Training

From the repository's root directory, run

bash ./model/evaluate/copyfiles.sh

to ensure that all models are copied to ./modelss directory.

Open a new terminal and run the following command from the root directory (sanskrit-ocr/):

CUDA_VISIBLE_DEVICES=0 aocr train /path/to/tfrecords/folder/training.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --num-epoch <num-epoch>

For example:

CUDA_VISIBLE_DEVICES=0 aocr train /path/to/tfrecords/folder/training.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --gpu-id=0

Have a look at the original repo to know more about the other arguments.

1.4 Validation

To get prediction for the validation set for each of the saved models, run

python ./model/evaluate/attention_predictions.py <initial_step> <final_step> <steps_per_checkpoint>

For example:

python ./model/evaluate/attention_predictions.py 1000 30100 1000

where the initial and final steps correspond to the first and last checkpoint numbers, and steps_per_checkpoint denotes the difference between two consecutive checkpoints.
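The sweep performed by the prediction script amounts to visiting one checkpoint per steps_per_checkpoint interval. As a sketch (using a smaller, made-up range than the example above):

```shell
initial_step=1000          # first saved checkpoint
steps_per_checkpoint=1000  # interval between saved checkpoints
final_step=5000            # last saved checkpoint (shortened for brevity)

# Enumerate the checkpoint numbers the prediction script would evaluate.
for step in $(seq "$initial_step" "$steps_per_checkpoint" "$final_step"); do
  echo "evaluating checkpoint $step"
done
```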

The ground-truth labels, their corresponding predicted outputs, character accuracy and validation loss for each validation example will be stored in val_preds.txt in the model/attention-lstm/logs/ directory on running the above script. To get model-wise word error rates, character error rates and loss, run:

python ./model/evaluate/get_errorrates.py

This will also report the best model, i.e. the one with the lowest WER and CER on the validation data.

1.5 Testing

For testing, we take the best model (the one with the lowest WER) and evaluate it on the test data. Make sure to change the first line of the checkpoint file in the ./modelss directory to the best model number.
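The checkpoint file is a small text file whose first line names the checkpoint that will be loaded. A sketch of the edit (the translate.ckpt prefix and step numbers are illustrative; use the names actually present in your ./modelss directory):

```shell
# Illustrative copy of a TensorFlow `checkpoint` file.
cat > /tmp/checkpoint <<'EOF'
model_checkpoint_path: "translate.ckpt-30000"
all_model_checkpoint_paths: "translate.ckpt-12000"
all_model_checkpoint_paths: "translate.ckpt-30000"
EOF

best_step=12000   # hypothetical best checkpoint found during validation
# Point the first line at the best checkpoint so testing loads it.
sed -i "1s/ckpt-[0-9]*/ckpt-${best_step}/" /tmp/checkpoint
head -n 1 /tmp/checkpoint
```

(On macOS, `sed -i ''` is needed instead of `sed -i`.)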

CUDA_VISIBLE_DEVICES=0 aocr test ./datasets/testing.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --model-dir ./modelss --gpu-id 0

Note that the val_preds.txt file will be overwritten. So if you have previously stored predictions that you may need, make sure to copy the file or rename it first.
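A simple way to preserve a run is a timestamped copy before testing (the logs path is the one used throughout this guide; the `touch` line is only a stand-in so the snippet is runnable on its own):

```shell
logs=model/attention-lstm/logs
mkdir -p "$logs" && touch "$logs/val_preds.txt"   # stand-in for an existing run

# Copy the validation predictions aside before the test run overwrites them.
backup="$logs/val_preds_$(date +%Y%m%d_%H%M%S).txt"
cp "$logs/val_preds.txt" "$backup"
echo "saved $backup"
```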

Run get_errorrates.py again to get the WER and CER on the test data:

python ./model/evaluate/get_errorrates.py

This also writes test_gt.txt and test_pred.txt in the model/attention-lstm/logs/ folder, which store the ground-truth and predicted texts.

1.6 Fine-tuning

For fine-tuning, take the best model after validation and train it further on real data.

Create only real data tfrecords:

aocr dataset ./label_data/annot_realTrain.txt ./datasets/training_real.tfrecords

Training (or fine-tuning):

aocr train ./datasets/training_real.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --model-dir ./modelss

Make sure to update the checkpoint file with the best model checkpoint number obtained.

Now, repeat the steps from Section 1.4.

CRNN Model

The model used for CRNN is referenced from crnn. Note that this link refers to the older version of the repo that we have used, not the latest one. We have used CRNN as our baseline model.

2.1 Installation

Make sure you are in the repo's root. Then run the following:

cd model/CRNN

conda create --name ocr_blstm

conda activate ocr_blstm

pip install -r requirements.txt

2.2 Writing tfrecords

Run from the repo's root:

python ./model/CRNN/create_tfrecords.py ./label_data/annot_mixed.txt ./model/CRNN/data/tfReal/training.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTest.txt ./model/CRNN/data/tfReal/testing.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realValidation.txt ./model/CRNN/data/tfReal/validating.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_synthetic_only.txt ./model/CRNN/data/tfReal/synthetic.tfrecords

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords

2.3 Training

python ./model/CRNN/train.py <training tfrecords filename> <train_epochs> <path_to_previous_saved_model> <steps-per_checkpoint>

For example,

python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 200

path_to_previous_saved_model can be set to 0 if training is to be started from scratch. steps_per_checkpoint indicates after how many iterations the model is saved; it can be set to 0 if you want to save after every iteration.

2.4 Validation

To get the validation-set predictions for each saved model, run the evaluate/crnn_predictions.py script. You need to set the initial and final checkpoint steps: the initial checkpoint step is the first saved model iteration, and the final checkpoint step is the last iteration at which the model was saved.

python ./model/evaluate/crnn_predictions.py <initial_step> <final_step> <steps_per_checkpoint>

For example:

python ./model/evaluate/crnn_predictions.py 1000 30100 1000

steps_per_checkpoint denotes the interval between the iterations the model is saved.

The ground-truth labels, their corresponding predicted outputs, character accuracies and validation loss for each line will be stored in model/CRNN/logs/val_preds.txt after running the above command. To get model-wise word error rates, character error rates and loss, run:

python ./model/evaluate/get_errorrates_crnn.py validate

This will also report the best model, i.e. the one with the lowest WER and CER on the validation data. The best model and its WER and CER are saved in model/CRNN/logs/error_rates.txt.

2.5 Testing

For testing, we take the best model (the one with the lowest WER) and evaluate it on the test data. The text predictions are saved in the model/CRNN/logs/test_preds.txt file.

python model/CRNN/test.py <testing_tfrecord_filename> <weights_path_for_best_model>

For example:

python model/CRNN/test.py testing.tfrecords model/CRNN/model/shadownet/shadownet_-40

Run get_errorrates_crnn.py again to get the WER and CER on the test data:

python ./model/evaluate/get_errorrates_crnn.py test

This also writes crnn_gt.txt and crnn_pred.txt in the model/CRNN/logs/ folder, which store the ground-truth and predicted texts.

2.6 Fine-tuning

For fine-tuning, take the best model from Section 2.4 and train it further, but now only on the real training data.

Create only real data tfrecords:

python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords

Training (or fine-tuning):

python ./model/CRNN/train.py train_feature.tfrecords 20 <path_to_best_model> <steps_per_checkpoint>

For example:

python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 50

Keep checking the validation loss in parallel so that you can stop training once the model is no longer learning (i.e. when the loss stops decreasing). The loss can be obtained by running the crnn_predictions.py file as mentioned above.

Now, repeat the steps from Section 2.4.