4. Model Setup
The Model Setup section covers two models: the Attention LSTM model and the CRNN baseline.
The model used for Attention LSTM is referenced from attention ocr. Note that this link refers to the older version of the repo that we have used, not the latest one.
The model architecture:
A CNN is used at the encoder side to encode the features of a line image. The CNN-encoded features are passed into a single BLSTM layer, and the final hidden-state features of the BLSTM are fed to a 2-layer LSTM with an attention module at the decoder side.
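The shape flow through this pipeline can be sketched in a few lines; the downsampling factor and layer sizes below are illustrative assumptions for the sketch, not the repo's exact configuration:

```python
# Illustrative shape walk-through of the encoder-decoder pipeline.
# All sizes here are assumptions, not the repo's actual hyperparameters.

def encoder_seq_len(img_width, downsample=8):
    """The CNN encoder shrinks the image width by the cumulative stride
    of its conv/pool layers (assumed 8x here); the result is the number
    of time steps the BLSTM sees."""
    return img_width // downsample

def shape_flow(img_width=3200, feat_dim=512, hidden=256):
    t = encoder_seq_len(img_width)   # time steps after CNN encoding
    cnn_out = (t, feat_dim)          # per-step CNN feature vectors
    blstm_out = (t, 2 * hidden)      # forward + backward states concatenated
    decoder_state = (2, hidden)      # hidden states of the 2-layer LSTM decoder
    return cnn_out, blstm_out, decoder_state
```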
To install the attention-lstm module, run the following from the repo's root (sanskrit-ocr/):
Make sure you have the Anaconda distribution installed on your system. The code has been tested with CUDA v10.2 and cuDNN v7.1.2.
cd model/attention-lstm
conda create --name ocr --file requirements.txt
conda activate ocr
python setup.py install
In the parent directory, run the following:
mkdir datasets
aocr dataset ./label_data/annot_mixed.txt ./datasets/training.tfrecords
aocr dataset ./label_data/annot_realTest.txt ./datasets/testing.tfrecords
aocr dataset ./label_data/annot_realValidation.txt ./datasets/validation.tfrecords
From the repository's root directory, run
bash ./model/evaluate/copyfiles.sh
to ensure that all models (one per 1000 steps) are copied to the ./modelss directory.
Open a new terminal and run the following command from the root directory (sanskrit-ocr/).
CUDA_VISIBLE_DEVICES=0 aocr train ./datasets/training.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --num-epoch <num-epoch>
In our case, run the above with the following arguments:
CUDA_VISIBLE_DEVICES=0 aocr train ./datasets/training.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --gpu-id=0
Have a look at the original repo to learn more about the other arguments.
To get the validation-data predictions for each model, run the evaluate/attention_predictions.py file. You need to set the initial and final checkpoint steps: the initial checkpoint step is the first model iteration that is saved, and the final checkpoint step is the last iteration at which the model is saved.
python ./model/evaluate/attention_predictions.py <initial_step> <final_step> <steps_per_checkpoint>
python ./model/evaluate/attention_predictions.py 1000 30100 1000
steps_per_checkpoint denotes the interval, in iterations, at which the model is saved.
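The checkpoint steps the script walks over are simply an arithmetic range; a sketch of that enumeration (our interpretation of the CLI arguments, not the script's exact code):

```python
def checkpoint_steps(initial_step, final_step, steps_per_checkpoint):
    """Enumerate the saved-checkpoint iterations that would be evaluated,
    given the three CLI arguments (an interpretation, not the repo's code)."""
    return list(range(initial_step, final_step + 1, steps_per_checkpoint))
```

With the example arguments (1000, 30100, 1000), this visits checkpoints 1000, 2000, ..., 30000.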
The ground-truth labels with their corresponding predicted outputs, character accuracies, and validation loss for each line will be stored in model/attention-lstm/logs/val_preds.txt after running the above command. To get model-wise word error rates (WER), character error rates (CER), and loss, run:
python ./model/evaluate/get_errorrates.py
This will also report the best model, i.e. the one with the lowest WER and CER on validation data. The best model and its WER and CER are saved in model/attention-lstm/logs/error_rates.txt.
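WER and CER are both edit-distance ratios: Levenshtein distance computed over words (WER) or characters (CER), divided by the reference length. A self-contained sketch of that computation (our own illustration, not get_errorrates.py's actual code):

```python
def edit_distance(ref, hyp):
    """Classic dynamic-programming Levenshtein distance; works on
    strings (character level) or token lists (word level)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(gt, pred):
    """Character error rate: edits per ground-truth character."""
    return edit_distance(gt, pred) / max(len(gt), 1)

def wer(gt, pred):
    """Word error rate: edits per ground-truth word."""
    ref = gt.split()
    return edit_distance(ref, pred.split()) / max(len(ref), 1)
```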
For testing, we take the best model (the one with the lowest WER) and test it on the test data. Make sure to change the first line of the checkpoint file in the ./modelss directory to the best model number. Please delete val_preds.txt or copy its contents to another file before proceeding with the command below, since it saves its text predictions in val_preds.txt.
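The checkpoint-file edit can also be scripted. A minimal sketch, assuming the file uses TensorFlow's usual checkpoint format and a translate.ckpt-&lt;step&gt; prefix (both assumptions; check your ./modelss directory for the actual names):

```python
def set_best_checkpoint(checkpoint_path, best_step):
    """Rewrite the first line of a TF checkpoint file so it points at the
    chosen model. The 'translate.ckpt' prefix is an assumption; use the
    prefix that actually appears in your ./modelss directory."""
    with open(checkpoint_path) as f:
        lines = f.read().splitlines()
    # the first line selects the model that `aocr test` will restore
    lines[0] = 'model_checkpoint_path: "translate.ckpt-%d"' % best_step
    with open(checkpoint_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```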
CUDA_VISIBLE_DEVICES=0 aocr test ./datasets/testing.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --model-dir ./modelss --gpu-id 1
Run get_errorrates.py again to get the WER and CER on the test data:
python ./model/evaluate/get_errorrates.py
test_gt.txt and test_pred.txt in the model/attention-lstm/logs/ folder store the ground-truth and predicted texts.
For fine-tuning, take the best model obtained in Section 1.4 and train it further, now on only the real training data.
Create tfrecords from the real data only:
aocr dataset ./label_data/annot_realTrain.txt ./datasets/training_real.tfrecords
Training (or fine-tuning):
aocr train ./datasets/training_real.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --model-dir ./modelss
Please ensure that the first line of the checkpoint file in the ./modelss directory points to the best model.
Keep checking the validation loss in parallel so that you can stop training once the model is no longer learning (i.e., the loss is no longer decreasing). The loss can be obtained by running the attention_predictions.py file as mentioned above.
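One simple way to operationalize "stop once the loss is no longer decreasing" is a patience rule. This is our own heuristic, not something the repo provides:

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True once the validation loss has not improved on its
    previous best for `patience` consecutive checkpoints.
    `val_losses` is the loss recorded at each saved checkpoint, in order."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    # no checkpoint in the last `patience` beat the earlier best
    return min(val_losses[-patience:]) >= best_before - min_delta
```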
Now, repeat the steps from Section 1.4.
The model used for CRNN is referenced from crnn. Note that this link refers to the older version of the repo that we have used, not the latest one. We have used CRNN as our baseline model.
Make sure you are in the repo's root. Then run the following:
cd model/CRNN
conda create --name ocr_blstm --file requirements.txt
conda activate ocr_blstm
Run from the repo's root:
python ./model/CRNN/create_tfrecords.py ./label_data/annot_mixed.txt ./model/CRNN/data/tfReal/training.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTest.txt ./model/CRNN/data/tfReal/testing.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realValidation.txt ./model/CRNN/data/tfReal/validating.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_synthetic_only.txt ./model/CRNN/data/tfReal/synthetic.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords
python ./model/CRNN/train.py <training tfrecords filename> <train_epochs> <path_to_previous_saved_model> <steps-per_checkpoint>
For example,
python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 200
path_to_previous_saved_model can be set to 0 if training is to start from scratch. steps_per_checkpoint indicates after how many iterations the model is saved; it can be set to 0 to save output after every iteration.
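These argument conventions can be summarized as a tiny helper. This reflects our reading of train.py's CLI described above, not its exact code:

```python
def resolve_train_args(prev_model, steps_per_checkpoint):
    """Interpret train.py's CLI conventions (our reading, not the repo's
    actual parsing code): '0' for the model path means train from scratch;
    0 for steps_per_checkpoint means save after every iteration."""
    resume_from = None if prev_model == "0" else prev_model
    save_every = 1 if steps_per_checkpoint == 0 else steps_per_checkpoint
    return resume_from, save_every
```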
To get the validation-data predictions for each model, run the evaluate/crnn_predictions.py file. You need to set the initial and final checkpoint steps: the initial checkpoint step is the first model iteration that is saved, and the final checkpoint step is the last iteration at which the model is saved.
python ./model/evaluate/crnn_predictions.py <initial_step> <final_step> <steps_per_checkpoint>
python ./model/evaluate/crnn_predictions.py 1000 30100 1000
steps_per_checkpoint denotes the interval, in iterations, at which the model is saved.
The ground-truth labels with their corresponding predicted outputs, character accuracies, and validation loss for each line will be stored in model/CRNN/logs/val_preds.txt after running the above command. To get model-wise word error rates (WER), character error rates (CER), and loss, run:
python ./model/evaluate/get_errorrates_crnn.py validate
This will also report the best model, i.e. the one with the lowest WER and CER on validation data. The best model and its WER and CER are saved in model/CRNN/logs/error_rates.txt.
For testing, we take the best model (the one with the lowest WER) and test it on the test data. The text predictions are saved in the model/CRNN/logs/test_preds.txt file.
python model/CRNN/test.py <testing_tfrecord_filename> <weights_path_for_best_model>
For example,
python model/CRNN/test.py testing.tfrecords model/CRNN/model/shadownet/shadownet_-40
Run get_errorrates_crnn.py again to get the WER and CER on the test data:
python ./model/evaluate/get_errorrates_crnn.py test
crnn_gt.txt and crnn_pred.txt in the model/CRNN/logs/ folder store the ground-truth and predicted texts.
For fine-tuning, take the best model obtained in Section 2.4 and train it further, now on only the real training data.
Create tfrecords from the real data only:
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords
Training (or fine-tuning):
python ./model/CRNN/train.py train_feature.tfrecords 20 <path_to_best_model> <steps_per_checkpoint>
python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 50
Keep checking the validation loss in parallel so that you can stop training once the model is no longer learning (i.e., the loss is no longer decreasing). The loss can be obtained by running the crnn_predictions.py file as mentioned above.
Now, repeat the steps from Section 2.4.