4. Model Setup
We have two models: the attention-LSTM model and the CRNN model.
The attention-LSTM model is based on attention ocr. Note that this link refers to the older version of the repo that we used, not the latest one.
Model Architecture:
The encoder uses CNN to encode the features of a line image. The CNN encoded features are passed into a single BLSTM layer. The final hidden state features of BLSTM are passed to a 2-layer LSTM with the attention module at the decoder side.
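A minimal PyTorch sketch may help make this data flow concrete. This is an illustration only: the actual model is implemented in TensorFlow via aocr, and the layer sizes and the dot-product attention variant here are placeholder assumptions, not the repo's real hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLSTMOCR(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.hidden = hidden
        # CNN encoder: turns the line image into a sequence of feature columns.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # Single BLSTM layer over the CNN feature sequence.
        self.blstm = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
        # 2-layer LSTM decoder; each step consumes [attention context; previous token].
        self.decoder = nn.LSTM(2 * hidden + vocab_size, hidden,
                               num_layers=2, batch_first=True)
        self.attn = nn.Linear(hidden, 2 * hidden)  # projects the decoder query
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, prev_tokens):
        # images: (B, 1, H, W); prev_tokens: one-hot, (B, T_dec, vocab_size)
        feats = self.cnn(images)                    # (B, 128, H', W')
        feats = feats.mean(dim=2).permute(0, 2, 1)  # (B, T_enc, 128)
        enc, _ = self.blstm(feats)                  # (B, T_enc, 2*hidden)
        query = enc.new_zeros(images.size(0), self.hidden)
        state, logits = None, []
        for t in range(prev_tokens.size(1)):
            # Dot-product attention of the decoder query over encoder states.
            scores = torch.bmm(enc, self.attn(query).unsqueeze(2)).squeeze(2)
            weights = F.softmax(scores, dim=1)                     # (B, T_enc)
            context = torch.bmm(weights.unsqueeze(1), enc).squeeze(1)
            step_in = torch.cat([context, prev_tokens[:, t]], dim=1)
            dec_out, state = self.decoder(step_in.unsqueeze(1), state)
            query = dec_out.squeeze(1)
            logits.append(self.out(query))
        return torch.stack(logits, dim=1)           # (B, T_dec, vocab_size)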
To install the attention-lstm module, run the following in the repository's root (sanskrit-ocr/).
Pre-requisite: you should have Anaconda installed on your local machine.
The code was tested with CUDA v10.2 and cuDNN v7.1.2.
To set up the aocr model, run the following commands:
conda create --name ocr
conda activate ocr
conda install pip
cd model/attention-lstm
pip install -r requirements.txt
python setup.py install
To create tfrecords for each of the train, validation, and test sets, run the following commands in the parent directory (the expected annotation-file format is illustrated after these commands):
aocr dataset ./label_data/annot_mixed.txt /path/to/tfrecords/folder/training.tfrecords
aocr dataset ./label_data/annot_realTest.txt /path/to/tfrecords/folder/testing.tfrecords
aocr dataset ./label_data/annot_realValidation.txt /path/to/tfrecords/folder/validation.tfrecords
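Each line of an annotation file pairs an image path with its transcription. A hypothetical illustration of the expected contents (file names and text invented; check the aocr README for the exact delimiter and format):

./images/line_0001.png रामः वनं गच्छति
./images/line_0002.png अहं गृहं गच्छामि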
From the repository's root directory, run
bash ./model/evaluate/copyfiles.sh
to ensure that all saved models are copied to the ./modelss directory.
Open a new terminal and run the following command from the root directory (sanskrit-ocr/):
CUDA_VISIBLE_DEVICES=0 aocr train /path/to/tfrecords/folder/training.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --num-epoch <num-epoch>
For example:
CUDA_VISIBLE_DEVICES=0 aocr train /path/to/tfrecords/folder/training.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --gpu-id=0
Have a look at the original repo to know more about the other arguments.
To get predictions for the validation set for each of the saved models, run
python ./model/evaluate/attention_predictions.py <initial_step> <final_step> <steps_per_checkpoint>
For example:
python ./model/evaluate/attention_predictions.py 1000 30100 1000
where the initial and final steps correspond to the first and last checkpoint numbers, and steps_per_checkpoint denotes the difference between two consecutive checkpoints.
Running the above script stores the ground-truth labels, their corresponding predicted outputs, the character accuracy, and the validation loss for each validation example in val_preds.txt in the model/attention-lstm/logs/ directory. To get model-wise word error rates (WER), character error rates (CER), and loss, run:
python ./model/evaluate/get_errorrates.py
This will also give the best model, i.e. the one with the lowest WER and CER on the validation data.
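For reference, WER and CER are edit-distance-based metrics. Below is a minimal sketch of the computation; it is illustrative only, and get_errorrates.py's exact implementation (e.g. its normalization) may differ.

def edit_distance(a, b):
    # Classic Levenshtein DP over two sequences (characters or words).
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,      # deletion
                                     dp[j - 1] + 1,  # insertion
                                     prev + (a[i - 1] != b[j - 1]))  # substitution
    return dp[-1]

def cer(gt, pred):
    # Character error rate: edit distance over characters, per gt character.
    return edit_distance(gt, pred) / max(len(gt), 1)

def wer(gt, pred):
    # Word error rate: edit distance over whitespace-split tokens.
    return edit_distance(gt.split(), pred.split()) / max(len(gt.split()), 1)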
For testing, we take the best model, i.e. the one with the lowest WER, and evaluate it on the test data. Make sure to change the first line of the checkpoint file in the ./modelss directory to the best model's checkpoint number; see the example below.
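The checkpoint file is a small text file that TensorFlow maintains in the model directory; its first line names the checkpoint that will be restored. A hypothetical illustration (the actual checkpoint prefix and step numbers depend on your aocr version and training run):

model_checkpoint_path: "translate.ckpt-29000"
all_model_checkpoint_paths: "translate.ckpt-29000"

Then run: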
CUDA_VISIBLE_DEVICES=0 aocr test ./datasets/testing.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --model-dir ./modelss --gpu-id 1
Note that the val_preds.txt file will be overwritten, so if you previously stored predictions that you may still need, copy the file or rename it first.
Run get_errorrates.py again to get the WER and CER on the test data:
python ./model/evaluate/get_errorrates.py
This also generates test_gt.txt and test_pred.txt in the model/attention-lstm/logs/ folder, which store the ground-truth and predicted texts.
For fine-tuning, take the best model after validation and train it further on real data.
Create tfrecords from the real data only:
aocr dataset ./label_data/annot_realTrain.txt ./datasets/training_real.tfrecords
Training (or fine-tuning):
aocr train ./datasets/training_real.tfrecords --batch-size 16 --max-width 3200 --max-height 600 --max-prediction 600 --steps-per-checkpoint 1000 --model-dir ./modelss
Make sure to update the checkpoint file with the best model's checkpoint number obtained, as shown earlier.
Now, repeat the steps from Section 1.4.
The CRNN model is based on crnn. Note that this link refers to the older version of the repo that we used, not the latest one. We use CRNN as our baseline model.
Make sure you are in the repo's root. Then run the following:
cd model/CRNN
conda create --name ocr_blstm
conda activate ocr_blstm
pip install -r requirements.txt
Run from the repo's root:
python ./model/CRNN/create_tfrecords.py ./label_data/annot_mixed.txt ./model/CRNN/data/tfReal/training.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTest.txt ./model/CRNN/data/tfReal/testing.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realValidation.txt ./model/CRNN/data/tfReal/validating.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_synthetic_only.txt ./model/CRNN/data/tfReal/synthetic.tfrecords
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords
python ./model/CRNN/train.py <training tfrecords filename> <train_epochs> <path_to_previous_saved_model> <steps-per_checkpoint>
For example,
python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 200
path_to_previous_saved_model can be set to 0 if training is to be started from scratch. steps_per_checkpoint indicates after how many iterations the model is saved; it can be set to 0 if you want to save output after every iteration.
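For instance, to train from scratch while saving a checkpoint every 200 iterations (mirroring the example above; the tfrecords filename is whichever file you created in the previous step), you could run:

python ./model/CRNN/train.py train_feature.tfrecords 20 0 200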
To get the validation-data predictions for each saved model, run evaluate/crnn_predictions.py with the initial and final checkpoint steps. The initial checkpoint step is the first iteration at which a model was saved; the final checkpoint step is the last.
python ./model/evaluate/crnn_predictions.py <initial_step> <final_step> <steps_per_checkpoint>
For example:
python ./model/evaluate/crnn_predictions.py 1000 30100 1000
steps_per_checkpoint denotes the interval between consecutive saved checkpoints.
The ground-truth labels, their corresponding predicted outputs, the character accuracies, and the validation loss for each line will be stored in model/CRNN/logs/val_preds.txt after running the above command. To get model-wise word error rates, character error rates, and loss, run:
python ./model/evaluate/get_errorrates_crnn.py validate
This will also give the best model, i.e. the one with the lowest WER and CER on the validation data. The best model and its WER and CER are saved in model/CRNN/logs/error_rates.txt.
For testing, we take the best model, i.e. the one with the lowest WER, and test it on the test data. The text predictions are saved in the model/CRNN/logs/test_preds.txt file.
python model/CRNN/test.py <testing_tfrecord_filename> <weights_path_for_best_model>
For example,
python model/CRNN/test.py testing.tfrecords model/CRNN/model/shadownet/shadownet_-40
Run get_errorrates_crnn.py again to get the WER and CER on the test data:
python ./model/evaluate/get_errorrates_crnn.py test
This also generates crnn_gt.txt and crnn_pred.txt in the model/CRNN/logs/ folder, which store the ground-truth and predicted texts.
For fine-tuning, take the best model obtained in Section 2.4 and train it further, but now only on real training data.
Create tfrecords from the real data only:
python ./model/CRNN/create_tfrecords.py ./label_data/annot_realTrain.txt ./model/CRNN/data/tfReal/training_real.tfrecords
Training (or fine-tuning):
python ./model/CRNN/train.py train_feature.tfrecords 20 <path_to_best_model> <steps_per_checkpoint>
python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 50
Keep checking the validation loss in parallel so that you can stop training once the model is no longer learning (i.e., when the loss stops decreasing). The loss can be obtained by running crnn_predictions.py as mentioned above.
Now, repeat the steps from Section 2.4.
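One simple way to automate this check is a patience rule over the per-checkpoint validation losses. A hypothetical sketch (how you collect the loss values, e.g. from the crnn_predictions.py output, is up to you):

def should_stop(losses, patience=3, min_delta=1e-3):
    # Stop when none of the last `patience` checkpoint losses improved on
    # the best loss seen before them by at least `min_delta`.
    if len(losses) <= patience:
        return False
    return min(losses[-patience:]) > min(losses[:-patience]) - min_delta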