# Step-by-step Training Guide Using the Spanish Numbers Dataset

The Spanish Numbers dataset is a small dataset of 485 images containing handwritten sentences of Spanish numbers (298 for training and 187 for testing).

*(Example image from the dataset.)*

## Requirements

- Laia
- ImageMagick's `convert`
- Optionally: Kaldi's `compute-wer`
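
If you want to verify that the external tools are available before starting, here is a quick sanity check (a minimal sketch; `compute-wer` is only needed for scoring):

```bash
# Check that the required binaries are on the PATH.
command -v convert || echo "ImageMagick's convert not found"
command -v compute-wer || echo "Kaldi's compute-wer not found (optional)"
```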

## Training

To train a new Laia model for the Spanish Numbers dataset, follow these steps. Since the dataset does not provide a validation partition, we will use the test partition for validation.

- Download the Spanish Numbers dataset:

  ```bash
  mkdir -p data/;
  wget -P data/ https://www.prhlt.upv.es/corpora/spanish-numbers/Spanish_Number_DB.tgz;
  tar -xvzf data/Spanish_Number_DB.tgz -C data/;
  ```
- Execute `steps/prepare.sh`. This script assumes that the Spanish Numbers dataset is inside the `data` folder. It does the following (a sketch of the image conversion follows this item):

  - Transforms the images from PBM to PNG.
  - Scales them to 64px height.
  - Creates the auxiliary files needed for training.
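
  The exact commands live in `steps/prepare.sh`; purely as an illustrative sketch (the file names here are hypothetical), the per-image conversion could look like this with ImageMagick:

  ```bash
  # Convert a PBM image to PNG and scale it to 64px height,
  # preserving the aspect ratio ("x64" fixes only the height).
  convert input.pbm -resize x64 output.png
  ```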
- Execute the `laia-create-model` script to create an "empty" Laia model:

  ```bash
  ../../laia-create-model \
    --cnn_batch_norm true \
    --cnn_type leakyrelu \
    -- 1 64 20 model.t7;
  ```
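
  The positional arguments after `--` are, presumably, the number of input channels (1, i.e. grayscale), the input image height (64, matching the scaling done by `prepare.sh`), and the number of output symbols. You can check that last value against the symbols table created by `prepare.sh`:

  ```bash
  # Count the entries in the character symbols table;
  # it should match the "20" passed to laia-create-model.
  wc -l < data/lang/char/symbs.txt
  ```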
- Use the `laia-train-ctc` script to train the model:

  ```bash
  ../../laia-train-ctc \
    --adversarial_weight 0.5 \
    --batch_size 16 \
    --log_also_to_stderr info \
    --log_level info \
    --log_file laia.log \
    --progress_table_output laia.dat \
    --use_distortions true \
    --early_stop_epochs 100 \
    --learning_rate 0.0005 \
    model.t7 data/lang/char/symbs.txt \
    data/train.lst data/lang/char/train.txt \
    data/test.lst data/lang/char/test.txt;
  ```
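
Training runs for many epochs and can take a while. Since the command above logs to `laia.log` and writes a progress table to `laia.dat`, you can follow its progress from another terminal, e.g.:

```bash
# Follow the training log as it is written.
tail -f laia.log
```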

After 366 epochs, the model achieves a CER of ~2.08% on the test set, with a 95% confidence interval of [1.295%, 2.610%].

## Decoding

You can use `laia-decode` to obtain the transcripts of any set of images.

```bash
../../laia-decode --symbols_table data/lang/char/symbs.txt \
  model.t7 data/test.lst > test_hyp.char.txt;
```
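
Each line of `test_hyp.char.txt` contains an image ID followed by the space-separated character symbols of the hypothesis, with `{space}` marking word boundaries; this is the format the AWK script below relies on. A purely illustrative line (the image ID is made up):

```
sample-001 d o s {space} m i l
```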

Once you have created `test_hyp.char.txt`, you can compute the character error rate (CER) using Kaldi's `compute-wer`, for instance:

```bash
compute-wer --mode=strict ark:data/lang/char/test.txt ark:test_hyp.char.txt |
  grep WER | sed -r 's|%WER|%CER|g';
```
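
Kaldi's `compute-wer` reports its result on a line starting with `%WER`; since the tokens being compared here are characters, the `sed` call above simply relabels it as `%CER`. The relabeled line looks roughly like this (fields elided):

```
%CER 2.08 [ ... ]
```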

To compute the WER, you first need to convert the character-level transcripts into word-level transcripts (a simple AWK script will do, for instance). Then you can compute the WER with Kaldi's `compute-wer` as well.

```bash
# Get word-level hypothesis transcript
awk '{
  printf("%s ", $1);
  for (i=2;i<=NF;++i) {
    if ($i == "{space}")
      printf(" ");
    else
      printf("%s", $i);
  }
  printf("\n");
}' test_hyp.char.txt > test_hyp.word.txt;
# ... and compute WER
compute-wer --mode=strict ark:data/lang/word/test.txt ark:test_hyp.word.txt |
  grep WER;
```
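
For example, the illustrative character-level line shown earlier would come out of the AWK script as:

```
sample-001 dos mil
```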

## TL;DR

Execute `run.sh`.