decode ctc

Solène Tarride edited this page Feb 15, 2023 · 2 revisions

Decode your model

Once your PyLaia model is trained, you can use it to predict on test images. To improve results, you can also combine it with a statistical n-gram language model.

Predict without language modeling

  • Set your configuration in config_decode.yaml:
common:
  experiment_dirname: experiment
decode:
  convert_spaces: true
  join_string: ''
img_list: test_img_list.txt
syms: syms.txt
  • Predict using PyLaia:
pylaia-htr-decode-ctc --config config_decode.yaml | tee predict.txt
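
As a rough illustration of what the `convert_spaces` and `join_string` options in the configuration above control, the sketch below post-processes a list of predicted symbols. It is not PyLaia's own code: the function `render_prediction` is hypothetical, and it assumes that `<space>` is the space symbol in `syms.txt` and that the two options behave as their names suggest.

```python
def render_prediction(symbols, join_string="", convert_spaces=True,
                      space_token="<space>"):
    """Illustrative only: turn a sequence of predicted symbols into a string.

    Assumption: convert_spaces replaces the space symbol with a real space,
    and join_string is the separator placed between consecutive symbols.
    """
    if convert_spaces:
        # Swap the space symbol for an actual space character.
        symbols = [" " if s == space_token else s for s in symbols]
    # Concatenate the symbols using the configured separator.
    return join_string.join(symbols)


if __name__ == "__main__":
    predicted = ["d", "e", "t", "<space>", "s", "k", "u", "l", "d", "e"]
    # Settings from the configuration above: readable text.
    print(render_prediction(predicted, join_string="", convert_spaces=True))
    # Symbols kept separate, space symbol left untouched.
    print(render_prediction(predicted, join_string=" ", convert_spaces=False))
```

With the settings used on this page (`convert_spaces: true`, `join_string: ''`), the symbols are glued into ordinary text; with the opposite settings the output stays a space-separated symbol sequence.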

Predict with a language model (better but slower)

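  • Build an n-gram language model with SRILM's ngram-count:
ngram-count -text my_text_file.txt -order 6 -lm language_model.arpa.gz -wbdiscount 6

where my_text_file.txt has the following format for a character-based LM (characters separated by spaces, with <space> marking word boundaries):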

f o r <space> d e t <space> t i l f æ l d e <space> d e t <space> s k u l d e <space> l y k k e s <space> D i g
a t <space> o p d r i v e <space> d e t <space> o m s k r e v n e <space> e x p l : <space> a f
« F r u <space> I n g e r » , <space> a t <space> s e n d e <space> m i g <space> s a m m e
t i l <space> B e r c h t e s g a d e n <space> i n <space> B a y e r n ,
d a <space> d e t <space> s å l e d e s <space> s i k k r e s t <space> o g <space> h u r t i g s t <space> k o m -
m e r <space> m i g <space> i h æ n d e . <space> T ø r <space> j e g <space> b e d e <space> D i g <space> g ø r e
M o r g e n b l a d e t s <space> e x p e d : <space> o p m æ r k s o m <space> p å
m i n <space> n y e <space> a d r e s s e ?
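
The character-level format shown above can be produced from ordinary text lines with a few lines of Python. This is a minimal sketch, not part of PyLaia: the helper name `to_char_level` is hypothetical, and it assumes that `<space>` is the word-boundary symbol, as in the example lines.

```python
def to_char_level(line: str, space_token: str = "<space>") -> str:
    """Convert one line of plain text to space-separated characters.

    Word boundaries are written as `space_token` (assumed to be "<space>").
    """
    # split() collapses runs of whitespace, so each gap yields one token.
    words = line.split()
    # Separate the characters of each word, then join words with the token.
    return f" {space_token} ".join(" ".join(word) for word in words)


if __name__ == "__main__":
    print(to_char_level("for det tilfælde"))
    # f o r <space> d e t <space> t i l f æ l d e
```

Applying the function to every line of a corpus and writing the results to a file gives a text file in the format expected by the ngram-count command.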

You should also be able to use a KenLM language model, although this has not been tested.

  • Set your configuration in config_decode_lm.yaml:
common:
  experiment_dirname: experiment
  model_filename: experiment
decode:
  convert_spaces: true
  join_string: ''
  use_language_model: True
  language_model_path: language_model.arpa.gz
  language_model_weight: 1.5
  tokens_path: tokens.txt
  lexicon_path: lexicon.txt
img_list: test_img_list.txt
syms: syms.txt
  • Predict using PyLaia (CPU-only):
pylaia-htr-decode-ctc --config config_decode_lm.yaml | tee predict_lm.txt