# Roadmap
Variable input height models use special operators that do not support half-precision tensors. There is an open PR to add support: https://github.com/jpuigcerver/nnutils/pull/5
The segmentation's (greedy) probability is not included in the segmentation output. https://github.com/carmocca/PyLaia/commit/04bc75ce84d84a24666702cd0dfc28b808602b58 contains part of what is required.
Another improvement would be to estimate the word probability by combining the per-character probabilities with {sum, avg, mult, softmax}. This requires some research; the WER can be used to evaluate it.
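For illustration, a minimal sketch of what such an aggregation could look like (the function name, the `char_probs` input, and the particular reading of "softmax" are assumptions, not existing PyLaia code):

```python
import torch

def word_confidence(char_probs: torch.Tensor, reduction: str = "mult") -> float:
    """Combine per-character greedy probabilities into one word-level score.

    `char_probs` is a 1-D tensor with the (greedy) probability of each
    character in the word. Everything here is illustrative, not PyLaia API.
    """
    if reduction == "sum":
        return char_probs.sum().item()
    if reduction == "avg":
        return char_probs.mean().item()
    if reduction == "mult":
        return char_probs.prod().item()
    if reduction == "softmax":
        # One possible reading of "softmax": weight each character by a
        # softmax over the character scores before averaging them.
        weights = torch.softmax(char_probs, dim=0)
        return (weights * char_probs).sum().item()
    raise ValueError(f"unknown reduction: {reduction}")

# Example: a 4-character word; the WER could be used to pick the best reduction.
print(word_confidence(torch.tensor([0.98, 0.85, 0.91, 0.60]), reduction="avg"))
```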
The implementation should be straightforward. It could be available in a new script (`pylaia-htr-tune`).
Using this is much more complicated. The scaler algorithm assumes that all batches have the same size, which is not the case for PyLaia: even though a fixed batch size B is used, each image might have a drastically different size (H×W).
Since images are collated (to make use of efficient batching), a batch of B items occupies the memory of the largest item times B. For this reason, the scaler algorithm would have to be run with batches that occupy as much memory as the batch containing the largest image in the dataset. This is related to the item below.
During training, the dataset images are shuffled and then sampled into batches of size B. Since images are collated, a batch of B items occupies the memory of the largest item times B.
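To make the memory argument concrete, here is a simplified stand-in for the collation step (PyLaia's real collate function differs in its details): padding every image to the largest height and width in the batch means the batch allocates roughly B times the memory of its largest item.

```python
import torch

def pad_collate(images):
    """Pad a list of (C, H_i, W_i) tensors to a common (C, max_H, max_W).

    Simplified illustration: the padded batch allocates
    B * C * max_H * max_W values regardless of the individual image sizes.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch = images[0].new_zeros((len(images), images[0].shape[0], max_h, max_w))
    for i, img in enumerate(images):
        batch[i, :, : img.shape[1], : img.shape[2]] = img
    return batch

# One very wide image makes the whole batch as expensive as B copies of it:
imgs = [torch.rand(1, 64, 200), torch.rand(1, 64, 1800), torch.rand(1, 48, 350)]
print(pad_collate(imgs).shape)  # torch.Size([3, 1, 64, 1800])
```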
If we sort the dataset by image size, we could batch (bucket) input samples efficiently by size. Two possibilities (a minimal sketch of the first one follows below):
- Split the dataset into batches of size B where each batch contains images of very similar size. This minimizes the amount of padding. The downside is that if image sizes are very different, the GPU memory will be underutilized during a large part of training.
- Split the dataset into batches of any size that maximize the available GPU memory. This approach would greatly improve the training speed; however, further research would be required to check how training would be impacted. Some batches could contain many small images, which is problematic for learning. Also, a different optimization algorithm might be needed for this, as well as an adaptive learning rate schedule based on the current batch size. An upper bound on the batch size might help.
This might also be problematic when using distributed training (DDP) or automatic mixed precision (AMP).
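A minimal sketch of the first option (fixed batch size B, neighbouring images grouped by area). This is only an illustration and may differ from the work in the `samplers` branch linked below:

```python
import random
from torch.utils.data import Sampler

class SizeBucketBatchSampler(Sampler):
    """Yield index batches of size B whose images have similar sizes.

    `sizes` is a list of (H, W) tuples, one per dataset item. Illustrative
    only; not the implementation from the `samplers` branch.
    """

    def __init__(self, sizes, batch_size):
        self.sizes = sizes
        self.batch_size = batch_size

    def __iter__(self):
        # Sort indices by image area so that neighbours have similar sizes,
        # then shuffle whole batches to keep some randomness across epochs.
        order = sorted(range(len(self.sizes)),
                       key=lambda i: self.sizes[i][0] * self.sizes[i][1])
        batches = [order[i:i + self.batch_size]
                   for i in range(0, len(order), self.batch_size)]
        random.shuffle(batches)
        return iter(batches)

    def __len__(self):
        return (len(self.sizes) + self.batch_size - 1) // self.batch_size

# Usage: DataLoader(dataset, batch_sampler=SizeBucketBatchSampler(sizes, 16),
#                   collate_fn=pad_collate)
```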
Some work has been done on this, available at https://github.com/carmocca/PyLaia/tree/samplers
Also, references to similar ideas/implementations:
- https://github.com/pytorch/pytorch/issues/46176
- https://github.com/facebookresearch/pytext/blob/master/pytext/data/data.py#L86
- https://github.com/pytorch/pytorch/issues/25743
- https://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.samplers.html#torchnlp.samplers.BucketBatchSampler
`pylaia-htr-netout` could use a library like PyKaldi to (for example) directly generate .scp files.
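A rough sketch of what that could look like, assuming PyKaldi's `MatrixWriter` table writer and `Matrix` wrapper behave as shown (the function name and the shape of the data passed to it are assumptions; check the PyKaldi documentation before relying on this):

```python
import numpy as np
from kaldi.matrix import Matrix              # PyKaldi
from kaldi.util.table import MatrixWriter

def write_netout(outputs, ark_path="netout.ark", scp_path="netout.scp"):
    """Write per-image network outputs directly to Kaldi ark/scp files.

    `outputs` yields (image_id, log_posteriors) pairs, where
    `log_posteriors` is a (num_frames, num_symbols) numpy array. The
    "ark,scp:..." wspecifier is standard Kaldi table syntax.
    """
    with MatrixWriter(f"ark,scp:{ark_path},{scp_path}") as writer:
        for image_id, log_posteriors in outputs:
            writer[image_id] = Matrix(log_posteriors)

write_netout([("img-001", np.random.randn(120, 80).astype(np.float32))])
```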
A `pylaia-htr-tune` script could be created to support fine-tuning the last linear layer of the model. This is very interesting for transfer learning tasks or to allow changes in the vocabulary set used.
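A minimal sketch of what such a script could do, assuming the model exposes its output projection as `model.linear` (the attribute name is a placeholder, not necessarily PyLaia's):

```python
import torch.nn as nn

def prepare_for_finetuning(model: nn.Module, new_vocab_size: int) -> nn.Module:
    """Freeze the whole model and replace the final linear layer.

    `model.linear` is a placeholder attribute name for the output
    projection; adapt it to the actual model definition.
    """
    for param in model.parameters():
        param.requires_grad = False
    in_features = model.linear.in_features
    # Fresh output layer sized for the new vocabulary
    # (+1 for the CTC blank, if the blank is not already part of it).
    model.linear = nn.Linear(in_features, new_vocab_size + 1)
    return model

# Only the parameters of the new layer are given to the optimizer, e.g.:
# optimizer = torch.optim.Adam(model.linear.parameters(), lr=3e-4)
```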