ItaNeMoASR

Scripts and models for setting up an Italian ASR system based on NVIDIA NeMo.

These scripts have been tested on:

Python 3.6.8
NumPy 1.19.5
PyTorch 1.9.0
NVIDIA NeMo 1.1.0
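Before running anything, it can help to compare your environment with the versions above. The following is a small sketch of our own, not a script from this repository. It assumes each package exposes a `__version__` attribute and reports `None` when it does not or when the package cannot be imported.

```python
import importlib
import platform

# Versions listed in this README as tested; other versions may or may not work.
TESTED_VERSIONS = {"numpy": "1.19.5", "torch": "1.9.0", "nemo": "1.1.0"}


def version_report(tested=TESTED_VERSIONS):
    """Return {package: (tested_version, installed_version_or_None)}."""
    report = {}
    for name, expected in tested.items():
        try:
            module = importlib.import_module(name)
        except Exception:
            # Package not installed, or it failed to import in this environment.
            report[name] = (expected, None)
            continue
        report[name] = (expected, getattr(module, "__version__", None))
    return report


if __name__ == "__main__":
    print("python", platform.python_version(), "(tested: 3.6.8)")
    for name, (expected, installed) in version_report().items():
        flag = "" if installed == expected else "  <-- differs from tested version"
        print(f"{name}: installed={installed} tested={expected}{flag}")
```

A mismatch does not necessarily mean the scripts will fail; it only flags where behaviour may differ from the tested setup.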

To reproduce our results:

Clone the repository.
Download our models (~5GB!) from our server.
Extract model files into the main repository folder.
Download the datasets listed in the reference paper from their respective websites, convert all audio files to the required .wav format, and place them at the paths indicated in the corresponding JSON file in the TCorpora folder (or modify the JSON files accordingly).
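Once the datasets are in place, a quick sanity check is to confirm that every audio file referenced by a manifest exists, or to rewrite the paths if your files live elsewhere. This is a hypothetical helper, not part of the repository. It assumes the TCorpora JSON files are NeMo-style manifests (one JSON object per line with an `audio_filepath` key); relative paths are resolved against the current working directory, so run it from the repository root.

```python
import json
import os


def read_manifest(path):
    """Read a JSON-lines manifest (one JSON object per non-empty line)."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def missing_audio(path):
    """Return the audio_filepath values whose file does not exist on disk."""
    return [e["audio_filepath"] for e in read_manifest(path)
            if not os.path.isfile(e["audio_filepath"])]


def rebase_manifest(src, dst, old_prefix, new_prefix):
    """Write a copy of src where audio paths starting with old_prefix use new_prefix."""
    with open(dst, "w", encoding="utf-8") as out:
        for e in read_manifest(src):
            p = e["audio_filepath"]
            if p.startswith(old_prefix):
                e["audio_filepath"] = new_prefix + p[len(old_prefix):]
            # ensure_ascii=False keeps accented Italian letters readable in the file
            out.write(json.dumps(e, ensure_ascii=False) + "\n")
```

For example, `missing_audio("TCorpora/cv-corpus-7.0-2021-07-21_test.json")` should return an empty list when the Common Voice test set has been placed correctly.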

Test our models or transcribe new speech...

...by using Greedy Decoding

  python3 transcribe_speech.py model_path=models/stt_itUniBO_quartznet15x5.nemo dataset_manifest=TCorpora/cv-corpus-7.0-2021-07-21_test.json
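QuartzNet is a CTC model, so greedy decoding follows the standard CTC rule: take the best label in each frame, merge consecutive repeats, then drop blanks. The sketch below illustrates that rule only; it is not the code used by transcribe_speech.py. The label list and the position of the blank (`blank_id`, here assumed to be the index right after the last character) are illustrative assumptions.

```python
def ctc_greedy_decode(frame_scores, labels, blank_id):
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks.

    frame_scores: one list of per-label scores (e.g. log-probs) per time frame.
    labels: output characters; blank_id is the index of the CTC blank symbol.
    """
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    out = []
    prev = None
    for idx in best:
        # A label is emitted only when it differs from the previous frame's
        # label and is not the blank; a blank between two identical labels
        # is what allows genuine double letters (e.g. "nonno").
        if idx != prev and idx != blank_id:
            out.append(labels[idx])
        prev = idx
    return "".join(out)
```

With labels `["c", "i", "a", "o"]` and blank index 4, the frame-wise best path `c c _ i _ a a o` decodes to "ciao".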

...by applying Beam-Search Decoding & N-gram Rescoring

  python3 eval_beamsearch_ngram.py --nemo_model_file models/stt_itUniBO_quartznet15x5.nemo --input_manifest TCorpora/cv-corpus-7.0-2021-07-21_test.json --kenlm_model_file models/6gramLM_CORIS165C.kenlm --decoding_mode beamsearch_ngram --beam_width 1024 --beam_alpha 1.0 --beam_beta 0.5
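The `--beam_alpha` and `--beam_beta` parameters weight the n-gram language model against the acoustic model. The sketch below shows the usual shallow-fusion scoring (acoustic log-probability, plus alpha times the LM log-probability, plus beta per word) applied to complete hypotheses. The real decoder applies the LM during the beam search itself, so treat this as an illustration of the weighting only. `lm_logprob` is a hypothetical stand-in for the KenLM model.

```python
def fused_score(am_logprob, lm_logprob, num_words, alpha, beta):
    """Shallow-fusion style score combining acoustic and n-gram LM evidence.

    alpha weights the language model; beta is a per-word insertion bonus that
    counteracts the LM's bias towards short transcripts.
    """
    return am_logprob + alpha * lm_logprob + beta * num_words


def rank_hypotheses(hyps, lm_logprob, alpha=1.0, beta=0.5):
    """Sort (text, am_logprob) pairs by fused score, best first.

    lm_logprob is any callable text -> log-probability (a hypothetical
    stand-in for a KenLM model such as 6gramLM_CORIS165C.kenlm).
    """
    scored = [(fused_score(am, lm_logprob(text), len(text.split()), alpha, beta), text)
              for text, am in hyps]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

With a toy LM that strongly prefers "ha fatto" to its homophone "a fatto", raising alpha from 0 to 1 flips the ranking even when the acoustic model slightly prefers "a fatto". If you plug in the `kenlm` Python package, note that `Model.score` returns log10 values, which changes the useful range of alpha.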

...by applying Beam-Search Decoding & Neural Rescoring

  python3 eval_beamsearch_ngram.py --nemo_model_file models/stt_itUniBO_quartznet15x5.nemo --input_manifest TCorpora/cv-corpus-7.0-2021-07-21_test.json --decoding_mode beamsearch --beam_width 1024 --beam_alpha 1.0 --beam_beta 0.5 --preds_output_folder BEAM_1024_1_0.5
  python3 eval_neural_rescorer.py --lm_model=models/TransformerLM_CORIS165C_e36.nemo --beams_file=BEAM_1024_1_0.5/preds_out_width1024_alpha1.0_beta0.5.tsv --beam_size=1024 --eval_manifest=TCorpora/cv-corpus-7.0-2021-07-21_test.json
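The second command rescores the N-best lists written by the first. The sketch below shows the general idea of N-best rescoring; it is not eval_neural_rescorer.py. It assumes the beams file holds one candidate per line as `text<TAB>score`, with `beam_size` consecutive lines per utterance in manifest order, and it uses a hypothetical `neural_lm_score` callable in place of the Transformer LM.

```python
import csv


def load_nbest(tsv_path, beam_size):
    """Group a TSV of (text, score) lines into per-utterance N-best lists.

    Assumes beam_size consecutive lines per utterance, in manifest order.
    """
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: apostrophes and quotes in transcripts are plain text
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [(row[0], float(row[1])) for row in reader if row]
    if len(rows) % beam_size:
        raise ValueError("number of lines is not a multiple of beam_size")
    return [rows[i:i + beam_size] for i in range(0, len(rows), beam_size)]


def rescore(nbest, neural_lm_score, alpha=0.5):
    """Pick the best candidate per utterance after adding a weighted LM score."""
    return [max(cands, key=lambda c: c[1] + alpha * neural_lm_score(c[0]))[0]
            for cands in nbest]
```

This is also why `--beam_size` must match the `--beam_width` used to produce the beams file: it determines how the lines are grouped into utterances.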

Re-Training the entire model

To re-train the Italian ASR model from scratch, download the standard English model stt_en_quartznet15x5.nemo (v1.0.0rc1, published on 30 June 2021) from the NVIDIA repository, adjust the parameters in the train_QuartzNet.py script (if you want to test parameters different from those used in the paper), and then run:

  python3 train_QuartzNet.py
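Adapting an English character-based model to Italian requires an output alphabet that includes the Italian accented letters. As a hypothetical illustration (we do not know exactly how train_QuartzNet.py builds its label set), the character vocabulary can be collected from the training manifests, assuming they are NeMo-style JSON-lines files with a `text` field.

```python
import json


def build_vocabulary(manifest_paths):
    """Collect the sorted set of characters used in the manifests' "text" fields."""
    chars = set()
    for path in manifest_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    # Each character of the transcript becomes a candidate label
                    chars.update(json.loads(line)["text"])
    return sorted(chars)
```

In NeMo, a list like this is the kind of input accepted by `EncDecCTCModel.change_vocabulary(new_vocabulary=...)`, which replaces the output layer of a pretrained CTC model for a new label set.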

In case of problems, contact me at fabio.tamburini@unibo.it.

Potential problems

Italian texts contain accented letters (à, è, é, ì, ò, ù) that are stored as non-ASCII Unicode characters in the dataset JSON files. This can cause errors when these files are read on a system whose default encoding is not UTF-8, and may require slight modifications of the official NeMo code, explicitly passing the encoding argument (e.g. encoding="utf-8") to Python's open() when opening files.
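The fix described above amounts to opening manifests with an explicit encoding. As an alternative that avoids patching NeMo (our suggestion, not tested with this repository), the manifests can be rewritten as pure ASCII using JSON `\uXXXX` escapes, which decode back to the accented letters whatever the locale's encoding, as long as it is ASCII-compatible. The sketch assumes the manifests are UTF-8-encoded JSON-lines files.

```python
import json


def read_manifest_utf8(path):
    """Read a JSON-lines manifest, forcing UTF-8 regardless of the system locale."""
    # Without encoding=..., open() uses the locale's preferred encoding, which
    # may not be UTF-8 (e.g. under a C/POSIX locale) and fail on "à", "è", ...
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def asciify_manifest(src, dst):
    """Rewrite a manifest with \\uXXXX escapes so the file is pure ASCII."""
    entries = read_manifest_utf8(src)
    with open(dst, "w", encoding="ascii") as out:
        for entry in entries:
            # json.dumps escapes non-ASCII characters by default (ensure_ascii=True)
            out.write(json.dumps(entry) + "\n")
```

After conversion, "perché" is stored as `perch\u00e9` in the file, and any JSON parser restores the original text.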

Acknowledgements

All the scripts are based on those released by NVIDIA or on tutorials by NVIDIA researchers.

Citation

If you use my work, please cite:

@InProceedings{Tamburini2021,
  author = {Tamburini, Fabio},
  title = {{Playing with NeMo for building an Automatic Speech Recogniser for Italian}},
  booktitle = {{Proceedings of the 7th Italian Conference on Computational Linguistics - CLIC-it 2021}},
  year = {2021},
  publisher = {CEUR-WS 3033},
  location = {Milan, Italy},
  url = {http://ceur-ws.org/Vol-3033/paper19.pdf}
}
