speech-recognition

This project implements a training and decoding pipeline for a speech recognition model trained on the LibriSpeech dataset.

Before the model can be trained, the audio is converted from its original .flac format into an MFCC (mel-frequency cepstral coefficient) representation of the waveform. Each MFCC sequence is paired with the transcription of its audio file, which serves as the target during training.
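As an illustration of the pairing step, here is a minimal sketch that matches each .flac file in one LibriSpeech chapter folder with its transcript. It assumes the standard LibriSpeech layout, where each chapter folder holds a `*.trans.txt` file whose lines have the form `<utterance-id> <TRANSCRIPT>`. The function name `load_transcripts` is hypothetical and is not taken from data.py, and the MFCC extraction itself is not shown.

```python
from pathlib import Path


def load_transcripts(chapter_dir):
    """Map each expected .flac path in a chapter folder to its transcript.

    Assumption: the folder follows the LibriSpeech layout, with one or more
    "*.trans.txt" files listing "<utterance-id> <TRANSCRIPT>" per line.
    """
    chapter_dir = Path(chapter_dir)
    pairs = {}
    for trans_file in sorted(chapter_dir.glob("*.trans.txt")):
        for line in trans_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only on the first space: the rest is the transcript.
            utt_id, _, text = line.partition(" ")
            pairs[chapter_dir / f"{utt_id}.flac"] = text
    return pairs
```

The returned dictionary can then drive feature extraction: for each key, load the audio and compute its MFCCs, and use the value as the training target.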

The model, defined in models.py, is based on the architecture described in the Wav2Letter paper from Facebook AI Research. It can be trained to output either letters or phonemes, according to the value selected in the train_batch function in wav2letter_torch.py.
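For letter output, each transcript has to be turned into a sequence of integer class indices before it can be used as a CTC target. The sketch below shows one way to do that. The alphabet, the choice of index 0 for the CTC blank, and the names `encode_letters` and `decode_letters` are assumptions made for illustration, not the mapping used in this repository.

```python
import string

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol

# Assumed letter inventory: apostrophe, space, then a-z.
ALPHABET = ["'", " "] + list(string.ascii_lowercase)
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}


def encode_letters(text):
    """Convert a transcript to class indices, dropping unknown characters."""
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]


def decode_letters(indices):
    """Convert class indices back to text, ignoring the blank index."""
    return "".join(INDEX_TO_CHAR[i] for i in indices if i != BLANK)
```

A phoneme target would follow the same pattern, with a phoneme inventory in place of `ALPHABET` and a pronunciation lookup in place of the per-character loop.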

The project also implements several decoders in decoder.py for decoding the output of the model, which is trained with CTC loss. A user can compare the outputs of the greedy, beam, and log beam decoder functions, and compare their word error rates using the wer function.
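To make the decoding and scoring steps concrete, the following is a minimal pure-Python sketch of greedy CTC decoding and of word error rate. It is not the code in decoder.py or wer.py: the names `greedy_decode` and `word_error_rate`, the blank index of 0, and the input format (one list of per-class scores per frame) are all assumptions.

```python
def greedy_decode(frame_scores, labels, blank=0):
    """Greedy CTC decoding: take the best class per frame, merge
    consecutive repeats, then drop blanks.

    frame_scores: list of per-frame score lists (probabilities or logits).
    labels: labels[i] is the symbol for class i (labels[blank] is unused).
    """
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    output = []
    previous = None
    for index in best:
        # Emit a symbol only when the class changes and is not the blank.
        if index != previous and index != blank:
            output.append(labels[index])
        previous = index
    return "".join(output)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    previous_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            row.append(min(
                previous_row[j] + 1,                           # deletion
                row[j - 1] + 1,                                # insertion
                previous_row[j - 1] + (ref_word != hyp_word),  # substitution
            ))
        previous_row = row
    # Guard against an empty reference to avoid division by zero.
    return previous_row[-1] / max(len(ref), 1)
```

Greedy decoding considers only the single best class at each frame, whereas a beam decoder keeps several candidate prefixes alive, which is why comparing their word error rates on the same predictions is informative.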

A YouTube series detailing the steps involved throughout the data pipeline can be found here.

