speech-recognition

This project implements a training and decoding pipeline for speech recognition, using the LibriSpeech dataset.

Before the model can be trained, the data is processed from its initial .flac format into an MFCC representation of the waveform, which is paired with a transcription of the audio file to be used as target data during training.
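
As a rough sketch of this preprocessing step, assuming librosa is installed, the conversion from a .flac file to MFCC features looks something like the following; the file path and parameter values here are illustrative, not the exact ones used by this project:

```python
# Minimal sketch of the preprocessing step: load a .flac file and compute
# its MFCC representation. The path and n_mfcc value are illustrative.
import librosa

def flac_to_mfcc(path, n_mfcc=13):
    # librosa decodes .flac directly; sr=None keeps the native
    # 16 kHz sample rate of LibriSpeech recordings
    waveform, sample_rate = librosa.load(path, sr=None)
    # returns an (n_mfcc, n_frames) array of coefficients
    return librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=n_mfcc)

# hypothetical LibriSpeech path: speaker/chapter/speaker-chapter-utterance.flac
features = flac_to_mfcc("LibriSpeech/dev-clean/84/121123/84-121123-0000.flac")
print(features.shape)  # (13, n_frames)
```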

The model architecture is based on the Wav2Letter paper by Facebook AI Research and is defined in the models.py file. The application supports training the model on either letter or phoneme outputs, according to the value selected in the train_batch function in wav2letter_torch.py.
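
For orientation, a Wav2Letter-style model is a stack of 1-D convolutions over the MFCC frames, ending in a per-frame distribution over output symbols. The sketch below illustrates that shape in PyTorch; the class name, layer sizes, and class count are assumptions for illustration, not the exact architecture in models.py:

```python
# Hedged sketch of a Wav2Letter-style stack of 1-D convolutions.
# Layer widths and kernel sizes are illustrative only.
import torch.nn as nn

class Wav2LetterSketch(nn.Module):
    def __init__(self, n_features=13, n_classes=29):
        # n_classes: e.g. 26 letters + space + apostrophe + CTC blank;
        # a phoneme-output model would only change this count
        super().__init__()
        self.layers = nn.Sequential(
            # strided first layer downsamples the time axis by 2
            nn.Conv1d(n_features, 250, kernel_size=48, stride=2, padding=23),
            nn.ReLU(),
            nn.Conv1d(250, 250, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(250, 2000, kernel_size=31, padding=15),
            nn.ReLU(),
            # 1x1 convolution maps to per-frame class scores
            nn.Conv1d(2000, n_classes, kernel_size=1),
        )

    def forward(self, x):          # x: (batch, n_features, time)
        return self.layers(x)      # (batch, n_classes, time // 2)
```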

The project also implements several decoders in decoder.py to decode the output of the model, which is trained using CTC loss. A user can compare the outputs of the greedy, beam, and log beam decoder functions, as well as compare word error rates using the wer function.
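
As a reference point for the simplest of these, greedy (best-path) CTC decoding takes the argmax symbol at each frame, collapses consecutive repeats, and drops blanks; word error rate is an edit distance over words. The sketch below shows both ideas; the function names and signatures are illustrative, not the exact API of decoder.py:

```python
# Hedged sketch of greedy CTC decoding and word error rate.
import torch

def greedy_decode(log_probs, blank=0):
    # log_probs: (time, n_classes) tensor of per-frame log probabilities
    best_path = torch.argmax(log_probs, dim=1).tolist()
    decoded, previous = [], blank
    for symbol in best_path:
        # collapse consecutive repeats, then remove CTC blanks
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return decoded

def wer(reference, hypothesis):
    # word error rate via Levenshtein distance over word sequences
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)
```

Beam search improves on the greedy path by keeping the top-k partial hypotheses per frame, and the log variant works in log space to avoid numerical underflow over long utterances.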

A YouTube series detailing the steps involved throughout the data pipeline can be found here.
