Kaggle Speech Recognition

Build an algorithm that understands simple speech commands

Link to the Kaggle challenge

https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

Project Overview

We might be on the verge of too many screens. It seems like every day, new versions of common objects are “re-invented” with built-in wifi and bright touchscreens. Voice interfaces are a promising antidote to our screen addiction.

But for independent makers and entrepreneurs, it’s hard to build a simple speech detector using free, open data and code. Many voice recognition datasets require preprocessing before a neural network model can be built on them. To help with this, TensorFlow recently released the Speech Commands Dataset. It includes 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people.
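Because the clips are meant to be one second long but are not all exactly that length, a common preprocessing step is to bring every waveform to a fixed length before feeding it to a network. The sketch below assumes the dataset's 16 kHz sample rate (16,000 samples per second); `fix_length` is a hypothetical helper, not a function from this repository.

```python
SAMPLE_RATE = 16000  # assumed sample rate of the Speech Commands clips (16 kHz)


def fix_length(samples, target_len=SAMPLE_RATE, pad_value=0):
    """Zero-pad or truncate a list of PCM samples to exactly target_len samples."""
    if len(samples) >= target_len:
        # Longer clips keep only their first target_len samples.
        return list(samples[:target_len])
    # Shorter clips are padded at the end with silence (pad_value).
    return list(samples) + [pad_value] * (target_len - len(samples))


clip = [-91, -176, -111, -95]  # the first few samples of a clip
print(len(fix_length(clip)))   # 16000
```

Padding with zeros at the end is only one option; padding on both sides or with low-level background noise are equally plausible choices.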

In this competition, you're challenged to use the Speech Commands Dataset to build an algorithm that understands simple spoken commands. By improving the recognition accuracy of open-source voice interface tools, we can improve product effectiveness and accessibility.

Code structure

Important notes:

The main skeleton of the code, as well as some specific functions, was taken from the excellent TensorFlow tutorial (www.tensorflow.org/tutorials/sequences/audio_recognition). This project could not have been started without the help of that tutorial, prepared by the incredible team working at TensorFlow.
There is no commit history in this repository because the whole project was developed in a private repository and only released here once the Kaggle contest was over.

main.py: Main execution that performs the training task
scan_hyperparams.sh: Bash script that schedules training runs for each given combination of hyperparameters
00_input_params.py: Parameters that the user should set before executing the main script s_train
01_input_data.py: Reads the data in audio format
02_get_data.py: Class that transforms the data into the format that is fed to the neural network (NN)
03_models.py: Convolutional Neural Network (CNN) models used in this project. The models are variations of one another, differing in a few architectural details. An LSTM model was also tried, but it did not perform as well as the CNN models, so the CNN approach was the one developed further.
04_prediction_submission.py: Script that preprocesses the test dataset, runs the predictions on it, and post-processes them so the resulting file can be downloaded. Its aim is to automate the preparation of the Kaggle submission with minimal effort from the user. The prediction function still needs further development so that the user can choose which model to use for prediction.
05_metrics_data.py: Calculates and displays the performance metric (accuracy) of the predictions.
06_load_data.py: Collection of functions that read and load the training and test data in audio format
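The hyperparameter scan described in the list above amounts to enumerating every combination of candidate values and launching one training run per combination. A minimal Python sketch of that idea follows; the parameter names (`learning_rate`, `batch_size`, `model_architecture`), their values, and the `--name=value` command-line form are hypothetical and are not the actual options of main.py.

```python
import itertools

# Hypothetical search space; the real names and values live in the repository's scripts.
GRID = {
    "learning_rate": [0.001, 0.0001],
    "batch_size": [64, 128],
    "model_architecture": ["cnn_v1", "cnn_v2"],
}


def build_commands(grid, script="main.py"):
    """Return one command line (as a list of arguments) per combination of values."""
    names = list(grid)
    commands = []
    # itertools.product yields the Cartesian product of all candidate values.
    for values in itertools.product(*(grid[name] for name in names)):
        flags = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append(["python", script] + flags)
    return commands


for cmd in build_commands(GRID):
    print(" ".join(cmd))
```

Each command could then be launched in turn with `subprocess.run(cmd, check=True)`, which is essentially what a scheduling script does; the grid grows multiplicatively, so three parameters with two values each already give eight runs.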
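The post-processing done by the prediction script in the list above turns the network's per-class scores into a submission file. The sketch below shows the idea under several assumptions: the scores are one list of probabilities per test file, the label order is the competition's twelve target classes (the ten command words plus `silence` and `unknown`), and the file uses a two-column `fname,label` CSV layout. The function `write_submission` and the example file names are hypothetical.

```python
import csv
import io

# Assumed label order: the ten command words plus "silence" and "unknown".
LABELS = ["yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "silence", "unknown"]


def write_submission(fnames, scores, out, labels=LABELS):
    """Write one `fname,label` row per test file, choosing the highest-scoring class."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fname", "label"])
    for fname, row in zip(fnames, scores):
        # argmax over the class scores without requiring NumPy
        best = max(range(len(row)), key=row.__getitem__)
        writer.writerow([fname, labels[best]])


buffer = io.StringIO()
scores = [[0.9] + [0.01] * 11,   # highest score for "yes"
          [0.0] * 11 + [1.0]]    # highest score for "unknown"
write_submission(["clip_a.wav", "clip_b.wav"], scores, buffer)
print(buffer.getvalue())
```

In practice `out` would be a real file, for example one opened with `open("submission.csv", "w", newline="")`.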
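The accuracy reported by 05_metrics_data.py is presumably the standard definition: the fraction of predictions that match the true label. A minimal sketch with a hypothetical `accuracy` helper:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy(["yes", "no", "up", "go"], ["yes", "no", "down", "go"]))  # 0.75
```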

Datasets

data/train: dataset used to develop our algorithm. We split it into:
- train
- dev
- validation
data/test: dataset used to generate the predictions submitted to Kaggle for ranking
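Clips recorded by the same speaker share the part of the file name before `_nohash_`, so one way to keep a speaker's recordings from leaking across the train, dev and validation sets is to assign sets by hashing that speaker ID. The sketch below follows the idea used in the TensorFlow audio recognition tutorial; the percentages and the helper name `assign_set` are assumptions, and this repository's `prepare_data` (which takes a `random_state`) may split the data differently.

```python
import hashlib
import os
import re


def assign_set(path, dev_pct=10.0, validation_pct=10.0):
    """Deterministically map a clip to 'train', 'dev' or 'validation' by speaker ID."""
    base_name = os.path.basename(path)
    # Everything before "_nohash_" identifies the speaker (the uid).
    uid = re.sub(r"_nohash_.*$", "", base_name)
    digest = int(hashlib.sha1(uid.encode("utf-8")).hexdigest(), 16)
    # Map the hash to a percentage in [0, 100).
    percentage = (digest % 10000) / 100.0
    if percentage < dev_pct:
        return "dev"
    if percentage < dev_pct + validation_pct:
        return "validation"
    return "train"


a = assign_set("data/train/audio/yes/004ae714_nohash_0.wav")
b = assign_set("data/train/audio/no/004ae714_nohash_1.wav")
print(a == b)  # True: both clips come from speaker 004ae714
```

Hashing rather than random sampling keeps the assignment stable when new files are added, at the cost of set sizes that only approximately match the requested percentages.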

RAW data import

To load the RAW data into a pandas DataFrame split into train, dev and test sets, run the following code:

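import importlib

# A module whose name starts with a digit cannot be loaded with a plain `import`
# statement; this assumes prepare_data is defined in 06_load_data.py.
ld = importlib.import_module('06_load_data')

data_path = 'data'
prepared_data_df = ld.prepare_data(data_path, random_state=42)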

It will return a dataframe as follows:

label	path	uid	wav	set
yes	data/train/audio/yes/004ae714_nohash_0.wav	004ae714	[-91, -176, -111, -95, -120, -151, -133, -133,..	train
...	...	...	...	...
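The `label` and `uid` columns can be derived from each clip's path: the parent folder is the spoken word, and the part of the file name before `_nohash_` identifies the speaker. A minimal sketch of that parsing, using a hypothetical `parse_clip_path` helper (the repository's own loader may do it differently):

```python
from pathlib import PurePosixPath


def parse_clip_path(path):
    """Return (label, uid) for a path like data/train/audio/<label>/<uid>_nohash_<n>.wav."""
    p = PurePosixPath(path)
    label = p.parent.name               # folder name, e.g. "yes"
    uid = p.stem.split("_nohash_")[0]   # speaker ID, e.g. "004ae714"
    return label, uid


print(parse_clip_path("data/train/audio/yes/004ae714_nohash_0.wav"))  # ('yes', '004ae714')
```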

SergiGomez/kaggle-speech-recognition
