Speech Emotion Recognition

This is a Speech Emotion Recognition project based on the RAVDESS dataset, built during summer 2021 for the Brain and Cognitive Science Society.

Clone the repository using:

git clone https://github.com/Aaka3021/Speech-Emotion-Recognition--1.git

Abstract:

Speech Emotion Recognition, abbreviated as SER, is the task of recognizing human emotion and the associated affective states from speech. It capitalizes on the fact that the voice often reflects underlying emotion through tone and pitch. Emotion recognition has become a rapidly growing research domain in recent years. Unlike humans, machines lack the innate ability to perceive and express emotions, but human-computer interaction can be improved by automated emotion recognition, reducing the need for human intervention.

In this project, basic emotions such as calm, happy, fearful, and disgust are analyzed from emotional speech signals. We use a Multilayer Perceptron classifier (MLP Classifier), which can categorize data into groups that are not linearly separable, and we also build a CNN (Convolutional Neural Network) and an RNN-LSTM model. Mel-frequency cepstral coefficients (MFCCs), chroma, and mel-spectrogram features are extracted from the speech signals and used to train the classifiers. To achieve this, we use Python libraries such as librosa, sklearn, pyaudio, numpy, and soundfile to analyze the speech signals and recognize the emotion.
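
As a minimal sketch of that feature-extraction step, assuming librosa and soundfile are installed (the function name, frame-averaging choice, and parameters here are illustrative, not the repository's exact code):

```python
import numpy as np
import soundfile as sf
import librosa

def extract_features(path, n_mfcc=40):
    """Return one fixed-length feature vector (MFCC + chroma + mel) per audio file."""
    audio, sr = sf.read(path, dtype="float32")
    if audio.ndim > 1:                        # down-mix stereo to mono
        audio = audio.mean(axis=1)
    stft = np.abs(librosa.stft(audio))
    # Average each feature over time to obtain a fixed-length vector
    mfcc = np.mean(librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=audio, sr=sr).T, axis=0)
    return np.hstack([mfcc, chroma, mel])     # shape: (n_mfcc + 12 + 128,)
```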

Using the RAVDESS dataset, which contains around 1500 audio recordings from 24 actors (12 male and 12 female) speaking short sentences in 8 different emotions, we train ML-based models that classify the 8 basic emotions as well as the gender of the speaker (male or female).
After training, the model can be deployed to predict emotions from live voice input.
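
The labels come from the RAVDESS filenames themselves, which encode metadata in hyphen-separated fields: the third field is the emotion code and the seventh is the actor ID (odd IDs are male, even IDs female). A small illustrative parser, with a hypothetical function name:

```python
import os

# RAVDESS emotion codes (field 3 of the filename)
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_name(path):
    """Return (emotion, gender) decoded from a RAVDESS filename."""
    parts = os.path.basename(path).split(".")[0].split("-")
    emotion = EMOTIONS[parts[2]]
    gender = "male" if int(parts[6]) % 2 == 1 else "female"
    return emotion, gender

# e.g. parse_ravdess_name("03-01-06-01-02-01-12.wav") -> ("fearful", "female")
```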

Deliverables:

  • Learn the basics of Python, ML/DL, and libraries such as librosa and sklearn
  • Literature review, dataset analysis, and feature extraction
  • Building and training the models on the training data, followed by testing on the test data
  • Finally, testing the model on live (unseen) audio input and collecting the results

Schedule:

Week 1:

  • Covering ML/DL basics

Week 2:

  • Plotting the waveform and spectrogram (a sketch follows this list)

  • Learning audio preprocessing for feature extraction
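
A minimal sketch of those Week 2 plots, assuming librosa >= 0.9 and matplotlib; the file path is a placeholder:

```python
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

audio, sr = librosa.load("path/to/a_ravdess_clip.wav", sr=None)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# Time-domain waveform
librosa.display.waveshow(audio, sr=sr, ax=ax1)
ax1.set_title("Waveform")

# Log-magnitude spectrogram
db = librosa.amplitude_to_db(np.abs(librosa.stft(audio)), ref=np.max)
img = librosa.display.specshow(db, sr=sr, x_axis="time", y_axis="hz", ax=ax2)
ax2.set_title("Spectrogram (dB)")
fig.colorbar(img, ax=ax2, format="%+2.0f dB")

plt.tight_layout()
plt.show()
```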

Week 3:

  • Implementing feature extraction using the librosa library (see the extract_features sketch above)

Week 4:

  • Implementing the MLP model for emotion recognition
  • Evaluating it on the test set (a sketch follows below)
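
A minimal sketch of the Week 4 step with sklearn's MLPClassifier, reusing the illustrative helpers above; the dataset layout and hyperparameters are assumptions, not the repository's tuned values:

```python
import glob
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Assumed dataset layout; reuses extract_features / parse_ravdess_name above
files = glob.glob("ravdess/Actor_*/*.wav")
X = np.array([extract_features(f) for f in files])
y = [parse_ravdess_name(f)[0] for f in files]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

mlp = MLPClassifier(hidden_layer_sizes=(300,), alpha=0.01,
                    batch_size=256, max_iter=500)
mlp.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, mlp.predict(X_test)))
```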

Week 5:

  • Implementing the LSTM model (a sketch follows below)
  • Starting to implement the CNN model
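
A minimal Keras sketch of such an LSTM classifier; the framework and layer sizes are assumptions, since the README does not name them. It consumes per-frame MFCC sequences rather than the time-averaged vectors used for the MLP:

```python
from tensorflow.keras import layers, models

def build_lstm(timesteps, n_mfcc=40, n_classes=8):
    """Stacked LSTM over (timesteps, n_mfcc) MFCC frames; integer emotion labels."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_mfcc)),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```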

Week 6:

  • Completing the CNN model implementation
  • Evaluating the model on our own voices (a live-audio sketch follows below)
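
A minimal sketch of that live-audio step with pyaudio: record a short clip, then run it through the same pipeline as the training data. The duration, sample rate, and the trained `mlp` model are assumptions carried over from the sketches above:

```python
import wave
import pyaudio

RATE, SECONDS, CHUNK = 44100, 3, 1024

# Record SECONDS of mono 16-bit audio from the default microphone
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()

with wave.open("live_clip.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
pa.terminate()

# Same pipeline as training: extract features, then predict with the
# trained classifier (the hypothetical `mlp` from the sketch above)
features = extract_features("live_clip.wav").reshape(1, -1)
print("Predicted emotion:", mlp.predict(features)[0])
```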

Results:

  • The CNN model gave an accuracy of 73%
  • The LSTM model gave an accuracy of 71%
  • The MLP model gave an accuracy of 62%
