Identifies six classes of emotion from the audio (speech) signal, using Mel spectrograms to capture speech features and a CNN architecture for feature extraction.
The best model is Model2.ipynb - please open the Jupyter notebook to view the architecture and results. The confusion matrix, classification report, and validation results are included in every model notebook.
Model 1 contains transfer learning models: an attempt to examine how spectrograms perform with existing image models.
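For illustration only, here is a small PyTorch CNN classifying Mel-spectrogram inputs into six emotion classes. This is a hypothetical sketch, not the actual architecture in Model2.ipynb (layer counts, filter sizes, and the input shape of 128x94 are all assumptions):

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """Illustrative 6-class CNN over Mel-spectrogram 'images'."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool to 1x1 so input size can vary
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)  # raw logits, one per emotion class

model = EmotionCNN()
# Batch of 4 single-channel spectrograms (128 Mel bins x 94 frames, assumed shape)
logits = model(torch.randn(4, 1, 128, 94))
print(logits.shape)  # torch.Size([4, 6])
```

Training would pair these logits with a cross-entropy loss over the six emotion labels.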
Full report is available on request. All folders with relevant files to run (without filtering) are available at https://drive.google.com/drive/folders/1IildW2vjEOvcHgVBWTHVwkGyXLlCpY6v?usp=sharing
Repo owners: see the contributors list.
Link to Kaggle dataset: https://www.kaggle.com/ejlok1/cremad
Link to Github for Demographics file: https://github.com/CheyneyComputerScience/CREMA-D