This project explores emotion recognition in audio data, focusing on feature extraction techniques while also comparing the performance of LSTM and 1D CNN models.

NajdBinrabah/Deep-Learning-with-TensorFlow-and-Keras

Deep Learning with TensorFlow & Keras: Emotion Detection in Audio

This personal project uses Deep Learning to classify emotions in audio data, employing TensorFlow, Keras, and Librosa.


This project applies Deep Learning to an audio classification problem: recognizing emotions in audio data. TensorFlow and Keras are used for modeling, and Librosa for audio analysis. The aim is to compare how different Deep Learning models, specifically LSTMs and 1D CNNs, handle the challenges of audio data.

Libraries Used

  • TensorFlow & Keras: For building, training, and evaluating Deep Learning models.
  • Librosa: For audio analysis, particularly feature extraction, which converts audio files into numerical representations that the model can interpret.

You can view the detailed Notebook for this project here.

Models and Techniques Explored

1. Feature Extraction with Librosa

  • Mel-frequency cepstral coefficients (MFCCs)
  • Chroma features
  • Mel spectrogram

2. Long Short-Term Memory (LSTM) and 1D Convolutional Neural Network (1D CNN) Models

Two Deep Learning models were evaluated:

  • LSTM (Long Short-Term Memory): LSTM networks were developed to mitigate the vanishing gradient problem in traditional RNNs (exploding gradients are usually handled separately, for example with gradient clipping). Because they capture long-term dependencies, LSTMs suit sequential data. However, they may be more complex than short, simple audio clips require, which can lead to overfitting on a dataset like this one.
  • 1D CNN (1D Convolutional Neural Network): A simpler architecture adapted for sequential data such as audio. 1D CNNs slide convolution filters along a single axis, here the time or sequence dimension, so each filter responds to local patterns. This makes them well suited to tasks like time-series analysis or audio classification, where short-term dependencies are key. In this project, the 1D CNN fit the dataset better, showing smoother convergence and improved accuracy.
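To make the comparison concrete, the sketch below defines one small model of each kind in Keras. The layer sizes, dropout rate, input shape (a sequence of 180 values with one channel), and the eight output classes are illustrative assumptions, not the architectures or settings used in the notebook.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N_STEPS = 180    # hypothetical sequence length per clip
N_CLASSES = 8    # hypothetical number of emotion labels


def build_lstm(n_steps, n_classes):
    """Small LSTM classifier: reads the sequence step by step."""
    return keras.Sequential([
        keras.Input(shape=(n_steps, 1)),
        layers.LSTM(64),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])


def build_cnn1d(n_steps, n_classes):
    """Small 1D CNN classifier: slides filters along the sequence axis."""
    return keras.Sequential([
        keras.Input(shape=(n_steps, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])


lstm_model = build_lstm(N_STEPS, N_CLASSES)
cnn_model = build_cnn1d(N_STEPS, N_CLASSES)

for model in (lstm_model, cnn_model):
    # Integer class labels are assumed, hence the sparse loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# Random inputs only check that shapes line up; they are not real audio features.
x = np.random.rand(4, N_STEPS, 1).astype("float32")
lstm_probs = lstm_model.predict(x, verbose=0)
cnn_probs = cnn_model.predict(x, verbose=0)
print(lstm_probs.shape, cnn_probs.shape)
```

Both models accept the same input and emit one probability per class, so they can be trained and evaluated on identical data splits for a like-for-like comparison.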