# Mel-ResNet

ResNet for classifying Mel-spectrograms.

*(Mel-ResNet figure)*

- We developed a GAN-based generative model that takes EEG data recorded during Inner Speech (Imagined Speech) as input, with the mel spectrogram of the corresponding Spoken Speech as its target.
- The mel spectrogram generated from the EEG is then fed into a ResNet trained on mel spectrograms of actual speech to predict the imagined word.
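For reference, the target representation (a log-mel spectrogram of the spoken speech) can be computed from a waveform roughly as sketched below. This is an illustrative NumPy version; the project likely uses a library such as librosa, and the parameter values here (16 kHz sample rate, 512-point FFT, 128-sample hop, 40 mel bins) are assumptions, not taken from the repository.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filterbank (a simplified version of librosa.filters.mel)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):           # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=128, n_mels=40):
    """Log-mel spectrogram via a Hann-windowed framed FFT and a mel filterbank."""
    frames = [y[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(y) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2  # power spectrum
    return np.log(mel_filterbank(sr, n_fft, n_mels) @ power.T + 1e-10)
```

The resulting `(n_mels, n_frames)` matrix is the image-like input that both the GAN target and the ResNet classifier operate on.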

## Requirements

`Python >= 3.7`

All code is written in Python 3.7.

You can install the libraries used in our project by running:

```
pip install -r requirements.txt
```

## Dataset

We collected word utterance recordings for a total of 13 classes using the voices of 5 contributors and TTS technology.
Additionally, to mitigate data scarcity, we applied augmentation techniques such as time stretching, pitch shifting, and noise addition.
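Two of these augmentations can be sketched in plain NumPy as below. This is an illustration, not the project's pipeline: the repository likely uses library routines (e.g. librosa's `effects.time_stretch` / `effects.pitch_shift`), and the SNR and stretch-rate values here are arbitrary.

```python
import numpy as np

def add_noise(y, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def time_stretch(y, rate=1.1):
    """Naive time stretch by linear resampling.
    (A phase-vocoder stretch, as in librosa.effects.time_stretch,
    would change duration while preserving pitch.)"""
    n_out = int(len(y) / rate)
    return np.interp(np.linspace(0.0, 1.0, n_out),
                     np.linspace(0.0, 1.0, len(y)), y)

y = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
augmented = add_noise(time_stretch(y, rate=1.2), snr_db=20.0)
```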

The recorded words are as follows:

- Call
- Camera
- Down
- Left
- Message
- Music
- Off
- On
- Receive
- Right
- Turn
- Up
- Volume

## Model & Training (ongoing)

ResNet-50

(Data collection is ongoing, and hyperparameter tuning will follow after training.)


## Results (ongoing)

Performance metrics: Accuracy, F1 score
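These two metrics could be computed with scikit-learn as sketched below; the label arrays here are hypothetical placeholders, not project results. Macro-averaged F1 weights all 13 classes equally, which matters if the augmented dataset is not perfectly balanced.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true/predicted class indices over the 13 word classes (0-12).
y_true = [0, 1, 2, 2, 5, 7]
y_pred = [0, 1, 1, 2, 5, 7]

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")
```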

(Learning curves and metric results will be added here.)


## References