Hannibal_ERC2019

Info:

The competition has 2 rounds.
Round 1 runs from 17 to 28 November.
The task is supervised classification.
The train set (handout) contains ~5000 pairs (.pcm file, label), where each .pcm file is an audio/speech recording. The label is 1 of 6 emotions: happy, sad, anger, fear, disgust, and neutral.
The test set (public-test) contains ~1000 audio files (no labels).
Each team may submit 3 times per day.
Round 2 only selects the top teams to present; their models are then retrained on the organizers' server and tested again on a different set of ~1000 closed-test files.
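
Since the handout ships raw .pcm files (no WAV header), here is a minimal sketch of how one file could be loaded. The notes do not state the encoding, so 16-bit signed little-endian mono at 16 kHz is an assumption to check against the organizers' description; `read_pcm` and `duration_seconds` are hypothetical helper names.

```python
import numpy as np

def read_pcm(path, dtype="<i2"):
    """Load a headerless PCM file as float32 samples in [-1, 1).

    Assumption: 16-bit signed little-endian mono samples. The challenge
    notes do not state the real encoding, so verify before relying on it.
    """
    raw = np.fromfile(path, dtype=dtype)
    return raw.astype(np.float32) / 32768.0

def duration_seconds(samples, sample_rate=16000):
    """Length of the clip in seconds (sample_rate is an assumed value)."""
    return len(samples) / float(sample_rate)
```

If the clips sound sped up or slowed down when played back, the assumed sample rate is probably wrong; if they sound like noise, the assumed sample format is.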

More info:

Each audio file is just one short spoken sentence.
Learn directly from the audio signal by analyzing intonation.
Thầy Nam (our instructor) suggests using MFCC as the feature extractor.
The data is in English.
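
To make the MFCC suggestion concrete, below is a rough NumPy-only sketch of the usual pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT). All parameter values (16 kHz, 25 ms frames, 10 ms step, 26 filters, 13 coefficients) are common defaults and assumptions, not values from the challenge; the function names are hypothetical. A real library implementation will differ in details such as DCT scaling and liftering.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale: (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) array of MFCCs (unnormalized DCT-II)."""
    signal = np.asarray(signal, dtype=np.float64)
    frame_size = int(round(frame_len * sample_rate))
    hop = int(round(frame_step * sample_rate))
    # Pre-emphasis boosts high frequencies (0.97 is a conventional value).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    if len(emphasized) < frame_size:         # pad very short clips to one frame
        emphasized = np.pad(emphasized, (0, frame_size - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_size) // hop
    idx = np.arange(frame_size)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_size)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small constant avoids log(0)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

def utterance_features(mfcc_frames):
    """Collapse per-frame MFCCs into one fixed-length vector (mean and std per coefficient)."""
    return np.concatenate([mfcc_frames.mean(axis=0), mfcc_frames.std(axis=0)])
```

Each row describes the spectral envelope of one ~25 ms frame; `utterance_features` is one simple way to get a fixed-length vector for classifiers such as SVM or MLP, while sequence models (LSTM, CNN) would consume the frame matrix directly.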

Task #1:

Thanh: focus on what information the feature vector returned by MFCC contains.
Việt: focus on the concrete implementation and the related parameters we can tune when using the algorithm (i.e. the library function).
Tiến: survey the popular publicly available datasets and skim all the later steps... see which learning algorithms people use.

Meeting #1: Saturday afternoon.

Some links:

https://www.researchgate.net/profile/Manikrao_Dhore/publication/43785303_Speech_Emotion_Recognition_Using_Support_Vector_Machines/links/0deec5243c08426014000000.pdf

Berlin emotional database (German) -> MFCC + MEDC -> SVM -> accuracy ~95% (don't pay too much attention to this number)
Uses LIBSVM (not sure for which language), with Radial Basis Function (RBF) and polynomial kernels.
No more detailed instructions.
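
For reference, the two kernels named above as LIBSVM defines them, where \gamma, r (coef0) and d (degree) are user-set parameters. The paper's actual parameter values are not given in these notes. The last line relates \gamma to the \sigma parameterization used by some other toolkits, which is a common convention and an assumption about any particular paper.

```latex
\begin{align}
K_{\mathrm{RBF}}(u, v)  &= \exp\left(-\gamma \lVert u - v \rVert^{2}\right), \qquad \gamma > 0 \\
K_{\mathrm{poly}}(u, v) &= \left(\gamma\, u^{\top} v + r\right)^{d} \\
\gamma &= \frac{1}{2\sigma^{2}} \quad \text{when the RBF kernel is written as } \exp\left(-\frac{\lVert u - v \rVert^{2}}{2\sigma^{2}}\right)
\end{align}
```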

https://www.academia.edu/6784167/c%C3%A1c_vector

Material on MFCC

https://pdfs.semanticscholar.org/05ba/884878eaff5f977d488fe792f78e57e18418.pdf

Berlin emotional database -> MFCC -> SVM.
Uses a 3-stage hierarchical SVM, RBF sigma value = 1, 10-fold cross-validation.
Worth consulting for more detail on SVM.
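
A rough sketch of an RBF SVM evaluated with 10-fold cross-validation, assuming scikit-learn is available (its `SVC` is built on LIBSVM). This is a single flat SVM, not the paper's 3-stage hierarchical scheme. The data is a synthetic placeholder, not challenge data, and the conversion `gamma = 1 / (2 * sigma**2)` assumes the paper uses the sigma parameterization of the RBF kernel. Function names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["happy", "sad", "anger", "fear", "disgust", "neutral"]

def make_placeholder_data(n_per_class=30, n_features=26, seed=0):
    """Synthetic stand-in for utterance-level features. NOT real challenge data."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
        for i in range(len(EMOTIONS))
    ])
    y = np.repeat(np.arange(len(EMOTIONS)), n_per_class)
    return X, y

def evaluate_rbf_svm(X, y, sigma=1.0, n_splits=10, seed=0):
    """Return the per-fold accuracies of a standardized RBF SVM."""
    gamma = 1.0 / (2.0 * sigma ** 2)   # assumed sigma-to-gamma conversion
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=1.0))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv)

X, y = make_placeholder_data()
scores = evaluate_rbf_svm(X, y)
print("mean accuracy over folds:", scores.mean())
```

Putting the scaler inside the pipeline keeps the scaling statistics from leaking from the validation fold into training. The score on placeholder data says nothing about performance on the real recordings.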

https://github.com/amanbasu/speech-emotion-recognition

IEMOCAP dataset (English) -> MFCC -> recurrent neural network (LSTM)
Full code available

https://data-flair.training/blogs/python-mini-project-speech-emotion-recognition/

RAVDESS dataset (English, 16-bit, 48 kHz .wav; probably not the same as the challenge data, since it only has ~2500 files) -> MFCC -> MLPClassifier (multi-layer perceptron, a feedforward ANN model)
Includes screenshots of the code

https://towardsdatascience.com/speech-emotion-recognition-with-convolution-neural-network-1e6bb7130ce3 https://github.com/rezachu/emotion_recognition_cnn

RAVDESS -> MFCC -> CNN/Keras
Code is given in snippets

https://github.com/MITESHPUTHRANNEU/Speech-Emotion-Analyzer

RAVDESS + SAVEE -> MFCC -> CNN/Keras, compared with LSTM and MLP

https://www.microsoft.com/en-us/research/publication/high-level-feature-representation-using-recurrent-neural-network-for-speech-emotion-recognition/

IEMOCAP -> MFCC -> RNN, bidirectional LSTM, Deep neural network (DNN) - Extreme Learning Machine (ELM), Gaussian mixture model (GMM), hidden Markov model (HMM).

https://github.com/tyiannak/pyAudioAnalysis
