Analyze and classify audio signals in Python
The purpose of this project is to explore different machine learning classifiers for classifying music genre from an audio sample.
- Python 3.7
- Librosa 0.7.2
- scikit-learn
- TensorFlow 2.1.1
- pandas 1.0.4
- matplotlib 3.0.2
GTZAN Genre Collection
- 1000 audio tracks, 30 seconds long each
- 22,050 Hz mono 16-bit audio files in .wav format
- 10 genres (100 songs/genre)
- Blues
- Classical
- Country
- Disco
- Hip Hop
- Jazz
- Metal
- Pop
- Reggae
- Rock
In order to train and test our classifiers, we need to identify the features to extract from the audio samples. Luckily, prior research has already identified features that perform well in music genre classification.
The features that we extract are:
- Zero Crossing Rate - rate at which the signal changes from positive to negative or negative to positive
- Spectral Centroid - weighted mean of frequencies present in audio clip
- Spectral Roll-Off - the frequency below which a specified percentage of the total spectral energy (85% by default) lies
- Chroma Frequencies - the intensity of each of the 12 distinct musical chroma of the octave; chroma representation/chromagram (via short-time Fourier transform)
- Mel-Frequency Cepstral Coefficients (MFCC) (x20) - coefficients that collectively make up an MFC
- Mel-Frequency Cepstrum (MFC) - representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency
This results in a feature vector of length 25.
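To make the first two features concrete, here is a minimal NumPy sketch of the zero crossing rate and spectral centroid as defined above (the project itself extracts all features with Librosa in preprocessing.py; the function names here are just for illustration):

```python
import numpy as np

def zero_crossing_rate(signal):
    # fraction of adjacent sample pairs where the sign flips
    signs = np.signbit(signal)
    return np.mean(signs[1:] != signs[:-1])

def spectral_centroid(signal, sr):
    # magnitude-weighted mean of the frequencies present in the clip
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return np.sum(freqs * mags) / np.sum(mags)

# one second of a 440 Hz sine, sampled at 22,050 Hz like the GTZAN files
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(zero_crossing_rate(tone))     # ≈ 2 * 440 / 22050 ≈ 0.04
print(spectral_centroid(tone, sr))  # ≈ 440 Hz
```

A pure tone crosses zero twice per cycle, so its zero crossing rate is roughly twice its frequency divided by the sample rate, and its spectral centroid sits at the tone's frequency.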
The feature extraction is done by running preprocessing.py. This file takes several minutes to run, as the processing of each sample takes a few seconds.
We use the pre-processed features in order to train and test the different machine learning classifiers:
- Linear Kernel SVM
- Polynomial Kernel SVM
- Radial Basis Function (RBF) SVM
- K Nearest Neighbors (k-NN)
- Logistic Regression
- Naïve Bayesian
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Random Forest
- Decision Tree
- Neural Network
Please note, some of these classifiers required hyper-parameter tuning to optimize the accuracy (SVM, k-NN, random forest, neural network).
We use a 90%/10% train/test split.
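As a sketch of that split and one of the classifiers in scikit-learn (the feature matrix here is random stand-in data, not the real features from data.csv):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# random stand-in for the 25-dimensional feature vectors and 10 genre labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 25))
y = rng.integers(0, 10, size=1000)

# 90%/10% train/test split, as used in the project
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# scale features on the training set only, then fit k-NN with the tuned k=7
scaler = StandardScaler().fit(X_train)
clf = KNeighborsClassifier(n_neighbors=7)
clf.fit(scaler.transform(X_train), y_train)
score = clf.score(scaler.transform(X_test), y_test)
print(score)  # near chance (~0.1) on random data
```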
Here are some plots which help to visualize how certain hyper-parameters were selected.
- For k-NN, k = 7 optimizes the accuracy.
- For random forest, we plot accuracy versus N (number of subtrees) and d (maximum depth of each subtree). For d > 6 the accuracy seems to converge, and it improves slightly for N > 6.
- For polynomial kernel SVM, C=1/degree=3 and C=10/degree=2 seem like good choices.
- For SVM with different kernels and different values of gamma (C=10), with the x-axis on a log scale, the RBF kernel performs best with gamma=0.1.
- For the neural network implemented in TensorFlow, we use the Adam optimizer and train for only 10 epochs; training longer leads to overfitting, as can be seen in the training curves.
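One way to reproduce such a sweep is scikit-learn's GridSearchCV; a minimal sketch on the built-in iris data (the grid values echo the ones discussed above, but the dataset is only a stand-in for the GTZAN features):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated sweep over polynomial-kernel hyper-parameters
grid = GridSearchCV(
    SVC(kernel='poly'),
    param_grid={'degree': [2, 3], 'C': [1, 10]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```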
The best performing classifier is the ensemble (majority) voting classifier. For this, we use the Poly SVM, RBF SVM, k-NN, and QDA as the estimators. The worst performing classifier is Naive Bayes.
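A sketch of how such a majority-voting ensemble can be assembled with scikit-learn's VotingClassifier, using the tuned hyper-parameters reported for the four estimators (fitted here on the built-in iris data as a stand-in for the GTZAN features):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# hard voting: each estimator casts one vote, the majority class wins
ensemble = VotingClassifier(
    estimators=[
        ('poly_svm', SVC(kernel='poly', degree=2, C=10)),
        ('rbf_svm', SVC(kernel='rbf', gamma=0.1, C=10)),
        ('knn', KNeighborsClassifier(n_neighbors=7)),
        ('qda', QuadraticDiscriminantAnalysis()),
    ],
    voting='hard',
)

X, y = load_iris(return_X_y=True)
ensemble.fit(X, y)
print(ensemble.score(X, y))
```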
Classifier | Mean Accuracy | Mean Precision | Mean Recall |
---|---|---|---|
SVM, Linear Kernel (C=1) | 0.62 | 0.61 | 0.61 |
SVM, Poly Kernel (Degree=2, C=10) | 0.76 | 0.77 | 0.77 |
SVM, RBF Kernel (Gamma=0.1, C=10) | 0.75 | 0.76 | 0.75 |
k-NN (k=7) | 0.73 | 0.74 | 0.73 |
Logistic Regression | 0.71 | 0.70 | 0.73 |
Naive Bayesian | 0.38 | 0.31 | 0.36 |
LDA | 0.69 | 0.69 | 0.70 |
QDA | 0.74 | 0.74 | 0.74 |
Random Forest (N=6, d=10) | 0.59 | 0.59 | 0.60 |
Decision Tree | 0.53 | 0.52 | 0.52 |
NN (Adam) | 0.62 | 0.63 | 0.62 |
Voting Classifier | 0.79 | 0.81 | 0.79 |
Below is the confusion matrix for the voting classifier.
- Install the dependencies
$ pip install -r requirements.txt
- Use the provided data.csv, or generate it yourself:
  - Download the GTZAN dataset
  - In preprocessing.py, change `path` to the root directory (genres) of the GTZAN dataset:
    path = '/path/to/gtzan/genres/' # path to data
  - Run preprocessing.py to generate a CSV file (data.csv) with the features for each file
- Place data.csv in the same directory as your scripts
- Run classical_models.py to compare the different models
- Run nn_models.py to create and train neural network model
- Run plot_features.py to visualize the features of the dataset
- Hyper-Parameter Tuning
- Run svm_model.py to see accuracy versus kernel and C
- Run random_forest.py to see accuracy versus d and N
- Run knn_model.py to see accuracy versus k
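All of the model scripts read the extracted features back from data.csv; a self-contained sketch of that loading step with pandas (the column names and the in-memory stand-in for the file are hypothetical):

```python
import io
import pandas as pd

# in-memory stand-in for the data.csv that preprocessing.py writes
# (the column names here are hypothetical)
csv_text = (
    "zcr,spectral_centroid,label\n"
    "0.08,1800.5,blues\n"
    "0.04,1200.0,classical\n"
)
df = pd.read_csv(io.StringIO(csv_text))
X = df.drop(columns=['label']).to_numpy()  # feature matrix
y = df['label'].to_numpy()                 # genre labels
print(X.shape, list(y))  # (2, 2) ['blues', 'classical']
```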
Laura Kocubinski laurakoco
- Boston University MET Master of Science Computer Science Program
- MET CS 677 Data Science with Python
[1] "Music Genre Classification with Python", https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8
[2] G. Tzanetakis, P. Cook, "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, 2002.
[3] Archit Rathore, Margaux Dorido, "Music Genre Classification", https://cse.iitk.ac.in/users/cs365/2015/_submissions/archit/report.pdf