Genrify - The Music App

Hear it. Genrify it.

In this work, the objective is to classify audio data into specific genres from the GTZAN dataset, which contains 10 genres. We have built a Convolutional Neural Network model using the TensorFlow library to classify the 10 genres.


Introduction:

  • The idea behind this project is to see how to handle sound files in Python, compute audio features from them, run deep learning algorithms on them, and predict the genre from an audio signal. We considered two datasets: the FMA dataset and the GTZAN dataset.

The FMA Dataset

The FMA dataset is based on music contributed by various, mostly indie, artists to the Free Music Archive. Its smallest variant (‘fma-small’) is about 9 GiB uncompressed and contains about 8K tracks. The FMA dataset is robust because it is representative of contemporary music, at least in terms of recording quality (44.1 kHz stereo), and it is generally very high quality, as it was originally meant for end-user consumption. Hence, it was considered well suited to training a model for music genre classification.

The GTZAN Dataset

The GTZAN dataset contains 1000 music clips, each 30 seconds long, with a sampling frequency of 22050 Hz. It covers 10 genres, with 100 audio files per genre. Each audio file is read at a sampling rate of 22050 Hz and then split from its 30-second duration into 3-second clips. The 10 genres are as follows:

  • Blues
  • Classical
  • Country
  • Disco
  • Hip-hop
  • Jazz
  • Metal
  • Pop
  • Reggae
  • Rock
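The 30-second to 3-second split described above can be sketched as follows. This is a minimal illustration using NumPy only, not the project's actual code: the function name `split_into_segments` is hypothetical, the segments are assumed to be non-overlapping, and a silent array stands in for a clip that would normally be loaded from a .wav file.

```python
import numpy as np

SAMPLE_RATE = 22050   # Hz, the GTZAN sampling rate stated above
SEGMENT_SECONDS = 3   # length of each sub-clip in seconds


def split_into_segments(signal, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Split a 1-D signal into equal, non-overlapping segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    samples_per_segment = sample_rate * seconds
    n_segments = len(signal) // samples_per_segment
    trimmed = signal[: n_segments * samples_per_segment]
    return trimmed.reshape(n_segments, samples_per_segment)


# Stand-in for a 30-second clip (real code would load a .wav file instead).
clip = np.zeros(30 * SAMPLE_RATE, dtype=np.float32)
segments = split_into_segments(clip)
print(segments.shape)  # (10, 66150)
```

Each 30-second clip yields 10 segments of 66150 samples, so every original file contributes ten training examples.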

Data Preprocessing steps:

  • Loading and displaying the raw audio files using the Librosa library.
  • Plotting the spectrograms and mel-spectrograms for a better understanding of the audio files.
  • Splitting the data into training and testing sets.
  • Extracting features and scaling them for easier model construction.
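The splitting and scaling steps can be illustrated with a small NumPy sketch. Several things here are assumptions rather than details from the project: the 80/20 split, standardization (zero mean, unit variance) as the scaling method, the helper names `train_test_split` and `standardize`, and the synthetic data used in place of real extracted features.

```python
import numpy as np


def train_test_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(len(features) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], labels[train_idx], labels[test_idx]


def standardize(train, test):
    """Scale each feature column to zero mean and unit variance.

    The statistics come from the training set only, so no information
    from the test set leaks into the scaling.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (train - mean) / std, (test - mean) / std


# Synthetic stand-in data: 100 rows, 5 features, 10 genre labels (0..9).
rng = np.random.default_rng(42)
X = rng.normal(loc=3.0, scale=2.0, size=(100, 5))
y = rng.integers(0, 10, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train, X_test = standardize(X_train, X_test)
print(X_train.shape, X_test.shape)  # (80, 5) (20, 5)
```

Fitting the scaling statistics on the training rows and reusing them for the test rows keeps the evaluation honest.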

MFCCs (mel-frequency cepstral coefficients) are derived as follows:

  • The Fourier transform of a (windowed excerpt of the) signal is taken.
  • The powers of the spectrum obtained above are mapped onto the mel scale, using triangular overlapping windows.
  • The logs of the powers at each of the mel frequencies are taken.
  • The discrete cosine transform of the list of mel log powers is taken, as if it were a signal.
  • The MFCCs are the amplitudes of the resulting spectrum.
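The five steps above can be followed in a short NumPy sketch for a single frame of audio. It is an illustration under stated assumptions, not the project's feature extractor: the function names, the Hann window, the 40 mel filters, the 13 coefficients, and the unnormalized DCT-II are all choices made here, so the values will not match a library such as Librosa exactly.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc_frame(frame, sample_rate=22050, n_filters=40, n_mfcc=13):
    """Compute MFCCs for one frame, following the five steps in order."""
    n_fft = len(frame)
    # Step 1: Fourier transform of the windowed frame, then its power spectrum.
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    # Step 2: map the powers onto the mel scale with triangular windows.
    mel_power = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    # Step 3: take the log of each mel-band power (offset avoids log(0)).
    log_mel = np.log(mel_power + 1e-10)
    # Step 4: discrete cosine transform (unnormalized DCT-II) of the log powers.
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    # Step 5: the MFCCs are the amplitudes of the resulting spectrum.
    return dct_basis @ log_mel


# A 440 Hz test tone, 2048 samples long, stands in for a real audio frame.
t = np.arange(2048) / 22050
frame = np.sin(2 * np.pi * 440 * t)
coeffs = mfcc_frame(frame)
print(coeffs.shape)  # (13,)
```

In practice the same computation is repeated over many short, overlapping frames, and the per-frame coefficients are summarized (for example by their mean and variance) to form features.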

Model Construction:

GTZAN Model:

Deep Learning Model:

After pre-processing the dataset, we use a Convolutional Neural Network to build and train a model that classifies the music genre. The model makes use of features such as MFCCs and spectral centroids, taken from the extracted features in features3sec.csv.

For the CNN model:

  • All of the hidden layers use the ReLU activation function, and the output layer uses the softmax function. The loss is calculated using the sparse_categorical_crossentropy function. Dropout is used to prevent overfitting.
  • The model is compiled with an optimizer and the sparse_categorical_crossentropy loss function, which is suitable for multi-class classification with integer class labels. We monitor the classification accuracy metric, since we have the same number of examples in each of the 10 classes.
  • The model accuracy can be improved by training for more epochs, but it plateaus after a certain point, so the number of epochs should be chosen accordingly.
  • The final trained model reached an accuracy of around 92% on the dataset with 6693 .wav files.
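To make the activation and loss functions named above concrete, here is a small NumPy sketch of ReLU, softmax, and sparse categorical cross-entropy. It re-implements the math for illustration only and is not the TensorFlow code used in the project; the function names and the all-zero example logits are assumptions.

```python
import numpy as np


def relu(x):
    """ReLU activation: negative values become zero."""
    return np.maximum(0.0, x)


def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability of the true class.

    "Sparse" means the labels are integer class indices (0..9 here),
    not one-hot vectors.
    """
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))


# Four samples, 10 genres. All-zero logits mimic a model with no preference.
logits = np.zeros((4, 10))
labels = np.array([0, 3, 7, 9])
probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))  # 2.3026, i.e. ln(10)
```

A model that assigns equal probability to all 10 genres scores ln(10) ≈ 2.30, which is a useful baseline: a training loss well below that value shows the network has learned something.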

Project architecture:

The basic project architecture is given below:


Preview




Instructions to run

  • Pre-requisites:

    Installing prerequisites using environment.yml

     conda env create -f environment.yml
     conda activate {Environment_Name}

    Installing prerequisites using requirements.txt

     pip install -r /path/to/requirements.txt
  • Directions to run Flask Server

     python app.py 

Contributors

Kruthi M

Shashwat Ganesh

Anushree Bajaj

Jahnavi Darbhamulla

Rayaan Faiz

License

Made with ❤️ by DS Community SRM