This project aims to classify music genres from album cover images. It uses deep learning, specifically convolutional neural networks (CNNs), trained on album cover images labeled with music genres taken from AllMusic genre tags. The dataset is derived from the AcousticBrainz Genre Dataset.
The project consists of three main components:
Data Retrieval:
- The script retrieves album cover images from the Cover Art Archive API based on MusicBrainz release group IDs (MBIDs) stored in a tab-separated values (TSV) file.
- It downloads the images, stores them locally, and saves the file paths to a CSV file.
- The TSV file can be found here
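The retrieval step can be sketched with the standard library alone. This is a minimal sketch, not the project's script. It assumes the TSV has a header row with a `releasegroupmbid` column (as in the AcousticBrainz Genre Dataset files), uses the Cover Art Archive `/release-group/<mbid>/front` endpoint, and saves every image as `<mbid>.jpg`. The function names and the output CSV column names are hypothetical.

```python
import csv
import logging
import os
import urllib.request

# Cover Art Archive endpoint for release groups; /front redirects to the front image.
CAA_RELEASE_GROUP = "https://coverartarchive.org/release-group"


def front_cover_url(mbid: str) -> str:
    """Build the front-cover URL for one release group MBID."""
    return f"{CAA_RELEASE_GROUP}/{mbid}/front"


def unique_release_group_ids(tsv_lines) -> list:
    """Read MBIDs from the assumed `releasegroupmbid` column, dropping duplicates in order."""
    seen = {}
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        seen.setdefault(row["releasegroupmbid"], None)
    return list(seen)


def download_covers(mbids, image_dir: str, csv_path: str) -> None:
    """Download each front cover and write (mbid, image_path) rows to csv_path."""
    os.makedirs(image_dir, exist_ok=True)
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["mbid", "image_path"])  # hypothetical column names
        for mbid in mbids:
            try:
                # urlopen follows the redirect to the actual image file.
                with urllib.request.urlopen(front_cover_url(mbid), timeout=30) as resp:
                    data = resp.read()
            except OSError as exc:  # URLError/HTTPError (e.g. 404) and timeouts
                logging.warning("No cover downloaded for %s: %s", mbid, exc)
                continue
            path = os.path.join(image_dir, f"{mbid}.jpg")  # assumes JPEG covers
            with open(path, "wb") as img:
                img.write(data)
            writer.writerow([mbid, path])
```

For example, open the TSV and pass the file object to `unique_release_group_ids`, then hand the result to `download_covers(ids, "data/images", "data/csv/mbid_to_image_filenames.csv")`. Release groups without cover art are logged and skipped rather than aborting the run.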
Data Processing:
- The script preprocesses the data by mapping genre labels to MBIDs and adjusting image file paths. It then filters the data to keep only the six most frequent genre classes.
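A sketch of this processing step follows. It assumes the first genre column (`genre1`) is used as each release group's label and that "adjusting image file paths" means re-rooting every path under a single image directory; both are assumptions, and `genre_by_mbid` and `build_dataset` are hypothetical names.

```python
import csv
import os
from collections import Counter

TOP_K = 6  # number of genre classes kept, as described above


def genre_by_mbid(tsv_lines) -> dict:
    """Map each release group MBID to the label in its assumed `genre1` column."""
    mapping = {}
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        # Several recordings can share a release group; keep the first label seen.
        mapping.setdefault(row["releasegroupmbid"], row["genre1"])
    return mapping


def build_dataset(image_rows, genres: dict, image_dir: str, k: int = TOP_K) -> list:
    """Return (image_path, genre) pairs limited to the k most frequent genres.

    image_rows yields (mbid, stored_path) pairs; each path is re-rooted under
    image_dir so the dataset can be moved without rewriting the CSV.
    """
    labelled = [
        (os.path.join(image_dir, os.path.basename(path)), genres[mbid])
        for mbid, path in image_rows
        if mbid in genres  # skip covers with no genre label
    ]
    counts = Counter(genre for _, genre in labelled)
    top = {genre for genre, _ in counts.most_common(k)}
    return [(path, genre) for path, genre in labelled if genre in top]
```

Counting genres over the covers that were actually downloaded, rather than over the whole TSV, keeps the "top 6" consistent with the images available for training.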
Model Training:
- The script trains a CNN model on the preprocessed dataset, using a pre-trained VGG16 model as the base and fine-tuning it on the album cover images.
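The fine-tuning setup can be sketched in Keras. The ImageNet weights, 224×224 input size, pooling head, dropout rate, learning rates, early stopping and the two-stage freeze/unfreeze schedule are all assumptions rather than the project's confirmed settings, and `label_index`, `build_model` and `training_callbacks` are hypothetical helpers.

```python
IMG_SIZE = (224, 224)  # VGG16's default input resolution (assumed here)


def label_index(genres) -> dict:
    """Map genre names to integer class ids in a stable, sorted order."""
    return {genre: i for i, genre in enumerate(sorted(set(genres)))}


def build_model(num_classes: int, fine_tune_last_block: bool = False):
    """VGG16 base plus a small softmax head; optionally unfreeze the last conv block."""
    # Imported here so label_index can be used without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,)
    )
    base.trainable = False  # stage 1: only the new head is trained
    if fine_tune_last_block:
        base.trainable = True
        for layer in base.layers:
            # stage 2: unfreeze only the block5_* layers
            layer.trainable = layer.name.startswith("block5")

    inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
    # RGB -> BGR and ImageNet channel-mean subtraction, as VGG16 expects.
    x = tf.keras.applications.vgg16.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        # A much smaller learning rate when pre-trained weights are being updated.
        optimizer=tf.keras.optimizers.Adam(1e-5 if fine_tune_last_block else 1e-3),
        loss="sparse_categorical_crossentropy",  # integer labels from label_index
        metrics=["accuracy"],
    )
    return model


def training_callbacks(checkpoint_path: str = "results/best_model.keras"):
    """Keep the best model by validation accuracy and stop once it stops improving."""
    import tensorflow as tf

    return [
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_accuracy", save_best_only=True
        ),
        tf.keras.callbacks.EarlyStopping(
            monitor="val_accuracy", patience=5, restore_best_weights=True
        ),
    ]
```

A typical run would call `model = build_model(num_classes=6)` and then `model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=training_callbacks())`, where `train_ds` and `val_ds` are datasets of (image, class id) pairs built from the processed CSV and the epoch count is an arbitrary choice. Training the head first and only then unfreezing `block5` is a common way to avoid destroying the pre-trained features with large early gradients.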
Installation:
- Clone the repository:
git clone
cd Genre_Recognition_Album_Cover
- Create and activate a conda environment (or use your preferred virtual environment):
conda create -n "myenv" python=3.10
conda activate myenv
- Install the required dependencies:
pip install -r requirements.txt
Usage:
- Run the data retrieval and preprocessing scripts:
python <tsv-filename>
- Train the model:
python <directory-path>
Replace <tsv-filename> with the path to the TSV file containing MBIDs, and <directory-path> with the path to the directory containing the dataset (./ in this case).
Running on a Slurm Cluster:
- Archive the dataset (the contents of data/):
$ tar cf mydataset.tar data/*
- Load modules required by TensorFlow:
[name@server ~]$ module load python/3.10 cuda/12.2 cudnn/
- Create a new Python virtual environment:
[name@server ~]$ virtualenv --no-download tensorflow
- Activate Python virtual environment:
[name@server ~]$ source tensorflow/bin/activate
- Install TensorFlow:
(tensorflow) [name@server ~]$ pip install --no-index tensorflow
- Install the requirements from requirementsvenv.txt:
pip install -r requirementsvenv.txt --no-index
- Submit a job with the supplied bash script using the following command (this example is for the Cedar cluster):
sbatch --gres=gpu:1 --cpus-per-task=6 --mem=32000M --time=6:00:00
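Archiving the dataset is typically done so the job can copy one file to the compute node instead of thousands of small images. The sketch below assumes the job script copies `mydataset.tar` into `$SLURM_TMPDIR` (node-local storage on Alliance clusters such as Cedar) and that the training code reads `SLURM_CPUS_PER_TASK` to size its data pipeline; neither is confirmed by the project, and the function names are hypothetical.

```python
import os
import tarfile
import tempfile
from typing import Optional


def local_scratch() -> str:
    """Node-local scratch space on the cluster, or the system temp dir elsewhere."""
    return os.environ.get("SLURM_TMPDIR", tempfile.gettempdir())


def cpu_workers(default: int = 1) -> int:
    """CPUs granted by --cpus-per-task (6 in the sbatch example), e.g. for data loading."""
    return int(os.environ.get("SLURM_CPUS_PER_TASK", default))


def extract_dataset(archive: str = "mydataset.tar", dest: Optional[str] = None) -> str:
    """Unpack the dataset archive into scratch space and return the new data/ root."""
    dest = dest or local_scratch()
    with tarfile.open(archive) as tar:
        # Only extract archives you created yourself (tar cf mydataset.tar data/*).
        tar.extractall(dest)
    return os.path.join(dest, "data")
```

The returned path can then be used as the dataset directory for training, and `cpu_workers()` as the parallelism for image decoding, so the job uses the six CPUs it requested.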
Directory Structure:
├── data
│ ├── csv
│ │ ├── final_top_6.csv
│ │ ├── mbid_to_image_filenames.csv
│ │ └── output_interrupted.csv
│ └── images
│ ├── <image-files>
├── logs
│ └── art_retrieval.log
├── results
│ ├── best_model.keras
│ ├── confusion_matrix.png
│ ├── results.txt
│ ├── training_validation_accuracy.png
│ └── training_validation_loss.png
Results:
The trained model is saved in the results directory along with evaluation metrics such as the confusion matrix and classification report.
- best_model.keras
- confusion_matrix.png
- results.txt, which contains:
  - test loss, test accuracy, classification report (precision, recall, and F1 score for each class), and confusion matrix
- training_validation_accuracy.png: training and validation accuracy over each epoch of training
- training_validation_loss.png: training and validation loss over each epoch of training
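The per-class numbers in results.txt follow the usual definitions: precision is the fraction of predictions for a class that are correct, recall is the fraction of a class's true members that are found, and F1 is their harmonic mean. The standard-library sketch below derives them from a confusion matrix purely to illustrate these definitions; it is not the project's evaluation code, which may rely on a library instead.

```python
def confusion_matrix(y_true, y_pred, n_classes: int) -> list:
    """cm[i][j] counts test samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(map(sum, cm))
    return sum(cm[k][k] for k in range(len(cm))) / total if total else 0.0


def per_class_report(cm) -> list:
    """(precision, recall, f1) for each class, read off the confusion matrix."""
    report = []
    for k in range(len(cm)):
        tp = cm[k][k]
        predicted_k = sum(row[k] for row in cm)  # column k: everything predicted as k
        actual_k = sum(cm[k])  # row k: everything that truly is k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report.append((precision, recall, f1))
    return report
```

For example, with true labels `[0, 0, 1, 1, 2]` and predictions `[0, 1, 1, 1, 0]`, class 1 gets precision 2/3 (one of its three predictions is wrong) but recall 1.0 (both true members are found), giving an F1 of 0.8.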
This project utilizes data from the Cover Art Archive and the MusicBrainz database.
Bogdanov, D., Porter, A., Schreiber, H., Urbano, J., & Oramas, S. (2019). The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR 2019).