This repository contains the code for a comparative analysis of different audio representation models.

crypto-code/Music-Representation-Comparison

Music-Representation-Comparison

Reproducibility

This repo uses the MagnaTagATune dataset to evaluate the performance of different music representation models on the downstream task of music tagging.

Dataset

The audio files for the MagnaTagATune dataset can be downloaded here. Extract the audio files into the audios directory inside the MTT folder. The directory structure should then look like this:

.               
├── MTT
│   ├── audios
│   │   ├── 0
│   │   ├── 1
│   │   └── ...
│   ├── magnatagatune.json
├── evaluate_clap.py
├── evaluate_mert.py
└── ...
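
Before running the evaluation scripts, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repo: the helper names `check_layout` and `count_audio_files` are hypothetical, the audio file extensions are an assumption, and `len(metadata)` assumes that `magnatagatune.json` holds a top-level list or object.

```python
import json
from pathlib import Path


def check_layout(root="."):
    """Return the expected paths that are missing under `root`."""
    root = Path(root)
    expected = [
        root / "MTT" / "audios",
        root / "MTT" / "magnatagatune.json",
    ]
    return [str(p) for p in expected if not p.exists()]


def count_audio_files(root=".", suffixes=(".mp3", ".wav")):
    """Count audio files in the MTT/audios subfolders (0, 1, ...).

    The suffixes are an assumption; adjust them to match the download.
    """
    audio_dir = Path(root) / "MTT" / "audios"
    return sum(
        1
        for p in audio_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    missing = check_layout(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        with open(Path("MTT") / "magnatagatune.json") as f:
            metadata = json.load(f)
        # Assumes the JSON top level is a list or an object.
        print(f"{count_audio_files('.')} audio files, {len(metadata)} metadata entries")
```

Run it from the repository root, next to `evaluate_clap.py` and `evaluate_mert.py`, so that the relative `MTT` path resolves.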

We use the same split as Jukebox.

Model Evaluation

We evaluate the following music representation models in this paper: ImageBind, JukeBox, OpenL3, CLAP, Wav2CLIP, and MERT.

Model Performance

The comparison of the models is shown below:

Model       MTT AUC   MTT AP
ImageBind   88.55%    40.19%
JukeBox     91.50%    41.40%
OpenL3      89.35%    42.88%
CLAP        70.04%    27.95%
Wav2CLIP    90.15%    49.12%
MERT        93.91%    59.57%
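
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for multi-label tagging. It assumes that AUC means ROC-AUC and AP means average precision, each computed per tag and then macro-averaged over tags; the evaluation scripts in this repo may compute them differently. The function names are hypothetical, and the code uses only the standard library for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one tag: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a tag has only one class
    # Tied scores count as half a correct pair. Quadratic in clip count.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision for one tag: mean of precision at each positive hit."""
    # Stable sort: clips with equal scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else None


def macro_average(metric, label_matrix, score_matrix):
    """Average a per-tag metric over tags (columns), skipping undefined tags."""
    values = []
    for tag in range(len(label_matrix[0])):
        y = [row[tag] for row in label_matrix]
        s = [row[tag] for row in score_matrix]
        value = metric(y, s)
        if value is not None:
            values.append(value)
    return sum(values) / len(values)
```

Here each row of `label_matrix` is one clip and each column one tag, with 1 marking that the tag applies; `score_matrix` holds the classifier's scores in the same shape. Call `macro_average(roc_auc, labels, scores)` and `macro_average(average_precision, labels, scores)` to get the two numbers.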
