- This is the official repository of A Benchmarking Initiative for Audio-domain Music Generation Using the FreeSound Loop Dataset, co-authored with Paul Chen, Arthur Yeh, and my supervisor Yi-Hsuan Yang. The paper has been accepted by the International Society for Music Information Retrieval Conference (ISMIR) 2021. [Demo Page], [arxiv].
- We provide not only pretrained models so you can generate loops on your own, but also scripts to evaluate the generated loops.
$ conda env create -f environment.yml
- Generate loops from the one-bar looperman pretrained model
$ gdown --id 1GQpzWz9ycIm5wzkxLsVr-zN17GWD3_6K -O looperman_one_bar_checkpoint.pt
$ bash scripts/generate_looperman_one_bar.sh
- Generate loops from the four-bar looperman pretrained model
$ gdown --id 19rk3vx7XM4dultTF1tN4srCpdya7uxBV -O looperman_four_bar_checkpoint.pt
$ bash scripts/generate_looperman_four_bar.sh
- Generate loops from the freesound pretrained model
$ gdown --id 197DMCOASEMFBVi8GMahHfRwgJ0bhcUND -O freesound_checkpoint.pt
$ bash scripts/generate_freesound.sh
- Looperman pretrained one-bar model
- Looperman pretrained four-bar model
- Freesound pretrained one-bar model
$ gdown --id 1fQfSZgD9uWbCdID4SzVqNGhsYNXOAbK5
$ unzip freesound_mel_80_320.zip
$ CUDA_VISIBLE_DEVICES=2 python train_drum.py \
--size 64 --batch 8 --sample_dir freesound_sample_dir \
--checkpoint_dir freesound_checkpoint \
--iter 100000 \
mel_80_320
$ CUDA_VISIBLE_DEVICES=2 python generate_audio.py \
--ckpt freesound_checkpoint/100000.pt \
--pics 2000 --data_path "./data/freesound" \
--store_path "./generated_freesound_one_bar"
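Under the hood, generate_audio.py samples latent vectors and decodes them into mel-spectrograms before vocoding. The following is a minimal sketch of that loop, assuming the rosinality-style StyleGAN2 Generator class and the g_ema checkpoint key; the actual script may differ in constructor arguments and shapes.

import torch
from model import Generator  # rosinality-style StyleGAN2 generator

# Load the EMA generator weights from the downloaded checkpoint.
ckpt = torch.load("freesound_checkpoint/100000.pt", map_location="cpu")
g_ema = Generator(size=64, style_dim=512, n_mlp=8)  # arguments are assumptions
g_ema.load_state_dict(ckpt["g_ema"])
g_ema.eval()

with torch.no_grad():
    z = torch.randn(16, 512)   # 16 random latent codes
    mels, _ = g_ema([z])       # generated mel-spectrograms

# The script then de-normalizes the mel-spectrograms with the dataset
# mean/std (from --data_path) and passes them to the MelGAN vocoder.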
- 2000 looperman mel-spectrogram link
$ cd evaluation/NDB_JS
$ gdown --id 1aFGPYlkkAysVBWp9VacHVk2tf-b4rLIh
$ unzip looper_2000.zip # contains 2000 looperman mel-spectrograms
$ rm looper_2000.zip
$ bash compute_ndb_js.sh
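For reference, the NDB/JS metric bins the real mel-spectrograms with K-means, histograms the real and generated sets over those bins, counts the bins whose proportions differ significantly (NDB), and takes the Jensen-Shannon divergence between the two histograms. A rough sketch of the computation, with the file names and K as placeholders:

import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

K = 50  # number of bins; a placeholder value
real = np.load("looper_2000.npy").reshape(2000, -1)  # hypothetical stacked mels
fake = np.load("generated.npy").reshape(2000, -1)

# Bin the real data, then histogram both sets over the learned bins.
kmeans = KMeans(n_clusters=K).fit(real)
p = np.bincount(kmeans.labels_, minlength=K) / len(real)
q = np.bincount(kmeans.predict(fake), minlength=K) / len(fake)

# Two-proportion z-test per bin; a bin "differs" when |z| > 1.96 (alpha = 0.05).
n1, n2 = len(real), len(fake)
pooled = (p * n1 + q * n2) / (n1 + n2)
se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = np.divide(p - q, se, out=np.zeros_like(se), where=se > 0)
ndb = int(np.sum(np.abs(z) > 1.96))

js = jensenshannon(p, q) ** 2  # squared distance = JS divergence
print(f"NDB/K = {ndb}/{K}, JS = {js:.4f}")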
- Short-Chunk CNN checkpoint
$ cd evaluation/IS
$ bash compute_is_score.sh
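The Inception Score is computed from the classifier's softmax outputs over the generated clips as IS = exp(E_x[KL(p(y|x) || p(y))]); here the Short-Chunk CNN plays the role of the Inception network. A minimal numpy sketch, with random probabilities standing in for the model's predictions:

import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (n_samples, n_classes) softmax outputs of the classifier."""
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

probs = np.random.dirichlet(np.ones(10), size=2000)  # placeholder predictions
print(inception_score(probs))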
- FAD looperman ground truth link; follow the official doc to install the required packages.
$ ls --color=never generated_freesound_one_bar/100000/*.wav > freesound.csv
$ python -m frechet_audio_distance.create_embeddings_main --input_files freesound.csv --stats freesound.stats
$ python -m frechet_audio_distance.compute_fad --background_stats ./evaluation/FAD/looperman_2000.stats --test_stats freesound.stats
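FAD fits a multivariate Gaussian to the VGGish embeddings of each audio set and reports the Fréchet distance between the two Gaussians: FAD = ||mu_b - mu_t||^2 + Tr(S_b + S_t - 2(S_b S_t)^(1/2)). The commands above compute this through the official package; a sketch of the underlying formula, with random embeddings as placeholders:

import numpy as np
from scipy import linalg

def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

a = np.random.randn(2000, 128)  # placeholder background embeddings
b = np.random.randn(2000, 128)  # placeholder test embeddings
print(frechet_distance(a, b))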
Go to the preprocess directory, modify the settings (e.g., the data path) in the scripts, and run them in the following order:
$ python trim_2_seconds.py    # Cut each loop into one-bar segments and stretch them to 2 seconds.
$ python extract_mel.py       # Extract mel-spectrograms from the 2-second audio.
$ python make_dataset.py      # Assemble the extracted mel-spectrograms into the training dataset.
$ python compute_mean_std.py  # Compute the mean and standard deviation of the dataset.
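For orientation, the preprocessing boils down to stretching each one-bar loop to 2 seconds and saving a fixed-size log-mel-spectrogram. A condensed sketch, assuming 44.1 kHz audio, 80 mel bins, a hop size of 256, and padding/cropping to 320 frames (the mel_80_320 layout); the exact parameters live in the scripts:

import librosa
import numpy as np

SR, N_MELS, N_FRAMES, HOP = 44100, 80, 320, 256  # assumed settings

y, _ = librosa.load("loop.wav", sr=SR)           # a one-bar loop
rate = len(y) / (2.0 * SR)                       # stretch factor to reach 2 s
y = librosa.effects.time_stretch(y, rate=rate)

mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=N_MELS, hop_length=HOP)
mel = np.log(mel + 1e-9)                         # log-mel

# Pad or crop the time axis to a fixed 320 frames.
if mel.shape[1] >= N_FRAMES:
    mel = mel[:, :N_FRAMES]
else:
    mel = np.pad(mel, ((0, 0), (0, N_FRAMES - mel.shape[1])))

np.save("mel_80_320/loop.npy", mel.astype(np.float32))  # hypothetical output path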
$ CUDA_VISIBLE_DEVICES=2 python train_drum.py \
--size 64 --batch 8 --sample_dir [sample_dir] \
--checkpoint_dir [checkpoint_dir] \
[mel-spectrogram dataset from the preprocessing]
- checkpoint_dir stores the model checkpoints in the designated directory.
- sample_dir stores the mel-spectrograms generated by the model during training.
- The mel-spectrogram dataset from the preprocessing step is given as the final positional argument.
- There is an example training script.
We use MelGAN as the vocoder. We trained the vocoder on the looperman dataset and use it to synthesize audio for both the freesound and looperman models. The trained vocoder is in the melgan directory.
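A minimal sketch of the mel-to-audio step, assuming the trained generator has been exported for inference; the checkpoint name and loading code below are assumptions, so consult the files in the melgan directory for the real entry point:

import numpy as np
import torch

mel = np.load("sample_dir/000000.npy")             # hypothetical (80, 320) mel
mel = torch.from_numpy(mel).float().unsqueeze(0)   # (1, 80, 320) batch

vocoder = torch.jit.load("melgan/generator.pt").eval()  # hypothetical export
with torch.no_grad():
    audio = vocoder(mel).squeeze().cpu().numpy()   # 1-D waveform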
The code borrows heavily from the repositories below:
- StyleGAN2 from rosinality
- Official MelGAN repo
- Official UNAGAN repo from ciaua.
- Official Short Chunk CNN repo
- FAD official document
If you find this repo useful, please kindly cite with the following information.
@inproceedings{allenloopgen,
  title={A Benchmarking Initiative for Audio-domain Music Generation using the {FreeSound Loop Dataset}},
  author={Tun-Min Hung and Bo-Yu Chen and Yen-Tung Yeh and Yi-Hsuan Yang},
  booktitle={Proc. Int. Society for Music Information Retrieval Conf.},
  year={2021},
}