This is the official repository for the paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
If you run into any problems running the code, please open an issue and I will help as much as I can.
To promote transparency and reproducibility in research, I retrained a similar model after the internship using only publicly available datasets. It follows the same methodology described in the paper.
**Note that this is NOT the official checkpoint and has NO relation to Sony. Its performance is similar to that of the official checkpoint.**
https://huggingface.co/ldzhangyx/instruct-MusicGen/blob/main/finetuned.ckpt
https://bit.ly/instruct-musicgen
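The checkpoint link above points at the Hugging Face web viewer (the `/blob/` path). As a minimal sketch, assuming the usual Hugging Face URL layout in which replacing `/blob/` with `/resolve/` yields the raw file, the snippet below derives a direct download URL and fetches the file with the standard library. The helper names `to_direct_url` and `download_checkpoint` and the `checkpoints` output directory are illustrative and not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Checkpoint page URL exactly as listed in this README.
CKPT_PAGE_URL = "https://huggingface.co/ldzhangyx/instruct-MusicGen/blob/main/finetuned.ckpt"


def to_direct_url(page_url: str) -> str:
    """Turn a Hugging Face '/blob/' viewer URL into a '/resolve/' download URL."""
    # Replace only the first occurrence so file names containing 'blob' stay untouched.
    return page_url.replace("/blob/", "/resolve/", 1)


def download_checkpoint(page_url: str = CKPT_PAGE_URL, out_dir: str = "checkpoints") -> Path:
    """Download the checkpoint into out_dir (a hypothetical location) and return its path."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / page_url.rsplit("/", 1)[-1]
    if not target.exists():  # skip the download if the file is already present
        urlretrieve(to_direct_url(page_url), target)
    return target
```

Calling `download_checkpoint()` would save the file as `checkpoints/finetuned.ckpt`. Where the training and inference scripts expect the checkpoint is not stated here, so adjust the path to match your configuration.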
# clone project
git clone https://github.com/ldzhangyx/instruct-MusicGen/
cd instruct-MusicGen
# [OPTIONAL] create conda environment
conda create -n myenv python=3.11.7
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
# clone project
git clone https://github.com/ldzhangyx/instruct-MusicGen/
cd instruct-MusicGen
# create conda environment and install dependencies
conda env create -f environment.yaml -n myenv
# activate conda environment
conda activate myenv
Train the model with the default configuration:
# train on CPU
python src/train.py trainer=cpu
# train on GPU
python src/train.py trainer=gpu
You may need to change essential parameters in config/config.yaml to fit your own dataset.
You can override any parameter from the command line, like this:
python src/train.py trainer.max_epochs=50 data.batch_size=4
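The override syntax above uses dotted keys to reach into the nested configuration. It looks like Hydra-style overrides, though that is an assumption, since this README does not name the configuration library. The sketch below imitates the behaviour with plain dictionaries to show how `trainer.max_epochs=50` maps onto a nested config. It is an illustration only, not the library's implementation, and the default values in `config` are made up.

```python
import ast
from copy import deepcopy


def parse_value(raw: str):
    """Interpret '50' as int, '1e-4' as float, 'True' as bool; leave other text as str."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with 'a.b.c=value' overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing groups on the way down
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults; the real values live in the repository's config files.
config = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 8}}
updated = apply_overrides(config, ["trainer.max_epochs=50", "data.batch_size=4"])
print(updated)  # {'trainer': {'max_epochs': 50}, 'data': {'batch_size': 4}}
```

Overrides given this way apply only to the current run; the original `config` is left unchanged, in the same way that command-line overrides do not edit the YAML files on disk.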
python src/data/slakh_datamodule.py
For the add, remove, and extract operations, please change the parameters in both test_step() in src/models/instructmusicgenadapter_module.py and __getitem__() in src/data/slakh_datamodule.py.
Currently this must be done manually, but we will provide a script to automate the process soon.
python src/eval.py
Please make sure the generated music files are in the corresponding locations.
python evaluation/utils.py # to generate a csv file for CLAP calculation
python evaluation/main.py
After preparing the checkpoint and the input audio file, you can generate audio via
python src/inference.py
@article{zhang2024instruct,
title={Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning},
author={Zhang, Yixiao and Ikemiya, Yukara and Choi, Woosung and Murata, Naoki and Mart{\'\i}nez-Ram{\'\i}rez, Marco A and Lin, Liwei and Xia, Gus and Liao, Wei-Hsiang and Mitsufuji, Yuki and Dixon, Simon},
journal={arXiv preprint arXiv:2405.18386},
year={2024}
}