# Compose & Embellish

Official PyTorch implementation of the paper:

- Shih-Lun Wu and Yi-Hsuan Yang  
  Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach  
  Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2023

Paper | Audio demo (Google Drive) | Model weights
- [24-07-29] Added a `transformers` GPT-2 implementation for the Embellish model, which does not depend on the `fast-transformers` package.
## Prerequisites

- Python 3.8 and CUDA 10.2 recommended
- Install dependencies

  ```bash
  pip install -r requirements.txt

  # Optional: only required if you are using the config `stage02_embellish/config/pop1k7_default.yaml`
  pip install git+https://github.com/cifkao/fast-transformers.git@39e726864d1a279c9719d33a95868a4ea2fb5ac5
  ```
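To sanity-check the environment before moving on, a minimal sketch like the following can help (illustrative only; it assumes `torch` was installed via `requirements.txt`, and `fast_transformers` only if you ran the optional install above):

```python
# Minimal environment check (illustrative; not part of the repo's scripts).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# fast_transformers is only needed for the Performer (pop1k7_default.yaml) configs.
try:
    import fast_transformers  # noqa: F401
    print("fast_transformers import OK")
except ImportError:
    print("fast_transformers not installed (fine if you only use the GPT-2 configs)")
```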
## Download Trained Models

Download the trained models from the HuggingFace Hub (make sure you're in the repository root directory):

```bash
git clone https://huggingface.co/slseanwu/compose-and-embellish-pop1k7
```
## Generate Pieces with Trained Models

- Stage 1: generate lead sheets (i.e., melody + chord progression)

  ```bash
  python3 stage01_compose/inference.py \
    stage01_compose/config/pop1k7_finetune.yaml \
    generation/stage01 \
    20
  ```

  You'll have 20 lead sheets under `generation/stage01` after this step.

- Stage 2: generate full performances conditioned on the Stage 1 lead sheets, using the GPT-2 backbone

  ```bash
  python3 stage02_embellish/inference_gpt2.py \
    stage02_embellish/config/pop1k7_gpt2.yaml \
    generation/stage01 \
    generation/stage02_gpt2
  ```

  The `samp_**_2stage_samp**.mid` files under `generation/stage02_gpt2` are the final results.

- (Optional) Use the Performer (from `fast-transformers`) backbone for Stage 2 instead

  ```bash
  python3 stage02_embellish/inference.py \
    stage02_embellish/config/pop1k7_default.yaml \
    generation/stage01 \
    generation/stage02_performer
  ```
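If you'd like a quick look at what came out of Stage 2, a small sketch like the one below lists the final MIDIs and prints basic statistics. This is illustrative only; it assumes the third-party `miditoolkit` package is available (`pip install miditoolkit`) and that you used the GPT-2 output folder from the command above.

```python
# Illustrative sketch (not part of the repo): summarize the final Stage 2 outputs.
from pathlib import Path

from miditoolkit.midi import parser as midi_parser  # assumes miditoolkit is installed

out_dir = Path("generation/stage02_gpt2")  # change if you used the Performer backbone
for midi_path in sorted(out_dir.glob("samp_*_2stage_samp*.mid")):
    midi_obj = midi_parser.MidiFile(str(midi_path))
    n_notes = sum(len(inst.notes) for inst in midi_obj.instruments)
    print(f"{midi_path.name}: {n_notes} notes, {midi_obj.max_tick} ticks")
```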
## Train the Models

- Stage 1: lead sheet (i.e., "Compose") model

  ```bash
  python3 stage01_compose/train.py stage01_compose/config/pop1k7_finetune.yaml
  ```

- Stage 2: performance (i.e., "Embellish") model w/ GPT-2 backbone

  ```bash
  python3 stage02_embellish/train.py stage02_embellish/config/pop1k7_gpt2.yaml
  ```

- (Optional) Use the Performer backbone instead, which allows a longer context window

  ```bash
  python3 stage02_embellish/train.py stage02_embellish/config/pop1k7_default.yaml
  ```
Note that these two training stages may be run in parallel.
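For example, here is a minimal sketch for launching both stages in parallel from the repository root (illustrative only; the one-GPU-per-stage assignment via `CUDA_VISIBLE_DEVICES` is an assumption, so adjust it to your hardware):

```python
# Illustrative sketch (not part of the repo): run both training stages side by side.
import os
import subprocess

stages = [
    ("0", ["python3", "stage01_compose/train.py", "stage01_compose/config/pop1k7_finetune.yaml"]),
    ("1", ["python3", "stage02_embellish/train.py", "stage02_embellish/config/pop1k7_gpt2.yaml"]),
]

procs = []
for gpu_id, cmd in stages:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)  # pin each stage to its own GPU
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()  # block until both runs finish
```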
## Use Your Own Datasets

If you'd like to experiment with your own datasets, we suggest that you
- read our dataloaders (stage 1, stage 2) and the `.pkl` files of our processed datasets (stage 1, stage 2) to understand what the models receive as inputs (see the inspection sketch after this list)
- refer to the CP Transformer repo for a general guide on converting audio/MIDI files to event-based representations
- use the musical structure analyzer to get the required structure markings for our stage 1 models.
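A quick way to peek inside one of the processed `.pkl` files (illustrative only; the file path is a placeholder, and no particular schema is assumed):

```python
# Illustrative sketch (not part of the repo): inspect a processed .pkl file.
import pickle

pkl_path = "path/to/one_processed_piece.pkl"  # placeholder; point this at a real file
with open(pkl_path, "rb") as f:
    data = pickle.load(f)

# Print the top-level structure without assuming a particular schema.
print(type(data))
if isinstance(data, dict):
    for key, value in data.items():
        size = f" (len={len(value)})" if hasattr(value, "__len__") else ""
        print(f"  {key}: {type(value).__name__}{size}")
```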
## Acknowledgements

We would like to thank the following people for their open-source implementations that paved the way for our work:
- Performer (fast-transformers): Angelos Katharopoulos (@angeloskath) and Ondřej Cífka (@cifkao)
- Transformer w/ relative positional encoding: Zhilin Yang (@kimiyoung)
- Musical structure analysis: Shuqi Dai (@Dsqvival)
- LakhMIDI melody identification: Thomas Melistas (@gulnazaki)
- Skyline melody extraction: Wen-Yi Hsiao (@wayne391) and Yi-Hui Chou (@sophia1488)
## Citation

If this repo helps with your research, please consider citing:

```bibtex
@inproceedings{wu2023compembellish,
  title={{Compose \& Embellish}: Well-Structured Piano Performance Generation via A Two-Stage Approach},
  author={Wu, Shih-Lun and Yang, Yi-Hsuan},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  url={https://arxiv.org/pdf/2209.08212.pdf}
}
```