This project employs multiple LLM models to generate symbolic music. There is an end-to-end pipeline to train, evaluate and generate music with various models. The pipeline supports extensive logging information with WandB.
WandB Report on training.
Models:
- Llama
- Music Transformer
- GPT-2
Tokenisers:
- REMI
- TSD
- Structured
Datasets:
- Maestro Dataset 2.0.0
- Los Angeles MIDI Dataset 3.1
- Custom datasets
The framework allows you to seamlessly integrate your own custom models, tokenizers, and datasets, providing you with greater flexibility and control over your project.
- /scripts - project scripts
- evaluate.py - script to run generated compositions evaluation
- install_dependencies.sh - script for dependencies installation and pre-trained models loading
- requirements.txt - Python requirements list
- test.py - script to run test
- train.py - script to run train
- train_word2vec.py - script to train Word2Vec model for further use in evaluation
It is strongly recommended to use new virtual environment for this project. Project was developed with Python3.9, Ubuntu 22.04.2 LTS and CUDA 11.8.
To install all required dependencies and load pre-trained models run:
./install_dependencies.sh
To run train Music Transformer with REMI tokenizer and Los Angeles MIDI dataset:
python -m train -c scripts/configs/REMI/train_music_tranformer.json
To run test inference with Los Angeles MIDI dataset with 512 prompt tokens and generate 512 tokens:
python test.py \
-c scripts/configs/test_LAMD.json \
-r best_model/model_best.pth \
-o test_results_LAMD \
--prompt_length 512 \
--continue_length 512 \
--save_audio \
-b 1
You can specify the number of elements in dataset by changing parameter max_items in test_LAMD.json.
To test model on a custom dataset you need to put MIDI files in some directory. To run test with custom dataset in custom_dataset directory:
python test.py \
-c scripts/configs/test_custom.json \
-r best_model/model_best.pth \
-o test_results_custom \
--prompt_length 512 \
--continue_length 512 \
-b 1 \
-t custom_dataset/
To evaluate quality of generated compositions the following metrics are proposed:
- Pitch Class Distribution - distribution of used notes pitches
- Notes Duration Distribution - distribution of duration of used notes
- Harmonic Reduction - evaluated harmony reduction sequence
The evaluation script calculates the features of the prompt and the continuations of the original and generated compositions. It then calculates the difference between the features of the prompt and the continuations, resulting in two distributions of feature differences. The Kullback-Leibler divergence is employed to analyze these distributions. Histograms of distances distributions saved as well.
The script considers the KL divergence between the distributions of the first two features. For the third feature, the Word2Vec model was trained on the harmonic series from the test dataset. Embeddings were then calculated for the harmonic series of the prompt and continuation using the trained model, and cosine similarity was calculated.
To train Word2Vec for harmony reduction with Los Angeles MIDI dataset:
python train_word2vec.py \
-c scripts/configs/train_word2vec.json \
-o models/word2vec.model
To evaluate the KL divergence of the proposed features from results of test.py:
python evaluate.py \
-r test_results/results.json \
-m models/word2vec.model \
-o evaluation_results
Dmitrii Uspenskii HSE AMI 4th year. dauspenskiy@edu.hse.ru