Create GloVe embeddings from monophonic MIDI tracks. You can make your own embeddings or use our pre-trained embeddings created from 263,438 monophonic MIDI tracks from the Lakh MIDI Dataset. You can download the files we used to create our pre-trained embeddings here (139MB).
Pre-trained embeddings are included for all monophonic tracks (including monophonic tracks pulled from polyphonic MIDI files) from the Lakh MIDI Dataset (LMD). We've included both csv and binary representations of these embeddings in the following dimensions:
- 10-D (bin | csv)
- 25-D (bin | csv)
- 50-D (bin | csv)
- 100-D (bin | csv)
- 200-D (bin | csv)
- 300-D (bin | csv)
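As a rough sketch of how one of the csv files might be used, the snippet below parses rows into a dictionary and compares two notes by cosine similarity. The row layout (a token followed by its vector values, comma-separated, with no header row) is an assumption and has not been checked against the actual files. The function names and the sample rows are made up for illustration.

```python
import math


def parse_embeddings(lines, delimiter=","):
    """Parse rows shaped like token,v1,v2,... into {token: [floats]}.

    The row layout is an assumption; adjust the delimiter (or skip a
    header row) if the real files are laid out differently.
    """
    vectors = {}
    for line in lines:
        fields = line.strip().split(delimiter)
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up 2-D rows standing in for a real embedding file.
sample_rows = ["60,0.1,0.3", "64,0.2,0.6", "START_TRACK,-0.5,0.1"]
embeddings = parse_embeddings(sample_rows)
print(cosine_similarity(embeddings["60"], embeddings["64"]))
```

To read a real file, pass an open file handle in place of sample_rows, since parse_embeddings accepts any iterable of lines.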
You can find our pre-trained embeddings in data/pre-trained-embeddings. Below you will find the training loss for our pre-trained embeddings of each dimension.
Embeddings were created from a 159MB file that contained all note sequences from the monophonic instrument tracks in LMD, separated by an additional START_TRACK token. Note values were expressed as general MIDI note values 0-127, and no attempt was made to represent timing/rhythm information. See notebooks/process_midi_for_glove.ipynb to see how MIDI data was prepared before being processed by GloVe.
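The corpus format described above can be sketched in a few lines. This is not the code from the notebook. It assumes the START_TRACK token is written before each track and that tokens are whitespace-separated, which is how GloVe tokenizes its input. The function name tracks_to_text is hypothetical.

```python
START_TRACK = "START_TRACK"


def tracks_to_text(tracks):
    """Flatten note sequences into one whitespace-separated token string.

    `tracks` is a list of lists of MIDI note numbers (0-127). Placing
    START_TRACK before each track is an assumption about the layout.
    """
    tokens = []
    for notes in tracks:
        tokens.append(START_TRACK)
        for note in notes:
            if not 0 <= note <= 127:
                raise ValueError(f"not a valid MIDI note value: {note}")
            tokens.append(str(note))
    return " ".join(tokens)


# Two short made-up tracks.
print(tracks_to_text([[60, 62, 64], [67, 65]]))
```

Because timing is discarded, a held note and a short note of the same pitch produce the same token.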
# clone this repo
git clone https://github.com/brangerbriz/midi-glove.git
cd midi-glove
# initialize and clone the GloVe repo
git submodule init
git submodule update
# build GloVe
cd GloVe
make
# return to midi-glove project root
cd ../
pip install -r requirements.txt
Place a folder of MIDI files you would like to use to create the embeddings inside of data/. Next, start a Jupyter notebook server.
jupyter notebook
Open notebooks/process_midi_for_glove.ipynb and change the value of the midi_dir variable to point to the folder containing your MIDI files. Run the notebook. Processing can take a while (263,438 tracks took ~36 min on my 4GHz CPU). When processing completes, it will output a file to data/notes.txt. This will be used as the input to GloVe.
Alternatively, you can copy the contents of the Jupyter notebook into a Python script and run it without Jupyter if you prefer.
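For readers writing their own processing script, here is a minimal sketch of one way to decide whether a track is monophonic and to reduce it to a pitch sequence. It is not taken from the notebook. The (start, end, pitch) tuple layout, the no-overlap definition of monophonic, and both function names are assumptions for illustration. In practice the note tuples would come from whatever MIDI parsing library you use.

```python
def is_monophonic(notes):
    """Return True if no two notes sound at the same time.

    `notes` is a list of (start, end, pitch) tuples with end >= start.
    A note that starts exactly when the previous one ends is allowed.
    """
    ordered = sorted(notes, key=lambda n: (n[0], n[1]))
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] < prev[1]:  # next note begins before the previous ends
            return False
    return True


def pitch_sequence(notes):
    """Drop timing and return pitches ordered by start time."""
    return [pitch for _, _, pitch in sorted(notes, key=lambda n: n[0])]


# Made-up example tracks.
melody = [(0.0, 0.5, 60), (0.5, 1.0, 62), (1.0, 1.5, 64)]
chord = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 67)]
print(is_monophonic(melody), is_monophonic(chord))
print(pitch_sequence(melody))
```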
To create the GloVe embeddings, run:
./create_embeddings.sh
This script will save the embeddings to data/embeddings by default, using these settings:
WORD_FILE=data/notes.txt
OUT_DIR=data/embeddings
MIN_COUNT=5
WINDOW_SIZE=15
DIMENSIONS=( 10 25 50 100 200 300 )
ITERATIONS=200
Edit create_embeddings.sh to change the default settings.
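To give a feel for what MIN_COUNT and WINDOW_SIZE control, the sketch below builds a small vocabulary and a weighted co-occurrence table in pure Python. It is a simplified illustration, not GloVe's implementation. It assumes a symmetric window, 1/distance weighting, and that tokens below the minimum count are dropped before windowing. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_vocab(tokens, min_count):
    """Keep tokens that occur at least `min_count` times."""
    counts = Counter(tokens)
    return {tok for tok, c in counts.items() if c >= min_count}


def cooccurrence(tokens, vocab, window_size):
    """Symmetric co-occurrence counts, weighted by 1 / distance."""
    kept = [t for t in tokens if t in vocab]  # drop out-of-vocab tokens
    counts = defaultdict(float)
    for i, word in enumerate(kept):
        for distance in range(1, window_size + 1):
            if i - distance < 0:
                break  # ran off the start of the sequence
            context = kept[i - distance]
            counts[(word, context)] += 1.0 / distance
            counts[(context, word)] += 1.0 / distance
    return counts


# Made-up token stream; real input would be the contents of data/notes.txt.
tokens = ["60", "62", "60", "64"]
vocab = build_vocab(tokens, min_count=1)
table = cooccurrence(tokens, vocab, window_size=2)
print(table[("60", "62")], table[("64", "62")])
```

A larger window lets notes that are further apart in a melody influence each other's vectors, at a lower weight the further apart they are.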
To plot the training loss of the GloVe algorithm, save the output of create_embeddings.sh:
./create_embeddings.sh &> data/embeddings/train.out
Use notebooks/plot_glove_training_loss.ipynb to load, parse, and graph the loss from this saved file.
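If you would rather parse the saved output yourself, the following sketch pulls the cost values out with a regular expression. It is not the notebook's code. It assumes GloVe reports progress in lines containing "iter: NNN, cost: X", and that a new training run (one per dimension) can be detected by the iteration counter restarting. The sample lines and their values are made up.

```python
import re

# Assumed shape of a GloVe progress line: "..., iter: 001, cost: 0.071222"
COST_LINE = re.compile(r"iter:\s*(\d+),\s*cost:\s*([-+0-9.eE]+)")


def parse_training_loss(lines):
    """Return one list of cost values per training run found in `lines`."""
    runs = []
    last_iter = None
    for line in lines:
        match = COST_LINE.search(line)
        if not match:
            continue  # ignore everything that is not a progress line
        iteration, cost = int(match.group(1)), float(match.group(2))
        if last_iter is None or iteration <= last_iter:
            runs.append([])  # counter restarted, so a new run began
        runs[-1].append(cost)
        last_iter = iteration
    return runs


# Made-up log excerpt covering two runs.
sample_log = [
    "vector size: 10",
    "07/01/18 - 10:00.00AM, iter: 001, cost: 0.09",
    "07/01/18 - 10:00.05AM, iter: 002, cost: 0.07",
    "vector size: 25",
    "07/01/18 - 10:05.00AM, iter: 001, cost: 0.08",
]
print(parse_training_loss(sample_log))
```

With the default settings, the saved file would hold one run per entry in DIMENSIONS, so the returned list can be plotted as one curve per dimension.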
Colin Raffel. "Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching". PhD Thesis, 2016.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. "GloVe: Global Vectors for Word Representation". 2014.