This repository is for the paper ReconVAT: A Semi-Supervised Automatic Music Transcription Framework Towards Real-World Applications.
Demo page is available at: https://kinwaicheuk.github.io/ReconVAT/
Supplementary Materials: TODO
You can install all the required libraries at once via pip install -r requirements.txt.
For your convenience, we have provided 3 example audio clips for you to try our models out. To transcribe your own music, you first need to downsample it to 16kHz and save it in FLAC format. Then put your audio clips in the folder Application/Input and run the following command:
python transcribe_files.py with model_type=<arg> device=<arg>
model_type: the model used to transcribe your music. Either ReconVAT or baseline_Multi_Inst. Default is ReconVAT.
device: the device to run the transcription on. Either cpu or cuda:0. Default is cuda:0.
You might also need to install ffmpeg in order to do audio downsampling. On macOS:
brew install ffmpeg
On Linux:
apt-get install ffmpeg
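The downsampling step described above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names build_ffmpeg_command and downsample are hypothetical, and mixing down to mono (-ac 1) is an assumption that you may want to drop. It only relies on ffmpeg's standard -i, -ar, -ac and -y options.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst_dir="Application/Input", sample_rate=16000):
    """Return the ffmpeg argument list that resamples `src` into a FLAC file."""
    src = Path(src)
    dst = Path(dst_dir) / (src.stem + ".flac")  # FLAC is chosen by the extension
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input clip in any format ffmpeg can read
        "-ar", str(sample_rate),   # target sample rate in Hz (16kHz here)
        "-ac", "1",                # assumption: mix down to a single channel
        str(dst),
    ]


def downsample(src, dst_dir="Application/Input", sample_rate=16000):
    """Run ffmpeg on one clip; raises CalledProcessError if ffmpeg fails."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(src, dst_dir, sample_rate), check=True)
```

For example, downsample("my_song.wav") would write Application/Input/my_song.flac, which transcribe_files.py can then pick up.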
MAPS dataset (as labelled dataset in our experiments): download
MAESTRO (we use v2.0.0 as our unlabelled dataset in our experiments): download
MusicNet dataset (for training strings and woodwinds): download
After downloading these datasets, unzip them into their respective folders: MAPS, MAESTRO, and MusicNet.
Our model takes 16kHz audio as input, so all the audio clips need to be downsampled first. It also takes tsv files as labels, so the MIDI files need to be converted into tsv files.
These preprocessing functions can be found in the Jupyter notebook Preprocessing.ipynb.
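To give a rough idea of what a tsv label file might contain, here is a small sketch that serialises note events as tab-separated rows. It is not the code from Preprocessing.ipynb: the function name notes_to_tsv and the column layout (onset, offset, note, velocity) are assumptions made for illustration, and parsing the MIDI file into note tuples is left out. Please check the notebook for the format the Dataset classes actually expect.

```python
import csv
import io


def notes_to_tsv(notes):
    """Serialise (onset_sec, offset_sec, pitch, velocity) tuples as TSV text.

    The column names and their order are assumptions, not the repository's
    confirmed label format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "offset", "note", "velocity"])  # assumed header
    for onset, offset, pitch, velocity in sorted(notes):      # sort by onset time
        writer.writerow([f"{onset:.3f}", f"{offset:.3f}", pitch, velocity])
    return buffer.getvalue()


# Two hypothetical notes: E4 at 0.0 s and middle C at 0.5 s.
example = notes_to_tsv([(0.5, 1.0, 60, 80), (0.0, 0.25, 64, 100)])
print(example)
```

The returned string can be written to disk with open(path, "w").write(...), one tsv file per audio clip.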
When the datasets are ready, the PyTorch Dataset class should be able to load them without errors.
The Python scripts can be run using the sacred syntax, passing the arguments after the keyword with.
UNet_VAT model:
python train_UNet_VAT.py with train_on=<arg> small=<arg> VAT=<arg> reconstruction=<arg> device=<arg>
Unet_VAT with the onset module:
python train_UNet_Onset_VAT.py with train_on=<arg> small=<arg> VAT=<arg> reconstruction=<arg> device=<arg>
Baseline model Multi-instrument:
python train_baseline_Multi_Inst.py with train_on=<arg> small=<arg> device=<arg>
Onsets and Frames: (VAT can be activated in this baseline model, but according to our experiments, VAT does not work with this baseline model)
python train_baseline_onset_frame_VAT.py with train_on=<arg> small=<arg> device=<arg>
The following two baseline models require a huge amount of GPU memory:
Thickstun:
python train_baseline_Thickstun.py with train_on=<arg> small=<arg> device=<arg>
Prestack:
python train_baseline_Prestack.py with train_on=<arg> small=<arg> device=<arg>
train_on: the dataset to be trained on. Either MAPS, String, or Wind.
small: activate the small version of MAPS. True or False.
supersmall: activate the one-shot version of MAPS. The small argument has to be True in order for this argument to be useful.
reconstruction: whether to include the reconstruction loss. Either True or False.
VAT: whether to use the VAT module. True or False.
device: the device to be trained on. Either cpu or cuda:0.
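When launching several training runs with different settings, it can help to assemble the command lines programmatically. The sketch below is only an illustration of the python <script> with key=value pattern shown above; build_sacred_command is a hypothetical helper and is not provided by this repository or by sacred.

```python
import shlex
import sys


def build_sacred_command(script, **overrides):
    """Return the argument list for `python <script> with key=value ...`."""
    command = [sys.executable, script]
    if overrides:
        command.append("with")  # sacred reads config updates after this keyword
        command.extend(f"{key}={value}" for key, value in overrides.items())
    return command


# Example: the UNet_VAT script on the small version of MAPS.
command = build_sacred_command(
    "train_UNet_VAT.py",
    train_on="MAPS",
    small=True,
    VAT=True,
    reconstruction=True,
    device="cuda:0",
)
print(shlex.join(command))  # shows the command as it would be typed in a shell
```

Passing the resulting list to subprocess.run(command, check=True) would start the training run from the repository folder.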
python evaluate.py with weight_file=<arg> reconstruction=<arg> device=<arg>
weight_file: the weight file to evaluate. The weight files should be located inside the trained_weight folder.
dataset: which dataset to evaluate on. Either MAPS, MAESTRO, or MusicNet.
device: the device to run the evaluation on. Either cpu or cuda:0.
The transcribed MIDI files and the accuracy reports are saved inside the results folder.
Mac users need to add the following code to helper_functions.py:
import matplotlib
matplotlib.use('TkAgg')