Source code for "A Streamlined Encoder/Decoder Architecture for Melody Extraction".
Requires the following packages:
- python 3.6
- pytorch 0.4.1
- numpy
- scipy
- pysoundfile
- pandas
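
If the packages are not already installed, something like the following should work (a sketch; the exact pip package names and the PyTorch 0.4.1 install command may differ by platform, so check the official PyTorch instructions if this fails):

    pip install torch==0.4.1 numpy scipy pysoundfile pandas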
Melody extraction on an audio file. The output will be a .txt file of time (sec) and frequency (Hz). An example invocation follows the options below.
usage: predict_on_audio.py [-h] [-fp FILEPATH] [-t MODEL_TYPE]
                           [-gpu GPU_INDEX] [-o OUTPUT_DIR] [-e EVALUATE]
                           [-m MODE]
optional arguments:
  -h               show this help message and exit
  -fp filepath     Path to input audio (.wav) (default: train01.wav)
  -t model_type    Model type: vocal or melody (default: vocal)
  -gpu gpu_index   Assign a gpu index for processing.
                   It will run on cpu if None. (default: 0)
  -o output_dir    Path to output folder (default: ./output/)
  -e evaluate      Path to ground-truth (default: None)
  -m mode          The mode of CFP: std or fast (default: std)
                   fast mode: uses sr=22050 and hop=512 (faster)
                   std mode:  uses sr=native_sample_rate and hop=256 (more accurate)
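
For example, to extract the vocal melody from an audio file (paths are illustrative):

    python predict_on_audio.py -fp train01.wav -t vocal -o ./output/ -m std

Assuming two whitespace-separated columns of time and frequency, the output .txt looks roughly like this (values are hypothetical):

    0.023   0.000
    0.046   221.34
    0.070   220.82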
Evaluate our results on three datasets: ADC2004, MIREX05, and MedleyDB. The output will be a .csv file of evaluation metrics (mir_eval).
usage: evaluate.py [-h] [-dd DATA_DIR] [-t MODEL_TYPE] [-gpu GPU_INDEX]
[-o OUTPUT_DIR] [-ds DATASET]
optional arguments:
  -h               show this help message and exit
  -dd data_dir     Path to the dataset folder (default:
                   Dataset/MedleyDB/Source/)
  -t model_type    Model type: vocal or melody (default: vocal)
  -gpu gpu_index   Assign a gpu index for processing.
                   It will run on cpu if None. (default: 0)
  -o output_dir    Path to output folder (default: ./output/)
  -ds dataset      Dataset to evaluate (default: Mdb_vocal)
                   Must be one of: ADC2004, MIREX05, Mdb_vocal, Mdb_melody2
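
For example, to evaluate the vocal model on the MedleyDB vocal subset (paths are illustrative):

    python evaluate.py -dd Dataset/MedleyDB/Source/ -t vocal -ds Mdb_vocal -o ./output/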
Preparing data for training.
usage: data_arrangement.py [-h] [-df DATA_FOLDER] [-t MODEL_TYPE]
[-o OUTPUT_FOLDER]
optional arguments:
  -h, --help            show this help message and exit
  -df DATA_FOLDER, --data_folder DATA_FOLDER
                        Path to the dataset folder (default:
                        ./data/MedleyDB/Source/)
  -t MODEL_TYPE, --model_type MODEL_TYPE
                        Model type: vocal or melody (default: vocal)
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        Path to output folder (default: ./data/)
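
For example, to arrange the MedleyDB sources for vocal-model training (paths are illustrative):

    python data_arrangement.py -df ./data/MedleyDB/Source/ -t vocal -o ./data/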
Please prepare the h5py file with data_arrangement.py before training.
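
To sanity-check the prepared data, you can list what the h5py file contains. A minimal sketch, assuming the file sits under the default ./data/ folder; the filename train.h5 is a placeholder for whatever data_arrangement.py actually produces:

    import h5py

    # Open the prepared training file read-only and print every
    # group/dataset stored inside it (names, shapes, dtypes).
    with h5py.File('./data/train.h5', 'r') as f:
        f.visititems(lambda name, obj: print(name, obj))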
usage: training.py [-h] [-fp FILEPATH] [-t MODEL_TYPE] [-gpu GPU_INDEX]
[-o OUTPUT_DIR] [-ep EPOCH_NUM] [-lr LEARN_RATE]
[-bs BATCH_SIZE]
optional arguments:
  -h, --help            show this help message and exit
  -fp FILEPATH, --filepath FILEPATH
                        Path to input training data (h5py file) and validation
                        data (pickle file) (default: ./data/)
  -t MODEL_TYPE, --model_type MODEL_TYPE
                        Model type: vocal or melody (default: vocal)
  -gpu GPU_INDEX, --gpu_index GPU_INDEX
                        Assign a gpu index for processing. It will run on
                        cpu if None. (default: 0)
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Path to output folder (default: ./train/model/)
  -ep EPOCH_NUM, --epoch_num EPOCH_NUM
                        The number of epochs (default: 100)
  -lr LEARN_RATE, --learn_rate LEARN_RATE
                        The learning rate (default: 0.0001)
  -bs BATCH_SIZE, --batch_size BATCH_SIZE
                        The batch size (default: 50)
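
For example, to train the vocal model with the default hyperparameters (paths are illustrative):

    python training.py -fp ./data/ -t vocal -gpu 0 -ep 100 -lr 0.0001 -bs 50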