GitHub - keums/MelodyExtraction_MCDNN: ISMIR2016: Melody extraction on vocal segments using multi-column deep neural networks


Folders and files:
SAVE_RESULTS
ex_training
model
viterbi
Abstract_KON1.pdf
MelodyExtraction_SCDNN.py
README.txt
VAD_DNN.py
main.py
making_multi_frame.py
making_multi_frame_VAD.py
melody_extraction_KON1.pdf
myFeatureExtraction.py
mySelect_weight.py
pop1.wav
viterbi.py

README.txt

============================================================
** Contact Info 
============================================================
Sangeun Kum <keums@kaist.ac.kr>
Changheun Oh <thecow@kaist.ac.kr>
Juhan Nam <juhannam@kaist.ac.kr>

Korea Advanced Institute of Science and Technology 

============================================================
** Description 
============================================================
This is our submission to the 2016 MIREX melody extraction task.
The algorithm is a classification-based approach using deep neural networks.
The file 'main.py' is the main function for calling the algorithm.
It takes a parameter followed by the full path strings of the input file and the output file.
For details of the algorithm,
please see https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/119_Paper.pdf

============================================================
** Platform and Requirements
============================================================
1. OS : LINUX 

2. Programming language : Python 2.7

3. Python Library : 
  1) Keras (Deep Learning library for Theano)
    >> http://keras.io/
  
  2) Theano (Backend of Keras)
    >> http://deeplearning.net/software/theano/install.html#install
    
  3) Librosa (for audio analysis such as load, STFT, resampling)
    >> http://librosa.github.io/librosa/

  4) ffmpeg 
    >> https://www.ffmpeg.org/
    >> for install : brew install ffmpeg 

  5) Numpy, SciPy

4. Hardware
  1) GPU : GeForce GTX 980 
    >> https://developer.nvidia.com/cuda-toolkit

5. Expected runtime : 2~3 seconds/song 
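
Before the first run it can help to confirm that the Python libraries listed above are importable. The following is a minimal sketch, not part of the submission: the helper name find_missing_modules and the list REQUIRED_MODULES are hypothetical, and the module names are assumed to be the usual import names of the listed libraries. It does not check ffmpeg or the GPU/CUDA setup.

```python
import importlib

# Assumed import names for the libraries listed in this README.
REQUIRED_MODULES = ["keras", "theano", "librosa", "numpy", "scipy"]


def find_missing_modules(names):
    """Return the subset of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed in the current environment.
            missing.append(name)
    return missing
```

Calling find_missing_modules(REQUIRED_MODULES) returns an empty list when every library can be imported.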
     
============================================================
** Use 
============================================================
The algorithm is called as follows: 

(to call from the command line)
>>python main.py <parameter> <input path> <output path>
ex) >>python main.py 0.2 '/home/keums/Melody/dataset/adc2004_full_set/file/pop3.wav' './SAVE_RESULTS/pop3.txt'

or

(to call from the Python shell)
>>main(param = 0.2, PATH_LOAD_FILE='/home/keums/Melody/dataset/adc2004_full_set/file/pop4.wav', PATH_SAVE_FILE='./SAVE_RESULTS/pop4.txt')

** The default param is 0.2.
If the voice recall rate is low, increasing the param may help (0 <= param <= 1).
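
The command-line call above can also be driven from a small wrapper that checks the parameter range before launching. This is only a sketch under stated assumptions: the helpers build_command and run_extraction are hypothetical and not part of this repository, the wrapper is assumed to run from the directory containing main.py, and the same Python interpreter is assumed for the child process.

```python
import subprocess
import sys


def build_command(param, input_path, output_path, script="main.py"):
    """Return the argument list for one run of main.py (hypothetical helper)."""
    param = float(param)
    # The README states that the parameter must satisfy 0 <= param <= 1.
    if not 0.0 <= param <= 1.0:
        raise ValueError("param must satisfy 0 <= param <= 1, got %r" % param)
    # Argument order follows the usage line: <parameter> <input path> <output path>.
    return [sys.executable, script, str(param), input_path, output_path]


def run_extraction(param, input_path, output_path):
    """Run main.py in a subprocess; raises CalledProcessError if it fails."""
    subprocess.check_call(build_command(param, input_path, output_path))
```

For example, run_extraction(0.2, 'pop1.wav', './SAVE_RESULTS/pop1.txt') would process the sample file shipped with the repository, assuming the requirements above are installed.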
