Tensorflow implementation of pix2pix(cGAN) for audio source separation


pix2pix

Description

  • This is a TensorFlow implementation of audio source separation (mixture to vocal) using pix2pix. I pre-processed the raw data (a dataset of mixture and vocal pairs) into spectrograms that can be treated as 2-dimensional images, then trained the model on them. See the file hyperparams.py for the detailed hyperparameters.
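The mixture-to-spectrogram preprocessing can be sketched in plain numpy. This is an illustration only: the frame length and hop size below are assumed values, not the ones in hyperparams.py, and the repo's actual pipeline uses librosa.

```python
import numpy as np

def magnitude_spectrogram(wav, n_fft=512, hop=128):
    """Frame a waveform and take the magnitude of its STFT.

    n_fft and hop are illustrative defaults, not the repo's hyperparameters.
    Returns a (freq, time) array that can be treated as a 2-D image.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # (time, n_fft//2 + 1)
    return spec.T                               # (freq, time)

# A mixture/vocal pair becomes an (input, target) image pair for pix2pix:
mixture = np.random.randn(16000).astype(np.float32)  # 1 s of dummy audio
vocal = 0.5 * mixture                                # dummy paired target
x = magnitude_spectrogram(mixture)
y = magnitude_spectrogram(vocal)
```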

Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.0.0
  • librosa

Data

I used the DSD100 dataset, which consists of pairs of mixture audio files and vocal audio files. The complete dataset (~14 GB) can be downloaded from the DSD100 website.
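Pairing mixtures with their vocal stems is a simple path mapping. The sketch below assumes the standard DSD100 layout (`Mixtures/<split>/<song>/mixture.wav` alongside `Sources/<split>/<song>/vocals.wav`); the function name is hypothetical, not from the repo.

```python
from pathlib import Path

def vocal_path_for(mixture_path):
    """Map a DSD100 mixture file to its paired vocal stem.

    Assumes the standard DSD100 directory layout:
      <root>/Mixtures/Dev/<song>/mixture.wav
      <root>/Sources/Dev/<song>/vocals.wav
    """
    p = Path(mixture_path)
    song = p.parent.name                  # e.g. '055 - Song Title'
    split = p.parent.parent.name          # 'Dev' or 'Test'
    root = p.parent.parent.parent.parent  # the dataset root directory
    return root / 'Sources' / split / song / 'vocals.wav'
```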

File description

  • hyperparams.py includes all required hyperparameters.
  • data.py loads the training data and preprocesses it into units of raw data sequences.
  • modules.py contains the building blocks for the networks, including skip connections.
  • networks.py builds the networks.
  • train.py is for training.
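The skip connections mentioned for modules.py follow the pix2pix U-Net pattern: each decoder layer concatenates the matching encoder feature map along the channel axis. Below is a shape-level numpy sketch of one encoder/decoder level, with pooling and nearest-neighbour repeat standing in for the strided convolutions the real model would use.

```python
import numpy as np

def downsample(x):
    """Halve spatial dims via 2x2 average pooling (stand-in for a strided conv)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """Double spatial dims via nearest-neighbour repeat (stand-in for a deconv)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_step(x):
    """One encoder/decoder level with a pix2pix-style skip connection:
    the decoder output is concatenated with the encoder input on channels."""
    enc = downsample(x)
    dec = upsample(enc)
    return np.concatenate([dec, x], axis=-1)  # skip connection doubles channels

x_in = np.random.randn(8, 8, 4)
out = unet_step(x_in)  # spatial dims preserved, channels 4 -> 8
```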

Training the network

  • STEP 1. Adjust the hyperparameters in hyperparams.py if necessary.
  • STEP 2. Download and extract the DSD100 data mentioned above into the 'data' directory, then run data.py.
  • STEP 3. Run train.py.

Notes

  • I haven't implemented evaluation code yet, but I will update the repository soon.
