Neural-Style-Transfer-Audio [Blog]

This is a PyTorch implementation of the Neural Style Transfer algorithm, modified to work on audio.

Aim -

We aim to analyze and mix two audio clips in order to synthesize new music. We do this by applying Neural Style Transfer to the two clips, transferring the style of a style audio onto a content audio.

Dependencies -

Python 2.7
PyTorch
librosa
numpy
matplotlib

How to run -

The algorithm is implemented in PyTorch and can run on both GPU and CPU.

To run: python NeuralStyleTransfer.py ContentAudio StyleAudio

Output format (name of the output file): output1D_(number of filters)_iter(number of iterations)_c(content audio)_s(style audio)_sw(style weight)_k(kernel size)_s(stride)_p(padding).wav [To change the name: change k, s and p manually]
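As an illustration of the naming pattern above, here is a minimal sketch that assembles such a file name. The function `output_name` and all argument values are hypothetical and are not taken from NeuralStyleTransfer.py; whether the script uses the audio names with or without their extension is also an assumption.

```python
def output_name(n_filters, n_iter, content, style, style_weight,
                kernel_size, stride, padding):
    """Build an output file name following the pattern described above."""
    return "output1D_{}_iter{}_c{}_s{}_sw{}_k{}_s{}_p{}.wav".format(
        n_filters, n_iter, content, style, style_weight,
        kernel_size, stride, padding)


# Hypothetical values, chosen only to show the resulting name.
name = output_name(4096, 400, "alpha", "beta", 1, 3, 1, 1)
print(name)  # output1D_4096_iter400_calpha_sbeta_sw1_k3_s1_p1.wav
```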

A sample output is also provided, with alpha.wav as content and beta.wav as style.

How to plot graph -

To plot: python graphs.py [Content Audio file name] [Style Audio file name] [Output File name]

Problems Faced -

Library for loading Audio files -
- To load audio files, librosa should be used instead of audiolab because librosa has more features for manipulating and analyzing audio files.
Preprocessing -
- When working with audio files, the feature to work on is frequency, but librosa loads audio as a function of time, so a Fourier transform is applied to the audio first.
- If Neural Style Transfer is applied without the Fourier transform, i.e. in the time domain, the output contains a lot of noise and the audios do not mix properly.
- The matrices obtained should not be downscaled because that results in loss of information; also, dropping a channel creates an anomaly during the inverse Fourier transform.
- The duration of the loaded audio is kept at 58.04 seconds so that the matrix sizes are convenient.
- Matrices should be resized to the correct dimensions.
- The sampling rates of the two audios should be the same.
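The preprocessing points above can be sketched as follows. This is a numpy-only stand-in, not the repository's code: the function `stft_log_magnitude`, the frame size `n_fft`, the hop length `hop`, the log scaling, and the synthetic sine-wave inputs are all assumptions made for illustration. It shows that two clips with the same sampling rate and duration yield spectrogram matrices of identical shape.

```python
import numpy as np


def stft_log_magnitude(signal, n_fft=2048, hop=512):
    """Return log(1 + |STFT|) with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the time-domain signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)  # (n_frames, n_fft // 2 + 1)
    return np.log1p(np.abs(spectrum)).T     # rows are frequency bins


# Synthetic stand-ins for content and style audio: same rate, same duration.
sr = 22050
t = np.arange(sr) / float(sr)               # one second of samples
content = np.sin(2 * np.pi * 440 * t)
style = np.sin(2 * np.pi * 880 * t)

content_mag = stft_log_magnitude(content)
style_mag = stft_log_magnitude(style)
print(content_mag.shape, style_mag.shape)
```

Because both clips share a sampling rate and length, no resizing is needed here; clips of different lengths would have to be trimmed or padded before this step.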
Model -
- A shallow model with a large number of filters should be chosen.
- Pooling should be avoided, as it increases computation (slowing down the model) and also results in loss of information.
- 1D convolutions should be used instead of 2D convolutions: each frequency has its own samples, which should not be interlinked with the samples of other frequencies, and 2D convolutions also introduce unnecessary noise.
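To make the 1D convolution idea concrete, here is a small numpy sketch in which the frequency bins act as input channels and the kernel slides along the time axis only. It is an illustration under assumptions, not the model in NeuralStyleTransfer.py: the helper `conv1d`, the filter count of 64, the kernel size, and the random weights are hypothetical. In PyTorch the corresponding layer would be `torch.nn.Conv1d` with `in_channels` set to the number of frequency bins.

```python
import numpy as np


def conv1d(x, weight, stride=1, padding=0):
    """1D convolution along time (cross-correlation, as in deep learning).

    x:      (in_channels, time)
    weight: (out_channels, in_channels, kernel_size)
    """
    x = np.pad(x, ((0, 0), (padding, padding)), mode="constant")
    k = weight.shape[2]
    t_out = (x.shape[1] - k) // stride + 1
    # windows has shape (in_channels, t_out, kernel_size)
    windows = np.stack([x[:, i * stride:i * stride + k]
                        for i in range(t_out)], axis=1)
    return np.einsum("oik,itk->ot", weight, windows)


rng = np.random.RandomState(0)
n_bins, n_frames, n_filters = 1025, 40, 64   # hypothetical sizes
spectrogram = rng.rand(n_bins, n_frames)     # stand-in for a real spectrogram
weight = 0.01 * rng.randn(n_filters, n_bins, 3)

# A single convolution followed by ReLU, with no pooling.
features = np.maximum(conv1d(spectrogram, weight, stride=1, padding=1), 0)
print(features.shape)  # (64, 40)
```

With kernel size 3, stride 1 and padding 1, the number of time frames is preserved, so no information is discarded along the time axis.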
Loss Function and Optimizer -
- Only the style loss is considered, not the content loss.
- The Adam optimizer is used, but other optimizers can be used as well.
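A style loss of the kind used in Neural Style Transfer compares Gram matrices of feature maps. The sketch below shows one common formulation in numpy; the normalization by the number of frames, the mean-squared difference, and the names `gram_matrix` and `style_loss` are assumptions and may differ from the repository's code.

```python
import numpy as np


def gram_matrix(features):
    """features: (n_filters, n_frames) -> (n_filters, n_filters)."""
    return np.dot(features, features.T) / features.shape[1]


def style_loss(generated, style, style_weight=1.0):
    """Weighted mean squared difference between the two Gram matrices."""
    diff = gram_matrix(generated) - gram_matrix(style)
    return style_weight * np.mean(diff ** 2)


rng = np.random.RandomState(0)
style_features = rng.rand(64, 40)       # stand-ins for model activations
generated_features = rng.rand(64, 40)

print(style_loss(generated_features, style_features))
```

The Gram matrix sums over time frames, so it does not depend on the order of the frames. This is why the style loss captures the overall texture of the style audio rather than its temporal arrangement.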
Training -
- The input audio should be the content audio, not random white noise.
- The number of steps must be chosen carefully (400 works well).
Final output -
- Once the final output is obtained, phase reconstruction should be done.
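The text above does not say which phase reconstruction method is used. One widely used option is the Griffin-Lim algorithm, sketched here in numpy purely as an assumption: it starts from a random phase and alternates between the time domain and the STFT domain while keeping the target magnitude fixed. The helpers `stft`, `istft`, `griffin_lim` and `spectral_error`, the frame parameters, and the chirp test signal are all hypothetical.

```python
import numpy as np

N_FFT, HOP = 2048, 512
WINDOW = np.hanning(N_FFT)


def stft(x):
    """Complex STFT with shape (n_frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Windowed overlap-add inverse of stft()."""
    length = N_FFT + HOP * (spec.shape[0] - 1)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    covered = norm > 1e-3          # the first and last samples are barely covered
    out[covered] /= norm[covered]
    out[~covered] = 0.0
    return out


def griffin_lim(magnitude, n_iter=30, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.RandomState(seed)
    phase = np.exp(2j * np.pi * rng.rand(*magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase)
        phase = np.exp(1j * np.angle(stft(signal)))  # keep only the new phase
    return istft(magnitude * phase)


def spectral_error(signal, magnitude):
    """Relative distance between the signal's STFT magnitude and the target."""
    diff = np.abs(stft(signal)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)


# Stand-in for the magnitude spectrogram produced by style transfer.
sr = 22050
t = np.arange(sr) / float(sr)
target = np.abs(stft(np.sin(2 * np.pi * (200 + 400 * t) * t)))

audio = griffin_lim(target)
print(spectral_error(audio, target))
```

If the spectrogram was log-scaled before style transfer, the scaling has to be undone before this step.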

References -

Original Paper on Neural Style Transfer : https://arxiv.org/abs/1508.06576
Blog by Dmitry Ulyanov : https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/
Advanced Pytorch Tutorial for Neural Style Transfer : http://pytorch.org/tutorials/advanced/neural_style_tutorial.html#sphx-glr-advanced-neural-style-tutorial-py
Paper on Audio Style Transfer By Eric Grinstein, Ngoc Duong, Alexey Ozerov, Patrick Perez : https://arxiv.org/abs/1710.11385

More Resources (Not Implemented) -

Paper From NIPS 2017 By Prateek Verma, Julius O. Smith : https://arxiv.org/abs/1801.01589
Paper By Anthony Perez, Chris Proctor, Archa Jain : http://web.stanford.edu/class/cs224s/reports/Anthony_Perez.pdf
