This repository contains code to train an end-to-end speech synthesis system, based on the Tacotron2 model with modifications as described in *Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis*.
The system consists of two parts:
- A Tacotron2 model with Dynamic Convolutional Attention, which modifies the hybrid location-sensitive attention mechanism to be purely location-based, resulting in better generalization to long utterances. This model takes text (as a character sequence) as input and predicts a sequence of mel-spectrogram frames as output (the seq2seq model).
- A WaveRNN-based vocoder, which takes the mel-spectrogram predicted in the previous step as input and generates a waveform as output (the vocoder model).
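The key property of dynamic convolutional attention is that each decoder step computes its alignment purely from the previous step's alignment, by convolving it with a bank of filters (static plus dynamically predicted ones in the real model) and normalizing with a softmax; no content term from the encoder outputs enters the energies, which is what helps generalization to long utterances. A minimal NumPy sketch, with random filters and a fixed averaging projection standing in for the learned components:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dca_step(prev_alignment, filters):
    """One (simplified) step of dynamic convolutional attention.

    prev_alignment: (T,) alignment over encoder steps from the previous
        decoder step. filters: (K, W) convolution filters; in the real
        model some are static and some are predicted from the decoder
        state each step. Returns a new (T,) alignment.
    """
    features = np.stack([
        np.convolve(prev_alignment, f, mode="same") for f in filters
    ])                                  # (K, T) location features
    # Collapse the K feature channels to one energy per encoder step;
    # this fixed average stands in for a learned projection layer.
    energies = features.mean(axis=0)    # (T,)
    return softmax(energies)

# Usage: start from an alignment peaked at encoder position 0.
rng = np.random.default_rng(0)
prev = np.zeros(50)
prev[0] = 1.0
filters = rng.standard_normal((8, 11)) * 0.1
align = dca_step(prev, filters)
```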
All audio processing parameters, model hyperparameters, training configuration, etc. are specified in `config.py`.
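For orientation, such a config module typically groups these settings as module-level constants. The names and values below are illustrative assumptions only, not the actual contents of `config.py`:

```python
# Illustrative sketch only -- see config.py for the real parameter names and values.

# Audio processing (assumed LJSpeech-style values)
sampling_rate = 22050   # Hz
n_fft = 1024            # STFT window size in samples
hop_length = 256        # STFT hop in samples
num_mels = 80           # mel-spectrogram channels

# Training configuration (illustrative)
batch_size = 32
learning_rate = 1e-3
checkpoint_interval = 10000  # steps between saved checkpoints
```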
- Download and extract the dataset (the dataset used for training is assumed to be in the same format as the LJSpeech dataset).
- Preprocess the downloaded dataset: perform feature extraction on the wav files and create train/val/eval splits.

  ```shell
  python preprocess.py \
      --dataset_dir <Path to the root of the downloaded dataset> \
      --out_dir <Output path to write the processed dataset>
  ```
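The split step of preprocessing can be sketched as follows. This is a hedged illustration assuming an LJSpeech-style `metadata.csv` with one `id|transcript` line per utterance; the actual split sizes and logic used by `preprocess.py` may differ:

```python
import random

def make_splits(metadata_lines, num_val=100, num_eval=100, seed=1234):
    """Shuffle utterances deterministically and carve off val/eval sets."""
    lines = list(metadata_lines)
    random.Random(seed).shuffle(lines)          # fixed seed => reproducible splits
    val = lines[:num_val]
    evaluation = lines[num_val:num_val + num_eval]
    train = lines[num_val + num_eval:]
    return train, val, evaluation

# Usage: split 1000 dummy utterances into 800/100/100.
lines = [f"LJ{i:04d}|some transcript" for i in range(1000)]
train, val, evaluation = make_splits(lines)
```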
- Train the Tacotron2 model.

  ```shell
  python train_Tacotron2.py \
      --data_dir <Path to the processed dataset to be used to train the model> \
      --checkpoint_dir <Path to location where training checkpoints will be saved> \
      --alignments_dir <Path to the location where training alignments will be saved> \
      --resume_checkpoint_path <If specified load checkpoint and resume training>
  ```
- Train the WaveRNN model.

  ```shell
  python train_WaveRNN.py \
      --data_dir <Path to the processed dataset to be used to train the model> \
      --checkpoint_dir <Path to location where training checkpoints will be saved> \
      --resume_checkpoint_path <If specified load checkpoint and resume training>
  ```
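WaveRNN vocoders commonly train on µ-law-quantized audio rather than raw floating-point samples; whether this repository does so is determined by `config.py`, but the companding itself can be sketched as:

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Map waveform samples in [-1, 1] to integer classes in [0, mu]."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(q, mu=255):
    """Invert the companding: integer classes back to samples in [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# Round trip: quantization error stays small for a test signal.
x = 0.8 * np.sin(np.linspace(0, 4 * np.pi, 1000))
q = mulaw_encode(x)
x_hat = mulaw_decode(q)
```

The logarithmic companding allocates more quantization levels to small amplitudes, which matches the distribution of speech samples and keeps an 8-bit (256-class) output layer perceptually adequate.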
- Prepare the text to be synthesized.

  The text to be synthesized should be placed in the `synthesis.txt` file, which has the following format:

  ```
  <TEXT_ID_1> TEXT_1
  <TEXT_ID_2> TEXT_2
  .
  .
  .
  ```
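For example, a `synthesis.txt` with two utterances (the ids and sentences below are placeholders):

```
utt_001 The quick brown fox jumps over the lazy dog.
utt_002 Speech synthesis converts written text into audio.
```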
- Text-to-speech synthesis.

  ```shell
  python tts_Synthesis.py \
      --synthesis_file <Path to the synthesis.txt file (created in Step 1)> \
      --Tacotron2_checkpoint <Path to the trained Tacotron2 model to use for synthesis> \
      --WaveRNN_checkpoint <Path to the trained WaveRNN model to use for synthesis> \
      --out_dir <Path to where the synthesized waveforms will be written to disk>
  ```
This code is based on code from the following repositories: