GitHub - ttaoREtw/semi-tts: Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

Semi-supervised TTS

Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

Data

Prepare data

Download VCTK and LJSpeech and put them into data/audio-corpus. Specifically, wave files from LJSpeech should be in data/audio-corpus/lj, wave files from VCTK speaker p225 should be in data/audio-corpus/p225, and likewise for every other VCTK speaker. The data partition is specified in data/partition_tables/<partition-table.csv>, and the phoneme transcription of each wave file is in data/map_tables/lj_vctk_g2p.csv.
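Before training, it can help to confirm that every speaker folder described above exists and actually contains audio. The sketch below encodes the layout rule from this section (`lj` for LJSpeech, the speaker ID such as `p225` for VCTK). The helper names are hypothetical, and the `.wav` extension and the example speaker list are assumptions; the authoritative speaker list comes from the partition table.

```python
from pathlib import Path
from typing import Iterable, List

# Corpus root named in this README.
CORPUS_ROOT = Path("data/audio-corpus")


def speaker_dir(speaker: str, root: Path = CORPUS_ROOT) -> Path:
    """Return the folder expected to hold one speaker's wave files.

    LJSpeech goes under 'lj'; each VCTK speaker uses its own ID (e.g. 'p225').
    """
    return root / speaker


def missing_speakers(speakers: Iterable[str], root: Path = CORPUS_ROOT) -> List[str]:
    """List speakers whose folder is absent or holds no '*.wav' files.

    Assumption: wave files use the '.wav' extension.
    """
    missing = []
    for spk in speakers:
        folder = speaker_dir(spk, root)
        if not folder.is_dir() or not any(folder.glob("*.wav")):
            missing.append(spk)
    return missing


if __name__ == "__main__":
    # Hypothetical subset of speakers, for illustration only.
    gaps = missing_speakers(["lj", "p225", "p226"])
    print("missing speakers:", gaps if gaps else "none")
```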

Members of the NTU Speech Lab can download the audio corpus from /groups/public/ttao/audio-corpus.zip.

Audio preprocessing

The hyperparameters for audio feature extraction can be modified in config/<config.yaml>. The audio preprocessing code is in src/audio.py.

Model

To adjust the model hyperparameters or the learning rate, modify the configuration file config/<config.yaml>.
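For sweeps such as trying several learning rates, a variant of config/<config.yaml> can be written programmatically instead of edited by hand. This is a minimal sketch that assumes the config is a nested YAML mapping and that PyYAML is installed. `set_nested` and `write_variant` are hypothetical helpers, and the dotted key `hparas.lr` used below is only an illustration; look up the real key names in your config file.

```python
import copy
from typing import Any, Dict


def set_nested(cfg: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a deep copy of cfg with the entry at 'a.b.c' set to value.

    Missing intermediate mappings are created; the input is left unchanged.
    """
    out = copy.deepcopy(cfg)
    node = out
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return out


def write_variant(src_path: str, dst_path: str, overrides: Dict[str, Any]) -> None:
    """Load a YAML config, apply dotted-key overrides, and save a new file.

    Requires PyYAML (pip install pyyaml).
    """
    import yaml  # imported here so set_nested works without PyYAML

    with open(src_path) as f:
        cfg = yaml.safe_load(f)
    for key, value in overrides.items():
        cfg = set_nested(cfg, key, value)
    with open(dst_path, "w") as f:
        yaml.safe_dump(cfg, f, default_flow_style=False)
```

For example, `write_variant("config/<config.yaml>", "config/lr-1e-4.yaml", {"hparas.lr": 1e-4})` would write a copy with one value changed (the output file name and key are hypothetical). The new file can then be passed to main.py through `--config`.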

Running

Train from scratch

python main.py --config config/<config.yaml> --njobs <num-workers>

Continue training

python main.py --config config/<config.yaml> \
               --njobs <num-workers> \
               --load <checkpoint-path>
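When training is driven from another script (for example, a job scheduler wrapper), the two commands above can be assembled as argument lists. The sketch only uses the flags shown in this README (`--config`, `--njobs`, `--load`). The function names are hypothetical, and it assumes the script is launched from the repository root, where main.py lives.

```python
import subprocess
import sys
from typing import List, Optional


def train_command(config: str, njobs: int, load: Optional[str] = None) -> List[str]:
    """Build the main.py argument list; pass load=<checkpoint> to resume."""
    cmd = [sys.executable, "main.py", "--config", config, "--njobs", str(njobs)]
    if load is not None:
        cmd += ["--load", load]
    return cmd


def run_training(config: str, njobs: int, load: Optional[str] = None) -> None:
    """Launch training; raises CalledProcessError if main.py exits non-zero."""
    subprocess.run(train_command(config, njobs, load), check=True)
```

Using `sys.executable` rather than a bare `python` keeps the child process on the same interpreter (and virtual environment) as the caller.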

The training log can be found in the log/ directory.
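Resuming requires a `<checkpoint-path>`. If a run has saved several checkpoints, a small helper can pick the newest one by modification time. This README does not say where checkpoints are written or how they are named, so both the directory you pass in and the `*.pth` pattern are assumptions to adjust after inspecting your run's output. `latest_checkpoint` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(ckpt_dir: str, pattern: str = "*.pth") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None.

    Assumption: checkpoints share a common file pattern (default '*.pth').
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path (if any) is what goes after `--load` when continuing training.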

Members of the NTU Speech Lab can download the checkpoints from /groups/public/ttao/semi-tts-ckpt.zip.

Inference

python main.py --gen-specgram \
               --config config/<config.yaml> \
               --njobs <num-workers> \
               --load <checkpoint-path> \
               --logdir <output-directory>
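The inference command can be wrapped the same way, for instance to generate spectrograms from several checkpoints in a loop. This sketch uses only the flags listed above (`--gen-specgram`, `--config`, `--njobs`, `--load`, `--logdir`). The function names are hypothetical, and the code assumes it runs from the repository root.

```python
import subprocess
import sys
from typing import List


def inference_command(config: str, njobs: int, ckpt: str, outdir: str) -> List[str]:
    """Build the main.py argument list for spectrogram generation."""
    return [
        sys.executable, "main.py", "--gen-specgram",
        "--config", config,
        "--njobs", str(njobs),
        "--load", ckpt,
        "--logdir", outdir,
    ]


def run_inference(config: str, njobs: int, ckpt: str, outdir: str) -> None:
    """Run inference; raises CalledProcessError if main.py exits non-zero."""
    subprocess.run(inference_command(config, njobs, ckpt, outdir), check=True)
```

Giving each checkpoint its own `outdir` keeps the generated outputs from different models separate.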

Reference

Grapheme-to-phoneme tool: https://github.com/Kyubyong/g2p
Tacotron-2: https://github.com/NVIDIA/tacotron2
End-to-end ASR: https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch

Citation

@inproceedings{liu2020towards,
  title={Towards unsupervised speech recognition and synthesis with quantized speech representation learning},
  author={Liu, Alexander H and Tu, Tao and Lee, Hung-yi and Lee, Lin-shan},
  booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7259--7263},
  year={2020},
  organization={IEEE}
}

@article{tu2020semi,
  title={Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation},
  author={Tu, Tao and Chen, Yuan-Jui and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2005.08024},
  year={2020}
}

