Torch implementation of NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis, [openreview]
- breathiness perturbation
- DEMAND based noise addition
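Noise addition of this kind is usually done by mixing a noise clip into the clean waveform at a randomly sampled signal-to-noise ratio. A minimal NumPy sketch of that mixing step, assuming SNR-based scaling; the function name and SNR range are illustrative and not this repository's API:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` so the result has the requested SNR in dB.

    Illustrative helper, not part of this repository's API.
    """
    # Tile or trim the noise to match the clean signal's length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so that 10 * log10(P_clean / P_noise) == snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

# Example: mix white noise into a 440 Hz tone at a random SNR in [10, 30] dB.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
noise = rng.standard_normal(16000).astype(np.float32)
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(10.0, 30.0))
```

In practice the noise clip would come from a DEMAND recording rather than white noise, but the scaling logic is the same.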
Tested in a Python 3.7.9 conda environment.
Initialize the submodule.
git submodule update --init
Download the LibriTTS [openslr:60], LibriSpeech [openslr:12], and VCTK [official] datasets.
Dump the dataset for training.
python -m speechset.utils.dump \
--out-dir ./datasets/dumped
To train the model, run train.py.
python train.py
To resume training from a previous checkpoint, the --load-epoch option is available.
python train.py \
--load-epoch 20 \
--config ./ckpt/t1.json
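A resume of this form typically restores the model and optimizer state saved at the given epoch and continues from the next one. A generic PyTorch sketch of the pattern, with illustrative names not taken from train.py:

```python
import io
import torch
import torch.nn as nn

# Illustrative model and optimizer; train.py's actual classes differ.
model = nn.Linear(4, 4)
optim = torch.optim.Adam(model.parameters())

# Save a checkpoint after some epoch (an in-memory buffer stands in for a file).
buf = io.BytesIO()
torch.save({'epoch': 20,
            'model': model.state_dict(),
            'optim': optim.state_dict()}, buf)

# Resuming (--load-epoch style): restore both states, continue from epoch + 1.
buf.seek(0)
ckpt = torch.load(buf, map_location='cpu')
model.load_state_dict(ckpt['model'])
optim.load_state_dict(ckpt['optim'])
start_epoch = ckpt['epoch'] + 1
```

Restoring the optimizer state alongside the weights matters for Adam-style optimizers, whose moment estimates would otherwise restart from zero.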
Checkpoints will be written to TrainConfig.ckpt, and TensorBoard summaries to TrainConfig.log.
tensorboard --logdir ./log
[TODO] To run inference, use inference.py.
[TODO] Pretrained checkpoints will be released on the releases page.
To use a pretrained model, download the files and unzip them. The following is a sample script.
import torch

from nansypp import Nansypp

# Load the checkpoint on CPU and restore the model weights.
ckpt = torch.load('t1_200.ckpt', map_location='cpu')
nansypp = Nansypp.load(ckpt)
nansypp.eval()