Merge pull request #190 from yt605155624/fs2_en
Refactor of examples, add fastspeech2_ljspeech, pwg_ljspeech, update …
zh794390558 authored Oct 11, 2021
2 parents ffda283 + 55134d0 commit 49caf86
Showing 106 changed files with 137,405 additions and 2,254 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -143,4 +143,11 @@ dmypy.json
runs
syn_audios
exp/
exp_*/
dump/
dump_*/
*_test.py
*_torch.py
*.ipynb
*ckpt*
*alignment*
13 changes: 7 additions & 6 deletions benchmark/PWGAN/run_all.sh
@@ -14,18 +14,19 @@ fi
# 2 Copy the data and pretrained models this model needs
# Download the baker dataset to the home directory and extract it there
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
wget https://paddlespeech.bj.bcebos.com/datasets/BZNSYP.rar
wget https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar
mkdir BZNSYP
unrar x BZNSYP.rar BZNSYP
wget https://paddlespeech.bj.bcebos.com/Parakeet/benchmark/durations.txt
fi
# Data preprocessing
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then

python examples/parallelwave_gan/baker/preprocess.py --rootdir=BZNSYP/ --dumpdir=dump --num_cpu=20
python examples/parallelwave_gan/baker/compute_statistics.py --metadata=dump/train/raw/metadata.jsonl --field-name="feats" --dumpdir=dump/train
python examples/parallelwave_gan/baker/normalize.py --metadata=dump/train/raw/metadata.jsonl --dumpdir=dump/train/norm --stats=dump/train/stats.npy
python examples/parallelwave_gan/baker/normalize.py --metadata=dump/dev/raw/metadata.jsonl --dumpdir=dump/dev/norm --stats=dump/train/stats.npy
python examples/parallelwave_gan/baker/normalize.py --metadata=dump/test/raw/metadata.jsonl --dumpdir=dump/test/norm --stats=dump/train/stats.npy
python utils/vocoder_preprocess.py --rootdir=BZNSYP/ --dumpdir=dump --num-cpu=20 --cut-sil=True --dur-file=durations.txt --config=examples/parallelwave_gan/baker/conf/default.yaml
python utils/compute_statistics.py --metadata=dump/train/raw/metadata.jsonl --field-name="feats"
python utils/vocoder_normalize.py --metadata=dump/train/raw/metadata.jsonl --dumpdir=dump/train/norm --stats=dump/train/feats_stats.npy
python utils/vocoder_normalize.py --metadata=dump/dev/raw/metadata.jsonl --dumpdir=dump/dev/norm --stats=dump/train/feats_stats.npy
python utils/vocoder_normalize.py --metadata=dump/test/raw/metadata.jsonl --dumpdir=dump/test/norm --stats=dump/train/feats_stats.npy
fi
# 3 Run all models in batch (if running in batch is inconvenient, steps 1 and 2 need to be placed inside each individual model)
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
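For reference, here is a minimal sketch of the `stage`/`stop_stage` gating idiom run_all.sh relies on; the default values and the positional-argument handling below are illustrative assumptions, not taken from the script itself.

```bash
#!/bin/bash
# Illustrative sketch of the stage-gating pattern; run_all.sh's real
# defaults and argument parsing may differ.
stage=${1:-0}         # first stage to execute
stop_stage=${2:-100}  # last stage to execute

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    echo "stage 1: download data and pretrained models"
fi

if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    echo "stage 2: preprocess data"
fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    echo "stage 3: run the benchmark"
fi
```

With this sketch, `bash <script> 2 2` would execute only the preprocessing stage.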
4 changes: 2 additions & 2 deletions benchmark/PWGAN/run_benchmark.sh
@@ -29,8 +29,8 @@ function _train(){
--run-benchmark=true"

case ${run_mode} in
sp) train_cmd="python3 examples/parallelwave_gan/baker/train.py --nprocs=1 ${train_cmd}" ;;
mp) train_cmd="python3 examples/parallelwave_gan/baker/train.py --nprocs=8 ${train_cmd}"
sp) train_cmd="python3 utils/pwg_train.py --nprocs=1 ${train_cmd}" ;;
mp) train_cmd="python3 utils/pwg_train.py --nprocs=8 ${train_cmd}"
log_parse_file="mylog/workerlog.0" ;;
*) echo "choose run_mode(sp or mp)"; exit 1;
esac
20 changes: 12 additions & 8 deletions examples/fastspeech2/aishell3/README.md
@@ -15,7 +15,7 @@ Extract AISHELL-3.
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
### Get MFA result of BZNSYP and Extract it
### Get MFA result of AISHELL-3 and Extract it
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by following the [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) (which currently uses MFA1.x) in our repo.
### Preprocess the dataset
@@ -48,21 +48,25 @@ The dataset is split into 3 parts, namely `train`, `dev` and `test`, each of whi…
Also there is a `metadata.jsonl` in each subfolder. It is a table-like file which contains phones, text_lengths, speech_lengths, durations, path of speech features, path of pitch features, path of energy features, speaker and id of each utterance.

## Train the model
`./run.sh` calls `Parakeet/utils/multi_spk_fs2_train.py`.
```bash
./run.sh
```
Or you can use `train.py` directly. Here's the complete help message.
Here's the complete help message.
```text
usage: train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
[--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
[--device DEVICE] [--nprocs NPROCS] [--verbose VERBOSE]
[--phones-dict PHONES_DICT] [--speaker-dict SPEAKER_DICT]
usage: multi_spk_fs2_train.py [-h] [--config CONFIG]
[--train-metadata TRAIN_METADATA]
[--dev-metadata DEV_METADATA]
[--output-dir OUTPUT_DIR] [--device DEVICE]
[--nprocs NPROCS] [--verbose VERBOSE]
[--phones-dict PHONES_DICT]
[--speaker-dict SPEAKER_DICT]
Train a FastSpeech2 model with AISHELL-3 Mandrin TTS dataset.
Train a FastSpeech2 model with multiple speaker dataset.
optional arguments:
-h, --help show this help message and exit
--config CONFIG config file to overwrite default config.
--config CONFIG fastspeech2 config file.
--train-metadata TRAIN_METADATA
training data.
--dev-metadata DEV_METADATA
63 changes: 0 additions & 63 deletions examples/fastspeech2/aishell3/batch_fn.py

This file was deleted.

28 changes: 0 additions & 28 deletions examples/fastspeech2/aishell3/config.py

This file was deleted.

121 changes: 0 additions & 121 deletions examples/fastspeech2/aishell3/frontend.py

This file was deleted.

12 changes: 6 additions & 6 deletions examples/fastspeech2/aishell3/preprocess.sh
@@ -20,12 +20,12 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# extract features
echo "Extract features ..."
python3 ${MAIN_ROOT}/utils/fastspeech2_preprocess.py \
python3 ${MAIN_ROOT}/utils/fs2_preprocess.py \
--dataset=aishell3 \
--rootdir=~/datasets/data_aishell3/ \
--dumpdir=dump \
--dur-file=durations.txt \
--config-path=conf/default.yaml \
--config=conf/default.yaml \
--num-cpu=8 \
--cut-sil=True
fi
@@ -47,9 +47,9 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# normalize and convert phone to id, dev and test should use train's stats
# normalize and convert phone/speaker to id, dev and test should use train's stats
echo "Normalize ..."
python3 ${MAIN_ROOT}/utils/fastspeech2_normalize.py \
python3 ${MAIN_ROOT}/utils/fs2_normalize.py \
--metadata=dump/train/raw/metadata.jsonl \
--dumpdir=dump/train/norm \
--speech-stats=dump/train/speech_stats.npy \
@@ -58,7 +58,7 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
--phones-dict=dump/phone_id_map.txt \
--speaker-dict=dump/speaker_id_map.txt

python3 ${MAIN_ROOT}/utils/fastspeech2_normalize.py \
python3 ${MAIN_ROOT}/utils/fs2_normalize.py \
--metadata=dump/dev/raw/metadata.jsonl \
--dumpdir=dump/dev/norm \
--speech-stats=dump/train/speech_stats.npy \
@@ -67,7 +67,7 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
--phones-dict=dump/phone_id_map.txt \
--speaker-dict=dump/speaker_id_map.txt

python3 ${MAIN_ROOT}/utils/fastspeech2_normalize.py \
python3 ${MAIN_ROOT}/utils/fs2_normalize.py \
--metadata=dump/test/raw/metadata.jsonl \
--dumpdir=dump/test/norm \
--speech-stats=dump/train/speech_stats.npy \
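The three `fs2_normalize.py` invocations above differ only in the data split, so they can be written as a loop. The sketch below keeps just the flags visible in the diff (the collapsed lines presumably also pass pitch/energy statistics) and assumes `MAIN_ROOT` is already exported as in run.sh.

```bash
# Equivalent loop over the three splits, using only the flags visible above.
for split in train dev test; do
    python3 ${MAIN_ROOT}/utils/fs2_normalize.py \
        --metadata=dump/${split}/raw/metadata.jsonl \
        --dumpdir=dump/${split}/norm \
        --speech-stats=dump/train/speech_stats.npy \
        --phones-dict=dump/phone_id_map.txt \
        --speaker-dict=dump/speaker_id_map.txt
done
```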
4 changes: 3 additions & 1 deletion examples/fastspeech2/aishell3/run.sh
@@ -1,6 +1,8 @@
#!/bin/bash

python3 train.py \
export MAIN_ROOT=`realpath ${PWD}/../../../`

python3 ${MAIN_ROOT}/utils/multi_spk_fs2_train.py \
--train-metadata=dump/train/norm/metadata.jsonl \
--dev-metadata=dump/dev/norm/metadata.jsonl \
--config=conf/default.yaml \
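The run.sh hunk above is truncated. The sketch below is a hedged guess at the full command, filling in the remaining flags from the help message quoted in the README; the `--output-dir` and `--nprocs` values are illustrative assumptions (the checkpoint path in synthesize.sh suggests `exp/default`).

```bash
#!/bin/bash
export MAIN_ROOT=`realpath ${PWD}/../../../`

# Flags after --config are assembled from the README help message;
# the concrete output dir and process count are assumptions.
python3 ${MAIN_ROOT}/utils/multi_spk_fs2_train.py \
    --train-metadata=dump/train/norm/metadata.jsonl \
    --dev-metadata=dump/dev/norm/metadata.jsonl \
    --config=conf/default.yaml \
    --output-dir=exp/default \
    --nprocs=1 \
    --phones-dict=dump/phone_id_map.txt \
    --speaker-dict=dump/speaker_id_map.txt
```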
4 changes: 3 additions & 1 deletion examples/fastspeech2/aishell3/synthesize.sh
@@ -1,6 +1,8 @@
#!/bin/bash

python3 synthesize.py \
export MAIN_ROOT=`realpath ${PWD}/../../../`

python3 ${MAIN_ROOT}/utils/multi_spk_fs2_pwg_syn.py \
--fastspeech2-config=conf/default.yaml \
--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
--fastspeech2-stat=dump/train/speech_stats.npy \
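Likewise, the synthesize.sh hunk is cut off after the FastSpeech2 flags. The sketch below shows how the rest of the command plausibly looks, assuming the vocoder takes mirrored `--pwg-*` flags and synthesis runs on the test split; every flag and path after the FastSpeech2 ones is an assumption, not taken from the diff.

```bash
#!/bin/bash
export MAIN_ROOT=`realpath ${PWD}/../../../`

# Everything after --fastspeech2-stat is an illustrative assumption.
python3 ${MAIN_ROOT}/utils/multi_spk_fs2_pwg_syn.py \
    --fastspeech2-config=conf/default.yaml \
    --fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
    --fastspeech2-stat=dump/train/speech_stats.npy \
    --pwg-config=pwg_baker_ckpt_0.4/pwg_default.yaml \
    --pwg-checkpoint=pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
    --pwg-stat=pwg_baker_ckpt_0.4/pwg_stats.npy \
    --test-metadata=dump/test/norm/metadata.jsonl \
    --output-dir=exp/default/test \
    --phones-dict=dump/phone_id_map.txt \
    --speaker-dict=dump/speaker_id_map.txt
```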
7 changes: 3 additions & 4 deletions examples/fastspeech2/aishell3/synthesize_e2e.py
@@ -20,14 +20,13 @@
import paddle
import soundfile as sf
import yaml
from yacs.config import CfgNode
from parakeet.frontend.cn_frontend import Frontend
from parakeet.models.fastspeech2 import FastSpeech2
from parakeet.models.fastspeech2 import FastSpeech2Inference
from parakeet.models.parallel_wavegan import PWGGenerator
from parakeet.models.parallel_wavegan import PWGInference
from parakeet.modules.normalizer import ZScore

from frontend import Frontend
from yacs.config import CfgNode


def evaluate(args, fastspeech2_config, pwg_config):
@@ -67,7 +66,7 @@ def evaluate(args, fastspeech2_config, pwg_config):
vocoder.eval()
print("model done!")

frontend = Frontend(args.phones_dict)
frontend = Frontend(phone_vocab_path=args.phones_dict)
print("frontend done!")

stat = np.load(args.fastspeech2_stat)