Merge pull request #191 from yt605155624/refactor

refactor of examples
PaddlePaddle · Oct 12, 2021 · 4f97ff2
2 parents 49caf86 + a603924
commit 4f97ff2
Showing 103 changed files with 540 additions and 1,064 deletions.
diff --git a/README.md b/README.md
@@ -7,12 +7,14 @@ Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-spee
 
 ## News  <img src="./docs/images/news_icon.png" width="40"/>
 
+- Oct-12-2021, Parallel WaveGAN with LJSpeech. Check [examples/GANVocoder/parallelwave_gan/ljspeech](./examples/GANVocoder/parallelwave_gan/ljspeech).
+- Oct-12-2021, FastSpeech2/FastPitch with LJSpeech. Check [examples/fastspeech2/ljspeech](./examples/fastspeech2/ljspeech).
 - Sep-14-2021, Reconstruction of TransformerTTS. Check [examples/transformer_tts/ljspeech](./examples/transformer_tts/ljspeech).
 - Aug-31-2021, Chinese Text Frontend. Check [examples/text_frontend](./examples/text_frontend).
 - Aug-23-2021, FastSpeech2/FastPitch with AISHELL-3. Check [examples/fastspeech2/aishell3](./examples/fastspeech2/aishell3).
 - Aug-03-2021, FastSpeech2/FastPitch with CSMSC. Check [examples/fastspeech2/baker](./examples/fastspeech2/baker).
 - Jul-19-2021, SpeedySpeech with CSMSC. Check [examples/speedyspeech/baker](./examples/speedyspeech/baker).
-- Jul-01-2021, Parallel WaveGAN with CSMSC. Check [examples/parallelwave_gan/baker](./examples/parallelwave_gan/baker).
+- Jul-01-2021, Parallel WaveGAN with CSMSC. Check [examples/GANVocoder/parallelwave_gan/baker](./examples/GANVocoder/parallelwave_gan/baker).
 - Jul-01-2021, Montreal-Forced-Aligner. Check  [examples/use_mfa](./examples/use_mfa).
 - May-07-2021, Voice Cloning in Chinese. Check [examples/tacotron2_aishell3](./examples/tacotron2_aishell3).
 
@@ -68,7 +70,7 @@ Entries to the introduction, and the launch of training and synthsis for differe
 - [>>> Chinese Text Frontend](./examples/text_frontend)
 - [>>> FastSpeech2/FastPitch](./examples/fastspeech2)
 - [>>> Montreal-Forced-Aligner](./examples/use_mfa)
-- [>>> Parallel WaveGAN](./examples/parallelwave_gan)
+- [>>> Parallel WaveGAN](./examples/GANVocoder/parallelwave_gan)
 - [>>> SpeedySpeech](./examples/speedyspeech)
 - [>>> Tacotron2_AISHELL3](./examples/tacotron2_aishell3)
 - [>>> GE2E](./examples/ge2e)
@@ -87,9 +89,10 @@ Check our [website](https://paddleparakeet.readthedocs.io/en/latest/demo.html) f
 #### FastSpeech2/FastPitch
 1. [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
 2. [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
+3. [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
 
 #### SpeedySpeech
-1. [speedyspeech_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_baker_ckpt_0.4.zip)
+1. [speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)
 
 #### TransformerTTS
 
@@ -109,6 +112,7 @@ Check our [website](https://paddleparakeet.readthedocs.io/en/latest/demo.html) f
 #### Parallel WaveGAN
 
 1. [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip)
+2. [pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_ljspeech_ckpt_0.5.zip)
 
 ### Voice Cloning
 

diff --git a/examples/GANVocoder/README.md b/examples/GANVocoder/README.md
@@ -0,0 +1 @@
+different GAN Vocoders have the same preprocess.py and normalize.py
diff --git a/utils/vocoder_normalize.py → examples/GANVocoder/normalize.py b/utils/vocoder_normalize.py → examples/GANVocoder/normalize.py
diff --git a/examples/parallelwave_gan/baker/README.md → ...NVocoder/parallelwave_gan/baker/README.md b/examples/parallelwave_gan/baker/README.md → ...NVocoder/parallelwave_gan/baker/README.md
@@ -37,19 +37,19 @@ Also there is a `metadata.jsonl` in each subfolder. It is a table-like file whic
 
 ## Train the model
 
-`./run.sh` calls `Parakeet/utils/pwg_train.py`.
+`./run.sh` calls `../train.py`.
 ```bash
 ./run.sh
 ```
 Here's the complete help message.
 
 ```text
-usage: pwg_train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
-                    [--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
-                    [--device DEVICE] [--nprocs NPROCS] [--verbose VERBOSE]
-                    [--batch-size BATCH_SIZE] [--max-iter MAX_ITER]
-                    [--run-benchmark RUN_BENCHMARK]
-                    [--profiler_options PROFILER_OPTIONS]
+usage: train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
+                [--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
+                [--device DEVICE] [--nprocs NPROCS] [--verbose VERBOSE]
+                [--batch-size BATCH_SIZE] [--max-iter MAX_ITER]
+                [--run-benchmark RUN_BENCHMARK]
+                [--profiler_options PROFILER_OPTIONS]
 
 Train a ParallelWaveGAN model.
 
@@ -102,14 +102,14 @@ pwg_baker_ckpt_0.4
 
 ## Synthesize
 
-`synthesize.sh` calls `Parakeet/utils/pwg_syn.py `, which can synthesize waveform from `metadata.jsonl`.
+`synthesize.sh` calls `../synthesize.py `, which can synthesize waveform from `metadata.jsonl`.
 ```bash
 ./synthesize.sh
 ```
 ```text
-usage: pwg_syn.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
-                  [--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
-                  [--device DEVICE] [--verbose VERBOSE]
+usage: synthesize.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
+                     [--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
+                     [--device DEVICE] [--verbose VERBOSE]
 
 Synthesize with parallel wavegan.
 

diff --git a/.../parallelwave_gan/baker/conf/default.yaml → .../parallelwave_gan/baker/conf/default.yaml b/.../parallelwave_gan/baker/conf/default.yaml → .../parallelwave_gan/baker/conf/default.yaml
@@ -15,10 +15,7 @@ window: "hann"           # Window function.
 n_mels: 80               # Number of mel basis.
 fmin: 80                 # Minimum freq in mel basis calculation. (Hz)
 fmax: 7600               # Maximum frequency in mel basis calculation. (Hz)
-trim_silence: false      # Whether to trim the start and end of silence.
-top_db: 60               # Need to tune carefully if the recording is not good.
-trim_frame_length: 2048    # Frame size in trimming. (in samples)
-trim_hop_length: 512       # Hop size in trimming. (in samples)
+
 
 ###########################################################
 #         GENERATOR NETWORK ARCHITECTURE SETTING          #

diff --git a/...ples/parallelwave_gan/baker/preprocess.sh → ...oder/parallelwave_gan/baker/preprocess.sh b/...ples/parallelwave_gan/baker/preprocess.sh → ...oder/parallelwave_gan/baker/preprocess.sh
@@ -3,24 +3,20 @@
 stage=0
 stop_stage=100
 
-fs=24000
-n_shift=300
-
-export MAIN_ROOT=`realpath ${PWD}/../../../`
+export MAIN_ROOT=`realpath ${PWD}/../../../../`
 
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
     # get durations from MFA's result
     echo "Generate durations.txt from MFA results ..."
     python3 ${MAIN_ROOT}/utils/gen_duration_from_textgrid.py \
         --inputdir=./baker_alignment_tone \
         --output=durations.txt \
-        --sample-rate=${fs} \
-        --n-shift=${n_shift}
+        --config=conf/default.yaml
 fi
 
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
     echo "Extract features ..."
-    python3 ${MAIN_ROOT}/utils/vocoder_preprocess.py \
+    python3 ../../preprocess.py \
         --rootdir=~/datasets/BZNSYP/ \
         --dataset=baker \
         --dumpdir=dump \
@@ -42,16 +38,16 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # normalize, dev and test should use train's stats
     echo "Normalize ..."
 
-    python3 ${MAIN_ROOT}/utils/vocoder_normalize.py \
+    python3 ../../normalize.py \
         --metadata=dump/train/raw/metadata.jsonl \
         --dumpdir=dump/train/norm \
         --stats=dump/train/feats_stats.npy
-    python3 ${MAIN_ROOT}/utils/vocoder_normalize.py \
+    python3 ../../normalize.py \
         --metadata=dump/dev/raw/metadata.jsonl \
         --dumpdir=dump/dev/norm \
         --stats=dump/train/feats_stats.npy
 
-    python3 ${MAIN_ROOT}/utils/vocoder_normalize.py \
+    python3 ../../normalize.py \
         --metadata=dump/test/raw/metadata.jsonl \
         --dumpdir=dump/test/norm \
         --stats=dump/train/feats_stats.npy
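
Reviewer note: moving the example from `examples/parallelwave_gan/baker` to `examples/GANVocoder/parallelwave_gan/baker` puts it one directory deeper, which appears to be why `MAIN_ROOT` gains a fourth `..` while the shared scripts are now reached through short relative paths. The sketch below only illustrates that path arithmetic. The `resolve` helper is hypothetical, and the repository root is assumed to be named `Parakeet`, as in the old README wording (`Parakeet/utils/pwg_train.py`).

```python
import posixpath
from pathlib import PurePosixPath

# Assumption: the repository root directory is called "Parakeet".
OLD_DIR = PurePosixPath("Parakeet/examples/parallelwave_gan/baker")
NEW_DIR = PurePosixPath("Parakeet/examples/GANVocoder/parallelwave_gan/baker")


def resolve(cwd: PurePosixPath, relative: str) -> str:
    """Hypothetical helper: collapse `..` segments for a script run from `cwd`.

    This is purely lexical; symlinks are ignored, unlike `realpath`.
    """
    return posixpath.normpath(posixpath.join(str(cwd), relative))


# Three levels up used to reach the root; four are needed after the move.
old_root = resolve(OLD_DIR, "../../../")
new_root = resolve(NEW_DIR, "../../../../")

# Shared scripts now sit one or two levels above the example directory.
preprocess = resolve(NEW_DIR, "../../preprocess.py")
train = resolve(NEW_DIR, "../train.py")

print(old_root, new_root)
print(preprocess)
print(train)
```

One consequence worth keeping in mind: the relative paths only resolve this way when the scripts are launched from inside the example directory itself.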

diff --git a/examples/parallelwave_gan/baker/run.sh → .../GANVocoder/parallelwave_gan/baker/run.sh b/examples/parallelwave_gan/baker/run.sh → .../GANVocoder/parallelwave_gan/baker/run.sh
@@ -1,10 +1,8 @@
 #!/bin/bash
 
-export MAIN_ROOT=`realpath ${PWD}/../../../`
-
 FLAGS_cudnn_exhaustive_search=true \
 FLAGS_conv_workspace_size_limit=4000 \
-python ${MAIN_ROOT}/utils/pwg_train.py \
+python ../train.py \
     --train-metadata=dump/train/norm/metadata.jsonl \
     --dev-metadata=dump/dev/norm/metadata.jsonl \
     --config=conf/default.yaml \

diff --git a/...ples/parallelwave_gan/baker/synthesize.sh → ...oder/parallelwave_gan/baker/synthesize.sh b/...ples/parallelwave_gan/baker/synthesize.sh → ...oder/parallelwave_gan/baker/synthesize.sh
@@ -1,8 +1,6 @@
 #!/bin/bash
 
-export MAIN_ROOT=`realpath ${PWD}/../../../`
-
-python3 ${MAIN_ROOT}/utils/pwg_syn.py \
+python3 ../synthesize.py \
   --config=conf/default.yaml \
   --checkpoint=exp/default/checkpoints/snapshot_iter_400000.pdz\
   --test-metadata=dump/test/norm/metadata.jsonl \

diff --git a/...llelwave_gan/baker/synthesize_from_wav.py → ...llelwave_gan/baker/synthesize_from_wav.py b/...llelwave_gan/baker/synthesize_from_wav.py → ...llelwave_gan/baker/synthesize_from_wav.py
@@ -76,7 +76,8 @@ def evaluate(args, config):
         # extract mel feats
         mel = mel_extractor.get_log_mel_fbank(wav)
         mel = paddle.to_tensor(mel)
-        gen_wav = pwg_inference(mel)
+        with paddle.no_grad():
+            gen_wav = pwg_inference(mel)
         sf.write(
             str(output_dir / ("gen_" + utt_name)),
             gen_wav.numpy(),

diff --git a/examples/parallelwave_gan/ljspeech/README.md → ...coder/parallelwave_gan/ljspeech/README.md b/examples/parallelwave_gan/ljspeech/README.md → ...coder/parallelwave_gan/ljspeech/README.md
@@ -39,19 +39,19 @@ The dataset is split into 3 parts, namely `train`, `dev` and `test`, each of whi
 Also there is a `metadata.jsonl` in each subfolder. It is a table-like file which contains id and paths to spectrogam of each utterance.
 
 ## Train the model
-`./run.sh` calls `Parakeet/utils/pwg_train.py`.
+`./run.sh` calls `../train.py`.
 ```bash
 ./run.sh
 ```
 Here's the complete help message.
 
 ```text
-usage: pwg_train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
-                    [--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
-                    [--device DEVICE] [--nprocs NPROCS] [--verbose VERBOSE]
-                    [--batch-size BATCH_SIZE] [--max-iter MAX_ITER]
-                    [--run-benchmark RUN_BENCHMARK]
-                    [--profiler_options PROFILER_OPTIONS]
+usage: train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
+                     [--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
+                     [--device DEVICE] [--nprocs NPROCS] [--verbose VERBOSE]
+                     [--batch-size BATCH_SIZE] [--max-iter MAX_ITER]
+                     [--run-benchmark RUN_BENCHMARK]
+                     [--profiler_options PROFILER_OPTIONS]
 
 Train a ParallelWaveGAN model.
 
@@ -102,14 +102,14 @@ pwg_ljspeech_ckpt_0.5
 ```
 
 ## Synthesize
-`synthesize.sh` calls `Parakeet/utils/pwg_syn.py `, which can synthesize waveform from `metadata.jsonl`.
+`synthesize.sh` calls `../synthesize.py `, which can synthesize waveform from `metadata.jsonl`.
 ```bash
 ./synthesize.sh
 ```
 ```text
-usage: pwg_syn.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
-                  [--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
-                  [--device DEVICE] [--verbose VERBOSE]
+usage: synthesize.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
+                     [--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
+                     [--device DEVICE] [--verbose VERBOSE]
 
 Synthesize with parallel wavegan.
 

diff --git a/...rallelwave_gan/ljspeech/conf/default.yaml → ...rallelwave_gan/ljspeech/conf/default.yaml b/...rallelwave_gan/ljspeech/conf/default.yaml → ...rallelwave_gan/ljspeech/conf/default.yaml
@@ -15,10 +15,6 @@ window: "hann"           # Window function.
 n_mels: 80               # Number of mel basis.
 fmin: 80                 # Minimum freq in mel basis calculation. (Hz)
 fmax: 7600               # Maximum frequency in mel basis calculation. (Hz)
-trim_silence: false      # Whether to trim the start and end of silence.
-top_db: 60               # Need to tune carefully if the recording is not good.
-trim_frame_length: 2048    # Frame size in trimming. (in samples)
-trim_hop_length: 512       # Hop size in trimming. (in samples)
 
 ###########################################################
 #         GENERATOR NETWORK ARCHITECTURE SETTING          #

diff --git a/...s/parallelwave_gan/ljspeech/preprocess.sh → ...r/parallelwave_gan/ljspeech/preprocess.sh b/...s/parallelwave_gan/ljspeech/preprocess.sh → ...r/parallelwave_gan/ljspeech/preprocess.sh
@@ -3,25 +3,21 @@
 stage=0
 stop_stage=100
 
-fs=22050
-n_shift=256
-
-export MAIN_ROOT=`realpath ${PWD}/../../../`
+export MAIN_ROOT=`realpath ${PWD}/../../../../`
 
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
     # get durations from MFA's result
     echo "Generate durations.txt from MFA results ..."
     python3 ${MAIN_ROOT}/utils/gen_duration_from_textgrid.py \
         --inputdir=./ljspeech_alignment \
         --output=durations.txt \
-        --sample-rate=${fs} \
-        --n-shift=${n_shift}
+        --config=conf/default.yaml
 fi
 
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
     # extract features
     echo "Extract features ..."
-    python3 ${MAIN_ROOT}/utils/vocoder_preprocess.py \
+    python3 ../../preprocess.py \
         --rootdir=~/datasets/LJSpeech-1.1/ \
         --dataset=ljspeech \
         --dumpdir=dump \
@@ -43,16 +39,16 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # normalize, dev and test should use train's stats
     echo "Normalize ..."
 
-    python3 ${MAIN_ROOT}/utils/vocoder_normalize.py \
+    python3 ../../normalize.py \
         --metadata=dump/train/raw/metadata.jsonl \
         --dumpdir=dump/train/norm \
         --stats=dump/train/feats_stats.npy
-    python3 ${MAIN_ROOT}/utils/vocoder_normalize.py \
+    python3 ../../normalize.py \
         --metadata=dump/dev/raw/metadata.jsonl \
         --dumpdir=dump/dev/norm \
         --stats=dump/train/feats_stats.npy
 
-    python3 ${MAIN_ROOT}/utils/vocoder_normalize.py \
+    python3 ../../normalize.py \
         --metadata=dump/test/raw/metadata.jsonl \
         --dumpdir=dump/test/norm \
         --stats=dump/train/feats_stats.npy
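
Reviewer note: the comment "dev and test should use train's stats" is the key point of this stage, since all three `normalize.py` calls pass the same `dump/train/feats_stats.npy`. Below is a minimal sketch of that idea, assuming the normalization is a per-dimension standardization (an assumption; the diff does not show the body of `normalize.py`). The function names `fit_stats` and `normalize` are hypothetical, and plain lists stand in for the feature arrays so the example stays self-contained.

```python
import statistics


def fit_stats(train_utts):
    """Per-dimension mean and std over every frame of every training utterance."""
    frames = [frame for utt in train_utts for frame in utt]
    columns = list(zip(*frames))  # one tuple of values per feature dimension
    mean = [statistics.fmean(col) for col in columns]
    std = [statistics.pstdev(col) for col in columns]
    return mean, std


def normalize(utt, mean, std):
    """Standardize one utterance with statistics fitted elsewhere.

    A zero std is replaced by 1.0 so constant dimensions do not divide by zero.
    """
    return [
        [(x - m) / (s if s > 0 else 1.0) for x, m, s in zip(frame, mean, std)]
        for frame in utt
    ]


# Toy data: one training utterance with two frames of two features each.
train = [[[0.0, 10.0], [2.0, 30.0]]]
dev_utt = [[3.0, 20.0]]

# Statistics come from train only and are then reused for dev (and test).
mean, std = fit_stats(train)
dev_norm = normalize(dev_utt, mean, std)
print(mean, std, dev_norm)
```

Fitting the statistics on the training split alone keeps dev and test on the same scale the model saw during training and avoids leaking held-out data into preprocessing.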

diff --git a/examples/parallelwave_gan/ljspeech/run.sh → ...NVocoder/parallelwave_gan/ljspeech/run.sh b/examples/parallelwave_gan/ljspeech/run.sh → ...NVocoder/parallelwave_gan/ljspeech/run.sh
@@ -1,10 +1,8 @@
 #!/bin/bash
 
-export MAIN_ROOT=`realpath ${PWD}/../../../`
-
 FLAGS_cudnn_exhaustive_search=true \
 FLAGS_conv_workspace_size_limit=4000 \
-python ${MAIN_ROOT}/utils/pwg_train.py \
+python ../train.py \
     --train-metadata=dump/train/norm/metadata.jsonl \
     --dev-metadata=dump/dev/norm/metadata.jsonl \
     --config=conf/default.yaml \

diff --git a/...s/parallelwave_gan/ljspeech/synthesize.sh → ...r/parallelwave_gan/ljspeech/synthesize.sh b/...s/parallelwave_gan/ljspeech/synthesize.sh → ...r/parallelwave_gan/ljspeech/synthesize.sh
@@ -1,8 +1,6 @@
 #!/bin/bash
 
-export MAIN_ROOT=`realpath ${PWD}/../../../`
-
-python3 ${MAIN_ROOT}/utils/pwg_syn.py \
+python3 ../synthesize.py \
   --config=conf/default.yaml \
   --checkpoint=exp/default/checkpoints/snapshot_iter_400000.pdz\
   --test-metadata=dump/test/norm/metadata.jsonl \

diff --git a/utils/pwg_syn.py → ...GANVocoder/parallelwave_gan/synthesize.py b/utils/pwg_syn.py → ...GANVocoder/parallelwave_gan/synthesize.py
@@ -80,7 +80,8 @@ def main():
         mel = example['feats']
         mel = paddle.to_tensor(mel)  # (T, C)
         with timer() as t:
-            wav = generator.inference(c=mel)
+            with paddle.no_grad():
+                wav = generator.inference(c=mel)
             wav = wav.numpy()
             N += wav.size
             T += t.elapse

diff --git a/utils/pwg_train.py → ...ples/GANVocoder/parallelwave_gan/train.py b/utils/pwg_train.py → ...ples/GANVocoder/parallelwave_gan/train.py
@@ -32,11 +32,11 @@
 from parakeet.datasets.vocoder_batch_fn import Clip
 from parakeet.models.parallel_wavegan import PWGGenerator
 from parakeet.models.parallel_wavegan import PWGDiscriminator
+from parakeet.models.parallel_wavegan import PWGUpdater
+from parakeet.models.parallel_wavegan import PWGEvaluator
 from parakeet.modules.stft_loss import MultiResolutionSTFTLoss
 from parakeet.training.extensions.snapshot import Snapshot
 from parakeet.training.extensions.visualizer import VisualDL
-from parakeet.training.pwg_updater import PWGUpdater
-from parakeet.training.pwg_updater import PWGEvaluator
 from parakeet.training.seeding import seed_everything
 from parakeet.training.trainer import Trainer
 from pathlib import Path

diff --git a/utils/vocoder_preprocess.py → examples/GANVocoder/preprocess.py b/utils/vocoder_preprocess.py → examples/GANVocoder/preprocess.py
@@ -132,14 +132,6 @@ def process_sentence(config: Dict[str, Any],
             start, end = librosa.time_to_samples([start, end], sr=config.fs)
             y = y[start:end]
 
-        # energy based silence trimming
-        if config.trim_silence:
-            y, _ = librosa.effects.trim(
-                y,
-                top_db=config.top_db,
-                frame_length=config.trim_frame_length,
-                hop_length=config.trim_hop_length)
-
         # extract mel feats
         logmel = mel_extractor.get_log_mel_fbank(y)
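
Reviewer note: this hunk drops the optional energy-based trimming, together with the `trim_silence`, `top_db`, `trim_frame_length` and `trim_hop_length` keys removed from both `conf/default.yaml` files. For readers unfamiliar with what that step did, here is a rough pure-Python sketch of the idea: drop leading and trailing frames whose RMS is more than `top_db` below the loudest frame. It is not librosa's implementation (as far as I know `librosa.effects.trim` centers and pads its frames, which this version does not), and the function name is hypothetical. The defaults mirror the removed config values.

```python
import math


def trim_silence(y, top_db=60.0, frame_length=2048, hop_length=512):
    """Simplified edge trimming on a list of samples (illustration only)."""
    if len(y) < frame_length:
        return list(y)  # too short to frame; leave untouched

    # RMS energy of each non-centered frame.
    starts = range(0, len(y) - frame_length + 1, hop_length)
    rms = [
        math.sqrt(sum(v * v for v in y[s:s + frame_length]) / frame_length)
        for s in starts
    ]

    peak = max(rms)
    if peak == 0.0:
        return []  # the whole signal is silent

    # A frame counts as non-silent if it is within top_db of the peak.
    threshold = peak * 10 ** (-top_db / 20)
    loud = [i for i, r in enumerate(rms) if r > threshold]

    first = loud[0] * hop_length
    last = min(len(y), loud[-1] * hop_length + frame_length)
    return list(y[first:last])


# Toy signal: 8 silent samples, 8 loud samples, 8 silent samples.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
trimmed = trim_silence(signal, top_db=60.0, frame_length=4, hop_length=2)
print(len(signal), len(trimmed))
```

Because the kept region is bounded by whole frames, a little silence survives on each side of the loud segment in the toy example.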
 

diff --git a/examples/fastspeech2/aishell3/README.md b/examples/fastspeech2/aishell3/README.md
@@ -48,21 +48,18 @@ The dataset is split into 3 parts, namely `train`, `dev` and` test`, each of whi
 Also there is a `metadata.jsonl` in each subfolder. It is a table-like file which contains phones, text_lengths, speech_lengths, durations, path of speech features, path of pitch features, path of energy features, speaker and id of each utterance.
 
 ## Train the model
-`./run.sh` calls `Parakeet/utils/multi_spk_fs2_train.py`.
+`./run.sh` calls `../train.py`.
 ```bash
 ./run.sh
 ```
 Here's the complete help message.
 ```text
-usage: multi_spk_fs2_train.py [-h] [--config CONFIG]
-                              [--train-metadata TRAIN_METADATA]
-                              [--dev-metadata DEV_METADATA]
-                              [--output-dir OUTPUT_DIR] [--device DEVICE]
-                              [--nprocs NPROCS] [--verbose VERBOSE]
-                              [--phones-dict PHONES_DICT]
-                              [--speaker-dict SPEAKER_DICT]
+usage: train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
+                [--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
+                [--device DEVICE] [--nprocs NPROCS] [--verbose VERBOSE]
+                [--phones-dict PHONES_DICT] [--speaker-dict SPEAKER_DICT]
 
-Train a FastSpeech2 model with multiple speaker dataset.
+Train a FastSpeech2 model.
 
 optional arguments:
   -h, --help            show this help message and exit
@@ -79,7 +76,7 @@ optional arguments:
   --phones-dict PHONES_DICT
                         phone vocabulary file.
   --speaker-dict SPEAKER_DICT
-                        speaker id map file.
+                        speaker id map file for multiple speaker model.
 ```
 1. `--config` is a config file in yaml format to overwrite the default config, which can be found at `conf/default.yaml`.
 2. `--train-metadata` and `--dev-metadata` should be the metadata file in the normalized subfolder of `train` and `dev` in the `dump` folder.
@@ -148,7 +145,7 @@ optional arguments:
   --phones-dict PHONES_DICT
                         phone vocabulary file.
   --speaker-dict SPEAKER_DICT
-                        speaker id map file.
+                        speaker id map file for multiple speaker model.
   --test-metadata TEST_METADATA
                         test metadata.
   --output-dir OUTPUT_DIR