Fix sil (#22)

* Update infore dataset
* Use new textgrid data.
  - Update download URL and hash.
  - Use sil instead of sp.
  - Normalize audio to match HiFiGAN preprocessing.
  - Random dropout of tokens when training duration model to prevent overfitting.
* Load phoneme set from config instead of from lexicon file. This keeps the phoneme set unchanged even if the dataset or the lexicon file changes.
* Use `jax.tree_map` instead of `jax.tree_multimap`.
* Better log file names
* Remove Colab links in notebooks
* Fix `zero_silence_segments` script.
* Update pretrained models
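
The commit message mentions random dropout of tokens during duration-model training but does not say how it is done. A minimal sketch of one common form follows, assuming dropped tokens are replaced by a placeholder id rather than removed, so the sequence length stays aligned with the duration targets. The names `drop_tokens`, `drop_prob`, and `mask_id` are hypothetical and are not taken from this repository.

```python
import random


def drop_tokens(tokens, drop_prob=0.1, mask_id=0, rng=None):
    """Replace each token with `mask_id` with probability `drop_prob`.

    Hypothetical helper for illustration only; the commit's actual
    dropout scheme may differ (for example, it may use another
    placeholder id or a different probability).
    """
    if rng is None:
        rng = random.Random()
    # The output has the same length as the input, so per-token
    # duration targets still line up with the (partly masked) tokens.
    return [mask_id if rng.random() < drop_prob else t for t in tokens]


# Usage: mask roughly 10% of the phoneme ids in one training example.
phoneme_ids = [5, 12, 7, 7, 3, 9]
noisy_ids = drop_tokens(phoneme_ids, drop_prob=0.1, rng=random.Random(0))
```

Replacing tokens instead of deleting them is a design choice made for this sketch: it leaves the shapes of the inputs and targets unchanged, so the rest of a training step would not need to know that dropout was applied.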
NTT123 · May 16, 2022 · 8d2ee1f · longvndt602 · Dec 4, 2023 · 8d2ee1f
1 parent 07a5d8a
commit 8d2ee1f

Showing 18 changed files with 8,346 additions and 4,454 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 A Vietnamese TTS
 ================
 
-Tacotron + HiFiGAN vocoder for vietnamese datasets.
+Duration model + Acoustic model + HiFiGAN vocoder for vietnamese text-to-speech application.
 
 Online demo at https://huggingface.co/spaces/ntt123/vietTTS.
 
@@ -32,12 +32,13 @@ Download InfoRe dataset
 -----------------------
 
 ```sh
-bash ./scripts/download_aligned_infore_dataset.sh
+python ./scripts/download_aligned_infore_dataset.py
 ```
 
 **Note**: this is a denoised and aligned version of the original dataset which is donated by the InfoRe Technology company (see [here](https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/)). You can download the original dataset (**InfoRe Technology 1**) at [here](https://github.com/TensorSpeech/TensorFlowASR/blob/main/README.md#vietnamese).
 
-The Montreal Forced Aligner (MFA) is used to align transcript and speech (textgrid files). [Here](https://colab.research.google.com/gist/NTT123/c99b5a391af56e0cb8f7b190d3d7f0ee/infore-mfa-example.ipynb) is a Colab notebook to align InfoRe dataset. Visit [MFA](https://montreal-forced-aligner.readthedocs.io/en/latest/) for more information on how to create textgrid files.
+See `notebooks/denoise_infore_dataset.ipynb` for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align transcript and speech (textgrid files). 
+See `notebooks/align_text_audio_infore_mfa.ipynb` for instructions on how to create textgrid files.
 
 Train duration model
 --------------------

diff --git a/assets/infore/clip.wav b/assets/infore/clip.wav