Train Tacotron2 model with Japanese #3980
Unanswered
easyautoml asked this question in General Q&A
Replies: 0 comments
Hello everyone,
I’m currently training a Tacotron2 model using the Kokoro-speech-tiny dataset, which contains approximately 6,000 samples and 14 hours of audio.
I trained the model for 100 epochs, which took two days, but the trained model produces only noise.
I checked the tokenized data and the mel-spectrograms used as input for the model. The mel-spectrograms seem fine because they convert well into audio when using a vocoder.
However, in TensorBoard I noticed that the average alignment error stays high and has not improved over the epochs.
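To make clear what a "high" alignment error means here, below is a minimal sketch of how such a metric is commonly computed. It assumes the error is one minus the average peak attention weight per decoder step; the exact definition and axis order in the training framework may differ, and the name `alignment_error` is hypothetical.

```python
import numpy as np

def alignment_error(alignments: np.ndarray) -> float:
    """Hypothetical helper: 1 - mean peak attention weight.

    alignments: array of shape (batch, decoder_steps, encoder_steps),
    where each decoder step's weights over encoder steps sum to 1.
    """
    # Peak attention weight at each decoder step.
    peak = alignments.max(axis=2)
    # Average over decoder steps, then over the batch.
    score = peak.mean(axis=1).mean(axis=0)
    return float(1.0 - score)

# Sharp attention: each decoder step attends to exactly one input token.
sharp = np.eye(4)[None, :, :]
# Flat attention: each decoder step spreads its weight over all tokens.
flat = np.full((1, 4, 4), 0.25)

print(alignment_error(sharp))  # 0.0
print(alignment_error(flat))   # 0.75
```

Under this assumption, an error that stays close to the flat case means the attention never locks onto individual input tokens, which would be consistent with the decoder producing noise.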
Here are my settings. Could you please provide some advice on how I can improve the model to generate intelligible speech?