Replies: 1 comment
-
Let's see...
This looks rather similar to the problem in #5181, so I suspect that the fix is simply to transpose your synthesized mels when you save them, before training. As for WaveGlow training, is there any additional error message? Are you on the latest version of NeMo? Paths are being updated, so there may be some mismatch.
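To make the transpose concrete, something like this minimal sketch, assuming generate_spectrogram returns a tensor shaped [batch, n_mels, time] and your vocoder dataset expects mels stored as [time, n_mels]; verify both shapes against your checkpoint:

```python
import numpy as np

# spec_model is the trained FastPitch checkpoint from this thread;
# the input text and output path are placeholders.
tokens = spec_model.parse("hello world")
spec = spec_model.generate_spectrogram(tokens=tokens)   # assumed [1, n_mels, T]
mel = spec.squeeze(0).detach().cpu().numpy()            # [n_mels, T]
np.save("mels/0001.npy", mel.T)                         # stored as [T, n_mels]
```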
-
I tried to use https://github.com/NVIDIA/NeMo/blob/main/scripts/dataset_processing/tts/generate_mels.py, but it only works for FastPitch.
Even when I tried removing the unsupported steps:
```python
spec_model.preprocessor(input_signal=audio, length=audio_len)
if spec_model.fastpitch.speaker_emb is not None and "speaker" in r:
```
If I just generate the spectrogram with spec_model.generate_spectrogram(tokens=text), it does not work for the HiFi-GAN vocoder, and HiFi-GAN training fails at startup with this error:
As I understand it, this is because of a difference in alignment between the mels computed from the wav files and the generated mels; we need to use the ground-truth alignment so that audio and mels match correctly. I found the forward method of the Tacotron 2 model, but I don't know what the audio and audio_len parameters mean: there are no examples to be found on the internet, and the code is too complicated to work it out from. Maybe audio is a blob, and audio_len is seconds, milliseconds, or a size in bytes?
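My best guess from the preprocessor call above, assuming audio is the raw waveform tensor of shape [batch, num_samples] and audio_len is the sample count (not seconds or bytes), would be something like this; the wav path is a placeholder and a mono file is assumed:

```python
import torch
import soundfile as sf

# Load the wav as float32 samples; `audio` is [1, num_samples] and
# `audio_len` is the per-item number of samples.
wav, sr = sf.read("wavs/0001.wav", dtype="float32")
audio = torch.tensor(wav).unsqueeze(0)        # [1, num_samples]
audio_len = torch.tensor([audio.shape[1]])    # sample count

# The preprocessor returns the mel spectrogram and its length in frames,
# aligned with the audio by construction.
spec, spec_len = spec_model.preprocessor(input_signal=audio, length=audio_len)
```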
And for the WaveGlow vocoder nothing works: I tried generating mels with FastPitch, which works for the HiFi-GAN vocoder, but WaveGlow always gives me an error:
Example of my manifest (output of `head train_manifest_ft.json`):
I tried absolute paths, but it had no effect.
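For reference, this is roughly how I build each manifest entry, one JSON object per line; the paths, duration, and text below are placeholders, and mel_filepath is the key that generate_mels.py writes, as far as I can tell:

```python
import json

# Hypothetical manifest entry for vocoder fine-tuning with precomputed mels.
entry = {
    "audio_filepath": "/data/wavs/0001.wav",   # absolute path to the wav
    "duration": 3.27,                          # seconds
    "text": "hello world",
    "mel_filepath": "/data/mels/0001.npy",     # mel saved during preprocessing
}
with open("train_manifest_ft.json", "a") as f:
    f.write(json.dumps(entry) + "\n")
```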
Thanks for the help.