Cannot generate Mel spectrograms by Tacotron2 model for waveglow and hifigan vocoders #6188

pivolan · 2023-03-14T00:33:37Z

pivolan
Mar 14, 2023

I tried to use https://github.com/NVIDIA/NeMo/blob/main/scripts/dataset_processing/tts/generate_mels.py but it works only for fastpitch.
Event If I tried remove unsupported steps:

spec_model.preprocessor(input_signal=audio, length=audio_len)

if spec_model.fastpitch.speaker_emb is not None and "speaker" in r:

spectrogram = spec_model.forward(
                text=text,
                input_lens=text_len,
                spec=spect,
                mel_lens=spect_len,
                attn_prior=attn_prior,
                speaker=speaker,
            )[0]

If I just generate spectrogram with method spec_model.generate_spectrogram(tokens=text) it does not work for hifigan vocoder and generate this error on start training hifigan:

ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.9/site-packages/nemo/collections/tts/torch/data.py", line 1006, in __getitem__
    start = random.randint(0, mel.shape[1] - frames - 1)
  File "/opt/conda/lib/python3.9/random.py", line 338, in randint
    return self.randrange(a, b+1)
  File "/opt/conda/lib/python3.9/random.py", line 316, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, -178, -178)

As I understood it is because difference in alignments between mels from wavs files and mels generated, we need to use ground truth alignment for correct matching between audio and mels. I found method forward for tacotron2 model, but don't know what audio and audio_len params means, because there are no any examples found in the internet, and by code it is too complicated to find out. may audio is blob, and audio_len is seconds, ms or size in bytes?

and for waveglow vocoder nothing works, I tried to generate mels by fastpitch, that works for hifigan vocoder, but waveglow always give me an error:

root@33df3b9ba8bf:/content/internal/nemoFlemishTTS# python ./waveglow.py --config-path . --config-name=waveglow.yaml model.train_ds.dataloader_params.batch_size=32 +trainer.max_epochs=500 model.optim.lr=0.00001 train_dataset=/content/internal/dataFlemishTTS/VRT/synmels_vocoder/hifigan_train_manifest_ft.json validation_datasets=/content/internal/dataFlemishTTS/VRT/synmels_vocoder/hifigan_val_manifest_ft.json +init_from_nemo_model=tts_waveglow.nemo
[NeMo W 2023-03-14 00:14:12 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-03-14 00:14:13 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-03-14 00:14:13 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-03-14 00:14:13 nemo_logging:349] /opt/conda/lib/python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
    See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(

Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo I 2023-03-14 00:14:13 exp_manager:343] Experiments will be logged at /content/internal/nemoFlemishTTS/nemo_experiments/WaveGlow/2023-03-14_00-14-13
[NeMo I 2023-03-14 00:14:13 exp_manager:718] TensorboardLogger has been set up
[NeMo W 2023-03-14 00:14:13 exp_manager:988] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to -1. Please ensure that max_steps will run for at least 25 epochs to ensure that checkpointing will not error out.
Error executing job with overrides: ['model.train_ds.dataloader_params.batch_size=32', '+trainer.max_epochs=500', 'model.optim.lr=0.00001', 'train_dataset=/content/internal/dataFlemishTTS/VRT/synmels_vocoder/hifigan_train_manifest_ft.json', 'validation_datasets=/content/internal/dataFlemishTTS/VRT/synmels_vocoder/hifigan_val_manifest_ft.json', '+init_from_nemo_model=tts_waveglow.nemo']
Error locating target 'nemo.collections.tts.data.tts_dataset.VocoderDataset', see chained exception above.
full_key: train_ds.dataset

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

example of my manifest.json
head train_manifest_ft.json

{"audio_filepath": "/content/internal/dataFlemishTTS/VRT/wavs/501.wav", "duration": 3.561995, "text": "Ik zal thuis nooit eenzelfde gerecht klaarmaken.", "normalized_text": "Ik zal thuis nooit eenzelfde gerecht klaarmaken.", "mel_filepath": "/content/internal/nemoFlemishTTS/synmels_vrt/train_manifest/mel_0.npy"}

I tried absolute paths, but it has no affects.

Thanks for help

redoctopus · 2023-03-22T20:31:30Z

redoctopus
Mar 22, 2023
Collaborator

Let's see...

ValueError: empty range for randrange() (0, -178, -178)

This looks rather similar to the problem in #5181, so I suspect that the fix is simply to transpose your synthesized mels when you save them, and before training. The audio and audio_len params are pretty consistent across NeMo TTS models: the former is the audio data, and the latter is the length (usually in frames) of that data.

As for WaveGlow training, is there any additional error message? Are you on the latest version of NeMo? Paths are being updated so there may be some mismatch.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot generate Mel spectrograms by Tacotron2 model for waveglow and hifigan vocoders #6188

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment

{{title}}

Select a reply

Cannot generate Mel spectrograms by Tacotron2 model for waveglow and hifigan vocoders #6188

pivolan Mar 14, 2023

Replies: 1 comment

redoctopus Mar 22, 2023 Collaborator

pivolan
Mar 14, 2023

redoctopus
Mar 22, 2023
Collaborator