Dear Team,
I am working with audio data, using seamless_communication for audio generation. Since not all Indic languages have text-to-speech support, I am trying to finetune the model on the Google FLEURS audio dataset. However, finetuning fails with an AssertionError indicating that the previous output tokens for text-to-units are missing. I have followed all the steps in the instructions but cannot identify the cause.
Here is the command I used for finetuning:
m4t_finetune \
    --train_dataset /home/jupyter/myfiles/fleurs/train/train_manifest.json \
    --eval_dataset /home/jupyter/myfiles/fleurs/validation/validation_manifest.json \
    --batch_size 10 \
    --eval_steps 1000 \
    --max_epochs 100 \
    --learning_rate 0.00005 \
    --patience 10 \
    --save_model_to /home/jupyter/myfiles/checkpoints/tam_tts_m4tL.pt \
    --model_name seamlessM4T_large \
    --mode TEXT_TO_SPEECH
Here is a sample record from the JSON manifest file:
{"source": {"id": 1792, "lang": "tam", "text": "\u0b87\u0ba4\u0bc1 \u0bb5\u0bc7\u0ba4\u0bbf\u0baf\u0bbf\u0baf\u0bb2\u0bcd ph \u0b8e\u0ba9 \u0b85\u0bb4\u0bc8\u0b95\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0b95\u0bbf\u0bb1\u0ba4\u0bc1 \u0ba8\u0bc0\u0b99\u0bcd\u0b95\u0bb3\u0bcd \u0b9a\u0bbf\u0bb5\u0baa\u0bcd\u0baa\u0bc1 \u0bae\u0bc1\u0b9f\u0bcd\u0b9f\u0bc8\u0b95\u0bcd\u0b95\u0bcb\u0bb8\u0bcd \u0b9a\u0bbe\u0bb1\u0bcd\u0bb1\u0bc8\u0baa\u0bcd \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0b9f\u0bc1\u0ba4\u0bcd\u0ba4\u0bbf \u0b92\u0bb0\u0bc1 \u0b95\u0bc1\u0bb1\u0bbf\u0b95\u0bbe\u0b9f\u0bcd\u0b9f\u0bbf\u0baf\u0bc8 \u0b89\u0bb0\u0bc1\u0bb5\u0bbe\u0b95\u0bcd\u0b95\u0bb2\u0bbe\u0bae\u0bcd", "audio_local_path": "/home/jupyter/myfiles/fleurs/test/downloads/extracted/23ec8bcb7d5dc3059117590b1533c8fd881f6980496e2a150032d65e64a13401/test/10015420708072669120.wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 1792, "lang": "tam", "text": "\u0b87\u0ba4\u0bc1 \u0bb5\u0bc7\u0ba4\u0bbf\u0baf\u0bbf\u0baf\u0bb2\u0bcd ph \u0b8e\u0ba9 \u0b85\u0bb4\u0bc8\u0b95\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0b9f\u0bc1\u0b95\u0bbf\u0bb1\u0ba4\u0bc1 \u0ba8\u0bc0\u0b99\u0bcd\u0b95\u0bb3\u0bcd \u0b9a\u0bbf\u0bb5\u0baa\u0bcd\u0baa\u0bc1 \u0bae\u0bc1\u0b9f\u0bcd\u0b9f\u0bc8\u0b95\u0bcd\u0b95\u0bcb\u0bb8\u0bcd \u0b9a\u0bbe\u0bb1\u0bcd\u0bb1\u0bc8\u0baa\u0bcd \u0baa\u0baf\u0ba9\u0bcd\u0baa\u0b9f\u0bc1\u0ba4\u0bcd\u0ba4\u0bbf \u0b92\u0bb0\u0bc1 \u0b95\u0bc1\u0bb1\u0bbf\u0b95\u0bbe\u0b9f\u0bcd\u0b9f\u0bbf\u0baf\u0bc8 \u0b89\u0bb0\u0bc1\u0bb5\u0bbe\u0b95\u0bcd\u0b95\u0bb2\u0bbe\u0bae\u0bcd", "audio_local_path": "/home/jupyter/myfiles/fleurs/test/downloads/extracted/23ec8bcb7d5dc3059117590b1533c8fd881f6980496e2a150032d65e64a13401/test/9348375609893059714.wav", "waveform": null, "sampling_rate": 16000, "units": null}}
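To rule out a data problem, a minimal sketch like the one below could count how many manifest records have no target units. It assumes the manifest holds one JSON object per line, as in the sample above. `count_missing_units` is a hypothetical helper name of my own, not part of seamless_communication. Whether the null `units` field is what triggers the AssertionError is only a guess on my side, not something I have confirmed.

```python
import json
from typing import Iterable, Tuple


def count_missing_units(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_records, records_without_target_units).

    Assumes each non-blank line is one JSON record with a "target" object,
    matching the sample record shown above.
    """
    total = 0
    missing = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        # Treat a null, absent, or empty "units" field as missing.
        if not record.get("target", {}).get("units"):
            missing += 1
    return total, missing
```

It can be run on an open file handle, for example `count_missing_units(open(path, encoding="utf-8"))` with `path` pointing at the training manifest. If every record is reported as missing, the manifest probably never had units extracted for it.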