Describe the bug
I followed the demo code provided by Coqui to create a simple dataset and fine-tune a model using Gradio. However, when I load the model and perform inference, the output audio is heavily distorted, resembling the sound of a hair shaving machine.
Steps to Reproduce:
Create Dataset:
Followed the instructions to create a simple dataset using the demo code.
Fine-Tune Model:
Used the Gradio interface as provided in the demo to fine-tune the model.
Load Model and Inference:
Loaded the fine-tuned model.
I created the dataset, fine-tuned the model, and performed inference entirely through the Gradio interface, launched with the following command:
py TTS/TTS/demos/xtts_ft_demo/xtts_demo.py
Expected Result:
The model should produce clear and intelligible speech output corresponding to the input text.
Actual Result:
The output audio is distorted and unintelligible. You can hear the output here: Distorted Audio Output.
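For anyone trying to narrow this down, here is a minimal sketch of how the saved output could be inspected for obvious problems such as clipping or an unexpected sample rate. It uses only the Python standard library and assumes the output was saved as a 16-bit PCM WAV file. The function name `inspect_wav` and the file name `output.wav` are hypothetical and are not part of the Coqui demo.

```python
import array
import sys
import wave


def inspect_wav(path):
    """Summarize a 16-bit PCM WAV file: sample rate, duration, peak, clipping."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        frames = wav.getnframes()
        samples = array.array("h", wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    total = len(samples)
    peak = max((abs(s) for s in samples), default=0)
    # Samples at the limits of the 16-bit range are counted as clipped.
    clipped = sum(1 for s in samples if abs(s) >= 32767)
    return {
        "sample_rate": rate,
        "duration_s": frames / rate if rate else 0.0,
        "peak": peak,
        "clipped_fraction": clipped / total if total else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_wav.py output.wav
    print(inspect_wav(sys.argv[1]))
```

A large `clipped_fraction`, or a sample rate that differs from what the playback or saving code assumes, would point to a problem in how the audio is written rather than in the fine-tuned weights. This is only a first check and does not rule out a training issue.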
Additional Information:
I verified that CUDA and the NVIDIA drivers are correctly installed and operational.
The nvidia-smi command confirms that the GPU is recognized and utilized by the system.
Other models and libraries utilizing CUDA work as expected.
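The checks above can also be scripted. The following is a minimal sketch, assuming PyTorch may or may not be installed in the active environment. The helper name `collect_environment` is hypothetical. It reports what PyTorch itself sees, which can differ from what `nvidia-smi` shows.

```python
import importlib.util
import platform


def collect_environment():
    """Return a dict describing the Python and, if present, PyTorch/CUDA setup."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
        "gpu": None,
    }
    # Import torch only if it is installed, so the script also runs without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` or `cuda_build` is `None` even though `nvidia-smi` lists the GPU, the installed PyTorch build is likely not using CUDA, which would be worth including in the report.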
Logs and Error Messages:
No explicit error messages were encountered during the execution. The process completes without any exceptions.
Request:
Could you please provide guidance on how to resolve this issue or if there are any specific configurations required to avoid such distortion in the output?
Thank you for your assistance.
To Reproduce
py TTS/TTS/demos/xtts_ft_demo/xtts_demo.py
Expected behavior
No response
Logs
No response
Environment
- Operating System: Windows 11
- Python Version: 3.10.4
- CUDA Version: 11.5
- PyTorch Version: 1.11.0+cu115
- coqui-ai Version: latest update on GitHub
Additional context
No response
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.