[Bug] tts_with_vc_to_file uses cuda even though "cpu" is specified as device #3787

ChristianRomberg · 2024-06-12T16:37:37Z

Describe the bug

The model tries to use CUDA even though I specified the CPU as the device.
My laptop has a soldered-on GPU with only 4 GB of memory, so the run crashes with an out-of-memory error as soon as the model tries to allocate memory on it.

To Reproduce

from TTS.api import TTS

tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to("cpu")
tts.tts_with_vc_to_file(
    speaker_wav="my_speaker.wav",
    text="Hallo! Das ist ein Test",
    file_path="output.wav",
)

Expected behavior

The model should run on the CPU, including the voice conversion model and its speaker encoder.

Logs

--truncated--

 > voice_conversion_models/multilingual/vctk/freevc24 is already downloaded.
 > Using model: freevc
 > Loading pretrained speaker encoder model ...
Loaded the voice encoder model on cuda in 0.19 seconds.
Traceback (most recent call last):
  File "my_script.py", line 20, in <module>
    tts.tts_with_vc_to_file(
  File ".venv/lib/python3.11/site-packages/TTS/api.py", line 455, in tts_with_vc_to_file
    wav = self.tts_with_vc(
          ^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/TTS/api.py", line 420, in tts_with_vc
    wav = self.voice_converter.voice_conversion(source_wav=fp.name, target_wav=speaker_wav)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/TTS/utils/synthesizer.py", line 254, in voice_conversion
    output_wav = self.vc_model.voice_conversion(source_wav, target_wav)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/TTS/vc/models/freevc.py", line 523, in voice_conversion
    g_tgt = self.enc_spk_ex.embed_utterance(wav_tgt)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 155, in embed_utterance
    partial_embeds = self(mels).cpu().numpy()
                     ^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py", line 60, in forward
    _, (hidden, _) = self.lstm(mels)
                     ^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/rnn.py", line 911, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.35 GiB. GPU
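
Possible workaround (a minimal sketch, not a fix): the log line "Loaded the voice encoder model on cuda" suggests the speaker encoder picks its device based on whether CUDA is visible, rather than on the device passed to `.to("cpu")`. Under that assumption, hiding the GPU from the process with the `CUDA_VISIBLE_DEVICES` environment variable before torch initializes CUDA should keep everything on the CPU. The helper names `hide_cuda_devices` and `synthesize_on_cpu` are hypothetical and only used for illustration; the TTS calls are the same ones as in the reproduction above.

```python
import os


def hide_cuda_devices() -> None:
    """Hide all GPUs from CUDA for the current process.

    This has to happen before torch initializes CUDA, so the safest
    place is before torch or TTS is imported at all.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


def synthesize_on_cpu(text: str, speaker_wav: str, file_path: str) -> None:
    """Hypothetical helper: run the reproduction with CUDA hidden."""
    hide_cuda_devices()

    # Imported only after the environment variable is set, so that
    # torch does not see any CUDA device when it is first loaded.
    from TTS.api import TTS

    tts = TTS(
        model_name="tts_models/de/thorsten/tacotron2-DDC",
        progress_bar=False,
    ).to("cpu")
    tts.tts_with_vc_to_file(
        text=text,
        speaker_wav=speaker_wav,
        file_path=file_path,
    )
```

Calling `synthesize_on_cpu("Hallo! Das ist ein Test", "my_speaker.wav", "output.wav")` would then mirror the script above. The same effect can be had by setting `CUDA_VISIBLE_DEVICES=""` in the shell before starting Python. Either way the GPU is disabled for the whole process, so this is only useful when nothing else in the script needs CUDA.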

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA RTX A500 Laptop GPU"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.3.1+cu121",
        "TTS": "0.22.0",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.11.9",
        "version": "#35-Ubuntu SMP PREEMPT_DYNAMIC Mon May 20 15:51:52 UTC 2024"
    }
}

Additional context

No response

stale · 2024-07-18T23:46:46Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

ChristianRomberg added the bug label Jun 12, 2024

ChristianRomberg added a commit to ChristianRomberg/TTS that referenced this issue Jun 12, 2024

Fix coqui-ai#3787

356eb70

ChristianRomberg added a commit to ChristianRomberg/TTS that referenced this issue Jun 15, 2024

Update freevc.py

8c0a5c5

Use the specified device for pretrained speaker encoder Fixes coqui-ai#3787

ChristianRomberg mentioned this issue Jun 15, 2024

Use the specified device for pretrained speaker encoder idiap/coqui-ai-TTS#45

Merged

eginhard pushed a commit to idiap/coqui-ai-TTS that referenced this issue Jun 16, 2024

fix(freevc): use the specified device for pretrained speaker encoder (#…

3a20f47

…45) Fixes coqui-ai#3787

pieris98 mentioned this issue Jun 21, 2024

[Bug] tts.tts_with_vc_to_file cannot use cpu #3797

Open

stale bot added the wontfix label Jul 18, 2024
