Compatibility with TTS systems #47

wetdog · 2024-02-19T00:10:39Z

Several Text-to-Speech (TTS) implementations (Grad-TTS, Matcha-TTS, P-Flow) rely on the mel spectrogram feature extractor code found in hifi-gan.

This PR modifies the feature extractor so that Vocos works seamlessly with the outputs generated by those TTS systems.

To achieve this, the parameters of torchaudio.transforms.MelSpectrogram were adjusted to match the features generated by the hifi-gan codebase. Specifically, the changes affect the frequency limits and the mel scale.
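As a rough illustration of why the mel scale matters here: torchaudio.transforms.MelSpectrogram defaults to the HTK mel scale with no filterbank normalization, while librosa.filters.mel (used by hifi-gan) defaults to the Slaney scale with Slaney normalization. The sketch below compares the two scales in plain Python and collects one possible set of matching keyword arguments. The numeric values (22050 Hz, 1024-point FFT, hop 256, 80 bins, 0 to 8000 Hz) and the names HIFIGAN_STYLE_MEL_KWARGS and build_mel_extractor are assumptions for illustration, not taken from this PR; check the config added in the PR for the actual settings.

```python
import math

def hz_to_mel_htk(freq_hz: float) -> float:
    """HTK mel scale (the torchaudio default, mel_scale="htk")."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def hz_to_mel_slaney(freq_hz: float) -> float:
    """Slaney mel scale (the librosa default): linear below 1 kHz, log above."""
    f_sp = 200.0 / 3.0                  # Hz per mel in the linear region
    min_log_hz = 1000.0                 # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp     # = 15.0
    logstep = math.log(6.4) / 27.0      # step size in the log region
    if freq_hz >= min_log_hz:
        return min_log_mel + math.log(freq_hz / min_log_hz) / logstep
    return freq_hz / f_sp

# Hypothetical settings, assumed for illustration only (not confirmed by this PR).
HIFIGAN_STYLE_MEL_KWARGS = {
    "sample_rate": 22050,
    "n_fft": 1024,
    "hop_length": 256,
    "n_mels": 80,
    "f_min": 0.0,          # lower frequency limit
    "f_max": 8000.0,       # upper frequency limit
    "norm": "slaney",      # matches the librosa filterbank default
    "mel_scale": "slaney", # torchaudio would otherwise use "htk"
}

def build_mel_extractor(**overrides):
    """Build a torchaudio MelSpectrogram from the assumed settings above."""
    import torchaudio  # imported lazily so the scale comparison runs without it
    kwargs = {**HIFIGAN_STYLE_MEL_KWARGS, **overrides}
    return torchaudio.transforms.MelSpectrogram(**kwargs)

if __name__ == "__main__":
    for f in (250.0, 1000.0, 4000.0, 8000.0):
        print(f, round(hz_to_mel_htk(f), 2), round(hz_to_mel_slaney(f), 2))
```

Calling build_mel_extractor() returns a module that can be applied to a waveform tensor of shape (..., time). Note that matching the filterbank alone may not be enough: padding and log compression would also need to agree with the upstream TTS feature extractor.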

We trained Vocos for 400k steps with these changes and were able to obtain reasonably good audio quality from the output of Matcha-TTS.

Closes #39

wetdog added 3 commits February 17, 2024 14:23

Update torchaudio mel spectrogram parameters

10316e2

update reconstruction loss with new mel features

342276d

Create new config with matcha parameters

734bc2f
