About model implementation differences #35

sourcesur · 2024-03-01T07:41:42Z

Hi, thanks for your effort and sharing the code!
The architecture blocks in the speech-prompted text encoder and the CFM decoder differ from the ones introduced in the paper. I would like to know what motivated these changes. Was the model not converging with the official architecture?

p0p4k · 2024-03-01T08:11:54Z

It was just easier to implement and more modular. I'm keeping it open-ended so that it is easier to run experiments with.

sourcesur · 2024-03-06T06:59:19Z

I wanted to reproduce the results from the paper, so I used this repo (master branch) to train the model on LibriTTS. I trained it for 800k steps and beyond, but the overall generation quality is still quite far from the official demo.
Have you tried reproducing the results?

p0p4k · 2024-03-06T11:25:42Z

Try playing around with the speech prompt encoder. I have not trained this model yet.
