arch and optimizer choice #1
Replies: 2 comments
-
@e-c-k-e-r poking for visibility, whenever you have time ofc.
-
I really need to figure out why I'm not getting any notifications for this repo. Too used to gitea showing any notification for all of my repos.
If I remember right:
Compared to what I've seen, VALL-E is very straightforward: it takes a no-frills transformer model and just properly prepares the input sequences and processes the output sequences. There's no need for conditioning latents or encoded features (the EnCodec codes are the encoded features), and no additional components to further process the outputs (besides EnCodec, but that can simply be considered an audio codec); it could all very well be implemented with encodec.cpp + llama.cpp. Treating speech synthesis as a language problem and using a language model to solve it is far too elegant and promising. There are other papers and solutions that treat it as a diffusion problem, but honestly I'm still very lacking in the U-Net department. SESD seems a little too promising, but it feels like it's constrained by the same problems as stuff like Stable Diffusion/FLUX.1 and has a lot of complexity attached. I'm for sure no expert, so I could be entirely wrong.
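As a rough sketch of what "preparing the input sequences and processing the output sequences" can look like for a plain language model: everything below is made up for illustration (the vocabulary sizes, the `SEP`/`STOP` token IDs, and the function names are hypothetical, not this repo's actual layout), and it only handles a single codebook level.

```python
# Hypothetical vocabulary layout; none of these numbers come from the repo.
TEXT_VOCAB = 256                  # assumed: text/phoneme IDs 0..255
CODEC_VOCAB = 1024                # assumed: audio codes 0..1023, one codebook level
SEP = TEXT_VOCAB + CODEC_VOCAB    # separates text / prompt audio / target audio
STOP = SEP + 1                    # marks the end of the generated audio


def build_sequence(text_ids, prompt_codes, target_codes):
    """Flatten one sample into a single token list for a decoder-only model."""
    # Shift audio codes past the text IDs so both share one vocabulary.
    prompt = [TEXT_VOCAB + c for c in prompt_codes]
    target = [TEXT_VOCAB + c for c in target_codes]
    return list(text_ids) + [SEP] + prompt + [SEP] + target + [STOP]


def extract_codes(generated):
    """Turn generated tokens back into raw audio codes, stopping at STOP."""
    codes = []
    for tok in generated:
        if tok == STOP:
            break
        codes.append(tok - TEXT_VOCAB)
    return codes


seq = build_sequence([5, 17], [3, 900], [42, 7, 1023])
print(seq)
print(extract_codes(seq[6:]))  # the part after the second SEP
```

The codes that come back out would then go to the audio codec's decoder to get a waveform; that step is left out here.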
I honestly can't remember how I stumbled upon prodigyopt. I think I saw AdaGrad under bitsandbytes, and somewhere along the line prodigyopt was suggested over AdaGrad. Despite requiring a lot more VRAM, it gave very pleasing results in experimentation, and it's been my go-to optimizer for my other use cases. Dealing with learning rates and schedulers was always a weak point of mine, so having an optimizer handle that itself is a godsend.
Despite having """support""" for ScheduleFree in this repo, I don't think I actually experimented much with it beyond making sure the code for it worked in the test trainer. I think I was entertaining the possibility that prodigyopt might have been a weak point, but nothing else satisfied my expectations. I'm no expert by any means, but I do swear by prodigyopt if you can spare the extra VRAM.
-
Very interesting! I appreciate the detailed write-ups in the README here and on HF.
Out of curiosity, what made you choose VALL-E over other zero-shot proposals? What led you to use the Prodigy optimizer? How would you compare it to the Schedule-Free optimizer?
Thank you for your time!