v0.9.6.3 multigpu training fixes and optimisations

@bghira bghira released this 09 Jun 22:39
· 1731 commits to release since this release
2b6b765

What's Changed

MultiGPU training improvements

Thanks to Fal.ai for providing hardware to investigate and improve these areas:

  • VAE caching now reliably runs across all GPUs without missing any entries
  • Text embed caching is now parallelised safely across all GPUs
  • Other speed improvements, torch compile tests, and more
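The multi-GPU caching fixes above come down to ensuring every cache job is assigned to exactly one GPU, with no entries falling through the cracks. A minimal sketch of that idea (hypothetical code, not SimpleTuner's actual implementation) shards a sorted job list across ranks by index modulo world size:

```python
# Hypothetical sketch of rank-based job sharding; not SimpleTuner's actual code.

def shard_jobs(all_paths, rank, world_size):
    """Deterministically assign each cache job to exactly one rank.

    Sorting first makes the assignment identical on every process;
    index-modulo sharding guarantees the union of all ranks' shards
    covers the whole job list with no overlaps and no missing entries.
    """
    return [p for i, p in enumerate(sorted(all_paths)) if i % world_size == rank]

if __name__ == "__main__":
    paths = [f"img_{i:04d}.png" for i in range(10)]
    shards = [shard_jobs(paths, rank, 4) for rank in range(4)]
    # Every path appears in exactly one shard.
    covered = sorted(p for shard in shards for p in shard)
    assert covered == sorted(paths)
```

If the job list is built inconsistently per rank (for example, from an unsorted directory scan), ranks can disagree on assignments and silently skip entries, which is the class of bug the VAE caching fix addresses.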

Pull requests

  • mixture-of-experts updates by @bghira in #430
  • vaecache: optimise preprocessing performance for higher GPU utilisation
  • diffusers: update version; add a workaround for sharded checkpoints that is resolved in the next version of the library
  • parquet: optimise caption loading by skipping an unnecessary filter step
  • trainingsample: fix reference to nonexistent metadata during vae preprocessing by @bghira in #431
  • text embed cache: optimise to run across GPUs by @bghira in #432
  • vaecache: fix multigpu caching missing a huge number of relevant jobs by @bghira in #433

MultiGPU fixes made possible with hardware provided by Fal.ai

Full Changelog: v0.9.6.2...v0.9.6.3