v0.9.6.3: MultiGPU training fixes and optimisations
What's Changed
MultiGPU training improvements
Thanks to Fal.ai for providing hardware to investigate and improve these areas:
- VAE caching now reliably runs across all GPUs without missing any entries
- Text embed caching is now parallelised safely across all GPUs (a minimal sharding sketch follows this list)
- Other speed improvements, torch compile tests, and more
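The release notes do not spell out how the caching work is divided between GPUs. As a hypothetical illustration only (not this project's actual implementation), the sketch below shows one common way to do it with torch.distributed: each rank takes a disjoint, round-robin slice of the cache jobs so no entry is missed or duplicated, and all ranks synchronise at a barrier before training begins. The names `shard_jobs_for_rank`, `cache_on_all_ranks`, `all_paths`, and `encode_fn` are illustrative, not identifiers from the codebase.

```python
# Hypothetical sketch of rank-based cache sharding; assumes torch.distributed
# has already been initialised (e.g. by accelerate or torchrun).
import torch.distributed as dist

def shard_jobs_for_rank(all_paths):
    """Return the subset of cache jobs this rank should process."""
    if not dist.is_available() or not dist.is_initialized():
        return all_paths  # single-process fallback: one rank does everything
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # Round-robin split: every path lands on exactly one rank, none are skipped.
    return all_paths[rank::world_size]

def cache_on_all_ranks(all_paths, encode_fn):
    """Each rank encodes its shard, then all ranks wait at a barrier."""
    for path in shard_jobs_for_rank(all_paths):
        encode_fn(path)  # e.g. run the VAE or text encoder and write the result to disk
    if dist.is_available() and dist.is_initialized():
        dist.barrier()  # ensure the full cache exists before training starts
```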
Pull requests
- mixture-of-experts updates by @bghira in #430
- vaecache: optimise preprocessing performance for higher GPU utilisation
- diffusers: update version; add a workaround for sharded checkpoints that is resolved in the next version of the library
- parquet: optimise caption loading by skipping a filter step
- trainingsample: fix reference to nonexistent metadata during vae preprocessing by @bghira in #431
- text embed cache: optimise to run across GPUs by @bghira in #432
- vaecache: fix multigpu caching missing a huge number of relevant jobs by @bghira in #433
These MultiGPU fixes were made possible by hardware provided by Fal.ai.
Full Changelog: v0.9.6.2...v0.9.6.3