v0.9.6.3 multigpu training fixes and optimisations

@bghira bghira released this 09 Jun 22:39
· 1731 commits to release since this release
2b6b765

What's Changed

MultiGPU training improvements

Thanks to Fal.ai for providing hardware to investigate and improve these areas:

  • VAE caching now reliably runs across all GPUs without missing any entries
  • Text embed caching is now parallelised safely across all GPUs
  • Other speed improvements, torch compile tests, and more
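The multi-GPU caching fixes above come down to ensuring every cache job is assigned to exactly one GPU, with no entries falling through the cracks. A minimal sketch of that idea (hypothetical code, not SimpleTuner's actual implementation) shards a sorted job list across ranks by index modulo world size:

```python
# Hypothetical sketch of rank-based job sharding; not SimpleTuner's actual code.

def shard_jobs(all_paths, rank, world_size):
    """Deterministically assign each cache job to exactly one rank.

    Sorting first makes the assignment identical on every process;
    index-modulo sharding guarantees the union of all ranks' shards
    covers the whole job list with no overlaps and no missing entries.
    """
    return [p for i, p in enumerate(sorted(all_paths)) if i % world_size == rank]

if __name__ == "__main__":
    paths = [f"img_{i:04d}.png" for i in range(10)]
    shards = [shard_jobs(paths, rank, 4) for rank in range(4)]
    # Every path appears in exactly one shard.
    covered = sorted(p for shard in shards for p in shard)
    assert covered == sorted(paths)
```

If the job list is built inconsistently per rank (for example, from an unsorted directory scan), ranks can disagree on assignments and silently skip entries, which is the class of bug the VAE caching fix addresses.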

Pull requests

  • mixture-of-experts updates by @bghira in #430
  • vaecache: optimise preprocessing performance for higher GPU utilisation
  • diffusers: update version; add a workaround for sharded checkpoints that is resolved in the next version of the library
  • parquet: optimise caption loading by skipping an unnecessary filter step
  • trainingsample: fix reference to nonexistent metadata during vae preprocessing by @bghira in #431
  • text embed cache: optimise to run across GPUs by @bghira in #432
  • vaecache: fix multigpu caching missing a huge number of relevant jobs by @bghira in #433

MultiGPU fixes made possible with hardware provided by Fal.ai

Full Changelog: v0.9.6.2...v0.9.6.3