
Releases: bghira/SimpleTuner

v0.9.8.3.1 - state dict fix for final resulting safetensors

19 Aug 17:52

Minor, but important: intermediary checkpoints weren't impacted; only part of the weights in the final exported safetensors ended up mislabeled.

What's Changed

  • state dict export for final pipeline by @bghira in #812
  • lycoris: disable text encoder training (it doesn't work here, yet)
  • state dict fix for the final pipeline export after training by @bghira in #813
  • lycoris: final export fix, correctly save weights by @bghira in #814
  • update lycoris doc by @bghira in #815
  • lycoris updates by @bghira in #816

Full Changelog: v0.9.8.3...v0.9.8.3.1

v0.9.8.3 - essential fixes and improvements

18 Aug 22:36

What's Changed

[Image: that woman you've probably seen so many times!]

General

  • Non-BF16 capable optimisers removed in favour of a series of new Optimi options
  • new crop_aspect option closest, which uses crop_aspect_buckets as a list of candidate aspect ratios (see the sketch after this list)
  • fewer images are discarded; a minimum image size is no longer set for you by default
  • better behaviour with mixed datasets, sampling large and small sets more equally
    • caveat: Dreambooth training now probably wants --data_backend_sampling=uniform instead of auto-weighting
  • multi-caption fixes; it was always using the first caption before (whoops)
  • TF32 is now enabled by default for users who run configure.py
  • New --custom_transformer_model_name_or_path argument to use a flat repository or local directory containing just the transformer model
  • InternVL captioning script contributed by @frankchieng
  • ability to change a constant learning rate on resume
  • fix SDXL ControlNet training, allowing it to work with Quanto
  • DeepSpeed fixes (caveat: validations remain broken)
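A minimal sketch of how the new dataset options might fit together, assuming a local data backend; the id and paths below are placeholders, and only crop_aspect, crop_aspect_buckets, and --data_backend_sampling come from this release:

```bash
# Hypothetical dataset entry using the new closest crop_aspect mode.
# The id and instance_data_dir are placeholders for illustration only.
cat > config/multidatabackend.json <<'EOF'
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/data/images",
    "resolution": 1024,
    "crop": true,
    "crop_aspect": "closest",
    "crop_aspect_buckets": [0.75, 1.0, 1.33]
  }
]
EOF

# For Dreambooth-style runs, uniform sampling is now the likely better choice.
# TRAINER_EXTRA_ARGS is assumed to be where extra flags are collected.
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS:-} --data_backend_sampling=uniform"
```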

Flux

  • New LoRA targets ai-toolkit and context-ffs, with context-ffs behaving more like text encoder training
  • New LoRA training resumption support via --init_lora (see the config sketch after this list)
  • LyCORIS support
  • Novel attention masking implementation via --flux_attention_masked_training thanks to @AmericanPresidentJimmyCarter (#806)
  • Schnell --flux_fast_schedule fixed (still not great)
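A hedged sketch of how the Flux flags named above might be combined; the checkpoint path is a placeholder, and only --flux_attention_masked_training and --init_lora come from this release:

```bash
# Hypothetical excerpt from config/config.env; the LoRA path is a placeholder.
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS:-} \
  --flux_attention_masked_training \
  --init_lora=/path/to/previous/lora.safetensors"
```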

Full Changelog: v0.9.8.2...v0.9.8.3

v0.9.8.2 - fixed comfyUI inference, better aspect bucketing limits

15 Aug 06:00

Main issues solved:

  • Updated quickstart/flux documentation to have newer recommendations.
  • New pixel_area mode for resolution_type reduces guesswork (see the sketch after this list).
  • bfloat16 error fixed.
  • ComfyUI not reading the LoRAs created this week. That was my fault; adapters are now initialised the way they were in the previous release.
    • If necessary, stick with the latest release tag via git checkout v0.9.8.2 to avoid issues from the main branch creeping up on you in the future.
  • batch sizes no longer need to be perfectly aligned with the dataset
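A minimal sketch of a dataset entry using the new pixel_area mode; the id and path are placeholders, and the exact interpretation of the resolution value should be confirmed against the updated quickstart documentation:

```bash
# Hypothetical dataset entry; with resolution_type set to pixel_area, the
# resolution value is assumed to describe a target pixel area (roughly an
# edge length whose square is the budget) rather than a fixed long edge.
cat > config/multidatabackend.json <<'EOF'
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/data/images",
    "resolution": 1024,
    "resolution_type": "pixel_area"
  }
]
EOF
```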

Thank you for testing the changes and helping make it better.

Full Changelog: v0.9.8.1...v0.9.8.2

v0.9.8.1 - much-improved flux training

11 Aug 05:01

[Image: Dreambooth results from this release]

What's Changed

  • Quantised Flux LoRA training (though for now, only non-quantised models can resume training)
  • More VRAM and System memory use reductions contributed by team and community members
  • CUDA 12.4 requirement bump as well as blacklisting of Python 3.12
  • Dockerfile updates, allowing deployment of the latest build without errors
  • More LoRA training options for Flux Dev
  • Basic (crappy) Schnell training support
  • Support for preserving Flux Dev's distillation or introducing CFG back into the model for improved creativity
    • CFG skip logic to ensure no blurry results on undertrained LoRAs without requiring a CFG-capable base model

Full Changelog: v0.9.8...v0.9.8.1

v0.9.8 - flux 24 gig training has entered the chat

05 Aug 00:57

Flux

It's here! Runs on 24G cards using Quanto's 8bit quantisation, or 25.7G on a MacBook system (slowly)!

If you're after accuracy, a 40G card will do Just Fine, with 80G cards being somewhat of a sweet spot for larger training efforts.

What you get:

  • LoRA, full tuning (but probably just don't do that)
  • Documentation to get you started fast
  • Probably better suited to square-crop training for now; it might produce artifacts at unusual resolutions
  • Quantised base model unlocking the ability to safely use Adafactor, Prodigy, and other neat optimisers as a consolation prize for losing access to full bf16 training (AdamWBF16 just won't work with Quanto); a hedged config sketch follows this list
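A minimal sketch of what a 24G quantised run might look like; --base_model_precision and its int8-quanto value are assumptions rather than confirmed option names, and the Flux quickstart documentation remains the authority:

```bash
# Hypothetical excerpt from config/config.env for a 24G card.
# The flag name and value below are assumptions; check the documentation.
export MODEL_TYPE="lora"
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS:-} --base_model_precision=int8-quanto"
```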

What's Changed

  • trainer: simplify check by @bghira in #592
  • documentation updates, apple pytorch 2.4 by @bghira in #595
  • staged storage for image embed support by @bghira in #596
  • fix: loading default image embed backend by @bghira in #597
  • fix: loading default image embed backend by @bghira in #598
  • multi-gpu console output improvements by @bghira in #599
  • vae cache: hash_filenames option for image sets by @bghira in #601
  • multi-gpu console output reduction by @bghira in #602
  • fix for relative cache directories with NoneType being unsubscriptable by @bghira in #603
  • multigpu / relative path fixes for caching by @bghira in #604
  • backend for csv based datasets by @bghira in #600
  • CSV data backend by @bghira in #605
  • config file versioning to allow updating defaults without breaking backwards compat by @bghira in #607
  • config file versioning for backwards compat by @bghira in #608
  • experiment: small DiT model by @bghira in #609
  • merge by @bghira in #610
  • Fix crash when using jsonl files by @swkang-rp in #611
  • merge by @bghira in #612
  • flux training by @bghira in #614
  • update base_dir to output_dir by @bghira in #615
  • merge by @bghira in #616
  • flux: validations should ignore any custom schedulers by @bghira in #618
  • release: flux by @bghira in #617
  • bugfix: correctly set hash_filenames to true or false for an initial dataset creation by @bghira in #620
  • release: minor follow-up fixes by @bghira in #628
  • Flux: Fix random validation errors due to some tensors being on the cpu by @mhirki in #629
  • Improve config support for transformers with accelerate by @touchwolf in #630
  • quanto: exploring low-precision training. by @bghira in #622
  • remove all text encoders from memory correctly by @bghira in #637

Full Changelog: v0.9.7.8...v0.9.8

v0.9.7.8 - Kwai Kolors

21 Jul 22:14

What's Changed

Kolors is now a first-class citizen with full training support (no ControlNet).

Full Changelog: v0.9.7.7...v0.9.7.8

v0.9.7.7 - fixed SD3 training

17 Jul 20:13

What's Changed

There is a new folder layout.

  • All configs now belong in config/ directory
  • sdxl-env.sh is now config/config.env
  • sd21-env.sh is gone entirely, and train_sd2x.py has been merged into train.py
  • train_sdxl.py is now just train.py
  • You will have to update your config.env so that it points to multidatabackend.json in the config/ subdirectory (a migration sketch follows this list)
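A minimal migration sketch under the assumptions above; the variable name used to point at the dataloader config is an assumption, so verify it against the example config shipped with this release:

```bash
# Move an existing setup into the new config/ layout.
mkdir -p config
mv sdxl-env.sh config/config.env
mv multidatabackend.json config/multidatabackend.json

# Inside config/config.env, point the dataloader at the new location, e.g.:
#   export DATALOADER_CONFIG="config/multidatabackend.json"
# (DATALOADER_CONFIG is assumed; check the shipped example config.)

# Training now goes through the unified entrypoint:
python train.py --help
```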

Aura Flow v0.1

Train LoRAs and full model (DeepSpeed required) of AuraFlow v0.1 at 1024px.

LoRA training is recommended, using the default settings apart from the learning rate and the LoRA rank and alpha; tweaking those is still recommended.
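A hedged sketch of what those tweaks might look like; the flag names and values below are illustrative assumptions, not recommendations from this release:

```bash
# Hypothetical trainer arguments; names and values are assumptions only.
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS:-} \
  --learning_rate=1e-4 \
  --lora_rank=16 \
  --lora_alpha=16"
```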

Stable Diffusion 3

Mostly, your SD3 training should work now because we're no longer zeroing out the embeds for caption dropout. If you weren't using dropout because you're secretly a sly genius, you can continue on as usual.

  • cleanup debug logs by @bghira in #561
  • sd3: Fix random errors when generating validation images. (#546) by @mhirki in #562
  • Fix text encoder unload reporting by @mhirki in #563
  • Fix cached text embeds not getting loaded from the disk with local backend and relative path in cache_dir by @mhirki in #564
  • community fixes by @bghira in #565
  • implement Aura Diffusion training by @bghira in #553
  • reorganise and eliminate train script split by @bghira in #568
  • script layout refactor pt. 2 by @bghira in #569
  • Nuke the text encoders by sending them to "meta" by @Disty0 in #571
  • auraflow & project layout refactor by @bghira in #573
  • minor bugfixes, alternative flow-matching loss formulation by @bghira in #577
  • merge by @bghira in #582
  • bugfix: loading validation embeds on multi-gpu system (compressed cache) by @bghira in #583
  • switch default weighting mechanism to none for flow matching models by @bghira in #584
  • fix SD3 training by @bghira in #586

Full Changelog: v0.9.7.6...v0.9.7.7

v0.9.7.6

06 Jul 05:51

What's Changed

Some documentation is missing for some of these new features, but it will be added soon.

Multi-caption support

  • Supported caption_strategy: parquet, textfile
    • textfiles with multiple captions split by newline (see the sketch after this list)
    • parquet backends can now have multiple caption columns, or a column that contains a list of captions
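A minimal sketch of the textfile layout, assuming the caption file sits alongside the image and shares its basename (the filenames are placeholders):

```bash
# Two captions for the same image, one per line.
cat > /data/images/example-001.txt <<'EOF'
a photo of a red bicycle leaning against a brick wall
red bicycle, brick wall, daytime, street photography
EOF
```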

Default noise scheduler (inference)

If no value is supplied, the default is now None, which uses the upstream model configuration from the repository.

Prefetch

Some minor bugfixes have gone into this, but it remains an experimental feature with uncertain gains in performance.

Minor features

  • --torch_num_threads to stop torch from spawning too many threads on big systems
  • CV2 is now used for image loading. For PNG images this also invokes libpng, which is very chatty and spews lots of warnings that we cannot control.

Other bugfixes

  • Reworked the area resize code and cropping logic to strip out redundancy and improve clarity.
    • Ensures we do not see any squished images across a broader range of aspect ratios, in every cropping and resizing configuration.
    • Remove your VAE and aspect bucket caches to take advantage of this.

Changes

  • remove unneeded imports by @bghira in #544
  • prefetch: minor bugfixes, epoch tracking
  • add --torch_num_threads for very large systems
  • catch delete failure when delete_problematic_images is set by @bghira in #547
  • Fix bucket search for unseen images not containing the absolute path to the image by @bghira in #549
  • multi-caption support for textfile and parquet backend by @bghira in #527
  • Load images preferentially with CV2, fall back to PIL only if that fails by @AmericanPresidentJimmyCarter in #551
  • bugfix: bucket search for unseen images should prepend the instance data root so that the images can actually be loaded from disk
  • batch prefetch should be correctly destroyed/shutdown upon error or ctrl+c
  • VAE embed inconsistency fixed by cloning latent before write by @bghira in #552
  • Refactor save_hooks by @sayakpaul in #554
  • cv2: error checking for image load when we hit grayscale images
  • arguments: set --inference_noise_scheduler to None by default so that PixArt scheduler is uninterrupted by @bghira in #555
  • refactor area resize for code clarity and fixing non-cropped / downsampled images by @bghira in #558
  • area resize refactor by @bghira in #559

Full Changelog: v0.9.7.5...v0.9.7.6

v0.9.7.5b - bugfix for imports

28 Jun 11:48

What's Changed

Resolves a regression that caused a crash at startup due to a missing commit in the release branch.

Full Changelog: v0.9.7.5...v0.9.7.5b

v0.9.7.5 - compression for embed caching

27 Jun 13:19

What's Changed

  • ema: offload to cpu, update every n steps by @bghira in #517
  • ema: move correctly by @bghira in #520
  • EMA: refactor to support CPU offload, step-skipping, and DiT models
  • pixart: reduce max grad norm by default, forcibly by @bghira in #521
  • remove incorrect log line when using cpu offload for ema by @bghira in #523
  • add --dataloader_prefetch option for speed-up by @bghira in #535
  • parquet metadata: retrieve captions near-instantly at startup
  • multi-gpu: logging cleanup, performance fixes
  • settings: no invisible default minimum_image_size
  • text embed cache fix for repeatedly attempting to write files that already exist
  • text embed cache fix for AWS prefixes not being referenced correctly
  • embed cache (text & vae) compression via --compress_disk_cache by @bghira in #540

Full Changelog: v0.9.7.4...v0.9.7.5