Releases: bghira/SimpleTuner
v0.9.8.3.1 - state dict fix for final resulting safetensors
Minor, but important: the intermediate checkpoints were never affected; only part of the weights in the final exported safetensors ended up mislabeled.
What's Changed
- state dict export for final pipeline by @bghira in #812
- lycoris: disable text encoder training (it doesn't work here yet)
- state dict fix for the final pipeline export after training by @bghira in #813
- lycoris: final export fix, correctly save weights by @bghira in #814
- update lycoris doc by @bghira in #815
- lycoris updates by @bghira in #816
Full Changelog: v0.9.8.3...v0.9.8.3.1
v0.9.8.3 - essential fixes and improvements
What's Changed
General
- Non-BF16-capable optimisers removed in favour of a series of new Optimi options
- New crop_aspect option closest that uses crop_aspect_buckets as a list of options: fewer images are discarded, and a minimum image size is no longer set for you by default
- Better behaviour with mixed datasets, sampling large and small sets more equally; caveat: Dreambooth training now probably wants --data_backend_sampling=uniform instead of auto-weighting
- Multi-caption fixes: it was always using the first caption before (whoops)
- TF32 is now enabled by default for users of configure.py
- New argument --custom_transformer_model_name_or_path to use a flat repository or local directory containing just the transformer model
- InternVL captioning script contributed by @frankchieng
- ability to change constant learning rate on resume
- fix SDXL controlnet training, allowing it to work with quanto
- DeepSpeed fixes (caveat: validations are broken)
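The closest crop_aspect mode chooses, per image, whichever entry in crop_aspect_buckets best matches the image's own shape, which is why fewer images get discarded. A minimal sketch of that selection, assuming aspect ratio means width / height and "closest" means the smallest absolute difference; pick_closest_aspect is a hypothetical name, not SimpleTuner's implementation:

```python
from __future__ import annotations


def pick_closest_aspect(width: int, height: int, buckets: list[float]) -> float:
    """Return the bucket aspect ratio (width / height) nearest to the image's own.

    Hypothetical helper: the distance metric (absolute difference) is an assumption.
    """
    if not buckets:
        raise ValueError("crop_aspect_buckets must not be empty")
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    aspect = width / height
    # min() keeps the first bucket on ties, so the choice is deterministic.
    return min(buckets, key=lambda bucket: abs(bucket - aspect))


# A 1216x832 landscape image (aspect ~1.46) lands in the 1.5 bucket.
print(pick_closest_aspect(1216, 832, [0.75, 1.0, 1.5]))
```

Comparing ratios in log space would treat portrait and landscape images symmetrically; the plain difference is used here only to keep the sketch short.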
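Why Dreambooth runs may prefer --data_backend_sampling=uniform: if datasets are weighted by size, a small instance set is rarely picked next to a large regularisation set. The sketch below contrasts two weighting schemes under loud assumptions: "auto-weighting" is modelled as image count times an optional per-dataset user probability, and "uniform" as an equal chance per dataset. selection_weights is hypothetical; the exact formula SimpleTuner uses is not given in these notes.

```python
from __future__ import annotations

import random


def selection_weights(
    sizes: dict[str, int], mode: str, probability: dict[str, float] | None = None
) -> dict[str, float]:
    """Normalised chance of drawing the next batch from each dataset.

    Hypothetical model: 'auto-weighting' scales by image count and an optional
    user probability multiplier; 'uniform' ignores dataset size entirely.
    """
    probability = probability or {}
    if mode == "uniform":
        raw = {name: 1.0 for name in sizes}
    elif mode == "auto-weighting":
        raw = {name: count * probability.get(name, 1.0) for name, count in sizes.items()}
    else:
        raise ValueError(f"unknown sampling mode: {mode}")
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


sizes = {"instance": 20, "regularisation": 980}
print(selection_weights(sizes, "auto-weighting"))  # instance is drawn ~2% of the time
print(selection_weights(sizes, "uniform"))  # each dataset is drawn half the time

# Drawing dataset names according to the weights.
weights = selection_weights(sizes, "uniform")
rng = random.Random(0)
print(rng.choices(list(weights), weights=list(weights.values()), k=5))
```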
Flux
- New LoRA targets ai-toolkit and context-ffs, with context-ffs behaving more like text encoder training
- New LoRA training resumption support via --init_lora
- LyCORIS support
- Novel attention masking implementation via --flux_attention_masked_training, thanks to @AmericanPresidentJimmyCarter (#806)
- Schnell --flux_fast_schedule fixed (still not great)
Pull Requests
- Fix --input_perturbation_steps so that it actually has an effect by @mhirki in #772
- add ai-toolkit option in --flux_lora_target choices by @benihime91 in #773
- Create caption_with_internvl.py by @frankchieng in #778
- Add LyCORIS training to SimpleTuner by @AmericanPresidentJimmyCarter in #776
- (#782) fix type comparison in configure script by @bghira in #783
- update path in documentation by @yggdrasil75 in #784
- Add Standard as default LoRA type by @AmericanPresidentJimmyCarter in #787
- Lora init from file by @kaibioinfo in #789
- wip: optimi by @bghira in #785
- add new lora option for context+ffs by @kaibioinfo in #795
- add auto-weighting for dataset selection with user probability modulation by @bghira in #797
- fix for sampling population smaller than request by @bghira in #802
- fix instance prompt sampling multiple prompts always taking the first… by @bghira in #801
- Fixes for --flux_fast_schedule by @mhirki in #803
- tf32, custom transformer paths, and error log for short batch, add fixed length calc by @bghira in #805
- Add attention masking to the custom helpers/flux/transformer.py by @AmericanPresidentJimmyCarter in #806
New Contributors
- @benihime91 made their first contribution in #773
- @frankchieng made their first contribution in #778
- @yggdrasil75 made their first contribution in #784
- @kaibioinfo made their first contribution in #789
Full Changelog: v0.9.8.2...v0.9.8.3
v0.9.8.2 - fixed comfyUI inference, better aspect bucketing limits
Main issues solved:
- Updated quickstart/flux documentation to have newer recommendations.
- New pixel_area mode for resolution_type reduces guesswork.
- bfloat16 error
- ComfyUI not reading the LoRAs created this week. That was my fault; adapter initialisation is now back to the method used in the previous release.
- If necessary, stick with the latest release tag via git checkout v0.9.8.2 so that future changes on the main branch don't creep up on you
- batch sizes no longer need to be perfectly aligned with the dataset
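The pixel_area resolution type lets you state the target size as an edge length in pixels while buckets are still formed by area. A minimal sketch of the arithmetic, assuming a setting of N means a target area of N * N pixels and that bucket sides are rounded to a multiple of 64; both assumptions, and the name dims_for_pixel_area, are hypothetical rather than SimpleTuner's code:

```python
from __future__ import annotations

import math


def dims_for_pixel_area(edge: int, aspect: float, multiple: int = 64) -> tuple[int, int]:
    """Width and height whose area is close to edge * edge at aspect = width / height.

    Hypothetical sketch: the rounding multiple (64) is an assumption.
    """
    target_area = edge * edge
    height = math.sqrt(target_area / aspect)
    width = height * aspect

    def snap(side: float) -> int:
        # Round to the nearest allowed multiple, but never below one multiple.
        return max(multiple, round(side / multiple) * multiple)

    return snap(width), snap(height)


print(dims_for_pixel_area(1024, 1.0))  # square bucket
print(dims_for_pixel_area(1024, 1.5))  # landscape bucket of roughly the same area
```

With this reading, a value of 1024 corresponds to about 1.05 megapixels (1024 * 1024 / 1e6), which is the guesswork the mode is meant to remove.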
Thank you for testing the changes and helping make it better.
What's Changed
- fix sentencepiece dep, hub username missing in model card by @bghira in #727
- Make tokeniser 2 load failure permanent by @bghira in #729
- Make Docker foolproof by @komninoschatzipapas in #732
- minor fixes follow-up by @bghira in #744
- refactor train.py to reduce LoCs. by @sayakpaul in #739
- add pixel_area resolution type by @bghira in #756
- for #757 - refactor file discovery, update provider req by @bghira in #758
- Add input perturbation by @mhirki in #723
- merge follow-ups and other fixes to release branch by @bghira in #764
- Follow up from #739 by @sayakpaul in #762
- update defaults + docs by @bghira in #767
- follow-up residuals by @bghira in #768
- residuals by @bghira in #769
- fix double divide by @bghira in #770
- revert from get_peft_model to add_adapter by @bghira in #771
Full Changelog: v0.9.8.1...v0.9.8.2
v0.9.8.1 - much-improved flux training
Dreambooth results from this release
What's Changed
- Quantised Flux LoRA training (but only non-quantised models can resume training still)
- More VRAM and System memory use reductions contributed by team and community members
- CUDA 12.4 requirement bump as well as blacklisting of Python 3.12
- Dockerfile updates, allowing deployment of the latest build without errors
- More LoRA training options for Flux Dev
- Basic (crappy) Schnell training support
- Support for preserving Flux Dev's distillation or introducing CFG back into the model for improved creativity
- CFG skip logic to ensure no blurry results on undertrained LoRAs without requiring a CFG-capable base model
Detailed change list
- release: follow-ups, memory reduction, quanto LoRA training by @bghira in #638
- quanto: allow further vram reduction with bf16 base weights, reorder model loading operations to evacuate text encoders before DiT loads by @bghira in #647
- merge vram reductions & regression fixes by @bghira in #649
- CUDA 12.4; more efficient model loading, lower overall sysmem / vram usage; enforce hashed VAE object names by default; reduce default VAE batch size; change default LR scheduler to polynomial by @bghira in #660
- update something about kolor in the train.py by @chongxian in #658
- refactor saving utitilites part ii by @sayakpaul in #661
- Residues of #661 by @sayakpaul in #662
- Update Dockerfile by @komninoschatzipapas in #675
- merge: reorder model casting, fix kolors on newer diffusers, update cuda, refactor save hooks by @bghira in #666
- Flux: Purge text encoders from system RAM. by @mhirki in #694
- [Tests] basic save hook tests by @sayakpaul in #679
- Add the ability to train all nn.Linear for flux by @AmericanPresidentJimmyCarter in #707
- Add real guidance scale to flux validation (optional) by @AmericanPresidentJimmyCarter in #706
- flux: import some of kohya suggestions (wip) by @bghira in #711
- Add CFG skip for Flux lora training, fix precomputed embeds bug for C… by @AmericanPresidentJimmyCarter in #712
- fixed flux training by @bghira in #714
- final flux updates for release by @bghira in #716
New Contributors
- @chongxian made their first contribution in #658
Full Changelog: v0.9.8...v0.9.8.1
v0.9.8 - flux 24 gig training has entered the chat
Flux
It's here! Runs on 24G cards using Quanto's 8-bit quantisation, or in 25.7G on a MacBook system (slowly)!
If you're after accuracy, a 40G card will do Just Fine, with 80G cards being somewhat of a sweet spot for larger training efforts.
What you get:
- LoRA, full tuning (but probably just don't do that)
- Documentation to get you started fast
- Probably best for square-crop training for now; it might produce artifacts at unusual resolutions
- Quantised base model unlocking the ability to safely use Adafactor, Prodigy, and other neat optimisers as a consolation prize for losing access to full bf16 training (AdamWBF16 just won't work with Quanto)
What's Changed
- trainer: simplify check by @bghira in #592
- documentation updates, apple pytorch 2.4 by @bghira in #595
- staged storage for image embed support by @bghira in #596
- fix: loading default image embed backend by @bghira in #597
- fix: loading default image embed backend by @bghira in #598
- multi-gpu console output improvements by @bghira in #599
- vae cache: hash_filenames option for image sets by @bghira in #601
- multi-gpu console output reduction by @bghira in #602
- fix for relative cache directories with NoneType being unsubscriptable by @bghira in #603
- multigpu / relative path fixes for caching by @bghira in #604
- backend for csv based datasets by @bghira in #600
- CSV data backend by @bghira in #605
- config file versioning to allow updating defaults without breaking backwards compat by @bghira in #607
- config file versioning for backwards compat by @bghira in #608
- experiment: small DiT model by @bghira in #609
- merge by @bghira in #610
- Fix crash when using jsonl files by @swkang-rp in #611
- merge by @bghira in #612
- flux training by @bghira in #614
- update base_dir to output_dir by @bghira in #615
- merge by @bghira in #616
- flux: validations should ignore any custom schedulers by @bghira in #618
- release: flux by @bghira in #617
- bugfix: correctly set hash_filenames to true or false for an initial dataset creation by @bghira in #620
- release: minor follow-up fixes by @bghira in #628
- Flux: Fix random validation errors due to some tensors being on the cpu by @mhirki in #629
- Improve config support for transformers with accelerate by @touchwolf in #630
- quanto: exploring low-precision training. by @bghira in #622
- remove all text encoders from memory correctly by @bghira in #637
New Contributors
- @swkang-rp made their first contribution in #611
- @touchwolf made their first contribution in #630
Full Changelog: v0.9.7.8...v0.9.8
v0.9.7.8 - Kwai Kolors
What's Changed
Kolors is now a first-class citizen with full training support (no ControlNet).
Full Changelog: v0.9.7.7...v0.9.7.8
v0.9.7.7 - fixed SD3 training
What's Changed
There is a new folder layout.
- All configs now belong in the config/ directory
- sdxl-env.sh is now config/config.env
- sd21-env.sh is just gone
- train_sd2x.py is now gone, merged into train.py
- train_sdxl.py is now just train.py
- You will have to update your config.env to point to the config/ subdirectory for multidatabackend.json
Aura Flow v0.1
Train LoRAs or the full model (DeepSpeed required) of AuraFlow v0.1 at 1024px.
LoRA is recommended, with default settings for everything except the learning rate, LoRA rank and alpha; tweaking these is still recommended.
Stable Diffusion 3
Mostly, your SD3 training should work now because we're no longer zeroing out the embeds for caption dropout. If you weren't using dropout because you're secretly a sly genius, you can continue on as usual.
- cleanup debug logs by @bghira in #561
- sd3: Fix random errors when generating validation images. (#546) by @mhirki in #562
- Fix text encoder unload reporting by @mhirki in #563
- Fix cached text embeds not getting loaded from the disk with local backend and relative path in cache_dir by @mhirki in #564
- community fixes by @bghira in #565
- implement Aura Diffusion training by @bghira in #553
- reorganise and eliminate train script split by @bghira in #568
- script layout refactor pt. 2 by @bghira in #569
- Nuke the text encoders by sending them to "meta" by @Disty0 in #571
- auraflow & project layout refactor by @bghira in #573
- minor bugfixes, alternative flow-matching loss formulation by @bghira in #577
- merge by @bghira in #582
- bugfix: loading validation embeds on multi-gpu system (compressed cache) by @bghira in #583
- switch default weighting mechanism to none for flow matching models by @bghira in #584
- fix SD3 training by @bghira in #586
Full Changelog: v0.9.7.6...v0.9.7.7
v0.9.7.6
What's Changed
Some documentation is missing for some of these new features, but it will be added soon.
Multi-caption support
- Supported caption_strategy values: parquet, textfile
- textfiles with multiple captions split by newline
- parquet backends can now have multiple caption columns, or, a column that contains a list of captions.
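For the textfile strategy, one caption per line means the loader has to split the file and then choose among the captions for each sample. A minimal sketch, assuming a caption is picked uniformly at random each time and blank lines are ignored; load_captions and pick_caption are hypothetical names, not SimpleTuner's loader:

```python
from __future__ import annotations

import random


def load_captions(text: str) -> list[str]:
    """Split a caption file's contents into one caption per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_caption(captions: list[str], rng: random.Random) -> str:
    """Pick one caption per sample rather than always returning captions[0]."""
    if not captions:
        raise ValueError("caption file contained no captions")
    return rng.choice(captions)


# In practice the text would come from e.g. Path("photo.txt").read_text(encoding="utf-8").
captions = load_captions("a red fox in the snow\n\nclose-up photo of a fox\n")
rng = random.Random(0)
print([pick_caption(captions, rng) for _ in range(4)])
```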
Default noise scheduler (inference)
The default value, if none is supplied, is now None, which uses the upstream model configuration from the repository.
Prefetch
Some minor bugfixes have gone into this, but it remains an experimental feature with uncertain gains in performance.
Minor features
- --torch_num_threads to stop torch from spawning too many threads on big systems
- CV2 is now used for image loading, which also invokes libpng for PNG images; libpng is very chatty and emits lots of warnings, which we cannot control.
Other bugfixes
- Reworked the area resize code and cropping logic to strip out redundancy and improve clarity.
- Ensures we do not see any squished images for a more broad range of aspect ratios, across every cropping and resizing configuration.
- Remove your VAE and aspect bucket caches to take advantage of this.
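Avoiding squished images comes down to scaling both sides by the same factor and cropping the excess, rather than resizing straight to the target width and height. A minimal sketch of that arithmetic, assuming a centre crop and a target size already chosen from an aspect bucket; resize_then_crop_box is a hypothetical name, and this is not SimpleTuner's refactored resize code:

```python
from __future__ import annotations


def resize_then_crop_box(
    src_w: int, src_h: int, dst_w: int, dst_h: int
) -> tuple[tuple[int, int], tuple[int, int, int, int]]:
    """Return the resized size and a centre-crop box that avoid any stretching.

    Both sides are scaled by the same factor so the image covers the target,
    then the overflow is cropped away. The box is (left, top, right, bottom).
    """
    scale = max(dst_w / src_w, dst_h / src_h)
    # round() absorbs float noise; max() guarantees the target is still covered.
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    return (new_w, new_h), (left, top, left + dst_w, top + dst_h)


# A 2:1 image going into a square 1024px bucket: scale to 2048x1024, keep the middle.
print(resize_then_crop_box(2000, 1000, 1024, 1024))
```

The box uses the (left, upper, right, lower) order that Pillow's Image.crop expects, so the two results could be passed to Image.resize and Image.crop in turn.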
Changes
- remove unneeded imports by @bghira in #544
- prefetch: minor bugfixes, epoch tracking
- add --torch_num_threads for very large systems
- catch delete failure when delete_problematic_images is set by @bghira in #547
- Fix bucket search for unseen images not containing the absolute path to the image by @bghira in #549
- multi-caption support for textfile and parquet backend by @bghira in #527
- Load images preferentially with CV2, fall back to PIL only if that fails by @AmericanPresidentJimmyCarter in #551
- bugfix: bucket search for unseen images should prepend the instance data root so that the images can actually be loaded from disk
- batch prefetch should be correctly destroyed/shutdown upon error or ctrl+c
- VAE embed inconsistency fixed by cloning latent before write by @bghira in #552
- Refactor save_hooks by @sayakpaul in #554
- cv2: error checking for image load when we hit grayscale images
- arguments: set --inference_noise_scheduler to None by default so that PixArt scheduler is uninterrupted by @bghira in #555
- refactor area resize for code clarity and fixing non-cropped / downsampled images by @bghira in #558
- area resize refactor by @bghira in #559
Full Changelog: v0.9.7.5...v0.9.7.6
v0.9.7.5b - bugfix for imports
What's Changed
Resolve a regression that caused a crash at startup due to a missing commit in the release branch.
Full Changelog: v0.9.7.5...v0.9.7.5b
v0.9.7.5 - compression for embed caching
What's Changed
- ema: offload to cpu, update every n steps by @bghira in #517
- ema: move correctly by @bghira in #520
- EMA: refactor to support CPU offload, step-skipping, and DiT models
- pixart: reduce max grad norm by default, forcibly by @bghira in #521
- remove incorrect log line when using cpu offload for ema by @bghira in #523
- add --dataloader_prefetch option for speed-up by @bghira in #535
- parquet metadata: retrieve captions near-instantly at startup
- multi-gpu: logging cleanup, performance fixes
- settings: no invisible default minimum_image_size
- text embed cache fix for repeatedly attempting to write files that already exist
- text embed cache fix for AWS prefixes not being referenced correctly
- embed cache (text & vae) compression via --compress_disk_cache by @bghira in #540
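The EMA changes (CPU offload, updating only every n steps) build on the standard exponential moving average update, shadow = decay * shadow + (1 - decay) * param. This sketch uses plain Python floats in place of tensors and assumes the shadow weights are updated only on steps divisible by update_interval; SimpleEMA and its parameters are hypothetical, and whether SimpleTuner compensates the decay for skipped steps is not stated in these notes, so this version does not.

```python
from __future__ import annotations


class SimpleEMA:
    """Exponential moving average of a flat list of parameters.

    Hypothetical sketch: a real trainer would hold tensors here, possibly kept
    on the CPU to save VRAM; plain floats stand in for them.
    """

    def __init__(self, params: list[float], decay: float = 0.999, update_interval: int = 1):
        if update_interval < 1:
            raise ValueError("update_interval must be >= 1")
        self.decay = decay
        self.update_interval = update_interval
        self.shadow = list(params)  # independent copy of the starting weights

    def step(self, global_step: int, params: list[float]) -> bool:
        """Update on every update_interval-th step; return True if an update happened."""
        if global_step % self.update_interval != 0:
            return False
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p for s, p in zip(self.shadow, params)]
        return True


ema = SimpleEMA([0.0], decay=0.9, update_interval=2)
for step in range(1, 5):
    ema.step(step, [1.0])
print(ema.shadow)  # updated at steps 2 and 4 only, so roughly [0.19]
```

Skipping steps trades a slightly staler average for fewer copies between devices, which is where the offload savings come from.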
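--compress_disk_cache shrinks the cached text and VAE embeds on disk. The sketch below shows the general idea with the standard library only, assuming pickle for serialisation and gzip as the codec; SimpleTuner's actual on-disk format and compression method are not described here, so treat encode_embed and decode_embed as hypothetical stand-ins.

```python
from __future__ import annotations

import gzip
import pickle

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def encode_embed(obj, compress: bool) -> bytes:
    """Serialise a cached embed, optionally gzip-compressing it."""
    data = pickle.dumps(obj)
    return gzip.compress(data) if compress else data


def decode_embed(blob: bytes):
    """Load a cached embed, decompressing only if the gzip header is present.

    Checking the magic bytes lets compressed and uncompressed cache files coexist.
    Only unpickle caches you wrote yourself: pickle can execute arbitrary code.
    """
    if blob[:2] == GZIP_MAGIC:
        blob = gzip.decompress(blob)
    return pickle.loads(blob)


embed = [0.0] * 4096  # stand-in for a text or VAE embedding
raw = encode_embed(embed, compress=False)
packed = encode_embed(embed, compress=True)
print(len(raw), len(packed))
```

Compression costs some CPU time on every cache read and write, so it mainly pays off when disk space or I/O is the bottleneck.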
Full Changelog: v0.9.7.4...v0.9.7.5