19 Aug 17:52

bghira

3172256

v0.9.8.3.1 - state dict fix for final resulting safetensors

Minor but important: the intermediary checkpoints were not affected. Only the final exported safetensors had part of its weights mislabelled.

What's Changed

state dict export for final pipeline by @bghira in #812
lycoris: disable text encoder training (it doesn't work here, yet)
state dict fix for the final pipeline export after training by @bghira in #813
lycoris: final export fix, correctly save weights by @bghira in #814
update lycoris doc by @bghira in #815
lycoris updates by @bghira in #816

Full Changelog: v0.9.8.3...v0.9.8.3.1

Contributors

bghira


18 Aug 22:36

bghira

v0.9.8.3

da4bc81

v0.9.8.3 - essential fixes and improvements

What's Changed

General

Non-BF16 capable optimisers removed in favour of a series of new Optimi options
New crop_aspect option, closest, which uses crop_aspect_buckets as the list of aspect ratios to choose from
fewer images are discarded, minimum image size isn't set by default for you any longer
better behaviour with mixed datasets, more equally sampling large and small sets
- caveat: dreambooth training now probably wants --data_backend_sampling=uniform instead of auto-weighting
multi-caption fixes, it was always using the first caption before (whoops)
TF32 now enabled by default for users with configure.py
New arguments for --custom_transformer_model_name_or_path to use a flat repository or local dir containing just the transformer model
InternVL captioning script contributed by @frankchieng
ability to change constant learning rate on resume
fix SDXL controlnet training, allowing it to work with quanto
DeepSpeed fixes (caveat: validations are broken)
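
As a rough illustration of the closest option for crop_aspect: a minimal sketch, assuming it picks whichever entry in crop_aspect_buckets is numerically nearest to the image's own aspect ratio. The function name is hypothetical and this is not SimpleTuner's actual code.

```python
def closest_aspect_bucket(image_aspect: float, buckets: list[float]) -> float:
    """Return the bucket whose aspect ratio is nearest to the image's.

    Hypothetical helper: assumes aspect ratios are width / height floats.
    """
    if not buckets:
        raise ValueError("crop_aspect_buckets must not be empty")
    return min(buckets, key=lambda bucket: abs(bucket - image_aspect))


# A 1600x1000 image has an aspect ratio of 1.6.
print(closest_aspect_bucket(1600 / 1000, [0.5, 1.0, 1.5, 2.0]))  # prints 1.5
```

With a short bucket list, each image is cropped to its nearest listed aspect ratio, so fewer images end up alone in a rare bucket.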

Flux

New LoRA targets ai-toolkit and context-ffs, with context-ffs behaving more like text encoder training
New LoRA training resumption support via --init_lora
LyCORIS support
Novel attention masking implementation via --flux_attention_masked_training thanks to @AmericanPresidentJimmyCarter (#806)
Schnell --flux_fast_schedule fixed (still not great)

Pull Requests

Fix --input_perturbation_steps so that it actually has an effect by @mhirki in #772
add ai-toolkit option in --flux_lora_target choices by @benihime91 in #773
Create caption_with_internvl.py by @frankchieng in #778
Add LyCORIS training to SimpleTuner by @AmericanPresidentJimmyCarter in #776
(#782) fix type comparison in configure script by @bghira in #783
update path in documentation by @yggdrasil75 in #784
Add Standard as default LoRA type by @AmericanPresidentJimmyCarter in #787
Lora init from file by @kaibioinfo in #789
wip: optimi by @bghira in #785
add new lora option for context+ffs by @kaibioinfo in #795
add auto-weighting for dataset selection with user probability modulation by @bghira in #797
fix for sampling population smaller than request by @bghira in #802
fix instance prompt sampling multiple prompts always taking the first… by @bghira in #801
Fixes for --flux_fast_schedule by @mhirki in #803
tf32, custom transformer paths, and error log for short batch, add fixed length calc by @bghira in #805
Add attention masking to the custom helpers/flux/transformer.py by @AmericanPresidentJimmyCarter in #806

New Contributors

@benihime91 made their first contribution in #773
@frankchieng made their first contribution in #778
@yggdrasil75 made their first contribution in #784
@kaibioinfo made their first contribution in #789

Full Changelog: v0.9.8.2...v0.9.8.3

Contributors

kaibioinfo, mhirki, and 5 other contributors


15 Aug 06:00

bghira

v0.9.8.2

00bf989

v0.9.8.2 - fixed comfyUI inference, better aspect bucketing limits

Main issues solved:

Updated quickstart/flux documentation to have newer recommendations.
New pixel_area mode for resolution_type reduces guesswork.
bfloat16 error
comfyUI not reading the LoRAs created this week. That is my fault; it is now back to the way of initialising adapters we used in the previous release.
- stick with the latest release tag via git checkout v0.9.8.2 to avoid issues with the main branch creeping up on you in the future, if necessary
batch sizes no longer need to be perfectly aligned with the dataset
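
To give a feel for the pixel_area mode mentioned above: a sketch under the assumption that the resolution value is an edge length in pixels whose square is the target area, kept roughly constant across aspect ratios. The function name and the snapping to multiples of 64 are assumptions for illustration, not taken from the release notes.

```python
import math


def pixel_area_to_size(edge_px: int, aspect_ratio: float, multiple: int = 64) -> tuple[int, int]:
    """Return a (width, height) whose area is close to edge_px ** 2.

    Hypothetical helper: aspect_ratio is width / height, and both sides are
    snapped to the nearest multiple of `multiple` (an assumed constraint).
    """
    area = edge_px * edge_px
    width = math.sqrt(area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value: float) -> int:
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(pixel_area_to_size(1024, 1.0))  # prints (1024, 1024)
print(pixel_area_to_size(1024, 1.5))  # prints (1280, 832)
```

The point of such a mode is that you can think in familiar pixel edge lengths instead of converting to megapixels by hand.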

Thank you for testing the changes and helping make it better.

What's Changed

fix sentencepiece dep, hub username missing in model card by @bghira in #727
Make tokeniser 2 load failure permanent by @bghira in #729
Make Docker foolproof by @komninoschatzipapas in #732
minor fixes follow-up by @bghira in #744
refactor train.py to reduce LoCs. by @sayakpaul in #739
add pixel_area resolution type by @bghira in #756
for #757 - refactor file discovery, update provider req by @bghira in #758
Add input perturbation by @mhirki in #723
merge follow-ups and other fixes to release branch by @bghira in #764
Follow up from #739 by @sayakpaul in #762
update defaults + docs by @bghira in #767
follow-up residuals by @bghira in #768
residuals by @bghira in #769
fix double divide by @bghira in #770
revert from get_peft_model to add_adapter by @bghira in #771

Full Changelog: v0.9.8.1...v0.9.8.2

Contributors

mhirki, sayakpaul, and 2 other contributors


11 Aug 05:01

bghira

v0.9.8.1

6e6385c

v0.9.8.1 - much-improved flux training

Dreambooth results from this release

What's Changed

Quantised Flux LoRA training (but only non-quantised models can resume training still)
More VRAM and System memory use reductions contributed by team and community members
CUDA 12.4 requirement bump as well as blacklisting of Python 3.12
Dockerfile updates, allowing deployment of the latest build without errors
More LoRA training options for Flux Dev
Basic (crappy) Schnell training support
Support for preserving Flux Dev's distillation or introducing CFG back into the model for improved creativity
- CFG skip logic to ensure no blurry results on undertrained LoRAs without requiring a CFG-capable base model

Detailed change list

release: follow-ups, memory reduction, quanto LoRA training by @bghira in #638
quanto: allow further vram reduction with bf16 base weights, reorder model loading operations to evacuate text encoders before DiT loads by @bghira in #647
merge vram reductions & regression fixes by @bghira in #649
CUDA 12.4; more efficient model loading, lower overall sysmem / vram usage; enforce hashed VAE object names by default; reduce default VAE batch size; change default LR scheduler to polynomial by @bghira in #660
update something about kolor in the train.py by @chongxian in #658
refactor saving utitilites part ii by @sayakpaul in #661
Residues of #661 by @sayakpaul in #662
Update Dockerfile by @komninoschatzipapas in #675
merge: reorder model casting, fix kolors on newer diffusers, update cuda, refactor save hooks by @bghira in #666
Flux: Purge text encoders from system RAM. by @mhirki in #694
[Tests] basic save hook tests by @sayakpaul in #679
Add the ability to train all nn.Linear for flux by @AmericanPresidentJimmyCarter in #707
Add real guidance scale to flux validation (optional) by @AmericanPresidentJimmyCarter in #706
flux: import some of kohya suggestions (wip) by @bghira in #711
Add CFG skip for Flux lora training, fix precomputed embeds bug for C… by @AmericanPresidentJimmyCarter in #712
fixed flux training by @bghira in #714
final flux updates for release by @bghira in #716

New Contributors

@chongxian made their first contribution in #658

Full Changelog: v0.9.8...v0.9.8.1

Contributors

mhirki, chongxian, and 4 other contributors


05 Aug 00:57

bghira

v0.9.8

67e702d

v0.9.8 - flux 24 gig training has entered the chat

Flux

It's here! Runs on 24G cards using Quanto's 8bit quantisation, or 25.7G on a Macbook system (slowly)!

If you're after accuracy, a 40G card will do Just Fine, with 80G cards being somewhat of a sweet spot for larger training efforts.

What you get:

LoRA, full tuning (but probably just don't do that)
Documentation to get you started fast
Probably better for just square crop training for now - might artifact for weird resolutions
Quantised base model unlocking the ability to safely use Adafactor, Prodigy, and other neat optimisers as a consolation prize for losing access to full bf16 training (AdamWBF16 just won't work with Quanto)

What's Changed

trainer: simplify check by @bghira in #592
documentation updates, apple pytorch 2.4 by @bghira in #595
staged storage for image embed support by @bghira in #596
fix: loading default image embed backend by @bghira in #597
fix: loading default image embed backend by @bghira in #598
multi-gpu console output improvements by @bghira in #599
vae cache: hash_filenames option for image sets by @bghira in #601
multi-gpu console output reduction by @bghira in #602
fix for relative cache directories with NoneType being unsubscriptable by @bghira in #603
multigpu / relative path fixes for caching by @bghira in #604
backend for csv based datasets by @bghira in #600
CSV data backend by @bghira in #605
config file versioning to allow updating defaults without breaking backwards compat by @bghira in #607
config file versioning for backwards compat by @bghira in #608
experiment: small DiT model by @bghira in #609
merge by @bghira in #610
Fix crash when using jsonl files by @swkang-rp in #611
merge by @bghira in #612
flux training by @bghira in #614
update base_dir to output_dir by @bghira in #615
merge by @bghira in #616
flux: validations should ignore any custom schedulers by @bghira in #618
release: flux by @bghira in #617
bugfix: correctly set hash_filenames to true or false for an initial dataset creation by @bghira in #620
release: minor follow-up fixes by @bghira in #628
Flux: Fix random validation errors due to some tensors being on the cpu by @mhirki in #629
Improve config support for transformers with accelerate by @touchwolf in #630
quanto: exploring low-precision training. by @bghira in #622
remove all text encoders from memory correctly by @bghira in #637

New Contributors

@swkang-rp made their first contribution in #611
@touchwolf made their first contribution in #630

Full Changelog: v0.9.7.8...v0.9.8

Contributors

mhirki, touchwolf, and 2 other contributors


21 Jul 22:14

bghira

v0.9.7.8

36754eb

v0.9.7.8 - Kwai Kolors

What's Changed

Kolors is now a first-class citizen with full training support (no ControlNet).

minor bugfixes by @bghira in #589
Kolors training support by @bghira in #574

Full Changelog: v0.9.7.7...v0.9.7.8

Contributors

bghira


17 Jul 20:13

bghira

v0.9.7.7

cb7eccf

v0.9.7.7 - fixed SD3 training

What's Changed

There is a new folder layout.

All configs now belong in config/ directory
sdxl-env.sh is now config/config.env
sd21-env.sh is just gone, train_sd2x.py is now gone, merged into train.py
train_sdxl.py is now just train.py
You will have to update your config.env to point to the config/ subdirectory for multidatabackend.json
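
For an existing checkout, the rename of sdxl-env.sh could be scripted roughly like this. It is a sketch only: the function name is made up, it covers just the sdxl-env.sh to config/config.env move described above, and it does not edit any paths inside the file.

```python
import shutil
from pathlib import Path


def migrate_config(root: Path) -> Path:
    """Move a legacy sdxl-env.sh into config/config.env, if one exists.

    Hypothetical helper: never overwrites an existing config/config.env.
    Returns the path where the new-style config is expected to live.
    """
    config_dir = root / "config"
    config_dir.mkdir(exist_ok=True)
    legacy = root / "sdxl-env.sh"
    target = config_dir / "config.env"
    if legacy.exists() and not target.exists():
        shutil.move(str(legacy), str(target))
    return target


# Example (run from the repository root):
# migrate_config(Path("."))
```

Any dataset config path inside the moved file still has to be updated by hand.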

Aura Flow v0.1

Train LoRAs and full model (DeepSpeed required) of AuraFlow v0.1 at 1024px.

LoRA is recommended, with default settings for everything except learning rate, LoRA rank, and alpha, which are still worth tweaking.

Stable Diffusion 3

Mostly, your SD3 training should work now because we're no longer zeroing out the embeds for caption dropout. If you weren't using dropout because you're secretly a sly genius, you can continue on as usual.

cleanup debug logs by @bghira in #561
sd3: Fix random errors when generating validation images. (#546) by @mhirki in #562
Fix text encoder unload reporting by @mhirki in #563
Fix cached text embeds not getting loaded from the disk with local backend and relative path in cache_dir by @mhirki in #564
community fixes by @bghira in #565
implement Aura Diffusion training by @bghira in #553
reorganise and eliminate train script split by @bghira in #568
script layout refactor pt. 2 by @bghira in #569
Nuke the text encoders by sending them to "meta" by @Disty0 in #571
auraflow & project layout refactor by @bghira in #573
minor bugfixes, alternative flow-matching loss formulation by @bghira in #577
merge by @bghira in #582
bugfix: loading validation embeds on multi-gpu system (compressed cache) by @bghira in #583
switch default weighting mechanism to none for flow matching models by @bghira in #584
fix SD3 training by @bghira in #586

New Contributors

@mhirki made their first contribution in #562
@Disty0 made their first contribution in #571

Full Changelog: v0.9.7.6...v0.9.7.7

Contributors

mhirki, Disty0, and bghira


06 Jul 05:51

bghira

v0.9.7.6

8adfe0a

v0.9.7.6

What's Changed

Some documentation is missing for some of these new features, but it will be added soon.

Multi-caption support

Supported caption_strategy: parquet, textfile
- textfiles with multiple captions split by newline
- parquet backends can now have multiple caption columns, or, a column that contains a list of captions.
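
For the textfile strategy, the behaviour can be pictured roughly as below. This is a sketch, assuming blank lines are ignored and that one caption is drawn at random per sample; the function names are hypothetical and the random choice is an assumption, not confirmed by these notes.

```python
import random


def parse_captions(text: str) -> list[str]:
    """Split a caption file's contents into one caption per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_caption(text: str, rng: random.Random) -> str:
    """Pick one caption for this training sample (assumed: uniformly at random)."""
    captions = parse_captions(text)
    if not captions:
        raise ValueError("caption file is empty")
    return rng.choice(captions)


contents = "a photo of a cat\n\na cat sitting on a sofa\n"
print(parse_captions(contents))
print(pick_caption(contents, random.Random(0)))
```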

Default noise scheduler (inference)

The default value if none is supplied is now None, which uses the upstream model configuration from the repository.

Prefetch

Some minor bugfixes have gone into this, but it remains an experimental feature with uncertain gains in performance.

Minor features

--torch_num_threads to stop torch from spawning too many threads on big systems
CV2 is now used for image loading. For PNG images it invokes libpng, which is very chatty and emits lots of warnings that we cannot control.

Other bugfixes

Reworked the area resize code and cropping logic to strip out redundancy and improve clarity.
- Ensures we do not see any squished images for a more broad range of aspect ratios, across every cropping and resizing configuration.
- Remove your VAE and aspect bucket caches to take advantage of this.

Changes

remove unneeded imports by @bghira in #544
prefetch: minor bugfixes, epoch tracking
add --torch_num_threads for very large systems
catch delete failure when delete_problematic_images is set by @bghira in #547
Fix bucket search for unseen images not containing the absolute path to the image by @bghira in #549
multi-caption support for textfile and parquet backend by @bghira in #527
Load images preferentially with CV2, fall back to PIL only if that fails by @AmericanPresidentJimmyCarter in #551
bugfix: bucket search for unseen images should prepend the instance data root so that the images can actually be loaded from disk
batch prefetch should be correctly destroyed/shutdown upon error or ctrl+c
VAE embed inconsistency fixed by cloning latent before write by @bghira in #552
Refactor save_hooks by @sayakpaul in #554
cv2: error checking for image load when we hit grayscale images
arguments: set --inference_noise_scheduler to None by default so that PixArt scheduler is uninterrupted by @bghira in #555
refactor area resize for code clarity and fixing non-cropped / downsampled images by @bghira in #558
area resize refactor by @bghira in #559

Full Changelog: v0.9.7.5...v0.9.7.6

Contributors

sayakpaul, bghira, and AmericanPresidentJimmyCarter


28 Jun 11:48

bghira

v0.9.7.5b

184c3f9

v0.9.7.5b - bugfix for imports

What's Changed

Resolve a regression that caused a crash at startup due to missing commit in release branch.

remove unneeded imports by @bghira in #544

Full Changelog: v0.9.7.5...v0.9.7.5b

Contributors

bghira


27 Jun 13:19

bghira

v0.9.7.5

353cf9d

v0.9.7.5 - compression for embed caching

What's Changed

ema: offload to cpu, update every n steps by @bghira in #517
ema: move correctly by @bghira in #520
EMA: refactor to support CPU offload, step-skipping, and DiT models
pixart: reduce max grad norm by default, forcibly by @bghira in #521
remove incorrect log line when using cpu offload for ema by @bghira in #523
add --dataloader_prefetch option for speed-up by @bghira in #535
parquet metadata: retrieve captions near-instantly at startup
multi-gpu: logging cleanup, performance fixes
settings: no invisible default minimum_image_size
text embed cache fix for repeatedly attempting to write files that already exist
text embed cache fix for AWS prefixes not being referenced correctly
embed cache (text & vae) compression via --compress_disk_cache by @bghira in #540
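
The EMA changes above (CPU offload, updating every n steps) follow a familiar pattern. A minimal sketch of the step-skipping part, with plain Python lists standing in for tensors that would live in CPU memory; the class and parameter names are invented for illustration and this is not the project's implementation.

```python
class SkippingEMA:
    """Exponential moving average of parameters, updated every `update_every` steps."""

    def __init__(self, params: list[float], decay: float = 0.999, update_every: int = 1):
        self.shadow = list(params)  # the averaged copy (would sit on the CPU)
        self.decay = decay
        self.update_every = update_every
        self.step_count = 0

    def step(self, params: list[float]) -> bool:
        """Count one training step; return True if the average was updated."""
        self.step_count += 1
        if self.step_count % self.update_every != 0:
            return False
        self.shadow = [
            self.decay * s + (1.0 - self.decay) * p
            for s, p in zip(self.shadow, params)
        ]
        return True


ema = SkippingEMA([0.0], decay=0.5, update_every=2)
print(ema.step([1.0]), ema.shadow)  # prints False [0.0]
print(ema.step([1.0]), ema.shadow)  # prints True [0.5]
```

Skipping steps trades a slightly coarser average for fewer transfers between devices, which matters most when the averaged weights are kept off the GPU.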

Full Changelog: v0.9.7.4...v0.9.7.5

Contributors

bghira


Releases: bghira/SimpleTuner
