Releases: bghira/SimpleTuner
v0.9.3.2
v0.9.3.1 - follow-up improvements for fp16/fp32 removal
What's Changed
- pesky vae cache bugs by @bghira in #342
- NaN guards for VAE
- Remove fp16 fully
- Remove fp32 training options
- Remove upcast logic
- Add dreambooth guide
- Fix 'instanceprompt' caption strategy
- Disable multiprocessing by default to save memory by @bghira in #344
Full Changelog: v0.9.3...v0.9.3.1
v0.9.3 - no more autocast
- option renamed: vae_cache_behaviour -> vae_cache_scan_behaviour
- add --encode_during_training to skip pre-processing of VAE embeds (only, for now)
- majorly reworked VAE embed management code means there may be some issues at first
- debug.log in project dir root now contains the DEBUG level outputs
- add --adam_bfloat16 for sdxl and sd2.x, essentially mandatory now
- precision level improvements and fixes for SD 2.x and SDXL training
- fp16 is no longer supported on SD 2.x or SDXL. use bf16.
- the SD 2.x VAE can still run in fp16 mode, but it's not recommended. it's bf16 by default now.
- pytorch 2.3 requirement on apple mps for access to bf16
- mps: fixes to loading torch files from disk, moving items into correct dtypes
- mps: fix multiprocessing, enable by default while preserving --disable_multiprocessing as a workaround
- Apple's Python uses the "spawn" multiprocessing strategy but Linux uses "fork"; switched to "fork" by default
- mps: enable unet attention slicing on SD 2.x to avoid NDArray crash in MPS
- (for large datasets) preserve_data_backend_cache now accepts string values as well as bool, "image" and "text" to preserve just a split of the cache (see the sketch after this list)
- skip certain OS directories on macOS and Jupyter notebooks
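As a rough illustration of the new string values, a per-dataset entry in multidatabackend.json might set the key like the sketch below; the id and the placement of the key are assumptions, so check the dataloader documentation for the exact layout.

```json
{
  "id": "large-photo-set",
  "preserve_data_backend_cache": "image"
}
```

Here "image" preserves just the image (VAE) split of the cache while the text split can be rebuilt; the existing true/false behaviour continues to work.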
Pull Requests
- sd2x: num steps remaining fix
- vaecache: exit with problematic data backend id by @bghira in #332
- Feature/on demand vae cache by @bghira in #336
- stochastic bf16 impl by uptightmoose by @bghira in #337
- next by @bghira in #339
Full Changelog: v0.9.2...v0.9.3
v0.9.2 - an apple a day
What's Changed
Train LoRAs for SD 1.x/2.x/XL models on Apple hardware now.
- metadatabackend: add parquet support for metadata by @bghira in #321
- Much quicker metadata experience for extremely large datasets.
- remove --crop as a global argument; use crop, crop_style, and crop_aspect via multidatabackend.json (see the example after this list)
- added --disable_multiprocessing for certain situations where it may help performance by @bghira in #328
- apple mps: various bugfixes for LoRA training, SDXL, SD 2.x
- sd2x: various bugfixes for EMA, validations, and the noise scheduler config
- add --disable_multiprocessing for possible performance improvements on certain systems
- metadata: abstract logic into pluggable backends
- metadata: support for parquet backend, pull data directly from Pandas dataframes
- vaecache: improve and fix logic for scan_for_errors=true
- aspect bucketing: make it more robust for extremely diverse datasets by @bghira in #323
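With --crop removed as a global argument, cropping is configured per dataset. Below is a minimal sketch of what a multidatabackend.json entry might look like; apart from crop, crop_style, and crop_aspect, the keys and values shown are illustrative assumptions rather than a definitive example:

```json
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/path/to/images",
    "crop": true,
    "crop_style": "center",
    "crop_aspect": "preserve"
  }
]
```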
Full Changelog: v0.9.1...v0.9.2
v0.9.1 - DoRA the explorah
This release has some breaking changes for users who:
- use RESOLUTION_TYPE=area (resolution_type=area in the multidatabackend config)
- use crop=false
- use crop=true with crop_aspect=preserve
These configurations are affected because the precision level for aspect buckets has changed.
Updating to this release is recommended, but if you're in the middle of a training run, don't update yet.
What's Changed
- prompt handler: check for index error before requesting caption from parquet backend
- vaecache: more robust handling of batched encoding
- vaecache: fix for trailing slash existing in cache_dir property resulting in excessive scans at startup
- bucket manager: cheaply remove duplicate images from the dataset during sorting
- aspect bucketing: modify precision of bucket categories from 2 decimals to 3; this might require re-caching aspect buckets and VAE outputs for users whose configuration matches the earlier description
- sd2.x: LoRA training fixes, still not quite right, but better
- DoRA: initial support from PEFT integrated for XL and legacy models.
Full Changelog: v0.9.0...v0.9.1
v0.9.0
note: these changes include all of the v0.9 improvements through all RCs, since v0.8.2.
SimpleTuner v0.9.0 Release Notes
I'm excited to announce the release of SimpleTuner v0.9.0! This release includes numerous features, improvements, and bug fixes that enhance the functionality, stability, and performance of SimpleTuner.
An experimental multi-node captioning script is included; it was used to create photo-concept-bucket, a free dataset produced by my group that contains roughly 568k CogVLM image-caption pairs, along with other metadata such as dominant colours and aspect ratio.
Below is a summary of the key changes for v0.9.0.
New Features
- Multi-Dataset Sampler: Enhanced support for training with multiple datasets, enabling more flexible and varied data feeding strategies.
- Caption Filter Lists for Dataloaders: Ability to filter captions directly in the text embed layer, improving data quality for training.
- Sine Learning Rate Scheduler: Introduced a sine scheduler to optimize learning rate adjustments, starting training at lr_end instead of learning_rate (a sketch of the general shape appears after this list).
- LoRA Trainer Support: Integration of LoRA (Low-Rank Adaptation) for efficient model training for SDXL and SD 1.5/2.x.
- Advanced Caption Controls: Introduction of the parquet caption strategy, offering more efficient control over caption processing, especially for datasets with millions of images.
- Distributed VAE Encoding/Captioning: Support for distributed VAE encoding and captioning scripts, enhancing performance for large-scale datasets.
- Multi-Stage Resize for Very-Large Images: Improvements in handling very large images through multi-stage resizing, potentially reducing artifacts.
- CSV to S3 Direct Upload: Functionality to upload data directly to S3 from CSV without saving locally first, streamlining data preparation.
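For intuition on the sine scheduler mentioned above, one plausible shape over a run of T steps is the half-sine below, which begins and ends at lr_end and peaks at learning_rate; this is a sketch of the general idea, not necessarily SimpleTuner's exact formula:

```
lr(t) = lr_end + (learning_rate - lr_end) * sin(pi * t / T),   0 <= t <= T
```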
Improvements
- VAE Cache: Fixes and enhancements in VAE cache handling, including rebuilds in case of errors or every epoch for 'freshness'.
- Dataset repeats are now implemented, such that an embed can be seen n times before it will be considered exhausted.
- Text Embedding Cache: Optimizations in text embedding cache generation, writing, and processing for improved performance. Fixes to the threading behaviour.
- CogVLM and Diffusers: Updates including default 4bit inference for CogVLM and bump to Diffusers 0.26.0.
- AWS S3 and SD 2.x Fixes: Various fixes and enhancements for S3 data backends and Stable Diffusion 2.x support, including multi-GPU training and LoRA support fixes.
- Logging Reduction: Major reduction in debug noise for cleaner and more informative logs.
Bug Fixes
- Caption Processing: Fixes for issues related to prepend_instance_prompt doubling up prompt contents and handling of captions from parquet databases.
- Optimizer Adjustments: Various fixes and adjustments for optimizers, including Adafactor and AdamW.
- Training State Handling: Fixes for save/load state issues, ensuring correct handling of global steps, epochs, and more.
Breaking Changes
- instance_data_dir is no longer in use - you must configure a data backend loader instead. See DATALOADER for more information (a minimal sketch follows this list).
- CogVLM Filename Cleaning: Disabled filename cleaning by default. Projects relying on automatic filename cleaning will need to adjust their preprocessing accordingly.
- Configuration Values: Configuration names and values have changed. Ensure to review your configuration.
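To illustrate the instance_data_dir change above, the directory that used to be passed directly now lives inside a data backend entry in multidatabackend.json. This is only a sketch; the caption_strategy value and any other keys should be confirmed against the DATALOADER documentation:

```json
[
  {
    "id": "primary",
    "type": "local",
    "instance_data_dir": "/training/data",
    "caption_strategy": "filename"
  }
]
```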
Documentation and Miscellaneous
- Documentation Updates: Comprehensive updates to installation guides, tutorials, and README to reflect new features and changes.
- Kohya Config Conversion Script: Provided a script to convert Kohya basic parameters into SimpleTuner command-line arguments, facilitating easier migration and setup.
Full Changelog from v0.8.2: View the complete list of changes
We thank all contributors who have helped shape this release. Your contributions, bug reports, and feedback have been invaluable. Happy tuning!
v0.9.0-rc10
What's Changed
- feature: sine scheduler so that training begins at lr_end by @bghira in #311
- bugfix: prepend_instance_prompt was simply doubling up prompt contents by @bghira in #312
- a script for converting kohya basic params into simpletuner cmdline args by @bghira in #313
Full Changelog: v0.9.0-rc9...v0.9.0-rc10
v0.9.0-rc9
The work-in-progress Terminus checkpoint, trained with this release.
What's Changed
- sd2.x: fix multi-gpu training with wandb
- sd2.x: adafactor fixes by @bghira in #307
- remove test.img folder writes debug code by @bghira in #308
- slight fix-ups with batching and progress bars by @bghira in #309
Full Changelog: v0.9.0-rc8...v0.9.0-rc9
v0.9.0-rc8
What's Changed
Documentation fixes, improvement to training quality by removing DDIM from the setup.
Fixes for optimizers and unusual configuration value combinations.
Fixes for slow text embed cache writes being overlooked, which resulted in training not finding embeds.
- fix documentation by @bghira in #296
- Fix for zero snr_gamma
- Fix subfolder support
- Add null check for subfolders by @bghira in #300
- prodigy optimizer added
- wandb fix for multigpu training
- adafactor: fix bugs, make it work like AdamW. added --adafactor_relative_step for the truly adventurous
- fix xformers / other deps
- zsnr: do not use betas from ddim sampler, instead use ddpm directly
- adamw: fix non-8bit optimizer settings for LR not being passed in
- text embed cache: add write thread progress bar, slow write warning and delay for encoding when we hit the buffer size. by @bghira in #305
Full Changelog: v0.9.0-rc7...v0.9.0-rc8
v0.9.0-rc7 - bugfix release
What's Changed
Full Changelog: v0.9.0-rc6...v0.9.0-rc7