Releases · bghira/SimpleTuner
v0.9.5.3c img2img validations for the sdxl refiner
What's Changed
- sdxl refiner: option `--sdxl_refiner_uses_full_range` for training on all timesteps by @bghira in #401 (see the sketch after this list)
- sdxl refiner: ability to validate on images using 20% denoise strength
- deepfloyd: stage II eval fixes
- factory should sleep when waiting for text embed write by @bghira in #402
- fixes #351 by adding the `--variant` option by @bghira in #403
- more toolkit options for captioning: gemini pro, blip3 by @bghira in #404
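A minimal sketch of the timestep-range idea behind `--sdxl_refiner_uses_full_range`, assuming a 1000-step schedule and a low-noise refiner cutoff of 200; the function name and the cutoff value are illustrative, not SimpleTuner's actual code:

```python
import torch

def sample_refiner_timesteps(batch_size: int, full_range: bool,
                             num_train_timesteps: int = 1000,
                             refiner_cutoff: int = 200) -> torch.Tensor:
    # Hypothetical helper: the refiner is normally specialised on the
    # low-noise tail of the schedule; the flag widens sampling to every
    # timestep in the schedule.
    high = num_train_timesteps if full_range else refiner_cutoff
    return torch.randint(0, high, (batch_size,))
```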
Full Changelog: v0.9.5.3b...v0.9.5.3c
v0.9.5.3b - SDXL refiner training
What's Changed
- SDXL refiner training support - LoRA and full u-net. The text embeds from the base model can't be reused; you must use a different directory.
- validations: completely refactored workflow
- huggingface hub: now can use `--push_checkpoints_to_hub` to upload all intermediary checkpoints
- dropout: improve implementation to bypass any issues with tokenizer setup that might result in an incorrect embed, by @bghira in #388
- lr schedules: polynomial fixed / last_epoch being set correctly for the rest
- parquet backend will ignore missing captions
- deepfloyd: text encoder loading fixed
- sd2.x: tested, bugfixed. uncond text embeds excluded from zeroing
- huggingface hub: improved model card; `--push_checkpoints_to_hub` will push every saved model and validation image (tested with 168 validation prompts)
- mps: new pytorch nightly resolves some strange issues seen previously
- mps: use 'auto' slice width for sd 2.x instead of null
- validations: refactored logic entirely, cleaned up and simplified to tie-in with huggingface hub uploader
- timestep schedule is now segmented by `train_batch_size`, ensuring a broader distribution of timestep sampling within each mini-batch (see the sketch after this list) by @bghira in #391
- follow-up fixes from botched v0.9.5.3 build by @bghira in #397
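A minimal sketch of that segmented (stratified) sampling, assuming a 1000-step schedule; the function name is illustrative rather than SimpleTuner's actual implementation:

```python
import torch

def segmented_timesteps(train_batch_size: int,
                        num_train_timesteps: int = 1000) -> torch.Tensor:
    # Split the schedule into one segment per sample and draw a timestep
    # from each, so every mini-batch covers the full noise range more
    # evenly than independent uniform draws would.
    bounds = torch.linspace(0, num_train_timesteps, train_batch_size + 1).long()
    return torch.stack([
        torch.randint(int(lo), max(int(lo) + 1, int(hi)), (1,)).squeeze(0)
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ])
```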
Full Changelog: v0.9.5.2...v0.9.5.3b
v0.9.5.2 - hugging face hub upload fixes/improvements, minor vae encoding fixes
What's Changed
- huggingface hub model upload improvement / fixes
- validations double-run fix
- json backend image size microconditioning input fix (SDXL) by @bghira in #385
- bitfit restrictions / model freezing simplification
- updates to huggingface hub integration, automatically push model card and weights
- webhooks: minor log-level fixes, other improvements; ability to debug image cropping by sending the images to Discord
- resize and crop fixes for json and parquet backend edge cases (VAE encode in-flight) by @bghira in #386
Full Changelog: v0.9.5.1...v0.9.5.2
v0.9.5.1
v0.9.5 - now with more robust flavour
(image: Finetuning Terminus XL Velocity v2)
What's Changed
- New cropping logic is now working across the board for parquet/json backends. Images are always cropped now, even when `crop=false`, if necessary to maintain 8px or 64px alignment with the resulting dataset (see the sketch after this list).
  - Resulting image sizes and aspect ratios did not change for `resolution_type=area`.
  - Resulting image sizes and aspect ratios did change for `resolution_type=pixel`. This was necessary to avoid stretching/squeezing images when aligning to the 64px interval.
- Discord webhook support, see the TUTORIAL for information.
- "Sensible defaults" are now set for
minimum_image_size
,maximum_image_size
, andtarget_downsample_size
to avoid unexpected surprises mostly when usingcrop=true
, but also for some benefits when usingcrop=false
as well. - Image upscaling restrictions have been relaxed, but it will refuse to upscale an image beyond 25%, and instead asks you to change the dataset configuration values.
- Image quality when training SDXL models has substantially improved thanks to the minimisation of the microconditioning input ranges.
  (image: finetuning a particularly poorly-performing Terminus checkpoint with reduced high-frequency patterning)
- Single-subject dreambooth was benchmarked on SDXL with 30 diverse images, achieving great results in just 500 steps.
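A minimal sketch of the crop-to-alignment behaviour, assuming a centre crop and a 64px (or 8px) alignment requirement; `crop_to_alignment` is an illustrative helper, not SimpleTuner's actual function:

```python
from PIL import Image

def crop_to_alignment(image: Image.Image, alignment: int = 64) -> Image.Image:
    # Centre-crop so both edges are multiples of `alignment`, preserving
    # the aspect ratio instead of stretching/squeezing to the aligned size.
    w, h = image.size
    new_w, new_h = w - (w % alignment), h - (h % alignment)
    left, top = (w - new_w) // 2, (h - new_h) // 2
    return image.crop((left, top, left + new_w, top + new_h))
```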
Commits
- Convert image to accepted format for calculate_luminance by @Beinsezii in #376
- vae cache fix for SDXL / legacy SD training
- epoch / resume step fix for a corner case where the path to the training data includes the dataset name by @bghira in #377
- when `crop=false`, we will crop from the intermediary size to the target size instead of squishing
- set default `min_image_size`, `maximum_image_size`, and `target_downsample_size` values to 100%, 150%, and 150% of the value set for `resolution`, respectively (see the sketch after this list), by @bghira in #378
- resolved bugged-out null embed when dropout is disabled
- discord webhook support
- cuda/rocm: bugfix for eval on final legacy (sd 1.5/2.1) training validations
- avoid stretching/squeezing images by always cropping to maintain 8/64px alignment
- set default values for minimum_image_size, maximum_image_size, and target_downsample_size by @bghira in #379
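A hedged sketch of those defaults, expressed as fractions of the configured `resolution`; the helper name is illustrative, and the keys simply mirror the option names above:

```python
def default_size_limits(resolution: float) -> dict:
    # Defaults as stated in the notes above: 100% / 150% / 150% of the
    # configured resolution.
    return {
        "minimum_image_size": resolution * 1.00,
        "maximum_image_size": resolution * 1.50,
        "target_downsample_size": resolution * 1.50,
    }
```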
Full Changelog: v0.9.5-beta...v0.9.5-beta2
v0.9.5-beta - optimized training, 3x speed-up
What's Changed
This release includes an experimental rewrite of the image handling code. Please report any issues.
- flexible pixel resize to 8 or 64 px alignment, no more rounding up where unnecessary by @bghira in #368
- more deepfloyd stage II fixes for model evaluation by @bghira in #369
- AMD/ROCm support by @Beinsezii in #373
- TrainingSample: refactor and encapsulate image handling, improving performance and reliability by @bghira in #374
- fix `--aspect_bucket_rounding` not being applied correctly
- rebuild image sample handling to be structured object-oriented logic
- fix early epoch exit problem
- max epochs vs max steps ambiguity reduced by setting default to 0 for one of them
- fixes for LoRA text encoder save/load hooks
- optimise trainer
- 300% performance gain by removing the torch anomaly detector (see the note after this list)
- fix dataset race condition where a single image dataset was not being detected
- AMD documentation for install, dependencies thanks to Beinsezii
- fix for wandb timestep distribution chart values racing ahead of reality by @bghira in #375
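For context on that gain: PyTorch's anomaly detector instruments every backward pass and is costly, so it should stay off in normal training runs (off is also PyTorch's default):

```python
import torch

# Anomaly detection slows every backward pass considerably; keep it
# disabled outside of debugging NaN/Inf gradients.
torch.autograd.set_detect_anomaly(False)
```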
Full Changelog: v0.9.4...v0.9.5-beta
v0.9.4 - the deepest of floyds
(image: DeepFloyd stage I LoRA trained using v0.9.4)
What's Changed
- parquet: fix aspect bucketing
- json: mild optimisation
- llava: add 1.6 support
- pillow: fix deprecations by @bghira in #350
- (#343) fix for image backend load failure by @bghira in #352
- sdxl: validations fix at the end
- more example scripts for the toolkit
- `--aspect_bucket_rounding` by @bghira in #359
- DeepFloyd IF Stage I and II LoRA/full u-net training by @bghira in #361
- Add Dockerfile by @komninoschatzipapas in #353
- multi-res validations via `--validation_resolution=1024,1024x1536,...` (see the sketch after this list)
- disable torch inductor by default
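An illustrative parser for that `--validation_resolution` syntax, read as a comma-separated list in which each entry is either a single edge length (a square) or WIDTHxHEIGHT; this format is inferred from the example above, and the function is not SimpleTuner's actual parser:

```python
def parse_validation_resolutions(value: str) -> list[tuple[int, int]]:
    # "1024,1024x1536" -> [(1024, 1024), (1024, 1536)]
    resolutions = []
    for entry in value.split(","):
        if "x" in entry:
            width, height = entry.split("x")
            resolutions.append((int(width), int(height)))
        else:
            resolutions.append((int(entry), int(entry)))
    return resolutions
```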
New Contributors
- @komninoschatzipapas made their first contribution in #353
Full Changelog: v0.9.3.4...v0.9.4
v0.9.3.4 - parquet/multi-gpu improvements
What's Changed
- bugfix: multi-GPU training would gradually erode the aspect bucket list by saving a split version
- add `--lora_init_type` argument
- multi-gpu optimisations
- parquet backend speedup by 100x by @bghira in #349
Full Changelog: v0.9.3.3...v0.9.3.4
v0.9.3.3 - faster startup
What's Changed
- multigpu fixes - logging, startup, resume, validation
- regression fixes - torch tensor dtype error during CUDA validations
- better image detection - "missing images" occur less frequently/not at all
- tested jpg/png mixed datasets
- face detection documentation updates
- higher NCCL timeout (see the sketch after this list)
- diffusers update to v0.27.2
- mps pytorch nightly 2.4 by @bghira in #346
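A minimal sketch of one common way to raise the NCCL timeout in PyTorch; the two-hour value is illustrative, not necessarily what SimpleTuner sets:

```python
from datetime import timedelta

import torch.distributed as dist

# A longer collective-op timeout keeps slow steps (e.g. large caching
# passes) from tripping NCCL's watchdog on multi-GPU runs.
dist.init_process_group(backend="nccl", timeout=timedelta(hours=2))
```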
Full Changelog: v0.9.3.2...v0.9.3.3