Releases: huggingface/diffusers
Patch Release v0.21.1: Fix import and config loading for `from_single_file`
- Fix model offload bug when key isn't present by @DN6 in #5030
- [Import] Don't force transformers to be installed by @patrickvonplaten in #5035
- allow loading of sd models from safetensors without online lookups using local config files by @vladmandic in #5019
- [Import] Add missing settings / Correct some dummy imports by @patrickvonplaten in #5036
v0.21.0: Würstchen, Faster LoRA loading, Faster imports, T2I Adapters for SDXL, and more
Würstchen
Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images, which makes inference cheaper and faster.
Here is how to use Würstchen as a pipeline:
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")
caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
caption,
height=1024,
width=1536,
prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
prior_guidance_scale=4.0,
num_images_per_prompt=4,
).images
To learn more about the pipeline, check out the official documentation.
This pipeline was contributed by one of the authors of Würstchen, @dome272, with help from @kashif and @patrickvonplaten.
👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen
T2I Adapters for Stable Diffusion XL (SDXL)
T2I-Adapter is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.
In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.
Below is an example of how to use the `StableDiffusionXLAdapterPipeline`.
First, ensure that `controlnet_aux` is installed:
pip install -U controlnet_aux==0.0.7
Then we can initialize the pipeline:
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid
# load adapter
adapter = T2IAdapter.from_pretrained(
"TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
"madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
model_id,
vae=vae,
adapter=adapter,
scheduler=euler_a,
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
We then load an image to compute the lineart conditionings:
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
Then we generate:
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
image=image,
num_inference_steps=30,
adapter_conditioning_scale=0.8,
guidance_scale=7.5,
).images[0]
Refer to the official documentation to learn more about `StableDiffusionXLAdapterPipeline`.
This blog post summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.
We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the documentation to learn more.
Thanks to @MC-E (one of the authors of T2I Adapters) for contributing the `StableDiffusionXLAdapterPipeline` in #4696.
Faster imports
We introduced “lazy imports” (#4829) to significantly improve the time it takes to import our modules (such as `pipelines`, `models`, and so on). Below is a comparison of the timings with and without lazy imports on `import diffusers`.
With lazy imports:
real 0m0.417s
user 0m0.714s
sys 0m0.499s
Without lazy imports:
real 0m5.391s
user 0m5.299s
sys 0m1.273s
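The idea behind lazy imports can be sketched in a few lines of standard-library Python. This is only an illustration of the general technique, not the mechanism added in #4829; the `LazyModule` class name is hypothetical, and `json` stands in for a heavy submodule.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Stand-in module that imports the real module on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._real = None  # filled in on the first attribute lookup

    def __getattr__(self, attr):
        # __getattr__ only runs when normal lookup fails, so `_real` and
        # `__name__` (stored on the instance) never reach this method.
        if self._real is None:
            self._real = importlib.import_module(self.__name__)
        return getattr(self._real, attr)


# Creating the proxy is cheap: nothing has been imported through it yet.
lazy_json = LazyModule("json")
loaded_before = lazy_json._real is not None

# The first real use triggers the import.
encoded = lazy_json.dumps({"lazy": True})
loaded_after = lazy_json._real is not None

print(loaded_before, loaded_after, encoded)
```

The cost of importing a heavy dependency is paid only by code paths that actually touch it, which is why a bare `import diffusers` can return quickly.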
Faster LoRA loading
Previously, loading LoRA parameters with `load_lora_weights()` was time-consuming, as reported in #4975. To address this, we introduced a `low_cpu_mem_usage` argument to the `load_lora_weights()` method in #4994, which should speed up loading significantly. Just pass `low_cpu_mem_usage=True` to benefit from it.
LoRA fusing
LoRA weights can now be fused into the model weights, allowing models that have loaded LoRA weights to run as fast as models without them. It also makes it possible to fuse multiple LoRAs into the same model.
For more information, have a look at the documentation and the original PR: #4473.
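To see why a fused model runs as fast as one without LoRA, here is a minimal numerical sketch in plain Python. It assumes the usual LoRA formulation, in which a low-rank product `up @ down` is scaled and added to a base weight `W`. The helper functions and toy values are illustrative and are not taken from the diffusers implementation in #4473.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b, scale=1.0):
    """Return a + scale * b, element by element."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Base weight W (2x2) and a rank-1 LoRA update: up (2x1) @ down (1x2).
W = [[1.0, 2.0], [3.0, 4.0]]
lora_up = [[0.5], [-1.0]]
lora_down = [[2.0, 0.0]]
scale = 0.5
x = [[1.0], [2.0]]  # input column vector

# Unfused: two extra matmuls on every forward pass.
unfused = add(matmul(W, x), matmul(lora_up, matmul(lora_down, x)), scale)

# Fused: fold the update into W once, then run a single matmul.
W_fused = add(W, matmul(lora_up, lora_down), scale)
fused = matmul(W_fused, x)

print(unfused, fused)
```

Both paths give the same output, but the fused path does the extra work once instead of on every forward pass. Under the same assumption, several LoRAs can be folded into `W` one after another.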
More support for LoRAs
Almost all LoRA formats out there for SDXL are now supported. For more details, please check the documentation.
All commits
- fix: lora sdxl tests by @sayakpaul in #4652
- Support tiled encode/decode for `AutoencoderTiny` by @Isotr0py in #4627
- Add SDXL long weighted prompt pipeline (replace pr:4629) by @xhinker in #4661
- add config_file to from_single_file by @zuojianghua in #4614
- Add AudioLDM 2 by @sanchit-gandhi in #4549
- [docs] Add note in UniDiffusers Doc about PyTorch 1.X numerical stability issue by @dg845 in #4703
- [Core] enable lora for sdxl controlnets too and add slow tests. by @sayakpaul in #4666
- [LoRA] ensure different LoRA ranks for text encoders can be properly handled by @sayakpaul in #4669
- [LoRA] default to None when fc alphas are not available. by @sayakpaul in #4706
- Replaces `DIFFUSERS_TEST_DEVICE` backend list with trying device by @vvvm23 in #4673
- add convert diffuser pipeline of XL to original stable diffusion by @realliujiaxu in #4596
- Add reference_attn & reference_adain support for sdxl by @zideliu in #4502
- [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
- rename test file to run, so that examples tests do not fail by @patrickvonplaten in #4715
- Revert "Move controlnet load local tests to nightly by @patrickvonplaten in #4543)"
- Fix all docs by @patrickvonplaten in #4721
- fix bad error message when transformers is missing by @patrickvonplaten in #4714
- Fix AutoencoderTiny encoder scaling convention by @madebyollin in #4682
- [Examples] fix checkpointing and casting bugs in `train_text_to_image_lora_sdxl.py` by @sayakpaul in #4632
- [AudioLDM Docs] Fix docs for output by @sanchit-gandhi in #4737
- [docs] add variant="fp16" flag by @realliujiaxu in #4678
- [AudioLDM Docs] Update docstring by @sanchit-gandhi in #4744
- fix dummy import for AudioLDM2 by @patil-suraj in #4741
- change validation scheduler for train_dreambooth.py when training IF by @wyz894272237 in #4333
- add a step_index counter by @yiyixuxu in #4347
- [AudioLDM2] Doc fixes by @sanchit-gandhi in #4739
- Bugfix for SDXL model loading in low ram system. by @Symbiomatrix in #4628
- Clean up flaky behaviour on Slow CUDA Pytorch Push Tests by @DN6 in #4759
- [Tests] Fix paint by example by @patrickvonplaten in #4761
- [fix] multi t2i adapter set total_downscale_factor by @williamberman in #4621
- [Examples] Add madebyollin VAE to SDXL LoRA example, along with an explanation by @mnslarcher in #4762
- [LoRA] relax lora loading logic by @sayakpaul in #4610
- [Examples] fix sdxl dreambooth lora checkpointing. by @sayakpaul in #4749
- fix sdxl_lwp empty neg_prompt error issue by @xhinker in #4743
- improve setup.py by @sayakpaul in #4748
- Torch device by @patrickvonplaten in #4755
- [AudioLDM 2] Pipeline fixes by @sanchit-gandhi in #4738
- Convert MusicLDM by @sanchit-gandhi in #4579
- [WIP ] Proposal to address precision issues in CI by @DN6 in #4775
- fix a bug in `from_pretrained` when load optional components by @yiyixuxu in #4745
- fix bug of progress bar in clip guided images mixing by @scnuhealthy in #4729
- Fixed broken link of CLIP doc in evaluation doc by @mayank2 in #4760
- instance_prompt->class_prompt by @williamberman in #4784
- refactor prepare_mask_and_masked_image with VaeImageProcessor by @yiyixuxu in #4444
- Allow passing a checkpoint state_dict to convert_from_ckpt (instead of just a string path) by @cmdr2 in #4653
- [SDXL] Add docs about forcing passed embeddings to be 0 by @patrickvonplaten in #4783
- [Core] Support negative conditions in SDXL by @sayakpaul in #4774
- Unet fix by @canberk17 in #4769
- [Tests] Tighten up LoRA loading relaxation by @sayakpaul in #4787
- [docs] Fix syntax for compel by @stevhliu in #4794
- [Torch compile] Fix torch comp...
Patch Release 0.20.2 - Correct SDXL Inpaint Strength Default
Stable Diffusion XL's strength default was accidentally set to 1.0 when creating the pipeline. The default should be set to 0.9999 instead. This patch release fixes that.
All commits
- [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858
Patch Release: Fix `torch.compile()` support for ControlNets
3eb498e#r125606630 introduced a 🐛 that broke the `torch.compile()` support for ControlNets. This patch release fixes that.
All commits
- [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
- [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795
v0.20.0: SDXL ControlNets with MultiControlNet, GLIGEN, Tiny Autoencoder, SDXL DreamBooth LoRA in free-tier Colab, and more
SDXL ControlNets 🚀
The 🧨 diffusers team has trained two ControlNets on Stable Diffusion XL (SDXL):
You can find all the SDXL ControlNet checkpoints here, including some smaller ones (5 to 7x smaller).
To know more about how to use these ControlNets to perform inference, check out the respective model cards and the documentation. To train custom SDXL ControlNets, you can try out our training script.
MultiControlNet for SDXL
This release also introduces support for combining multiple ControlNets trained on SDXL and performing inference with them. Refer to the documentation to learn more.
GLIGEN
The GLIGEN model was developed by researchers and engineers from University of Wisconsin-Madison, Columbia University, and Microsoft. The `StableDiffusionGLIGENPipeline` can generate photorealistic images conditioned on grounding inputs. Along with text and bounding boxes, if input images are given, this pipeline can insert objects described by text at the region defined by bounding boxes. Otherwise, it’ll generate an image described by the caption/prompt and insert objects described by text at the region defined by bounding boxes. It’s trained on COCO2014D and COCO2014CD datasets, and the model uses a frozen CLIP ViT-L/14 text encoder to condition itself on grounding inputs.
(GIF from the official website)
Grounded inpainting
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image
# Insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
"masterful/gligen-1-4-inpainting-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
input_image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gligen/livingroom_modern.png"
)
prompt = "a birthday cake"
boxes = [[0.2676, 0.6088, 0.4773, 0.7183]]
phrases = ["a birthday cake"]
images = pipe(
prompt=prompt,
gligen_phrases=phrases,
gligen_inpaint_image=input_image,
gligen_boxes=boxes,
gligen_scheduled_sampling_beta=1,
output_type="pil",
num_inference_steps=50,
).images
images[0].save("./gligen-1-4-inpainting-text-box.jpg")
Grounded generation
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image
# Generate an image described by the prompt and
# insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
"masterful/gligen-1-4-generation-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
prompt = "a waterfall and a modern high speed train running through the tunnel in a beautiful forest with fall foliage"
boxes = [[0.1387, 0.2051, 0.4277, 0.7090], [0.4980, 0.4355, 0.8516, 0.7266]]
phrases = ["a waterfall", "a modern high speed train running through the tunnel"]
images = pipe(
prompt=prompt,
gligen_phrases=phrases,
gligen_boxes=boxes,
gligen_scheduled_sampling_beta=1,
output_type="pil",
num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
Refer to the documentation to learn more.
Thanks to @nikhil-masterful for contributing GLIGEN in #4441.
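The `boxes` values in the examples above all lie between 0 and 1. As far as we understand the pipeline, each box is `[xmin, ymin, xmax, ymax]` expressed as a fraction of the image width and height. If your regions are in pixels, a small helper like the one below can convert them. `normalize_box` is a hypothetical helper written for this note, not part of diffusers, and the pixel values are made up.

```python
def normalize_box(box_px, image_width, image_height):
    """Convert a pixel-space [xmin, ymin, xmax, ymax] box to 0-1 coordinates."""
    xmin, ymin, xmax, ymax = box_px
    if not (0 <= xmin < xmax <= image_width and 0 <= ymin < ymax <= image_height):
        raise ValueError(
            f"box {box_px} does not fit a {image_width}x{image_height} image"
        )
    return [
        round(xmin / image_width, 4),
        round(ymin / image_height, 4),
        round(xmax / image_width, 4),
        round(ymax / image_height, 4),
    ]


# A 512x512 canvas with one region given in pixels.
boxes = [normalize_box([137, 312, 244, 360], 512, 512)]
phrases = ["a birthday cake"]  # one phrase per box
print(boxes)
```

The resulting `boxes` and `phrases` lists can then be passed as `gligen_boxes` and `gligen_phrases`, keeping one phrase per box.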
Tiny Autoencoder
@madebyollin trained two Autoencoders (on Stable Diffusion and Stable Diffusion XL, respectively) to dramatically cut down the image decoding time. The effects are especially pronounced when working with larger-resolution images. You can use `AutoencoderTiny` to take advantage of it.
Here’s the example usage for Stable Diffusion:
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
Refer to the documentation to learn more. Refer to this material to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.
Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook
Stable Diffusion XL’s (SDXL) high memory requirements often seem restrictive when it comes to using it for downstream applications. Even if one uses parameter-efficient fine-tuning techniques like LoRA, fine-tuning just the UNet component of SDXL can be quite memory-intensive. So, running it on a free-tier Colab Notebook (that usually has a 16 GB T4 GPU attached) seems impossible.
Now, with better support for gradient checkpointing and other recipes like 8 Bit Adam (via `bitsandbytes`), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.
Check out the Colab Notebook to learn more.
Thanks to @ethansmith2000 for improving the gradient checkpointing support in #4474.
Support of `push_to_hub` for models, schedulers, and pipelines
Our models, schedulers, and pipelines now support a `push_to_hub` option via `save_pretrained()` and also come with a `push_to_hub()` method. Below are some examples of usage.
Models
from diffusers import ControlNetModel
controlnet = ControlNetModel(
block_out_channels=(32, 64),
layers_per_block=2,
in_channels=4,
down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
cross_attention_dim=32,
conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)
Schedulers
from diffusers import DDIMScheduler
scheduler = DDIMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
clip_sample=False,
set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")
Pipelines
from diffusers import (
UNet2DConditionModel,
AutoencoderKL,
DDIMScheduler,
StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer
unet = UNet2DConditionModel(
block_out_channels=(32, 64),
layers_per_block=2,
sample_size=32,
in_channels=4,
out_channels=4,
down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
cross_attention_dim=32,
)
scheduler = DDIMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
clip_sample=False,
set_alpha_to_one=False,
)
vae = AutoencoderKL(
block_out_channels=[32, 64],
in_channels=3,
out_channels=3,
down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
latent_channels=4,
)
text_encoder_config = CLIPTextConfig(
bos_token_id=0,
eos_token_id=2,
hidden_size=32,
intermediate_size=37,
layer_norm_eps=1e-05,
num_attention_heads=4,
num_hidden_layers=5,
pad_token_id=1,
vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
components = {
"unet": unet,
"scheduler": scheduler,
"vae": vae,
"text_encoder": text_encoder,
"tokenizer": tokenizer,
"safety_checker": None,
"feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")
Refer to the documentation to know more.
Thanks to @Wauplin for his generous and constructive feedback (refer to #4218) on this feature.
Better support for loading Kohya-trained LoRA checkpoints
Providing seamless support for loading Kohya-trained LoRA checkpoints from `diffusers` is important for us. This is wh...
Patch release: Fix incorrect filenaming
0.19.3 is a patch release to make sure `import diffusers` works without `transformers` being installed.
It includes a fix of this issue.
All commits
[SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in #4370
Patch Release: Support for SDXL Kohya-style LoRAs, Fix batched inference SDXL Img2Img, Improve watermarker
We still had some bugs 🐛 in 0.19.1, notably:
SDXL (Kohya-style) LoRA
The official SD-XL 1.0 LoRA (Kohya-styled) is now supported thanks to #4287. You can try it as follows:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors")
pipe.to("cuda")
prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"
image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25).images[0]
In addition, a couple more SDXL LoRAs are now supported:
(SDXL 0.9:)
- https://civitai.com/models/22279?modelVersionId=118556
- https://civitai.com/models/104515/sdxlor30costumesrevue-starlight-saijoclaudine-lora
- https://civitai.com/models/108448/daiton-sdxl-test
- https://filebin.net/2ntfqqnapiu9q3zx/pixelbuildings128-v1.safetensors
To know more details and the known limitations, please check out the documentation.
Thanks to @isidentical for their sincere help in the PR.
Batched inference
@bghira found that batched inference with SDXL Img2Img led to weird artifacts. That is fixed in #4327.
Downloads
Under some circumstances, SD-XL 1.0 could download ONNX weights; this is corrected in #4338.
Improved SDXL behavior
#4346 allows the user to disable the watermarker under certain circumstances to improve the usability of SDXL.
All commits:
- [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
- [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
- [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
- [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287
Patch Release: Fix torch compile and local_files_only
In 0.19.0 some bugs 🐛 found their way into the release. We're very sorry about this 🙏
This patch release fixes all of them.
All commits
- update Kandinsky doc by @yiyixuxu in #4301
- [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
- Fix SDXL conversion from original to diffusers by @duongna21 in #4280
- fix a bug in StableDiffusionUpscalePipeline when `prompt` is `None` by @yiyixuxu in #4278
- [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
- Release: v0.19.1 by @patrickvonplaten (direct commit on v0.19.1-patch)
v0.19.0: SD-XL 1.0 (permissive license), AutoPipelines, Improved Kandinsky & Asymmetric VQGAN, T2I Adapter
SDXL 1.0
Stable Diffusion XL (SDXL) 1.0 with permissive CreativeML Open RAIL++-M License was released today. We provide full compatibility with SDXL in `diffusers`.
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image
Many additional cool features are released:
- Pipelines for
- Img2Img
- Inpainting
- Torch compile support
- Model offloading
- Ensemble of Expert Denoisers (eDiff-I approach) - thanks to @bghira @SytanSD @Birch-san @AmericanPresidentJimmyCarter
Refer to the documentation to know more.
New training scripts for SDXL
When there’s a new pipeline, there ought to be new training scripts. We added support for the following training scripts that build on top of SDXL:
Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.
New pipelines for SDXL
The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:
StableDiffusionXLControlNetPipeline
StableDiffusionXLInstructPix2PixPipeline
The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.
Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline
in #4079.
The AutoPipeline API
We now support `Auto` APIs for the following tasks: text-to-image, image-to-image, and inpainting:
Here is how to use one:
from diffusers import AutoPipelineForText2Image
import torch
pipe_t2i = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")
prompt = "photo a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")
Without any extra memory, you can then switch to Image-to-Image:
from diffusers import AutoPipelineForImage2Image
pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)
image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.
Refer to the documentation to know more.
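The "without any extra memory" behavior of `from_pipe` can be pictured as a new pipeline object holding references to the same component objects instead of loading fresh copies. The sketch below illustrates that idea only; the `Sketch*` classes are hypothetical stand-ins and do not reproduce the diffusers implementation.

```python
class SketchPipeline:
    """Hypothetical stand-in for a pipeline that holds model components."""

    def __init__(self, **components):
        self.components = components

    @classmethod
    def from_pipe(cls, other):
        # Pass the existing component objects through by reference;
        # nothing is copied or reloaded.
        return cls(**other.components)


class SketchText2Image(SketchPipeline):
    pass


class SketchImage2Image(SketchPipeline):
    pass


# A bytearray stands in for large model weights.
weights = bytearray(1024)
t2i = SketchText2Image(unet=weights, vae=object())
i2i = SketchImage2Image.from_pipe(t2i)

shares_unet = i2i.components["unet"] is t2i.components["unet"]
shares_vae = i2i.components["vae"] is t2i.components["vae"]
print(type(i2i).__name__, shares_unet, shares_vae)
```

Because both pipeline objects point at the same components in this sketch, switching tasks costs only a thin wrapper rather than a second copy of the weights.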
A new “combined pipeline” for the Kandinsky series
We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("image.png")
The following pipelines, which can be accessed via the "Auto" pipelines were added:
- KandinskyCombinedPipeline
- KandinskyImg2ImgCombinedPipeline
- KandinskyInpaintCombinedPipeline
- KandinskyV22CombinedPipeline
- KandinskyV22Img2ImgCombinedPipeline
- KandinskyV22InpaintCombinedPipeline
To know more, check out the following pages:
🚨🚨🚨 Breaking change for Kandinsky Mask Inpainting 🚨🚨🚨
NOW: `mask_image` repaints white pixels and preserves black pixels.
Kandinsky was using an incorrect mask format. Instead of using white pixels as a mask (like SD & IF do), Kandinsky models were using black pixels. This needed to be corrected so that the diffusers API is aligned. We cannot have different mask formats for different pipelines.
Important => This means that everyone who already used Kandinsky Inpaint in production or in a pipeline now needs to change the mask to:
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)
# For PyTorch and Numpy input
mask = 1 - mask
Asymmetric VQGAN
Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings the support of this new VQGAN. Here is how it can be used:
from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
def download_image(url: str) -> Image.Image:
    # Fetch the image bytes and decode them as an RGB PIL image.
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")
prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
Refer to the documentation to know more.
Thanks to @cross-attention for contributing this model in #3956.
Improved support for loading Kohya-style LoRA checkpoints
We are committed to providing seamless interoperability support of Kohya-trained checkpoints from `diffusers`. To that end, we improved the existing support for loading Kohya-trained checkpoints in `diffusers`. Users can expect further improvements in the upcoming releases.
Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.
T2I Adapter
pip install matplotlib
from PIL import Image
import torch
import numpy as np
import matplotlib
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline
def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
    """Converts a depth map to a color image.
    Args:
        value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed
        vmin (float, optional): vmin-valued entries are mapped to start color of cmap. If None, value.min() is used. Defaults to None.
        vmax (float, optional): vmax-valued entries are mapped to end color of cmap. If None, value.max() is used. Defaults to None.
        cmap (str, optional): matplotlib colormap to use. Defaults to 'gray_r'.
        invalid_val (int, optional): Specifies value of invalid pixels that should be colored as 'background_color'. Defaults to -99.
        invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
        background_color (tuple[int], optional): 4-tuple RGB color to give to invalid pixels. Defaults to (128, 128, 128, 255).
        gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
        value_transform (Callable, optional): Apply transform funct...
Patch Release: v0.18.2
Patch release to fix:
- `torch.compile` for SD-XL for certain GPUs
- `from_single_file` for all SD models
- Fix broken ONNX export
- Fix incorrect VAE FP16 casting
- Deprecate loading variants that don't exist
Note:
Loading any Stable Diffusion safetensors or ckpt checkpoint with `StableDiffusionPipeline.from_single_file`, `StableDiffusionImg2ImgPipeline.from_single_file`, `StableDiffusionInpaintPipeline.from_single_file`, `StableDiffusionXLPipeline.from_single_file`, ... is now almost as fast as `from_pretrained(...)`, and it is much better tested.
All commits:
- Make sure torch compile doesn't access unet config by @patrickvonplaten in #4008
- [DiffusionPipeline] Deprecate not throwing error when loading non-existant variant by @patrickvonplaten in #4011
- Correctly keep vae in `float16` when using PyTorch 2 or xFormers by @pcuenca in #4019
- minor improvements to the SDXL doc. by @sayakpaul in #3985
- Remove remaining `not` in upscale pipeline by @pcuenca in #4020
- FIX `force_download` in download utility by @Wauplin in #4036
- Improve single loading file by @patrickvonplaten in #4041
- keep _use_default_values as a list type by @oOraph in #4040