Original paper: Transparent Image Layer Diffusion using Latent Transparency by Lvmin Zhang, Maneesh Agrawala
This is an unofficial port of Layer Diffuse from the SD Forge WebUI extension to the Hugging Face Diffusers framework.
This port focuses only on SDXL and transparent PNG image generation.
pip install -r requirements.txt
python demo_sdxl_attn.py \
    --prompt "portrait of woman in suit with messy hair, high resolution, photorealistic, uniform textureless background" \
    --negative_prompt "ugly, bad, shadow, artifact, blurry"
import torch

from diffusers_extension.pipeline_stable_diffusion_xl_layer_diffuse import StableDiffusionXLLayerDiffusePipeline

# Load the SDXL base weights in half precision and move the pipeline to the GPU.
pipeline = StableDiffusionXLLayerDiffusePipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Fix the seed so the result is reproducible.
images = pipeline(
    prompt="portrait of woman in suit with messy hair, high resolution, photorealistic, uniform textureless background",
    negative_prompt="ugly, bad, shadow, artifact, blurry",
    num_inference_steps=20,
    width=1024,
    height=1024,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images

# Save as PNG so the transparency is kept.
images[0].save("sdxl_layerdiffuse_result.png")
Check out demo_sdxl_attn.py for the complete demo.
Full argument list:
python demo_sdxl_attn.py \
    --seed SEED \
    --batch_size BATCH_SIZE \
    --guidance_scale GUIDANCE_SCALE \
    --num_inference_steps NUM_INFERENCE_STEPS \
    --width WIDTH \
    --height HEIGHT \
    --prompt PROMPT \
    --negative_prompt NEGATIVE_PROMPT \
    --output_path OUTPUT_PATH \
    --disable_memory_optim
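For readers who want to see how these flags fit together, here is a minimal sketch of an argparse parser that accepts the same options. It is not the actual code of demo_sdxl_attn.py: the `build_parser` helper, the types, the default values, and whether `--prompt` is required are all assumptions made for illustration. Only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed above.

    Types and defaults are illustrative guesses, not the script's real values.
    """
    parser = argparse.ArgumentParser(prog="demo_sdxl_attn.py")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--guidance_scale", type=float, default=7.0)
    parser.add_argument("--num_inference_steps", type=int, default=20)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--height", type=int, default=1024)
    # Assumed to be required here; the real script may provide a default.
    parser.add_argument("--prompt", type=str, required=True)
    parser.add_argument("--negative_prompt", type=str, default="")
    parser.add_argument("--output_path", type=str, default="result.png")
    # Takes no value: passing the flag switches memory optimizations off.
    parser.add_argument("--disable_memory_optim", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--prompt", "a glass bottle", "--seed", "7"])
print(args.seed, args.width, args.disable_memory_optim)  # prints: 7 1024 False
```

Since `--disable_memory_optim` is listed without a value, it is modeled as a boolean switch, so memory optimizations stay on unless the flag is passed.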
I created a class StableDiffusionXLLayerDiffusePipeline (code here) deriving from diffusers.StableDiffusionXLPipeline.
StableDiffusionXLLayerDiffusePipeline can still be initialized with .from_pretrained(), as one would do for diffusers.StableDiffusionXLPipeline.
Additionally, this new class takes care of:
- Loading the rank-256 LoRA layer_xl_transparent_attn.safetensors to turn SDXL into a transparent image generator
  - It changes the latent distribution of the model to a "transparent latent space" that can be decoded by the special VAE pipeline
- Loading vae_transparent_decoder.safetensors
  - This is an image decoder that takes the SDXL VAE outputs and the latent image as inputs, and outputs a real PNG image
- Overriding the pipeline's __call__() method to automatically forward the output of SDXL (latent + image w/ uniform background) to the VAE Transparent Decoder
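The three responsibilities listed above follow a common subclassing pattern, sketched below in plain Python. This is a toy illustration, not the real implementation: `BasePipeline`, `LayerDiffuseSketch`, and `transparent_decode` are hypothetical stand-ins, strings replace tensors and images, and no weights are actually loaded.

```python
class BasePipeline:
    """Stand-in for diffusers.StableDiffusionXLPipeline (not the real class)."""

    def __init__(self, model_id):
        self.model_id = model_id

    @classmethod
    def from_pretrained(cls, model_id):
        # Subclasses inherit this constructor unchanged.
        return cls(model_id)

    def __call__(self, prompt):
        # Pretend SDXL returns a latent plus an image on a uniform background.
        return {"latent": f"latent({prompt})", "image": f"rgb({prompt})"}


class LayerDiffuseSketch(BasePipeline):
    """Hypothetical subclass mirroring the three responsibilities above."""

    def __init__(self, model_id):
        super().__init__(model_id)
        # Steps 1 and 2: record the extra weights that would be loaded on top
        # of the base model (nothing is loaded in this toy version).
        self.extra_weights = [
            "layer_xl_transparent_attn.safetensors",
            "vae_transparent_decoder.safetensors",
        ]

    def transparent_decode(self, latent, image):
        # Stand-in for the transparent decoder: latent + image -> RGBA result.
        return f"rgba({image}, {latent})"

    def __call__(self, prompt):
        # Step 3: run the base pipeline, then forward its output to the decoder.
        out = super().__call__(prompt)
        return self.transparent_decode(out["latent"], out["image"])


pipe = LayerDiffuseSketch.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
result = pipe("a cat")
print(result)  # prints: rgba(rgb(a cat), latent(a cat))
```

Because the extra loading happens in the constructor and the decoding happens in `__call__()`, calling code looks the same as it would for the base class.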