[Paper] [Demo in 🤗Hugging Face Space] [Code and Pre-trained Models] [Colab Notebook]
by Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, Qiang Liu
- (New) 2024/06/07 Our large-scale Rectified Flow has been extended to text-to-3D generation and image inversion/editing! Check out the amazing work from Xiaofeng Yang et al. (paper and code)!
- 2024/05/17 Try our new few-step model, PeRFlow, here!
- 2023/12/04 We updated the demo in 🤗Hugging Face Space with InstaFlow+dreamshaper-7. Image quality is significantly improved! We also provide a Gradio demo that you can run locally here.
- 2023/12/04 One-step InstaFlow is compatible with pre-trained LoRAs! See here. Code is available here. (We thank individual contributor Dr. Hanshu Yan)
- 2023/12/04 ONNX support is available now! [ONNX InstaFlow] [ONNX 2-Rectified Flow] [ONNXStack UI] (We thank saddam213)
- 2023/11/23 Colab notebook is online now. Try it here. (We thank individual contributor xaviviro)
- 2023/11/22 One-step InstaFlow is compatible with pre-trained ControlNets. See here. (We thank individual contributor Dr. Hanshu Yan)
- 2023/11/22 We release the pre-trained models and inference code here.
- 2023/09/26 We provide a demo of InstaFlow-0.9B in 🤗Hugging Face Space. Try it here.
Diffusion models have shown remarkable promise in text-to-image generation. However, their efficiency is still largely limited by computational constraints, which stem from the need for iterative numerical solvers at inference time to solve the diffusion/flow processes.
InstaFlow is an ultra-fast, one-step image generator that achieves image quality close to that of Stable Diffusion while significantly reducing the demand for computational resources. This efficiency is made possible by Rectified Flow, a recent technique that trains probability flows with straight trajectories, which therefore inherently require only a single step for fast inference.
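As an illustration of what "training probability flows with straight trajectories" means, here is a minimal NumPy sketch of a Rectified Flow style objective. It assumes the standard formulation from the Rectified Flow papers: a point is linearly interpolated between a noise sample `x0` (t = 0) and an image `x1` (t = 1), and a velocity model is regressed onto the constant direction `x1 - x0`. The names `interpolate`, `rectified_flow_loss`, and `velocity_fn` are hypothetical; in InstaFlow the velocity model is a text-conditioned network, not a Python lambda.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t = 0) to data x1 (t = 1)."""
    # Reshape t from (batch,) to (batch, 1, ..., 1) so it broadcasts over feature dims.
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    return (1.0 - t) * x0 + t * x1


def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity and the straight-line target x1 - x0."""
    xt = interpolate(x0, x1, t)
    target = x1 - x0  # constant along each straight path
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))


# Toy usage: every "image" is its noise plus a fixed shift, so the ideal velocity is that shift.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))
shift = np.array([1.0, -2.0, 0.5, 3.0])
x1 = x0 + shift
t = rng.uniform(size=8)

oracle = lambda xt, tt: np.broadcast_to(shift, xt.shape)  # exact straight-line velocity
untrained = lambda xt, tt: np.zeros_like(xt)              # predicts no motion at all
print(rectified_flow_loss(oracle, x0, x1, t))     # essentially zero
print(rectified_flow_loss(untrained, x0, x1, t))  # about mean(shift**2)
```

Because the regression target does not depend on t, a perfectly trained model moves every sample along a straight line at constant speed, which is why a single integration step can suffice.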
InstaFlow has several advantages:
- Ultra-Fast Inference: InstaFlow models are one-step generators, which directly map noise to images and avoid the multi-step sampling of diffusion models. On our machine with an A100 GPU, the inference time is around 0.1 seconds, saving ~90% of the inference time compared to the original Stable Diffusion.
- High-Quality: InstaFlow generates images with intricate details like Stable Diffusion, and has an FID on MS COCO 2014 similar to that of state-of-the-art text-to-image GANs, such as StyleGAN-T.
- Simple and Efficient Training: The training process of InstaFlow involves only supervised training. Leveraging pre-trained Stable Diffusion, it takes only 199 A100 GPU days to obtain InstaFlow-0.9B.
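To make the inference-cost difference concrete, the hedged sketch below counts network function evaluations (NFE). It assumes a generator written as Euler integration of a velocity field from noise (t = 0) to image (t = 1): with `n_steps=1` that is a single forward pass, while a 25-step sampler calls the network 25 times. `CountingVelocity`, `euler_sample`, and `toy_velocity` are hypothetical names, and plain Euler is only a stand-in; the 25-step DPMSolver used for Stable Diffusion 1.5 in the comparison in this README is a different, higher-order solver.

```python
import numpy as np


class CountingVelocity:
    """Wraps a velocity function and counts network function evaluations (NFE)."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return self.fn(x, t)


def euler_sample(velocity_fn, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (image) with uniform Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


noise = np.random.default_rng(0).standard_normal((1, 4))
toy_velocity = lambda x, t: -x  # hypothetical stand-in for the denoising network

one_step = CountingVelocity(toy_velocity)
euler_sample(one_step, noise, n_steps=1)
multi_step = CountingVelocity(toy_velocity)
euler_sample(multi_step, noise, n_steps=25)
print(one_step.calls, multi_step.calls)  # 1 25
```

Wall-clock time also includes components that run once regardless of step count (for Stable Diffusion, the text encoder and the VAE decoder), so end-to-end speedups are smaller than the raw ratio of network calls.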
One-step InstaFlow is compatible with pre-trained LoRAs. We thank individual contributor Dr. Hanshu Yan for providing and testing the Rectified Flow+LoRA pipeline!
InstaFlow seems to have higher diversity than SDXL-Turbo.
One-step InstaFlow is fully compatible with pre-trained ControlNets. We thank individual contributor Dr. Hanshu Yan for providing and testing the Rectified Flow+ControlNet pipeline!
Below are one-step generations with InstaFlow-0.9B + ControlNet:
For an intuitive comparison, we used the same A100 server and took screenshots of the Gradio interface during random generation with different models. InstaFlow-0.9B is one-step, while SD 1.5 uses a 25-step DPMSolver. Downloading the image from the server takes around 0.3 seconds. The text prompt is "A photograph of a snowy mountain near a beautiful lake under sunshine."
| InstaFlow-0.9B | Stable Diffusion 1.5 |
|---|---|
Our pipeline consists of three steps:
- Generate (text, noise, image) triplets from pre-trained Stable Diffusion.
- Apply text-conditioned reflow to yield 2-Rectified Flow, which is a straightened generative probability flow.
- Distill from 2-Rectified Flow to get One-Step InstaFlow.

Note that distillation and reflow are orthogonal techniques.
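The three steps can be illustrated end to end on a toy problem. This is a sketch under strong simplifying assumptions, not the InstaFlow training code: the "teacher" is a hypothetical 2-D linear ODE with a rotational (curved) velocity field standing in for Stable Diffusion, there is no text conditioning, the reflow step only builds the straight-line regression data instead of training a velocity network, and distillation is an affine least-squares fit in place of training a neural one-step generator. Because no reflow model is trained here, the sketch distills the teacher's pairs directly; in the actual pipeline, the pairs used for distillation come from 2-Rectified Flow.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 2

# Hypothetical teacher: dx/dt = M x + c. The rotational part of M makes the
# trajectories curved, standing in for the pre-trained Stable Diffusion flow.
M = np.array([[0.0, -1.5], [1.5, 0.0]])
c = np.array([2.0, -1.0])


def teacher_velocity(x, t):
    return x @ M.T + c


def simulate(velocity_fn, x0, n_steps=200):
    """Many-step Euler simulation from t = 0 to t = 1 (the expensive sampler)."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Step 1: generate (noise, sample) pairs with the teacher (no text conditioning in this toy).
x0 = rng.standard_normal((1000, DIM))
x1 = simulate(teacher_velocity, x0)

# Step 2 (reflow, data only): straight-line inputs and targets. A velocity
# network would be regressed on (xt, t) -> reflow_target; that training is skipped here.
t = rng.uniform(size=(1000, 1))
xt = (1.0 - t) * x0 + t * x1
reflow_target = x1 - x0

# Step 3 (distillation): fit a direct one-step map noise -> sample.
# An affine least-squares fit replaces training a neural one-step generator.
features = np.hstack([x0, np.ones((len(x0), 1))])
W, *_ = np.linalg.lstsq(features, x1, rcond=None)


def one_step_generator(z):
    """Single evaluation: maps noise straight to a sample, with no ODE solver."""
    return np.hstack([z, np.ones((len(z), 1))]) @ W


z = rng.standard_normal((5, DIM))
print(np.max(np.abs(one_step_generator(z) - simulate(teacher_velocity, z))))  # tiny
```

Since Euler simulation of a linear ODE is an affine map, the least-squares fit recovers the teacher's noise-to-sample mapping essentially exactly. For a real diffusion model the mapping is highly nonlinear, so a neural network is needed, and this is where the straightening provided by reflow matters.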
As captured in the video and the image, straight flows have the following advantages:
- Straight flows require fewer steps to simulate.
- Straight flows give better coupling between the noise distribution and the image distribution, thus allowing successful distillation.
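The first advantage can be checked numerically. The sketch below, assuming NumPy and two hypothetical toy velocity fields, computes a Riemann-sum estimate of a straightness measure similar to the one used in the Rectified Flow paper, S = ∫₀¹ E‖(X₁ - X₀) - v(X_t, t)‖² dt, which is zero exactly when every trajectory is a straight line traversed at constant speed. It also compares a single Euler step against a fine 200-step simulation. `trajectory`, `straightness`, and `one_step_error` are illustrative names, not functions from the InstaFlow or Rectified Flow code.

```python
import numpy as np


def trajectory(velocity_fn, x0, n_steps=200):
    """Euler-simulate dx/dt = v(x, t); return the endpoint and the velocity at every step."""
    x, dt = x0.copy(), 1.0 / n_steps
    velocities = []
    for i in range(n_steps):
        v = velocity_fn(x, i * dt)
        velocities.append(v)
        x = x + dt * v
    return x, np.stack(velocities)


def straightness(velocity_fn, x0, n_steps=200):
    """Riemann-sum estimate of S = int_0^1 E||(x1 - x0) - v(x_t, t)||^2 dt."""
    x1, velocities = trajectory(velocity_fn, x0, n_steps)
    displacement = x1 - x0  # the straight chord from start to end
    return float(np.mean(np.sum((displacement[None] - velocities) ** 2, axis=-1)))


def one_step_error(velocity_fn, x0, n_steps=200):
    """Mean distance between one Euler step and the fine-grained simulation."""
    x1, _ = trajectory(velocity_fn, x0, n_steps)
    one_step = x0 + velocity_fn(x0, 0.0)
    return float(np.mean(np.linalg.norm(one_step - x1, axis=-1)))


# Hypothetical toy fields: a constant drift (straight paths) and a rotation (curved paths).
straight_field = lambda x, t: np.broadcast_to(np.array([1.0, -2.0]), x.shape)
ROT = np.array([[0.0, -1.5], [1.5, 0.0]])
curved_field = lambda x, t: x @ ROT.T

x0 = np.random.default_rng(1).standard_normal((500, 2))
for name, field in [("straight", straight_field), ("curved", curved_field)]:
    print(name, straightness(field, x0), one_step_error(field, x0))
```

For the straight field both numbers are zero up to floating-point rounding, while the curved field has clearly positive straightness and a large one-step error, matching the claim that straight flows need fewer simulation steps.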
We provide several related links and readings here:
- The official Rectified Flow github repo (https://github.com/gnobitab/RectifiedFlow)
- An introduction of Rectified Flow (https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html)
- An introduction of Rectified Flow in Chinese--Zhihu (https://zhuanlan.zhihu.com/p/603740431)
- FlowGrad: Controlling the Output of Generative ODEs With Gradients (https://github.com/gnobitab/FlowGrad)
- Fast Point Cloud Generation with Straight Flows (https://github.com/klightz/PSF)
- Piecewise Rectified Flow (https://github.com/magic-research/piecewise-rectified-flow)
- Text-to-Image Rectified Flow as Plug-and-Play Priors (https://github.com/yangxiaofeng/rectified_flow_prior)
@inproceedings{liu2023instaflow,
title={Instaflow: One step is enough for high-quality diffusion-based text-to-image generation},
author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
booktitle={International Conference on Learning Representations},
year={2024}
}
Our training scripts are modified from one of the fine-tuning examples in Diffusers. Other parts of our work also rely heavily on the 🤗 Diffusers library.