This repository contains the implementation of the paper:
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
From Mitsubishi Electric Research Labs and VIU Lab, Johns Hopkins University
[Paper] | [Project Page]
Keywords: Zero-Shot Generation, Conditional Face Generation, Multimodal Face Generation, Text-to-Image Generation, Diffusion-based Face Generation
We propose Steered Diffusion, a framework that lets users perform zero-shot conditional generation with an unconditional diffusion model. (a) Linear inverse problems: the model generates clean images under linear conditions such as super-resolution, colorization, and inpainting. (b) Complex network-based conditioning: users can supply generic conditions such as segmentation maps, an identity image, or text prompts.
- We propose steered diffusion, a general plug-and-play framework that can utilize various pre-existing models to steer an unconditional diffusion model.
- We present the first work applicable for both label-level synthesis and image-to-image translation tasks, and we demonstrate its effectiveness for various applications.
- We propose an implicit conditioning-based sampling strategy that significantly boosts the performance of conditional sampling from unconditional diffusion models compared to previous methods.
- We introduce a new strategy that utilizes multiple steps of projected gradient descent to improve sample quality (a conceptual sketch follows this list).
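The sketch below illustrates the general idea of steering an unconditional diffusion model with a guidance loss and a few gradient-refinement steps, which stand in here for the paper's projected-gradient procedure. This is a hedged, conceptual sketch only: the names `denoiser`, `guidance_loss`, `alpha_bar_t`, and the exact update rule are placeholders and do not reflect the repository's actual code or hyperparameters.

```python
import torch

def steered_step(denoiser, x_t, t, alpha_bar_t, condition, guidance_loss,
                 guidance_scale=1.0, n_grad_steps=3):
    """Conceptual reverse-diffusion step with loss-gradient steering (illustration only)."""
    ab = torch.as_tensor(alpha_bar_t, dtype=x_t.dtype, device=x_t.device)
    x_t = x_t.detach().requires_grad_(True)
    for _ in range(n_grad_steps):
        eps = denoiser(x_t, t)  # noise predicted by the unconditional model
        # Estimate the clean image x0 from the noisy sample x_t.
        x0_hat = (x_t - torch.sqrt(1.0 - ab) * eps) / torch.sqrt(ab)
        # The guidance loss measures how well x0_hat matches the condition,
        # e.g. an L2 loss against a degraded image or a network-based loss.
        loss = guidance_loss(x0_hat.clamp(-1.0, 1.0), condition)
        grad = torch.autograd.grad(loss, x_t)[0]
        # Gradient refinement: move x_t against the guidance-loss gradient.
        x_t = (x_t - guidance_scale * grad).detach().requires_grad_(True)
    return x_t.detach(), x0_hat.detach()
```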
Please create the conda environment using
conda env create -f environment.yml
Please download the pretrained models using
python utils/download_models.py
You can test on any custom dataset by arranging the data in the following format (a quick layout check is sketched after the tree below). Please note that you can choose to provide one or more modalities.
├── data
| ├── images
| └── masks
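Here is a small hedged helper (not part of the repository) to sanity-check the layout above; it assumes the folder names follow the tree and that each mask shares its filename with the corresponding image:

```python
from pathlib import Path

def check_data_folder(root="data"):
    """Report images that have no mask with the same filename."""
    images = sorted((Path(root) / "images").glob("*"))
    masks = {p.name for p in (Path(root) / "masks").glob("*")}
    missing = [p.name for p in images if p.name not in masks]
    print(f"{len(images)} images found; {len(missing)} without a matching mask")
    return missing
```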
For simplicity during testing, the code expects a clean RGB image as input for the --img_path argument mentioned below. The degraded image is derived from this clean input by the code based on the condition being tested (a sketch of what each degradation implies follows the command below). Please modify the code as needed to input a degraded image directly. Also note that, for inpainting, a separate mask file needs to be provided. For testing the conditions:
grayscale: converts a grayscale image to an RGB image
SR: super-resolution
inpaint: fills in the masked regions of the image (requires a mask file)
Identity: generates new images with the same identity as the given image
Semantics: generates new images with the same semantic face-parsing map as the given image
please use the command
python steered_generate.py --config configs/diffusion_config.yml --img_path /path/to/image --mask_path /path/to/mask --condition "your condition"
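For reference, the sketch below illustrates roughly what each linear condition implies about the degraded input that the script derives internally from the clean image. The function name and the exact factors (e.g. the 4x downsampling for SR) are assumptions for illustration, not the repository's code:

```python
import torch
import torch.nn.functional as F

def degrade(img, condition, mask=None):
    """img: (1, 3, H, W) tensor in [0, 1]; mask: (1, 1, H, W) binary tensor."""
    if condition == "grayscale":
        w = torch.tensor([0.299, 0.587, 0.114], device=img.device).view(1, 3, 1, 1)
        return (img * w).sum(dim=1, keepdim=True)  # luminance image to be re-colorized
    if condition == "SR":
        # Low-resolution input; the 4x factor is an illustrative assumption.
        return F.interpolate(img, scale_factor=0.25, mode="bicubic", align_corners=False)
    if condition == "inpaint":
        assert mask is not None, "inpainting requires a separate mask file"
        return img * mask  # zero out the regions to be filled in
    raise ValueError(f"'{condition}' uses a pretrained network rather than a linear operator")
```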
For testing image editing, please use the command
python steered_generate.py --config configs/diffusion_config.yml --img_path /path/to/image --mask_path /path/to/mask --condition "editing" --editing_text "the text prompt to add to the image"
The final output image will be saved in the results directory in the format "condition image|generated sample"
Test on custom dataset using:
python steered_generate_dataset.py --data_fold /path/to/data --condition "your condition" --config configs/diffusion_config.yml
Please set the flags you need for the generation; a hedged wrapper that runs all conditions in sequence is sketched below.
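For convenience, a wrapper like the following (not part of the repository) can invoke the dataset script once per condition, using the condition names listed above:

```python
import subprocess

CONDITIONS = ["grayscale", "SR", "inpaint", "Identity", "Semantics"]

for condition in CONDITIONS:
    subprocess.run(
        [
            "python", "steered_generate_dataset.py",
            "--data_fold", "/path/to/data",
            "--condition", condition,
            "--config", "configs/diffusion_config.yml",
        ],
        check=True,
    )
```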
We performed experiments on the first 300 images of the Multi-Modal-CelebA-HQ dataset, which can be downloaded from
https://github.com/IIGROUP/MM-CelebA-HQ-Dataset
If you use our work, please cite:
@inproceedings{nair2023steered,
title={Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis},
author={Nair, Nithin Gopalakrishnan and Cherian, Anoop and Lohit, Suhas and Wang, Ye and Koike-Akino, Toshiaki and Patel, Vishal M and Marks, Tim K},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={20850--20860},
year={2023}
}
See CONTRIBUTING.md for our policy on contributions.
Released under the AGPL-3.0-or-later license, as found in the LICENSE.md file.
All files, except as noted below:
Copyright (c) 2023-2024 Mitsubishi Electric Research Laboratories (MERL)
SPDX-License-Identifier: AGPL-3.0-or-later
The following files:
guided_diffusion/guided_diffusion/__init__.py
guided_diffusion/guided_diffusion/fp16_util.py
guided_diffusion/guided_diffusion/logger.py
guided_diffusion/guided_diffusion/nn.py
guided_diffusion/guided_diffusion/respace.py
guided_diffusion/guided_diffusion/script_util.py
guided_diffusion/guided_diffusion/unet.py
were taken without modification from https://github.com/openai/guided-diffusion (license included in LICENSES/MIT.txt):
Copyright (c) 2021 OpenAI
The following files:
losses/ssim.py
were taken without modification from https://github.com/Po-Hsun-Su/pytorch-ssim/tree/master (license included in LICENSES/MIT.txt):
Copyright (c) 2021 Po-Hsun-Su
The following files:
guided_diffusion/guided_diffusion/interp_methods.py
were taken without modification from https://github.com/assafshocher/ResizeRight/blob/master/interp_methods.py (license included in LICENSES/MIT.txt):
Copyright (c) 2020 Assaf Shocher
The following files:
guided_diffusion/guided_diffusion/resize_right.py
were adapted from https://github.com/assafshocher/ResizeRight/blob/master/resize_right.py (license included in LICENSES/MIT.txt):
Copyright (c) 2023-2024 Mitsubishi Electric Research Laboratories (MERL)
Copyright (c) 2020 Assaf Shocher
The following files:
guided_diffusion/guided_diffusion/gaussian_diffusion.py
were adapted from https://github.com/openai/guided-diffusion (license included in LICENSES/MIT.txt):
Copyright (c) 2023-2024 Mitsubishi Electric Research Laboratories (MERL)
Copyright (c) 2021 OpenAI
The following files:
steered_diffusion.py
steered_diffusion_dataset.py
parser.py
were adapted from https://github.com/omriav/blended-diffusion (license included in LICENSES/MIT.txt):
Copyright (c) 2023-2024 Mitsubishi Electric Research Laboratories (MERL)
Copyright (C) 2022-2023 Omri Avrahami
The following files:
utils/download_models.py
utils/download_models_func.py
were adapted from https://github.com/Nithin-GK/UniteandConquer/blob/main/download_models.py (license included in LICENSES/Apache-2.0.txt):
Copyright (C) 2022-2023 Nithin Gopalakrishnan Nair