Keywords: Multimodal Generation, Text to image generation, Plug and Play
We propose MaxFusion, a plug-and-play framework for multimodal generation with text-to-image diffusion models. (a) Multimodal generation: we address the problem of conflicting spatial conditions for text-to-image models. (b) Saliency in variance maps: we observe that the variance maps of different feature layers express the strength of conditioning.
- We eliminate the need for paired training data when conditioning diffusion models on multiple tasks.
- We propose a novel variance-based feature merging strategy for diffusion models (see the sketch after this list).
- Our method allows us to use combined information to influence the output, unlike individual models that are limited to a single condition.
- Unlike previous solutions, our approach is easily scalable and can be added on top of off-the-shelf models.
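The core idea is that, at each spatial location, the channel-wise variance of a conditioning feature map indicates how strongly that location is being conditioned, so features from different modalities can be merged by keeping the more salient one. Below is a minimal, illustrative sketch of such a variance-based merge; the function name, tensor shapes, and the hard argmax-style selection are assumptions for clarity and may differ from the actual implementation in this repository.

```python
import torch

def variance_based_merge(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Merge two aligned conditioning feature maps of shape (B, C, H, W)
    by keeping, at each spatial location, the feature vector whose
    channel-wise variance (conditioning strength) is larger.

    Note: this is a simplified sketch; the paper's method may normalize
    or blend the features differently.
    """
    # Per-pixel variance across the channel dimension serves as a proxy
    # for how strongly that location is conditioned by each modality.
    var_a = feat_a.var(dim=1, keepdim=True)  # (B, 1, H, W)
    var_b = feat_b.var(dim=1, keepdim=True)  # (B, 1, H, W)

    # Select the feature vector with the higher variance at each location.
    mask = (var_a >= var_b).float()
    return mask * feat_a + (1.0 - mask) * feat_b
```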
conda env create -f environment.yml
A notebook demonstrating different demo conditions is provided in demo.ipynb
Will be released shortly
An interactive demo can be run locally using
python gradio_maxfusion.py
This codebase relies on:
https://github.com/google/prompt-to-prompt/
- If you use our work, please cite it using the following BibTeX entry:
@inproceedings{nair2025maxfusion,
title={Maxfusion: Plug\&play multi-modal generation in text-to-image diffusion models},
author={Nair, Nithin Gopalakrishnan and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
booktitle={European Conference on Computer Vision},
pages={93--110},
year={2025},
organization={Springer}
}