Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion 🔥
In this work, we investigate text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts. Inspired by the human intuition of imagination, we propose a novel scene-graph hallucination (SGH) mechanism for effective abstract-to-intricate T2I synthesis. SGH performs scene hallucination by expanding the initial scene graph (SG) of the input prompt with more feasible, specific scene structures, where the structured semantic representation of the SG ensures high controllability over the intrinsic scene imagination. To realize this, we build an SG-based hallucination diffusion system. First, we implement the SGH module with a discrete diffusion technique that evolves the SG structure by iteratively adding new scene elements. Then, we employ another continuous-state diffusion model as the T2I synthesizer, where the overt image-generating process is navigated by the underlying semantic scene structure induced by the SGH module. On the benchmark COCO dataset, our system outperforms the existing best-performing T2I model by a significant margin, with especially large gains on abstract-to-intricate T2I generation.
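To make the SGH idea concrete, below is a minimal, self-contained sketch of mask-based (absorbing-state) discrete diffusion over linearized scene-graph tokens. All class names, hyperparameters, and the confidence-based unmasking schedule are hypothetical illustrations, not the actual implementation in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK, DIM, STEPS = 1000, 1000, 256, 8  # MASK: extra absorbing-state token

class SGDenoiser(nn.Module):
    """Predicts a token distribution at every (possibly masked) SG slot."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)  # +1 for the [MASK] token
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                     # tokens: (B, N) long
        return self.head(self.encoder(self.embed(tokens)))  # (B, N, VOCAB)

@torch.no_grad()
def hallucinate(model, seed_tokens, n_new=6):
    """Reverse process: append [MASK] slots for new scene elements, then
    unmask the highest-confidence slots a few at a time, so each newly
    added element is conditioned on the structure committed so far."""
    new = torch.full((seed_tokens.size(0), n_new), MASK, dtype=torch.long)
    tokens = torch.cat([seed_tokens, new], dim=1)
    for step in range(STEPS):
        masked = tokens == MASK
        if not masked.any():
            break
        probs = F.softmax(model(tokens), dim=-1)
        conf, pred = probs.max(dim=-1)             # per-slot confidence / argmax
        conf = conf.masked_fill(~masked, -1.0)     # only compete over masked slots
        k = max(1, int(masked.sum(dim=1).max()) // (STEPS - step))
        keep = conf.topk(k, dim=1).indices
        tokens.scatter_(1, keep, pred.gather(1, keep))
    return tokens

model = SGDenoiser()
seed = torch.randint(0, VOCAB, (1, 4))   # linearized SG of the abstract prompt
expanded = hallucinate(model, seed)      # SG enriched with hallucinated elements
```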
We develop an SG-based hallucination diffusion system (namely, Salad) for high-quality image synthesis. Salad is a fully diffusion-based T2I system consisting mainly of a scene-driven T2I module and an SGH module. As shown in the following figure, we adopt a state-of-the-art latent diffusion model as the backbone T2I synthesizer, in which the overt image-generating process is controlled and navigated by the underlying semantic scene structure.
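The coupling between the two modules can be pictured with an equally hypothetical stub: the hallucinated SG tokens are embedded and injected into the latent denoiser through cross-attention, so the scene structure steers every denoising step. This reuses names from the sketch above and is not the repository's API.

```python
import torch
import torch.nn as nn

class SGConditionedDenoiser(nn.Module):
    """Toy stand-in for the latent-diffusion UNet: latent tokens
    cross-attend to the hallucinated scene-graph embeddings."""
    def __init__(self, latent_dim=64, cond_dim=256):
        super().__init__()
        self.proj_in = nn.Linear(latent_dim, cond_dim)
        self.cross = nn.MultiheadAttention(cond_dim, num_heads=8, batch_first=True)
        self.proj_out = nn.Linear(cond_dim, latent_dim)

    def forward(self, z_t, sg_emb):                # z_t: noisy latent tokens
        h = self.proj_in(z_t)
        h, _ = self.cross(h, sg_emb, sg_emb)       # SG structure steers denoising
        return self.proj_out(h)                    # predicted noise eps_theta

# Wiring the two stages together (shapes only; `expanded` comes from the
# SGH sketch above):
sg_emb = nn.Embedding(1001, 256)(expanded)        # embed hallucinated SG tokens
z_t = torch.randn(1, 16, 64)                      # flattened latent patches
eps = SGConditionedDenoiser()(z_t, sg_emb)        # one denoising step's output
```

In the full system the denoiser would be a UNet run over many timesteps inside the latent space of a pretrained autoencoder; this stub only shows where the SG conditioning enters.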
- Main packages: PyTorch == 2.1.1
- See `requirements.txt` for other packages.
We use COCO and Visual Genome (VG) to train the model.
Simply run `main.py` to train the model.
```bibtex
@inproceedings{Wu0ZC23,
  author    = {Shengqiong Wu and
               Hao Fei and
               Hanwang Zhang and
               Tat{-}Seng Chua},
  title     = {Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion},
  booktitle = {Proceedings of NeurIPS},
  year      = {2023}
}
```
Our code builds on the official repositories of GLIGEN, LayoutDM, and VQ-Diffusion. We sincerely thank the authors for releasing their code.