Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion 🔥
In this work, we investigate text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts. Inspired by the human intuition of imagination, we propose a novel scene-graph hallucination (SGH) mechanism for effective abstract-to-intricate T2I synthesis. SGH performs scene hallucination by expanding the initial scene graph (SG) of the input prompt with more feasible, specific scene structures, where the structured semantic representation of the SG ensures high controllability over the intrinsic scene imagination. To realize this, we build an SG-based hallucination diffusion system. First, we implement the SGH module with a discrete diffusion technique that evolves the SG structure by iteratively adding new scene elements. Then, we employ another continuous-state diffusion model as the T2I synthesizer, where the overt image-generating process is navigated by the underlying semantic scene structure induced by the SGH module. On the benchmark COCO dataset, our system outperforms the existing best-performing T2I model by a significant margin, with especially large gains on abstract-to-intricate T2I generation.
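To make the SGH idea concrete, below is a minimal, self-contained sketch of mask-based (absorbing-state) discrete diffusion over linearized scene-graph tokens. All class names, hyperparameters, and the confidence-based unmasking schedule are hypothetical illustrations, not the actual implementation in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK, DIM, STEPS = 1000, 1000, 256, 8  # MASK: extra absorbing-state token

class SGDenoiser(nn.Module):
    """Predicts a token distribution at every (possibly masked) SG slot."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)  # +1 for the [MASK] token
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                     # tokens: (B, N) long
        return self.head(self.encoder(self.embed(tokens)))  # (B, N, VOCAB)

@torch.no_grad()
def hallucinate(model, seed_tokens, n_new=6):
    """Reverse process: append [MASK] slots for new scene elements, then
    unmask the highest-confidence slots a few at a time, so each newly
    added element is conditioned on the structure committed so far."""
    new = torch.full((seed_tokens.size(0), n_new), MASK, dtype=torch.long)
    tokens = torch.cat([seed_tokens, new], dim=1)
    for step in range(STEPS):
        masked = tokens == MASK
        if not masked.any():
            break
        probs = F.softmax(model(tokens), dim=-1)
        conf, pred = probs.max(dim=-1)             # per-slot confidence / argmax
        conf = conf.masked_fill(~masked, -1.0)     # only compete over masked slots
        k = max(1, int(masked.sum(dim=1).max()) // (STEPS - step))
        keep = conf.topk(k, dim=1).indices
        tokens.scatter_(1, keep, pred.gather(1, keep))
    return tokens

model = SGDenoiser()
seed = torch.randint(0, VOCAB, (1, 4))   # linearized SG of the abstract prompt
expanded = hallucinate(model, seed)      # SG enriched with hallucinated elements
```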
We develop an SG-based hallucination diffusion system (namely, Salad) for high-quality image synthesis. Salad is a fully diffusion-based T2I system consisting mainly of a scene-driven T2I module and an SGH module. As shown in the following figure, we adopt a state-of-the-art latent diffusion model as the backbone T2I synthesizer, in which the overt image-generating process is controlled and navigated by the underlying semantic scene structure.
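The coupling between the two modules can be pictured with an equally hypothetical stub: the hallucinated SG tokens are embedded and injected into the latent denoiser through cross-attention, so the scene structure steers every denoising step. This reuses names from the sketch above and is not the repository's API.

```python
import torch
import torch.nn as nn

class SGConditionedDenoiser(nn.Module):
    """Toy stand-in for the latent-diffusion UNet: latent tokens
    cross-attend to the hallucinated scene-graph embeddings."""
    def __init__(self, latent_dim=64, cond_dim=256):
        super().__init__()
        self.proj_in = nn.Linear(latent_dim, cond_dim)
        self.cross = nn.MultiheadAttention(cond_dim, num_heads=8, batch_first=True)
        self.proj_out = nn.Linear(cond_dim, latent_dim)

    def forward(self, z_t, sg_emb):                # z_t: noisy latent tokens
        h = self.proj_in(z_t)
        h, _ = self.cross(h, sg_emb, sg_emb)       # SG structure steers denoising
        return self.proj_out(h)                    # predicted noise eps_theta

# Wiring the two stages together (shapes only; `expanded` comes from the
# SGH sketch above):
sg_emb = nn.Embedding(1001, 256)(expanded)        # embed hallucinated SG tokens
z_t = torch.randn(1, 16, 64)                      # flattened latent patches
eps = SGConditionedDenoiser()(z_t, sg_emb)        # one denoising step's output
```

In the full system the denoiser would be a UNet run over many timesteps inside the latent space of a pretrained autoencoder; this stub only shows where the SG conditioning enters.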
- Main packages: PyTorch == 2.1.1
- See `requirements.txt` for other packages.
We use COCO and Visual Genome (VG) to train the model.
Simply run `main.py` to train the model.
```bibtex
@inproceedings{Wu0ZC23,
  author    = {Shengqiong Wu and
               Hao Fei and
               Hanwang Zhang and
               Tat{-}Seng Chua},
  title     = {Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion},
  booktitle = {Proceedings of NeurIPS},
  year      = {2023}
}
```
Our code builds on the official repositories of GLIGEN, LayoutDM, and VQ-Diffusion. We sincerely thank the authors for releasing their code.