Official implementation of our paper: ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
In this work, we devise a "plug-and-play" method, ExoViP, to correct the errors at both the planning and execution stages through introspective verification. We employ verification modules as "exoskeletons" to enhance current vision-language programming schemes. Specifically, our proposed verification module utilizes a mixture of three sub-verifiers to validate predictions after each reasoning step, subsequently calibrating the visual module predictions and refining the reasoning trace planned by LLMs.
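The core idea can be sketched as follows. This is a minimal, self-contained illustration of the verification-and-calibration step only, not the repo's actual API: the names (`Candidate`, `mix_verifier_scores`, `calibrate`, `alpha`) and the toy sub-verifiers are hypothetical stand-ins, and the real system uses learned verifiers (e.g., image-text matching, captioning, and VQA models) together with a search over candidate reasoning traces.

```python
# Illustrative sketch (hypothetical names): after a reasoning step produces candidate
# predictions, each candidate is scored by several sub-verifiers, the scores are mixed,
# and the mixed score calibrates the module's own confidence before the next step.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Candidate:
    answer: str
    module_score: float  # confidence reported by the visual module itself

def mix_verifier_scores(candidate: Candidate,
                        question: str,
                        sub_verifiers: List[Callable[[str, str], float]]) -> float:
    """Average the scores of the sub-verifiers for one candidate answer."""
    scores = [verify(question, candidate.answer) for verify in sub_verifiers]
    return sum(scores) / len(scores)

def calibrate(candidates: List[Candidate],
              question: str,
              sub_verifiers: List[Callable[[str, str], float]],
              alpha: float = 0.5) -> List[Tuple[float, str]]:
    """Blend each candidate's module confidence with its verification score and rank.
    The ranking can also guide the exploration over candidate programs."""
    ranked = []
    for cand in candidates:
        v = mix_verifier_scores(cand, question, sub_verifiers)
        ranked.append((alpha * cand.module_score + (1 - alpha) * v, cand.answer))
    return sorted(ranked, reverse=True)

if __name__ == "__main__":
    # Toy sub-verifiers standing in for the real models.
    dummy_itm = lambda q, a: 0.9 if "red" in a else 0.3
    dummy_caption = lambda q, a: 0.8 if "red" in a else 0.4
    dummy_vqa = lambda q, a: 0.7 if "red" in a else 0.5
    cands = [Candidate("a red ball", 0.4), Candidate("a blue box", 0.6)]
    print(calibrate(cands, "What is the object?", [dummy_itm, dummy_caption, dummy_vqa]))
```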
Paste your OPENAI-API-KEY and OPENAI-API-BASE into engine/.env and tasks/*.ipynb
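For example, engine/.env could look like the following; the variable names simply mirror the placeholders above, so check the code under engine/ for the exact names it reads:

```
OPENAI-API-KEY=<your key>
OPENAI-API-BASE=<your base url, e.g. https://api.openai.com/v1>
```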
conda env create -f environment.yaml
conda activate exovip
If Hugging Face is not reachable from your network, you can download all checkpoints into the prev_trained_models directory in advance.
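As a sketch, a single checkpoint can be pre-downloaded with huggingface_hub; the repo id below is only an example, so substitute the checkpoints that the visual modules in this repo actually require:

```python
# Hypothetical example: pre-download one checkpoint into prev_trained_models/.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Salesforce/blip-vqa-base",              # example model, not an exhaustive list
    local_dir="prev_trained_models/blip-vqa-base",   # target folder used by the repo
)
```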
Errors in existing methods fall into two categories:
- Module Error: the visual modules fail to execute the program correctly
- Planning Error: the LLM cannot parse the language query into a correct, solvable program
We compared the error statistics on a random sample of 100 failure cases before (left) and after (right) applying our method.
Our method has been validated on six tasks:
- Compositional Image Question Answering: GQA
- Referring Expression Understanding: RefCOCO/RefCOCO+/RefCOCOg
- Natural Language for Visual Reasoning: NLVR
- Visual Abstract Reasoning: KILOGRAM
- Language-guided Image Editing: MagicBrush
- Spatial-Temporal Video Reasoning: AGQA
NOTE: All experiments are run on subsets of these datasets; please refer to datasets for details.
Code demos
cd tasks
# GQA
gqa.ipynb
# NLVR
nlvr.ipynb
# RefCOCO(+/g)
refcoco.ipynb
# KILOGRAM
kilogram.ipynb
# MagicBrush
magicbrush.ipynb
# AGQA
agqa.ipynb
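If you prefer to run a demo headlessly instead of opening the Jupyter UI, a notebook can be executed programmatically; this is a minimal sketch assuming nbformat and nbclient are installed in the exovip environment:

```python
# Execute a demo notebook end-to-end and save the outputs for inspection.
import nbformat
from nbclient import NotebookClient

nb = nbformat.read("gqa.ipynb", as_version=4)   # any of the demo notebooks
NotebookClient(nb).execute()                     # run all cells in order
nbformat.write(nb, "gqa_executed.ipynb")
```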
Our implementation builds on VisProg, a neuro-symbolic system that solves complex and compositional visual tasks given natural language instructions.
If you find our work helpful, please cite it.
@inproceedings{wang2024exovip,
  title={ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning},
  author={Wang, Yuxuan and Yuille, Alan and Li, Zhuowan and Zheng, Zilong},
  booktitle={The First Conference on Language Modeling (COLM)},
  year={2024}
}