
DEAL: Disentangle and Localize Concept-level Explanations for VLMs (ECCV 2024, Strong Double Blind)

[Paper] [Code] [Video] [DeepREAL Lab]

This repository holds the PyTorch implementation of DEAL, introduced in "DEAL: Disentangle and Localize Concept-level Explanations for VLMs" by Tang Li, Mengmeng Ma, and Xi Peng. If you find our code useful in your research, please consider citing:

@inproceedings{li2024deal,
 title={DEAL: Disentangle and Localize Concept-level Explanations for VLMs},
 author={Li, Tang and Ma, Mengmeng and Peng, Xi},
 booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
 year={2024}
}

Introduction

Can we trust Vision-Language Models (VLMs) in their predictions? Our findings say NO! The fine-grained visual evidence behind their predictions can be wrong. Our empirical results indicate that CLIP cannot disentangle and localize fine-grained visual evidence, and this phenomenon can be observed in many popular VLMs across different benchmark datasets. This issue is challenging to solve for two reasons. First, human annotations for fine-grained visual evidence are missing. Second, existing VLMs align an image with its entire textual caption, without disentangling and localizing fine-grained visual evidence. To this end, we propose to Disentangle and Localize (DEAL) concept-level explanations of VLMs without relying on expensive human annotations.

[Figure: overview of the DEAL method]

Pretrained Weights

Datasets and Requirements

This repository reproduces our results on the ImageNet, CUB, EuroSAT, OxfordPets, and Food101 datasets; please download these datasets as needed. Our code is built on Python 3 and PyTorch v2.0.1 on Ubuntu 18.04. Please install all required packages by running:

pip install -r requirements.txt

Generating Concepts for Categories


You will need to add your OpenAI API token and run the following notebook. Note that the notebook showcases our best prompt for this task; you can change to any category list you want or modify the prompts as needed.

./deal/generate_descriptors.ipynb

OpenAI may update their API library over time; please modify the code accordingly if needed.
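
For reference, below is a minimal sketch of the kind of call the notebook makes. The prompt wording, model name, and helper function here are assumptions for illustration only; the notebook contains the exact prompt used in the paper, and the current OpenAI Python SDK is assumed.

from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # add your own token here

def generate_descriptors(category: str) -> list[str]:
    # Hypothetical prompt; see generate_descriptors.ipynb for the prompt used in the paper.
    prompt = (
        f"List the visually distinctive features for recognizing a {category} "
        f"in a photo. Answer with short noun phrases, one per line."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: any chat-completion model works here
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # Split the reply into one descriptor per line, stripping list markers.
    return [line.strip("-• ").strip() for line in text.splitlines() if line.strip()]

descriptors = {c: generate_descriptors(c) for c in ["golden retriever", "tabby cat"]}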

Training

Before training, please replace the dataset paths in load.py with the paths to your own copies (a hypothetical sketch follows the command below).

python train.py --dataset imagenet --model ViT-B/32 --batch_size 256 --lr 5e-7 --save_path "/path/to/save/"
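
The variable and key names below are hypothetical and may not match load.py exactly; the point is simply that each dataset should be mapped to the root directory of your local copy.

# Hypothetical example; match the actual variable/key names used in load.py.
DATASET_ROOTS = {
    "imagenet":    "/path/to/imagenet",
    "cub":         "/path/to/CUB_200_2011",
    "eurosat":     "/path/to/EuroSAT",
    "oxford_pets": "/path/to/oxford-iiit-pet",
    "food101":     "/path/to/food-101",
}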

Note that we use adaptive batch sizes for different datasets to alleviate ambiguity within a batch. Specifically, we use a batch size that is smaller than the number of classes in the dataset: for example, 128 for CUB, 64 for Food101, 32 for OxfordPets, and 8 for EuroSAT. We usually fine-tune for one epoch on each dataset, so please change the number of training steps according to your batch size.
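
As a rough guide, a one-epoch step budget follows from the training-set size and the batch size; the helper below is only an illustration (the CUB training-set size is approximate).

# Batch sizes used in our setup (kept smaller than the number of classes per dataset).
BATCH_SIZES = {"imagenet": 256, "cub": 128, "food101": 64, "oxford_pets": 32, "eurosat": 8}

def steps_for_one_epoch(num_train_images: int, dataset: str) -> int:
    # Number of optimizer steps needed to cover the training set once.
    bs = BATCH_SIZES[dataset]
    return (num_train_images + bs - 1) // bs

# e.g. CUB has roughly 6,000 training images, so one epoch is about 47 steps at batch size 128.
print(steps_for_one_epoch(5994, "cub"))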

Evaluations

The results for prediction accuracy and explanation quality: [Figure: quantitative results]

To evaluate the prediction accuracy, please run:

./deal/evaluation.ipynb

To evaluate concept-level explanation disentanglability, please run:

./deal/exp_disentanglability.ipynb

To evaluate concept-level explanation localizability (fidelity), please run:

./deal/exp_localizability.ipynb
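
For intuition, the snippet below shows how per-concept image-text similarities can be computed with CLIP before measuring disentanglement and localization quality. It is a simplified sketch with placeholder file names, concepts, and prompt template, not the exact metric code in the notebooks above.

import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
concepts = ["hooked beak", "webbed feet", "white head"]  # placeholder descriptors for one class
text = clip.tokenize([f"a photo with {c}" for c in concepts]).to(device)  # prompt template is an assumption

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Cosine similarity between the image and each concept descriptor.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    concept_scores = (image_feat @ text_feat.T).squeeze(0)  # one similarity per concept

for concept, score in zip(concepts, concept_scores.tolist()):
    print(f"{concept}: {score:.3f}")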

Acknowledgement

Part of our code is borrowed from the following repositories.

We thank the authors for releasing their code. Please also consider citing their works.
