[Paper] [Code] [Video] [DeepREAL Lab]
This repository holds the PyTorch implementation of DEAL, presented in "DEAL: Disentangle and Localize Concept-level Explanations for VLMs" by Tang Li, Mengmeng Ma, and Xi Peng. If you find our code useful in your research, please consider citing:
@inproceedings{li2024deal,
title={DEAL: Disentangle and Localize Concept-level Explanations for VLMs},
author={Li, Tang and Ma, Mengmeng and Peng, Xi},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2024}
}
Can we trust Vision-Language Models (VLMs) in their predictions? Our findings say NO! The fine-grained visual evidence behind their predictions can be wrong! Our empirical results indicate that CLIP cannot disentangle and localize fine-grained visual evidence, and this phenomenon can be observed in many popular VLMs across different benchmark datasets. However, this issue is challenging to solve. First, human annotations for fine-grained visual evidence are missing. Second, existing VLMs align an image with its entire textual caption, without disentangling and localizing fine-grained visual evidence. To this end, we propose to Disentangle and Localize (DEAL) concept-level explanations of VLMs without relying on expensive human annotations.
- Fine-tuned on ImageNet: DEAL-ImageNet-ViT-B/32
- Fine-tuned on EuroSAT: DEAL-EuroSAT-ViT-B/32
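A minimal sketch of how such a checkpoint could be loaded, assuming it is a PyTorch state dict compatible with OpenAI's CLIP ViT-B/32 (the checkpoint filename below is hypothetical):

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base CLIP ViT-B/32 architecture and its preprocessing pipeline.
model, preprocess = clip.load("ViT-B/32", device=device)

# Overwrite the pre-trained weights with the fine-tuned DEAL checkpoint.
# "deal_imagenet_vitb32.pth" is a hypothetical filename; the checkpoint is
# assumed here to store a plain state dict.
state_dict = torch.load("deal_imagenet_vitb32.pth", map_location=device)
model.load_state_dict(state_dict)
model.eval()
```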
This repository reproduces our results on the ImageNet, CUB, EuroSAT, OxfordPets, and Food101 datasets; please download these datasets as needed. Our code is built upon Python 3 and PyTorch v2.0.1 on Ubuntu 18.04. Please install all required packages by running:
pip install -r requirements.txt
You will need to add your OpenAI API token and run the following notebook. Note that the notebook showcases our best prompt for this task; you can switch to any category list or modify the prompts as needed.
./deal/generate_descriptors.ipynb
OpenAI may update their API library; please modify the code accordingly if needed.
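As a rough illustration of what the notebook does, the snippet below queries the Chat Completions API for concept descriptors of each category; the prompt wording, model name, category list, and parsing are simplified placeholders, not our exact prompt:

```python
from openai import OpenAI  # openai >= 1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

categories = ["golden retriever", "annual crop land"]  # any category list

descriptors = {}
for name in categories:
    # Illustrative prompt only; see generate_descriptors.ipynb for the prompt we use.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"List the visual features useful for recognizing a {name} "
                       f"in a photo, one short phrase per line.",
        }],
    )
    text = response.choices[0].message.content
    # Keep non-empty lines, stripping simple bullet markers.
    descriptors[name] = [line.strip("-• ").strip() for line in text.splitlines() if line.strip()]

print(descriptors)
```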
Before training, please replace the paths in load.py with the paths to your own datasets.
python train.py --dataset imagenet --model ViT-B/32 --batch_size 256 --lr 5e-7 --save_path "/path/to/save/"
Note that we use adaptive batch sizes for different datasets to alleviate ambiguity within a batch. Specifically, we use a batch size that is smaller than the number of classes in the dataset: for example, 128 for CUB, 64 for Food101, 32 for OxfordPets, and 8 for EuroSAT. We usually fine-tune for one epoch on each dataset; please adjust the number of training steps according to your batch size, as sketched below.
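A small sketch of the bookkeeping involved (the CUB training-set size used in the example is illustrative):

```python
import math

# Batch sizes used per dataset (kept below the number of classes).
batch_sizes = {"imagenet": 256, "cub": 128, "food101": 64, "oxford_pets": 32, "eurosat": 8}

def steps_for_one_epoch(num_train_samples: int, dataset: str) -> int:
    """Number of optimizer steps needed to see every training sample once."""
    return math.ceil(num_train_samples / batch_sizes[dataset])

# Example: with roughly 5,994 CUB training images and batch size 128,
# one epoch corresponds to 47 steps.
print(steps_for_one_epoch(5994, "cub"))
```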
We report results for both prediction accuracy and explanation quality.
To evaluate the prediction accuracy, please run:
./deal/evaluation.ipynb
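For intuition, classification by description (the setup this repo builds on) scores an image against each class's concept descriptors and averages the per-descriptor similarities. A minimal CLIP-based sketch, independent of the notebook and with hypothetical descriptors and image path:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical descriptors; in practice these come from generate_descriptors.ipynb.
class_descriptors = {
    "golden retriever": ["a dog with golden fur", "floppy ears", "a long snout"],
    "tabby cat": ["a cat with striped fur", "pointed ears", "whiskers"],
}

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)

    scores = {}
    for cls, descs in class_descriptors.items():
        tokens = clip.tokenize(descs).to(device)
        text_feat = model.encode_text(tokens)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        # Average the image-descriptor similarities to score this class.
        scores[cls] = (image_feat @ text_feat.T).mean().item()

prediction = max(scores, key=scores.get)
print(prediction, scores)
```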
To evaluate concept-level explanation disentanglability, please run:
./deal/exp_disentanglability.ipynb
To evaluate concept-level explanation localizability (fidelity), please run:
./deal/exp_localizability.ipynb
Part of our code is borrowed from the following repositories.
- Visual Classification via Description from Large Language Models
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
We thank the authors for releasing their code. Please also consider citing their works.