🔎 Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".

🔎 VL-Uncertainty

Ruiyang Zhang, Hu Zhang, Zhedong Zheng*

Website | Paper | Code

🔥 News

⚡ Overview

🛠️ Install

  • Create the conda environment.
conda create -n VL-Uncertainty python=3.11;

conda activate VL-Uncertainty;
  • Install dependencies.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121;

pip install transformers datasets flash-attn accelerate timm numpy sentencepiece protobuf qwen_vl_utils;

(Tested on NVIDIA H100 PCIe-80G, NVIDIA A100-PCIE-40GB, and A6000-48G)
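
  • As a quick sanity check (a minimal sketch that only uses the packages installed above), you can confirm that PyTorch sees the GPU and that transformers and flash-attn import cleanly:

import torch
import transformers
import flash_attn

# The LVLM backbones are loaded on GPU, so CUDA must be visible.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
print("transformers:", transformers.__version__)
print("flash-attn:", flash_attn.__version__)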

🚀 Quick Start

  • Run our demo code.
python demo.py;
  • This should produce the results below. VL-Uncertainty successfully estimates high uncertainty for the wrong LVLM answer and thereby detects the hallucination!
--------------------------------------------------
- Demo image: .asset/img/titanic.png
- Question: What is the name of this movie?
- GT answer: Titanic.
--------------------------------------------------
- LVLM answer: The movie in the image is "Coco."
- LVLM answer accuracy: Wrong
--------------------------------------------------
- Estimated uncertainty: 2.321928094887362
- Uncertainty threshold: 1.0
--------------------------------------------------
- Hallucination prediction: Is hallucination
- Hallucination detection: Success!
--------------------------------------------------
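
  • For intuition about the numbers above: 2.3219... equals log2(5), which is what you get when, for example, five sampled answers all land in different semantic clusters; the answer is flagged as hallucination because this entropy exceeds the 1.0 threshold. A minimal sketch of that cluster-frequency entropy (the helper name and the integer cluster ids are illustrative assumptions, not the repository's exact API):

import math
from collections import Counter

def cluster_entropy(cluster_ids):
    # Base-2 entropy of the cluster frequency distribution over sampled
    # answers; higher entropy means higher estimated uncertainty.
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(cluster_entropy([0, 1, 2, 3, 4]))  # five disagreeing answers -> log2(5) ~ 2.32
print(cluster_entropy([0, 0, 0, 0, 0]))  # five consistent answers  -> 0.0, below the threshold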

📈 Run

  • For MM-Vet (Free-form benchmark)
bash run/run_MMVet.sh;
  • For LLaVABench (Free-form benchmark)
bash run/run_LLaVABench.sh;
  • For MMMU (Multi-choice benchmark)
bash run/run_MMMU.sh;
  • For ScienceQA (Multi-choice benchmark)
bash run/run_ScienceQA.sh;
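
  • To sweep all four benchmarks back-to-back, a small convenience loop also works (a minimal sketch, not part of the repository; it simply calls the scripts listed above):

import subprocess

# Run each provided benchmark script in turn; stop on the first failure.
for script in ["run/run_MMVet.sh", "run/run_LLaVABench.sh",
               "run/run_MMMU.sh", "run/run_ScienceQA.sh"]:
    subprocess.run(["bash", script], check=True)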

πŸ„ Examples

  • VL-Uncertainty successfully detects LVLM hallucination:

  • VL-Uncertainty can also assign low uncertainty to a correct answer and identify it as non-hallucinatory:

  • VL-Uncertainty effectively generalizes to physical-world scenarios. (The following picture is my laptop, captured with an iPhone.)

⌨️ Code Structure

  • The code structure of this repository is as follows:
├── VL-Uncertainty/
│   ├── .asset/
│   │   ├── img/
│   │   │   ├── logo.png
│   │   │   ├── titanic.png         # For demo
│   ├── benchmark/
│   │   ├── LLaVABench.py           # Free-form benchmark
│   │   ├── MMMU.py                 # Multi-choice benchmark
│   │   ├── MMVet.py                # Free-form benchmark
│   │   ├── ScienceQA.py            # Multi-choice benchmark
│   ├── llm/
│   │   ├── Qwen.py                 # LLM class
│   ├── lvlm/
│   │   ├── InternVL.py             # Support 26B, 8B, and 1B
│   │   ├── LLaVA.py                # Support 13B, 7B
│   │   ├── LLaVANeXT.py            # Support 13B, 7B
│   │   ├── Qwen2VL.py              # Support 72B, 7B, 2B
│   ├── run/
│   │   ├── run_LLaVABench.sh       # Benchmark VL-Uncertainty on LLaVABench
│   │   ├── run_MMMU.sh             # Benchmark VL-Uncertainty on MMMU
│   │   ├── run_MMVet.sh            # Benchmark VL-Uncertainty on MMVet
│   │   ├── run_ScienceQA.sh        # Benchmark VL-Uncertainty on ScienceQA
│   ├── util/
│   │   ├── misc.py                 # Helper functions
│   │   ├── textual_perturbation.py # Various textual perturbations
│   │   ├── visual_perturbation.py  # Various visual perturbations
│   ├── .gitignore
│   ├── README.md
│   ├── VL-Uncertainty.py           # Includes semantic-equivalent perturbation, uncertainty estimation, and hallucination detection
│   ├── demo.py                     # Quick start demo
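
  • At a high level, VL-Uncertainty.py covers semantic-equivalent perturbation (util/), answer sampling from an LVLM wrapper (lvlm/), uncertainty estimation, and hallucination detection. The outline below is purely illustrative: every callable is a placeholder, and exact string matching stands in for the actual semantic clustering, so it is not the repository's real interface:

import math
from collections import Counter

def vl_uncertainty_outline(answer_fn, perturb_question, perturb_image,
                           image, question, n_samples=5, threshold=1.0):
    # Query the (placeholder) LVLM once per semantic-equivalent perturbation.
    answers = [answer_fn(perturb_image(image), perturb_question(question))
               for _ in range(n_samples)]
    # Naive stand-in for semantic clustering: group answers by exact match.
    counts = Counter(answers)
    entropy = sum((c / n_samples) * math.log2(n_samples / c) for c in counts.values())
    return entropy, entropy > threshold  # above the threshold -> flagged as hallucination

# Toy usage with constant stand-ins (no model or GPU needed):
print(vl_uncertainty_outline(lambda img, q: "Coco", str, str, "img.png", "Which movie?"))  # (0.0, False)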

✨ Acknowledgement

📎 Citation

If you find our work useful for your research and applications, please cite it using this BibTeX:

@article{zhang2024vl,
  title={VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation},
  author={Zhang, Ruiyang and Zhang, Hu and Zheng, Zhedong},
  journal={arXiv preprint arXiv:2411.11919},
  year={2024}
}