Ruiyang Zhang, Hu Zhang, Zhedong Zheng*
- 2024.12.19: Source code of VL-Uncertainty is released!
- Create conda environment.
conda create -n VL-Uncertainty python=3.11;
conda activate VL-Uncertainty;
- Install dependencies.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121;
pip install transformers datasets flash-attn accelerate timm numpy sentencepiece protobuf qwen_vl_utils;
(Tested on NVIDIA H100-PCIe-80G, NVIDIA A100-PCIe-40G, and NVIDIA A6000-48G)
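- (Optional) Verify the environment. This quick sanity check is our suggestion, not part of the repository:

```python
import torch
import transformers

# Confirm versions and that the CUDA build of PyTorch sees a GPU.
print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```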
- Run our demo code.
python demo.py;
- This should produce the results below. VL-Uncertainty successfully estimates high uncertainty for the wrong LVLM answer and thereby detects the hallucination!
--------------------------------------------------
- Demo image: .asset/img/titanic.png
- Question: What is the name of this movie?
- GT answer: Titanic.
--------------------------------------------------
- LVLM answer: The movie in the image is "Coco."
- LVLM answer accuracy: Wrong
--------------------------------------------------
- Estimated uncertainty: 2.321928094887362
- Uncertainty threshold: 1.0
--------------------------------------------------
- Hallucination prediction: Is hallucination
- Hallucination detection: Success!
--------------------------------------------------
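- For intuition: the reported uncertainty 2.3219 equals log2(5), i.e., the Shannon entropy of five sampled answers that all land in different semantic clusters. The sketch below shows only this entropy-and-threshold logic with a hypothetical cluster assignment; see VL-Uncertainty.py for the actual perturbation, sampling, and clustering.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Shannon entropy (base 2) over semantic clusters of sampled answers."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical case: 5 answers sampled under semantic-equivalent
# perturbations, each judged semantically distinct -> 5 singleton clusters.
uncertainty = semantic_entropy([0, 1, 2, 3, 4])  # log2(5) ≈ 2.3219
print("Is hallucination:", uncertainty > 1.0)    # threshold from the demo
```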
- For MM-Vet (Free-form benchmark)
bash run/run_MMVet.sh;
- For LLaVABench (Free-form benchmark)
bash run/run_LLaVABench.sh;
- For MMMU (Multi-choice benchmark)
bash run/run_MMMU.sh;
- For ScienceQA (Multi-choice benchmark)
bash run/run_ScienceQA.sh;
- VL-Uncertainty successfully detects LVLM hallucinations:
- VL-Uncertainty can also assign low uncertainty to a correct answer and identify it as non-hallucinatory:
- VL-Uncertainty effectively generalizes to physical-world scenarios. (The following picture is my laptop, captured with an iPhone.)
- The code structure of this repository is as follows:
├── VL-Uncertainty/
│   ├── .asset/
│   │   └── img/
│   │       ├── logo.png
│   │       └── titanic.png # For demo
│   ├── benchmark/
│   │   ├── LLaVABench.py # Free-form benchmark
│   │   ├── MMMU.py # Multi-choice benchmark
│   │   ├── MMVet.py # Free-form benchmark
│   │   └── ScienceQA.py # Multi-choice benchmark
│   ├── llm/
│   │   └── Qwen.py # LLM class
│   ├── lvlm/
│   │   ├── InternVL.py # Support 26B, 8B, and 1B
│   │   ├── LLaVA.py # Support 13B and 7B
│   │   ├── LLaVANeXT.py # Support 13B and 7B
│   │   └── Qwen2VL.py # Support 72B, 7B, and 2B
│   ├── run/
│   │   ├── run_LLaVABench.sh # Benchmark VL-Uncertainty on LLaVABench
│   │   ├── run_MMMU.sh # Benchmark VL-Uncertainty on MMMU
│   │   ├── run_MMVet.sh # Benchmark VL-Uncertainty on MMVet
│   │   └── run_ScienceQA.sh # Benchmark VL-Uncertainty on ScienceQA
│   ├── util/
│   │   ├── misc.py # Helper functions
│   │   ├── textual_perturbation.py # Various textual perturbations
│   │   └── visual_perturbation.py # Various visual perturbations
│   ├── .gitignore
│   ├── README.md
│   ├── VL-Uncertainty.py # Semantic-equivalent perturbation, uncertainty estimation, and hallucination detection
│   └── demo.py # Quick start demo
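- As a rough illustration of the perturbation utilities: util/visual_perturbation.py and util/textual_perturbation.py apply semantic-equivalent perturbations (e.g., blurring an image or rephrasing a prompt) before answers are re-sampled. The helper below is a hypothetical example of a visual perturbation, not the repository's implementation:

```python
from PIL import Image
from torchvision.transforms import GaussianBlur

def blur_variants(image_path, sigmas=(0.5, 1.0, 2.0)):
    """Produce semantically equivalent image variants via Gaussian blur.

    Hypothetical sketch; the actual perturbations live in
    util/visual_perturbation.py.
    """
    img = Image.open(image_path).convert("RGB")
    return [GaussianBlur(kernel_size=9, sigma=s)(img) for s in sigmas]

variants = blur_variants(".asset/img/titanic.png")  # demo image from .asset/img/
```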
- LLaVA, LLaVA-NeXT, InternVL, Qwen2-VL: Thanks a lot for these foundational efforts!
- semantic_uncertainty: We are greatly inspired by this work!
If you find our work useful for your research and application, please cite using this BibTeX:
@article{zhang2024vl,
  title={VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation},
  author={Zhang, Ruiyang and Zhang, Hu and Zheng, Zhedong},
  journal={arXiv preprint arXiv:2411.11919},
  year={2024}
}