Homepage | Dataset | Paper | arXiv | Leaderboard
Figure 1: The main categories of EgoThink to comprehensively assess the capability of thinking from a first-person perspective.
[2024-09]: EgoThink and VidEgoThink were invited to be presented at ZhiDX.
[2024-04]: EgoThink was invited to be presented at ByteDance.
[2024-04]: EgoThink will be presented as a poster (Highlight) at CVPR 2024.
[2024-03]: EgoThink was presented at AITIME.
[2024-02]: EgoThink has been accepted by CVPR 2024.
[2023-11]: Our paper Can Vision-Language Models Think from a First-Person Perspective? has been released.
Figure 2: Categories with fine-grained dimensions and their corresponding examples of EgoThink benchmark.
- Clone our GitHub Repo.
git clone https://github.com/AdaCheng/EgoThink.git
cd EgoThink/data
- Download the data from our Hugging Face Repo (a programmatic sketch is shown below).
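If you prefer to fetch the data programmatically, the following is a minimal sketch using the huggingface_hub library; the repo_id and local directory are assumptions, so check our Hugging Face page for the exact dataset ID.

```python
# Minimal sketch: download the EgoThink data with huggingface_hub.
# NOTE: the repo_id below is an assumption -- confirm it on our Hugging Face page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EgoThink/EgoThink",  # hypothetical dataset ID
    repo_type="dataset",
    local_dir="EgoThink/data",    # keep the files next to the cloned repo
)
print(f"Dataset downloaded to {local_dir}")
```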
Here we provide the basic environment; you will additionally need to install the requirements of the open-source models you evaluate.
conda create --name egothink python=3.10
conda activate egothink
pip install -U pip
# Install requirements
pip install -r requirements.txt
Thank you very much for contributing the code of any new model you have deployed!
- Create test_{new_model}.py in /models.
- Add the new model in get_model() in /models/__init__.py, for example:
# BLIP2-7B
if model_name == 'blip2-7b':
from .test_blip2 import TestBlip2
return TestBlip2(name='blip2_opt', model_type='pretrain_opt6.7b', config_path='/models/blip_configs/blip2_pretrain_opt6.7b.yaml', device=device)
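For reference, below is a hedged sketch of what a new wrapper in test_{new_model}.py might look like. The generate(image, question) interface and constructor arguments are assumptions modeled on the existing wrappers, and for illustration it loads BLIP-2 through Hugging Face transformers rather than the LAVIS loader used by TestBlip2; mirror the actual interface in /models when you contribute.

```python
# /models/test_new_model.py -- illustrative sketch only; check the existing
# wrappers (e.g. test_blip2.py) for the exact interface expected by eval.py.
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor


class TestNewModel:
    def __init__(self, model_path="Salesforce/blip2-opt-2.7b", device="cuda:0"):
        self.device = device
        self.processor = Blip2Processor.from_pretrained(model_path)
        self.model = Blip2ForConditionalGeneration.from_pretrained(
            model_path, torch_dtype=torch.float16
        ).to(device)

    @torch.no_grad()
    def generate(self, image, question, max_new_tokens=128):
        """Answer a first-person question about a single egocentric image."""
        if isinstance(image, str):
            image = Image.open(image).convert("RGB")
        inputs = self.processor(images=image, text=question, return_tensors="pt").to(
            self.device, torch.float16
        )
        output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
        return self.processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

After adding the wrapper, register it in get_model() in /models/__init__.py, mirroring the BLIP-2 branch above.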
- API-based Model
Please update the API-based models' keys and base_urls between lines 23 and 33 of gpt_eval.py (a hypothetical illustration of this block follows the command below).
# dataset: Activity, Object/existence, etc.
# MODEL: GPT series models, such as gpt-4o
python gpt_eval.py \
--model_name $MODEL \
--annotation_path /${dataset}/annotations.json \
--answer_path /answer/${dataset}
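The exact variable layout of that credential block is defined in gpt_eval.py itself; the snippet below is only a hypothetical illustration of what it amounts to, so edit the real file rather than copying this verbatim.

```python
# Hypothetical illustration of the credential block in gpt_eval.py (lines 23-33);
# the real variable names may differ -- edit the file directly.
API_CONFIGS = {
    "gpt-4o": {
        "api_key": "YOUR_OPENAI_KEY",
        "base_url": "https://api.openai.com/v1",   # or your proxy endpoint
    },
    "claude-2": {
        "api_key": "YOUR_ANTHROPIC_KEY",
        "base_url": "https://api.anthropic.com",
    },
}
```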
- Open-Source Model
# dataset: Activity, Object/existence, etc.
# MODEL: models defined in the models file
# DEVICE: GPU id, 0/1/2...; currently only a single GPU is supported
python eval.py \
--model_name $MODEL \
--annotation_path /${dataset}/annotations.json \
--answer_path /answer/${dataset} \
--batch_size 1 \
--device $DEVICE
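To sweep every category in one go, a small driver script like the sketch below can wrap eval.py; the category names are assumptions based on the dimensions in Table 1, so match them to your local data folders.

```python
# run_all.py -- convenience sketch: loop eval.py over every EgoThink category.
# The category names below are assumptions; match them to your data layout.
import subprocess

CATEGORIES = [
    "Activity", "Object_existence", "Object_attribute", "Object_affordance",
    "Localization_location", "Localization_spatial", "Reasoning_counting",
    "Reasoning_comparison", "Reasoning_situated", "Planning_navigation",
    "Planning_assistance",
]
MODEL = "blip2-7b"  # any key registered in get_model()
DEVICE = "0"

for dataset in CATEGORIES:
    subprocess.run(
        [
            "python", "eval.py",
            "--model_name", MODEL,
            "--annotation_path", f"/{dataset}/annotations.json",
            "--answer_path", f"/answer/{dataset}",
            "--batch_size", "1",
            "--device", DEVICE,
        ],
        check=True,
    )
```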
Please update the API-based models' keys and base_urls between lines 463 and 546 of common.py.
# data-folder: the folder name of answer.
# bench-name: Activity, Object/existence, etc.
# EVA_MODELS: a list of models to be evaluated (separated by spaces), for example "llava-13b-llama2 llava-1.5-13b llava-1.5-7b"
# $EVA_JUDGE_MODEL: gpt-4o (default), gpt-3.5-turbo, claude-2, etc.
python gen_judgment.py \
--data-folder /answer \
--bench-name $dataset \
--mode single \
--model-list $EVA_MODELS \
--judge-model $EVA_JUDGE_MODEL \
--parallel 4 \
--judge-file judge_prompts.jsonl
# EVA_MODELS: a list of models to be evaluated (separated by spaces), for example "llava-13b-llama2 llava-1.5-13b llava-1.5-7b"
# $EVA_JUDGE_MODEL: gpt-4 (default), gpt-3.5-turbo, claude-2, etc.
python show_result.py \
--input-file {data_folder}/{bench-name}/model_judgment/{judge-model}_single.jsonl \
--judge-model $EVA_JUDGE_MODEL \
--model-list $EVA_MODELS \
--mode single
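If you want to aggregate scores yourself instead of (or in addition to) show_result.py, the sketch below averages the judge scores per model. It assumes each line of the judgment file is a JSON object with "model" and "score" fields, as in FastChat-style judgment files; check your {judge-model}_single.jsonl to confirm the schema.

```python
# summarize_judgments.py -- rough sketch: average judge scores per model.
# Assumes FastChat-style lines with "model" and "score" fields; confirm the
# schema against your {judge-model}_single.jsonl before relying on it.
import json
from collections import defaultdict

def average_scores(judgment_file):
    totals, counts = defaultdict(float), defaultdict(int)
    with open(judgment_file) as f:
        for line in f:
            record = json.loads(line)
            score = record.get("score", -1)
            if score >= 0:  # skip failed judgments (FastChat marks them with -1)
                totals[record["model"]] += score
                counts[record["model"]] += 1
    return {model: totals[model] / counts[model] for model in totals}

if __name__ == "__main__":
    path = "/answer/Activity/model_judgment/gpt-4o_single.jsonl"  # example path
    for model, avg in sorted(average_scores(path).items()):
        print(f"{model}: {avg:.2f}")
```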
Feel free to contribute your model's performance by adding it to our "RESULTS SECTION" (from line 398) in index.html; we will review and merge it accordingly.
<tr style="background-color: #f8fffe;">
<td style="text-align: left;"><b>GPT-4V(ision)</b></td>
<td><b>65.5</b></td>
<td>62.0</td>
<td><b>82.0</b></td>
<td><b>58.0</b></td>
<td><b>59.5</b></td>
<td style="text-decoration: underline;">86.0</td>
<td style="text-decoration: underline;">62.0</td>
<td><b>42.0</b></td>
<td>48.0</td>
<td><b>83.0</b></td>
<td><b>55.0</b></td>
<td><b>64.0</b></td>
<td><b>84.0</b></td>
</tr>
The detailed table can be found here.
Table 1: Combined single-answer grading scores in zero-shot setups across various dimensions. Bold indicates the best performance and underline the second-best. Exist, Attr, Afford, Loc, Spatial, Count, Compar, Situated, Nav, and Assist represent existence, attribute, affordance, location, spatial relationship, counting, comparison, situated reasoning, navigation, and assistance.
- Sijie Cheng: csj23@mails.tsinghua.edu.cn
@InProceedings{Cheng_2024_CVPR,
author = {Cheng, Sijie and Guo, Zhicheng and Wu, Jingwen and Fang, Kechen and Li, Peng and Liu, Huaping and Liu, Yang},
title = {EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {14291-14302}
}
Thanks to Xiaolong Wang, Yangyang Yu, Zixin Sun, and Zhaoyang Li for their contributions to data collection and construction. We appreciate Zeyuan Yang, Szymon Tworkowski, Guan Wang, and Zonghan Yang for their support with API resources; Xinghang Li for his valuable discussions; and Siyu Wang for her codebase for the annotation system.
Furthermore, we appreciate the developers behind the following projects for their significant contributions to our research: Ego4D, Multi-Modality-Arena, FastChat.