👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)

E.T. Bench: Towards Open-Ended Event-Level
Video-Language Understanding

Ye Liu<sup>1,2</sup>, Zongyang Ma<sup>2,3</sup>, Zhongang Qi<sup>2</sup>, Yang Wu<sup>4</sup>, Ying Shan<sup>2</sup>, Chang Wen Chen<sup>1</sup>

<sup>1</sup>The Hong Kong Polytechnic University <sup>2</sup>ARC Lab, Tencent PCG
<sup>3</sup>Institute of Automation, Chinese Academy of Sciences <sup>4</sup>Tencent AI Lab

E.T. Bench (Event-Level & Time-Sensitive Video Understanding Benchmark) is a comprehensive solution for open-ended event-level video-language understanding. This project consists of the following three contributions:

  • E.T. Bench: A large-scale, high-quality benchmark for event-level and time-sensitive video understanding, comprising 7.3K samples across 12 tasks, built on 7K videos (251.4 hours in total) from 8 domains.
  • E.T. Chat: A multi-modal large language model (MLLM) that specializes in time-sensitive video-conditioned chatting. It reformulates timestamp prediction as a novel embedding matching problem.
  • E.T. Instruct 164K: A meticulously collected instruction-tuning dataset tailored for time-sensitive video understanding scenarios.

We focus on 4 essential capabilities for time-sensitive video understanding: referring, grounding, dense captioning, and complex understanding. See the teaser figure for examples of each, distinguished by background color.
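To give a flavor of the embedding-matching idea behind E.T. Chat, here is a minimal, illustrative sketch (not the actual model): a query embedding, such as one produced for a special timestamp token, is matched against per-frame visual features by cosine similarity, and the best-matching frame index is converted to a timestamp. All function names and the toy data below are hypothetical.

```python
import numpy as np

def match_timestamp(query_emb, frame_embs, fps=1.0):
    """Toy embedding matching: return the timestamp (in seconds) of the
    frame whose embedding is most similar to the query embedding."""
    # L2-normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    sims = f @ q                      # similarity to every frame
    idx = int(np.argmax(sims))        # best-matching frame index
    return idx / fps                  # frame index -> timestamp

# Toy example: 4 one-hot "frame embeddings"; the query is closest to frame 2.
frames = np.eye(4)
query = np.array([0.1, 0.2, 0.9, 0.1])
print(match_timestamp(query, frames, fps=2.0))  # frame 2 at 2 fps -> 1.0 s
```

In the real model, frame features come from a visual encoder and the query embedding from the LLM's hidden state; the sketch only shows how a timestamp can be read off as an argmax over similarities rather than decoded as text.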

🔥 News

  • 2024.09.28 ⭐️ Code, model, and dataset release.
  • 2024.09.27 🎉 E.T. Bench has been accepted to NeurIPS 2024 (Datasets and Benchmarks Track).

๐Ÿ† Leaderboard

Our online leaderboard is under construction. Stay tuned!

🔮 Benchmark

Please refer to the Benchmark page for details about E.T. Bench.

๐Ÿ› ๏ธ Model

Please refer to the Model page for training and testing E.T. Chat.

📦 Dataset

Please refer to the Dataset page for downloading E.T. Instruct 164K.

📖 Citation

Please cite our paper if you find this project helpful.

@inproceedings{liu2024etbench,
  title={E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding},
  author={Liu, Ye and Ma, Zongyang and Qi, Zhongang and Wu, Yang and Chen, Chang Wen and Shan, Ying},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2024}
}

💡 Acknowledgements

This project builds upon the following repositories; many thanks to their authors.

LLaVA, LAVIS, EVA, LLaMA-VID, TimeChat, densevid_eval
