I'm a Ph.D. Candidate at the Department of Computing, The Hong Kong Polytechnic University.
- High-Level Video Content Analytics
- Visual Knowledge Learning
- Foundation Models
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
UMT is a unified and flexible framework that handles different combinations of input modalities and outputs video moment retrieval and/or highlight detection results.