[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]
Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
A survey on video and language understanding.
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
The official GitHub page for the survey paper "Self-Supervised Learning for Videos: A Survey"