Egocentric Video Understanding Dataset (EVUD)

If you like our project, please give us a star ⭐ on GitHub to stay up to date with the latest releases.

TL;DR

We introduce the Egocentric Video Understanding Dataset (EVUD), an instruction-tuning dataset for training VLMs on video captioning and question answering tasks specific to egocentric videos.

News

  • The AlanaVLM paper is now available on arXiv!
  • All the checkpoints developed for this project are available on Hugging Face
  • The EVUD dataset is available on Hugging Face (see the loading sketch below)
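
As a quick way to pull the data, the sketch below loads EVUD with the Hugging Face datasets library. The dataset identifier alanaai/EVUD is an assumption based on the repository name, and the split name is illustrative; check the Hugging Face page linked above for the exact ID and available splits.

# Minimal sketch: loading EVUD from the Hugging Face Hub.
# Assumption: the dataset is published as "alanaai/EVUD" with a "train" split;
# verify the exact identifier and splits on the Hugging Face dataset page.
from datasets import load_dataset

dataset = load_dataset("alanaai/EVUD", split="train")
print(dataset)      # dataset features and number of rows
print(dataset[0])   # first instruction-tuning example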

Prerequisites

Create and activate a virtual environment, then install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Data generation

Alongside the generated data released on Hugging Face, we are also releasing all the scripts needed to reproduce our data generation pipeline.

The generated data follows the LLaVA JSON format.
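
For reference, the sketch below shows a typical LLaVA-style instruction-tuning record and how to read a generated annotation file with Python. The field names and the file name evud_captions.json are illustrative assumptions; the exact keys (for example, video vs. image paths) depend on the specific generation script.

import json

# Illustrative LLaVA-style record: the JSON file is a list of such dicts.
# Field names follow the common LLaVA convention and are not guaranteed to
# match every EVUD subset exactly.
example = {
    "id": "ego_clip_0001",
    "video": "ego_clip_0001.mp4",
    "conversations": [
        {"from": "human", "value": "<video>\nWhat is the camera wearer doing?"},
        {"from": "gpt", "value": "The camera wearer is chopping vegetables on a cutting board."},
    ],
}

# Load a generated annotation file (hypothetical name) and inspect one record.
with open("evud_captions.json") as f:
    data = json.load(f)
print(data[0]["conversations"])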
