
BayesianVSLNet - Ego4D Step Grounding Challenge CVPR24 🏆

🔜: We will release checkpoints and pre-extracted video features.

[ArXiv] [Leaderboard]

Challenge

The challenge is built on top of the Ego4D-GoalStep dataset and code.

Goal: Given an untrimmed egocentric video, identify the temporal action segment corresponding to a natural language description of the step. Specifically, predict the (start_time, end_time) for a given keystep description.
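For a concrete, purely hypothetical illustration of what a query and its prediction look like (field names and values below are made up for illustration, not the official Ego4D-GoalStep schema):

# Hypothetical step-grounding query and prediction (illustrative only).
query = {
    "video_uid": "example_video_001",
    "step_description": "add the chopped onions to the pan",
}
prediction = {
    "video_uid": "example_video_001",
    "start_time": 125.4,  # predicted segment start, in seconds
    "end_time": 158.2,    # predicted segment end, in seconds
}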


The leaderboard 🚀 reports test-set results for the best approaches. Our method currently ranks first 🚀🔥.

BayesianVSLNet

We introduce BayesianVSLNet: Bayesian temporal-order priors for test-time refinement. Our model improves upon traditional approaches by incorporating a novel Bayesian temporal-order prior during inference, which adjusts for cyclic and repetitive actions within the video and enhances the accuracy of moment predictions. Please see the paper for further details.
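As a rough sketch of the idea (a simplification, not the exact formulation from the paper; the Gaussian prior shape, function name, and sigma value are assumptions), the snippet below re-weights per-frame localization scores with a prior centered at the step's expected relative position in the video:

import numpy as np

def apply_temporal_order_prior(frame_scores, step_idx, num_steps, sigma=0.15):
    # Prior assumption: the k-th of N keysteps tends to occur around relative
    # time (k + 0.5) / N, modeled as a Gaussian over normalized video time.
    t = np.linspace(0.0, 1.0, len(frame_scores))
    center = (step_idx + 0.5) / num_steps
    prior = np.exp(-0.5 * ((t - center) / sigma) ** 2)
    posterior = frame_scores * prior  # Bayesian re-weighting of the network's scores
    return posterior / (posterior.sum() + 1e-8)

# Hypothetical usage: 100 frame scores when grounding the 3rd of 10 keysteps.
scores = np.random.rand(100)
refined = apply_temporal_order_prior(scores, step_idx=2, num_steps=10)

The re-weighted scores would then feed the usual span selection (e.g., picking the highest-scoring start/end pair), nudging predictions toward the step's expected position while still letting strong visual evidence dominate.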


Install

git clone https://github.com/cplou99/BayesianVSLNet
cd BayesianVSLNet
pip install -r requirements.txt

Video Features

We use both Omnivore-L and EgoVLPv2 video features. They should be pre-extracted and placed at ./ego4d-goalstep/step_grounding/data/features/.

Model

The EgoVLPv2 weights, which are used to extract text features, must be placed in BayesianVSLNet/NaQ/VSLNet_Bayesian/model/EgoVLP_weights.

Train

cd ego4d-goalstep/step_grounding/
bash train_Bayesian.sh experiments/

Inference

cd ego4d-goalstep/step_grounding/
bash infer_Bayesian.sh experiments/
