
# Video Summarization using RL

We evaluate the use of image augmentations in RL when generating video summaries. Recent work by Laskin et al. and Kostrikov et al. has shown that pixel-level augmentation improves the sample efficiency of RL algorithms and increases the average cumulative reward obtained. Our work builds on the paper Deep RL for Unsupervised Video Summarization and evaluates the use of augmentations on the frames of the video to improve the performance of the agent. We also study training the summary-generation agent with PPO and compare performance in terms of total reward obtained.
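In Zhou et al.'s formulation, the unsupervised reward combines the diversity of the selected frames with how well they represent the full video. A minimal NumPy sketch of that reward, assuming pooled per-frame CNN features (the function names and the equal-sum combination are our reading of the paper, not the repo's exact code):

```python
import numpy as np

def diversity_reward(feats, picks):
    """Mean pairwise dissimilarity (1 - cosine similarity) among selected frames."""
    sel = feats[picks]
    sel = sel / np.linalg.norm(sel, axis=1, keepdims=True)
    sim = sel @ sel.T
    n = len(picks)
    if n < 2:
        return 0.0
    # Sum off-diagonal dissimilarities, normalized by the number of ordered pairs.
    return float((1.0 - sim).sum() / (n * (n - 1)))

def representativeness_reward(feats, picks):
    """exp(-mean distance from every frame to its nearest selected frame)."""
    dists = np.linalg.norm(feats[:, None, :] - feats[None, picks, :], axis=2)
    return float(np.exp(-dists.min(axis=1).mean()))

def summary_reward(feats, picks):
    # Zhou et al. combine the two terms additively.
    return diversity_reward(feats, picks) + representativeness_reward(feats, picks)
```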

Authors: Rohith, Dibyajit

## Run experiments

Use the shell scripts to run each experiment. Each script runs its experiment with 5 different seeds and stores the results in a log file. For example:

```shell
./train_resnet50_ppo_augs.sh
```
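The structure of such a script is roughly the following sketch: loop over the 5 seeds and append each run's output to one log file. The training entry point and its flags are assumptions for illustration (commented out), not the repo's actual CLI:

```shell
#!/usr/bin/env bash
# Sketch of an experiment script: one run per seed, all output in one log.
LOG="train_resnet50_ppo_augs.log"
rm -f "$LOG"
for SEED in 0 1 2 3 4; do
    echo "=== seed $SEED ===" >> "$LOG"
    # Hypothetical entry point; flag names are illustrative only:
    # python main.py --encoder resnet50 --algo ppo --augment --seed "$SEED" >> "$LOG" 2>&1
done
```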

## Augmentations used

- Gaussian Blur
- Cutout
- Cutout Color
- Rotate
- Flip
- Center Crop
- Grayscale
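This set mirrors the augmentations studied by Laskin et al. (RAD). As a minimal NumPy sketch, three of them (Cutout, Flip, Grayscale) applied to an HxWxC float frame; the function names and the patch-size default are ours, not the repo's API:

```python
import numpy as np

def cutout(img, size=8, rng=None):
    """Zero out a random square patch of the frame (Cutout)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = 0.0
    return out

def horizontal_flip(img):
    """Mirror the frame left-right."""
    return img[:, ::-1].copy()

def grayscale(img):
    """Luma-weighted grayscale, broadcast back to 3 channels."""
    g = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(g[..., None], 3, axis=2)
```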

## Dataset and training methodology

We train our model architectures on the TVSum dataset using cross-validation. Following Zhou et al., we perform 5-fold cross-validation and report the average out-of-fold F1 score. We also evaluate our models on the SumMe dataset to test their generalization. Note that SumMe is out-of-distribution data: our models are not trained on this dataset.
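The 5-fold protocol can be sketched as follows. TVSum has 50 videos, so each fold holds out 10 of them; the helper name and seeding here are assumptions, not the repo's code:

```python
import numpy as np

def five_fold_splits(n_videos, seed=0):
    """Yield (train_idx, test_idx) pairs for 5-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_videos)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test
```

The out-of-fold F1 score is then the mean of the per-fold test scores, so every video contributes to evaluation exactly once.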

## Results

We use an encoder-decoder model. The decoder architecture is fixed across all experiments: a bidirectional LSTM. We report the F1 score (note that the agent is not explicitly trained to optimize this metric) for different encoder CNN architectures and RL algorithms.
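A minimal PyTorch sketch of this architecture: frozen ResNet features per frame (2048-d pooled for ResNet50/101) fed to a BiLSTM decoder that emits a per-frame keep probability for the RL policy. The class and field names are ours, not the repo's:

```python
import torch
import torch.nn as nn

class SummaryPolicy(nn.Module):
    """Precomputed CNN features -> BiLSTM decoder -> per-frame keep probability.

    The encoder is assumed to be a pretrained ResNet whose pooled per-frame
    features are precomputed; only this decoder is trained by the RL algorithm.
    """

    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, feats):          # feats: (batch, T, feat_dim)
        h, _ = self.lstm(feats)        # (batch, T, 2 * hidden)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (batch, T), each in (0, 1)
```

The policy samples a binary keep/skip action per frame from these probabilities; REINFORCE or PPO then updates the decoder from the summary reward.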

### TVSum Dataset

| Encoder CNN | RL algorithm | Trained with augmented data | F1 score (mean ± std) |
|---|---|---|---|
| ResNet50 | REINFORCE | False | 0.57 ± 0.003 |
| ResNet50 | PPO | False | 0.5736 ± 0.004 |
| ResNet50 | PPO | True | 0.5756 ± 0.002 |
| ResNet101 | REINFORCE | False | 0.5695 ± 0.004 |
| ResNet101 | PPO | False | 0.5712 ± 0.004 |
| ResNet101 | PPO | True | 0.5741 ± 0.003 |

### SumMe Dataset

Note that the RL agent is not trained on this dataset, so it can be considered out-of-distribution; hence the lower F1 scores.

| Encoder CNN | RL algorithm | Trained with augmented data | F1 score |
|---|---|---|---|
| ResNet50 | PPO | False | 0.1780 |
| ResNet50 | PPO | True | 0.1887 |
| ResNet101 | PPO | False | 0.1817 |
| ResNet101 | PPO | True | 0.1830 |