
# Video Summarization using RL

We evaluate the use of image augmentations in RL when generating video summaries. Recent work by Laskin et al. and Kostrikov et al. has shown that pixel-level augmentation improves the sample efficiency of RL algorithms and increases the average cumulative reward obtained. Our work builds on the paper Deep RL for Unsupervised Video Summarization and evaluates the use of augmentations on the frames of the video to improve the performance of the agent. We also study training the summary-generation agent with PPO and compare performance in terms of total reward obtained.
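In Zhou et al.'s formulation, the unsupervised reward combines the diversity of the selected frames with how well they represent the full video. A minimal NumPy sketch of that reward, assuming pooled per-frame CNN features (the function names and the equal-sum combination are our reading of the paper, not the repo's exact code):

```python
import numpy as np

def diversity_reward(feats, picks):
    """Mean pairwise dissimilarity (1 - cosine similarity) among selected frames."""
    sel = feats[picks]
    sel = sel / np.linalg.norm(sel, axis=1, keepdims=True)
    sim = sel @ sel.T
    n = len(picks)
    if n < 2:
        return 0.0
    # Sum off-diagonal dissimilarities, normalized by the number of ordered pairs.
    return float((1.0 - sim).sum() / (n * (n - 1)))

def representativeness_reward(feats, picks):
    """exp(-mean distance from every frame to its nearest selected frame)."""
    dists = np.linalg.norm(feats[:, None, :] - feats[None, picks, :], axis=2)
    return float(np.exp(-dists.min(axis=1).mean()))

def summary_reward(feats, picks):
    # Zhou et al. combine the two terms additively.
    return diversity_reward(feats, picks) + representativeness_reward(feats, picks)
```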

Authors: Rohith, Dibyajit

## Run experiments

Use the shell scripts to run each experiment. Each script runs its experiment with 5 different seeds and stores the results in a log file. For example:

```shell
./train_resnet50_ppo_augs.sh
```
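The structure of such a script is roughly the following sketch: loop over the 5 seeds and append each run's output to one log file. The training entry point and its flags are assumptions for illustration (commented out), not the repo's actual CLI:

```shell
#!/usr/bin/env bash
# Sketch of an experiment script: one run per seed, all output in one log.
LOG="train_resnet50_ppo_augs.log"
rm -f "$LOG"
for SEED in 0 1 2 3 4; do
    echo "=== seed $SEED ===" >> "$LOG"
    # Hypothetical entry point; flag names are illustrative only:
    # python main.py --encoder resnet50 --algo ppo --augment --seed "$SEED" >> "$LOG" 2>&1
done
```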

## Augmentations used

- Gaussian Blur
- Cutout
- Cutout Color
- Rotate
- Flip
- Center Crop
- Grayscale
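This set mirrors the augmentations studied by Laskin et al. (RAD). As a minimal NumPy sketch, three of them (Cutout, Flip, Grayscale) applied to an HxWxC float frame; the function names and the patch-size default are ours, not the repo's API:

```python
import numpy as np

def cutout(img, size=8, rng=None):
    """Zero out a random square patch of the frame (Cutout)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = 0.0
    return out

def horizontal_flip(img):
    """Mirror the frame left-right."""
    return img[:, ::-1].copy()

def grayscale(img):
    """Luma-weighted grayscale, broadcast back to 3 channels."""
    g = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(g[..., None], 3, axis=2)
```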

## Dataset and training methodology

We train our model architectures on the TVSum dataset using cross-validation. Following Zhou et al., we perform 5-fold cross-validation and report the average out-of-fold F1 score. We also evaluate our models on the SumMe dataset to test their generalization. Note that SumMe is out-of-distribution data: our models are not trained on this dataset.
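The 5-fold protocol can be sketched as follows. TVSum has 50 videos, so each fold holds out 10 of them; the helper name and seeding here are assumptions, not the repo's code:

```python
import numpy as np

def five_fold_splits(n_videos, seed=0):
    """Yield (train_idx, test_idx) pairs for 5-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_videos)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test
```

The out-of-fold F1 score is then the mean of the per-fold test scores, so every video contributes to evaluation exactly once.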

## Results

We use an encoder-decoder model. The decoder architecture is fixed across all experiments: a bidirectional LSTM. We report the F1 score (note that the agent is not explicitly trained to optimize this metric) for different encoder CNN architectures and RL algorithms.
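A minimal PyTorch sketch of this architecture: frozen ResNet features per frame (2048-d pooled for ResNet50/101) fed to a BiLSTM decoder that emits a per-frame keep probability for the RL policy. The class and field names are ours, not the repo's:

```python
import torch
import torch.nn as nn

class SummaryPolicy(nn.Module):
    """Precomputed CNN features -> BiLSTM decoder -> per-frame keep probability.

    The encoder is assumed to be a pretrained ResNet whose pooled per-frame
    features are precomputed; only this decoder is trained by the RL algorithm.
    """

    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, feats):          # feats: (batch, T, feat_dim)
        h, _ = self.lstm(feats)        # (batch, T, 2 * hidden)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (batch, T), each in (0, 1)
```

The policy samples a binary keep/skip action per frame from these probabilities; REINFORCE or PPO then updates the decoder from the summary reward.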

### TVSum Dataset

| Encoder CNN | RL algorithm | Trained with augmented data | F1 score (mean ± std) |
|---|---|---|---|
| ResNet50 | REINFORCE | False | 0.57 ± 0.003 |
| ResNet50 | PPO | False | 0.5736 ± 0.004 |
| ResNet50 | PPO | True | 0.5756 ± 0.002 |
| ResNet101 | REINFORCE | False | 0.5695 ± 0.004 |
| ResNet101 | PPO | False | 0.5712 ± 0.004 |
| ResNet101 | PPO | True | 0.5741 ± 0.003 |

### SumMe Dataset

Note that the RL agent is not trained on this dataset, so it can be considered out-of-distribution; hence the lower F1 scores.

| Encoder CNN | RL algorithm | Trained with augmented data | F1 score |
|---|---|---|---|
| ResNet50 | PPO | False | 0.1780 |
| ResNet50 | PPO | True | 0.1887 |
| ResNet101 | PPO | False | 0.1817 |
| ResNet101 | PPO | True | 0.1830 |