Video Summarization using RL

We evaluate the use of image augmentations in RL for generating video summaries. Recent work by Laskin et al. and Kostrikov et al. has shown that augmenting pixel observations improves the sample efficiency of RL algorithms and also increases the average cumulative reward obtained. Our work builds on the paper Deep RL for Unsupervised Video Summarization and evaluates whether augmenting the pixels/images of the video frames improves the performance of the agent. We also study training the summary-generation agent with the PPO algorithm, measuring performance by the total reward obtained.
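To make the comparison between the baseline REINFORCE update and PPO concrete, here is a minimal NumPy sketch of both losses for a frame-selection policy. It assumes, as in Zhou et al.'s setup, that the policy outputs a selection probability p_t per frame and that a summary is sampled as independent Bernoulli actions; it also treats the whole selection vector as a single action. The function names and the `clip_eps=0.2` default are illustrative assumptions, not taken from this repository's code.

```python
import numpy as np

def bernoulli_log_prob(probs, actions):
    """Log-probability of a binary frame-selection vector under independent Bernoulli(p_t)."""
    eps = 1e-8  # guards against log(0)
    return np.sum(actions * np.log(probs + eps) + (1 - actions) * np.log(1 - probs + eps))

def reinforce_loss(probs, actions, reward, baseline):
    """REINFORCE: -(R - b) * log pi(a). Minimizing this follows the policy gradient."""
    return -(reward - baseline) * bernoulli_log_prob(probs, actions)

def ppo_clip_loss(new_probs, old_probs, actions, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so that it can be minimized.

    Assumption: one ratio for the whole summary (joint probability of all frame actions).
    """
    ratio = np.exp(bernoulli_log_prob(new_probs, actions) - bernoulli_log_prob(old_probs, actions))
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    # Taking the minimum gives a pessimistic bound on the improvement.
    return -np.minimum(unclipped, clipped)
```

When the new and old policies agree, the ratio is 1 and the PPO loss reduces to -advantage. Clipping keeps a single update from moving the selection probabilities too far from the policy that sampled the summary, which is the usual motivation for trying PPO over plain REINFORCE.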

Authors: Rohith, Dibyajit

Run experiments

Use the shell scripts to run each experiment. Each script runs the experiment with 5 different seeds and stores the results in a log file. For example:

./train_resnet50_ppo_augs.sh

Augmentations used

Gaussian Blur
Cutout
Cutout Color
Rotate
Flip
Center Crop
Grayscale
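A minimal NumPy sketch of what these transforms do to a single video frame (an H x W x C float array). The parameter choices (blur sigma, cutout size, rotation restricted to multiples of 90 degrees, horizontal flip, BT.601 grayscale weights) and the function names are assumptions for illustration; the repository's own augmentation code may use different settings or libraries.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur on an H x W x C array, with edge padding."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = img.astype(np.float64)
    for axis in (0, 1):  # blur along rows, then along columns
        pad = [(radius, radius) if a == axis else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return out

def cutout(img, size, rng):
    """Zero out a random size x size square."""
    h, w = img.shape[:2]
    y, x = rng.integers(0, h - size + 1), rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = 0
    return out

def cutout_color(img, size, rng):
    """Fill a random size x size square with a single random colour."""
    h, w, c = img.shape
    y, x = rng.integers(0, h - size + 1), rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = rng.uniform(0, 255, size=c)
    return out

def rotate(img, k=1):
    """Rotate by k * 90 degrees (arbitrary angles would need interpolation)."""
    return np.rot90(img, k=k, axes=(0, 1))

def flip(img):
    """Horizontal flip."""
    return img[:, ::-1]

def center_crop(img, out_h, out_w):
    """Keep the central out_h x out_w region."""
    h, w = img.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]

def grayscale(img):
    """BT.601 luma, repeated to 3 channels so the CNN input shape is unchanged."""
    luma = img[..., :3] @ np.array([0.299, 0.587, 0.114])
    return np.repeat(luma[..., None], 3, axis=-1)
```

Keeping grayscale output at 3 channels lets the same pretrained ResNet encoder consume augmented and original frames without changing its first layer.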

Dataset and training methodology

We train our model architectures on the TVSum dataset using cross-validation. Following Zhou et al., we perform 5-fold cross-validation and report the average out-of-fold F1 score. We also evaluate our models on the SumMe dataset to test how well they generalize. Note that SumMe is out-of-distribution data: our models are never trained on it.
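The evaluation loop can be sketched as follows, assuming the keyshot F1 protocol commonly used with Zhou et al.'s code: a predicted binary frame-level summary is compared with each user summary, and the per-user scores are averaged for TVSum and maximized for SumMe. The fold construction here (disjoint random folds) is an assumption; the fold assignments behind the reported numbers may differ, and the step that turns frame scores into a binary summary (typically shot segmentation plus a knapsack under a length budget) is omitted. `keyshot_f1` and `k_fold_splits` are hypothetical names.

```python
import numpy as np

def keyshot_f1(pred, user_summaries, mode="avg"):
    """F1 between a binary frame-level summary and each user summary.

    mode="avg" averages over users (TVSum convention); "max" keeps the best (SumMe convention).
    """
    pred = np.asarray(pred, dtype=bool)
    scores = []
    for gt in np.asarray(user_summaries, dtype=bool):
        overlap = np.logical_and(pred, gt).sum()
        if overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / pred.sum()
        recall = overlap / gt.sum()
        scores.append(2 * precision * recall / (precision + recall))
    return float(np.mean(scores)) if mode == "avg" else float(np.max(scores))

def k_fold_splits(n_videos, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose k test folds cover every video exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_videos), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

The out-of-fold score is then the mean, over the 5 folds, of the average `keyshot_f1` on each held-out fold.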

Results

We use an encoder-decoder model. The decoder architecture is fixed across all experiments (a bidirectional LSTM). We report the F1 score for different encoder CNN architectures and RL algorithms; note that the agent is not explicitly trained to optimize this metric.
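Since the agent optimizes a reward rather than F1, it may help to see what that reward looks like. Below is a minimal NumPy sketch of the diversity-representativeness reward described in Zhou et al.'s Deep RL for Unsupervised Video Summarization, assuming the frame features are the CNN encoder outputs and that the reward is the plain sum R_div + R_rep. The original implementation also ignores temporally distant frame pairs in the diversity term, which this sketch omits; whether this repository changes the reward is not stated here, so treat it as an approximation with hypothetical function names.

```python
import numpy as np

def diversity_reward(features, picks):
    """Mean pairwise dissimilarity (1 - cosine similarity) among the selected frames."""
    sel = features[picks]
    if len(sel) < 2:
        return 0.0
    normed = sel / np.linalg.norm(sel, axis=1, keepdims=True)
    dissim = 1.0 - normed @ normed.T
    n = len(sel)
    # Exclude the diagonal (each frame compared with itself).
    return float((dissim.sum() - np.trace(dissim)) / (n * (n - 1)))

def representativeness_reward(features, picks):
    """exp(-mean over all frames of the Euclidean distance to the nearest selected frame)."""
    if len(picks) == 0:
        return 0.0
    sel = features[picks]
    dists = np.linalg.norm(features[:, None, :] - sel[None, :, :], axis=-1)  # shape (T, |Y|)
    return float(np.exp(-dists.min(axis=1).mean()))

def summary_reward(features, actions):
    """Total reward for a binary frame-selection vector."""
    picks = np.flatnonzero(actions)
    return diversity_reward(features, picks) + representativeness_reward(features, picks)
```

A summary scores well when its frames differ from one another and every frame in the video lies close to some selected frame; nothing in this reward refers to the human summaries that F1 is computed against.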

TVSum Dataset

Encoder CNN	RL algorithm	Trained with augmented data	F1 score (mean ± std)
ResNet50	REINFORCE	False	0.57 ± 0.003
ResNet50	PPO	False	0.5736 ± 0.004
ResNet50	PPO	True	0.5756 ± 0.002
ResNet101	REINFORCE	False	0.5695 ± 0.004
ResNet101	PPO	False	0.5712 ± 0.004
ResNet101	PPO	True	0.5741 ± 0.003

SumMe Dataset

Note that the RL agent is not trained on this dataset, so SumMe is an out-of-distribution (OOD) dataset for these models, which accounts for the low F1 scores.

Encoder CNN	RL algorithm	Trained with augmented data	F1 score
ResNet50	PPO	False	0.1780
ResNet50	PPO	True	0.1887
ResNet101	PPO	False	0.1817
ResNet101	PPO	True	0.1830

grohith327/Video-Summarization-Using-RL
