PyTorch implementation for the paper "Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos". In this work, different 'Spatiotemporal Modeling Blocks' are analyzed for the architecture illustrated below.
Maintainers: Okan Köpüklü and Fabian Herzog
The structure was inspired by the project TRN-pytorch
The pretrained models can be found in our Google Drive.
Clone the repo with the following command:
git clone git@github.com:fubel/stmodeling.git
The project requirements can be found in the file requirements.txt. To run the code, create a Python >= 3.6 virtual environment and install the requirements with
pip install -r requirements.txt
NOTE: This project assumes that you have a GPU with CUDA support.
Download the Jester dataset or the Something-Something-V2 dataset. Decompress them into the same folder and use process_dataset.py to generate the index files for the train, val, and test splits. Properly set up the train, validation, and category meta files in datasets_video.py.
To convert the something-something-v2 dataset, you can use the extract_frames.py
from TRN-pytorch.
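The idea behind frame extraction is simply to dump each video into a directory of numbered jpg files. A minimal sketch of that step, assuming ffmpeg is available on PATH; the function names and output pattern here are illustrative, not the actual API of extract_frames.py:

```python
import os
import subprocess

def ffmpeg_cmd(video_path, out_dir, fps=12):
    """Build the ffmpeg command that dumps a video into numbered jpg frames."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",               # sample at a fixed frame rate
        os.path.join(out_dir, "%05d.jpg"), # 00001.jpg, 00002.jpg, ...
    ]

def extract_frames(video_path, out_dir, fps=12):
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(ffmpeg_cmd(video_path, out_dir, fps), check=True)
```

The resulting per-video frame directories are what the rgb/ folders below are expected to contain.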
Assume the structure of data directories is the following:
~/stmodeling/
datasets/
jester/
rgb/
.../ (directories of video samples for Jester)
.../ (jpg color frames)
something/
rgb/
.../ (directories of video samples for Something-Something)
model/
.../(saved models for the last checkpoint and best model)
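Before launching training, it can be worth checking that this layout is in place. A small sanity-check sketch using only the paths from this README (adjust ROOT if your checkout lives elsewhere):

```python
import os

ROOT = os.path.expanduser("~/stmodeling")
EXPECTED = [
    "datasets/jester/rgb",
    "datasets/something/rgb",
    "model",
]

def missing_dirs(root=ROOT, expected=EXPECTED):
    """Return the expected sub-directories that do not exist yet."""
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]

# e.g. print(missing_dirs()) should be [] once the datasets are in place
```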
Currently the following ST Modeling blocks are implemented:
- MLP
- TRNmultiscale
- RNN_TANH
- RNN_RELU
- LSTM
- GRU
- BLSTM
- FCN
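All of these blocks consume one feature vector per segment from the backbone and fuse them into class scores. As an illustration of the simplest variant, here is a NumPy sketch of an MLP-style block; the repository's actual modules are PyTorch nn.Modules whose layer sizes may differ:

```python
import numpy as np

def mlp_block(segment_features, W1, b1, W2, b2):
    """Fuse per-segment features with a 2-layer MLP.

    segment_features: (num_segments, feat_dim) array from the backbone.
    Concatenates the segments, applies a ReLU hidden layer, returns class scores.
    """
    x = segment_features.reshape(-1)   # (num_segments * feat_dim,)
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer + ReLU
    return W2 @ h + b2                 # (num_classes,) scores

# Toy shapes: 8 segments, 16-dim features, 32 hidden units, 27 Jester classes.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
W1, b1 = rng.standard_normal((32, 8 * 16)), np.zeros(32)
W2, b2 = rng.standard_normal((27, 32)), np.zeros(27)
scores = mlp_block(feats, W1, b1, W2, b2)  # shape (27,)
```

Recurrent variants (LSTM, GRU, BLSTM) instead process the segment features as a sequence and classify from the final hidden state.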
Furthermore, the following backbone feature extractors are implemented:
- squeezenet1_1
- BNInception
The following are some examples for training under different scenarios:
- Train 8-segment network for Jester with MLP and squeezenet backbone
python main.py jester RGB --arch squeezenet1_1 --num_segments 8 \
--consensus_type MLP --batch-size 16
- Train 16-segment network for Something-Something with TRN-multiscale and BNInception backbone
python main.py something RGB --arch BNInception --num_segments 16 \
--consensus_type TRNmultiscale --batch-size 16
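The --num_segments flag controls TSN-style sparse sampling: the video is split into equal chunks and one frame is taken per chunk. A minimal sketch of center sampling (at training time such pipelines typically draw a random offset within each chunk instead; the exact scheme here is illustrative):

```python
def sample_segment_indices(num_frames, num_segments=8):
    """Pick one frame index from the center of each of num_segments chunks."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

# An 80-frame clip sampled with 8 segments yields one frame per 10-frame chunk.
indices = sample_segment_indices(80, 8)
```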
If you use this code or the paper in your work, please cite:
@inproceedings{kopuklu2021comparative,
title={Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos},
author={K{\"o}p{\"u}kl{\"u}, Okan and Herzog, Fabian and Rigoll, Gerhard},
booktitle={International Conference on Pattern Recognition},
pages={186--202},
year={2021},
organization={Springer}
}
This project was built on top of TRN-pytorch, which itself was built on top of TSN-Pytorch. We thank the authors for sharing their code publicly.