A curated list of action recognition and related area (e.g. object recognition, pose estimation) resources, inspired by awesome-computer-vision.
- Action Tubelet Detector for Spatio-Temporal Action Localization - V. Kalogeiton et al, arXiv2017.
- Am I Done? Predicting Action Progress in Videos - F. Becattini et al, arXiv2017.
- Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection - M. Zolfaghari et al, arXiv2017.
- Generic Tubelet Proposals for Action Localization - J. He et al, arXiv2017.
- Incremental Tube Construction for Human Action Detection - H. S. Behl et al, arXiv2017.
- Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos - R. Hou et al, arXiv2017.
- Online Real time Multiple Spatiotemporal Action Localisation and Prediction - G. Singh et al, arXiv2016.
- Multi-region two-stream R-CNN for action detection - X. Peng and C. Schmid. ECCV2016. [code]
- Spot On: Action Localization from Pointly-Supervised Proposals - P. Mettes et al, ECCV2016.
- Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos - S. Saha et al, BMVC2016. [code] [project web]
- Learning to track for spatio-temporal action localization - P. Weinzaepfel et al. ICCV2015.
- Action detection by implicit intentional motion clustering - W. Chen and J. Corso, ICCV2015.
- Finding Action Tubes - G. Gkioxari and J. Malik CVPR2015. [code] [project web]
- APT: Action localization proposals from dense trajectories - J. Gemert et al, BMVC2015. [code]
- Spatio-Temporal Object Detection Proposals - D. Oneata et al, ECCV2014. [code] [project web]
- Action localization with tubelets from motion - M. Jain et al, CVPR2014.
- Spatiotemporal deformable part models for action detection - Y. Tian et al, CVPR2013. [code]
- Action localization in videos through context walk - K. Soomro et al, ICCV2015.
- Fast Action Proposals for Human Action Detection and Search - G. Yu and J. Yuan, CVPR2015. Note: code for FAP is NOT available online. Note: Aka FAP.
- SST: Single-Stream Temporal Action Proposals - S. Buch et al, CVPR2017.
- R-C3D: Region Convolutional 3D Network for Temporal Activity Detection - H. Xu et al, arXiv2017.
- DAPs: Deep Action Proposals for Action Understanding - V. Escorcia et al, ECCV2016. [code] [raw data]
- Online Action Detection using Joint Classification-Regression Recurrent Neural Networks - Y. Li et al, ECCV2016. Noe: RGB-D Action Detection
- Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs - Z. Shou et al, CVPR2016. [code] Note: Aka S-CNN.
- Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos - F. Heilbron et al, CVPR2016. [code] Note: Depends on C3D, aka SparseProp.
- Actionness Estimation Using Hybrid Fully Convolutional Networks - L. Wang et al, CVPR2016. [code] Note: The code is not a complete verision. It only contains a demo, not training. [project web]
- Learning Activity Progression in LSTMs for Activity Detection and Early Detection - S. Ma et al, CVPR2016.
- End-to-end Learning of Action Detection from Frame Glimpses in Videos - S. Yeung et al, CVPR2016. [code] [project web] Note: This method uses reinforcement learning
- Fast Action Proposals for Human Action Detection and Search - G. Yu and J. Yuan, CVPR2015. Note: code for FAP is NOT available online. Note: Aka FAP.
- Bag-of-fragments: Selecting and encoding video fragments for event detection and recounting - P. Mettes et al, ICMR2015.
- Action localization in videos through context walk - K. Soomro et al, ICCV2015.
- Deep Temporal Linear Encoding Networks - A. Diva et al, arXiv 2016.
- Temporal Convolutional Networks: A Unified Approach to Action Segmentation and Detection - C. Lea et al, CVPR 2017. [code]
- Long-term Temporal Convolutions - G. Varol et al, TPAMI2017. [project web] [code]
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition - L. Wang et al, arXiv 2016. [code]
- Dynamic Image Networks for Action Recognition - H. Bilen et al, CVPR2016. [code] [project web]
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description - J. Donahue et al, CVPR2015. [code] [project web]
- Describing Videos by Exploiting Temporal Structure - L. Yao et al, ICCV2015. [code] note: from the same group of RCN paper “Delving Deeper into Convolutional Networks for Learning Video Representations"
- Two-Stream SR-CNNs for Action Recognition in Videos - L. Wang et al, BMVC2016.
- Real-time Action Recognition with Enhanced Motion Vector CNNs - B. Zhang et al, CVPR2016. [code]
- Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors - L. Wang et al, CVPR2015. [code]
- Convolutional Two-Stream Network Fusion for Video Action Recognition - C. Feichtenhofer et al, CVPR2016. [code]
- Learning Spatiotemporal Features with 3D Convolutional Networks - D. Tran et al, ICCV2015. [the official Caffe code] [project web] Note: Aka C3D. [Python Wrapper] Note that the official caffe does not support python wrapper. [TensorFlow], [TensorFlow + Keras], [Another TensorFlow Implemetation], [Keras C3D Project web]: [Keras code], [Pretrained weights].
- CortexNet: a Generic Network Family for Robust Visual Temporal Representations A. Canziani and E. Culurciello - arXiv2017. [code] [project web]
- Slicing Convolutional Neural Network for Crowd Video Understanding - J. Shao et al, CVPR2016. [code]
- Two-Stream (RGB and Flow) pretrained model weights
- 20BN-JESTER, 20BN-SOMETHING-SOMETHING
- AVA
- ActivityNet Note: They provide a download script and evaluation code here .
- Charades
- THUMOS14 Note: It overlaps with UCF-101 dataset.
- THUMOS15 Note: It overlaps with UCF-101 dataset.
- HOLLYWOOD2: Spatio-Temporal annotations
- UCF-101, annotation provided by THUMOS-14, and corrupted annotation list, UCF-101 corrected annotations and different version annotaions. And there are also some pre-computed spatiotemporal action detection results
- UCF-50.
- UCF-Sports, note: the train/test split link in the official website is broken. Instead, you can download it from here.
- HMDB
- J-HMDB
- LIRIS-HARL
- KTH
- MSR Action Note: It overlaps with KTH datset.
- Sports-1M - Large scale action recognition dataset.
- YouTube-8M, technical report
- YouTube-BB, technical report
- Efficiently scaling up crowdsourced video annotation - C. Vondrick et. al, IJCV2013. [code]
- The Design and Implementation of ViPER - D. Mihalcik and D. Doermann, Technical report.
- Faster R-CNN - S. Ren et al, NIPS2015. [official MatCaffe code], [PyCaffe], [TensorFlow], [Keras] - State-of-the-art object detector.
- YOLO - J. Redmon et al, CVPR2016. [official code], [TensorFLow] - Fast object detector.
- YOLO9000 - J. Redmon and A. Farhadi, CVPR2017. [official code] - State-of-the-art object detector which can detect 9000 objects in realtime.
- SSD - W. Liu et al, ECCV2016. [official PyCaffe code], [TensorFlow], [Keras] - State-of-the-art object detector with realtime processing speed.
- Mask R-CNN - K. He et al, [TensorFlow], [PyTorch] - State-of-the-art object detection/instance segmentation algorithm.
- OpenPose Library - Caffe based realtime pose estimation library from CMU.
- Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields - Z. Cao et al, CVPR2017. [code] depends on the [caffe RT pose] - Earlier version of OpenPose from CMU.
License
To the extent possible under law, Jinwoo Choi has waived all copyright and related or neighboring rights to this work.
Please read the contribution guidelines. Then please feel free to send me pull requests or email (jinchoi@vt.edu) to add links.