CV-ZMH/human-action-recognition

Simple Real Time Multi Person Action Recognition

News

💥 Added a TensorRT conversion script for the reid models.

💥 Added reid models trained on the Mars and Market1501 datasets.

💥 Added trained weights for the siamese networks and a training script for the reid model. They are used for cosine metric learning in the DeepSORT pipeline.

💥 Added the debug_track flag to the demo.py script for visualizing tracker bboxes and keypoint bboxes, so you can see how the tracking algorithm works.

Pretrained actions, total 9 classes : ['stand', 'walk', 'run', 'jump', 'sit', 'squat', 'kick', 'punch', 'wave']
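
As an illustration of how these labels might be used, the sketch below maps a vector of classifier scores to a class name. The helper name `scores_to_label` is hypothetical, and the assumption that the classifier's output index follows the order of the list above is not confirmed by this README.

```python
# Class names copied from the list above; the index order is an assumption.
ACTIONS = ['stand', 'walk', 'run', 'jump', 'sit', 'squat', 'kick', 'punch', 'wave']


def scores_to_label(scores):
    """Return the action name with the highest score (hypothetical helper)."""
    if len(scores) != len(ACTIONS):
        raise ValueError("expected one score per action class")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return ACTIONS[best_index]


# Made-up scores, one per class, with the highest score in the 'walk' slot.
print(scores_to_label([0.05, 0.70, 0.10, 0.02, 0.03, 0.02, 0.03, 0.03, 0.02]))
```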

Fight scene demo Fight scene debug demo
Street scene demo Street scene debug demo
Street walk demo Street walk debug demo

Overview

This is a three-step multi-person action recognition pipeline. It runs in real time, reaching 33 FPS with TensorRT for the whole action recognition pipeline on a single-person video (see Inference Speed). The steps are:

  1. pose estimation with trtpose
  2. people tracking with deepsort
  3. action classifier with dnn
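
The three steps above can be sketched as a per-frame loop. This is only an outline of the control flow under stated assumptions: `run_pipeline` and the three `fake_*` callables are hypothetical stand-ins, not the repo's trtpose, deepsort, or dnn code.

```python
def run_pipeline(frames, estimate_poses, update_tracks, classify_action):
    """Run pose estimation, then tracking, then action classification per frame.

    The three callables are hypothetical stand-ins for the real models.
    """
    results = []
    for frame in frames:
        poses = estimate_poses(frame)          # step 1: keypoints for each person
        tracks = update_tracks(frame, poses)   # step 2: {track_id: keypoints}
        # step 3: one action label per tracked person
        actions = {tid: classify_action(kps) for tid, kps in tracks.items()}
        results.append(actions)
    return results


# Toy stand-ins so the sketch runs without any model or video.
def fake_pose(frame):
    return [[(0.0, 0.0)] * 3 for _ in range(frame["num_people"])]


def fake_tracker(frame, poses):
    return {i + 1: kps for i, kps in enumerate(poses)}


def fake_classifier(keypoints):
    return "stand"


print(run_pipeline([{"num_people": 2}], fake_pose, fake_tracker, fake_classifier))
```

Keeping each stage behind a plain callable mirrors the `task` modes described later (pose only, pose plus tracking, or the full pipeline), since later stages only consume the output of earlier ones.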

Overview of Action Recognition Pipeline

The action classifier and its dataset are taken from this repo.

Inference Speed

Tested PC specification

  • OS: Ubuntu 18.04
  • CPU: Ryzen 5 3600 @3.766GHz
  • GPU: RTX 2060
  • CUDA: 10.2
  • TensorRT: 7.1.3.4

❗ The table below is based on a single-person video. Results may vary for multi-person videos.

| Pipeline Step | Model | Model Input Size per Step (H, W) | PyTorch FPS | TensorRT FPS |
|---|---|---|---|---|
| Pose Estimation | densenet121 | (256x256) | 25 fps | 38 fps |
| Pose Estimation + Tracking | densenet121 + deepsort siamese reid | (256x256) + (256x128) | 22 fps | 34 fps |
| Pose Estimation + Tracking | densenet121 + deepsort wideresnet reid | (256x256) + (256x128) | 22 fps | 31 fps |
| Pose Estimation + Tracking + Action | densenet121 + deepsort siamese reid + dnn | (256x256) + (256x128) + (--) | 21 fps | 33 fps |
| Pose Estimation + Tracking + Action | densenet121 + deepsort wideresnet reid + dnn | (256x256) + (256x128) + (--) | 21 fps | 30 fps |
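
To make the FPS figures easier to compare, the short sketch below converts them to per-frame time and computes the TensorRT speedup over PyTorch. The numbers are copied from the table; the helper names are illustrative only.

```python
# (PyTorch FPS, TensorRT FPS) copied from the table above.
FPS = {
    "pose": (25, 38),
    "pose + tracking (siamese reid)": (22, 34),
    "pose + tracking (wideresnet reid)": (22, 31),
    "pose + tracking + action (siamese reid)": (21, 33),
    "pose + tracking + action (wideresnet reid)": (21, 30),
}


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def speedup(pytorch_fps, tensorrt_fps):
    """How many times faster the TensorRT run is than the PyTorch run."""
    return tensorrt_fps / pytorch_fps


for name, (pt, trt) in FPS.items():
    print(f"{name}: {frame_time_ms(pt):.1f} ms -> {frame_time_ms(trt):.1f} ms "
          f"({speedup(pt, trt):.2f}x)")
```

For the full pipeline with the siamese reid model, 21 fps is about 47.6 ms per frame and 33 fps is about 30.3 ms per frame, a speedup of roughly 1.57x.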

Installation

Requires Python >= 3.6.

Step 1 - Install Dependencies

Check this installation guide for installing the deep learning packages.

The following packages are required for this project; install each of them.

  1. NVIDIA driver 450
  2. CUDA 10.2 and cuDNN 8.0.5
  3. PyTorch 1.7.1 and torchvision 0.8.2
  4. TensorRT 7.1.3
  5. ONNX 1.9.0
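
After installing, a quick way to see which versions Python actually picks up is a small check like the one below. It is a sketch, not part of the repo: the function names are hypothetical, the expected versions are copied from the list above, and it assumes each package exposes a `__version__` attribute under its usual import name.

```python
import importlib
import sys

# Versions copied from the list above, keyed by the usual import name.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2", "tensorrt": "7.1.3", "onnx": "1.9.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED):
    """Map each package to (found_version, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # startswith tolerates build suffixes such as "1.7.1+cu102" or "7.1.3.4".
        report[name] = (found, found is not None and found.startswith(wanted))
    return report


if __name__ == "__main__":
    print("python >= 3.6:", sys.version_info >= (3, 6))
    for name, (found, ok) in check_environment().items():
        print(f"{name}: found {found}, expected {EXPECTED[name]}, ok={ok}")
```

The driver, CUDA, and cuDNN versions are not covered here; they have to be checked with the NVIDIA tools.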

Step 2 - Install torch2trt

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
sudo python3 setup.py install --plugins

Step 3 - Install trt_pose

git clone https://github.com/NVIDIA-AI-IOT/trt_pose
cd trt_pose
sudo python setup.py install

The remaining Python packages are listed in requirements.txt. Run the command below to install them.

pip install -r requirements.txt

Run Quick Demo

Step 1 - Download the Pretrained Models

Pretrained action classifier models are already included in weights/classifier/dnn.

  • Download the pretrained weight files to run the demo.

| Model Type | Name | Trained Dataset | Weight |
|---|---|---|---|
| Pose Estimation | trtpose | COCO | densenet121 |
| Tracking | deepsort reid | Market1501 | wide_resnet |
| Tracking | deepsort reid | Market1501 | siamese_net |
| Tracking | deepsort reid | Mars | wide_resnet |
| Tracking | deepsort reid | Mars | siamese_net |

  • Then put them in these folders:
    • deepsort weights in weights/tracker/deepsort/
    • trt_pose weights in weights/pose_estimation/trtpose
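
Before running the demo, it can help to confirm that the weight folders named above exist and are not empty. The helper below is a hypothetical sketch: the folder names come from this README, but the function is not part of the repo and it does not check individual file names.

```python
import os

# Folder names copied from the instructions above, relative to the repository root.
WEIGHT_DIRS = [
    "weights/classifier/dnn",
    "weights/tracker/deepsort",
    "weights/pose_estimation/trtpose",
]


def missing_weight_dirs(repo_root="."):
    """Return the expected weight folders that are missing or empty."""
    missing = []
    for rel in WEIGHT_DIRS:
        path = os.path.join(repo_root, rel)
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_weight_dirs("."):
        print("missing or empty:", rel)
```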

Step 2 - TensorRT Conversion (Optional)

If TensorRT is not installed on your system, skip this step. In that case, just set the corresponding model paths in this config file to the PyTorch model weights.

Convert trtpose model

# check the input/output weight file paths in configs/infer_trtpose_deepsort_dnn.yaml
cd export_models
python convert_trtpose.py --config ../configs/infer_trtpose_deepsort_dnn.yaml

‼️ The original densenet121_trtpose model was trained with an input size of 256. To convert a TensorRT model with a larger input size (such as 512), change the size parameter in the configs/infer_trtpose_deepsort_dnn.yaml file.

Convert the DeepSORT reid model: PyTorch >> ONNX >> TensorRT

cd export_models
#1. torch to onnx
python convert_reid2onnx.py \
--model_path <your reid model path> \
--reid_name <siamesenet/wideresnet> \
--dataset_name <market1501/mars> \
--check

#2. onnx to tensorRT
python convert_reid2trt.py \
--onnx_path <your onnx model path> \
--mode fp16 \
--max_batch 100

#3. check your converted tensorrt model against the pytorch model
python test_trt_inference.py \
--trt_model_path <your tensorrt model path> \
--torch_model_path <your pytorch model path> \
--reid_name <siamesenet/wideresnet> \
--dataset_name <market1501/mars>
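
The kind of comparison such a check could make is sketched below: given the embedding produced by the PyTorch model and the one produced by the TensorRT engine for the same input, measure how far apart they are. This is not the code of test_trt_inference.py; the function names and the tolerance `atol` are assumptions chosen for illustration.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def outputs_match(torch_out, trt_out, atol=1e-2):
    """True if the two outputs agree within atol (an assumed tolerance)."""
    if len(torch_out) != len(trt_out):
        raise ValueError("outputs must have the same length")
    return max_abs_diff(torch_out, trt_out) <= atol


# Made-up embeddings standing in for real model outputs.
torch_embedding = [0.120, -0.400, 0.880]
trt_embedding = [0.121, -0.398, 0.879]
print(outputs_match(torch_embedding, trt_embedding))
print(round(cosine_similarity(torch_embedding, trt_embedding), 4))
```

An fp16 engine generally cannot reproduce fp32 outputs exactly, so a small tolerance rather than exact equality is the sensible criterion.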

Step 3 - Run demo.py

Arguments of demo.py

  • task [pose, track, action] : inference mode for testing pose estimation, tracking, or action recognition.
  • config : inference config file path. (default=../configs/inference_config.yaml)
  • source : path of the video file to run action prediction or tracking on. If not provided, the webcam is used as the default source.
  • save_folder : folder path for saving the result video. The output filename is composed of "{source video name/webcam}{pose network name}{deepsort}{reid network name}{action classifier name}.avi". If not provided, the result video is not saved.
  • draw_kp_numbers : flag to draw the keypoint numbers of each person for visualization.
  • debug_track : flag to debug tracking by visualizing the tracker's internal bbox state alongside the currently detected bboxes.
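
The argument list above could be expressed with argparse roughly as follows. This is a sketch that mirrors the descriptions in this README, not the actual parser in demo.py: making `--task` required and using `None` as the default for `--source` and `--save_folder` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the demo.py arguments")
    # Whether demo.py defines a default task is not stated, so it is required here.
    parser.add_argument("--task", choices=["pose", "track", "action"], required=True,
                        help="inference mode")
    parser.add_argument("--config", default="../configs/inference_config.yaml",
                        help="inference config file path")
    parser.add_argument("--source", default=None,
                        help="video file path; webcam is used when omitted")
    parser.add_argument("--save_folder", default=None,
                        help="folder for the result video; nothing is saved when omitted")
    parser.add_argument("--draw_kp_numbers", action="store_true",
                        help="draw keypoint numbers for each person")
    parser.add_argument("--debug_track", action="store_true",
                        help="visualize tracker bboxes and detected bboxes")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--task", "action", "--debug_track"])
print(args.task, args.source, args.save_folder, args.debug_track)
```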

‼️ Before running demo.py, you need to change some parameters in the configs/infer_trtpose_deepsort_dnn.yaml file.

Examples:

Run action recognition.

cd src
# for video, point the --source flag to your video path
python demo.py --task action --source ../test_data/fun_theory.mp4 --save_folder ../output --debug_track
# for webcam, no need to provide the --source flag
python demo.py --task action --save_folder ../output --debug_track

Run pose tracking.

# for video, point the --source flag to your video path
python demo.py --task track --source ../test_data/fun_theory.mp4 --save_folder ../output
# for webcam, no need to provide the --source flag
python demo.py --task track --save_folder ../output

Run pose estimation only.

# for video, point the --source flag to your video path
python demo.py --task pose --source ../test_data/fun_theory.mp4 --save_folder ../output
# for webcam, no need to provide the --source flag
python demo.py --task pose --save_folder ../output

Training

Train Action Classifier Model

cd src && bash ./train_trtpose_dnn_action.sh

Train reID Model for DeepSort Tracking

To train a different reid network for the cosine metric learning used in DeepSORT:

  • Download the Mars reid dataset.
  • Prepare the Mars dataset with the command below. It splits train/val from the Mars bbox-train folder and calculates the mean and std over the train set. Use this mean and std for dataset normalization.
cd src && python prepare_mars.py --root <your dataset root> --train_percent 0.8 --bs 256
  • Modify tune_params to run a hyperparameter search over multiple runs as needed.

  • Then run the command below to train the reid network.

cd src && python train_reid.py --config ../configs/train_reid.yaml
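
The dataset preparation described above (an 80/20 train/val split followed by mean and std for normalization) can be illustrated on toy data. This is a sketch of the idea only, not the code of prepare_mars.py: it splits a flat list rather than the Mars folder layout, works on a handful of RGB pixels scaled to [0, 1], and all function names are hypothetical.

```python
import math
import random


def split_train_val(items, train_percent=0.8, seed=0):
    """Shuffle items reproducibly and split them into train and val lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_percent)
    return items[:n_train], items[n_train:]


def channel_mean_std(pixels):
    """Per-channel mean and population std of (r, g, b) tuples."""
    pixels = list(pixels)
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = [math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / n)
            for c in range(3)]
    return means, stds


def normalize(pixel, means, stds):
    """Standardize one pixel with the train-set statistics (stds must be non-zero)."""
    return tuple((pixel[c] - means[c]) / stds[c] for c in range(3))


train, val = split_train_val(range(10), train_percent=0.8)
print(len(train), len(val))

# Statistics are computed on the train set only, then reused for val and inference.
means, stds = channel_mean_std([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
print(means, stds, normalize((1.0, 1.0, 1.0), means, stds))
```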

TODO

  • Add different reid network used in DeepSort
  • Add tensorrt for reid model
  • Add more pose estimation models
  • Add more tracking methods
  • Add more action recognition models