National Action Council for Minorities in Engineering (NACME) Google Applied Machine Learning Intensive (AMLI) at the University of Arkansas
Developed by:
- N'kira Brooks - New York University
- Lizbet Rivera - University of Arkansas
- Steve Liang - University of Arkansas
This project identifies and labels 27 human hand gestures and connects them to keys on a keyboard. The model was originally designed to control videos and presentations with simple hand gestures; however, it can be modified to map the gestures to different actions.
- Make sure your computer has a GPU; otherwise the project will not run successfully.
- You need a webcam to capture the gestures that the model recognizes.
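As a rough illustration of how gestures could be tied to keys, and how that binding could be changed, here is a minimal sketch. The table `GESTURE_TO_KEY`, the key names, and both helper functions are hypothetical and are not taken from the project code; only the gesture labels come from the list in this README.

```python
# Hypothetical gesture-to-key table. The gesture labels appear in this README;
# the key names on the right are illustrative assumptions.
GESTURE_TO_KEY = {
    "Swiping Left": "left",
    "Swiping Right": "right",
    "Stop Sign": "space",
    "Thumb Up": "up",
    "Thumb Down": "down",
}


def key_for_gesture(gesture, mapping=GESTURE_TO_KEY):
    """Return the key bound to a gesture, or None if the gesture is unbound."""
    return mapping.get(gesture)


def remap(mapping, gesture, key):
    """Return a copy of the mapping with one gesture bound to a new key."""
    updated = dict(mapping)
    updated[gesture] = key
    return updated
```

The returned key name would then be handed to whatever keyboard-automation tool you prefer. Keeping the table separate from the lookup makes it easy to swap in a different set of actions.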
@inproceedings{lin2019tsm,
title={TSM: Temporal Shift Module for Efficient Video Understanding},
author={Lin, Ji and Gan, Chuang and Han, Song},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
year={2019}
}
See the [full video] of our demo on NVIDIA Jetson Nano.
[NEW!] We have updated the environment setup to use onnx-simplifier, which makes deployment easier. Thanks to @poincarelee for the advice!
We show how to deploy an online hand gesture recognition system on NVIDIA Jetson Nano. The model is based on a MobileNetV2 backbone with a Temporal Shift Module (TSM) to model temporal relationships. It is compiled with TVM [1] for acceleration.
The model achieves real-time recognition. Excluding data I/O time, it runs at more than 70 FPS on the Nano GPU.
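To check a throughput figure like this on your own board, one option is to time only the inference call and leave camera reads out of the loop. The helper below is a minimal sketch; `measure_fps` and its parameters are hypothetical names, and `step` stands for whatever callable runs one forward pass in your setup.

```python
import time


def measure_fps(step, n_iters=100, warmup=10):
    """Call `step` repeatedly and return the average calls per second.

    Warm-up iterations are excluded so one-time setup costs do not
    distort the result.
    """
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

If the forward pass is launched asynchronously on the GPU, `step` should also wait for the result (for example by copying the output back to the host); otherwise the measured rate will be too optimistic.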
[1] Tianqi Chen et al., TVM: An automated end-to-end optimizing compiler for deep learning, in OSDI 2018
We used an online version of Temporal Shift Module in this demo. The model design is shown below:
Once compiled with TVM, our model runs efficiently on low-power devices.
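The idea behind the online shift can be sketched in plain Python: for each incoming frame, a small slice of the feature channels is replaced with the slice cached from the previous frame, and the current slice is cached for the next one. This is a simplified illustration on a flat list of channel values, not the project's implementation; the function name and the `fold_div=8` default (one eighth of the channels shifted) are assumptions.

```python
def online_shift(features, cache, fold_div=8):
    """Uni-directional temporal shift for a single frame.

    features: per-channel values for the current frame (a list).
    cache: channels saved from the previous frame; its length must be
           len(features) // fold_div.
    Returns (shifted_features, new_cache).
    """
    fold = len(features) // fold_div
    if len(cache) != fold:
        raise ValueError("cache must hold len(features) // fold_div channels")
    # Past information replaces the first `fold` channels of this frame.
    shifted = list(cache) + list(features[fold:])
    # The replaced channels are kept for the next frame.
    new_cache = list(features[:fold])
    return shifted, new_cache
```

Because each frame only needs the cache from the frame before it, the model can process a camera stream one frame at a time without buffering a whole clip.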
We show how to set up the environment on Jetson Nano, compile the PyTorch model with TVM, and perform the online demo from camera streaming.
- Get an NVIDIA Jetson Nano board (it is only $99!).
- Get a micro SD card and burn the Nano system image into it following here. Insert the card and boot the Nano. Note: you may want to get a power adaptor for a stable power supply.
- Check whether OpenCV 4.X is installed (it is included in the SD card image from r32.3.1 onward)
$ python3
>>> import cv2
>>> cv2.__version__
It should show 4.X. If not, build OpenCV 4.0.0 using this script so that camera access is enabled (this may take a while because of the Nano's weak CPU). You also need to add the cv2 package to the Python import search path.
export PYTHONPATH=/usr/local/python
- Follow here to install PyTorch and torchvision.
- Build TVM with the following commands
sudo apt install llvm # install llvm which is required by tvm
git clone -b v0.6 https://github.com/apache/incubator-tvm.git
cd incubator-tvm
git submodule update --init
mkdir build
cp cmake/config.cmake build/
cd build
#[
#edit config.cmake to change
# 32 line: USE_CUDA OFF -> USE_CUDA ON
#104 line: USE_LLVM OFF -> USE_LLVM ON
#]
cmake ..
make -j4
cd ..
cd python; sudo python3 setup.py install; cd ..
cd topi/python; sudo python3 setup.py install; cd ../..
- Install ONNX
# install onnx
sudo apt-get install protobuf-compiler libprotoc-dev
pip3 install onnx
- Install onnx-simplifier
git clone https://github.com/daquexian/onnx-simplifier
cd onnx-simplifier
# remove requirement 'onnxruntime >= 1.2.0' in setup.py, as it is not actually used
pip install .
cd ..
- Export the CUDA toolkit binaries to PATH
export PATH=$PATH:/usr/local/cuda/bin
- Finally, run the demo. The first run compiles the PyTorch TSM model into a TVM binary and then runs it. Later runs execute the compiled TVM model directly.
python3 main.py
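The compile-on-first-run behavior described in the last step follows a common caching pattern: build the artifact if it is missing, otherwise load it from disk. The sketch below shows that pattern in general terms. `load_or_compile`, `compile_fn`, and `load_fn` are hypothetical names and do not reflect how `main.py` is actually organized.

```python
import os


def load_or_compile(cache_path, compile_fn, load_fn):
    """Return (artifact, compiled_now).

    If `cache_path` exists, load the artifact from it. Otherwise call
    `compile_fn(cache_path)`, which is expected to build the artifact,
    write it to `cache_path`, and return it.
    """
    if os.path.exists(cache_path):
        return load_fn(cache_path), False
    artifact = compile_fn(cache_path)
    return artifact, True
```

With this pattern, deleting the cached file is enough to force a fresh compile, for example after changing the model.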
Press Q or Esc to quit. Press F to enter/exit full-screen.
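For readers adapting the demo loop, the key handling above can be sketched as a small pure function. This is an assumption about structure rather than the code in `main.py`; `handle_key` is a hypothetical name, and the key code is expected to be an integer such as the value of `cv2.waitKey(1) & 0xFF`.

```python
ESC = 27  # ASCII code of the Escape key


def handle_key(key, fullscreen):
    """Return (should_quit, fullscreen) after one key press.

    key: integer key code for the pressed key.
    fullscreen: current full-screen state (bool).
    """
    if key in (ESC, ord("q"), ord("Q")):
        return True, fullscreen
    if key in (ord("f"), ord("F")):
        return False, not fullscreen
    # Any other key leaves the state unchanged.
    return False, fullscreen
```

Keeping the decision separate from the window calls makes the behavior easy to test without a camera or display attached.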
- No gesture
- Stop Sign
- Drumming Fingers
- Thumb Up
- Thumb Down
- Zooming In With Full Hand
- Zooming In With Two Fingers
- Zooming Out With Full Hand
- Zooming Out With Two Fingers
- Swiping Down
- Swiping Left
- Swiping Right
- Swiping Up
- Sliding Two Fingers Down
- Sliding Two Fingers Left
- Sliding Two Fingers Right
- Sliding Two Fingers Up
- Pulling Hand In
- Pulling Two Fingers In
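A classifier typically outputs one score per class, and the label is looked up by the index of the highest score. The sketch below uses the gestures exactly as listed above and assumes the model's class order matches the list order, which this README does not confirm; the label order of the trained model is authoritative. `GESTURES` and `label_for` are hypothetical names.

```python
# Labels copied from the list above, in the same order (order is an assumption).
GESTURES = [
    "No gesture", "Stop Sign", "Drumming Fingers", "Thumb Up", "Thumb Down",
    "Zooming In With Full Hand", "Zooming In With Two Fingers",
    "Zooming Out With Full Hand", "Zooming Out With Two Fingers",
    "Swiping Down", "Swiping Left", "Swiping Right", "Swiping Up",
    "Sliding Two Fingers Down", "Sliding Two Fingers Left",
    "Sliding Two Fingers Right", "Sliding Two Fingers Up",
    "Pulling Hand In", "Pulling Two Fingers In",
]


def label_for(scores, labels=GESTURES):
    """Return the label whose score is highest."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]
```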
Because this project is built on the repository from the MIT lab, please direct any questions to the following people:
Ji Lin, jilin@mit.edu
Yaoyao Ding, yyding@mit.edu