The idea of this project is to turn any wall or surface into an interactive whiteboard using only an ordinary RGB camera and your hand. I hope you'll find it interesting!
- Jetson Xavier NX JetPack 4.4
- Raspberry Pi Camera + ArduCam (8MP IMX219 Sensor Module)
Note: The system also works on Jetson Nano and TX2.
To use the AI whiteboard correctly, find a wall or flat surface and place the camera at a distance of about 1 meter. It can be any wall or surface, but the system works more accurately on dark or light monotone walls/surfaces. The pipeline works as follows: we capture an image from the camera and crop it to a square. Then we run a hand detector [1] (YOLO [3], a deep neural network) to find a hand in the image. If there is a hand, we crop it out and feed it to a fingertip detector [1] (a modified VGG16 network). Finally, if fingertips are detected, we use their coordinates to control the whiteboard (see the control section below). A minimal sketch of this loop follows.
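Here is a rough sketch of that capture-detect-control loop. The detector wrappers, their module name, and `update_whiteboard` are hypothetical stand-ins; the real logic lives in ai_whiteboard.py.

```python
import cv2

# Hypothetical wrappers around the two networks described above; `detectors`
# and `update_whiteboard` are assumed names, not the project's actual API.
from detectors import HandDetector, FingertipDetector

hand_detector = HandDetector(threshold=0.8)            # YOLO hand detector
fingertip_detector = FingertipDetector(threshold=0.5)  # modified VGG16

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Crop the captured frame to a centered square.
    h, w = frame.shape[:2]
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    square = frame[y0:y0 + side, x0:x0 + side]

    # Stage 1: find a hand bounding box in the square image.
    box = hand_detector.detect(square)
    if box is None:
        continue

    # Stage 2: crop the hand region and locate fingertips inside it.
    x1, y1, x2, y2 = box
    fingertips = fingertip_detector.detect(square[y1:y2, x1:x2])

    # Stage 3: map fingertip coordinates to whiteboard actions.
    if fingertips is not None:
        update_whiteboard(fingertips)
```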
- Jetson Xavier NX with JetPack 4.4 (CUDA 10.2, TensorRT 7.1.3, cuDNN 8.0)
- Install TensorFlow 1.15.3
2. Download the AI Whiteboard project:
$ git clone https://github.com/preste-ai/camera_ai_whiteboard.git
3. Install the needed packages via pip using the requirements.txt file:
pip3 install -r requirements.txt
4. Download weights or TensorRT engines and put them into weights/ or weights/engines/ respectively.
Note: The current TensorRT engines work correctly only on Jetson Xavier NX devices, since TensorRT runs device-specific profiling during the optimization phase. If you want to use these models (engines) on other Jetson devices, convert the .h5 models with the h5_to_trt.py script on your platform.
Check the config.py file and set the needed parameters (a sketch of the file follows the list below):
- whiteboard_w : 200 - whiteboard width (px) (displayed on the captured camera image)
- whiteboard_h : 200 - whiteboard height (px) (displayed on the captured camera image)
- cam_w : 320 - width (px) of a captured image
- cam_h : 240 - height (px) of a captured image
- framerate : 60 - camera capture framerate (for Raspberry Pi Camera)
- zoom_koef : 2 - zoom coefficient to resize whiteboard_w and whiteboard_h
- confidence_ft_threshold : 0.5 - confidence threshold of Fingertips detector
- confidence_hd_threshold : 0.8 - confidence threshold of Hand detector
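Put together, a config.py with the defaults above might look like this (a sketch; only the parameter names and values come from the list):

```python
# config.py -- sketch of the parameters described above

whiteboard_w = 200   # whiteboard width (px), drawn over the captured image
whiteboard_h = 200   # whiteboard height (px), drawn over the captured image
cam_w = 320          # captured image width (px)
cam_h = 240          # captured image height (px)
framerate = 60       # capture framerate (Raspberry Pi Camera)
zoom_koef = 2        # zoom coefficient used to resize whiteboard_w / whiteboard_h

confidence_ft_threshold = 0.5  # fingertip detector confidence threshold
confidence_hd_threshold = 0.8  # hand detector confidence threshold
```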
Run from a project root directory:
Jetson Devices
python3 ai_whiteboard.py --rpc --trt
- rpc : use a Raspberry Pi Camera. Default: False
- trt : use TensorRT engines instead of .h5 weights. Default: False
Laptop
python3 ai_whiteboard.py
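The --rpc and --trt flags could be wired up with argparse along these lines (a sketch, not necessarily how ai_whiteboard.py defines them):

```python
import argparse

parser = argparse.ArgumentParser(description='AI whiteboard')
parser.add_argument('--rpc', action='store_true',
                    help='use a Raspberry Pi Camera (default: False)')
parser.add_argument('--trt', action='store_true',
                    help='use TensorRT engines instead of .h5 weights (default: False)')
args = parser.parse_args()
```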
| To draw | To move | To erase | To clean | To save |
|---|---|---|---|---|
| *(gesture image)* | *(gesture image)* | *(gesture image)* | *(gesture image)* | *(gesture image)* |
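Internally, fingertip detections can be mapped to these actions with a simple dispatch on which fingers are raised. The sketch below is purely illustrative: the finger-to-action assignments and the `whiteboard` interface are hypothetical, and the real assignments are the ones shown in the gesture images.

```python
# Hypothetical gesture dispatch. `fingers` is a tuple of booleans
# (thumb, index, middle, ring, pinky) derived from the fingertip detector.
# The actual finger-to-action mapping is defined by the gesture images above.
def dispatch(fingers, whiteboard, point):
    thumb, index, middle, ring, pinky = fingers
    if index and not (middle or ring or pinky):
        whiteboard.draw(point)      # e.g. index finger only -> draw
    elif index and middle and not (ring or pinky):
        whiteboard.move(point)      # e.g. two fingers -> move the cursor
    elif index and middle and ring and not pinky:
        whiteboard.erase(point)     # e.g. three fingers -> erase
    elif all(fingers):
        whiteboard.clean()          # e.g. open palm -> clear the board
    elif thumb and pinky and not (index or middle or ring):
        whiteboard.save()           # e.g. thumb + pinky -> save to disk
```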
A custom dataset of 12,000 images was collected and labeled for training; for labeling I used CVAT.
- Train: 9,500 images
- Validation: 1,000 images
- Test: 1,500 images
Run from a project root directory:
python3 yolo_train.py
Run from a project root directory:
python3 yolo_test.py
The transformation takes place in 3 stages (sketches of the stages follow below):
- Freeze graph and remove training nodes (.h5 -> .pb)
- Convert frozen graph to onnx (.pb -> .onnx)
- Convert onnx model to TensorRT engine (.onnx -> .engine)
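As a rough sketch, the first two stages could be done like this with TensorFlow 1.15 and the tf2onnx package (file and tensor names here are assumptions; h5_to_trt.py is the actual implementation):

```python
# Stage 1 sketch: freeze a Keras .h5 model into a .pb graph (TensorFlow 1.15).
import tensorflow as tf
from tensorflow.keras import backend as K

# Loading may need custom_objects if the model was saved with custom losses.
model = tf.keras.models.load_model('weights/yolo.h5')
sess = K.get_session()
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph_def, [out.op.name for out in model.outputs])
tf.io.write_graph(frozen, 'weights', 'yolo.pb', as_text=False)
```

Stage 2 is typically a tf2onnx call on the frozen graph, with the actual input/output tensor names filled in:

```python
# Run from the shell; <input_tensor> and <output_tensor> depend on the model.
# python3 -m tf2onnx.convert --input weights/yolo.pb \
#     --inputs <input_tensor> --outputs <output_tensor> \
#     --output weights/yolo.onnx
```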
Run from a project root directory:
python3 h5_to_trt.py --folder weights --weights_file yolo --fp 16
- folder : path to the folder with the model (e.g. weights)
- weights_file : weights file name (without the .h5 extension)
- fp : TensorRT engine precision (16 or 32)
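Stage 3 builds the engine with the TensorRT Python API. A sketch for the TensorRT 7.x version shipped with JetPack 4.4 follows; the file names and workspace size are assumptions:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, fp16=True):
    """Stage 3 sketch: parse an ONNX model and build a TensorRT engine."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28          # 256 MiB scratch space
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)    # corresponds to --fp 16
    return builder.build_engine(network, config)

engine = build_engine('weights/yolo.onnx')
with open('weights/engines/yolo_fp16.engine', 'wb') as f:
    f.write(engine.serialize())
```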
Metrics for hand detection after model conversion.
To determine whether a detection is correct, we use the IoU (intersection over union) between the predicted and ground-truth boxes: if the IoU is greater than 0.5, the detection counts as correct; otherwise it does not. A minimal IoU sketch follows; the results are in the table after it.
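```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as correct when iou(pred_box, gt_box) > 0.5.
```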
| Metric | Keras model before training | Keras model after training | TensorRT engine (fp32) | TensorRT engine (fp16) |
|---|---|---|---|---|
| Accuracy | 72.68 % | 89.14 % | 89.14 % | 89.07 % |
| Precision | 84.80 % | 99.45 % | 99.45 % | 99.45 % |
| Recall | 50.78 % | 77.24 % | 77.24 % | 77.10 % |
Captured image shape: 320x240. Jetson Xavier NX power mode ID 2 (15W, 6 cores).
| | Keras model | TensorRT engine (fp32) | TensorRT engine (fp16) |
|---|---|---|---|
| Average FPS | 12 | 33 | 60 |
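For reference, average FPS can be measured with a simple timing loop around one capture-plus-inference step (a sketch, not the project's benchmarking code):

```python
import time

def average_fps(step, n_frames=200):
    """Run `step` (one capture + inference iteration) n_frames times
    and return the average frames per second."""
    start = time.time()
    for _ in range(n_frames):
        step()
    return n_frames / (time.time() - start)
```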
1. Unified Gesture and Fingertip Detection: https://github.com/MahmudulAlam/Unified-Gesture-and-Fingertip-Detection
2. TensorRT developer guide: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#python_topics
3. YOLO9000: Better, Faster, Stronger: https://arxiv.org/abs/1612.08242