A proven way to reduce the spread of the coronavirus is to wear a face mask that covers the nose and mouth. For this reason, wearing a mask in public places has been made mandatory in most parts of the world. However, a face mask causes some inconveniences, such as difficulty in breathing and fogging of eyeglasses. As a result, many people wear it incorrectly and leave their nose or mouth exposed, which defeats the purpose of the mask.
It is important for people to wear face masks when they are in public places. Making use of the technology available, this project demonstrates a Mask Detector that detects people who are wearing masks and, in addition, people who are wearing a mask incorrectly. It could be used in public places, for example in different sections of a shopping mall or store. The model can detect more than one person at a time and predict whether each of them is wearing a face mask correctly or not. This information could then be used to remind people to put on a mask or to wear it properly.
On a large scale, proper use of face masks could help humanity end the pandemic earlier. I hope this project helps show how AI and Computer Vision can be used for the benefit of society.
This project uses an NVIDIA Jetson Nano 2GB module and a Jelly Comb 1080P HD Webcam.
The Jetson runs the JetPack SDK and uses TensorRT to run machine learning networks on the embedded platform. This project is based on the Jetson-Inference guide. The network, DetectNet, is provided by that guide; I collected and labeled the custom dataset and trained the model on it myself.
The model used for object detection is SSD-Mobilenet, a popular network architecture that uses the SSD-300 Single-Shot MultiBox Detector with a Mobilenet backbone for fast, real-time inference. Starting from a pre-trained model such as this one works much better than training from scratch; this approach is called Transfer Learning. The PyTorch framework was used for the Transfer Learning.
The data was collected by moving the subject around the camera's field of view and accurately marking the bounding boxes. Care was taken to fit each box tightly around the subject, which helps the accuracy of the model. More than 80 images were taken for each case, under various lighting conditions and in various positions, to help the model learn well. The tool to collect data can be found here. This tool creates a dataset in the Pascal VOC format.
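To make the Pascal VOC format more concrete, here is a minimal Python sketch of the kind of XML annotation that accompanies each image. It is not the output of the collection tool itself: it writes only a subset of the usual VOC fields, and the file name, image size, box coordinates, and the label "mask" are hypothetical values chosen for illustration.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, objects):
    """Build a Pascal VOC style annotation for one image.

    objects is a list of (label, xmin, ymin, xmax, ymax) in pixels.
    Only a subset of the usual VOC fields is written here.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)

    for label, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return root


# Hypothetical example: one 1280x720 frame with a single labelled face.
example = voc_annotation("frame_0001.jpg", 1280, 720,
                         [("mask", 400, 150, 700, 520)])
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Each bounding box is stored as pixel coordinates of its top-left and bottom-right corners, which is why drawing the boxes tightly matters: the network is trained to reproduce exactly those corners.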
The SSD was trained on the collected dataset by specifying the dataset type, the dataset path, and the model directory. Training ran for 30 epochs and took about 2 hours; the classification loss could be seen to decrease and then plateau at around 18-20 epochs. The trained model was then converted to the ONNX format.
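As a rough sketch of how those three settings map onto a command line, the Python snippet below assembles the training and export commands. It assumes the train_ssd.py and onnx_export.py scripts from the Jetson-Inference guide (this write-up does not name the scripts), and the data/masks and models/masks paths are hypothetical placeholders.

```python
def training_commands(data_dir, model_dir, epochs=30):
    """Return (train, export) command lines as argument lists.

    Assumes the train_ssd.py / onnx_export.py scripts from the
    Jetson-Inference guide; the directories are supplied by the caller.
    """
    train = [
        "python3", "train_ssd.py",
        "--dataset-type=voc",        # dataset is in Pascal VOC format
        f"--data={data_dir}",        # path of the collected dataset
        f"--model-dir={model_dir}",  # where checkpoints are written
        f"--epochs={epochs}",
    ]
    export = ["python3", "onnx_export.py", f"--model-dir={model_dir}"]
    return train, export


# Hypothetical paths, for illustration only.
train_cmd, export_cmd = training_commands("data/masks", "models/masks")
print(" ".join(train_cmd))
print(" ".join(export_cmd))
```

The printed lines can be run from the directory that contains the two scripts, or passed to subprocess.run as lists.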
This ONNX model is then loaded in detectnet with a few options, as follows:
detectnet --model=$NET/ssd-mobilenet.onnx --labels=$NET/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          csi://0
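One needs to specify the model path, the labels path, the names of the network's input and output layers (the input blob, the scores, and the bounding boxes), and finally the camera to use for the live stream. Here csi://0 refers to a MIPI CSI camera; a USB webcam is typically addressed as /dev/video0 instead.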
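The same options can also be passed from Python. The sketch below assumes the jetson_inference and jetson_utils Python modules that ship with Jetson-Inference (older releases expose them as jetson.inference and jetson.utils); the run_live and detectnet_args helper names and the display://0 output are illustrative choices, not part of the original project.

```python
def detectnet_args(net_dir):
    """The detectnet options from the command line, as an argv-style list."""
    return [
        f"--model={net_dir}/ssd-mobilenet.onnx",
        f"--labels={net_dir}/labels.txt",
        "--input-blob=input_0",
        "--output-cvg=scores",
        "--output-bbox=boxes",
    ]


def run_live(net_dir, camera_uri="csi://0"):
    """Run detection on a live camera stream (requires a Jetson)."""
    # Imported here so this file still loads on machines without the
    # Jetson libraries installed.
    from jetson_inference import detectNet
    from jetson_utils import videoSource, videoOutput

    net = detectNet(argv=detectnet_args(net_dir))
    camera = videoSource(camera_uri)
    display = videoOutput("display://0")

    while display.IsStreaming():
        img = camera.Capture()
        if img is None:  # capture timed out, try again
            continue
        net.Detect(img)  # overlays boxes and labels on img by default
        display.Render(img)


if __name__ == "__main__":
    # Hypothetical model directory, for illustration only.
    print(detectnet_args("models/masks"))
```

Calling run_live("models/masks") on the Jetson would open the camera and display the annotated stream until the window is closed.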
Here are the results of this project. It can detect people who are wearing a mask, wearing a mask improperly, or not wearing a mask. The images below show the Mask Detector in action. Each face is detected and enclosed in a bounding box, and the algorithm also displays its confidence in the prediction within this box.
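To illustrate how the per-face predictions could be turned into a reminder, here is a small Python sketch that tallies the detections in one frame. The label names, the 0.5 confidence cut-off, and the sample values are all hypothetical; with Jetson-Inference, the class index and confidence would come from the ClassID and Confidence fields of each detection.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    class_id: int      # index into the labels list
    confidence: float  # value between 0 and 1


# Hypothetical label set; the real one lives in the project's labels.txt.
LABELS = ["BACKGROUND", "mask", "mask_incorrect", "no_mask"]
NEEDS_REMINDER = {"mask_incorrect", "no_mask"}


def summarize(detections, labels=LABELS, min_confidence=0.5):
    """Count detections per label and how many people need a reminder."""
    counts = Counter(
        labels[d.class_id]
        for d in detections
        if d.confidence >= min_confidence
    )
    reminders = sum(counts[name] for name in NEEDS_REMINDER)
    return counts, reminders


# Made-up frame with three faces; the last one is below the cut-off.
frame = [Detection(1, 0.93), Detection(3, 0.81), Detection(2, 0.42)]
counts, reminders = summarize(frame)
print(dict(counts), "reminders needed:", reminders)
```

Ignoring low-confidence detections is a design choice: it reduces false reminders at the cost of occasionally missing a partly hidden face.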