In this project the goal is to detect and segment hands inside images. To achieve this, the main ideas are:
- detection: a sliding-window approach with a pre-trained ResNet50V2 CNN able to distinguish between hands and non-hands. Moreover, to avoid several overlapping detections, non-maxima suppression is also implemented (see the sketch below)
- segmentation: a ResNet18 CNN with a DeepLab encoder/decoder architecture, plus some post-processing operations to properly segment the hands inside the images
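To illustrate the detection idea, here is a minimal Python sketch of a sliding-window scan scored by the classifier, followed by non-maxima suppression. The window size, stride, thresholds, model input size and file paths are assumptions for illustration, not the project's actual C++ implementation.

```python
# Sketch of the detection idea: slide a window over the image, score each crop
# with the hand / not-hand classifier, then suppress overlapping detections.
# Window size, stride, thresholds and paths are illustrative only.
import cv2
import numpy as np

net = cv2.dnn.readNetFromTensorflow("model/model.pb")  # converted classifier
image = cv2.imread("testset/rgb/01.jpg")

boxes, scores = [], []
win, stride = 128, 32
h, w = image.shape[:2]
for y in range(0, h - win + 1, stride):
    for x in range(0, w - win + 1, stride):
        crop = image[y:y + win, x:x + win]
        blob = cv2.dnn.blobFromImage(crop, 1.0 / 255, (224, 224), swapRB=True)
        net.setInput(blob)
        score = float(net.forward().flatten()[0])  # probability of "hand"
        if score > 0.9:
            boxes.append([x, y, win, win])
            scores.append(score)

# Non-maxima suppression keeps only the strongest non-overlapping boxes.
kept = cv2.dnn.NMSBoxes(boxes, scores, 0.9, 0.3)
for i in np.array(kept).flatten():
    x, y, bw, bh = boxes[i]
    cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
```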
Overview:
To compile the code, follow these steps:
- Go inside the `Project` directory, i.e. `cd Project`
- Create a directory called `build`, i.e. `mkdir build`
- From inside `build`, execute the following two commands: `cmake ..` and `cmake --build .`
To create the dataset used to train our model, follow the steps listed below:
- Download the EgoHands dataset from: http://vision.soic.indiana.edu/egohands_files/egohands_data.zip
- Download the hand_over_face dataset from: https://drive.google.com/file/d/1hHUvINGICvOGcaDgA5zMbzAIUv7ewDd3
- Download the test set from: https://drive.google.com/drive/folders/1ORmMRRxfLHGLKgqHG-1PKx1ZUCIAJYoa?usp=sharing
- Rename the `.zip` file downloaded for the test set into `testset.zip`
- Put all of those files in the root directory
- Execute `python_scripts/build.py`, i.e. `python python_scripts/build.py`, or `cd python_scripts` and then `python build.py`, and follow the instructions: you will be asked to execute some MATLAB code, please do it!
- Go inside the directory `dataset/dataset/`, select all files (`Ctrl + A`), right-click and compress them into a `.zip` file.

Notice that you can skip the above steps and directly download the `dataset.zip` file from: https://drive.google.com/file/d/1AxwsNnBCtxB2LLJ1q_N-YbTov9zbgKMh/view?usp=sharing
To create the dataset, the following split was used:
- 10% test set
- 90% training and validation set, in particular:
  - 75% training set
  - 25% validation set
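A minimal sketch of how such a split can be reproduced; the image list and the use of scikit-learn's `train_test_split` are assumptions for illustration:

```python
# Sketch of the 10% test / 90% (train + validation) split, with the remaining
# 90% divided into 75% training and 25% validation. The paths are hypothetical.
from sklearn.model_selection import train_test_split

images = [f"dataset/img_{i:04d}.jpg" for i in range(1000)]  # hypothetical list

train_val, test = train_test_split(images, test_size=0.10, random_state=42)
train, val = train_test_split(train_val, test_size=0.25, random_state=42)

print(len(train), len(val), len(test))  # roughly 675 / 225 / 100
```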
To train the CNN, follow one of the two procedures below:
- If you want to train the CNN locally, follow these steps:
  - First, make sure that all the dependencies needed for the training process are satisfied. To do so, check that the following packages are installed in your environment (conda / pip):
    - tensorflow (better if the GPU version)
    - opencv
    - scikit-learn
    - numpy
    - matplotlib
    - imutils
  - If all the packages are present, create the directory `python_scripts/dataset/` and extract the `dataset.zip` file inside of it
  - Next, execute `python_scripts/fine_tune_cnn.py`; a `model.h5` file will appear inside `python_scripts/` (a minimal sketch of the fine-tuning idea is shown after this section)
- If you want to use Google Colab for the training process, follow these steps:
  - Upload the `dataset.zip` file previously created (or downloaded) to your private Google Drive and place it in a directory called `CV`
  - Then, by using this link, open the notebook that must be executed
  - After opening the notebook, upload to the root of the Colab environment (the `/content/` folder) the following files/folders: `python_scripts/fine_tune_cnn.py` (file) and `python_scripts/config/` (folder and its content)
  - Moreover, execute all the code cells of the notebook except the last one
  - Finally, a `model.h5` file will appear in the root of the Colab environment; download that file
Notice that you can skip the training process and directly download the `model.h5` file from: https://drive.google.com/file/d/1vm2T1bqheUVgpB0QdJYq9mGdIplQ6f4H/view?usp=sharing
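For reference, a minimal Keras sketch of the fine-tuning idea used for the detector; hyper-parameters, head layers and paths are assumptions, not the exact contents of `python_scripts/fine_tune_cnn.py`:

```python
# Sketch of fine-tuning ResNet50V2 as a binary hand / not-hand classifier.
# Hyper-parameters, head layers and paths are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50V2

base = ResNet50V2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained backbone

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # hand vs. not-hand
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would be built from the extracted dataset.zip, e.g. with
# tf.keras.utils.image_dataset_from_directory(...)
# model.fit(train_ds, validation_data=val_ds, epochs=10)
# model.save("model.h5")
```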
The last step is to convert the `model.h5` file into a `.pb` file in order to be able to use it in OpenCV. To do so, follow the steps below:
- If you want to do it locally:
  - Optional: place the `model.h5` file under `python_scripts/` (notice that this step is needed only if you trained your model with Google Colab)
  - Execute `python_scripts/convert_model_to_opencv.py`; afterwards, the file `model.pb` will appear inside `python_scripts/model/` and can later be used for inference (a sketch of the conversion idea is shown after this section)
- If you want to do it with Google Colab:
  - Optional: by using this link, open the notebook that must be executed (if not already opened)
  - Optional: upload the `model.h5` file to the root of the Colab environment (notice that this must be done only if the training process was done locally)
  - Upload to the root of the Colab environment: `python_scripts/convert_model_to_opencv.py` (file)
  - Optional: `python_scripts/config/` (folder and its content) (if not already done)
  - Run the last cell of the notebook, and finally download the file `model/model.pb`
Notice that you can skip this process and directly download the `model.pb` file from: https://drive.google.com/file/d/12nhBovdFL4O7X1d0FZbZOiSmgfYZnWmj/view?usp=sharing
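For reference, a minimal sketch of one way to freeze a Keras `.h5` model into a single `.pb` graph that OpenCV's dnn module can load; the actual `python_scripts/convert_model_to_opencv.py` may differ in its details:

```python
# Sketch of freezing a Keras .h5 model into a .pb graph readable by
# cv::dnn::readNetFromTensorflow. Paths are illustrative only.
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

model = tf.keras.models.load_model("model.h5")

# Wrap the model in a concrete function and freeze its variables into constants.
full_model = tf.function(lambda x: model(x)).get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype)
)
frozen_func = convert_variables_to_constants_v2(full_model)

# Serialize the frozen graph to model/model.pb.
tf.io.write_graph(frozen_func.graph, logdir="model", name="model.pb", as_text=False)
```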
To detect hands in an image, you need to execute the C++ code, i.e. `./projectGroup05`, with the following possible parameters:
- `-d` or `--detect`: to activate the detection mode
- `-m` or `--model`: to specify the path of the model used for detection
- `-i` or `--image`: to specify the path of the image in which to detect hands, default value: `../testset/rgb/01.jpg`
- `-a` or `--annotation`: to specify the path of the annotation for the image in which to detect hands
- Optional `--opd`: to specify the output path where to store the image with the bounding boxes drawn
- Optional `--opious`: to specify the output path where to store the IoU results of the image (see the IoU sketch below)
Notice that at least one of the two optional parameters must be included in the command line.
Example of a command:
`./projectGroup05 -d -m="path_to_model" -i="path_to_image" -a="path_to_annotations" --opd="path_save_detection_result" --opious="path_save_ious"`
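For reference, the IoU values written by `--opious` follow the standard intersection-over-union measure between a predicted and a ground-truth bounding box; a minimal sketch, assuming boxes in `(x, y, w, h)` format:

```python
# Standard intersection-over-union between two boxes given as (x, y, w, h).
# The box format is an assumption for illustration; the project code may differ.
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the intersection rectangle.
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((10, 10, 50, 50), (30, 30, 50, 50)))  # ~0.22
```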
The segmentation process is carried out using pre-computed masks from a MATLAB model (which we developed and fine-tuned ourselves), for the following reasons:
- Running inference through the OpenCV library requires converting the MATLAB model into one of the supported formats, as pointed out here: https://docs.opencv.org/3.4/d6/d0f/group__dnn.html#ga3b34fe7a29494a6a4295c169a7d32422 . Therefore, we converted the model (`.mat` file) into the Open Neural Network Exchange format (`.onnx`); you can in fact download the model in such a format from here: https://drive.google.com/file/d/1DBnwFXbM1EwNn0TdItu6iNYisZzA4nbZ/view?usp=sharing
- However, when we try to use the model with `cv::dnn::readNetFromONNX` or `cv::dnn::readNet`, set the input with `net.setInput(..)` and compute the output with `net.forward(..)`, a strange behaviour occurs, in particular:
  - The output computed with the C++ code is different from the one computed with Python.
  - In particular, with Python the output is correct while with C++ it is not. The evidence of this strange result can be found in the `problems` directory: we are not making anything up, just observe the difference between the contents of the two files `problems/python/results.txt` and `problems/c++/source/results.txt`. So, what we can conclude is that with Python everything works while with C++ it does not (a minimal Python sketch of this inference path is shown below). Moreover, here is also a link to a problem similar to ours: https://discuss.tvm.apache.org/t/different-output-for-large-yolo-onnx-model-in-python-correct-and-c-incorrect/11537
Notice that the mentioned problem was encountered with OpenCV version 4.5.x compiled from source and also with version 4.6.0 pre-compiled for C++, while the OpenCV version used with Python was 4.6.0.
Therefore, as mentioned above, the pre-computed masks for the test set (https://drive.google.com/drive/folders/1ORmMRRxfLHGLKgqHG-1PKx1ZUCIAJYoa?usp=sharing) can be downloaded from: https://drive.google.com/file/d/1SA8AVeyaTzi3CyLQsq_KO6FPdi2RWbMj/view?usp=sharing
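For reference, a minimal Python sketch of the ONNX inference path that, in our tests, produces the correct output; the input size, preprocessing and thresholding are assumptions, not the model's exact specification:

```python
# Sketch of running the exported .onnx segmentation model with OpenCV's Python
# bindings. Input size, normalization and thresholding are illustrative only.
import cv2
import numpy as np

net = cv2.dnn.readNetFromONNX("segmentation_model.onnx")

image = cv2.imread("testset/rgb/01.jpg")
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(256, 256), swapRB=True)

net.setInput(blob)
out = net.forward()  # raw per-pixel scores, e.g. shape (1, classes, H, W)
mask = (out.argmax(axis=1)[0] > 0).astype(np.uint8) * 255  # hand/background mask
cv2.imwrite("mask_raw.png", mask)
```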
The dataset used to train the model for segmenting hands can be downloaded from this link: https://drive.google.com/file/d/17MidLfiswgdpIYQDlg3EcwZP8KzbDzXc/view?usp=sharing
To train the model, follow the instructions specified inside the file `matlab_scripts/segmentation/README.MD`.
First of all, in order to find the hands inside an image you need to either:
- for each image that you want to segment, compute the output mask with the previously trained model, or
- if the test set is the one that can be downloaded from here: https://drive.google.com/drive/folders/1ORmMRRxfLHGLKgqHG-1PKx1ZUCIAJYoa?usp=sharing , just download the masks directly from: https://drive.google.com/file/d/1SA8AVeyaTzi3CyLQsq_KO6FPdi2RWbMj/view?usp=sharing (please extract the content of that `.zip` file)
To segment hands in an image, you need to execute the C++ code, i.e. `./projectGroup05`, with the following possible parameters:
- `-s` or `--segment`: to activate the segmentation mode
- `-i` or `--image`: to specify the path of the image in which to segment hands, default value: `../testset/rgb/01.jpg`
- `-a` or `--annotation`: to specify the path of the annotation for the image, i.e. the path to the ground-truth mask
- `--bwr`: to specify the path of the raw mask produced as output by the model
- Optional `--ops`: to specify the output path where to store the image with the segmented hands drawn
- Optional `--oppa`: to specify the output path where to store the pixel accuracy results of the image (see the sketch below)
- Optional `--opbwm`: to specify the output path where to store the B&W mask
Notice that at least one of the three optional parameters must be included in the command line.
Example of a command:
`./projectGroup05 -s -i="path_to_image" -a="path_to_mask" --bwr="path_bw_mask_raw" --ops="path_save_segmentation_result" --oppa="path_save_pixelaccuracy" --opbwm="path_save_b&w_mask"`
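For reference, the pixel accuracy reported by `--oppa` is commonly computed as the fraction of pixels whose predicted label matches the ground-truth mask; a minimal sketch, assuming binary masks stored as grayscale images:

```python
# Pixel accuracy between a predicted binary mask and a ground-truth mask.
# Loading as grayscale and thresholding at 127 are assumptions for illustration.
import cv2
import numpy as np

pred = cv2.imread("mask_predicted.png", cv2.IMREAD_GRAYSCALE) > 127
gt = cv2.imread("mask_ground_truth.png", cv2.IMREAD_GRAYSCALE) > 127

pixel_accuracy = np.mean(pred == gt)  # fraction of correctly classified pixels
print(f"Pixel accuracy: {pixel_accuracy:.3f}")
```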