Implementation of automatic recognition of the VAS Index (Visual Analogue Scale, a pain index), with the aim of demonstrating the effectiveness of Random Forest on this problem.
This project extends, and tries to improve on, the results obtained by our colleague Alessandro Arezzo in his work, using a different supervised learning model (Random Forest Regressor).
- Python
- Scikit-learn: simple and efficient tools for predictive data analysis.
The model is a Random Forest Regressor. A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
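As a minimal sketch of the averaging described above, the snippet below fits a scikit-learn RandomForestRegressor on synthetic data and checks that the forest prediction is the mean of the individual tree predictions. The data, the 0 to 10 target range and the hyperparameter values are assumptions chosen only for illustration; they are not taken from ModelRFR.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 200 samples, 10 features,
# targets clipped to a 0-10 range for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.clip(5 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200), 0, 10)

# Illustrative hyperparameters, not the project's settings.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# The forest prediction is the average of the per-tree predictions.
forest_pred = forest.predict(X[:5])
tree_mean = np.mean([tree.predict(X[:5]) for tree in forest.estimators_], axis=0)

print(np.allclose(forest_pred, tree_mean))
```

Averaging over many trees, each grown on a different bootstrap sample, is what reduces the variance of a single deep tree.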
Two datasets are used:
- UNBC-McMASTER Shoulder Pain Expression: contains video sequences of patients' faces while they were actively and passively moving their shoulders following painful impulses. You can download it here.
- BioVid Heat Pain Database (BioVid): a recent dataset created to improve the reliability and objectivity of pain measurement. You can download it here.
The project is based on two scripts, PreliminaryClustering.py and ModelRFR.py, which implement respectively the phase that extracts the relevant configurations and the phase that manages the Random Forest Regression. The script test_regression.py runs the tests: it compares the results obtained as the threshold for neutral configurations varies and, at the same time, evaluates the different groupings of landmarks. It does so by iterating over 5 groups of landmarks, representing in order the eyes, the nose, the mouth, the best configuration and all the landmarks. The other script, generate_model_predictor.py, instead evaluates the performance of a model in which both the number of GMM kernels and the threshold used to extract neutral configurations are fixed.
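To give a rough picture of what a GMM-based clustering step looks like, the sketch below fits a scikit-learn GaussianMixture with a chosen number of kernels on synthetic vectors and assigns each vector to a component. The data, the variable names and the value of n_kernels are assumptions for illustration; how PreliminaryClustering.py builds its descriptors and applies the neutral threshold is not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for per-frame landmark descriptors (assumption):
# two well-separated blobs in a 4-dimensional space.
rng = np.random.default_rng(0)
frames = np.vstack([
    rng.normal(0.0, 1.0, size=(150, 4)),
    rng.normal(6.0, 1.0, size=(150, 4)),
])

n_kernels = 2  # hypothetical value; the project reads its own from config.py

# Fit the mixture and assign every frame to its most likely component.
gmm = GaussianMixture(n_components=n_kernels, random_state=0).fit(frames)
labels = gmm.predict(frames)

# Number of frames assigned to each kernel.
counts = np.bincount(labels, minlength=n_kernels)
print(counts)
```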
To install this project locally you need to clone this repository with the command
git clone https://github.com/LorenzoGianassi/Automatic-Recognition-VAS-Index-with-Random-Forest.git
To download the folder data\dataset you have to install Git Large File Storage, available at the following link: Git LFS.
The next step is to run this command:
git lfs install
If Git LFS generates the following error due to limited bandwidth:
"This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access."
You can download the entire project as a .zip file from this repo and then replace the folder dataset with the one at this downloadable link: Dataset Folder.
Then put the extracted dataset folder inside the folder data.
The dependencies are listed in the file requirements.txt. You can install them using the command
pip install -r requirements.txt
To run the code you can use one of these two scripts:
- generate_model_predictor.py : it trains the model with a fixed number of GMM kernels and a fixed threshold for the neutral configurations. Both values can be set inside the file config.py. At the end of the process two confusion matrices are obtained, one for the test set and one for the train set. Furthermore, a graphic representation of one randomly selected decision tree from the forest is generated.
- test_regression.py : it trains the model by cycling over a set of thresholds defined a priori in config.py and over a set of predefined landmark groups. At the end of the process a folder is created for each group of landmarks. Inside each folder the confusion matrices of the train set and test set, the graphs of the mean absolute error, the decision trees and a graph of the total mean absolute error are generated threshold by threshold. Furthermore, a graph containing the mean absolute error of each set of landmarks is generated, as shown in the accompanying figure.
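The comparison between landmark groups can be pictured with the following minimal sketch, which trains one Random Forest Regressor per group of feature columns and records the mean absolute error on a held-out split. The synthetic data, the group names and the column indexes are hypothetical, and the threshold sweep performed by test_regression.py is not modelled here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 12 features, target in a 0-10 range.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = np.clip(5 + 2 * X[:, 0] + X[:, 7] + rng.normal(scale=0.5, size=300), 0, 10)

# Hypothetical groups of feature columns standing in for landmark groups.
groups = {
    "group_a": [0, 1, 2, 3],
    "group_b": [4, 5, 6, 7],
    "all": list(range(12)),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train one model per group and store its test-set mean absolute error.
mae_by_group = {}
for name, cols in groups.items():
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_train[:, cols], y_train)
    mae_by_group[name] = mean_absolute_error(y_test, model.predict(X_test[:, cols]))

for name, mae in mae_by_group.items():
    print(f"{name}: MAE = {mae:.3f}")
```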
To set the parameters you have to change the values inside the config.py file.
You can set the following parameters:
- type_of_database : the type of database can be set as 'BioVid' or 'original'. 'original' corresponds to the UNBC dataset.
- hyperparameter : if set to 'True', the script test_regression.py will perform the regression using RandomizedSearchCV().
- num_tree : if set to 'True', the script generate_model_predictor.py will plot the graph reporting the gap between the results obtained on the train and test sets, which is used to evaluate overfitting.
- cross_val_protocol : type of protocol to be used to evaluate the performance of the models. The type of protocol can be set as 'Leave-One-Subject-Out', '5-fold-cross-validation' or 'Leave-One-Sequence-Out'.
- selected_lndks_idx : it specifies the indexes of the landmarks to be considered during the procedure.
- n_jobs : number of threads to use to perform Random Forest Regressor training.
- thresholds_neutral_to_test : it defines the range of threshold values to be used in the test_regression.py script.
- n_kernels_GMM : it defines the number of kernels to be used for the Gaussian Mixture in the preliminary clustering phase.
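As a hedged illustration of the cross_val_protocol values listed above, the sketch below maps each protocol name to a scikit-learn splitter and runs a Leave-One-Subject-Out evaluation on synthetic data. The helper make_splitter, the subject_ids array and the data are hypothetical; the project's own scripts may implement these protocols differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score


def make_splitter(protocol):
    """Hypothetical helper: map a protocol name to a scikit-learn splitter."""
    if protocol in ("Leave-One-Subject-Out", "Leave-One-Sequence-Out"):
        # Both leave one group out; only the group labels differ
        # (subject ids in one case, sequence ids in the other).
        return LeaveOneGroupOut()
    if protocol == "5-fold-cross-validation":
        return KFold(n_splits=5, shuffle=True, random_state=0)
    raise ValueError(f"Unknown protocol: {protocol}")


# Synthetic stand-in data (assumption): 6 subjects with 20 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = np.clip(5 + 2 * X[:, 0] + rng.normal(scale=0.5, size=120), 0, 10)
subject_ids = np.repeat(np.arange(6), 20)

splitter = make_splitter("Leave-One-Subject-Out")
model = RandomForestRegressor(n_estimators=30, n_jobs=1, random_state=0)

# scikit-learn reports negated MAE so that higher is better; flip the sign.
scores = cross_val_score(
    model, X, y, groups=subject_ids, cv=splitter,
    scoring="neg_mean_absolute_error",
)
mae_per_fold = -scores
print(mae_per_fold.mean())
```

With Leave-One-Subject-Out each fold tests on a subject the model has never seen, which is usually a stricter estimate than a shuffled 5-fold split where frames of the same subject can land in both train and test sets.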
The graphs representing the results are stored inside the data folder.
- Lorenzo Gianassi
- Francesco Gigli
Image and Video Analysis Project © Course held by Professor Pietro Pala - Computer Engineering Master Degree @University of Florence