Machine Learning algorithms for the diagnosis of Parkinson's Disease.

georgios-kalomitsinis/Parkinson

Parkinson

Parkinson's disease (PD) is one of the most painful, dangerous and currently incurable diseases that occur in older people (mainly over 50 years of age). It involves the death of dopamine-producing neurons in the brain. This neurodegeneration leads to a range of symptoms, such as coordination problems, slowness of movement, voice changes, stiffness and, eventually, progressive disability. There is no cure so far, although medication offers significant relief of symptoms, especially in the early stages of the disease. It is therefore crucial to develop more sensitive diagnostic tools for detecting the disease. The main goal of this repository is to discriminate healthy people from those with PD.

Figure 1. Stages of PD.

Dataset Description

The dataset used in this repository is obtained from the UCI Machine Learning Repository. It is composed of a range of biomedical voice measurements from 31 people, 23 of whom have PD. Each column in the dataset is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals.

| Attribute | Description |
| --- | --- |
| name | ASCII subject name and recording number |
| MDVP:Fo(Hz) | Average vocal fundamental frequency |
| MDVP:Fhi(Hz) | Maximum vocal fundamental frequency |
| MDVP:Flo(Hz) | Minimum vocal fundamental frequency |
| MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP | Several measures of variation in fundamental frequency |
| MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA | Several measures of variation in amplitude |
| NHR, HNR | Two measures of ratio of noise to tonal components in the voice |
| status | Health status of the subject: (one) - Parkinson's, (zero) - healthy |
| RPDE, D2 | Two nonlinear dynamical complexity measures |
| DFA | Signal fractal scaling exponent |
| spread1, spread2, PPE | Three nonlinear measures of fundamental frequency variation |

Table 1. Attribute Information.
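
Since the `name` attribute combines the subject name with a recording number, a per-subject identifier can be derived from it and later used to keep all recordings of one person together. The snippet below is a minimal sketch of that idea. It assumes names of the form `phon_R01_S01_1` (subject part, underscore, recording number), a pattern that is not stated in this README, and the helper `subject_id` is a hypothetical name introduced only for illustration.

```python
import re
from collections import defaultdict

# Assumed naming pattern (not stated in this README): "<subject>_<recording number>",
# for example "phon_R01_S01_1".
NAME_PATTERN = re.compile(r"^(?P<subject>.+)_(?P<recording>\d+)$")


def subject_id(name: str) -> str:
    """Return the subject part of a recording name (hypothetical helper)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected recording name: {name!r}")
    return match.group("subject")


# Illustrative names only, not rows copied from the dataset.
names = ["phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S02_1"]

# Group the recordings by the subject they belong to.
by_subject = defaultdict(list)
for recording_name in names:
    by_subject[subject_id(recording_name)].append(recording_name)
```

The resulting subject identifiers can serve as the group labels when the data is split by person instead of by recording.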

Figure 2. PD and healthy voice instances.

Methodology

Each person has 6 or 7 voice recordings. To evaluate each algorithm, the dataset was split at the level of individuals rather than individual voice recordings, so that all recordings of a person fall in either the train set or the test set. The split was performed 10 times, each time with different people in the train and test sets, using train_size = 0.8, which corresponds to about 25 people in the train set. In addition, GridSearchCV was applied to find the best hyperparameters of each algorithm, using LeaveOneGroupOut as the cross-validation method.
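
The procedure described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the repository's actual code: the data is synthetic, `GroupShuffleSplit` is assumed as the way to obtain the 10 person-level splits, logistic regression with a small grid over `C` stands in for any of the algorithms, and accuracy is assumed as the inner scoring metric.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit, LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 31 subjects with 6 or 7 recordings each.
# The 22/9 mix is chosen only so that the rows total 195.
recordings_per_subject = np.array([6] * 22 + [7] * 9)
groups = np.repeat(np.arange(31), recordings_per_subject)  # subject index per row
subject_label = np.array([1] * 23 + [0] * 8)  # 23 PD, 8 healthy
rng.shuffle(subject_label)
y = subject_label[groups]
X = rng.normal(size=(len(groups), 5)) + y[:, None]  # 5 made-up features

# Outer loop: 10 random splits made by subject, not by recording.
outer = GroupShuffleSplit(n_splits=10, train_size=0.8, random_state=0)
scores = []
for train_idx, test_idx in outer.split(X, y, groups):
    # No subject may appear on both sides of the split.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

    # Inner loop: hyperparameter search, leaving one subject out per fold.
    # Accuracy is used here because each held-out fold contains a single
    # subject (one class), which makes F1 ill-defined for healthy subjects.
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},  # illustrative grid
        scoring="accuracy",
        cv=LeaveOneGroupOut(),
    )
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    scores.append(f1_score(y[test_idx], search.predict(X[test_idx])))

mean_f1 = float(np.mean(scores))
```

Passing `groups` to both the outer split and the inner search is what keeps recordings of the same person from leaking between training and evaluation data.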

Figure 3. Workflow of the developed module.

Modelling and Evaluation

ALGORITHMS

  • Logistic regression
  • Decision Tree classifier
  • Gaussian Naive Bayes
  • Random Forest
  • Support Vector Machine
  • XGB classifier

METRICS

Because this is a medical problem, the goal is to minimise errors involving the positive (PD) class. Neither precision nor recall alone serves this purpose, and neither does accuracy. Therefore, the F1-score is used as the main measure, since it seeks a balance between precision and recall even when the classes are imbalanced.

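| Name | Formula |
| --- | --- |
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Precision | $\frac{TP}{TP + FP}$ |
| Recall | $\frac{TP}{TP + FN}$ |
| F-Score | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ |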

Table 2. Calculated metrics, where TP, TN, FP and FN correspond to True Positives, True Negatives, False Positives and False Negatives, respectively.
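
As a small worked example, the four metrics can be computed directly from the confusion counts using their standard definitions. The function names and the counts below are hypothetical and chosen only to make the arithmetic easy to follow. They are not results from this repository.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of true positives that the classifier finds."""
    return tp / (tp + fn)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical confusion counts for illustration.
tp, tn, fp, fn = 6, 2, 2, 2

acc = accuracy(tp, tn, fp, fn)  # 8 / 12
prec = precision(tp, fp)        # 6 / 8
rec = recall(tp, fn)            # 6 / 8
f1 = f_score(tp, fp, fn)        # harmonic mean of 0.75 and 0.75
```

Note that the F-Score does not use TN at all, so it focuses on how well the positive (PD) class is handled.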

Results

Figure 4. Average of the metrics of each classifier.

License

This project is licensed under the MIT License.
