# Asthma Disease Detection

## Goal
The goal of this project is to build a machine learning model to accurately detect asthma disease using various classification algorithms.

## Dataset
The dataset used in this project is sourced from [Asthma Disease Dataset on Kaggle](https://www.kaggle.com/datasets/rabieelkharoua/asthma-disease-dataset). It contains data relevant to asthma disease detection.

## Description
This project trains and evaluates multiple machine learning models to detect asthma disease. Model performance is compared using metrics such as accuracy and ROC AUC score, and a confusion matrix is plotted for each model to visualize its performance.

## What I Had Done
1. Loaded the asthma disease dataset from Kaggle.
2. Split the dataset into training and testing sets.
3. Implemented multiple machine learning models.
4. Trained and evaluated each model.
5. Plotted confusion matrices for each model.
6. Saved the results (accuracy and ROC AUC) to a CSV file.
7. Generated a comparison plot of the models' performance.
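
The original notebook is not reproduced in this README, so the snippet below is only a minimal sketch of steps 1–6. The file name `asthma_disease_data.csv`, the target column `Diagnosis`, and the two example models are assumptions; adjust them to match the actual notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, ConfusionMatrixDisplay

# 1. Load the dataset downloaded from Kaggle (file name is an assumption)
df = pd.read_csv("asthma_disease_data.csv")
X = df.drop(columns=["Diagnosis"])   # "Diagnosis" target column is assumed
y = df["Diagnosis"]

# 2. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3-5. Train each model, record accuracy / ROC AUC, and plot its confusion matrix
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
}
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, pred),
        "ROC AUC": roc_auc_score(y_test, proba),
    })
    ConfusionMatrixDisplay.from_predictions(y_test, pred)
    plt.title(f"Confusion Matrix - {name}")
    plt.show()

# 6. Save the results to a CSV file
pd.DataFrame(rows).to_csv("model_results.csv", index=False)
```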

## Models Implemented
- Logistic Regression
- Random Forest
- Gradient Boosting
- Support Vector Machine
- XGBoost
- K-Nearest Neighbors
- AdaBoost
- Extra Trees
- Bagging
- CatBoost
- LightGBM
- Naive Bayes
- Decision Tree
- Stacking Classifier
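
For reference, these models can be instantiated roughly as shown below. This is a sketch only: hyperparameters are defaults, and the composition of the stacking ensemble is an assumption (the notebook lists `mlxtend`, so it may use `mlxtend.classifier.StackingClassifier` rather than scikit-learn's).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (
    RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier,
    ExtraTreesClassifier, BaggingClassifier, StackingClassifier,
)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Support Vector Machine": SVC(probability=True, random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Stacking: a few base learners with a logistic-regression meta-model
# (the exact combination used in the notebook is not recorded here).
models["Stacking Classifier"] = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("xgb", XGBClassifier(random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
```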

## Libraries Needed
- pandas
- matplotlib
- seaborn
- scikit-learn
- xgboost
- catboost
- lightgbm
- mlxtend

## EDA Results
Exploratory Data Analysis (EDA) covered the following:
- Distributions of and correlations between the features
- Class imbalance in the target variable
- Identification of the features most important for asthma disease detection
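
A minimal sketch of how such checks can be produced with pandas and seaborn is shown below (the file and target-column names are the same assumptions as above); the plots that follow are the notebook's actual output.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("asthma_disease_data.csv")  # assumed file name

# Class balance of the target (shows the imbalance noted above)
sns.countplot(x="Diagnosis", data=df)        # "Diagnosis" column is assumed
plt.title("Class distribution")
plt.show()

# Feature distributions
df.hist(figsize=(16, 12), bins=30)
plt.tight_layout()
plt.show()

# Correlations between numeric features
plt.figure(figsize=(14, 10))
sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm")
plt.title("Feature correlations")
plt.show()
```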

![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___5_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___6_0.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___8_2.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___9_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___10_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___11_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___12_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___13_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___14_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___15_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___16_2.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___17_1.png?raw=true)
![eda](https://github.com/adi271001/ML-Crate/blob/Asthma-Disease/Asthma%20Disease%20Detection/Images/__results___18_0.png?raw=true)

## Performance of the Models Based on Accuracy Scores

| Model | Accuracy |
|-------|----------|
| Logistic Regression | 95.20% |
| Random Forest | 95.20% |
| Gradient Boosting | 94.99% |
| Support Vector Machine | 95.20% |
| XGBoost | 95.20% |
| K-Nearest Neighbors | 95.20% |
| AdaBoost | 95.20% |
| Extra Trees | 95.20% |
| Bagging | 94.78% |
| CatBoost | 95.20% |
| LightGBM | 95.20% |
| Naive Bayes | 95.20% |
| Decision Tree | 87.47% |
| Stacking Classifier | 95.20% |
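
The comparison plot mentioned in the workflow can be regenerated from the saved results file roughly as follows (the `model_results.csv` name and its columns are the same assumptions used earlier):

```python
import pandas as pd
import matplotlib.pyplot as plt

results = pd.read_csv("model_results.csv").sort_values("Accuracy", ascending=False)

plt.figure(figsize=(10, 6))
plt.barh(results["Model"], results["Accuracy"])
plt.xlabel("Accuracy")
plt.title("Model comparison")
plt.gca().invert_yaxis()  # best-performing model at the top
plt.tight_layout()
plt.show()
```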

## Conclusion
The Logistic Regression, Random Forest, Support Vector Machine, XGBoost, K-Nearest Neighbors, AdaBoost, Extra Trees, CatBoost, LightGBM, Naive Bayes, and Stacking Classifier models all achieved the highest accuracy of 95.20%, while the Decision Tree performed worst at 87.47%. These results indicate that ensemble methods and advanced gradient boosting techniques tend to perform well on this dataset.

## Signature
- **Name:** Aditya D
- **GitHub:** [https://www.github.com/adi271001](https://www.github.com/adi271001)
- **LinkedIn:** [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)
- **Topmate:** [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)
- **Twitter:** [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)
