This repository contains the project report and analysis code, focusing on the influence of lifestyle habits and medical conditions on diabetes prevalence.
Diabetes is a major chronic health condition in the United States, affecting millions of people and imposing a substantial economic burden. Understanding the factors that contribute to diabetes prevalence is crucial for effective prevention and intervention strategies. This project explores the relationship between lifestyle habits, medical conditions, and diabetes prevalence using data from the Behavioral Risk Factor Surveillance System (BRFSS), a health survey conducted by the CDC.
- Introduction: Provides an overview of diabetes and its impact on public health.
- Motivation: Discusses the need for innovative approaches to address diabetes prevalence and associated risks.
- Data Description: Describes the dataset used, its sources, and key variables.
- Data Preprocessing: Details the steps taken to clean, preprocess, and prepare the dataset for analysis.
- Exploratory Data Analysis (EDA): Visualizes and analyzes data patterns, correlations, and distributions to gain insights.
- Models Used and Performance Evaluation: Outlines the models employed (Logistic Regression, Decision Tree, K-Nearest Neighbors) and evaluates their performance in predicting diabetes risk.
- Conclusion: Summarizes findings, model preferences, computational considerations, and suggestions for further exploration.
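
The model comparison named in the outline above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn is installed, uses synthetic data from `make_classification` as a stand-in for the BRFSS survey, and the function name `compare_models` and all hyperparameters (`max_depth=5`, `n_neighbors=15`, the 75/25 split) are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit the three model families on one split and return test accuracies."""
    # Hold out 25% of the rows, keeping the class ratio the same in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hypothetical settings; scaling is applied where the model is scale-sensitive.
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data (NOT the BRFSS survey): 2000 rows, imbalanced classes.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
scores = compare_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

To try this on the real data, `X` and `y` would be replaced with the preprocessed feature matrix and diabetes label produced by the repository's own preprocessing step.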
The project provides insight into the relationship between lifestyle habits, medical conditions, and diabetes prevalence. The three models reached similar test accuracies, so logistic regression and decision trees were preferred for their interpretability and computational efficiency. K-Nearest Neighbors was effective but computationally expensive on larger datasets. Further work, such as ensemble methods or hyperparameter tuning, could improve predictive performance and give deeper insight into diabetes risk factors. Overall, the project aims to inform diabetes prevention efforts and, through them, public health outcomes.
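
To illustrate why K-Nearest Neighbors becomes costly as the dataset grows, here is a minimal brute-force sketch in plain Python. It is not the implementation used in the analysis code; the helper names `knn_predict` and `make_data` and the two-cluster toy data are assumptions made for the example. The point it shows is that each prediction measures the distance to every training row, so the work per query grows linearly with the training set size.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Brute-force k-NN: measure the distance to every training row, then vote."""
    # One distance computation per training row, for every single query.
    distances = [
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    ]
    nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_data(n, rng):
    """Toy data: two Gaussian clusters, label 0 near (0, 0) and label 1 near (2, 2)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y


rng = random.Random(0)
for n in (1_000, 10_000):
    train_X, train_y = make_data(n, rng)
    pred = knn_predict(train_X, train_y, query=(2.0, 2.0))
    print(f"n={n}: prediction={pred}, distance computations per query={len(train_X)}")
```

Logistic regression and a fitted decision tree, by contrast, make predictions without revisiting the training rows, which is consistent with the efficiency preference stated above.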
For detailed information and analysis code, please refer to the project documentation and code available in this repository.