The Question: What can be developed to tell Great Lakes Hospital whether a person is likely to have cardiovascular disease, based on their characteristics, habits, or other factors?
The Audience: The results will be presented to the Marketing and Analytics team for use and distribution to the medical teams and the surrounding public.
The Data: A CDC dataset drawn from a 2020 nationwide telephone survey of US residents’ health status. The raw data contained 401,958 rows and 279 columns, which were reduced to 18 columns. The reduced data can be found on Kaggle.
The Modeling Response: heart disease = 1, no heart disease = 0
The Model: Binary supervised classification using Naive Bayes, with a recall of 70% and a ROC AUC of 82%
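
For readers less familiar with the two metrics quoted above, here is a minimal sketch of how recall and ROC AUC are computed for a 1/0 response. It uses only the Python standard library and a small made-up set of labels and scores. The toy data, the 0.5 threshold, and the helper names `recall` and `roc_auc` are assumptions for illustration, and the numbers it produces are not the project's results.

```python
def recall(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def roc_auc(y_true, y_score):
    """Chance that a random positive case scores above a random negative case.

    Ties count as half a win. This pairwise form is equivalent to the area
    under the ROC curve, and is fine for small examples.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = heart disease, 0 = no heart disease.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities of heart disease from some classifier.
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.5]

# Assumed decision threshold: scores at or above it are predicted as 1.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in y_score]

model_recall = recall(y_true, y_pred)   # 2 of 3 positives are caught
model_auc = roc_auc(y_true, y_score)    # 13 of 15 positive/negative pairs ranked correctly

print(f"recall = {model_recall:.2f}, ROC AUC = {model_auc:.2f}")
```

Recall depends on the chosen threshold, while ROC AUC depends only on how the scores rank positives against negatives. In a screening setting, recall is often the more relevant of the two because a missed positive case is usually the costlier error. In practice, `recall_score` and `roc_auc_score` from `sklearn.metrics` compute the same quantities.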
The Deliverables: Jupyter notebooks of data wrangling through modeling, a final report, and a presentation
- Heart Disease Data Wrangling
- Heart Disease EDA
- Heart Disease Pre-processing and Modeling
- Final Report
- Model Metrics