This repository contains a Jupyter Notebook that explores and builds machine learning models for predicting heart disease. The dataset used is derived from the UCI Machine Learning Repository and includes various medical attributes such as age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, and more.
Data Loading and Exploration: The notebook begins by loading the dataset and performing exploratory data analysis (EDA) to understand the distribution of features and their relationships with the target variable.
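A minimal sketch of the loading and exploration step, assuming the data is available as a CSV with a binary `target` column. The column names follow a common layout for the UCI heart disease data but are assumptions here, and the rows are made-up illustrative values, not records from the dataset.

```python
import io

import pandas as pd

# Illustrative rows only: column names and values are assumptions,
# not taken from the notebook or the real dataset.
csv_text = """age,sex,cp,trestbps,chol,fbs,target
63,1,3,145,233,1,1
37,1,2,130,250,0,1
41,0,1,130,204,0,1
56,1,1,120,236,0,1
57,0,0,120,354,0,1
67,1,0,160,286,0,0
67,1,0,120,229,0,0
62,0,0,140,268,0,0
"""

# In the notebook this would be pd.read_csv on the dataset file.
df = pd.read_csv(io.StringIO(csv_text))

shape = df.shape                              # (rows, columns)
missing = df.isna().sum()                     # missing values per column
class_counts = df["target"].value_counts()    # class balance of the target

# Linear correlation of each feature with the target, weakest to strongest.
corr_with_target = df.corr()["target"].drop("target").sort_values()

print(shape)
print(missing)
print(class_counts)
print(corr_with_target)
```

Checking the class balance and per-column missing counts early helps decide which preprocessing steps and evaluation metrics are appropriate later on.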
Data Preprocessing: The data is prepared for modeling by handling missing values, encoding categorical variables, and scaling numerical features.
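One way these three steps could be combined is with a scikit-learn `ColumnTransformer`. This is a sketch under assumptions, not necessarily how the notebook does it: the column names, the toy values, and the choice of median and most-frequent imputation are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative frame: names and values are assumptions, with one missing entry.
X = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 67],
    "chol": [233.0, 250.0, np.nan, 236.0, 354.0, 286.0],
    "cp": [3, 2, 1, 1, 0, 0],
})
numeric_cols = ["age", "chol"]
categorical_cols = ["cp"]

# Numeric features: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 makes the transformer always return a dense array.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,
)

X_prepared = preprocessor.fit_transform(X)
print(X_prepared.shape)
```

In practice the preprocessor should be fitted on the training split only and then applied to the test split, so that statistics from the test data do not leak into training.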
Model Building and Evaluation: Several machine learning models are trained and evaluated, including:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
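The five model families listed above could be trained side by side roughly as follows. This sketch uses synthetic data from `make_classification` as a stand-in for the heart disease features, and the hyperparameters are scikit-learn defaults or arbitrary assumptions, not values taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

# Fit each model on the training split and record its test accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

KNN, SVM, and logistic regression are sensitive to feature scale, so on the real data they would normally be trained on the scaled features.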
Model Performance: The performance of each model is assessed using metrics such as accuracy, precision, recall, and F1-score. Confusion matrices and classification reports are also provided.
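The metrics named above can be computed with `sklearn.metrics`. The sketch below evaluates a single logistic regression model on synthetic data; the data and the model choice are assumptions for illustration, and the same calls apply to any fitted classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Per-class precision, recall, and F1 as a formatted text table.
report = classification_report(y_test, y_pred)

print(metrics)
print(cm)
print(report)
```

For a screening task like heart disease prediction, recall on the positive class is often worth reading alongside accuracy, since a missed case is usually costlier than a false alarm.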
To run the notebook, you need to have the following libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
You can install these libraries using pip:
pip install pandas numpy matplotlib seaborn scikit-learn