This project implements a neural network model for a binary classification problem using Python and NumPy. The model is trained, validated, and tested on a cleaned heart disease dataset. Key features include gradient descent optimization, early stopping for improved generalization, and evaluation using metrics like accuracy, precision, recall, and F1 score.
The goal of this project is to demonstrate how to build a simple neural network with one hidden layer for binary classification. The project focuses on:
- Dividing the dataset into training, validation, and test sets.
- Training the model using gradient descent.
- Using early stopping based on validation loss to limit overfitting.
- Evaluating the model on unseen test data using standard metrics.
The dataset used is a preprocessed and cleaned heart disease dataset, containing numerical features and a binary target variable (0 or 1).
- Input Features: 13 features representing various health attributes.
- Output Target: Binary classification (0 = No disease, 1 = Disease).
- Data Split:
  - Training Set: 70%
  - Validation Set: 15%
  - Test Set: 15%
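
The 70/15/15 split described above can be sketched with a shuffled index permutation. This is only an illustration: the helper name `split_indices`, the seed, and the 1000-row placeholder data are assumptions, and the project itself lists scikit-learn for splitting, so its actual code may differ.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Return shuffled index arrays for the train/validation/test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # remainder, roughly 15%
    return train_idx, val_idx, test_idx

# Placeholder data: 13 features as in the dataset; the row count is made up.
X = np.random.default_rng(1).normal(size=(1000, 13))
y = np.random.default_rng(2).integers(0, 2, size=1000)

train_idx, val_idx, test_idx = split_indices(len(X))
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_val.shape, X_test.shape)
```

Fixing the seed keeps the three subsets reproducible between runs, which matters when validation loss is used to decide when to stop training.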
The neural network has the following structure:
- Input Layer: Accepts 13 input features.
- Hidden Layer: Contains 38 neurons with the sigmoid activation function.
- Output Layer: 1 neuron with the sigmoid activation function for binary classification.
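
A forward pass through a 13-38-1 network with sigmoid activations might look like the sketch below. The parameter names (`W1`, `b1`, `W2`, `b2`), the small random initialization, and the 0.5 decision threshold are assumptions made for illustration, not details taken from the project.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in=13, n_hidden=38, n_out=1, seed=0):
    """Small random weights and zero biases (initialization scheme is assumed)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros((1, n_hidden)),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros((1, n_out)),
    }

def forward(X, params):
    """Return hidden activations and output probabilities for a batch X."""
    a1 = sigmoid(X @ params["W1"] + params["b1"])   # shape (n, 38)
    a2 = sigmoid(a1 @ params["W2"] + params["b2"])  # shape (n, 1)
    return a1, a2

params = init_params()
X_batch = np.random.default_rng(1).normal(size=(5, 13))  # 5 placeholder samples
hidden, probs = forward(X_batch, params)
preds = (probs >= 0.5).astype(int)  # assumed threshold for class 1
print(hidden.shape, probs.shape)
```

Because the output neuron uses a sigmoid, each value in `probs` lies strictly between 0 and 1 and can be read as the predicted probability of class 1.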
Training and evaluation rely on the following techniques:
- Gradient Descent: Weights and biases are updated by gradient descent.
- Early Stopping: Validation loss is monitored, and training stops once it has not improved for 2000 epochs (the patience).
- Evaluation Metrics: Accuracy, precision, recall, and F1 score are calculated on the test set.
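
Patience-based early stopping can be sketched as a small helper that remembers the best validation loss and counts epochs without improvement. The class name `EarlyStopping`, its method `update`, and the toy loss values are hypothetical; the demo uses a patience of 3 instead of 2000 only to keep the example short.

```python
import copy

class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=2000):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_params = None
        self.epochs_without_improvement = 0

    def update(self, val_loss, params):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_params = copy.deepcopy(params)  # snapshot of the best weights
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Toy demonstration with made-up validation losses.
stopper = EarlyStopping(patience=3)
fake_val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
stopped_at = None
for epoch, val_loss in enumerate(fake_val_losses):
    params = {"epoch": epoch}  # stands in for the real weights and biases
    if stopper.update(val_loss, params):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss, stopper.best_params)
```

In the toy run the loop stops before reaching the final, lower loss of 0.5, which shows the trade-off: a short patience can end training too early, while a long patience such as 2000 epochs tolerates long plateaus at the cost of extra training time.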
After training the model with early stopping, the following results were achieved on the test set:
Metric | Value (%) |
---|---|
Accuracy | 90 |
Precision | 89 |
Recall | 93 |
F1 Score | 91 |
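These results indicate that the model performs well on the test set. Recall (93%) is slightly higher than precision (89%), so the model misses few positive cases at the cost of somewhat more false positives, and the F1 score of 91% reflects a balanced trade-off between the two.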
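
For reference, the four metrics can be computed from confusion-matrix counts as in the sketch below. The function name `binary_metrics` and the example labels are illustrative assumptions; they are not the project's test data and do not reproduce the figures in the table.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The guards against zero denominators return 0.0 when the model predicts no positives or the data contains none, which is one common convention and an assumption here.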
The implementation covers three stages:
- Training: Implemented forward and backward propagation using NumPy.
- Early Stopping: Monitored validation loss and saved the best weights during training.
- Testing: Evaluated the model on unseen test data and reported comprehensive metrics.
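
One full-batch gradient descent step with forward and backward propagation could be written as follows. The source does not state the loss function, so binary cross-entropy is assumed here; the learning rate, initialization, function names, and synthetic data are likewise assumptions for the sake of a runnable sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, p, eps=1e-12):
    """Binary cross-entropy (assumed loss), clipped for numerical safety."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train_step(X, y, W1, b1, W2, b2, lr=0.1):
    """One forward/backward pass and parameter update; returns new params and loss."""
    n = X.shape[0]
    # Forward propagation
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    loss = bce_loss(y, a2)
    # Backward propagation (sigmoid output + cross-entropy gives a2 - y)
    dz2 = (a2 - y) / n
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)  # sigmoid derivative of the hidden layer
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)
    # Gradient descent update
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 13))     # synthetic inputs with 13 features
y = (X[:, :1] > 0).astype(float)  # synthetic labels, shape (64, 1)
W1 = rng.normal(0.0, 0.1, size=(13, 38))
b1 = np.zeros((1, 38))
W2 = rng.normal(0.0, 0.1, size=(38, 1))
b2 = np.zeros((1, 1))

losses = []
for _ in range(500):
    W1, b1, W2, b2, loss = train_step(X, y, W1, b1, W2, b2)
    losses.append(loss)

print(losses[0], losses[-1])
```

In a complete training loop, the validation loss would be computed after each step and passed to the early-stopping check, with the best weights restored before testing.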
The project uses the following libraries:
- NumPy: For mathematical computations.
- Pandas: For data handling.
- scikit-learn: For preprocessing and splitting datasets.
Install dependencies via:

    pip install numpy pandas scikit-learn