This project predicts credit risk by classifying which customers are likely to repay their loans on time and which are not.
It covers the following concepts:
- 🔍 Exploratory Data Analysis (EDA):
  - Analyze the dataset to uncover patterns, spot anomalies, and check assumptions using statistical summaries and visualizations.
- 🛠️ Data Preprocessing:
  - Clean the data, handle missing values, encode categorical variables, normalize data, and split into training and testing sets.
- ⭐ Feature Importance:
  - Identify key features influencing `loan_status` to improve model performance and interpretability.
- 🔽 Dimensionality Reduction:
  - Use techniques like PCA to reduce the number of features while retaining essential information, which speeds up model training and can reduce overfitting.
- 🤖 Predictive Modeling:
  - Build various models (logistic regression, decision trees, random forests, gradient boosting) to predict `loan_status`.
- ⚙️ Hyperparameter Optimization:
  - Perform hyperparameter tuning with Optuna to find the best parameters for our models.
- 🧪 Model Testing:
  - Evaluate models on unseen data using accuracy, precision, recall, and F1 score, summarized in a classification report.
- ⛰️ Git LFS:
  - Handle large files using Git LFS.
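
To illustrate the Model Testing step, here is a minimal sketch of how accuracy, precision, recall, and F1 score are derived from a model's predictions. It uses only the Python standard library. The function name `classification_metrics`, the sample labels, and the convention that `loan_status` = 1 is the positive class are assumptions made for this example, not taken from the project's notebook.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    `positive` is the label treated as the positive class
    (assumed here to be loan_status == 1).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical held-out labels and predictions, for illustration only
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters for credit data, where one class is often much rarer than the other and accuracy alone can look good for a model that rarely predicts the rare class.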
The primary goal of this project is to predict the `loan_status`, determining the likelihood of customers paying their loans on time. This helps financial institutions make informed loan approval decisions and manage risk effectively.
View this project on nbviewer here.