credit_risk

Credit Risk Evaluation using Machine Learning

Abstract

This project challenges the traditional credit assessment process by using alternative data sources to determine creditworthiness for individuals with limited or no credit history. The objective is to give these individuals access to credit.

The approach uses diverse data points, such as annual income, product price, population demographics, and age of phone number, to evaluate an individual's creditworthiness. Data consistency is checked using data visualization and exploratory data analysis.

Relevant features are identified through a correlation analysis of every feature with the target variable, a binary indicator of credit default. Data wrangling techniques are applied to fix any data issues that arise.
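
The correlation step can be illustrated with a minimal, dependency-free sketch. It is not the notebook's code: the column names only echo the abstract, the values are made up, and `pearson` and `rank_features` are hypothetical helper names. With a binary target, the Pearson coefficient computed here is the point-biserial correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data (assumption): names follow the abstract, values are invented.
columns = {
    "annual_income": [20, 35, 50, 65, 80, 95],
    "phone_number_age": [1, 1, 2, 6, 7, 9],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
default = [1, 1, 1, 0, 0, 0]  # 1 = defaulted, 0 = repaid

ranking = rank_features(columns, default)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Ranking by absolute value keeps strongly negative relationships near the top, while a column with no variance scores zero and falls to the bottom of the list.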

The ultimate goal is to build a machine learning model that predicts credit default accurately. Models are evaluated with cross-validation scores, accuracy, confusion matrices, precision, and recall.
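
As a small worked example of three of these metrics, the sketch below derives confusion-matrix counts, precision, and recall from a list of true and predicted labels. The labels are invented for illustration and the function names are hypothetical; label 1 is assumed to mean "defaulted".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (default) as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # defaulter correctly flagged
        elif predicted == 1 and actual == 0:
            fp += 1  # non-defaulter wrongly flagged
        elif predicted == 0 and actual == 1:
            fn += 1  # defaulter missed
        else:
            tn += 1  # non-defaulter correctly cleared
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels (assumption): 4 defaulters and 6 non-defaulters.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Precision answers "of the applicants flagged as defaulters, how many really defaulted?", while recall answers "of the real defaulters, how many were flagged?".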

Data visualization revealed some interesting observations such as more female applicants, more single applicants, more married applicants, etc. Further analysis showed a class imbalance in the target variable, which could degrade model performance, so it was addressed with the synthetic minority oversampling technique (SMOTE).

Using the Extreme Gradient Boosting (XGBoost) algorithm, a highly accurate model for predicting credit default was developed. This work has the potential to transform the credit assessment process and provide previously underserved individuals with access to credit.

Final Model

The final model is the Extreme Gradient Boosting algorithm (XGBoost), trained after applying the synthetic minority oversampling technique (SMOTE).
It was correct 98% of the time in identifying credit defaulters (precision).
It was able to identify 94% of the possible credit defaulters (recall).
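
To show what synthetic minority oversampling does, here is a simplified standard-library sketch of the core idea: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbors. This is an illustration under assumptions, not the project's code; the `smote` function, its parameters, and the toy points are hypothetical.

```python
import random
from math import dist

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbors of the base point, excluding the point itself
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbors = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Toy feature vectors (assumption): 4 defaulters versus 10 non-defaulters.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
majority_count = 10

new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), "minority rows after oversampling")
```

Because the synthetic rows are interpolations rather than copies, they stay inside the region spanned by the real minority samples. A common caution, not stated in the source, is to oversample only the training split so that evaluation data contains real rows only.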

Conclusion and real-world reflections

The best model for identifying credit defaulters (XGBoost with SMOTE) was selected using model evaluation metrics such as precision and recall.
Simplicity is sometimes preferred over accuracy: regulatory compliance can require model predictions to be interpretable and attributable to specific features, which would favor an interpretable model over a more accurate one.
Using ad hoc data to assess the creditworthiness of individuals with limited or no credit history raises privacy concerns about how that data is gathered.

Future Scope

Deploying the final ML model
Modeling a continuous variable to generate a custom credit score.
Using ad hoc data from the bureau to further improve the model.
Building a dynamic model based on geospatial information.
Using data scraped from bank statements.

