Capstone-Project--Credit Risk Analysis: Predicting Probability of Defaults Using Supervised Learning Models for Credit Risk Management.

This business problem is a supervised learning example for a credit card company. The objective is to predict the probability of default, that is, whether a customer will fail to pay the credit card bill, from the variables provided. The dataset contains multiple variables covering credit card account, purchase, and delinquency information, all of which can be used for modelling.

b) Need for the Study: Probability of Default modelling helps a lender understand how risky each customer is and how much credit is at stake if that customer defaults.

c) Understanding Business Opportunity: Assessing default risk is critical for any organization that lends money, whether through secured or unsecured loans.

The Data Dictionary provided for the Probability of Default dataset shows that the data, which is the transaction history of customers, covers three types of variables: Credit Card Account, Purchase, and Delinquency information. The information on these three variable types was collected over a span of 2 years. Most amount variables are expressed as ratios between amounts over the past 24 months and amounts as of the current date.

Project Tasks

Data Report

  • Understanding how data was collected in terms of time, frequency, and methodology
  • Visual inspection of data (rows, columns, descriptive details)
  • Understanding of attributes (variable info, renaming if required)
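
The visual inspection step (row and column counts, descriptive details, missing values) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code; in practice a library such as pandas would typically be used. The function `data_report` and the column names `utilisation_ratio` and `default` are hypothetical and do not come from the data dictionary.

```python
import statistics

def data_report(rows):
    """Summarise a dataset held as a list of dicts (one dict per row)."""
    columns = list(rows[0].keys()) if rows else []
    report = {"n_rows": len(rows), "n_cols": len(columns), "columns": {}}
    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        numeric = [v for v in present if isinstance(v, (int, float))]
        # Only report descriptive statistics for fully numeric columns.
        if numeric and len(numeric) == len(present):
            info.update(
                min=min(numeric),
                max=max(numeric),
                mean=statistics.mean(numeric),
                median=statistics.median(numeric),
            )
        report["columns"][col] = info
    return report

# Hypothetical rows; column names are illustrative only.
sample = [
    {"utilisation_ratio": 0.2, "default": 0},
    {"utilisation_ratio": 0.9, "default": 1},
    {"utilisation_ratio": None, "default": 0},
]
report = data_report(sample)
```

The returned dictionary gives the shape of the data plus per-column missing counts and basic statistics, which is the kind of information the Data Report is meant to capture.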

Exploratory data analysis

Univariate and Bivariate Analysis of All Variables

Checking and removing Anomalies from the Data:

  • Removal of unwanted variables (if applicable)
  • Missing Value treatment (if applicable)
  • Outlier treatment (if required)
  • Variable transformation (if applicable)
  • Addition of new variables (if required)
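
As a rough illustration of the missing value and outlier treatment steps above, the sketch below imputes missing entries with the median and caps extreme values using the interquartile range. Both choices are assumptions made for the example; the project may have used different treatments. The helper names `impute_median` and `cap_outliers_iqr` are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Illustrative values for a single numeric variable (not real data).
raw = [1.0, 2.0, None, 3.0, 4.0, 100.0]
filled = impute_median(raw)
capped = cap_outliers_iqr(filled)
```

Capping keeps every customer record in the dataset, which matters when defaulters are already scarce; dropping outlier rows is an alternative if the extreme values are known to be data errors.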

Business insights from EDA

  • Checking for Data Imbalance and, where needed, balancing the data using SMOTE
  • Business Insights using Clustering of the Data
  • Splitting the dataset into Train and Test Datasets
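
The balancing and splitting steps can be sketched as below. This is a simplified, standard-library version of the SMOTE idea (interpolating between a minority row and one of its nearest minority neighbours), not the implementation from the imbalanced-learn package and not necessarily what the project used. It assumes one common arrangement: split first, then oversample only the training portion so that no synthetic rows end up in the test set. The function names, the toy data, and the 10% default rate are all hypothetical.

```python
import math
import random

def stratified_split(X, y, test_frac=0.3, seed=0):
    """Split rows class by class so both parts keep the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)

def smote(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating from a random
    minority row towards one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Toy data: 100 customers, 2 features, 10% defaulters (label 1).
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [1] * 10 + [0] * 90

(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
minority = [x for x, lab in zip(X_train, y_train) if lab == 1]
n_needed = y_train.count(0) - y_train.count(1)
X_bal = X_train + smote(minority, n_needed)
y_bal = y_train + [1] * n_needed
```

After this, `X_bal` and `y_bal` hold a training set with equal class counts, while `X_test` and `y_test` keep the original imbalance for an honest evaluation.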

Applying various Supervised Machine Learning Models to identify Top Variables.

  • Using Parametric and Non-parametric Models and checking model performance using various Performance Metrics.
  • Applying Parameter Tuning where Necessary
  • Comparing Model performance using Various Metrics and Checking the Fit of the model.
  • Identifying the Top 10 Features by Importance, which are critical in identifying defaulting customers.
  • Insights and Recommendations on the basis of the model.
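
To make the model comparison step concrete, the sketch below computes a few metrics that are commonly used for imbalanced default prediction: precision, recall, F1, and ROC AUC. The source does not say which metrics the project used, so this selection is an assumption, and the labels and scores are made up for illustration. scikit-learn offers equivalents in `sklearn.metrics`; the plain-Python versions here only show what is being measured.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall and F1 for the positive (default = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Probability that a random defaulter gets a higher score than a
    random non-defaulter (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted default probabilities from some model.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Recall on the default class is usually worth watching closely in credit risk, since a missed defaulter tends to cost more than a wrongly flagged good customer; the right trade-off depends on the lender's own costs.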