Lending Club is a finance company that specialises in offering various types of loans to urban customers. When the company receives a loan application, it has to decide whether to approve the loan based on the applicant’s profile. Two types of risk are associated with the company’s decision:
- If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company
- If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company
When a person applies for a loan, there are two types of decisions that could be taken by the company:
- Loan accepted: If the company approves the loan, there are 3 possible scenarios, described below:
  - Fully paid: Applicant has fully paid the loan (the principal and the interest)
  - Current: Applicant is in the process of paying the instalments, i.e. the tenure of the loan is not yet completed. These candidates are not labelled as 'defaulted'.
  - Charged-off: Applicant has not paid the instalments in due time for a long period of time, i.e. he/she has defaulted on the loan
- Loan rejected: The company rejected the loan (because the candidate does not meet its requirements etc.). Since the loan was rejected, there is no transactional history of those applicants with the company, so this data is not available to the company (and thus not in this dataset)
This company is the largest online loan marketplace, facilitating personal loans, business loans, and financing of medical procedures. Borrowers can easily access lower interest rate loans through a fast online interface.
As with most other lending companies, lending to ‘risky’ applicants is the largest source of financial loss (called credit loss). Credit loss is the amount of money lost by the lender when the borrower refuses to pay or runs away with the money owed. In other words, borrowers who default cause the largest amount of loss to the lenders. In this case, the customers labelled as 'charged-off' are the 'defaulters'.
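The labelling rule above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the column name `loan_status`, the status strings, the helper `label_defaulters`, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; column names and status strings are assumptions.
loans = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
    "loan_amnt": [5000, 12000, 8000, 20000],
})

def label_defaulters(df, status_col="loan_status"):
    """Drop in-progress loans and add a 0/1 `defaulted` column."""
    # 'Current' loans have no final outcome yet, so they are excluded.
    closed = df[df[status_col] != "Current"].copy()
    # Charged-off loans are treated as defaults; everything else as repaid.
    closed["defaulted"] = (closed[status_col] == "Charged Off").astype(int)
    return closed

labelled = label_defaulters(loans)
print(labelled["defaulted"].mean())  # share of closed loans that defaulted
```

Encoding the outcome as 0/1 means the mean of `defaulted` within any group is directly that group's default rate.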
If these risky loan applicants can be identified, then such loans can be reduced, thereby cutting down the amount of credit loss. Identifying such applicants using EDA is the aim of this case study.
In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilise this knowledge for its portfolio and risk assessment.
To develop your understanding of the domain, you are advised to independently research a little about risk analytics (understanding the types of variables and their significance should be enough).
- Steps for EDA:
  - Data Understanding
  - Data Cleaning
  - Univariate Analysis
  - Bivariate Analysis
  - Multivariate Analysis
  - Conclusion
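As an illustration of the data cleaning step, the sketch below drops columns that cannot contribute to the analysis. It is a generic example under stated assumptions, not the cleaning actually performed in this project: the helper `basic_clean` and the sample columns (`empty_col`, `policy_code`) are hypothetical.

```python
import pandas as pd

def basic_clean(df):
    """Drop columns that carry no information for EDA."""
    # Remove columns where every value is missing.
    df = df.dropna(axis=1, how="all")
    # Remove columns with a single distinct value: they cannot explain default.
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant)

# Hypothetical raw data with one empty and one constant column.
raw = pd.DataFrame({
    "loan_amnt": [5000, 12000, 8000],
    "empty_col": [None, None, None],
    "policy_code": [1, 1, 1],
})
cleaned = basic_clean(raw)
print(list(cleaned.columns))  # ['loan_amnt']
```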
Continuous Variable:
- LOAN_AMOUNT : Loan amounts greater than 15000 dollars have a higher default rate
- FUNDED_AMOUNT : Funded amounts greater than 15000 dollars have a higher default rate
- FUNDED_AMOUNT_INVESTED : Funded amounts invested greater than 15000 dollars have a higher default rate
- INTEREST_RATE : As the interest rate increases, the default rate increases steeply
  - (5, 10] -> 6.739748%
  - (10, 15] -> 14.820089%
  - (15, 20] -> 24.826918%
  - (20, 25] -> 38.441558%
- ANNUAL_INCOME : As the annual income increases, the default rate decreases
- DTI : As DTI increases, the default rate increases
- MONTHS_SINCE_LAST_DELINQ : Borrowers whose last delinquency falls in the 90 to 110 range have a higher default percentage
- TERM : 60-month loans have a higher default rate than 36-month loans
- GRADE : As the grade decreases (A B C D E F G), the default rate increases
- SUB_GRADE : As the sub grade decreases (A1 A2 B1 B2.....), the default rate increases
- VERIFICATION STATUS : The percentage of defaulted loans is higher for verified borrowers
- PURPOSE : Small business borrowers have a high default rate
- PUBLIC_BANKRUPTCIES_RECORD : Borrowers with one or more public bankruptcies have a higher default rate
- STATE : The percentage of defaulted loans is very high for state NE and high for NV and SD
- EMPLOYEE TITLE : The following have the highest default rates among the top 20 titles by frequency:
  - walmart -> 25.242718%
  - united parcel service -> 22.448980%
  - united states postal service -> 21.118012%
  - other -> 20.243085%
  - at&t -> 18.47826%
- ZIP CODE : The following have the highest default rates among the top 20 zip codes by frequency:
  - 917xx -> 20.882353%
  - 331xx -> 20.771513%
  - 330xx -> 20.481928%
  - 913xx -> 18.867925%
  - 926xx -> 18.356164%
- INTEREST RATE AND PUBLIC BANKRUPTCIES RECORD : Borrowers with a lower interest rate and 2 public bankruptcies have defaulted
- INSTALLMENT AND PUBLIC BANKRUPTCIES RECORD : Borrowers with a higher installment and 2 public bankruptcies have defaulted
- ANNUAL INCOME AND PUBLIC BANKRUPTCIES RECORD : Borrowers with a higher income and 2 public bankruptcies have defaulted
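Default percentages like those listed above, per interval of a continuous variable and per category among the most frequent values, could be computed along the following lines. This is a sketch under assumptions rather than the project's own code: the column names `int_rate`, `emp_title`, and `defaulted` (a 0/1 flag), both helper functions, and the sample data are hypothetical.

```python
import pandas as pd

def default_rate_by_bin(df, col, bins, target="defaulted"):
    """Percentage of defaulted loans within each interval of `col`."""
    binned = pd.cut(df[col], bins=bins)  # right-closed intervals, e.g. (5, 10]
    # observed=True keeps only intervals that actually contain loans.
    return df.groupby(binned, observed=True)[target].mean() * 100

def default_rate_top_categories(df, col, n=20, target="defaulted"):
    """Default percentage for the `n` most frequent values of `col`."""
    top = df[col].value_counts().head(n).index
    subset = df[df[col].isin(top)]
    rates = subset.groupby(col)[target].mean() * 100
    return rates.sort_values(ascending=False)

# Hypothetical sample data, not taken from the Lending Club dataset.
sample = pd.DataFrame({
    "int_rate": [6.0, 8.0, 12.0, 14.0, 17.0, 19.0],
    "emp_title": ["a", "a", "b", "b", "b", "c"],
    "defaulted": [0, 0, 0, 1, 1, 1],
})

rates = default_rate_by_bin(sample, "int_rate", bins=[5, 10, 15, 20, 25])
top = default_rate_top_categories(sample, "emp_title", n=2)
print(rates)
print(top)
```

Restricting the category comparison to the most frequent values avoids ranking titles or zip codes that appear only a handful of times, where a single default would produce a misleadingly high percentage.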
- FUNDED_AMOUNT_INVESTED
- INTEREST_RATE
- ANNUAL_INCOME
- DTI
- TERM
- GRADE
- SUB_GRADE
- PURPOSE
- PUBLIC_BANKRUPTCIES_RECORD
- MONTHS_SINCE_LAST_DELINQ
- STATE
- EMPLOYEE TITLE
- ZIP CODE
- pandas - 1.3.4
- numpy - 1.20.3
- matplotlib - 3.4.3
- seaborn - 0.11.2
- plotly - 5.8.0
- This project was a group case study for an advanced online course.
- https://www.geeksforgeeks.org/
- https://seaborn.pydata.org/
- https://plotly.com/
- https://pandas.pydata.org/
- https://learn.upgrad.com/
Created by [@darshil2848] - feel free to contact me!