Analyzing Important Factors in the Loan Approval Process

Overview

This project looks into the factors that hold the most weight when an applicant’s loan is approved or denied within the state of California. The motivation for this is to provide more transparency regarding the loan approval process, especially in a time where more and more Californians are looking towards taking out loans to afford buying a home. The data set used in this project is taken from the Consumer Financial Protection Bureau. Factors such as the applicant’s race, income, locale, and sex are analyzed. Both decision tree and k-means clustering machine learning models are used to predict whether given loan applications are approved or denied.

Dataset [1]

The data set used in this project comes from the Consumer Financial Protection Bureau. This data is collected under the Home Mortgage Disclosure Act (HMDA) and contains information regarding individual loan applications [1]. In particular, the data set for this project only contains information on mortgages for single family homes filed in 2017 within the state of California.

Data Analysis

We plotted the approval versus denial ratio for each given race in the figure below. For each race, the rates of approved loans versus denied loans are relatively equivalent in value. This leads to the conclusion that there is no racial bias when determining loan approval or denial.

Approval vs. denial rates for each race in the data set

Approval rate by county

Approaches

Decision Tree Model [2]

Decision tree machine learning models are non-parametric supervised models used for regression and classification problems. Our goal is to create a model that can predict whether a loan is approved or denied by learning simple decision rules inferred from the training data features. This model is able to handle both numeric variables (the income of the applicant and the loan amount) and categorical variables (the sex, the race, and the county of the applicant). Furthermore, it uses a white box model, hence we can visualize it and it can be explained easier than a black box model such as the ones used in artificial neural networks.

K-Means Model [3]

K-means machine learning models are unsupervised models, meaning that the training data that goes in to the model is not labeled. Essentially, the model does not know in our case which data values constitute approved loans, and which are denied loan applications. The model must figure out on its own how to distinguish approved loans from denied loans. Upon initializing the k-means algorithm, we determine a set number of centroids to cluster our data around, In our case, we use two centroids for approved and denied loan applications.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
graphs		graphs
Neil_Chen_Callie_Hartzog_Analyzing_Important_Factors_in_the_Loan_Approval.pdf		Neil_Chen_Callie_Hartzog_Analyzing_Important_Factors_in_the_Loan_Approval.pdf
README.md		README.md
main.ipynb		main.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analyzing Important Factors in the Loan Approval Process

Overview

Dataset [1]

Data Analysis

Approaches

Results

Authors

Reference

About

Releases

Packages

Languages

neilchen1998/loan-approval-analysis

Folders and files

Latest commit

History

Repository files navigation

Analyzing Important Factors in the Loan Approval Process

Overview

Dataset [1]

Data Analysis

Approaches

Results

Authors

Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages