- NumPy
- pandas
- Matplotlib
- Seaborn
This dataset contains information on 113,937 loans with 81 variables from Prosper Loan, a peer-to-peer personal loan lending company. The dataset was explored to help identify pivotal variables in loan completion.
You can download the dataset from this URL, and the dataset description file can be accessed through this link.
The dataset originally had 113,937 observations and 81 variables; it was subset to 55,089 observations and 17 variables.
The reduction focuses the analysis on variables that might help predict the outcome of a loan (completed, charged off, defaulted, or canceled) and so help determine which loan applications should be approved.
Only these 17 of the 81 variables appeared pivotal to that objective and were selected for further exploration.
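The subsetting step can be sketched with pandas as below. This is a minimal illustration on a toy frame, not the project's actual code: the column names (`LoanStatus`, `Term`, `DebtToIncomeRatio`, `ListingKey`) and the status spellings are assumptions, and the real list of 17 columns is not given in this README.

```python
import pandas as pd

# Toy stand-in for the raw data (the real file has 113,937 rows and 81 columns).
# Column names and status spellings here are assumptions for illustration.
raw = pd.DataFrame({
    "LoanStatus": ["Completed", "Current", "Chargedoff",
                   "Defaulted", "Cancelled", "Past Due (1-15 days)"],
    "Term": [12, 36, 36, 60, 36, 36],
    "DebtToIncomeRatio": [0.10, 0.25, 0.40, 0.55, 0.20, 0.30],
    "ListingKey": list("abcdef"),  # example of a column the subset drops
})

# Hypothetical selections: keep only loans with a final outcome,
# and only the columns relevant to the analysis objective.
columns_of_interest = ["LoanStatus", "Term", "DebtToIncomeRatio"]
final_outcomes = ["Completed", "Chargedoff", "Defaulted", "Cancelled"]

mask = raw["LoanStatus"].isin(final_outcomes)
loans = raw.loc[mask, columns_of_interest].reset_index(drop=True)

print(loans.shape)  # (4, 3)
```

Filtering rows on the outcome first and selecting columns in the same `.loc` call keeps the subset in a single step and avoids chained-assignment warnings later on.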
Based on the analysis, the following findings were observed:
- Borrowers whose listing category and employment status are recorded as "Not Available", and whose income range is recorded as "Not displayed", are more prone to default on loans.
- Homeowners and non-homeowners have a similar distribution in loan completion and defaults.
- The number of recommendations a borrower receives is positively correlated with loan completion.
- Borrowers with a good (low) debt-to-income (DTI) ratio tend to complete their loans.
- Loans with a 1-year term have the highest completion rate and fewer defaults than longer-term loans.
- Borrowers with a listing category of "Auto" and auto-related values such as motorcycles, boats, and RVs also have a good completion rate.
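Findings like the completion rate per loan term can be checked with a normalized cross-tabulation. The sketch below uses a small made-up frame, so its numbers are not the project's results; the column names `Term` and `LoanStatus` are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only; not the project's data or results.
loans = pd.DataFrame({
    "Term": [12, 12, 36, 36, 36, 60, 60, 60],
    "LoanStatus": ["Completed", "Completed", "Completed", "Defaulted",
                   "Chargedoff", "Completed", "Defaulted", "Chargedoff"],
})

# Share of each outcome within each term; normalize="index" makes rows sum to 1.
outcome_share = pd.crosstab(loans["Term"], loans["LoanStatus"], normalize="index")

# Completion rate per term, highest first.
completion_rate = outcome_share["Completed"].sort_values(ascending=False)
print(completion_rate)
```

The same pattern applies to any categorical variable in the findings (listing category, homeownership, income range) by swapping the first argument of `pd.crosstab`.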
For the presentation, the following key insights will be highlighted:
- Variables suspected to influence loan outcome were selected, and the dataset was subset to them for analysis.
- The variable of interest, loan outcome, was introduced along with other variables that individually influence its values, such as listing category, borrower homeownership status, number of recommendations, and debt-to-income ratio.
- These variables were explored and plotted against each other using clustered bar charts, faceted histograms, point plots, and scatter plots. Patterns were observed, such as borrowers with good DTI and a low number of recommendations being more likely to complete their loans. Additionally, a group of homeowners who took loans above $25,000 without defaulting was identified.
By presenting these key insights, the audience will gain a clear understanding of the factors that contribute to loan completion and the potential predictors of loan outcomes.
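Two of the chart types mentioned above, a clustered bar chart and a scatter plot, can be sketched with Seaborn as follows. The data is made up and the column names (`Homeowner`, `LoanStatus`, `DebtToIncomeRatio`, `LoanOriginalAmount`) and the output file name are assumptions, so the figure only illustrates the plotting approach, not the project's findings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; column names are assumptions.
loans = pd.DataFrame({
    "Homeowner": ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "LoanStatus": ["Completed", "Defaulted", "Completed", "Chargedoff",
                   "Completed", "Completed", "Chargedoff", "Defaulted"],
    "DebtToIncomeRatio": [0.10, 0.45, 0.15, 0.50, 0.12, 0.20, 0.40, 0.60],
    "LoanOriginalAmount": [26000, 8000, 5000, 4000, 30000, 7000, 12000, 3000],
})

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(12, 4))

# Clustered bar chart: loan outcome counts, split by homeownership.
sns.countplot(data=loans, x="LoanStatus", hue="Homeowner", ax=ax_bar)
ax_bar.set_title("Loan outcome by homeownership")

# Scatter plot: DTI against loan amount, colored by outcome.
sns.scatterplot(data=loans, x="DebtToIncomeRatio", y="LoanOriginalAmount",
                hue="LoanStatus", ax=ax_scatter)
ax_scatter.set_title("DTI vs. loan amount")

fig.tight_layout()
fig.savefig("loan_outcome_plots.png")  # hypothetical output file name
```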