GitHub - databysara/machine-learning-classification

My Linear Classification Submission

Challenge

Use the Lending Club data set to predict whether a loan will be "Fully Paid" or "Charged Off".

What I did?

I developed a template for linear classification projects, including a hyperlinked table of contents for easy navigation.

Import and Inspect Data:

This helped me understand the data. I researched the meaning of some of the terms, such as loan grade.
I performed an initial linear correlation on the uncleaned data, then re-ran the linear correlation a couple of times after data cleaning.
- To keep the analysis manageable, I separated the analysis of numerical values (Interest Rate, Annual Income and Loan Amount) from the highly correlated values (Interest Rate, Loan Grade and Loan Term).
At this stage I also made decisions on:
- One-hot encoding:
  - How to group the data, e.g. for the purpose of the loan I grouped credit card payments and debt consolidation vs. all others.
  - *IMPROVEMENT: In hindsight I should have split purpose into more groups.
  - Loan grade was converted from the letters A to G to the numbers 6 to 0.
  - A histplot showed that, to bring the annual income distribution closer to normal, I had to limit income to 300,000.
    - *IMPROVEMENT: Use a log scale.
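
The encoding decisions above can be sketched in plain Python. This is only an illustration, not the notebook code: the field names `grade`, `purpose` and `annual_inc` and the raw purpose labels are assumptions, and the log transform is the suggested improvement rather than what was actually done (which was a cap at 300,000).

```python
import math

# Assumed ordinal mapping: grade A (best) -> 6 ... grade G (worst) -> 0.
GRADE_TO_SCORE = {grade: 6 - i for i, grade in enumerate("ABCDEFG")}

# Assumed raw labels for the two purposes that were grouped together.
DEBT_PURPOSES = {"credit_card", "debt_consolidation"}


def encode_loan(row):
    """Return numeric features for one loan record given as a plain dict."""
    return {
        "grade_score": GRADE_TO_SCORE[row["grade"]],
        # 1 = credit card / debt consolidation, 0 = every other purpose
        "purpose_is_debt": int(row["purpose"] in DEBT_PURPOSES),
        # log1p compresses the long right tail of income instead of cutting it off
        "log_annual_inc": math.log1p(row["annual_inc"]),
    }


# Hypothetical record, for illustration only.
sample = {"grade": "B", "purpose": "credit_card", "annual_inc": 85000.0}
encoded = encode_loan(sample)
print(encoded)
```

Splitting purpose into more groups would mean replacing the single `purpose_is_debt` flag with one flag per group.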

Data Cleaning

I then carried out the data cleaning based on the decisions made during inspection.
Once the data was cleaned, I took only those parameters with an absolute correlation higher than 0.1 into the regression model.
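
A minimal sketch of that correlation filter, assuming the correlation is measured against the target (1 = "Fully Paid", 0 = "Charged Off"). The column names and values are toy data invented for illustration, and the notebooks may have used a library function rather than a hand-written Pearson coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.1):
    """Keep the columns whose absolute correlation with the target exceeds the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) > threshold
    ]


# Toy data: 1 = "Fully Paid", 0 = "Charged Off" (assumed coding).
target = [1, 1, 0, 1, 0, 1, 1, 0]
columns = {
    "int_rate": [7.0, 8.5, 19.0, 9.0, 21.5, 11.0, 6.5, 17.0],
    "noise": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 4.0],
    "constant": [1.0] * 8,
}

selected = select_features(columns, target)
print(selected)
```

Only `int_rate` survives here, because the other two toy columns have no linear relationship with the target.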

Visualization

I used a confusion matrix to visualize the results.
All or most predictions were positive, i.e. there was a high number of false positives, so the model is unable to predict whether someone could not pay.
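
The pattern described above can be reproduced with a small, hypothetical example in which the positive class is assumed to be "Fully Paid" (1) and the model predicts positive for every loan. The labels below are made up for illustration and are not results from the notebooks.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


# Made-up labels: 8 fully paid loans (1) and 2 charged-off loans (0).
y_true = [1] * 8 + [0] * 2
# A model that predicts "Fully Paid" for everyone.
y_pred = [1] * 10

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
# Share of charged-off loans that the model actually catches.
charged_off_recall = counts["tn"] / (counts["tn"] + counts["fp"])

print(counts, accuracy, charged_off_recall)
```

Accuracy looks respectable on such imbalanced data, yet every charged-off loan ends up as a false positive, which is why the confusion matrix is more informative than accuracy alone.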

Model Building

This was very challenging. I was unable to find a model with an AUC higher than 0.5.
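
For context on what an AUC of 0.5 means, here is a small sketch that computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The scores are invented for illustration; the point is only that a model giving every loan the same score lands at exactly 0.5, which is one possible reading of the result above.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [1, 1, 0, 1, 0]

# Every loan gets the same score: no ranking information at all.
constant_auc = roc_auc(y_true, [0.8, 0.8, 0.8, 0.8, 0.8])

# Invented scores that rank every fully paid loan above every charged-off loan.
informative_auc = roc_auc(y_true, [0.9, 0.8, 0.3, 0.7, 0.4])

print(constant_auc, informative_auc)
```

The pairwise loop is quadratic in the number of loans, so it is only suitable for small illustrations; a library implementation would be the practical choice on the full data set.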

Recommendations/Next Steps

Try decision tree and random forest models.
Reinspect and clean the data, e.g. improve the one-hot encoding for purpose and other parameters.
Research whether any of the other parameters are in fact dependent variables, e.g. loan grade, and consider asking a different question. (If possible, the client should be involved in this exercise.)

Folders and files

- data
- LendingClub-HighCorrelation.ipynb
- LendingClub-NumericalValues.ipynb
- LinearClassificationNotes.ipynb
- README.md
