This project is about a bank with a growing customer base, the majority of whom are depositors with deposits of varying sizes. A smaller group of customers are borrowers (asset customers). The bank wants to expand this asset-customer base, and the interest income it earns from personal loans, by converting depositors (potential customers) into personal loan customers while retaining them as depositors. A campaign the bank ran last year for these potential customers achieved a healthy conversion rate of over 9%. This encouraged the retail marketing department to design better-targeted campaigns that increase the success ratio while reducing campaign cost, using a model that identifies the potential customers with the highest probability of purchasing a loan.
- Python version: 3.6.9
- Packages: NumPy, Matplotlib, pandas, jcopml, scikit-learn
After loading the data, I cleaned it up so that it was usable for the machine learning model. Several values in the Experience feature are below 0, so I replaced them with 0 (these values could also be imputed with the mean or median).
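A minimal cleaning sketch with pandas; the file name and the `Experience` column name are assumptions based on the commonly used Bank Personal Loan Modelling dataset:

```python
import pandas as pd

# File name and column name "Experience" are assumptions.
df = pd.read_csv("Bank_Personal_Loan_Modelling.csv")

# Replace negative experience values with 0
# (imputing with the mean or median would also work).
df.loc[df["Experience"] < 0, "Experience"] = 0
```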
The exploratory data analysis shows that the target label (Personal_Loan) is imbalanced, so the models need class weighting and should be scored with the F1 metric. I also plotted three of the features that influence the customer's decision: education level, income, and average monthly credit card spending, as shown below.
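A short sketch of how the class balance and these three features could be inspected; the column names `Education`, `Income`, and `CCAvg` are assumptions:

```python
import matplotlib.pyplot as plt

# Inspect the class balance of the target.
print(df["Personal_Loan"].value_counts(normalize=True))

# Quick comparison of the three influential features against the target;
# the column names "Education", "Income", and "CCAvg" are assumptions.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
df.groupby("Education")["Personal_Loan"].mean().plot(kind="bar", ax=axes[0], title="Education")
df.boxplot(column="Income", by="Personal_Loan", ax=axes[1])
df.boxplot(column="CCAvg", by="Personal_Loan", ax=axes[2])
plt.tight_layout()
plt.show()
```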
After splitting the dataset (train-test ratio = 80:20) and preprocessing it (encoding the categorical features), I built machine learning models using three algorithms (a pipeline sketch follows the list):
- Logistic Regression = used as the baseline model because of its simplicity and its ability to handle an imbalanced dataset
- Random Forest Classifier = used because it is an ensemble of low-bias, high-variance decision trees (which individually are prone to overfitting), making it a good fit for a dataset with a large number of data points and a relatively small number of features
- XGBoost Classifier = used because it is a tree-based boosting ensemble built from trees that individually have low bias and high variance (and are prone to overfitting), so it also fits a dataset with a large number of data points and a relatively small number of features
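A minimal pipeline sketch with plain scikit-learn (the original project uses jcopml helpers); the feature lists and the assumed categorical columns `Education` and `Family` are illustrative assumptions:

```python
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Feature/target split; the exact column names are assumptions.
X = df.drop(columns="Personal_Loan")
y = df["Personal_Loan"]

# 80:20 train-test split, stratified because the target is imbalanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Encode the categorical features, scale the numeric ones.
categorical = ["Education", "Family"]          # assumed categorical columns
numeric = [c for c in X.columns if c not in categorical]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# class_weight="balanced" compensates for the imbalanced target.
model = Pipeline([
    ("prep", preprocessor),
    ("clf", RandomForestClassifier(class_weight="balanced", random_state=42)),
])
model.fit(X_train, y_train)
```

The same pipeline skeleton can be reused for the other two algorithms by swapping the final estimator.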
The Random Forest Classifier performed best among the three models. The results are scored with the F1 metric because of the imbalanced data (a short scoring sketch follows the list).
- Logistic Regression f1 score = 0.78
- Random Forest Classifier f1 score = 0.94
- XGBoost Classifier f1 score = 0.92
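A short sketch of how such scores can be computed on the hold-out test set, continuing from the pipeline sketch above:

```python
from sklearn.metrics import f1_score, classification_report

# Evaluate the fitted pipeline on the 20% test split.
y_pred = model.predict(X_test)
print("F1 score:", round(f1_score(y_test, y_pred), 2))
print(classification_report(y_test, y_pred))
```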