This repository contains a Jupyter Notebook Bank Personal Loan Modelling.ipynb which demonstrates data exploration and the creation of various machine learning models using a sample dataset (Bank_Personal_Loan_Modelling.csv) that contains information about bank customers who either accepted or rejected the personal loan offer.
ID: Customer ID
Age: Age of the customer
Experience: Years of experience of the customer
Income: Annual income of the customer
ZIP Code: ZIP code of the customer's location
Family: Number of members in the customer's family
CCAvg: Average spending on credit cards per month
Education: Education level of the customer
Mortgage: Value of the house the customer lives in
Personal Loan: Whether the customer wants a personal loan or not (1=Yes, 0=No)
Securities Account: Whether the customer has a securities account with the bank or not (1=Yes, 0=No)
CD Account: Whether the customer has a certificate of deposit account or not (1=Yes, 0=No)
Online: Whether the customer uses online banking services or not (1=Yes, 0=No)
CreditCard: Whether the customer has a credit card or not (1=Yes, 0=No)
The following Python libraries are required to run this code:
pandas
numpy
matplotlib
seaborn
math
pandas_profiling
scipy
plotly
bubbly
sklearn
mlxtend
After cloning the repository, open the Jupyter Notebook Bank Personal Loan Modelling.ipynb in Jupyter and run the cells to execute the code. The notebook contains markdown cells with descriptions of the steps and the code cells are commented to explain what each line does.
In the notebook, the data is first explored using descriptive statistics, data visualization, and data cleaning techniques. This includes the creation of a function check() to check the dataframe for columns, data types, unique values, and null values, as well as the removal of the 'ID' column and cleaning the 'Experience' and 'ZIP Code' columns.
Several machine learning models are then created using scikit-learn, including K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine. Cross-validation is used to evaluate the models and the results are displayed in a table.
Finally, a visual representation of the relationship between age, experience, income, mortgage, and acceptance of personal loan offers is created using the bubbleplot() function from the bubbly library.