Machine Learning Project: A classification problem using K Nearest Neighbor (KNN), Decision Tree, Support Vector Machine, and Logistic Regression algorithms
- Overview
- About Dataset
- Algorithms and Technologies
- Evaluation
- A Few Takeaways
- References
- Author Info
In this mini project, we practice the classification algorithms covered in the Machine Learning with Python course. We load a dataset with the Pandas library, apply four classification algorithms to it, and identify the best one for this specific dataset using accuracy evaluation metrics.
This dataset is about past loans. The Loan_train.csv data set contains details of 346 customers whose loans have either been paid off or defaulted. It includes the following fields:
Field | Description |
---|---|
Loan_status | Whether a loan is paid off or in collection |
Principal | Basic principal loan amount at the |
Terms | Origination terms, which can be a weekly (7 days), biweekly, or monthly payoff schedule |
Effective_date | When the loan was originated and took effect |
Due_date | Since it's a one-time payoff schedule, each loan has a single due date |
Age | Age of applicant |
Education | Education of applicant |
Gender | The gender of applicant |
You can download the dataset Loan_train.csv by clicking here
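As a minimal sketch of the loading step, the snippet below reads a CSV with Pandas and parses the two date fields. The column names follow the table above; the header capitalization and date format in the real file may differ, and the two rows in `SAMPLE_CSV` are made-up placeholders, not records from Loan_train.csv. The helper name `load_loans` is hypothetical.

```python
import io

import pandas as pd

# Hypothetical rows in the documented column layout (not real records).
SAMPLE_CSV = """Loan_status,Principal,Terms,Effective_date,Due_date,Age,Education,Gender
PAIDOFF,1000,30,9/8/2016,10/7/2016,45,High School or Below,male
COLLECTION,800,15,9/9/2016,9/23/2016,29,college,female
"""


def load_loans(source):
    """Read the loan CSV and convert the two date columns to datetimes."""
    df = pd.read_csv(source)
    # Assumption: dates are stored as month/day/year strings.
    for col in ("Effective_date", "Due_date"):
        df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")
    return df


# For the real file, pass a path instead: load_loans("Loan_train.csv")
df = load_loans(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Parsing the dates up front makes it easy to derive features such as the loan duration in days (`df["Due_date"] - df["Effective_date"]`).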
We used:
- Python (Pandas, seaborn, matplotlib, numpy) and the amazing ML library Scikit-learn
- IBM Cloud (Watson Studio, Jupyter Notebook)
To build our models, we use the following algorithms:
- K Nearest Neighbor (KNN)
- Decision Tree
- Support Vector Machine
- Logistic Regression
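The four algorithms listed above can be fitted with Scikit-learn along these lines. This is only a sketch: it uses synthetic data from `make_classification` in place of the loan features, and the hyperparameters (`n_neighbors=7`, `max_depth=4`, the RBF kernel, `C=0.01`), the split ratio, and the random seeds are illustrative assumptions, not the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed loan features (assumption).
X, y = make_classification(n_samples=346, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Illustrative hyperparameters; tune them for the real dataset.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Decision Tree": DecisionTreeClassifier(
        criterion="entropy", max_depth=4, random_state=0
    ),
    "SVM": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(C=0.01, solver="liblinear"),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    predictions[name] = model.predict(X_test_s)
```

Scaling matters most for KNN and SVM, which rely on distances between samples; the decision tree is unaffected by it.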
The report below shows the scores of all the models we built under three evaluation metrics (NA marks a metric that was not reported for that model):
Algorithm | Jaccard | F1-score | LogLoss |
---|---|---|---|
KNN | 0.67 | 0.63 | NA |
Decision Tree | 0.76 | 0.77 | NA |
SVM | 0.80 | 0.76 | NA |
Logistic Regression | 0.74 | 0.70 | 0.67 |
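For reference, the three metrics in the table can be computed with Scikit-learn roughly as follows. The labels and probabilities here are a tiny made-up example, and the choice of `average="weighted"` for the F1-score and of class 1 as the positive class for Jaccard are assumptions; the README does not say which settings produced the reported numbers. Log loss needs predicted probabilities rather than hard labels, which is presumably why it appears only for Logistic Regression.

```python
from sklearn.metrics import f1_score, jaccard_score, log_loss

# Made-up ground truth and predicted probabilities of class 1.
y_true = [1, 1, 1, 0, 0, 1]
y_proba = [0.9, 0.8, 0.4, 0.2, 0.6, 0.7]

# Hard predictions obtained by thresholding the probabilities at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in y_proba]

scores = {
    # Jaccard index of the positive class: TP / (TP + FP + FN).
    "Jaccard": jaccard_score(y_true, y_pred, pos_label=1),
    # F1 averaged over both classes, weighted by class support (assumption).
    "F1-score": f1_score(y_true, y_pred, average="weighted"),
    # Log loss is computed from probabilities, not from hard labels.
    "LogLoss": log_loss(y_true, y_proba),
}

for metric, value in scores.items():
    print(f"{metric}: {value:.2f}")
```

Higher is better for Jaccard and F1-score, while lower is better for log loss, so the three columns should not be compared against each other directly.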
- LinkedIn - Nour Eddine ZEKAOUI
- You can view my verified achievement from Coursera and IBM: Coursera Certification - IBM Badge