Merge pull request #699 from adi271001/Bank-Credit-Analysis

Bank credit analysis
abhisheks008 · Jul 12, 2024 · e5e7aec
2 parents b6c2408 + 73f107d
Showing 21 changed files with 11,298 additions and 0 deletions.
diff --git a/Bank Credit Analysis/Dataset/bank.csv b/Bank Credit Analysis/Dataset/bank.csv
diff --git a/Bank Credit Analysis/Images/__results___10_0.png b/Bank Credit Analysis/Images/__results___10_0.png
diff --git a/Bank Credit Analysis/Images/__results___11_1.png b/Bank Credit Analysis/Images/__results___11_1.png
diff --git a/Bank Credit Analysis/Images/__results___12_1.png b/Bank Credit Analysis/Images/__results___12_1.png
diff --git a/Bank Credit Analysis/Images/__results___13_0.png b/Bank Credit Analysis/Images/__results___13_0.png
diff --git a/Bank Credit Analysis/Images/__results___13_1.png b/Bank Credit Analysis/Images/__results___13_1.png
diff --git a/Bank Credit Analysis/Images/__results___14_0.png b/Bank Credit Analysis/Images/__results___14_0.png
diff --git a/Bank Credit Analysis/Images/__results___15_0.png b/Bank Credit Analysis/Images/__results___15_0.png
diff --git a/Bank Credit Analysis/Images/__results___18_0.png b/Bank Credit Analysis/Images/__results___18_0.png
diff --git a/Bank Credit Analysis/Images/__results___24_1.png b/Bank Credit Analysis/Images/__results___24_1.png
diff --git a/Bank Credit Analysis/Images/__results___25_1.png b/Bank Credit Analysis/Images/__results___25_1.png
diff --git a/Bank Credit Analysis/Images/__results___27_0.png b/Bank Credit Analysis/Images/__results___27_0.png
diff --git a/Bank Credit Analysis/Images/__results___28_0.png b/Bank Credit Analysis/Images/__results___28_0.png
diff --git a/Bank Credit Analysis/Images/__results___29_0.png b/Bank Credit Analysis/Images/__results___29_0.png
diff --git a/Bank Credit Analysis/Images/__results___30_0.png b/Bank Credit Analysis/Images/__results___30_0.png
diff --git a/Bank Credit Analysis/Images/__results___31_0.png b/Bank Credit Analysis/Images/__results___31_0.png
diff --git a/Bank Credit Analysis/Model/README.md b/Bank Credit Analysis/Model/README.md
@@ -0,0 +1,33 @@
+## 🚀 Models Implemented
+- **Random Forest**: Chosen for its robustness; averaging many decorrelated trees reduces variance compared with a single decision tree.
+- **XGBoost**: Known for its performance and speed, making it suitable for complex datasets.
+- **Decision Tree**: Simple to interpret and visualize, though prone to overfitting.
+- **AdaBoost**: Effective in boosting the performance of weak classifiers.
+- **CatBoost**: Handles categorical features well and provides high accuracy.
+- **Logistic Regression**: Baseline model for classification tasks.
+- **Extra Trees**: Similar to Random Forest, but picks split thresholds at random instead of searching for the best threshold for each candidate feature.
+- **Gaussian Naive Bayes**: Simple and effective, especially for smaller datasets.
+- **K-Nearest Neighbors**: Simple and easy to implement, but can be computationally expensive.
+- **Support Vector Machine**: Effective in high-dimensional spaces and suitable for classification tasks.
+
+## 📈 Performance of the Models based on the Accuracy Scores
+| Model                   | Train Accuracy | CV Mean Accuracy | Test Accuracy |
+|-------------------------|----------------|------------------|---------------|
+| K Nearest Neighbors     | 81.81%         | 75.38%           | 75.19%        |
+| Support Vector Machine  | 83.37%         | 82.92%           | 81.59%        |
+| Random Forest           | 99.40%         | 85.79%           | 83.70%        |
+| XGBoost                 | 100.00%        | 85.47%           | 84.42%        |
+| Decision Tree           | 87.51%         | 81.92%           | 80.25%        |
+| AdaBoost                | 84.04%         | 82.91%           | 82.58%        |
+| CatBoost                | 90.36%         | 86.58%           | 85.89%        |
+| Logistic Regression     | 82.55%         | 82.10%           | 81.68%        |
+| Extra Trees             | 98.76%         | 83.38%           | 82.22%        |
+| Gaussian Naive Bayes    | 73.92%         | 73.58%           | 74.56%        |
+
+## ✒️ Your Signature
+Aditya D
+
+GitHub: [https://www.github.com/adi271001](https://www.github.com/adi271001)  
+LinkedIn: [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)  
+Topmate: [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)  
+Twitter: [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)
diff --git a/Bank Credit Analysis/Model/bank-credit-analysis.ipynb b/Bank Credit Analysis/Model/bank-credit-analysis.ipynb
diff --git a/Bank Credit Analysis/README.md b/Bank Credit Analysis/README.md
@@ -0,0 +1,80 @@
+# Bank Credit Analysis
+
+## 🎯 Goal
+The main goal of this project is to develop machine learning models to accurately predict the likelihood of a customer subscribing to a term deposit based on their banking information and demographic details.
+
+## 🧵 Dataset
+The dataset for this project is sourced from [Kaggle's Bank Marketing Dataset](https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset/data).
+
+## 🧾 Description
+This project involves analyzing various features of bank customers and building machine learning models to predict whether a customer will subscribe to a term deposit. The project includes data preprocessing, exploratory data analysis (EDA), model development, and evaluation to find the most accurate predictive model.
+
+## 🧮 What I have done!
+1. **Data Collection and Preprocessing**:
+   - Collected the dataset from Kaggle.
+   - Preprocessed the data: handled missing values, encoded categorical variables, and split the dataset into training and testing sets.
+
+2. **Exploratory Data Analysis (EDA)**:
+   - Performed EDA to understand the distribution of data and identify any patterns or anomalies.
+   - ![pair plot 1](https://github.com/adi271001/ML-Crate/blob/Bank-Credit-Analysis/Bank%20Credit%20Analysis/Images/__results___11_1.png?raw=true)
+   - ![distribution graph](https://github.com/adi271001/ML-Crate/blob/Bank-Credit-Analysis/Bank%20Credit%20Analysis/Images/__results___13_0.png?raw=true)
+   - ![boxplot](https://github.com/adi271001/ML-Crate/blob/Bank-Credit-Analysis/Bank%20Credit%20Analysis/Images/__results___15_0.png?raw=true)
+   - ![waveplot](https://github.com/adi271001/ML-Crate/assets/67856422/f6e50edc-6cc9-475b-b3bb-82869b1cba8f)
+   - ![bar plot](https://github.com/adi271001/ML-Crate/assets/67856422/55cebd86-4eec-4829-85d1-091f0ebfbc3d)
+
+3. **Model Development**:
+   - Implemented several machine learning models including Random Forest, XGBoost, Decision Tree, AdaBoost, CatBoost, Logistic Regression, Extra Trees, Gaussian Naive Bayes, K-Nearest Neighbors, and Support Vector Machine.
+   - Used grid search for hyperparameter tuning and nested cross-validation to evaluate model performance.
+
+4. **Model Evaluation**:
+   - Evaluated the models on training accuracy, mean cross-validation accuracy, and test accuracy.
+
+5. **Conclusion**:
+   - Identified the best-performing model based on accuracy scores.
+
+## 🚀 Models Implemented
+- **Random Forest**: Chosen for its robustness; averaging many decorrelated trees reduces variance compared with a single decision tree.
+- **XGBoost**: Known for its performance and speed, making it suitable for complex datasets.
+- **Decision Tree**: Simple to interpret and visualize, though prone to overfitting.
+- **AdaBoost**: Effective in boosting the performance of weak classifiers.
+- **CatBoost**: Handles categorical features well and provides high accuracy.
+- **Logistic Regression**: Baseline model for classification tasks.
+- **Extra Trees**: Similar to Random Forest, but picks split thresholds at random instead of searching for the best threshold for each candidate feature.
+- **Gaussian Naive Bayes**: Simple and effective, especially for smaller datasets.
+- **K-Nearest Neighbors**: Simple and easy to implement, but can be computationally expensive.
+- **Support Vector Machine**: Effective in high-dimensional spaces and suitable for classification tasks.
+
+## 📚 Libraries Needed
+- pandas
+- numpy
+- scikit-learn
+- xgboost
+- catboost
+
+## 📊 Exploratory Data Analysis Results
+*Include images of visualizations here*
+
+## 📈 Performance of the Models based on the Accuracy Scores
+| Model                   | Train Accuracy | CV Mean Accuracy | Test Accuracy |
+|-------------------------|----------------|------------------|---------------|
+| K Nearest Neighbors     | 81.81%         | 75.38%           | 75.19%        |
+| Support Vector Machine  | 83.37%         | 82.92%           | 81.59%        |
+| Random Forest           | 99.40%         | 85.79%           | 83.70%        |
+| XGBoost                 | 100.00%        | 85.47%           | 84.42%        |
+| Decision Tree           | 87.51%         | 81.92%           | 80.25%        |
+| AdaBoost                | 84.04%         | 82.91%           | 82.58%        |
+| CatBoost                | 90.36%         | 86.58%           | 85.89%        |
+| Logistic Regression     | 82.55%         | 82.10%           | 81.68%        |
+| Extra Trees             | 98.76%         | 83.38%           | 82.22%        |
+| Gaussian Naive Bayes    | 73.92%         | 73.58%           | 74.56%        |
+
+## 📢 Conclusion
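+The best-performing model in this project is CatBoost, with a CV Mean Accuracy of 86.58% and a Test Accuracy of 85.89%, the highest of the ten models on both measures. Its gap between train and test accuracy (about 4.5 points) is also much smaller than that of Random Forest, XGBoost, and Extra Trees (roughly 15 to 17 points each), which suggests it generalizes better and makes it the most suitable model here for predicting customer subscription to a term deposit.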
+
+## ✒️ Your Signature
+Aditya D
+
+GitHub: [https://www.github.com/adi271001](https://www.github.com/adi271001)  
+LinkedIn: [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)  
+Topmate: [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)  
+Twitter: [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)
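The README above mentions grid search combined with nested cross-validation. The notebook's actual code is not shown in this diff, so the following is only a minimal sketch of that general pattern with scikit-learn. The synthetic data, the choice of `LogisticRegression`, the parameter grid, and the fold counts are all assumptions made for illustration, not the author's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Assumption: synthetic binary-classification data standing in for the
# preprocessed bank dataset, which is not reproduced here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Inner loop: grid search selects hyperparameters using only the
# training portion of each outer fold.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
    scoring="accuracy",
    cv=inner_cv,
)

# Outer loop: each tuned model is scored on a fold it never saw during
# tuning, so the estimate is not biased by the hyperparameter search.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")

print(f"Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the procedure nested: the search is re-run from scratch inside every outer training fold.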
diff --git a/Bank Credit Analysis/Results/models_results.csv b/Bank Credit Analysis/Results/models_results.csv
@@ -0,0 +1,11 @@
+Model,Train Accuracy,CV Mean Accuracy,Test Accuracy
+K Nearest Neighbors,81.812073020495,75.38349628765279,75.19032691446485
+Support Vector Machine,83.3687982976817,82.92084403750302,81.59426780116435
+Random Forest,99.3952290290066,85.78782375212123,83.69905956112854
+XG Boost,100.0,85.47421745853995,84.4155844155844
+Decision Tree,87.51259939522903,81.92401529480773,80.25078369905955
+AdaBoost,84.04076604322992,82.90964582921634,82.5794894760412
+CatBoost,90.35726285138314,86.58297809605365,85.8934169278997
+Logistic Regression,82.55123754059805,82.10325563596099,81.68383340797133
+Extra Trees,98.7568596707358,83.37999567128082,82.22122704881325
+Gaussian Naive Bayes,73.9164520103035,73.58034008676259,74.56336766681594
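One way to read the results file above is to compare each model's train and test accuracy. The sketch below uses only the standard library; the rows are the CSV values rounded to two decimals (as in the README table), and the `gap` column and helper name `load_results` are introduced here for illustration.

```python
import csv
import io

# Rows from models_results.csv, rounded to two decimals for brevity.
RESULTS_CSV = """\
Model,Train Accuracy,CV Mean Accuracy,Test Accuracy
K Nearest Neighbors,81.81,75.38,75.19
Support Vector Machine,83.37,82.92,81.59
Random Forest,99.40,85.79,83.70
XG Boost,100.00,85.47,84.42
Decision Tree,87.51,81.92,80.25
AdaBoost,84.04,82.91,82.58
CatBoost,90.36,86.58,85.89
Logistic Regression,82.55,82.10,81.68
Extra Trees,98.76,83.38,82.22
Gaussian Naive Bayes,73.92,73.58,74.56
"""


def load_results(text):
    """Parse the CSV text into dicts with float scores and a train-test gap."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        train = float(row["Train Accuracy"])
        test = float(row["Test Accuracy"])
        rows.append({
            "model": row["Model"],
            "train": train,
            "cv": float(row["CV Mean Accuracy"]),
            "test": test,
            "gap": train - test,  # large positive gap hints at overfitting
        })
    return rows


rows = load_results(RESULTS_CSV)
best = max(rows, key=lambda r: r["test"])
most_overfit = max(rows, key=lambda r: r["gap"])

# Print models ranked by test accuracy, with their train-test gap.
for r in sorted(rows, key=lambda r: r["test"], reverse=True):
    print(f"{r['model']:<24} test={r['test']:.2f}  gap={r['gap']:+.2f}")
```

Ranking by test accuracy puts CatBoost first, while the gap column shows that the near-perfect training scores of the tree ensembles do not carry over to held-out data.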
diff --git a/Bank Credit Analysis/requirements.txt b/Bank Credit Analysis/requirements.txt
@@ -0,0 +1,10 @@
+numpy==1.24.3
+pandas==2.0.3
+matplotlib==3.7.2
+seaborn==0.12.2
+scikit-learn==1.2.2
+xgboost==1.7.6
+catboost==1.1
+pdpbox==0.3.0
+shap==0.42.1
+yellowbrick==1.5
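Because the versions above are pinned exactly, it can help to check an environment against them before running the notebook. The following is a small standard-library sketch under that assumption; `parse_pins` and `check_pins` are hypothetical helper names, and the script handles only the simple `name==version` form used in this file.

```python
from importlib import metadata

# Contents of requirements.txt as added in this pull request.
REQUIREMENTS = """\
numpy==1.24.3
pandas==2.0.3
matplotlib==3.7.2
seaborn==0.12.2
scikit-learn==1.2.2
xgboost==1.7.6
catboost==1.1
pdpbox==0.3.0
shap==0.42.1
yellowbrick==1.5
"""


def parse_pins(text):
    """Return {package: pinned_version} for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Map each package to (pinned, installed); installed is None if absent."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


pins = parse_pins(REQUIREMENTS)
for name, (wanted, installed) in check_pins(pins).items():
    status = "ok" if installed == wanted else f"installed: {installed}"
    print(f"{name}=={wanted}  [{status}]")
```

Running `pip install -r requirements.txt` remains the usual way to satisfy the pins; this check only reports mismatches.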