This repository contains code and data for fraud analysis using several machine learning algorithms. The dataset is available on Kaggle. The analysis applies the following models:
- Logistic Regression
- Naive Bayes
- Decision Tree
- Random Forest
- XGBoost
- Support Vector Machine
The dataset consists of transactional data whose features are used to detect fraudulent activity. It is available on Kaggle.
Make sure you have the following dependencies installed:
- Python 3
- Jupyter Notebook
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
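
Before opening the notebooks, it can help to confirm that these packages are importable. The snippet below is a minimal sketch rather than part of the repository: the import names are assumptions (Scikit-learn imports as `sklearn`, Jupyter Notebook is assumed to import as `notebook`), `xgboost` is included only because XGBoost is one of the models, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names for the dependencies; adjust if your environment differs.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "xgboost", "notebook"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Anything reported as missing can usually be installed with `pip` or `conda` before running the notebooks.
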
- Logistic Regression: Used to classify transactions as fraudulent or non-fraudulent.
- Naive Bayes: Probabilistic classifier that assumes features are conditionally independent given the class.
- Decision Tree: Classifies transactions by learning a hierarchy of decision rules on the features in the dataset.
- Random Forest: Ensemble method that trains many decision trees and combines their predictions.
- XGBoost: Gradient boosting framework that focuses on computational speed and model performance.
- Support Vector Machine: Supports both classification and regression; applied here to classify transactions.
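
The models listed above can be compared with a loop like the one sketched here. This is not the notebooks' code: the data is a synthetic, imbalanced stand-in generated with `make_classification`, the hyperparameters are arbitrary defaults, and XGBoost is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kaggle data: roughly 5% of rows are the positive ("fraud") class.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": SVC(),
}

# XGBoost is a separate package; add it only when it can be imported.
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Because fraud data is typically imbalanced, accuracy alone can look high even when few fraudulent rows are caught, so precision and recall are worth checking alongside it.
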
- Correlation Matrix: Visual representation of the correlation between different features in the dataset.
- Accuracy Graph: Chart comparing the accuracy achieved by each machine learning model used in the analysis.
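
Both visualizations can be sketched with pandas and Matplotlib as shown below. This is illustrative only: the feature columns (`f1` to `f5`) are randomly generated, the accuracy values are placeholders rather than results from the notebooks, and the output filename is hypothetical. Seaborn's `heatmap` is a common alternative for the correlation plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric features standing in for the transaction columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f1", "f2", "f3", "f4"])
df["f5"] = df["f1"] * 0.8 + rng.normal(scale=0.3, size=200)  # built to correlate with f1

corr = df.corr()  # pairwise Pearson correlation between columns

# Placeholder accuracies for illustration only, not results from the analysis.
accuracies = {"Model A": 0.90, "Model B": 0.95}

fig, (ax_corr, ax_acc) = plt.subplots(1, 2, figsize=(11, 4))

# Correlation matrix drawn as a colour-coded grid.
im = ax_corr.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_corr.set_xticks(range(len(corr.columns)))
ax_corr.set_xticklabels(corr.columns)
ax_corr.set_yticks(range(len(corr.columns)))
ax_corr.set_yticklabels(corr.columns)
ax_corr.set_title("Correlation matrix")
fig.colorbar(im, ax=ax_corr)

# Accuracy graph drawn as a bar chart, one bar per model.
ax_acc.bar(list(accuracies.keys()), list(accuracies.values()))
ax_acc.set_ylim(0, 1)
ax_acc.set_ylabel("Accuracy")
ax_acc.set_title("Accuracy by model")

fig.tight_layout()
fig.savefig("fraud_analysis_plots.png")  # hypothetical output filename
```
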
The results of the analysis, including model performance metrics and insights derived from the visualization techniques, can be found in the accompanying Jupyter Notebooks.
You can use the provided Jupyter Notebooks to replicate the analysis. The notebooks include step-by-step instructions along with explanations for each phase of the analysis.
You can also view this work on Kaggle, where you are welcome to upvote and comment on it.