Credit Card Fraud Detection

Problem Statement

The goal is to classify credit card transactions as fraudulent or non-fraudulent.

Dataset Description:

The dataset contains transactions made by credit cards in September 2013 by European cardholders.
It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions.
The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Due to confidentiality issues, the original features and further background information about the data are not available.
Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset.
Feature 'Amount' is the transaction amount; it can be used, for example, for example-dependent cost-sensitive learning.
Feature 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.
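
To make the imbalance concrete, the short sketch below recomputes the fraud rate from the two counts quoted above. It also shows the accuracy of a trivial model that labels every transaction as non-fraud, which is why plain accuracy is a poor metric here. The variable names are illustrative and not taken from the project code.

```python
# Counts quoted in the dataset description.
n_total = 284_807
n_fraud = 492

# Share of transactions that belong to the positive (fraud) class.
fraud_rate = n_fraud / n_total

# Accuracy of a trivial model that predicts "non-fraud" for every row.
majority_baseline_accuracy = 1 - fraud_rate

print(f"fraud rate: {fraud_rate:.4%}")
print(f"all-negative baseline accuracy: {majority_baseline_accuracy:.4%}")
```

Because the all-negative baseline is already above 99.8% accurate, metrics such as recall or precision on the fraud class are more informative.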

Operations carried out:

1] Data Analysis (To explore the dataset)

2] Data Visualization

3] Data Cleaning

4] Data Preprocessing (Scaling, Sampling)

5] Under Sampling (To balance the unbalanced dataset)

6] Data Visualization

7] Machine Learning Models: Logistic Regression, Logistic Regression with various tuning parameters, Decision Tree Classifier
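
The scaling, under-sampling, and modelling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (two PCA-like columns plus 'Amount'), the hyperparameters are arbitrary, and holding out a test set before under-sampling is an assumption made here so that evaluation reflects the original class ratio.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (NOT the actual dataset):
# two PCA-like features, an Amount column, and a rare positive class.
n_rows, n_pos = 20_000, 200
labels = np.zeros(n_rows, dtype=int)
labels[:n_pos] = 1
df = pd.DataFrame({
    "V1": rng.normal(loc=-2.0 * labels, scale=1.0),
    "V2": rng.normal(loc=1.5 * labels, scale=1.0),
    "Amount": rng.exponential(scale=80.0, size=n_rows),
    "Class": labels,
})

# Hold out a test set first so it keeps the original class ratio.
train_df, test_df = train_test_split(
    df, test_size=0.3, stratify=df["Class"], random_state=0
)
train_df, test_df = train_df.copy(), test_df.copy()

# Scaling: standardize 'Amount', fitting the scaler on training rows only.
scaler = StandardScaler()
train_df["Amount"] = scaler.fit_transform(train_df[["Amount"]]).ravel()
test_df["Amount"] = scaler.transform(test_df[["Amount"]]).ravel()

# Under-sampling: keep all fraud rows and an equal number of non-fraud rows.
fraud = train_df[train_df["Class"] == 1]
non_fraud = train_df[train_df["Class"] == 0].sample(n=len(fraud), random_state=0)
balanced = pd.concat([fraud, non_fraud]).sample(frac=1.0, random_state=0)

features = ["V1", "V2", "Amount"]
X_train, y_train = balanced[features], balanced["Class"]
X_test, y_test = test_df[features], test_df["Class"]

# Models: the two families named in the list, with illustrative settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))
    print(f"{name}: fraud recall = {recalls[name]:.3f}")
```

With the real data, `df` would instead be loaded from the dataset file and `features` would include 'Time', 'Amount', and V1 through V28.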