Synthetic Financial Data Generation for Fraud Detection

I used the credit card fraud dataset from Kaggle (published by ULB).

About the dataset: It contains credit card transactions made by European cardholders in September 2013. The data covers a two-day period, during which 492 of the 284,807 transactions were fraudulent. The dataset is highly imbalanced: fraud cases account for only 0.172% of all transactions.
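To make the imbalance concrete, the short sketch below recomputes the fraud rate from the counts quoted above and estimates how many synthetic fraud rows would be needed to reach a chosen class share. The helper `synthetic_rows_needed` and the 50% target are illustrative assumptions, not something the notebook is known to use.

```python
import math

# Counts quoted in the dataset description above.
N_TOTAL = 284_807
N_FRAUD = 492

fraud_rate = N_FRAUD / N_TOTAL
print(f"Fraud share of all transactions: {fraud_rate:.5f}")


def synthetic_rows_needed(n_minority: int, n_total: int, target_share: float) -> int:
    """Hypothetical helper: number of synthetic minority rows s such that
    (n_minority + s) / (n_total + s) >= target_share."""
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must be strictly between 0 and 1")
    needed = (target_share * n_total - n_minority) / (1.0 - target_share)
    return max(0, math.ceil(needed))


# Assumed target: a perfectly balanced dataset (50% fraud).
rows_for_balance = synthetic_rows_needed(N_FRAUD, N_TOTAL, 0.5)
print(f"Synthetic fraud rows needed for a 50/50 split: {rows_for_balance}")
```

A fully balanced set is only one possible target; a smaller share may be enough in practice and keeps the ratio of synthetic to real fraud rows lower.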

One way to handle this imbalance is to synthesize additional samples for the minority class. A synthetic data generator learns the relevant characteristics, relationships, and trends in the real data, and then produces as many synthetic records as needed that match the statistical properties of the original dataset. Here, I've used a Wasserstein GAN with Gradient Penalty (WGAN-GP). It is a type of generative adversarial network (GAN) that combines the Wasserstein loss with a penalty on the norm of the critic's gradient, which softly enforces the 1-Lipschitz constraint that the Wasserstein formulation requires.
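The gradient penalty term can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's implementation: the one-hidden-layer `critic`, its hand-derived input gradient, the feature count, and the random "real" and "fake" batches are all assumptions made so the example runs without a deep learning framework. In practice the gradient would come from automatic differentiation. The penalty weight of 10 is the value commonly used for WGAN-GP.

```python
import numpy as np


def critic(x, W1, w2):
    """Toy critic (assumption): f(x) = tanh(x @ W1) @ w2, one score per row."""
    return np.tanh(x @ W1) @ w2


def critic_input_grad(x, W1, w2):
    """Analytic gradient of the toy critic with respect to each input row."""
    h = np.tanh(x @ W1)                   # shape (batch, hidden)
    return ((1.0 - h ** 2) * w2) @ W1.T   # shape (batch, features)


def gradient_penalty(real, fake, W1, w2, rng, lam=10.0):
    """lam * mean((||grad f(x_hat)||_2 - 1)^2) at points between real and fake rows."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake   # random interpolates
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, W1, w2), axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)


def critic_loss(real, fake, W1, w2, rng, lam=10.0):
    """Wasserstein critic loss plus the gradient penalty."""
    wasserstein = critic(fake, W1, w2).mean() - critic(real, W1, w2).mean()
    return wasserstein + gradient_penalty(real, fake, W1, w2, rng, lam)


rng = np.random.default_rng(0)
n_features, n_hidden, batch = 8, 16, 32     # arbitrary sizes for the sketch
W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
w2 = rng.normal(scale=0.5, size=n_hidden)
real = rng.normal(size=(batch, n_features))            # stand-in for real fraud rows
fake = rng.normal(loc=0.5, size=(batch, n_features))   # stand-in for generator output

gp = gradient_penalty(real, fake, W1, w2, rng)
loss = critic_loss(real, fake, W1, w2, rng)
print(f"gradient penalty: {gp:.4f}, critic loss: {loss:.4f}")
```

The penalty is zero only when the critic's gradient has unit norm at every sampled point, so minimizing it pushes the critic toward being 1-Lipschitz without clipping its weights.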

The code is adapted from: AWS Machine Learning

In progress: evaluating the quality of the synthetic dataset.

sonu-gupta/Synthetic-financial-data
