This repository contains Python code for generating synthetic samples of the minority class in a financial dataset. It also contains a sample of the generated synthetic data.

sonu-gupta/Synthetic-financial-data

Synthetic Financial Data Generation for Fraud Detection

I used the credit card fraud dataset from Kaggle (published by ULB).

About the dataset: It contains credit card transactions made by European cardholders in September 2013. The data covers a two-day period in which 492 of 284,807 transactions were fraudulent. The dataset is highly imbalanced: fraud cases account for only 0.172% of all transactions.
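As a quick sanity check on the imbalance, the figures quoted above can be plugged into a few lines of Python. This is only an illustrative calculation, not part of the repository's code, and the variable names are made up for the example. The exact ratio comes out at roughly 0.1727%, which the dataset description truncates to 0.172%.

```python
# Figures quoted in the dataset description above.
total_transactions = 284_807
fraud_transactions = 492

# Share of all transactions that are fraudulent (the minority class).
fraud_share = fraud_transactions / total_transactions

# How many legitimate transactions there are for each fraudulent one.
legit_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

print(f"Fraud share: {fraud_share:.4%}")
print(f"Legitimate transactions per fraud case: {legit_per_fraud:.0f}")
```

In other words, a classifier sees several hundred legitimate transactions for every fraudulent one, which is why generating extra minority-class samples is attractive.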

One way to handle this imbalance is to synthesize additional data for the minority class. A synthetic data generator learns the relevant characteristics, relationships, and trends from real data, then produces as many synthetic samples as needed that match the statistical properties of the original dataset. Here, I've used a Wasserstein GAN with gradient penalty (WGAN-GP). It is a type of generative adversarial network (GAN) that combines the Wasserstein loss with a penalty on the norm of the critic's gradient to enforce Lipschitz continuity.
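To make the gradient-penalty idea concrete, here is a minimal sketch in plain Python. It is not the code in this repository (which is adapted from the AWS example): it assumes a toy linear critic f(x) = w . x + b so that the gradient with respect to the input can be written by hand, whereas a real WGAN-GP uses a neural-network critic and automatic differentiation. All function names are hypothetical, and the penalty weight of 10 is the value commonly used for WGAN-GP, assumed here rather than taken from this project.

```python
import math
import random

def critic(x, w, b):
    """Toy linear critic f(x) = w . x + b (a real critic is a neural network)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def critic_grad_wrt_input(x, w):
    """For a linear critic, the gradient with respect to x is simply w."""
    return list(w)

def gradient_penalty(real, fake, w, rng):
    """Mean of (||grad f(x_hat)||_2 - 1)^2 over random interpolates x_hat."""
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        eps = rng.random()  # interpolation coefficient in [0, 1)
        x_hat = [eps * r + (1 - eps) * f for r, f in zip(x_real, x_fake)]
        grad = critic_grad_wrt_input(x_hat, w)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return total / len(real)

def critic_loss(real, fake, w, b, lam, rng):
    """Wasserstein critic loss plus the weighted gradient penalty."""
    mean_real = sum(critic(x, w, b) for x in real) / len(real)
    mean_fake = sum(critic(x, w, b) for x in fake) / len(fake)
    return mean_fake - mean_real + lam * gradient_penalty(real, fake, w, rng)

# Tiny made-up batch: two "real" and two "generated" 2-feature samples.
rng = random.Random(0)
real = [[1.0, 2.0], [0.5, 1.5]]
fake = [[0.0, 0.0], [1.0, 0.0]]
w, b, lam = [3.0, 4.0], 0.0, 10.0  # lam = 10 is an assumed penalty weight

penalty = gradient_penalty(real, fake, w, rng)
loss = critic_loss(real, fake, w, b, lam, rng)
print(penalty, loss)
```

The penalty pushes the norm of the critic's gradient toward 1 at points between real and generated samples, which is how WGAN-GP encourages the Lipschitz constraint without clipping weights. With a linear critic the gradient does not depend on the interpolation point, so the random coefficient has no effect here. It is kept only to mirror the structure of the real computation.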

The code is adapted from: AWS Machine Learning

In progress: Evaluation of the quality of the synthetic dataset.
