GitHub - SartajBhuvaji/SDGnE: A PyPI library for Synthetic Tabular Data Generation and Evaluation

SDGnE

Synthetic Data Generation and Evaluation

Seattle University: Data Science Research

Wan Bae
Sartaj Bhuvaji
Siddheshwari Bankar


About

SDGnE (Synthetic Data Generation and Evaluation) is a Python package designed to generate synthetic data and evaluate its quality using neural network models.
This tool is intended for developers and researchers who require synthetic datasets for testing and development.
The current version uses autoencoders and SMOTE to generate synthetic data.

Getting Started

pip install sdgne

Notebooks

To get started, we have created notebooks for the Autoencoder and SMOTE algorithms.

Auto Encoder

Autoencoders are a class of neural networks designed for unsupervised learning and for representing features in a lower-dimensional space. They consist of an encoder, which compresses the input data into a compact representation (the encoding), and a decoder, which reconstructs the input from that representation. We leverage this architecture to generate synthetic data.
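To make the encoder/decoder idea concrete, here is a minimal sketch in pure Python. It is not the SDGnE implementation or API: the function names `train_autoencoder` and `generate`, the single-unit linear bottleneck, the toy data, and the choice to add Gaussian noise to the latent code are all assumptions made for illustration.

```python
import random

def train_autoencoder(data, lr=0.02, epochs=300, seed=0):
    """Train a tiny linear autoencoder (dim -> 1 -> dim) with SGD.

    Hypothetical helper for illustration only; not part of sdgne.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # encoder weights
    dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # decoder weights
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            z = sum(w * xi for w, xi in zip(enc, x))      # encode to one number
            err = [d * z - xi for d, xi in zip(dec, x)]   # reconstruction error
            total += sum(e * e for e in err)
            # Gradients of the squared reconstruction error
            grad_z = sum(2 * e * d for e, d in zip(err, dec))
            dec = [d - lr * 2 * e * z for d, e in zip(dec, err)]
            enc = [w - lr * grad_z * xi for w, xi in zip(enc, x)]
        losses.append(total / len(data))
    return enc, dec, losses

def generate(data, enc, dec, n_new=5, noise=0.1, seed=1):
    """Encode a random real row, perturb the latent code, and decode it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_new):
        x = rng.choice(data)
        z = sum(w * xi for w, xi in zip(enc, x)) + rng.gauss(0.0, noise)
        rows.append([d * z for d in dec])
    return rows

# Toy data: two strongly correlated columns
real = [[0.2, 0.4], [0.5, 1.1], [0.8, 1.5], [1.0, 2.0],
        [-0.2, -0.5], [-0.5, -0.9], [-0.8, -1.7], [-1.0, -2.1]]

enc, dec, losses = train_autoencoder(real)
synthetic = generate(real, enc, dec)
```

A real autoencoder would use nonlinear layers and a deep learning framework; the sketch only shows the compress, perturb, and reconstruct loop that makes the architecture usable as a generator.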

SMOTE

SMOTE (Synthetic Minority Oversampling Technique) generates synthetic samples from the original dataset by interpolating between existing minority-class samples and their nearest neighbors. Over the years, several variants of SMOTE have been developed, each tailored to specific scenarios and requirements. These variants differ in how they select and combine samples, with the shared goal of improving model performance by producing a more balanced class distribution. We provide a few SMOTE variants for synthetic data generation.
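The core interpolation step of basic SMOTE can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the SDGnE API and not any of the variants the package ships: the function name `smote_sample`, its parameters, and the toy data are assumptions.

```python
import math
import random

def smote_sample(minority, k=3, n_new=5, seed=0):
    """Create n_new synthetic rows from minority-class rows.

    Hypothetical helper for illustration only; not part of sdgne.
    Each new row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority rows
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with two numeric features
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sample(minority, k=2, n_new=3)
```

Because every synthetic row is a convex combination of two real rows, it always stays inside the range spanned by the minority samples. The sketch assumes purely numeric features.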

Comparison

In this notebook, we compare the Single Encoder Autoencoder and the SMOTE algorithm for synthetic data generation. We generate synthetic data using both algorithms and perform a statistical evaluation of the results.
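As a rough idea of what a statistical comparison between two generators could look like, the sketch below compares per-column means and standard deviations of real and synthetic data. The metrics, the function names `column_stats` and `compare`, and the toy datasets are assumptions for illustration; the notebook's actual evaluation may use different measures.

```python
import statistics

def column_stats(rows):
    """Return (mean, sample standard deviation) for each column."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def compare(real, synthetic):
    """Report absolute differences in mean and std for each column.

    Hypothetical helper for illustration only; not part of sdgne.
    """
    report = []
    pairs = zip(column_stats(real), column_stats(synthetic))
    for i, ((r_mean, r_std), (s_mean, s_std)) in enumerate(pairs):
        report.append({
            "column": i,
            "mean_diff": abs(r_mean - s_mean),
            "std_diff": abs(r_std - s_std),
        })
    return report

real = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]]
# Two made-up synthetic sets standing in for the output of two generators
synthetic_a = [[1.1, 10.2], [2.1, 11.9], [2.9, 14.1], [3.9, 15.8]]
synthetic_b = [[0.0, 5.0], [0.5, 6.0], [1.0, 7.0], [1.5, 8.0]]

report_a = compare(real, synthetic_a)
report_b = compare(real, synthetic_b)
score_a = sum(item["mean_diff"] for item in report_a)
score_b = sum(item["mean_diff"] for item in report_b)
```

Here a lower score means the synthetic columns track the real column means more closely, so `synthetic_a` would be judged the better match on this measure.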

Features

Data Generation: Create synthetic datasets that mimic the statistical properties of real-world data.
Neural Autoencoders: Utilize various autoencoder architectures to learn data representations.
Evaluation Metrics: Assess the quality of synthetic data using built-in evaluation metrics.
Extensibility: Easily extend the package with custom data generators and evaluators.

Links

Documentation: https://seattle-university.gitbook.io/sdgne/
PyPI: https://pypi.org/project/sdgne/

