TonicAI/DSM_v_SMOTE_blog_post

How to Solve the Problem of Imbalanced Datasets: Meet Tonic Data Science Mode

Tonic's latest offering is a data synthesizer tailored specifically to data scientists' needs.

Meet Tonic Data Science Mode

Using generative AI models, Tonic Data Science Mode (DSM) takes a dataset from an application database (such as PostgreSQL or Oracle), a data warehouse (such as BigQuery or Snowflake), or simply a CSV file, and creates rows of synthetic data that follow the trends in your data, so you can build models with greater predictive power.

In this [blog post](link to blog), we use a dataset from Kaggle to explore how data from DSM helps improve classification model performance when working with imbalanced datasets. We compare the performance of Logistic Regression, XGBoost, and CatBoost models trained on datasets augmented using DSM, SMOTE, and SMOTE-NC.

Each augmentation method rebalances the minority class to address the bias that comes from training models on imbalanced data. We find that DSM-augmented data outperforms the other augmentation methods for the CatBoost and XGBoost models.
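
For readers unfamiliar with SMOTE, here is a minimal sketch of its core idea: each synthetic minority row is an interpolation between a real minority row and one of its nearest minority neighbors. This is an illustration written for this README using only the standard library, not the implementation used in the blog post; the function name `smote_sketch` and the toy data are hypothetical. It handles numeric features only, which is the gap SMOTE-NC fills for datasets that mix nominal and continuous columns.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    Hypothetical helper for illustration only: numeric features, Euclidean
    distance, brute-force neighbor search.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority rows to `base`, excluding itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        # A random point on the segment between `base` and `neighbor`.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic row lies on a line segment between two real minority rows, SMOTE can only fill in the space between existing samples.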


How to follow along

Head to Tonic to create your free account. Once you're logged in, perform the following steps to get faking!

1. Create a workspace.
2. Configure your source data (in this example, we upload a CSV).
3. Create a model with a SQL query, making sure to specify appropriate data types and column names.
4. Set your model parameters and run your model.
5. Check your synthesis report and copy your Python snippet into your Jupyter notebook.
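
Once the synthetic rows are in your notebook, one way to rebalance the training set is to add synthetic minority rows until both classes are the same size. The sketch below assumes a binary label and rows held as plain dictionaries; the helper name `augment_minority`, the `label` column, and the toy rows are all hypothetical, and the code does not use or depend on the snippet Tonic generates.

```python
import random
from collections import Counter


def augment_minority(train_rows, synthetic_rows, label_key, seed=0):
    """Append synthetic minority rows until the two classes are balanced.

    Hypothetical helper: assumes exactly two classes and rows as dicts.
    """
    counts = Counter(row[label_key] for row in train_rows)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    (_, n_major), (minority_label, n_minor) = counts.most_common(2)
    needed = n_major - n_minor
    # Only synthetic rows carrying the minority label are candidates.
    candidates = [r for r in synthetic_rows if r[label_key] == minority_label]
    rng = random.Random(seed)
    picked = rng.sample(candidates, min(needed, len(candidates)))
    return train_rows + picked


# Toy data (made-up values): 6 majority rows, 2 minority rows.
train = [{"x": i, "label": 0} for i in range(6)] + [
    {"x": 10, "label": 1},
    {"x": 11, "label": 1},
]
# Stand-in for synthetic rows loaded into the notebook.
synthetic = [{"x": 10 + i / 10, "label": 1} for i in range(5)] + [
    {"x": 3.5, "label": 0}
]
balanced = augment_minority(train, synthetic, label_key="label")
balanced_counts = Counter(row["label"] for row in balanced)
```

Apply this kind of augmentation to the training split only, so that the evaluation data still reflects the original class distribution.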


Recreate this experiment for yourself: head to Tonic to create your account and start your free trial!
