How to Solve the Problem of Imbalanced Datasets: Meet Tonic Data Science Mode

Tonic's latest offering is a data synthesizer tailored specifically to data scientists' needs.

Meet Tonic Data Science Mode

Using generative AI models, Tonic Data Science Mode (DSM) takes a dataset from an application database (such as PostgreSQL or Oracle), a data warehouse (such as BigQuery or Snowflake), or a CSV file, and creates rows of synthetic data that follow the trends in your data, so that you can build models with greater predictive power.

In this [blog post](link to blog), we use a dataset from Kaggle to explore how data from DSM helps improve classification model performance when working with imbalanced datasets. We compare the performance of Logistic Regression, XGBoost, and CatBoost models trained on datasets augmented using DSM, SMOTE, and SMOTE-NC.

Training on imbalanced data biases a model toward the majority class. When we address that bias by rebalancing the minority class, we find that DSM-augmented data outperforms the other augmentation methods for the CatBoost and XGBoost models.
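For readers unfamiliar with the SMOTE baseline, the core idea can be sketched in a few lines of plain Python: each synthetic point is placed at a random position on the line between a minority sample and one of its nearest minority neighbors. This is a minimal illustration, not the code used in the notebook; the function name `smote_sketch`, its parameters, and the toy data are all assumptions made for this example. It handles continuous features only (SMOTE-NC extends the idea to mixed continuous and categorical columns).

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority neighbors (continuous features only).

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]

        # Find its k nearest neighbors among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]

        # Interpolate a random fraction of the way toward one neighbor.
        neighbor = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class with two continuous features (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, this approach can only fill in the region the minority class already occupies.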

How to follow along

Head to Tonic to create your free account. Once you're logged in, perform the following steps to get faking!

1. Create a workspace

2. Configure your source data (in this example we upload a CSV)

3. Create a model with a SQL query - making sure to specify appropriate datatypes and column names

4. Set your model parameters and run your model

5. Check your synthesis report and copy your Python snippet into your Jupyter notebook
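Once the synthetic rows are available in your notebook, one way to use them is to top up the minority class in the training set until the classes are balanced. The sketch below shows that step with plain Python data structures. It does not reproduce the snippet Tonic generates: `synthetic_rows` is a hypothetical stand-in for whatever that snippet returns, and the helper names and the `label` column are assumptions for this example.

```python
from collections import Counter


def rows_needed_to_balance(labels):
    """Return how many extra rows each class needs to match the largest class."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items() if n < majority}


def augment(train_rows, synthetic_rows, label_key="label"):
    """Append synthetic rows to the training rows until classes are balanced.

    Hypothetical helper: rows are dicts, and label_key names the target column.
    """
    needed = rows_needed_to_balance([r[label_key] for r in train_rows])
    augmented = list(train_rows)
    for row in synthetic_rows:
        cls = row[label_key]
        if needed.get(cls, 0) > 0:
            augmented.append(row)
            needed[cls] -= 1
    return augmented


# Toy training set: 8 majority rows (label 0) and 2 minority rows (label 1).
train_rows = [{"x": i, "label": 0} for i in range(8)]
train_rows += [{"x": 100 + i, "label": 1} for i in range(2)]

# Stand-in for synthetic minority rows produced by the synthesizer.
synthetic_rows = [{"x": 200 + i, "label": 1} for i in range(10)]

balanced = augment(train_rows, synthetic_rows)
class_counts = Counter(r["label"] for r in balanced)
```

Only the training split should be augmented this way; keeping the test split untouched means the evaluation still reflects the original class distribution.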

Recreate this experiment for yourself - head to Tonic to create your account and start your free trial!
