
Sample datasets from the BasCat repository wrapped into modules for convenience.

Usage of the data4cat module

For convenience, for example in lectures, datasets from the BasCat repository (Dataverse) were wrapped into modules. The convenience functions are meant to give a smooth start when working with published remote data. The datasets included so far are:

  • The BasCat DinoRun dataset on the synthesis of ethanol from syngas

Installation of the data4cat module

For the installation you can clone or download the repository:

git clone https://github.com/nfdi4cat/data4cat.git

Change into the directory and install data4cat:

pip install .

Or you can directly install the module from the remote source:

python -m pip install git+https://github.com/nfdi4cat/data4cat.git@main

To uninstall, run:

pip uninstall data4cat

With the package installed you first need to import the module:

from data4cat import dino_run

And create an instance:

dinodat = dino_run.dino_offline()

These two steps are always required before using any of the functions below.
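If you are unsure whether the installation succeeded, the import can be guarded. The following is only a minimal sketch: the helper `is_installed` is a hypothetical name introduced here and is not part of data4cat; it relies on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("data4cat"):
    # Same two steps as above: import the module, then create an instance.
    from data4cat import dino_run

    dinodat = dino_run.dino_offline()
else:
    print("data4cat is not installed; see the installation steps above.")
```

This keeps a notebook from failing with an ImportError in its first cell and points back to the installation section instead.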

The dino_run dataset from the NFDI4Cat Dataverse instance

One dataset is the BasCat performance dataset on the syngas to ethanol reaction.

Download the dino_run dataset

If no offline version of the dataset is available (e.g. after a fresh install), a copy of the dataset can be downloaded like this:

dinodat.one_shot_dumb()

Create a dataset from the offline data

The data is available either as a pandas DataFrame or as a Bunch object in the style of scikit-learn datasets. The original data can be retrieved as follows:

original = dinodat.original_data()
original.head()

Create a subset of the offline data for the startup phase

A subset for the startup phase, containing the data with a time on stream (TOS) < 85, is also available. Again, it is provided both as a pandas DataFrame and as a Bunch object.

startup = dinodat.startup_data()
startup.head()
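To illustrate what a TOS threshold means for a table of measurements, here is a small sketch on synthetic data. The column names `TOS` and `S_EtOH` and all values are assumptions made for this example; they are not taken from the dataset, and this is not necessarily how `startup_data()` is implemented.

```python
import pandas as pd

# Synthetic stand-in for the original data; column names and values are assumptions.
demo = pd.DataFrame(
    {
        "TOS": [10.0, 50.0, 84.9, 85.0, 120.0],
        "S_EtOH": [0.05, 0.12, 0.18, 0.20, 0.21],
    }
)

TOS_LIMIT = 85  # threshold quoted in the text

# Keep only the rows recorded before the threshold (strictly less than).
startup_demo = demo[demo["TOS"] < TOS_LIMIT]
print(startup_demo)
```

Note that the comparison is strict, so a row with a TOS of exactly 85 is not part of the startup subset in this sketch.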

Create a subset of the offline data for the selectivity

For unsupervised learning tasks in particular, a prepared subset contains only the selectivity data. Requesting this subset also returns the reactor assignment of each row, stored here in a clusters object.

selectivity, clusters = dinodat.selectivity()
selectivity.head()
clusters.head()

Create a subset of the offline data for the selectivity without reactor 5

If needed, setting the r5 argument to False excludes the empty reactor 5.

selectivity_wo5, clusters = dinodat.selectivity(r5=False)
selectivity_wo5.head()
clusters.head()
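Conceptually, leaving out one reactor means dropping the same rows from both the selectivity table and the reactor labels, so that the two stay aligned. The sketch below shows this on synthetic data; the column names, the label values, and the variable names are assumptions for illustration only and do not describe the internals of `selectivity(r5=False)`.

```python
import pandas as pd

# Synthetic selectivity table and matching reactor labels (assumed layout).
features = pd.DataFrame(
    {
        "S_EtOH": [0.10, 0.20, 0.00, 0.15],
        "S_CH4": [0.30, 0.25, 0.00, 0.28],
    }
)
reactor_labels = pd.Series([1, 2, 5, 3], name="reactor")

# One boolean mask applied to both objects keeps rows and labels in sync.
keep = reactor_labels != 5
features_wo5 = features[keep]
labels_wo5 = reactor_labels[keep]

print(features_wo5)
print(labels_wo5)
```

Using a single mask for both objects avoids the common mistake of filtering the features but evaluating a clustering against the unfiltered labels.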

Create a subset of the offline data for the reaction conditions

For supervised tasks a subset of the data is provided that contains the reaction conditions as features and the selectivity to ethanol as target.

react_cond, selectivity_EtOH = dinodat.react_cond()
react_cond.head()
selectivity_EtOH.head()
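A feature table and a target like these are usually split into a training and a test part before fitting a model. The following is a minimal sketch using only pandas on synthetic data; the column names `temperature` and `pressure`, the values, and the 80/20 ratio are assumptions for this example.

```python
import pandas as pd

# Synthetic features (reaction conditions) and target (ethanol selectivity).
X = pd.DataFrame(
    {
        "temperature": [240, 245, 250, 255, 260, 265, 270, 275, 280, 285],
        "pressure": [50, 50, 55, 55, 60, 60, 65, 65, 70, 70],
    }
)
y = pd.Series(
    [0.10, 0.11, 0.13, 0.14, 0.16, 0.17, 0.18, 0.18, 0.17, 0.15],
    name="S_EtOH",
)

# Reproducible 80/20 split: sample the training rows, use the rest for testing.
train_idx = X.sample(frac=0.8, random_state=0).index
test_idx = X.index.difference(train_idx)

X_train, y_train = X.loc[train_idx], y.loc[train_idx]
X_test, y_test = X.loc[test_idx], y.loc[test_idx]

print(len(X_train), len(X_test))
```

Selecting features and target by the same index labels guarantees that each row of `X_train` still belongs to the matching entry of `y_train`.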

Create a subset of the offline data for the reaction conditions without reactor 5

As before, the empty reactor 5 can be excluded by setting the r5 argument to False.

react_cond, selectivity_EtOH = dinodat.react_cond(r5=False)
react_cond.tail()
selectivity_EtOH.tail()
