For convenience, for example for use in lectures, datasets from the BasCat repository (Dataverse) were wrapped into modules. The convenience functions are meant to provide a smooth start in working with published remote data. The datasets included so far are:
- The BasCat DinoRun dataset on the syngas to ethanol reaction
To install, clone or download the repository:
git clone https://github.com/nfdi4cat/data4cat.git
Change into the directory and install data4cat:
pip install .
Alternatively, install the package directly from the remote source:
python -m pip install git+https://github.com/nfdi4cat/data4cat.git@main
To uninstall, run:
pip uninstall data4cat
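To confirm that an install or uninstall took effect, a quick check from Python can help. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example and is not part of data4cat.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Expected to report True after installing and False after uninstalling.
    print("data4cat installed:", is_installed("data4cat"))
```

Run the check from outside the cloned repository directory, since a local `data4cat` folder in the working directory would also be found as an importable package.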
With the package installed you first need to import the module:
from data4cat import dino_run
And create an instance:
dinodat = dino_run.dino_offline()
These two steps are always required.
One dataset is the BasCat performance dataset on the syngas to ethanol reaction.
If no offline version of the dataset is available (e.g. after a fresh install), a copy can be downloaded like this:
dinodat.one_shot_dumb()
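The download only needs to happen once; later sessions can reuse the local copy. The general "fetch only if missing" pattern behind such an offline copy can be sketched as follows. This illustrates the idea only and is not data4cat's actual implementation: `ensure_local_copy` and the `fetch` callable are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def ensure_local_copy(path: Path, fetch: Callable[[], bytes]) -> bool:
    """Write fetch() to path if path does not exist yet.

    Returns True when a download was triggered and False when the
    existing offline copy was reused.
    """
    if path.exists():
        return False
    # Create missing parent folders before writing the file.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return True
```

Passing the download step in as a callable keeps the caching logic independent of where the data comes from, which also makes it easy to try out without network access.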
The data is available either as a pandas DataFrame or as a Bunch object in the style of scikit-learn datasets. The original data can be retrieved as follows:
original = dinodat.original_data()
original.head()
A sub-dataset for the startup phase, with a time on stream (TOS) below 85, is also available, again both as a pandas DataFrame and as a Bunch object.
startup = dinodat.startup_data()
startup.head()
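Conceptually, the startup subset corresponds to a row filter on time on stream. The sketch below reproduces such a filter on a tiny made-up table; the column names `TOS` and `reactor` and all values are placeholders and need not match the real dataset.

```python
import pandas as pd

# Toy stand-in for the full table; values are invented for illustration.
toy = pd.DataFrame(
    {
        "TOS": [10.0, 50.0, 84.9, 85.0, 120.0],
        "reactor": [1, 1, 2, 2, 3],
    }
)

# Keep only rows from the startup phase (strictly below the threshold).
startup_toy = toy[toy["TOS"] < 85]
print(startup_toy)
```

Because the comparison is strict, a row with a TOS of exactly 85 is not part of the filtered result.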
For unsupervised learning tasks in particular, a prepared subset contains only the selectivity data. Requesting this subset also returns the reactor assignments, which are stored here in a clusters object.
selectivity, clusters = dinodat.selectivity()
selectivity.head()
clusters.head()
If needed, setting the r5 argument to False excludes the empty reactor 5.
selectivity_wo5, clusters = dinodat.selectivity(r5=False)
selectivity_wo5.head()
clusters.head()
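Excluding a reactor amounts to applying one boolean mask to both returned objects so that rows and labels stay aligned. A minimal sketch on invented data follows; the column names, the values, and the assumption that the labels are plain reactor numbers are illustrative only and may differ from what data4cat returns.

```python
import pandas as pd

# Invented selectivity values and reactor labels, one label per row.
selectivity_toy = pd.DataFrame(
    {"S_EtOH": [0.12, 0.15, 0.00, 0.10], "S_CH4": [0.40, 0.35, 0.00, 0.42]}
)
reactors_toy = pd.Series([1, 2, 5, 6], name="reactor")

# One mask, applied to both objects, keeps their indices in sync.
keep = reactors_toy != 5
selectivity_wo5_toy = selectivity_toy[keep]
reactors_wo5_toy = reactors_toy[keep]
print(selectivity_wo5_toy)
```

Filtering both objects with the same mask avoids the common mistake of dropping rows from the features while leaving stale labels behind.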
For supervised tasks a subset of the data is provided that contains the reaction conditions as features and the selectivity to ethanol as target.
react_cond, selectivity_EtOH = dinodat.react_cond()
react_cond.head()
selectivity_EtOH.head()
As before, the empty reactor 5 can be excluded by setting the r5 argument to False.
react_cond, selectivity_EtOH = dinodat.react_cond(r5=False)
react_cond.tail()
selectivity_EtOH.tail()