Benchmarking method for predicting OSDA applicability in zeolite synthesis

This repository contains code and analysis relevant to the manuscript "Learning descriptors to predict organic structure-directing agent applicability in zeolite synthesis" by Alexander J. Hoffman, Mingrou Xie, and Rafael Gómez-Bombarelli. This work assesses methods to identify the best organic structure-directing agent (OSDA) to synthesize a given zeolite. There are two folders:

data: contains the cleaned dataset of all OSDA-zeolite pairs for which we could estimate binding entropies (clean_all_data.csv); a sample of this dataset used for the SISSO run, provided both in the input format expected by the SISSO code (sisso_sample.csv) and in a format suited to analysis in the notebook (sisso_sample_for_analysis.csv); the computed formation energies for the zeolites studied in this work (form_E.csv); and the expressions that SISSO generated from these data (sisso_expressions.txt).
notebooks: contains notebooks for analysis and regenerating figures from the manuscript.

There are 5 Python notebooks in the notebooks directory:

classifier_tests.ipynb: this notebook contains quick tests in which we train random forest and neural network (NN) classifiers on the available zeolite synthesis data to check whether these methods accurately identify existing OSDA-zeolite pairs.
data_preparation.ipynb: this notebook contains the code used to clean the data from the previous study from our group (Schwalbe-Koda et al., Science (2021)) and add the binding entropy estimations from the equations of Dauenhauer and Abdelrahman (ACS Cent. Sci. (2018)).
formation_comparison.ipynb: this notebook contains a comparison between formation energies of pure silica frameworks computed using DFT (at the PBE-D3 level) and the DREIDING forcefield.
sisso_equation_initial.ipynb: this notebook contains the code to parse and organize the outputs of a SISSO run from the data gathered in this manuscript. Specifically, it:
- parses and computes the related values for the descriptors (equations) produced by SISSO
- fits decision tree and logistic models to those equations to screen them for the best candidate descriptors
analysis.ipynb: this notebook analyzes the data for our manuscript and contains the code to reproduce the plots. Specifically, it:
- computes the literature recall area-under-the-curve (AUC) for the various metrics that we constructed in the paper ($E_{ij,T}$, $E_{\text{form},ij,T}$, $\Delta E_{\text{form},ij}$, $A_{ij,T}$, and the best SISSO descriptor $\alpha_T$)
- reproduces all of the plots in the main text of the manuscript and most of the plots in the supporting information
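
To illustrate the kind of quantity that analysis.ipynb reports, the sketch below computes an area under a recall curve for a ranked list of candidates. It is a simplified illustration, not the code from the notebook: the function names `recall_curve` and `recall_auc`, the toy scores, and the assumption that lower scores rank first are all hypothetical, and the exact definition of literature recall used in the manuscript may differ.

```python
from typing import List, Sequence, Tuple


def recall_curve(
    scores: Sequence[float],
    is_known: Sequence[bool],
    lower_is_better: bool = True,
) -> Tuple[List[float], List[float]]:
    """Return (fraction_screened, recall) points for a ranked candidate list.

    scores   -- one metric value per candidate (hypothetical toy values here)
    is_known -- True where the candidate is a pair reported in the literature
    """
    n = len(scores)
    total_known = sum(1 for flag in is_known if flag)
    if n == 0 or n != len(is_known) or total_known == 0:
        raise ValueError("need equal-length, non-empty inputs with a known pair")

    # Rank candidates from best to worst according to the metric.
    order = sorted(range(n), key=lambda i: scores[i], reverse=not lower_is_better)

    xs, ys = [0.0], [0.0]
    found = 0
    for rank, idx in enumerate(order, start=1):
        if is_known[idx]:
            found += 1
        xs.append(rank / n)              # fraction of candidates screened so far
        ys.append(found / total_known)   # fraction of known pairs recovered
    return xs, ys


def recall_auc(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Integrate the recall curve with the trapezoidal rule."""
    return sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
        for k in range(1, len(xs))
    )


# Toy example: four candidates, two of which are known literature pairs.
toy_scores = [-3.0, -1.0, -2.5, -0.5]
toy_known = [True, False, True, False]
toy_auc = recall_auc(*recall_curve(toy_scores, toy_known))
```

In this toy case both known pairs are ranked first, so the curve reaches full recall after screening half of the candidates. A metric that ranks known pairs earlier yields a larger area, which is the sense in which the AUC can be used to compare metrics.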

To run most of the notebooks, only a minimal Python installation is required. Most of the required packages can be installed by running

pip install ase matplotlib numpy pandas scikit-learn scipy seaborn tqdm
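
After installation, a quick check such as the one below can confirm that the packages are importable. This is a minimal sketch and is not part of the repository: the helper name `missing_packages` is hypothetical. The mapping is there because scikit-learn is imported as `sklearn`.

```python
import importlib.util

# pip distribution name -> import name (they differ only for scikit-learn).
REQUIRED = {
    "ase": "ase",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "seaborn": "seaborn",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found for import."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages are importable.")
```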

PyTorch is required to run the classifier_tests.ipynb notebook. Please follow the official PyTorch installation instructions.


learningmatter-mit/ZeoliteSynMetrics
