TAPIR (Thermomechanical Advanced Polymer Informatics & Resource)

Requirements

Conda is installed

Installation

git clone https://github.com/peterpaohuang/tapir.git
conda create -c rdkit -n tapir_env rdkit
conda activate tapir_env
Download polymer_db.csv
Move polymer_db.csv into the tapir directory
Run python setup.py from inside the tapir_env conda environment

Initialize

from depablo_box import PDBML, model

dx = PDBML()

Understand the database

Access database as pandas dataframe

df = dx.df

List all polymers and corresponding SMILES

# list both polymer names and SMILES
df[["polymer_name", "smiles"]]

# list only polymer names
df["polymer_name"]

# list only SMILES
df["smiles"]

# list only InChI keys
df["inchi"]

# retrieve polymer row by polymer_name
df.loc[df["polymer_name"] == polymer_name]

# retrieve polymer row by SMILES
df.loc[df["smiles"] == smiles]

# retrieve polymer row by InChI key
df.loc[df["inchi"] == inchi_key]

List Descriptors

Supported Chemical Descriptors

dx.chemical_descriptors

ExactMolWt
FpDensityMorgan1
FpDensityMorgan2
FpDensityMorgan3
HeavyAtomMolWt
MolWt
etc

Supported Thermo-Physical Descriptors

dx.experimental_descriptors

Molar Volume Vm
Density ρ
Solubility Parameter δ
Molar Cohesive Energy Ecoh
Glass Transition Temperature Tg
Molar Heat Capacity Cp
Entanglement Molecular Weight Me
Index of Refraction n
Coefficient of Thermal Expansion α
Molecular Weight of Repeat unit
Van-der-Waals Volume VvW

See distribution of NaN values in database for Thermo-Physical Descriptors

dx.na_distribution()
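
As a rough illustration of what a NaN distribution captures, the sketch below counts missing values per column in a small, made-up pandas dataframe. The values and polymer names are hypothetical, and the column names only mirror those used elsewhere in this README. This is not the implementation of dx.na_distribution(), just the kind of per-column summary it presumably reports.

```python
import pandas as pd

# Toy dataframe with made-up values; NaN marks a missing measurement.
toy_df = pd.DataFrame(
    {
        "polymer_name": ["polymer_a", "polymer_b", "polymer_c"],
        "glass_transition_temperature": [373.0, float("nan"), 250.0],
        "molar_heat_capacity": [float("nan"), float("nan"), 120.5],
    }
)

# Count missing values per column, then express them as a fraction of rows.
na_counts = toy_df.isna().sum()
na_fraction = na_counts / len(toy_df)

print(na_counts)
print(na_fraction)
```

Columns with a high NaN fraction leave few usable rows for model training, so it can be worth checking this before choosing input and output properties.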

List Machine Learning Methods

dx.ml_methods

List Conversion Formats Directly from SMILES

dx.conversion_formats

How to use

Note: currently, depablo_box can only calculate chemical descriptors. Experimental descriptors already exist in the database (dx.df).

Get Chemical Descriptors

descriptor_list = ["ExactMolWt", "HeavyAtomMolWt"]
polymer_identifier = "C=CC(=O)NC(C)(C)C" # can also be the polymer_name
descriptor_df = dx.get_descriptors(polymer_identifier, descriptor_list)

Generate Input Files for Quantum Chemistry Codes

Supported Conversion Formats

Protein Data Bank
Gaussian 98/03 Input

polymer_identifier = 'CC(=O)OC=C' # can also be the polymer_name
conversion_format = 'Gaussian 98/03 Input'
outpath = '/file/path/your_polymer.xyz'
dx.create_input_file(polymer_identifier, conversion_format, outpath)

Add Chemical Descriptors to dataframe

dx.add_descriptors(descriptor_list)

Plot Properties as scatterplot

dx.plot_properties(property_x="glass_transition_temperature", property_y="ExactMolWt")

Plot Many Properties as Pairplot

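# any columns already present in dx.df
property_list = ["glass_transition_temperature", "molar_heat_capacity", "ExactMolWt"]
dx.plot_many(property_list)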

Get Correlation Between Two Properties

dx.property_correlation("molar_heat_capacity", "HeavyAtomMolWt")
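
The README does not say which correlation measure property_correlation reports. Assuming it is the Pearson coefficient (an assumption, not confirmed by the source), the quantity can be sketched in plain Python as follows. The helper name pearson_correlation and the sample numbers are made up for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the common 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only.
heat_capacity = [100.0, 150.0, 200.0, 250.0]
mol_weight = [50.0, 80.0, 95.0, 130.0]
r = pearson_correlation(heat_capacity, mol_weight)
print(round(r, 3))
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.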

Plot Correlation Heatmap of Many Properties

dx.correlation_map(property_list)

Export Dataframe as CSV file

outpath = '/file/path/your_database.csv'
dx.export_csv(outpath)

Initialize Model Training

# input_properties must already be columns of dx.df (see add_descriptors above)
input_properties = ["molar_heat_capacity", "ExactMolWt", "HeavyAtomMolWt"]
output_property = "solubility_parameter"
na_strategy = "remove"
ml = model(dx.df, input_properties, output_property, na_strategy=na_strategy)
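
The README does not spell out what na_strategy = "remove" does. A plausible reading, and only an assumption here, is that rows with a missing input or output value are dropped before training. The sketch below shows that behavior on a made-up pandas dataframe; it is not the library's own code.

```python
import pandas as pd

# Toy data with made-up values; NaN marks a missing measurement.
toy_df = pd.DataFrame(
    {
        "molar_heat_capacity": [120.5, float("nan"), 98.0, 143.2],
        "ExactMolWt": [104.1, 86.0, float("nan"), 127.1],
        "solubility_parameter": [18.5, 19.2, 17.0, float("nan")],
    }
)

input_properties = ["molar_heat_capacity", "ExactMolWt"]
output_property = "solubility_parameter"

# Keep only rows where every input and the output are present.
clean_df = toy_df.dropna(subset=input_properties + [output_property])

# Feature matrix and target vector as they might be handed to a regressor.
X = clean_df[input_properties].to_numpy()
y = clean_df[output_property].to_numpy()
print(X.shape, y.shape)
```

Dropping rows is simple but can shrink the training set considerably when the chosen properties are sparsely populated.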

Train Model

Supported Model Types

Support Vector Regression
Linear Regression
Ridge Regression
Lasso Regression
Gaussian Process Regression

model_type = "Support Vector Regression"
ml.train(model_type)

View Trained Model R^2 Score

ml.r_2
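
For reference, the R^2 score (coefficient of determination) is conventionally defined as 1 - SS_res / SS_tot. The sketch below computes it in plain Python on made-up numbers. Whether ml.r_2 is evaluated on training data or on a held-out split is not stated in this README, so the example makes no claim about that.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted values for illustration only.
y_true = [18.0, 19.5, 21.0, 17.5]
y_pred = [18.2, 19.0, 20.5, 17.9]
score = r2_score(y_true, y_pred)
print(round(score, 3))
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values mean worse than predicting the mean.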

Predict on new data

# one row per sample; values are assumed to follow the order of input_properties
new_data = [[10.5, 29, 102.1]]
results = ml.predict(new_data)

Plot Feature Importances

Note: model type Gaussian Process Regression does not support feature importances

ml.feature_importance()

Export Trained Model as Pickle File

ml.export_fitted_model(outpath)

Load Pickle File as Trained Model

import pickle
with open(outpath, "rb") as f:
  ml = pickle.load(f)
results = ml.predict(new_data)
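
The export and load steps rely on Python's pickle round trip. The sketch below shows that round trip with a plain dictionary standing in for a fitted model; the dictionary contents and the file name model.pkl are hypothetical. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: any picklable Python object works the same way.
fitted = {
    "model_type": "Linear Regression",
    "coefficients": [0.5, -1.2],
    "intercept": 3.0,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    outpath = os.path.join(tmp_dir, "model.pkl")

    # Write the object in binary mode ...
    with open(outpath, "wb") as f:
        pickle.dump(fitted, f)

    # ... and read it back, also in binary mode.
    with open(outpath, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal but separate copy of the original.
print(restored == fitted)
```

A pickled model generally needs the same library versions at load time as at export time, so it can help to record the environment alongside the file.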

Scrape CROW Polymer DB for experimental thermo-physical properties

from depablo_box import polymer_scraper

Initialize scraper

scraper = polymer_scraper()

Start Scraping

scraper.start()

Once Finished, Store Scraped Data

outpath = "/file/path/to/store/FILE.csv"
scraper.store_data(outpath)
