Machine learning method for predicting physicochemical properties of light oil fractions
This repository contains the GauL-HDAD-derived algorithm for representing chemical mixtures as a numeric vector and neural networks for mixture property prediction.
This package is written in Python and can be run from the command prompt. It requires several Python packages, which are listed below. Using Anaconda to manage packages is recommended (https://docs.anaconda.com/anaconda/install/).
- Python 3.8 or higher
- NumPy
conda install numpy
- RDKit
conda install -c conda-forge rdkit
- Scikit-learn
conda install -c conda-forge scikit-learn
- joblib
conda install -c anaconda joblib
- TensorFlow 2
pip install tensorflow
- matplotlib
conda install -c conda-forge matplotlib
The package can then be installed by cloning this repository.
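To confirm that the environment is complete before training, a short check along the following lines may help. This is a minimal sketch and not part of the repository: the function name `missing_modules` is hypothetical, and the module names are the usual import names of the packages listed above (scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed above (assumed standard import names).
REQUIRED_MODULES = ["numpy", "rdkit", "sklearn", "joblib", "tensorflow", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or higher is required")
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only tests whether each module can be located. It does not import the packages or verify their versions.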
The data used in the original work is available in the folder Data. Five files are included:
- labeled_library.pickle
A library with all C4 to C12 PIONA molecules and their experimental data (if available)
- mei_input.pickle
The input values (lumped naphtha samples) from Mei et al.
- mei_output.pickle
The output values (boiling points and mixture properties) from Mei et al.
- pyl_input.pickle
The input values (lumped naphtha samples) from Pyl et al.
- pyl_output.pickle
The output values (boiling points and mixture properties) from Pyl et al.
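The data files can be inspected with the standard `pickle` module. The sketch below is an illustration only: the internal layout of the pickled objects is not documented here, so the hypothetical helper `describe` reports just the type and length. If the files contain custom classes, unpickling may only work when run from the repository root so that those classes can be imported. Only unpickle files you trust.

```python
import pickle
from pathlib import Path

DATA_FILES = [
    "labeled_library.pickle",
    "mei_input.pickle",
    "mei_output.pickle",
    "pyl_input.pickle",
    "pyl_output.pickle",
]


def load_pickle(path):
    """Load one pickled object from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Summarise an object without assuming anything about its layout."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    data_dir = Path("Data")  # folder name as given above
    for name in DATA_FILES:
        path = data_dir / name
        if path.exists():
            print(name, describe(load_pickle(path)))
```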
The input and output sources can be changed in the file input.py.
This repository contains a folder named pretrained. In that folder, pretrained pure compound property models are available, and the Gaussian mixture models for the molecular representations are premade using all molecules in Data/labeled_library.pickle.
Using the pretrained models speeds up training, since only the boiling points and the desired mixture property have to be trained.
Mixture properties can be trained with the pretrained models via the following command:
python train.py pretrained <property>
Replace <property> with the mixture property that you want to predict.
Currently available properties:
- Using data from Pyl et al.:
Specific Gravity: sg
python train.py pretrained sg
- Using data from Mei et al.:
Liquid Density: d20 or density
python train.py pretrained d20
or
python train.py pretrained density
Dynamic Viscosity: mu or viscosity
python train.py pretrained mu
or
python train.py pretrained viscosity
Surface Tension: st or surface tension
python train.py pretrained st
or
python train.py pretrained "surface tension"
All results will be found in the pretrained folder.
It is also possible to train all models yourself. Due to the large number of hydrocarbons, the creation of Gaussian mixture models will take several hours.
You can train the models using:
python train.py <folder> <property>
Replace <folder> with the name of the folder where you want to store your results.
Replace <property> with the desired property, as listed above.
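To train several properties in one go, the command can be wrapped in a small Python script. The following is a sketch, not part of the repository: `build_command` and `train_properties` are hypothetical helpers, and the keyword set is copied from the property list in this README, so train.py itself may accept other values.

```python
import subprocess
import sys

# Property keywords and aliases as listed in this README (assumed complete).
PROPERTY_KEYWORDS = {"sg", "d20", "density", "mu", "viscosity", "st", "surface tension"}


def build_command(folder, prop):
    """Return the argument list for: python train.py <folder> <property>."""
    if prop not in PROPERTY_KEYWORDS:
        raise ValueError(f"Unknown property keyword: {prop!r}")
    # sys.executable reuses the interpreter that runs this script.
    return [sys.executable, "train.py", folder, prop]


def train_properties(folder, props):
    """Run train.py once per property; stop at the first failing run."""
    for prop in props:
        subprocess.run(build_command(folder, prop), check=True)
```

Calling `train_properties("pretrained", ["d20", "mu", "st"])` from the repository root would run the three Mei et al. properties in sequence. Passing the arguments as a list means a keyword containing a space, such as "surface tension", needs no extra shell quoting.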
When using this prediction model for your own publication, please cite the original papers:
Learning Molecular Representations for Thermochemistry Prediction of Cyclic Hydrocarbons and Oxygenates
Dobbelaere, M.R.; Plehiers, P.P.; Van de Vijver, R.; Stevens, C.V.; Van Geem, K.M.
J. Phys. Chem. A 2021, 125, 23, 5166–5179
Machine Learning for Physicochemical Property Prediction of Complex Hydrocarbon Mixtures
Dobbelaere, M.R.; Ureel, Y.; Vermeire, F.H.; Stevens, C.V.; Van Geem, K.M.
Submitted to Industrial and Engineering Chemistry Research, 2022