This repository contains code, datasets, and results from the paper:
K. Zdybał, A. Parente, J. C. Sutherland - Improving reduced-order models through nonlinear decoding of projection-dependent outputs, Patterns, 4, (2023) 100859
To cite this publication:
@article{zdybal2023improving,
title={Improving reduced-order models through nonlinear decoding of projection-dependent outputs},
author={Zdybał, Kamila and Parente, Alessandro and Sutherland, James C},
journal={Patterns},
volume={4},
pages = {100859},
year={2023},
publisher={Cell Press},
doi={10.1016/j.patter.2023.100859},
}
Large datasets are increasingly abundant in various scientific and engineering disciplines. Multiple physical variables are frequently gathered into one dataset, leading to high data dimensionality. Visualizing and understanding multivariate datasets, and building data-driven models based on the collected variables, can be achieved through dimensionality reduction. However, in many reduction techniques to date, there is no guarantee that the reduced data representation will possess certain desired topological qualities. We show that the quality of reduced data representations can be significantly improved by informing data projections with target quantities of interest (QoIs), some of which are functions of the projection itself. The target QoIs are often known to researchers as variables that should be well represented on a projection. These can include closure terms required in modeling, important physical variables other than the state variables, or class labels in the case of categorical data. Our approach to computing improved data representations can find application in all areas of science and engineering that aim to reduce the dimensionality of multivariate datasets, as well as in fundamental research on representation learning. This work can have particular relevance in efficient data visualization and in efficient modeling of dynamical systems with many degrees of freedom.
Datasets used in this study are stored in the data/ directory. These include multivariate combustion datasets for:
- Steady laminar flamelet, hydrogen
- Steady laminar flamelet, syngas
- Steady laminar flamelet, methane
- Steady laminar flamelet, ethylene
- Zero-dimensional reactor, syngas
The main results can be reproduced using scripts contained in the scripts/ directory. Run the scripts in the following order:
1. QoIAwareProjection-train.py
2. QoIAwareProjection-VarianceData.py
3. QoIAwareProjection-kernel-regression-2D.py and QoIAwareProjection-kernel-regression-3D.py
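The run order above can be scripted. The following is a minimal sketch, not part of the repository: it assumes the scripts are launched from the repository root and take no command-line arguments (the text does not say whether any are needed). The helper names `ordered_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the list above, in the order they should be run.
# The 2D and 3D kernel-regression scripts share the last step; running 2D
# before 3D is an assumption made here for illustration.
SCRIPTS_IN_ORDER = [
    "QoIAwareProjection-train.py",
    "QoIAwareProjection-VarianceData.py",
    "QoIAwareProjection-kernel-regression-2D.py",
    "QoIAwareProjection-kernel-regression-3D.py",
]


def ordered_commands(scripts_dir="scripts"):
    """Build one command per script, preserving the required order."""
    return [
        [sys.executable, str(Path(scripts_dir) / name)]
        for name in SCRIPTS_IN_ORDER
    ]


def run_pipeline(scripts_dir="scripts"):
    """Run each script in turn and stop at the first failure."""
    for command in ordered_commands(scripts_dir):
        # check=True raises CalledProcessError if a script exits with an
        # error, so later steps never run on incomplete results.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` would then execute the four scripts sequentially with the same Python interpreter that runs the sketch.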
Scripts 1 and 2 can take a long time to run. Script 2 is parallelized, and we highly recommend running it on multiple CPUs. We ran this script on 64 CPUs, where looping over 100 random seeds for a single dataset takes about 20 hours to complete.
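To illustrate why looping over random seeds parallelizes well, here is a minimal sketch of distributing independent seeds over worker processes. It is not the repository's implementation: `run_single_seed` is a hypothetical stand-in for one training run, and the default worker count is arbitrary.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def run_single_seed(seed):
    """Hypothetical stand-in for one run; returns (seed, score)."""
    rng = random.Random(seed)  # generator fully determined by the seed
    # Placeholder computation; a real run would train one projection here.
    score = sum(rng.random() for _ in range(1000)) / 1000
    return seed, score


def run_all_seeds(seeds, n_workers=4):
    """Distribute independent seeds over worker processes.

    pool.map returns results in the same order as the input seeds.
    """
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_single_seed, seeds))
```

Because each seed is independent of the others, the runs need no communication between workers. On platforms that start worker processes by spawning, a call such as `run_all_seeds(range(100), n_workers=64)` should be placed under an `if __name__ == "__main__":` guard.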
The results for the synthetic dataset from Fig. 2 can be reproduced on multiple CPUs using the following scripts:
- illustrative-example-linear-reconstruction-from-a-subspace.py
- illustrative-example-nonlinear-reconstruction-from-a-subspace.py
- illustrative-example-costs.py
Our open-source Python library, PCAfold, is required. Specifically, the user will need the class QoIAwareProjection. More information can be found in this illustrative tutorial. We recommend a Python stack with Python>=3.8 and the latest versions of all the necessary modules.
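Before running the scripts, it can help to confirm that the environment meets these requirements. The sketch below uses only the standard library; the helper names are hypothetical, and it checks only that a module named `PCAfold` can be found, not which version is installed.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum Python version recommended above


def python_is_recent_enough(version_info=sys.version_info):
    """Compare the (major, minor) version against the recommended minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def module_is_installed(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_is_recent_enough():
        problems.append("Python 3.8 or newer is recommended")
    if not module_is_installed("PCAfold"):
        problems.append("PCAfold is not importable in this environment")
    return problems
```

Printing the result of `check_environment()` gives a quick summary of anything missing.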
For results reproducibility, we use fixed random seeds for neural network initialization and training. The exact values for random seeds can be retrieved from the code provided.
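As a generic illustration of why a fixed seed makes initialization repeatable, the sketch below draws an initial weight matrix from a seeded generator. It uses only the Python standard library and is not the repository's code; the function name `initial_weights` and the standard normal distribution are assumptions made for the example.

```python
import random


def initial_weights(n_inputs, n_components, seed):
    """Draw an n_inputs x n_components weight matrix from a seeded generator."""
    rng = random.Random(seed)  # private generator; global state is untouched
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_components)]
        for _ in range(n_inputs)
    ]


# The same seed always reproduces the same starting point.
first = initial_weights(n_inputs=5, n_components=2, seed=100)
second = initial_weights(n_inputs=5, n_components=2, seed=100)
print(first == second)
```

Using a private generator per run, rather than reseeding a shared global one, keeps each seed's result independent of what else has consumed random numbers.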
Once the main results are obtained using the scripts from the scripts/ directory, the following Jupyter notebooks can be used to post-process results and generate figures:
- This Jupyter notebook can be used to reproduce results from Fig. 1B and from the Graphical abstract.
- This Jupyter notebook can be used to reproduce results from Fig. 2.
- This Jupyter notebook can be used to reproduce results from Fig. 3A.
- This Jupyter notebook can be used to reproduce results from Fig. 3B.
- This Jupyter notebook can be used to reproduce results from Fig. 3C.
- This Jupyter notebook can be used to reproduce results from Fig. 4A.
- This Jupyter notebook can be used to reproduce results from Fig. 4B-C and Fig. 4F.
- This Jupyter notebook can be used to reproduce results from Fig. 4D-F.
- This Jupyter notebook can be used to reproduce results from Figs. S1-S2.