Merge PR #139 from Squadula
Chaospy Linear Regression class
Rykath authored Jun 8, 2022
2 parents cfef0aa + 712ab16 commit e4e3f5a
Showing 19 changed files with 861 additions and 82 deletions.
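The commit title refers to a new linear-regression surrogate built on *chaospy*. As a rough, standalone sketch of the underlying technique, a polynomial expansion fitted by least squares using chaospy's public API (this is not proFit's surrogate interface; the distribution, sample count and toy model below are invented for illustration):

```python
import numpy as np
import chaospy

# assumed joint distribution of two input parameters (illustrative only)
distribution = chaospy.J(chaospy.Uniform(0, 1), chaospy.Normal(0, 1))

# polynomial basis up to total order 3
expansion = chaospy.generate_expansion(3, distribution)

# quasi-random training points and evaluations of a toy black-box model
samples = distribution.sample(50, rule="halton")
evaluations = np.sin(2 * np.pi * samples[0]) + 0.5 * samples[1]

# least-squares fit of the expansion coefficients -> polynomial surrogate
surrogate = chaospy.fit_regression(expansion, samples, evaluations)

# predict the output at a new parameter combination
print(surrogate(0.3, 0.1))
```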
1 change: 1 addition & 0 deletions .gitignore
@@ -49,3 +49,4 @@ kernels_base.f90
a.out
*.S
*.npy
.idea
18 changes: 18 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,18 @@
repos:
- repo: local
hooks:
- id: jupyter-clear-output
name: jupyter-clear-output
files: \.ipynb$
stages: [commit]
language: system
entry: python3 -m nbconvert --ClearOutputPreprocessor.enabled=True --inplace
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
- id: black-jupyter
2 changes: 1 addition & 1 deletion .zenodo.json
@@ -1,5 +1,5 @@
{
"description":"<p>proFit is a collection of tools for studying parametric dependencies of black-box simulation codes or experiments and construction of reduced order response models over input parameter space.</p><p>proFit can be fed with a number of data points consisting of different input parameter combinations and the resulting output of the simulation under investigation. It then fits a response-surface through the point cloud using Gaussian process regression (GPR) models. This probabilistic response model allows to predict (interpolate) the output at yet unexplored parameter combinations including uncertainty estimates. It can also tell you where to put more training points to gain maximum new information (experimental design) and automatically generate and start new simulation runs locally or on a cluster. Results can be explored and checked visually in a web frontend.</p><p>Telling proFit how to interact with your existing simulations is easy and requires no changes in your existing code. Current functionality covers starting simulations locally or on a cluster via <a href=\"https://slurm.schedmd.com\">Slurm</a>, subsequent surrogate modelling using <a href=\"https://github.com/SheffieldML/GPy\">GPy</a>, <a href=\"https://github.com/scikit-learn/scikit-learn\">scikit-learn</a>, as well as an active learning algorithm to iteratively sample at interesting points and a Markov-Chain-Monte-Carlo (MCMC) algorithm. The web frontend to interactively explore the point cloud and surrogate is based on <a href=\"https://github.com/plotly/dash\">plotly/dash</a>.</p><p>Features include: <ul><li>Compute evaluation points (e.g. from a random distribution) to run simulation</li><li>Template replacement and automatic generation of run directories</li><li>Starting parallel runs locally or on the cluster (SLURM)</li><li>Collection of result output and postprocessing</li><li>Response-model fitting using GPR</li><li>Active learning to reduce number of samples needed</li><li>MCMC to find a posterior parameter distribution (similar to active learning)</li><li>Graphical user interface to explore the results</li></ul></p>",
"description":"<p>proFit is a collection of tools for studying parametric dependencies of black-box simulation codes or experiments and construction of reduced order response models over input parameter space.</p><p>proFit can be fed with a number of data points consisting of different input parameter combinations and the resulting output of the simulation under investigation. It then fits a response-surface through the point cloud using Gaussian process regression (GPR) models. This probabilistic response model allows to predict (interpolate) the output at yet unexplored parameter combinations including uncertainty estimates. It can also tell you where to put more training points to gain maximum new information (experimental design) and automatically generate and start new simulation runs locally or on a cluster. Results can be explored and checked visually in a web frontend.</p><p>Telling proFit how to interact with your existing simulations is easy and requires no changes in your existing code. Current functionality covers starting simulations locally or on a cluster via <a href=\"https://slurm.schedmd.com\">Slurm</a>, subsequent surrogate modelling using <a href=\"https://github.com/SheffieldML/GPy\">GPy</a>, <a href=\"https://github.com/scikit-learn/scikit-learn\">scikit-learn</a>, as well as an active learning algorithm to iteratively sample at interesting points and a Markov-Chain-Monte-Carlo (MCMC) algorithm. The web frontend to interactively explore the point cloud and surrogate is based on <a href=\"https://github.com/plotly/dash\">plotly/dash</a>.</p><p>Features include: <ul><li>Compute evaluation points (e.g. from a random distribution) to run simulation</li><li>Template replacement and automatic generation of run directories</li><li>Starting parallel runs locally or on the cluster (SLURM)</li><li>Collection of result output and postprocessing</li><li>Response-model fitting using Gaussian Process Regression and Linear Regression</li><li>Active learning to reduce number of samples needed</li><li>MCMC to find a posterior parameter distribution (similar to active learning)</li><li>Graphical user interface to explore the results</li></ul></p>",
"license":"MIT",
"title":"proFit: Probabilistic Response Model Fitting with Interactive Tools",
"creators":[
82 changes: 51 additions & 31 deletions CONTRIBUTING.md
@@ -2,23 +2,25 @@
Contributions to proFit are always welcome.

## Resources
* official repository on [github](https://github.com/redmod-team/profit)
* repository on [github](https://github.com/redmod-team/profit)
* [issue tracker](https://github.com/redmod-team/profit/issues)
* pull requests
* feature planning via *Projects*
* automatic testing
* automation via *Actions*
* managing releases
* official documentation on [readthedocs.io](https://profit.readthedocs.io/en/latest)
* documentation on [readthedocs.io](https://profit.readthedocs.io/en/latest)
* meeting log on [HedgeDoc](https://pad.gwdg.de/lOiz56TIS4e5E9-92q-2MQ?view)
* internal documentation and ideas at the github wiki (for now; should be moved to readthedocs in the future)
* archived on [zenodo.org](https://zenodo.org/record/6563463#.Yo9PjiNBxQo) with DOI `10.5281/zenodo.3580488`
* python package published on [PyPI](https://pypi.org/project/profit/)
* test coverage on [coveralls.io](https://coveralls.io/github/redmod-team/profit)

to gain access to internal documentation and regular meetings please get in touch with Christopher Albert
(albert@alumni.tugraz.at)
to gain access to internal documentation and regular meetings please get in touch with Christopher Albert
(albert@tugraz.at)

## Issues
If you encounter any bugs, problems or specific missing features, please open an issue on
If you encounter any bugs, problems or specific missing features, please open an issue on
[github](https://github.com/redmod-team/profit/issues). Please state the bug / problem / enhancement clearly and provide
context information.
context information.

## Organization and planning
Three types of [project boards](https://github.com/redmod-team/profit/projects) are used with the main repository.
@@ -27,62 +29,80 @@ Three types of [project boards](https://github.com/redmod-team/profit/projects)

**Tasks** for prioritization and tracking issues and pull requests. Each version has its own board.

**Testing** for gathering observations in preparation of a new release.
**Testing** for gathering observations in preparation of a new release.

## git
proFit uses *git* as VCS. The upstream repository is located at https://github.com/redmod-team/profit. Currently the
upstream repository contains only one branch.

Contributors should fork this repository, push changes to their fork and make *pull requests* to merge them back into
Contributors should fork this repository, push changes to their fork and make *pull requests* to merge them back into
upstream. Before merging, the pull request should be reviewed by a maintainer and any changes can be discussed. For
larger projects use *draft pull requests* to show others your current progress. They can also be tracked in the relevant
*Projects*, added to release schedules and features can be discussed easily.

To try out new features which have not been merged yet, you can just add any fork to your repository with
To try out new features which have not been merged yet, you can just add any fork to your repository with
`git remote add <name> <url>` and merge it locally. Do not push this merge to your fork.

The default method to resolve conflicts is to rebase the personal fork onto `upstream` before merging.

Please also use *interactive rebase* to squash intermediate commits if you use many small commits.

### Jupyter Notebooks
Make sure to clear output of Jupyter notebooks before committing them to the git repository.
### Pre-Commit Hooks
Starting with the development for `v0.6`, proFit uses *pre-commit* to ensure consistent formatting with
[black](https://github.com/psf/black) and clean jupyter notebooks.
Pre-commit is configured with `.pre-commit-config.yaml` and needs to be activated with `pre-commit install`.
To run the hooks on all files (e.g. after adding new hooks), use `pre-commit run --all-files`.

## Installing
Install proFit from your git repository using the editable install: `pip install -e .[docs,dev]`.

## Documentation
The project documentation is maintained in the `doc` directory of the repository. proFit uses *Sphinx* and *rst* markup.
The documentation is automatically built from the repository and hosted on
The documentation is automatically built from the repository on every commit and hosted on
[readthedocs.io](https://profit.readthedocs.io/en/latest).

Code documentation is generated automatically using [Sphinx AutoAPI](https://github.com/readthedocs/sphinx-autoapi).
Code documentation is generated automatically using [Sphinx AutoAPI](https://github.com/readthedocs/sphinx-autoapi).
Please describe your code using docstrings, preferably following the *Google Docstring Format*. For typing please use
the built-in type annotations (PEP 484 / PEP 526).
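A brief illustration of the requested style, using a hypothetical helper that is not part of the proFit code base:

```python
from typing import List


def scale(values: List[float], factor: float = 1.0) -> List[float]:
    """Scale a list of values by a constant factor.

    Args:
        values: Input values to scale.
        factor: Multiplicative factor applied to every element.

    Returns:
        A new list with each element multiplied by ``factor``.
    """
    return [factor * value for value in values]
```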

### creating documentation
Creating the documentation requires additional packages (specify `docs` during `pip install`)
To build the documentation locally, run `make html` inside the `doc` folder to create the output HTML in `_build`.
This requires the additional dependencies `docs`.

Running `make html` inside the `doc` folder creates output HTML file in `_build`.
## Versioning
proFit follows semantic versioning (`MAJOR.MINOR.PATCH`).
Each version comes with a *git tag* and a *Release* on GitHub.
proFit is still in development, comes with no guarantees of backwards compatibility and therefore uses versions `v0.x`.
The minor version is incremented when significant features have been implemented or projects completed.
Each minor version is tracked with a *Project* in GitHub and receives release notes when published.
It is good practice to create a release candidate `v0.xrc` before a release.
The release candidate should be tagged as *pre-release* in GitHub. It will not be shown by default on *PyPI* and *readthedocs*.

## Packaging
proFit is still in development. There is currently no stable release. It is planned to manage releases on *github* and
publish them on *PyPi*.
Releases are created with the GitHub interface and trigger workflows to automate the publishing and packaging. In particular:
* A python package is created and uploaded to [PyPI](https://pypi.org/project/profit/)
* The repository is archived and a new version added to [zenodo](https://zenodo.org/record/6563463#.Yo9PjiNBxQo)
* A new version of the documentation is created on [readthedocs](https://profit.readthedocs.io)

Before creating a version, check the metadata in `setup.cfg` (for the python package) and `.zenodo.json` (for zenodo).

proFit uses the new build system, as specified by PEP 517. The build system is defined in `pyproject.toml` and uses the
default `setuptools` and `wheels`.
Package metadata and requirements are specified in `setup.cfg`. Building the *fortran* backend requires a
`setup.py` file and `numpy` installed during the build process.
proFit infers its version from the installed metadata or the local git repository and displays it when called.
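For illustration, the metadata lookup described here can be done with the standard library; a minimal sketch, not necessarily how proFit implements it:

```python
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("profit")  # version of the installed distribution
except PackageNotFoundError:
    # e.g. running from a source checkout that has not been installed
    __version__ = "unknown"
```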

## Packaging
proFit uses the new python build system, as specified by PEP 517.
The build system is defined in `pyproject.toml` and uses the default `setuptools` and `wheels`.
Package metadata and requirements are specified in `setup.cfg`.
Building the *fortran* backend requires a `setup.py` file and `numpy` installed during the build process.

Upon publishing a new release in *GitHub*, a workflow should automatically upload the package to *PyPI*.
Upon publishing a new release in *GitHub*, a workflow automatically builds and uploads the package to *PyPI*.
To create a release manually follow this [guide](https://packaging.python.org/tutorials/packaging-projects/).
The new version is automatically added to [zenodo](https://zenodo.org/record/4849489); make sure to update the metadata in `.zenodo.json`.

## Testing
proFit uses `pytest` for automatic testing. A pull request on *GitHub* triggers automatic testing with the supported
python versions.
proFit uses `pytest` for automatic testing. A pull request on *GitHub* triggers automatic testing with the supported python versions.
The *GitHub* action also determines the test coverage and uploads it to [coveralls](https://coveralls.io/github/redmod-team/profit).

## Coding
### Dependencies
Some calls to proFit should be completed very fast, but our many dependencies can slow down the startup time
significantly. Therefore be careful where you import big packages like `GPy` or `sklearn`. Consider using import
Some calls to proFit should be completed very fast, but our many dependencies can slow down the startup time
significantly. Therefore be careful where you import big packages like `GPy` or `sklearn`. Consider using import
statements at the start of the function.
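
A minimal sketch of the deferred-import pattern meant here (hypothetical function, not actual proFit code):

```python
def fit_surrogate(x, y):
    # heavy dependency imported only when fitting is actually requested,
    # so commands like `profit --help` keep a fast startup
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    model.fit(x, y)
    return model
```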

Investigating the tree of imported packages can be done graphically with `tuna`:
2 changes: 1 addition & 1 deletion README.md
@@ -41,7 +41,7 @@ and surrogate is based on [plotly/dash](https://github.com/plotly/dash).
* Template replacement and automatic generation of run directories
* Starting parallel runs locally or on the cluster (SLURM)
* Collection of result output and postprocessing
* Response-model fitting using GPR
* Response-model fitting using Gaussian Process Regression and Linear Regression
* Active learning to reduce number of samples needed
* MCMC to find a posterior parameter distribution (similar to active learning)
* Graphical user interface to explore the results
32 changes: 18 additions & 14 deletions doc/variables.rst
@@ -4,23 +4,27 @@ Variables
=========

* Input variables
Values according to a (random) distribution or successively inserted through
active learning.

Possible distributions:

* Halton sequence (quasi-random, space filling)
* Uniform (random)
* Log-uniform (random)
* Normal (random)
* Linear vector
* Constant value
* Independent variables
Drawn from random distributions:

* `Halton` sequence (quasi-random, space filling)
* `Uniform` distribution
* `LogUniform`: uniformly distributed in log-space
* `Normal` distribution

Fixed values:

* `Linear`: linearly spaced values
* `Constant` (excluded from the fit)

Special:

* `ActiveLearning`: successively inserted according to a specified optimization strategy
* `Independent` variables
The user can bind an independent variable to an output variable, if the simulation outputs a (known) vector over linear supporting points. This
dimension is then not considered during fitting, in contrast to full multi-
output models. This lowers necessary computing resources and can even
enhance the quality of the fit, since complexity of the model is reduced.
* Output variables
* `Output` variables
Default output is a scalar value, but with the attachment of independent
variables, it becomes a vector. Several output variables can also be defined
independently in the config file, which leads to multi-output surrogates during
@@ -48,7 +52,7 @@ Definition of variables inside the `profit.yaml` configuration file.
a2: Uniform(0, 1) # Same as 'a1'
b: Normal(0, 1e-2) # Normal distribution with 0 mean and 1e-2 standard deviation.
c1: 0.2 # Constant value.
c2: Constant(0.2) # Same as 'c'.
c2: Constant(0.2) # Same as 'c1'.
d: LogUniform(1e-4, 0.1) # LogUniform distribution.
e: Halton(0, 3) # Quasi-random Halton sequence.
h: Linear(-1, 1) # Linear vector with size of 'ntrain'.
