Merge PR #139 from Squadula
Chaospy Linear Regression class
Rykath authored Jun 8, 2022
2 parents cfef0aa + 712ab16 commit e4e3f5a
Showing 19 changed files with 861 additions and 82 deletions.
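The commit title refers to a new linear-regression surrogate built on *chaospy*. As a rough, standalone sketch of the underlying technique, a polynomial expansion fitted by least squares using chaospy's public API (this is not proFit's surrogate interface; the distribution, sample count and toy model below are invented for illustration):

```python
import numpy as np
import chaospy

# assumed joint distribution of two input parameters (illustrative only)
distribution = chaospy.J(chaospy.Uniform(0, 1), chaospy.Normal(0, 1))

# polynomial basis up to total order 3
expansion = chaospy.generate_expansion(3, distribution)

# quasi-random training points and evaluations of a toy black-box model
samples = distribution.sample(50, rule="halton")
evaluations = np.sin(2 * np.pi * samples[0]) + 0.5 * samples[1]

# least-squares fit of the expansion coefficients -> polynomial surrogate
surrogate = chaospy.fit_regression(expansion, samples, evaluations)

# predict the output at a new parameter combination
print(surrogate(0.3, 0.1))
```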
1 change: 1 addition & 0 deletions .gitignore
@@ -49,3 +49,4 @@ kernels_base.f90
a.out
*.S
*.npy
.idea
18 changes: 18 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,18 @@
repos:
- repo: local
hooks:
- id: jupyter-clear-output
name: jupyter-clear-output
files: \.ipynb$
stages: [commit]
language: system
entry: python3 -m nbconvert --ClearOutputPreprocessor.enabled=True --inplace
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
- id: black-jupyter
2 changes: 1 addition & 1 deletion .zenodo.json
@@ -1,5 +1,5 @@
{
"description":"<p>proFit is a collection of tools for studying parametric dependencies of black-box simulation codes or experiments and construction of reduced order response models over input parameter space.</p><p>proFit can be fed with a number of data points consisting of different input parameter combinations and the resulting output of the simulation under investigation. It then fits a response-surface through the point cloud using Gaussian process regression (GPR) models. This probabilistic response model allows to predict (interpolate) the output at yet unexplored parameter combinations including uncertainty estimates. It can also tell you where to put more training points to gain maximum new information (experimental design) and automatically generate and start new simulation runs locally or on a cluster. Results can be explored and checked visually in a web frontend.</p><p>Telling proFit how to interact with your existing simulations is easy and requires no changes in your existing code. Current functionality covers starting simulations locally or on a cluster via <a href=\"https://slurm.schedmd.com\">Slurm</a>, subsequent surrogate modelling using <a href=\"https://github.com/SheffieldML/GPy\">GPy</a>, <a href=\"https://github.com/scikit-learn/scikit-learn\">scikit-learn</a>, as well as an active learning algorithm to iteratively sample at interesting points and a Markov-Chain-Monte-Carlo (MCMC) algorithm. The web frontend to interactively explore the point cloud and surrogate is based on <a href=\"https://github.com/plotly/dash\">plotly/dash</a>.</p><p>Features include: <ul><li>Compute evaluation points (e.g. from a random distribution) to run simulation</li><li>Template replacement and automatic generation of run directories</li><li>Starting parallel runs locally or on the cluster (SLURM)</li><li>Collection of result output and postprocessing</li><li>Response-model fitting using GPR</li><li>Active learning to reduce number of samples needed</li><li>MCMC to find a posterior parameter distribution (similar to active learning)</li><li>Graphical user interface to explore the results</li></ul></p>",
"description":"<p>proFit is a collection of tools for studying parametric dependencies of black-box simulation codes or experiments and construction of reduced order response models over input parameter space.</p><p>proFit can be fed with a number of data points consisting of different input parameter combinations and the resulting output of the simulation under investigation. It then fits a response-surface through the point cloud using Gaussian process regression (GPR) models. This probabilistic response model allows to predict (interpolate) the output at yet unexplored parameter combinations including uncertainty estimates. It can also tell you where to put more training points to gain maximum new information (experimental design) and automatically generate and start new simulation runs locally or on a cluster. Results can be explored and checked visually in a web frontend.</p><p>Telling proFit how to interact with your existing simulations is easy and requires no changes in your existing code. Current functionality covers starting simulations locally or on a cluster via <a href=\"https://slurm.schedmd.com\">Slurm</a>, subsequent surrogate modelling using <a href=\"https://github.com/SheffieldML/GPy\">GPy</a>, <a href=\"https://github.com/scikit-learn/scikit-learn\">scikit-learn</a>, as well as an active learning algorithm to iteratively sample at interesting points and a Markov-Chain-Monte-Carlo (MCMC) algorithm. The web frontend to interactively explore the point cloud and surrogate is based on <a href=\"https://github.com/plotly/dash\">plotly/dash</a>.</p><p>Features include: <ul><li>Compute evaluation points (e.g. from a random distribution) to run simulation</li><li>Template replacement and automatic generation of run directories</li><li>Starting parallel runs locally or on the cluster (SLURM)</li><li>Collection of result output and postprocessing</li><li>Response-model fitting using Gaussian Process Regression and Linear Regression</li><li>Active learning to reduce number of samples needed</li><li>MCMC to find a posterior parameter distribution (similar to active learning)</li><li>Graphical user interface to explore the results</li></ul></p>",
"license":"MIT",
"title":"proFit: Probabilistic Response Model Fitting with Interactive Tools",
"creators":[
82 changes: 51 additions & 31 deletions CONTRIBUTING.md
@@ -2,23 +2,25 @@
Contributions to proFit are always welcome.

## Resources
* official repository on [github](https://github.com/redmod-team/profit)
* repository on [github](https://github.com/redmod-team/profit)
* [issue tracker](https://github.com/redmod-team/profit/issues)
* pull requests
* feature planning via *Projects*
* automatic testing
* automation via *Actions*
* managing releases
* official documentation on [readthedocs.io](https://profit.readthedocs.io/en/latest)
* documentation on [readthedocs.io](https://profit.readthedocs.io/en/latest)
* meeting log on [HedgeDoc](https://pad.gwdg.de/lOiz56TIS4e5E9-92q-2MQ?view)
* internal documentation and ideas at the github wiki (for now; should be moved to readthedocs in the future)
* archived on [zenodo.org](https://zenodo.org/record/6563463#.Yo9PjiNBxQo) with DOI `10.5281/zenodo.3580488`
* python package published on [PyPI](https://pypi.org/project/profit/)
* test coverage on [coveralls.io](https://coveralls.io/github/redmod-team/profit)

to gain access to internal documentation and regular meetings please get in touch with Christopher Albert
(albert@alumni.tugraz.at)
to gain access to internal documentation and regular meetings please get in touch with Christopher Albert
(albert@tugraz.at)

## Issues
If you encounter any bugs, problems or specific missing features, please open an issue on
If you encounter any bugs, problems or specific missing features, please open an issue on
[github](https://github.com/redmod-team/profit/issues). Please state the bug / problem / enhancement clearly and provide
context information.
context information.

## Organization and planning
Three types of [project boards](https://github.com/redmod-team/profit/projects) are used with the main repository.
@@ -27,62 +29,80 @@ Three types of [project boards](https://github.com/redmod-team/profit/projects)

**Tasks** for prioritization and tracking issues and pull requests. Each version has its own board.

**Testing** for gathering observations in preparation of a new release.
**Testing** for gathering observations in preparation of a new release.

## git
proFit uses *git* as VCS. The upstream repository is located at https://github.com/redmod-team/profit. Currently the
upstream repository contains only one branch.

Contributors should fork this repository, push changes to their fork and make *pull requests* to merge them back into
Contributors should fork this repository, push changes to their fork and make *pull requests* to merge them back into
upstream. Before merging, the pull request should be reviewed by a maintainer and any changes can be discussed. For
larger projects use *draft pull requests* to show others your current progress. They can also be tracked in the relevant
*Projects*, added to release schedules and features can be discussed easily.

To try out new features which have not been merged yet, you can just add any fork to your repository with
To try out new features which have not been merged yet, you can just add any fork to your repository with
`git remote add <name> <url>` and merge it locally. Do not push this merge to your fork.

The default method to resolve conflicts is to rebase the personal fork onto `upstream` before merging.

Please also use *interactive rebase* to squash intermediate commits if you use many small commits.

### Jupyter Notebooks
Make sure to clear output of Jupyter notebooks before committing them to the git repository.
### Pre-Commit Hooks
Starting with the development for `v0.6`, proFit uses *pre-commit* to ensure consistent formatting with
[black](https://github.com/psf/black) and clean jupyter notebooks.
Pre-commit is configured with `.pre-commit-config.yaml` and needs to be activated with `pre-commit install`.
To run the hooks on all files (e.g. after adding new hooks), use `pre-commit run --all-files`.

## Installing
Install proFit from your git repository using the editable install: `pip install -e .[docs,dev]`.

## Documentation
The project documentation is maintained in the `doc` directory of the repository. proFit uses *Sphinx* and *rst* markup.
The documentation is automatically built from the repository and hosted on
The documentation is automatically built from the repository on every commit and hosted on
[readthedocs.io](https://profit.readthedocs.io/en/latest).

Code documentation is generated automatically using [Sphinx AutoAPI](https://github.com/readthedocs/sphinx-autoapi).
Code documentation is generated automatically using [Sphinx AutoAPI](https://github.com/readthedocs/sphinx-autoapi).
Please describe your code using docstrings, preferably following the *Google Docstring Format*. For typing please use
the built-in type annotations (PEP 484 / PEP 526).
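A brief illustration of the requested style, using a hypothetical helper that is not part of the proFit code base:

```python
from typing import List


def scale(values: List[float], factor: float = 1.0) -> List[float]:
    """Scale a list of values by a constant factor.

    Args:
        values: Input values to scale.
        factor: Multiplicative factor applied to every element.

    Returns:
        A new list with each element multiplied by ``factor``.
    """
    return [factor * value for value in values]
```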

### creating documentation
Creating the documentation requires additional packages (specify `docs` during `pip install`)
To build the documentation locally, run `make html` inside the `doc` folder to create the output HTML in `_build`.
This requires the additional dependencies `docs`.

Running `make html` inside the `doc` folder creates output HTML file in `_build`.
## Versioning
proFit follows semantic versioning (`MAJOR.MINOR.PATCH`).
Each version comes with a *git tag* and a *Release* on GitHub.
proFit is still in development, comes with no guarantees of backwards compatibility and therefore uses versions `v0.x`.
The minor version is incremented when significant features have been implemented or projects completed.
Each minor version is tracked with a *Project* in GitHub and receives release notes when published.
It is good practice to create a release candidate `v0.xrc` before a release.
The release candidate should be tagged as *pre-release* in GitHub. It will not be shown by default on *PyPI* and *readthedocs*.

## Packaging
proFit is still in development. There is currently no stable release. It is planned to manage releases on *github* and
publish them on *PyPi*.
Releases are created with the GitHub interface and trigger workflows to automate the publishing and packaging. In particular:
* A python package is created and uploaded to [PyPI](https://pypi.org/project/profit/)
* The repository is archived and a new version added to [zenodo](https://zenodo.org/record/6563463#.Yo9PjiNBxQo)
* A new version of the documentation is created on [readthedocs](https://profit.readthedocs.io)

Before creating a version, check the metadata in `setup.cfg` (for the python package) and `.zenodo.json` (for zenodo).

proFit uses the new build system, as specified by PEP 517. The build system is defined in `pyproject.toml` and uses the
default `setuptools` and `wheels`.
Package metadata and requirements are specified in `setup.cfg`. Building the *fortran* backend requires a
`setup.py` file and `numpy` installed during the build process.
proFit infers its version from the installed metadata or the local git repository and displays it when called.
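For illustration, the metadata lookup described here can be done with the standard library; a minimal sketch, not necessarily how proFit implements it:

```python
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("profit")  # version of the installed distribution
except PackageNotFoundError:
    # e.g. running from a source checkout that has not been installed
    __version__ = "unknown"
```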

## Packaging
proFit uses the new python build system, as specified by PEP 517.
The build system is defined in `pyproject.toml` and uses the default `setuptools` and `wheels`.
Package metadata and requirements are specified in `setup.cfg`.
Building the *fortran* backend requires a `setup.py` file and `numpy` installed during the build process.

Upon publishing a new release in *GitHub*, a workflow should automatically upload the package to *PyPI*.
Upon publishing a new release in *GitHub*, a workflow automatically builds and uploads the package to *PyPI*.
To create a release manually follow this [guide](https://packaging.python.org/tutorials/packaging-projects/).
The new version is automatically added to [zenodo](https://zenodo.org/record/4849489); make sure to update the metadata in `.zenodo.json`.

## Testing
proFit uses `pytest` for automatic testing. A pull request on *GitHub* triggers automatic testing with the supported
python versions.
proFit uses `pytest` for automatic testing. A pull request on *GitHub* triggers automatic testing with the supported python versions.
The *GitHub* action also determines the test coverage and uploads it to [coveralls](https://coveralls.io/github/redmod-team/profit).

## Coding
### Dependencies
Some calls to proFit should be completed very fast, but our many dependencies can slow down the startup time
significantly. Therefore be careful where you import big packages like `GPy` or `sklearn`. Consider using import
Some calls to proFit should be completed very fast, but our many dependencies can slow down the startup time
significantly. Therefore be careful where you import big packages like `GPy` or `sklearn`. Consider using import
statements at the start of the function.
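
A minimal sketch of the deferred-import pattern meant here (hypothetical function, not actual proFit code):

```python
def fit_surrogate(x, y):
    # heavy dependency imported only when fitting is actually requested,
    # so commands like `profit --help` keep a fast startup
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    model.fit(x, y)
    return model
```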

Investigating the tree of imported packages can be done graphically with `tuna`:
2 changes: 1 addition & 1 deletion README.md
@@ -41,7 +41,7 @@ and surrogate is based on [plotly/dash](https://github.com/plotly/dash).
* Template replacement and automatic generation of run directories
* Starting parallel runs locally or on the cluster (SLURM)
* Collection of result output and postprocessing
* Response-model fitting using GPR
* Response-model fitting using Gaussian Process Regression and Linear Regression
* Active learning to reduce number of samples needed
* MCMC to find a posterior parameter distribution (similar to active learning)
* Graphical user interface to explore the results
32 changes: 18 additions & 14 deletions doc/variables.rst
@@ -4,23 +4,27 @@ Variables
=========

* Input variables
Values according to a (random) distribution or successively inserted through
active learning.

Possible distributions:

* Halton sequence (quasi-random, space filling)
* Uniform (random)
* Log-uniform (random)
* Normal (random)
* Linear vector
* Constant value
* Independent variables
Drawn from random distributions:

* `Halton` sequence (quasi-random, space filling)
* `Uniform` distribution
* `LogUniform`: uniformly distributed in log-space
* `Normal` distribution

Fixed values:

* `Linear`: linearly spaced values
* `Constant` (excluded from the fit)

Special:

* `ActiveLearning`: successively inserted according to a specified optimization strategy
* `Independent` variables
The user can bind an independent variable to an output variable, if the simulation outputs a (known) vector over linear supporting points. This
dimension is then not considered during fitting, in contrast to full multi-
output models. This lowers necessary computing resources and can even
enhance the quality of the fit, since complexity of the model is reduced.
* Output variables
* `Output` variables
Default output is a scalar value, but with the attachment of independent
variables, it becomes a vector. Several output variables can also be defined
independently in the config file, which leads to multi-output surrogates during
@@ -48,7 +52,7 @@ Definition of variables inside the `profit.yaml` configuration file.
a2: Uniform(0, 1) # Same as 'a1'
b: Normal(0, 1e-2) # Normal distribution with 0 mean and 1e-2 standard deviation.
c1: 0.2 # Constant value.
c2: Constant(0.2) # Same as 'c'.
c2: Constant(0.2) # Same as 'c1'.
d: LogUniform(1e-4, 0.1) # LogUniform distribution.
e: Halton(0, 3) # Quasi-random Halton sequence.
h: Linear(-1, 1) # Linear vector with size of 'ntrain'.
