Update examples and grammar
ruicoelhopedro committed Feb 8, 2024
1 parent 2aa68b3 commit 3d6d8f6
Showing 4 changed files with 64 additions and 58 deletions.
16 changes: 8 additions & 8 deletions examples/sample_curve_fitting/description.md
@@ -1,4 +1,4 @@
## sample_curve_fitting example
## Curve fitting example - analytical curve

A simple analytical curve fitting problem is included to demonstrate how to use `piglot`.
In this case, we aim to fit a quadratic expression of the type $f(x) = a x^2$, using as a reference, a numerically generated reference from the expression $f(x) = 2 x^2$ (provided in the `examples/sample_curve_fitting/reference_curve.txt` file).
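For illustration, such a reference could be generated with a short NumPy snippet; this is a sketch only, and the two-column layout and the 100-point grid on `[-5, 5]` are assumptions, so check the bundled file for the exact format.
```python
# Sketch only: generate a reference curve for f(x) = 2 * x**2.
# The two-column (x, f(x)) layout and the sampling grid are assumptions.
import numpy as np

x = np.linspace(-5, 5, 100)                        # assumed sampling grid
y = 2.0 * x ** 2                                   # reference response to recover
np.savetxt("reference_curve.txt", np.column_stack((x, y)))
```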
@@ -35,7 +35,7 @@ objective:
```
The generated response with the label `case_1` is compared with our reference response, given from the file `reference_curve.txt`

To run this example, open a terminal inside the `piglot` repository, enter the `examples/sample_curve_fitting` directory and run piglot with the given configuration file
To run this example, open a terminal inside the `piglot` repository, enter the `examples/sample_curve_fitting` directory and run `piglot` with the given configuration file
```bash
cd examples/sample_curve_fitting
piglot config.yaml
@@ -48,15 +48,15 @@ Best loss: 8.85050592e-08
Best parameters
- a: 1.999508
```
As you can see, piglot correctly identifies the `a` parameter close to the expected value of 2, and the error of the fitting is in the order of $10^{-8}$.
In addition to these outputs, `piglot` creates an output directory, with the same name of the configuration file (minus the extension), where it stores the optimisation data.
As you can see, `piglot` correctly identifies the `a` parameter close to the expected value of 2, and the error of the fitting is in the order of $10^{-8}$.
In addition to these outputs, `piglot` creates an output directory, with the same name as the configuration file (minus the extension), where it stores the optimisation data.
To visualise the optimisation results, use the `piglot-plot` utility.
In the same directory, run
```bash
piglot-plot best config.yaml
```
Which will display the best observed value for the optimisation problem.
Which will display the best-observed value for the optimisation problem.
You should see the following output in the terminal
```
Best run:
@@ -67,7 +67,7 @@ Name: 18, dtype: object
Hash: 2313718f75bc0445aa71df7d6d4e50ba82ad593d65f3762efdcbed01af338e30
Objective: 8.85050592e-08
```
The script will also plot the best observed response, and its comparison with the reference response:
The script will also plot the best observed response, and its comparison with the reference response:
![Best case plot](../../docs/source/simple_example/best.svg)

Now, try running (this may take some time)
@@ -76,13 +76,13 @@ piglot-plot animation config.yaml
```
This generates an animation for all the function evaluations that have been made throughout the optimisation procedure.
You can find the `.gif` file(s) inside the output directory, which should give something like:
![Best case plot](../../docs/source/simple_example/animation.gif)
![Animation](../../docs/source/simple_example/animation.gif)


### Using Python scripts

Another way of using `piglot` is via its package and Python modules.
This approach may offer increase flexibility in the setup of the optimisation problem, at the cost of increased complexity and verbosity.
This approach may offer increased flexibility in the setup of the optimisation problem, at the cost of increased complexity and verbosity.
A sample script equivalent to the configuration file for the problem described in [the previous section](#using-configuration-files) is provided in `examples/sample_curve_fitting/config.py`, given by:
```python
import os
88 changes: 47 additions & 41 deletions examples/sample_curve_fitting_composite/description.md
@@ -1,44 +1,61 @@
## sample_curve_fitting_composite example
## Curve fitting example - composite setting

A simple analytical curve fitting problem using a composite strategy is included to demonstrate how to use `piglot` in the composite setting.
In this case, we aim to fit a quadratic expression of the type $f(x) = a x^2$, using as a reference, a numerically generated reference from the expression $f(x) = 2 x^2$ (provided in the `examples/sample_curve_fitting/reference_curve.txt` file).
We want to find the value for $a$ that better fits our reference (it should be 2).
In curve fitting problems with a reference response, we can exploit the function composition of the objective function to drastically improve the optimisation.
This technique has been widely explored in Bayesian optimisation (as proposed in [Astudillo and Frazier (2019)](https://doi.org/10.48550/arXiv.1906.01537)) and, for the curve fitting problem, in [Cardoso Coelho et al. (2023)](https://dx.doi.org/10.2139/ssrn.4674421).
In `piglot`, this strategy is available out of the box and can be easily enabled.
We now display an example of the application of this technique, based on the [simple curve fitting example](../sample_curve_fitting/description.md) (please check out that example before diving into the composite setting).
In this example, we are heavily relying on Bayesian optimisation (if you are unfamiliar with the topic, we highly recommend checking [this tutorial](https://doi.org/10.48550/arXiv.1807.02811) before proceeding).

We run 10 iterations using the `botorch` optimiser (our interface for Bayesian optimisation), and set the parameter `a` for optimisation with bounds `[0,4]` and initial value 1.
Our optimisation objective is the fitting of an analytical curve, with the expression `<a> * x ** 2`.
The notation `<a>` indicates that this parameter should be optimised.
We also define a parameterisation using the variable $x$, where we sample the function between `[-5,5]` with 100 points.
**Note:** Composite optimisation is not supported by most optimisers.
Currently, only Bayesian optimisation with BoTorch supports the full version of the composite objective.

The particularity of this example is that a composite strategy is used to fit the response.
The advantages of this composite Bayesian optimisation are demonstrated in [Coelho et al.]([docs/source/simple_example/best.svg](https://dx.doi.org/10.2139/ssrn.4674421)) and are two-fold: (i) more accurate posteriors for the loss function and (ii) reduced loss of information during the computation of the reduction function.

In short, in the composite setting the loss function $\mathcal{L}\left(\bm{\theta}\right)$ is written as
$
We start by defining our objective function (or *loss* in the curve fitting scenario) $\mathcal{L}\left(\bm{\theta}\right)$ as
$$
\mathcal{L}\left(\bm{\theta}\right)
=
\hat{\mathcal{L}}\left(\bm{e}\left(\bm{\theta}\right)\right)
=
\dfrac{1}{N}
\sum_{i=1}^{N}
\left[e_i\left(\bm{\theta}\right)\right]^2,
$
where $\bm{e}$ is a vector containing the pointwise errors at every reference point, $\hat{\mathcal{L}}\left(\bullet\right)$ is the scalar reduction function applied (NMSE in this case) and $N$ is the number of reference points.
Thus, the problem can be stated as the minimisation of a composite function $\hat{\mathcal{L}}\left(\bm{e}\left(\bm{\theta}\right)\right)$, where $\hat{\mathcal{L}}\left(\bullet\right)$ is known and $\bm{e}\left(\bm{\theta}\right)$ is unknown.
\left[e_i\left(\bm{\theta}\right)\right]^2
$$
where $\bm{e}$ is a vector containing the pointwise errors at every reference point, $\hat{\mathcal{L}}\left(\bullet\right)$ is the scalar reduction function applied (MSE in this case) and $N$ is the number of reference points.
In other words, we are minimising the average squared error between each point of the reference and the prediction.
Thus, the problem can be stated as the minimisation of a composite function $\hat{\mathcal{L}}\left(\bm{e}\left(\bm{\theta}\right)\right)$, where $\hat{\mathcal{L}}\left(\bullet\right)$ is known (and we can compute gradients) and $\bm{e}\left(\bm{\theta}\right)$ is "unknown" (comes from our black-box solver).
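To make this split concrete, the following sketch (illustrative only, not part of `piglot`) separates the black-box error vector from the known reduction:
```python
# Illustrative split of the composite objective: the reduction (MSE) is known and
# differentiable, while the error vector comes from the black-box solver.
import numpy as np

def error_vector(a: float, x_ref: np.ndarray, y_ref: np.ndarray) -> np.ndarray:
    """Pointwise errors e_i(theta) between the predicted and reference responses."""
    y_pred = a * x_ref ** 2          # black-box response (analytical here for brevity)
    return y_pred - y_ref

def reduction(errors: np.ndarray) -> float:
    """Known scalar reduction: mean of the squared errors over the N reference points."""
    return float(np.mean(errors ** 2))

x_ref = np.linspace(-5, 5, 100)
y_ref = 2.0 * x_ref ** 2
loss = reduction(error_vector(1.0, x_ref, y_ref))    # L(theta) = L_hat(e(theta))
```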

When using Bayesian optimisation, we build a Gaussian process (GP) regression model of our observations.
This surrogate model is used to choose the next potential points to evaluate.
In the simple curve fitting example, the model is built on the loss function values, that is:
$$
\mathcal{L}\left(\bm{\theta}\right)
\sim
\mathcal{GP}
\left(
\mu_i\left(\bm{\theta}\right),
k_i\left(\bm{\theta},\bm{\theta}'\right)
\right)
$$
According to this, the *objective function value* is assumed to follow a Gaussian distribution with known mean and variance for each point $\bm{\theta}$ in the parameter space.
We then use this GP to build and optimise our acquisition functions of choice, as usual in Bayesian optimisation.
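`piglot` relies on BoTorch for this step; purely to illustrate the modelling assumption, the sketch below fits a GP to a few scalar loss observations using scikit-learn (an assumed stand-in for illustration, not what `piglot` uses internally):
```python
# Sketch of the non-composite surrogate: a single GP fitted directly to scalar
# loss observations (scikit-learn is used here only for illustration).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def loss(a: float) -> float:
    x = np.linspace(-5, 5, 100)
    return float(np.mean((a * x ** 2 - 2.0 * x ** 2) ** 2))

thetas = np.array([[0.5], [1.0], [2.5], [3.5]])               # evaluated parameters
losses = np.array([loss(t) for t in thetas.ravel()])          # observed loss values

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(thetas, losses)
mean, std = gp.predict(np.array([[2.0]]), return_std=True)    # posterior at theta = 2
```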

Consider the minimisation of $\mathcal{L}\left(\bm{\theta}\right) = \hat{\mathcal{L}}\left(\bm{e}\left(\bm{\theta}\right)\right)$.
Within this setting, we start by replacing the single-output Gaussian Process on the loss $\mathcal{L}\left(\bm{\theta}\right)$ with a multi-output Gaussian Process for the error of each reference point $e_i$, that is,
$
However, in the composite setting, we know that $\mathcal{L}\left(\bm{\theta}\right) = \hat{\mathcal{L}}\left(\bm{e}\left(\bm{\theta}\right)\right)$.
Therefore, we can instead build a surrogate model for the error at each reference point $e_i\left(\bm{\theta}\right)$:
$$
e_i\left(\bm{\theta}\right)
\sim
\mathcal{GP}
\left(
\mu_i\left(\bm{\theta}\right),
k_i\left(\bm{\theta},\bm{\theta}'\right)
\right).
$
Naturally, this requires feeding the optimiser with the entire response instead of a single scalar value.
At this stage, each GP is assumed independent and the correlations between the outputs are not considered; that is, each value $e_i$ is assumed as independent and uniquely defined by the set of parameters $\bm{\theta}$.
\right)
$$
Thus, we are saying that, for a given $\bm{\theta}$, the *prediction error at each reference point* follows a Gaussian distribution with known mean and variance.
However, we need a posterior probability distribution for the values of $\mathcal{L}\left(\bm{\theta}\right)$ to use our Bayesian optimisation tools, which we cannot derive for generic functions $\hat{\mathcal{L}}\left(\bullet\right)$!
The solution to this problem is Monte Carlo sampling - we draw samples from the posteriors of $e_i\left(\bm{\theta}\right)$ and then evaluate them through $\hat{\mathcal{L}}\left(\bullet\right)$.
The resulting sample values should follow the posterior distribution for $\mathcal{L}\left(\bm{\theta}\right)$ and we can use them to compute approximate acquisition function values.
We leverage BoTorch's quasi-Monte Carlo acquisition functions for this task, and you can read more details on the entire procedure in [Cardoso Coelho et al. (2023)](https://dx.doi.org/10.2139/ssrn.4674421).
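The sketch below illustrates the sampling idea with plain NumPy; the posterior means and standard deviations are made-up values, and BoTorch's quasi-Monte Carlo acquisition functions are considerably more sophisticated than this.
```python
# Sketch of the Monte Carlo step: draw samples of the error vector from the
# per-point posteriors (assumed independent here) and push them through the
# known reduction to approximate the posterior of the composite loss.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.10, -0.20, 0.05])      # made-up posterior means of e_i at a candidate theta
sigma = np.array([0.02, 0.03, 0.01])    # made-up posterior standard deviations of e_i

samples = rng.normal(mu, sigma, size=(1024, mu.size))    # error-vector samples
loss_samples = np.mean(samples ** 2, axis=1)              # evaluate through the MSE reduction
loss_mean = loss_samples.mean()                           # Monte Carlo estimate of E[L(theta)]
```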

Putting all the mathematical details aside, it is extremely simple to use the composite setting in `piglot`.
The configuration file (`examples/sample_curve_fitting_composite/config.yaml`) for this example is:
```yaml
iters: 10
@@ -63,27 +80,16 @@ objective:
'reference_curve.txt':
prediction: ['case_1']
```
The composite strategy is activated by setting ```composite: True```.

To run this example, open a terminal inside the `piglot` repository, enter the `examples/sample_curve_fitting_composite` directory and run piglot with the given configuration file
```bash
cd examples/sample_curve_fitting_composite
piglot config.yaml
```
You should see an output similar to
The composite strategy is activated by setting `composite: True`.
Running this example, you should see an output similar to
```
BoTorch: 100%|████████████████████████████████████████| 10/10 [00:01<00:00, 7.94it/s, Loss: 5.6009e-08]
Completed 10 iterations in 1s
Best loss: 5.60089334e-08
Best parameters
- a: 1.999685
```
It is observed that piglot correctly identifies the `a` parameter close to the expected value of 2, and the error of the fitting is in the order of $10^{-8}$.
As this example is quite simple, there is no great advantage of using the composite strategy, as the simple Bayesian optimisation is already able of finding accurate solutions within few function evaluations.
To visualise the optimisation results, use the `piglot-plot` utility.
In the same directory, run for instance
```bash
piglot-plot best config.yaml
```
It is observed that `piglot` correctly identifies the `a` parameter close to the expected value of 2, and the error of the fitting is in the order of $10^{-8}$.
While this objective function is sufficiently simple that the simple and composite strategies behave similarly, for problems with multiple parameters and complex objective functions, it is well recognised that the composite setting drastically outperforms the direct optimisation of the objective function (check [Cardoso Coelho et al. (2023)](https://dx.doi.org/10.2139/ssrn.4674421) for examples).
However, note that the computational cost per iteration is significantly higher in the composite setting and, crucially, it is proportional to the number of points in the reference response.
Check the tutorial on the [reduction of reference points](../reference_reduction_composite/description.md) for additional details on this topic and for strategies to mitigate this issue.
8 changes: 4 additions & 4 deletions examples/sample_curve_fitting_stochastic/description.md
@@ -1,7 +1,7 @@
## Stochastic curve fitting example

A simple analytical curve fitting problem with noise in the input data is included to demonstrate how to use `piglot` to optimise stochastic objectives.
Like the in the [sample curve fitting example](../sample_curve_fitting/description.md), we are trying to fit using a numerically generated reference response from the expression $f(x) = 2 x^2$ (provided in the `examples/sample_curve_fitting_stochastic/reference_curve.txt` file).
Like in the [sample curve fitting example](../sample_curve_fitting/description.md), we are trying to fit using a numerically generated reference response from the expression $f(x) = 2 x^2$ (provided in the `examples/sample_curve_fitting_stochastic/reference_curve.txt` file).

**Note:** Stochastic optimisation is not supported by most optimisers.
Currently, only Bayesian optimisation with BoTorch supports the full version of the stochastic objective.
@@ -16,7 +16,7 @@ $$
$$
We can compute the scalar loss function with respect to the parameter optimise for each curve individually: $\mathcal{L}_1(a)$ and $\mathcal{L}_2(a)$.
With these, we want to optimise a stochastic function $\mathcal{L}(a)$ which, for every parameter $a$, is assumed to follow a normal distribution with known mean $\mu(a)$ and variance $\sigma^2(a)$, that is, $\mathcal{L}(a) \sim \mathcal{N}\left(\mu(a), \sigma^2(a)\right)$.
These quantities are constructed using mean of the two functions $\mathcal{L}_1(a)$ and $\mathcal{L}_2(a)$ and its respective standard error:
These quantities are constructed using the mean of the two functions $\mathcal{L}_1(a)$ and $\mathcal{L}_2(a)$ and its respective standard error:
$$
\begin{aligned}
&\mu(a) = \dfrac{1}{N}\sum_{i=1}^{N} \mathcal{L}_i(a) = \dfrac{\mathcal{L}_1(a)+\mathcal{L}_2(a)}{2} \\
@@ -26,7 +26,7 @@ $$
Note that we use the (squared) standard error of the two functions as the variance of our stochastic model.
This allows us to establish confidence intervals for the mean and, importantly, is the standard approach to simulate observation noise in our function evaluations.
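To make these formulas concrete, the following sketch (not `piglot` code, with made-up loss values) computes $\mu(a)$ and $\sigma^2(a)$ from two per-case losses:
```python
# Illustration of the formulas above: mean of the per-case losses and the squared
# standard error of that mean (used as the observation-noise variance).
import numpy as np

losses = np.array([0.18, 0.21])         # made-up values of L_1(a) and L_2(a) for some a
n = losses.size

mu = losses.mean()                       # mu(a)
var = losses.var(ddof=1) / n             # sigma^2(a): squared standard error of the mean
```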

**Note:** This procedure optimises the mean of the *loss functions*, not the mean response. If you wish to optimise the later, check the example on [composite stochastic optimisation](../sample_curve_fitting_stochastic_composite/description.md).
**Note:** This procedure optimises the mean of the *loss functions*, not the mean response. If you wish to optimise the latter, check the example on [composite stochastic optimisation](../sample_curve_fitting_stochastic_composite/description.md).

The good news: `piglot` automatically handles computing these quantities for supported objectives.
The configuration file (`examples/sample_curve_fitting_stochastic/config.yaml`) for this example is:
@@ -104,6 +104,6 @@ Best loss: 1.73143625e-01
Best parameters
- a: 1.199068
```
However the optimiser is not aware of the variance of the objective function.
However, the optimiser is not aware of the variance of the objective function.
Similarly, if you run `piglot-plot`, you should only expect a single figure with both responses:
![Best case plot, with the individual responses](../../docs/source/simple_stochastic_example/best_0.svg)
10 changes: 5 additions & 5 deletions examples/sample_curve_fitting_stochastic_composite/description.md
@@ -16,7 +16,7 @@ These terms were computed from the individual loss functions of $f_1$ and $f_2$.

However, under composition, we instead assume that, for a fixed $a$, the *response* at every point $x$ of the curve follows a normal distribution with known mean $\mu(x)$ and variance $\sigma^2(x)$, that is, $f(x) \sim \mathcal{N}\left(\mu(x), \sigma^2(x)\right)$.
This is fundamentally different from the previous approach.
In the simple (non-composite) strategy, we assume the scalar *objective function* (or loss, in the curve fitting case) is Gaussian; on the composite fashion, it is the *response* that is Gaussian.
In the simple (non-composite) strategy, we assume the scalar *objective function* (or loss, in the curve fitting case) is Gaussian; in the composite fashion, it is the *response* that is Gaussian.
We can then compute the statistical quantities in a similar fashion:
$$
\begin{aligned}
@@ -70,7 +70,7 @@ Best parameters
```
Piglot identifies the `a` parameter as 1.326, and the error of the fitting is in the order of $10^{-6}$.
Unlike the non-composite case, the fitting error is significantly smaller.
Recal that, this time, we are optimising the mean response of the two functions, which gives a theoretical value of $a=4/3\approx 1.333$ for the optimum.
Recall that, this time, we are optimising the mean response of the two functions, which gives a theoretical value of $a=4/3\approx 1.333$ for the optimum.
Finally, plotting the best case with `piglot-plot` yields:
```
Best run:
Expand All @@ -82,6 +82,6 @@ Name: 18, dtype: object
Hash: fc36fad0fc55278da3c16dbfd9a257e42c2d81361e8650236013ab6f6426c104
Objective: 5.28041030e-05
```
![Best case plot](../../docs/source/composite_stochastic_example/best_0.svg)
![Best case plot](../../docs/source/composite_stochastic_example/best_1.svg)
![Best case plot](../../docs/source/composite_stochastic_example/best_2.svg)
![Best case plot, with the individual responses](../../docs/source/composite_stochastic_example/best_0.svg)
![Best case plot, with mean, median and standard deviation](../../docs/source/composite_stochastic_example/best_1.svg)
![Best case plot, with the confidence interval for the mean](../../docs/source/composite_stochastic_example/best_2.svg)
