diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..8b137891 --- /dev/null +++ b/.nojekyll @@ -0,0 +1 @@ + diff --git a/404.html b/404.html new file mode 100644 index 00000000..9eb81746 --- /dev/null +++ b/404.html @@ -0,0 +1,89 @@ + + +
+ + + + +YEAR: 2021 +COPYRIGHT HOLDER: Imperial College of Science, Technology and Medicine ++ +
This vignette outlines errors that might be generated when parsing +monty DSL code, with more explanation about the error and how they can +be avoided. Don’t read this top to bottom as it’s quite boring! However, +if we get errors that benefit from more explanation about why they’ve +been thrown then we’ll expand on the contents here and arrange for these +to be linked from the thrown error directly.
+The error numbers are arbitrary after the first digit. The first +digit will correspond to different phases of the parsing:
+E1xx
- errors during parsing of individual
+expressionsE2xx
- errors when considering the system as a
+wholeE101
+We found an expression that is neither an assignment (with
+<-
) or a stochastic relationship (with
+~
)
Example:
+a + 1
+E102
+Invalid left hand side of a relationship operator (~
).
+Currently the left hand side must be symbol, though this will be relaxed
+in the future once we support an array syntax. However, you may not use
+things like numbers or function calls on the left hand side.
Example:
+1 <- 2
+f(x) <- g(y)
+E103
+Your distribution call failed to parse. This can fail for many
+reasons, and the details of the failure come from
+monty::monty_dsl_parse_distribution
Example reasons for failure include the rhs being:
+compare(x) ~ 1
+compare(x) ~ sqrt(2)
)compare(x) ~ Normal(0, 1, 2)
)The details for the failure will be included in the body of the error +message.
+E201
+Duplicated relationships (with ~
).
Example:
+a ~ Normal(0, 1)
+b ~ Uniform(0, 1)
+a ~ Exponential(1) # <= error here
+Relationships must be unique because each represents a parameter, and +a parameter can’t be represented by two different distributions.
+E202
+Duplicated assignments (with <-
). This is similar to
+E201
Example:
+a <- 1
+b <- 2
+a <- 3 # <= error here
+Assignments must be unique within the DSL code because this makes it +straightforward to trace usage through the dependency graph and from +this create a gradient function using automatic differentiation.
+This restriction means that you cannot reassign a value either, so +this is an error:
+a <- 1
+b <- 10
+a <- a + b # <= error here
+E203
+A relationship (with ~
) is shadowing a previous
+assignment. So after assigning to a variable you have declared that the
+same symbol refers to a parameter.
Example:
+a <- 1
+a ~ Normal(0, 1) # <= error here
+E204
+An assignment (with <-
) is shadowing a previous
+relationship.
Example:
+a ~ Normal(0, 1)
+a <- 10
+E205
+Variables are used out of order. If you are using odin this is a big +departure - at the moment you must declare your expressions (assignments +and relationships) in order. However, because we forbid multiple +assignment we may relax this in the future, but no existing programs +should be changed.
+E206
+Failed to differentiate the model. This error will only be seen where +it was not possible to differentiate your model but you requested that a +gradient be available. Not all functions supported in the DSL can +currently be differentiated by monty; if you think that yours should be, +please let us know.
+monty
includes a simple probabilistic domain-specific
+language (DSL) that is inspired by stan
and Statistical Rethinking. It is designed
+to make some tasks a bit easier, particularly when defining priors for
+your model. We expect that this DSL is not sufficiently advanced to
+represent most interesting models but it may get more clever and
+flexible in the future. In particular we do not expect the DSL to be
+useful in writing likelihood functions for comparison to data; we expect
+that if your model is simple enough for this you would be better off
+using stan
or some similarly flexible system.
In chapter 4 of Statistical Rethinking, we build a regression model +of height with parameters +, + +and +. +We can define the model for the prior probability of this model in monty +by running
+
+prior <- monty_dsl({
+ alpha ~ Normal(178, 20)
+ beta ~ Normal(0, 10)
+ sigma ~ Uniform(0, 50)
+})
This will define a new monty_model()
object that
+represents the prior, but with all the bits that we might need depending
+on how we want to use it:
We have model parameters
+
+prior$parameters
+#> [1] "alpha" "beta" "sigma"
These are defined in the order that they appear in your definition
+(so alpha
is first and sigma
is last)
We can compute the domain for your model:
+
+prior$domain
+#> [,1] [,2]
+#> alpha -Inf Inf
+#> beta -Inf Inf
+#> sigma 0 50
We can draw samples from the model if we provide a
+monty_rng
object
+rng <- monty_rng$new()
+theta <- prior$direct_sample(rng)
+theta
+#> [1] 167.700794 -5.085348 8.899977
We can compute the (log) density at a point in parameter space
+
+prior$density(theta)
+#> [1] -11.31011
The computed properties for the model are:
+
+prior$properties
+#>
+#> ── <monty_model_properties> ────────────────────────────────────────────────────
+#> • has_gradient: `TRUE`
+#> • has_direct_sample: `TRUE`
+#> • is_stochastic: `FALSE`
+#> • has_parameter_groups: `FALSE`
+#> • has_observer: `FALSE`
+#> • allow_multiple_parameters: `TRUE`
Sometimes it will be useful to perform calculations in the code; you +can do this with assignments. Most trivially, giving names to numbers +may help make code more understandable:
+
+m <- monty_dsl({
+ mu <- 10
+ sd <- 2
+ a ~ Normal(mu, sd)
+})
You can also use this to do things like:
+
+m <- monty_dsl({
+ a ~ Normal(0, 1)
+ b ~ Normal(0, 1)
+ mu <- (a + b) / 2
+ c ~ Normal(mu, 1)
+})
Where c
is drawn from a normal distribution with a mean
+that is the average of a
and b
.
You can also pass in a list of data with values that should be +available in the DSL code. For example, our first example:
+
+prior <- monty_dsl({
+ alpha ~ Normal(178, 20)
+ beta ~ Normal(0, 10)
+ sigma ~ Uniform(0, 50)
+})
Might be written as
+
+fixed <- list(alpha_mean = 170, alpha_sd = 20,
+ beta_mean = 0, beta_sd = 10,
+ sigma_max = 50)
+prior <- monty_dsl({
+ alpha ~ Normal(alpha_mean, alpha_sd)
+ beta ~ Normal(beta_mean, beta_sd)
+ sigma ~ Uniform(0, sigma_max)
+}, fixed = fixed)
Values you pass in this way are fixed (hence the +name!) and cannot be modified after the model object is created.
+This vignette is an introduction to some of the ideas in
+mcstate2
. As the package is not ready for users to use it,
+this is more a place for us to jot down ideas about what it will do.
We’re going to change the package name in the future (dropping the
+2
at least, but possibly also changing it entirely). It’s
+not yet clear to us if this will be a package that users will use or
+something that the rest of our stack will target.
Draw samples from a model using Markov Chain Monte Carlo methods. To +do this you will need:
+mcstate_model()
+object and minimally knows about the names of its parameter vector
+(which is an unstructured real-valued vector) and can compute a log
+probability density. It may also be able to compute a gradient of this
+log density, or sample directly from parameter space (e.g., if it
+represents a prior distribution).mcstate_sampler_random_walk()
, which implements a simple
+Metropolis algorithm random walk.The system is designed to be composable; you can work in a Bayesian +way by defining a model representing a likelihood and another model +representing a prior and then pick a sampler based on the capabilities +of the model, and pick a runner based on the capabilities your +computer.
+The mcstate_model()
interface is designed to be very
+flexible but not user-friendly. We expect to write a higher-level
+interface to help work with this, and describe how to write wrappers for
+models implemented in other packages (so you might write a model in dust or odin and an adaptor would make
+it easy to work with the tools provided by mcstate2
to
+start making inferences with your model).
Before starting the example, it’s worth noting that there are far +better tools out there to model this sort of thing (stan, bugs, jags, R +itself - really anything). The aim of this section is to derive a simple +model that may feel familiar. The strength of the package is when +performing inference with custom models that can’t be expressed in these +high level interfaces.
+
+head(data)
+#> height weight
+#> 1 162.5401 45.92805
+#> 2 159.9566 51.19368
+#> 3 156.1808 44.56841
+#> 4 168.4164 60.36933
+#> 5 158.6978 52.14180
+#> 6 154.7666 44.66696
+plot(height ~ weight, data)
A simple likelihood, following the model formulation in “Statistical +Rethinking” chapter 3; height is modelled as normally distributed +departures from a linear relationship with weight.
+
+likelihood <- mcstate_model(
+ list(
+ parameters = c("a", "b", "sigma"),
+ density = function(x) {
+ a <- x[[1]]
+ b <- x[[2]]
+ sigma <- x[[3]]
+ mu <- a + b * data$weight
+ sum(dnorm(data$height, mu, sigma, log = TRUE))
+ }))
The prior we’ll make much nicer to work with in the future, but here
+we construct the density by hand as a sum of normally distributed priors
+on a
and b
, and a weak uniform prior on
+sigma
. We provide a direct_sample
function
+here so that we can draw samples from the prior distribution
+directly.
+prior <- local({
+ a_mu <- 178
+ a_sd <- 100
+ b_mu <- 0
+ b_sd <- 10
+ sigma_min <- 0
+ sigma_max <- 50
+ mcstate_model(
+ list(
+ parameters = c("a", "b", "sigma"),
+ density = function(x) {
+ a <- x[[1]]
+ b <- x[[2]]
+ sigma <- x[[3]]
+ dnorm(a, a_mu, a_sd, log = TRUE) +
+ dnorm(b, b_mu, b_sd, log = TRUE) +
+ dunif(sigma, sigma_min, sigma_max, log = TRUE)
+ },
+ direct_sample = function(rng) {
+ c(rng$normal(1, a_mu, a_sd),
+ rng$normal(1, b_mu, b_sd),
+ rng$uniform(1, sigma_min, sigma_max))
+ },
+ domain = rbind(c(-Inf, Inf), c(-Inf, Inf), c(0, Inf))
+ ))
+})
The posterior distribution is the combination of these two models
+(indicated with a +
because we’re adding on a log-scale, or
+because we are using prior
and
+posterior
; you can use mcstate_model_combine()
+if you prefer).
+posterior <- likelihood + prior
Constructing a sensible initial variance-covariance matrix is a bit +of a trick, and using an adaptive sampler reduces the pain here. These +values are chosen to be reasonable starting points.
+
+vcv <- rbind(c(4.5, -0.088, 0.076),
+ c(-0.088, 0.0018, -0.0015),
+ c(0.076, -0.0015, 0.0640))
+sampler <- mcstate_sampler_random_walk(vcv = vcv)
Now run the sampler. We’ve started from a good starting point to make +this simple sampler converge quickly:
+
+samples <- mcstate_sample(posterior, sampler, 5000, initial = c(114, 0.9, 3),
+ n_chains = 4)
We don’t yet have tools for working with the samples objects, but we +can see the density over time easily enough:
+
+matplot(t(samples$density), type = "l", lty = 1,
+ xlab = "log posterior density", ylab = "sample", col = "#00000055")
And plots of our estimated parameters:
+ + +This vignette is an introduction to some of the ideas in
+monty
. Please note that the interface is not yet stable and
+some function names, arguments and ideas will change before the first
+generally usable version.
Draw samples from a model using Markov Chain Monte Carlo methods. To +do this you will need:
+monty_model()
+object and minimally knows about the names of its parameter vector
+(which is an unstructured real-valued vector) and can compute a log
+probability density. It may also be able to compute a gradient of this
+log density, or sample directly from parameter space (e.g., if it
+represents a prior distribution).monty_sampler_random_walk()
, which implements a simple
+Metropolis algorithm random walk.The system is designed to be composable; you can work in a Bayesian +way by defining a model representing a likelihood and another model +representing a prior and then pick a sampler based on the capabilities +of the model, and pick a runner based on the capabilities your +computer.
+The monty_model()
interface is designed to be very
+flexible but not user-friendly. We expect to write a higher-level
+interface to help work with this, and describe how to write wrappers for
+models implemented in other packages (so you might write a model in dust or odin and an adaptor would make
+it easy to work with the tools provided by monty
to start
+making inferences with your model).
Before starting the example, it’s worth noting that there are far +better tools out there to model this sort of thing (stan, bugs, jags, R +itself - really anything). The aim of this section is to derive a simple +model that may feel familiar. The strength of the package is when +performing inference with custom models that can’t be expressed in these +high level interfaces.
+
+head(data)
+#> height weight
+#> 1 162.5401 45.92805
+#> 2 159.9566 51.19368
+#> 3 156.1808 44.56841
+#> 4 168.4164 60.36933
+#> 5 158.6978 52.14180
+#> 6 154.7666 44.66696
+plot(height ~ weight, data)
A simple likelihood, following the model formulation in “Statistical +Rethinking” chapter 3; height is modelled as normally distributed +departures from a linear relationship with weight.
+
+fn <- function(a, b, sigma, data) {
+ mu <- a + b * data$weight
+ sum(dnorm(data$height, mu, sigma, log = TRUE))
+}
We can wrap this density function in a monty_model
. The
+data
argument is “fixed” - it’s not part of the statistical
+model, so we’ll pass that in as the fixed
argument:
+likelihood <- monty_model_function(fn, fixed = list(data = data))
+likelihood
+#>
+#> ── <monty_model> ───────────────────────────────────────────────────────────────
+#> ℹ Model has 3 parameters: 'a', 'b', and 'sigma'
+#> ℹ See `?monty_model()` for more information
We construct a prior for this model using the monty DSL
+(vignette("dsl")
), using normally distributed priors on
+a
and b
, and a weak uniform prior on
+sigma
.
+prior <- monty_dsl({
+ a ~ Normal(178, 100)
+ b ~ Normal(0, 10)
+ sigma ~ Uniform(0, 50)
+})
+prior
+#>
+#> ── <monty_model> ───────────────────────────────────────────────────────────────
+#> ℹ Model has 3 parameters: 'a', 'b', and 'sigma'
+#> ℹ This model:
+#> • can compute gradients
+#> • can be directly sampled from
+#> • accepts multiple parameters
+#> ℹ See `?monty_model()` for more information
The posterior distribution is the combination of these two models
+(indicated with a +
because we’re adding on a log-scale, or
+because we are using prior
and
+posterior
; you can use monty_model_combine()
+if you prefer).
+posterior <- likelihood + prior
+posterior
+#>
+#> ── <monty_model> ───────────────────────────────────────────────────────────────
+#> ℹ Model has 3 parameters: 'a', 'b', and 'sigma'
+#> ℹ This model:
+#> • can be directly sampled from
+#> ℹ See `?monty_model()` for more information
Constructing a sensible initial variance-covariance matrix is a bit +of a trick, and using an adaptive sampler reduces the pain here. These +values are chosen to be reasonable starting points.
+
+vcv <- rbind(c(4.5, -0.088, 0.076),
+ c(-0.088, 0.0018, -0.0015),
+ c(0.076, -0.0015, 0.0640))
+sampler <- monty_sampler_random_walk(vcv = vcv)
Now run the sampler. We’ve started from a good starting point to make +this simple sampler converge quickly:
+
+samples <- monty_sample(posterior, sampler, 2000, initial = c(114, 0.9, 3),
+ n_chains = 4)
+#> Sampling [▁▁▁▁] ■ | 0% ETA: 41s
+#> ✔ Sampled 8000 steps across 4 chains in 833ms
+#>
We don’t aim to directly provide tools for visualising and working +with samples, as this is well trodden ground in other packages. However, +we can directly plot density over time:
+
+matplot(samples$density, type = "l", lty = 1,
+ xlab = "log posterior density", ylab = "sample", col = "#00000055")
And plots of our estimated parameters:
+
+par(mfrow = c(1, 3))
+plot(density(samples$pars["a", , ]), main = "a")
+abline(v = 114, col = "red")
+plot(density(samples$pars["b", , ]), main = "b")
+abline(v = 0.9, col = "red")
+plot(density(samples$pars["sigma", , ]), main = "sigma")
+abline(v = 3, col = "red")
If you have coda
installed you can convert these samples
+into a coda
mcmc.list
using
+coda::as.mcmc.list()
, and if you have
+posterior
installed you can convert into a
+draws_df
using posterior::as_draws_df()
, from
+which you can probably use your favourite plotting tools.
See vignette("samplers")
for more information.
This vignette will describe the samplers available in monty.
+ +This example shows HMC outperforming a random walk on a two
+dimensional banana-shaped function. Our model takes two parameters
+alpha
and beta
, and is based on two successive
+simple draws, with the one conditional on the other, so
+
+and
+,
+with
+
+the standard deviation of the conditional draw.
We include this example within the package; here we create a model +with +
+
+m <- monty_example("banana", sigma = 0.5)
+m
+#>
+#> ── <monty_model> ───────────────────────────────────────────────────────────────
+#> ℹ Model has 2 parameters: 'alpha' and 'beta'
+#> ℹ This model:
+#> • can compute gradients
+#> • can be directly sampled from
+#> • accepts multiple parameters
+#> ℹ See `?monty_model()` for more information
We can plot a visualisation of its density by computing the density +over a grid. Normally this is not possible of course:
+
+a <- seq(-2, 6, length.out = 1000)
+b <- seq(-2, 2, length.out = 1000)
+z <- outer(a, b, function(alpha, beta) {
+ exp(monty_model_density(m, rbind(alpha, beta)))
+})
+image(a, b, z, xlab = "alpha", ylab = "beta")
In this particular case we can also easily generate samples, so we +know what a good sampler will produce:
+
+rng <- monty_rng$new()
+s <- vapply(seq(200), function(x) m$direct_sample(rng), numeric(2))
+image(a, b, z, xlab = "alpha", ylab = "beta")
+points(s[1, ], s[2, ], pch = 19, col = "#00000055")
It is also possible to compute the 95% confidence interval of the +distribution using the relationship between the standard bivariate +normal distribution and the banana shaped distribution as defined above. +We can check that roughly 10 samples (out of 200) are out of this 95% CI +contour.
+
+theta <- seq(0, 2 * pi, length.out = 10000)
+z95 <- local({
+ sigma <- 0.5
+ r <- sqrt(qchisq(.95, df = 2))
+ x <- r * cos(theta)
+ y <- r * sin(theta)
+ cbind(x^2 + y * sigma, x)
+})
+image(a, b, z, xlab = "alpha", ylab = "beta")
+lines(z95[, 1], z95[, 2])
+points(s[1, ], s[2, ], pch = 19, col = "#00000055")
It is not generally possible to directly sample from a density +(otherwise MCMC and similar methods would not exist!). In these cases we +need to use a sampler based on the density and if available possibly the +gradient of the density.
+We can start with a basic random-walk sampler:
+
+sampler_rw <- monty_sampler_random_walk(vcv = diag(2) * 0.01)
+res_rw <- monty_sample(m, sampler_rw, 1000)
+#> Sampling ■ | 0% ETA: 3s
+#> ✔ Sampled 1000 steps across 1 chains in 49ms
+#>
+plot(t(drop(res_rw$pars)), type = "l", col = "#0000ff66",
+ xlim = range(a), ylim = range(b))
+lines(z95[, 1], z95[, 2])
As we can see this is not great, exhibiting strong random walk +behaviour as it slowly explores the surface (this is over 1,000 steps). +Another way to view this is the parameters varying over steps:
+matplot(t(drop(res_rw$pars)), lty = 1, type = "l", col = c(2, 4),
+ xlab = "Step", ylab = "Value")
+We can probably improve the samples here by finding a better variance +covariance matrix (VCV), but a single VCV will not hold well over the +whole surface because it is not very similar to a multivariate normal +(that is, the appropriate VCV will change depending on the position in +parameter space)
+Let’s try the Hamiltonian Monte Carlo (HMC) sampler, which uses the +gradient to move efficiently in parameter space:
+
+sampler_hmc <- monty_sampler_hmc(epsilon = 0.1, n_integration_steps = 10)
+res_hmc <- monty_sample(m, sampler_hmc, 1000)
+plot(t(drop(res_hmc$pars)), type = "l", col = "#0000ff33",
+ xlim = range(a), ylim = range(b))
+lines(z95[, 1], z95[, 2])
or viewed over steps:
+
+matplot(t(drop(res_hmc$pars)), lty = 1, type = "l", col = c(2, 4),
+ xlab = "Step", ylab = "Value")
Clearly better!
+This vignette outlines some ways of working with samples generated
+from running an MCMC with monty_sample
. We’ll flesh this
+out as we develop nicer ways of working with samples, and mostly exists
+so that we can point to it from other documentation, including from
+other packages (e.g., from odin2
).
All samplers and runners produce samples with the same basic
+structure. This structure is documented (here) so you are free to use
+this structure if you want to just dive in and manipulate things. Please
+treat output as read-only; extract data all you want, but make a copy
+and don’t change any value within the samples structure if you are going
+to pass it back into a monty
function, as we assume that
+they have not been modified.
Below, samples
is a poorly mixed result of samples with
+2 parameters, 2000 samples, and 4 chains. It was the result of running
+monty_sample()
on the banana model from
+vignette("samplers")
, and has class
+monty_samples
.
+samples
+#>
+#> ── <monty_samples: 2 parameters x 2000 samples x 4 chains> ─────────────────────
+#> ℹ Parameters: 'alpha' and 'beta'
+#> ℹ Conversion to other types is possible:
+#> → ! posterior::as_draws_array() [package installed, but not loaded]
+#> → ! posterior::as_draws_df() [package installed, but not loaded]
+#> → ! coda::as.mcmc.list() [package installed, but not loaded]
+#> ℹ See `?monty_sample()` and `vignette("samples")` for more information
Each monty_samples
object contains an element
+pars
which contains sampled parameters
+dim(samples$pars)
+#> [1] 2 2000 4
posterior
’s draw
+objects
+We implement methods for posterior::as_draws_array
and
+posterior::as_draws_df
, which you can use to convert to
+formats that you might be familiar with from use with other statistical
+packages. This only preserves the parameters ($pars
) from
+above.
+samples_df <- posterior::as_draws_df(samples)
With this, you can access summary methods already implemented +elsewhere:
+
+posterior::summarise_draws(samples_df)
+#> # A tibble: 2 × 10
+#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
+#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 alpha 2.27 0.645 3.06 1.39 -0.612 7.73 1.85 5.79 43.5
+#> 2 beta 0.608 0.364 1.39 1.02 -2.05 2.78 2.10 5.32 11.7
With these objects you should be able to use any of the plotting
+functions in bayesplot
, for
+example, MCMC
+visual diagnostics, without much trouble.
We also support conversion to draws_array
; let us know
+if you need the other conversion functions (as_draws_list
,
+as_draws_rvars
, as_draws_matrix
).
coda
’s mcmc.list
+objects
+The coda
package has utilities for working with MCMC,
+and many other packages are compatible with its mcmc.list
+object type (e.g., the ggmcmc
+package). We provide a method for coda
’s
+as.mcmc.list
, if that package is available:
+samples_coda <- coda::as.mcmc.list(samples)
+coda::effectiveSize(samples_coda)
+#> alpha beta
+#> 39.92519 68.47306