bayesian.Rmd

```{r chunk_setup-bayes, include=FALSE, eval=TRUE}
knitr::opts_chunk$set(
  # cache
  cache.rebuild = F,
  cache         = T
)
```


# Bayesian Approaches

<img src="img/priorpost.png" style="display:block; margin: 0 auto; width:50%">

With mixed models we've been thinking of coefficients as coming from a distribution (normal).  While we have what we are calling 'fixed' effects, the distinguishing feature of the mixed model is the addition of this random component.  Now consider a standard regression model, i.e. no clustering.  You can actually do the same thing, i.e. assume the coefficients are not fixed, but random.  In this sense, the goal is to understand that distribution, and focus on it, rather than just the summary of it, like the mean.  However, the mean (or other central tendency) of that distribution can be treated like you've been doing the fixed effects in your standard models.

Thus you can use how you've been thinking about the random effects in mixed models as a natural segue to the Bayesian approach for any model, where all parameters are random draws from a distribution.  Using Bayesian versions of your favorite models takes no more syntactical effort than your standard models.  The following is a standard linear regression and a mixed model in the <span class="pack">brms</span> package, but would likewise be the same for <span class="pack" style = "">rstanarm</span>, two very popular packages for Bayesian estimation that use Stan under the hood.

```{r syntax, eval=FALSE}
brms::brm(gpa ~ occasion, data = gpa)
brms::brm(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

rstanarm::stan_lm(gpa ~ occasion, data = gpa)
rstanarm::stan_lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
```

So running the Bayesian models is not only as easy, the syntax is identical!  Furthermore, just like mixed models allowed you to understand your data more deeply, the Bayesian models have the potential to do the same. Even the probabilities and intervals make more sense compared to our usual `frequentist` approaches.  With <span class="pack">rstanarm</span> and especially <span class="pack">brms</span>, you can do fairly complex models, taking you further than the standard mixed model packages, all without learning how to code the models explicitly in Stan, the probabilistic programming language that both are based on.  However, when you get to that point, the modeling possibilities are only limited by your imagination.

You will have to learn a new inferential framework, as well as some of the nuances of the Markov Chain Monte Carlo (MCMC) approach.  But you may be surprised to find that the basics come more easily than you would anticipate.  Using tools like <span class="pack">brms</span> and related make it easier than ever to dive into Bayesian data analysis, and you've already been in a similar mindset with mixed models, so try it out some time.  I have an [introduction to Baysian analysis with Stan](https://m-clark.github.io/bayesian-basics/), and a bit more on the Bayesian approach and mixed models in this [document](https://m-clark.github.io/docs/mixedModels/mixedModels.html#mixed_model_7:_bayesian_mixed_model).


## Priors

The following information about priors assumes some background knowledge of Bayesian analysis, particularly for regression models. The Stan development group offers recommendations [here](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations), so refer to it often.  Note that Stan does not require conjugacy, in contrast to tools such as BUGS/JAGS.  This frees one up to use other prior distributions as they see fit. Generally though, using some normal distribution for the fixed effects, and the package defaults for variance components, should suffice for the standard models we've been discussing.


### Fixed effects

For fixed effect regression coefficients, normal and student t would be the most common prior distributions, but the default <span class="pack">brms</span> (and <span class="pack">rstanarm</span>) implementation does not specify any, and so defaults to a uniform/improper prior, which is a poor choice.  You will want to set this for your models. Note that scaling numeric predictors benefits here just like it does with <span class="pack">lme4</span>, and makes specifying the prior easier as well, since you have a better idea about the range of plausible values for the estimate.


### Variance components

In Bayesian linear mixed models, the random effects are estimated parameters, just like the fixed effects (and thus are not BLUPs). The benefit to this is that getting interval estimates for them, or predictions using them, is as easy as anything else.  Typically priors for variance components are half-t or similar, as the values can only be positive, but beyond that, e.g. intercept and slope correlations, you can again just rely on the package defaults.

To make this more explicit, let's say we have a situation with random intercepts and slopes, with variances 1 and .1 respectively, with a .3 correlation.  The random effects, say for 10 clusters, would come from a multivariate distribution as follows.

```{r demo-re}
re_cov = matrix(c(1, .3, .3, .1), ncol = 2)
re_cov

mvtnorm::rmvnorm(10, mean = c(0, 0), sigma = re_cov)
```

The priors in the model for the variance components would regard this correlation matrix, and the estimated random effects from that would be explicitly added to the linear predictor, as we showed in the [beginning][Initial depiction].


## Demonstration

Let's return to our GPA model. I will add the priors for the fixed effects, and an option to speed computation by parallelizing the chains.

```{r brms-gpa, results='hide'}
library(brms)

pr = prior(normal(0, 1), class = 'b')

bayesian_mixed = brm(
  gpa ~ occasion + (1 + occasion | student), 
  data  = gpa,
  prior = pr,
  cores = 4
)
```

Before getting too far, we can see more explicitly what our priors were. Though the result looks complicated, a. you didn't have to worry about setting everything and these are likely fine for your situations, and b., it's not as bad as it seems once you start to get the hang of things. But for example, along with the prior we just set, we have student-t priors for the variances (on the standard deviation scale).

```{r brms-gpa-prior-summary}
prior_summary(bayesian_mixed)
```

Now for the model summary.

```{r brms-gpa-summary}
summary(bayesian_mixed)
```


Compare to our previous results.

```{r old-results}
summary(gpa_mixed, cor= F)
```

Aside from additional diagnostic information, the Bayesian results are essentially the same, but now we can continue to explore the model. The <span class="pack" style = "">brms</span> package tries to use the same function names as <span class="pack" style = "">lme4</span> where possible, so <span class="func" style = "">ranef</span>, <span class="func" style = "">fixef</span>, <span class="func" style = "">VarCorr</span>, etc. are still in play.  However, you can still use my <span class="pack" style = "">mixedup</span> functions for standard models, which will return tidy data frames.

```{r brms-ranef}
# examine random effects with the usual functions, not too tidy
# ranef(bayesian_mixed)
mixedup::extract_random_effects(bayesian_mixed)
```

However, we also have some nice plotting functions.  Here I plot the occasion effect

```{r brms-cond}
conditional_effects(bayesian_mixed)
```

Here are estimated predictions from the model (multiple posterior draws) vs. our observed GPA values.

```{r brms-pp}
pp_check(bayesian_mixed)
```

There is a lot more modeling we can do here as we'll see shortly, but it's important to know you can do the basics easily.  


## Example Models

In the following I attempt to show a wide variety of (mixed) models one could do with <span class="pack">brms</span>.  Typically is shown the modeling function <span class="func">brm</span>, where the syntax is <span class="pack">lme4</span>-like.  Elsewhere I use the specific <span class="func">bf</span> function, which allows one to build a potentially complicated formula as a separate object to be used in the eventual modeling function.  For example.

```{r brms-basics, eval=FALSE}
brm(y ~ x, data = mydata, family = gaussian)

f = bf(y ~ x)

brm(f, ...)
```


#### Standard mixed models

Random intercept.

```{r brms-ranint, eval=FALSE}
brm(y ~ x + z + (1 | g))
```

Random intercept and random coefficient for `x`.

```{r brms-ranslope, eval=FALSE}
brm(y ~ x + z + (1 + x | g))
```

Multiple grouping structure/random effects.

```{r brms-ranef-more, eval=FALSE}
brm(y ~ x + z + (1 | g1)  + (1 | g2))
brm(y ~ x + z + (1 | g1 + g2))  # same thing

brm(y ~ x + z + (1 | g1)  + (1 | g1:g2))
```

#### Other distributional families

Multiple types of ordinal models including 'generalized' or 'varying coefficients' models that include category specific effects.

```{r brms-ordinal, eval=FALSE}
brm(y ~ x + z + (1 | g), family = cumulative)

# x has category specific effects
brm(y ~ cs(x) + z + (1 | g), family = acat)  

# for ordered predictors, see the mo() function.
```

Multinomial.  Uses the categorical distribution for a standard multicategory target.

```{r brms-multinom, eval=FALSE}
brm(y ~ x + z + (1 | g), family = categorical)
```

Zero-inflated and hurdle models.

```{r brms-zero, eval=FALSE}
brm(
  y  ~ x + z + (1 | g), 
  zi ~ x + z, 
  family = zero_inflated_negbinomial(link = 'log')
)

brm(y ~ x + z + (1 | g), family = hurdle_lognormal)
```

Many more including <span class="emph">weibull</span>, <span class="emph">student t</span>, <span class="emph">beta</span>, <span class="emph">skew normal</span>, <span class="emph">von mises</span>, and more.


#### Residual structure and heterogeous variances

Various functions exist to model temporal, spatial and other residual structure.

```{r brms-ar, eval=FALSE}
brm(y ~  time +  (1 + time | g) +   ar(time, person, p = 2))
```

We can model the (observation level) variance just like anything else.

```{r brms-hetvar, eval=FALSE}
brm(y ~ x + z + (1 | g), 
    sigma ~ x + (1 | g))
```

We can allow the variance components themselves to vary by some group.  In the following we'd have separate variances for male and female.

```{r brms-grouped-vc, eval=FALSE}
brm(count ~ Sex + (1|gr(g, by = Sex)))
```

Multi-membership models, where individuals may belong to more than one cluster can also be used.  In the following, `g1` and `g2` are identical conceptually, but may take on different values for some observations.

```{r brms-multimem, eval=FALSE}
brm(y ~ 1 + (1 | mm(g1, g2))) 
```


#### Multivariate mixed models

For multiple outcomes we can allow random effects to be correlated.  In the following, `ID1` is an arbitrary label that serves to connect/correlate the modeled random effects across multiple outcomes `y1` and `y2`.  In SEM literature this would be akin to a parallel process model if we add a random slope for a time indicator variable.

```{r brms-multivariate, eval=FALSE}
bf(
  y1 ~ x + z + (1 | ID1 |g),
  y2 ~ x + z + (1 | ID1 |g)
)

```

Such an approach would also make sense for zero-inflated models for example, where we want random effects for the same clustering to be correlated for both the count model and zero-inflated model.

```{r brms-mv-zero, eval=FALSE}
bf(y  ~ x * z + (1 + x | ID1 | g), 
   zi ~ x + (1 | ID1 | g))
```


#### Additive mixed models

Much of the basic functionality of <span class="pack">mgcv</span> is incorporated for generalized additive models, and works with the same syntax.

```{r brms-gam, eval=FALSE}
brm(y ~ s(x) + z + (1 | g))
```

#### Nonlinear mixed models

We can model similar situations where the functional form is known, as with <span class="pack">nlme</span>.

```{r, eval=FALSE}
bf(
  y  ~ a1 - a2^x, 
  a1 ~ 1, 
  a2 ~ x + (x | g), 
  nl = TRUE
)
```

#### Censored and truncated targets

For censored data, just supply the censoring variables as you would typically note in a survival/event-history model.

```{r brms-cens, eval=FALSE}
bf(y | cens(censor_variable) ~ x + z + (1 | g), family = lognormal)  # frailty

# see also stan_jm in the rstanarm package for joint models
```

For truncated models, specify the lower bound, upper bound, or both.

```{r brms-trunc, eval=FALSE}
brm(count | trunc(ub = 10) ~ x * z + (1 | g), family = poisson)
```

#### Measurment error

There may be cases where we know one variable is assumed to be measured with error, such as the mean of several trials, or latent variables estimated by some other means.  In the following, `sdx` is the known standard deviation for x, which may be constant or vary by observation.

```{r brms-meas-err, eval=FALSE}
brm(y ~ me(x, sdx) + z + (1 | g))
```

#### Mixture models

Two clusters specified by multiple families along with `mixture`. So I guess this is technically a mixture mixed model.

```{r brms-mixture, eval=FALSE}
brm(y ~ x + z + (1 | g), family = mixture(gaussian, gaussian))
```

A 'growth mixture model'.

```{r brms-growth-mix, eval=FALSE}
brm(y ~ time + z + (1 + time | g), family = mixture(gaussian, gaussian))
```


#### Missing Values

We can construct the model formula for missing values as follows, including using a mixed model as the imputation model (for `x`).

```{r, eval=FALSE}
f = 
  bf(y ~ mi(x) + z + (1 | g)) +
  bf(x | mi() ~ z + (1 | g)) + 
  set_rescor(FALSE)
```


## Beyond the Model

The development of Stan and packages like <span class="pack">rstanarm</span> and <span class="pack">brms</span> is rapid, and with the combined powers of those involved, there are a lot of useful tools for exploring the model results.  Even if one found a specialty package for a specific type of mixed model, it is doubtful you would have as many tools for model exploration such as posterior predictive checks, marginal effects, model comparison, basic model diagnostics and more.  That said, the Stan ecosystem of R packages is notable at this point, and so use what works for your situation. For mixed models specifically, you should turn to these tools if <span class="pack" style = "">lme4</span> can't do it for you.