-
Notifications
You must be signed in to change notification settings - Fork 41
/
bayesian.Rmd
308 lines (195 loc) · 13.8 KB
/
bayesian.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
```{r chunk_setup-bayes, include=FALSE, eval=TRUE}
knitr::opts_chunk$set(
# cache
cache.rebuild = F,
cache = T
)
```
# Bayesian Approaches
<img src="img/priorpost.png" style="display:block; margin: 0 auto; width:50%">
With mixed models we've been thinking of coefficients as coming from a distribution (normal). While we have what we are calling 'fixed' effects, the distinguishing feature of the mixed model is the addition of this random component. Now consider a standard regression model, i.e. no clustering. You can actually do the same thing, i.e. assume the coefficients are not fixed, but random. In this sense, the goal is to understand that distribution, and focus on it, rather than just the summary of it, like the mean. However, the mean (or other central tendency) of that distribution can be treated like you've been doing the fixed effects in your standard models.
Thus you can use how you've been thinking about the random effects in mixed models as a natural segue to the Bayesian approach for any model, where all parameters are random draws from a distribution. Using Bayesian versions of your favorite models takes no more syntactical effort than your standard models. The following is a standard linear regression and a mixed model in the <span class="pack">brms</span> package, but would likewise be the same for <span class="pack" style = "">rstanarm</span>, two very popular packages for Bayesian estimation that use Stan under the hood.
```{r syntax, eval=FALSE}
brms::brm(gpa ~ occasion, data = gpa)
brms::brm(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
rstanarm::stan_lm(gpa ~ occasion, data = gpa)
rstanarm::stan_lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
```
So running the Bayesian models is not only as easy, the syntax is identical! Furthermore, just like mixed models allowed you to understand your data more deeply, the Bayesian models have the potential to do the same. Even the probabilities and intervals make more sense compared to our usual `frequentist` approaches. With <span class="pack">rstanarm</span> and especially <span class="pack">brms</span>, you can do fairly complex models, taking you further than the standard mixed model packages, all without learning how to code the models explicitly in Stan, the probabilistic programming language that both are based on. However, when you get to that point, the modeling possibilities are only limited by your imagination.
You will have to learn a new inferential framework, as well as some of the nuances of the Markov Chain Monte Carlo (MCMC) approach. But you may be surprised to find that the basics come more easily than you would anticipate. Using tools like <span class="pack">brms</span> and related make it easier than ever to dive into Bayesian data analysis, and you've already been in a similar mindset with mixed models, so try it out some time. I have an [introduction to Baysian analysis with Stan](https://m-clark.github.io/bayesian-basics/), and a bit more on the Bayesian approach and mixed models in this [document](https://m-clark.github.io/docs/mixedModels/mixedModels.html#mixed_model_7:_bayesian_mixed_model).
## Priors
The following information about priors assumes some background knowledge of Bayesian analysis, particularly for regression models. The Stan development group offers recommendations [here](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations), so refer to it often. Note that Stan does not require conjugacy, in contrast to tools such as BUGS/JAGS. This frees one up to use other prior distributions as they see fit. Generally though, using some normal distribution for the fixed effects, and the package defaults for variance components, should suffice for the standard models we've been discussing.
### Fixed effects
For fixed effect regression coefficients, normal and student t would be the most common prior distributions, but the default <span class="pack">brms</span> (and <span class="pack">rstanarm</span>) implementation does not specify any, and so defaults to a uniform/improper prior, which is a poor choice. You will want to set this for your models. Note that scaling numeric predictors benefits here just like it does with <span class="pack">lme4</span>, and makes specifying the prior easier as well, since you have a better idea about the range of plausible values for the estimate.
### Variance components
In Bayesian linear mixed models, the random effects are estimated parameters, just like the fixed effects (and thus are not BLUPs). The benefit to this is that getting interval estimates for them, or predictions using them, is as easy as anything else. Typically priors for variance components are half-t or similar, as the values can only be positive, but beyond that, e.g. intercept and slope correlations, you can again just rely on the package defaults.
To make this more explicit, let's say we have a situation with random intercepts and slopes, with variances 1 and .1 respectively, with a .3 correlation. The random effects, say for 10 clusters, would come from a multivariate distribution as follows.
```{r demo-re}
re_cov = matrix(c(1, .3, .3, .1), ncol = 2)
re_cov
mvtnorm::rmvnorm(10, mean = c(0, 0), sigma = re_cov)
```
The priors in the model for the variance components would regard this correlation matrix, and the estimated random effects from that would be explicitly added to the linear predictor, as we showed in the [beginning][Initial depiction].
## Demonstration
Let's return to our GPA model. I will add the priors for the fixed effects, and an option to speed computation by parallelizing the chains.
```{r brms-gpa, results='hide'}
library(brms)
pr = prior(normal(0, 1), class = 'b')
bayesian_mixed = brm(
gpa ~ occasion + (1 + occasion | student),
data = gpa,
prior = pr,
cores = 4
)
```
Before getting too far, we can see more explicitly what our priors were. Though the result looks complicated, a. you didn't have to worry about setting everything and these are likely fine for your situations, and b., it's not as bad as it seems once you start to get the hang of things. But for example, along with the prior we just set, we have student-t priors for the variances (on the standard deviation scale).
```{r brms-gpa-prior-summary}
prior_summary(bayesian_mixed)
```
Now for the model summary.
```{r brms-gpa-summary}
summary(bayesian_mixed)
```
Compare to our previous results.
```{r old-results}
summary(gpa_mixed, cor= F)
```
Aside from additional diagnostic information, the Bayesian results are essentially the same, but now we can continue to explore the model. The <span class="pack" style = "">brms</span> package tries to use the same function names as <span class="pack" style = "">lme4</span> where possible, so <span class="func" style = "">ranef</span>, <span class="func" style = "">fixef</span>, <span class="func" style = "">VarCorr</span>, etc. are still in play. However, you can still use my <span class="pack" style = "">mixedup</span> functions for standard models, which will return tidy data frames.
```{r brms-ranef}
# examine random effects with the usual functions, not too tidy
# ranef(bayesian_mixed)
mixedup::extract_random_effects(bayesian_mixed)
```
However, we also have some nice plotting functions. Here I plot the occasion effect
```{r brms-cond}
conditional_effects(bayesian_mixed)
```
Here are estimated predictions from the model (multiple posterior draws) vs. our observed GPA values.
```{r brms-pp}
pp_check(bayesian_mixed)
```
There is a lot more modeling we can do here as we'll see shortly, but it's important to know you can do the basics easily.
## Example Models
In the following I attempt to show a wide variety of (mixed) models one could do with <span class="pack">brms</span>. Typically is shown the modeling function <span class="func">brm</span>, where the syntax is <span class="pack">lme4</span>-like. Elsewhere I use the specific <span class="func">bf</span> function, which allows one to build a potentially complicated formula as a separate object to be used in the eventual modeling function. For example.
```{r brms-basics, eval=FALSE}
brm(y ~ x, data = mydata, family = gaussian)
f = bf(y ~ x)
brm(f, ...)
```
#### Standard mixed models
Random intercept.
```{r brms-ranint, eval=FALSE}
brm(y ~ x + z + (1 | g))
```
Random intercept and random coefficient for `x`.
```{r brms-ranslope, eval=FALSE}
brm(y ~ x + z + (1 + x | g))
```
Multiple grouping structure/random effects.
```{r brms-ranef-more, eval=FALSE}
brm(y ~ x + z + (1 | g1) + (1 | g2))
brm(y ~ x + z + (1 | g1 + g2)) # same thing
brm(y ~ x + z + (1 | g1) + (1 | g1:g2))
```
#### Other distributional families
Multiple types of ordinal models including 'generalized' or 'varying coefficients' models that include category specific effects.
```{r brms-ordinal, eval=FALSE}
brm(y ~ x + z + (1 | g), family = cumulative)
# x has category specific effects
brm(y ~ cs(x) + z + (1 | g), family = acat)
# for ordered predictors, see the mo() function.
```
Multinomial. Uses the categorical distribution for a standard multicategory target.
```{r brms-multinom, eval=FALSE}
brm(y ~ x + z + (1 | g), family = categorical)
```
Zero-inflated and hurdle models.
```{r brms-zero, eval=FALSE}
brm(
y ~ x + z + (1 | g),
zi ~ x + z,
family = zero_inflated_negbinomial(link = 'log')
)
brm(y ~ x + z + (1 | g), family = hurdle_lognormal)
```
Many more including <span class="emph">weibull</span>, <span class="emph">student t</span>, <span class="emph">beta</span>, <span class="emph">skew normal</span>, <span class="emph">von mises</span>, and more.
#### Residual structure and heterogeous variances
Various functions exist to model temporal, spatial and other residual structure.
```{r brms-ar, eval=FALSE}
brm(y ~ time + (1 + time | g) + ar(time, person, p = 2))
```
We can model the (observation level) variance just like anything else.
```{r brms-hetvar, eval=FALSE}
brm(y ~ x + z + (1 | g),
sigma ~ x + (1 | g))
```
We can allow the variance components themselves to vary by some group. In the following we'd have separate variances for male and female.
```{r brms-grouped-vc, eval=FALSE}
brm(count ~ Sex + (1|gr(g, by = Sex)))
```
Multi-membership models, where individuals may belong to more than one cluster can also be used. In the following, `g1` and `g2` are identical conceptually, but may take on different values for some observations.
```{r brms-multimem, eval=FALSE}
brm(y ~ 1 + (1 | mm(g1, g2)))
```
#### Multivariate mixed models
For multiple outcomes we can allow random effects to be correlated. In the following, `ID1` is an arbitrary label that serves to connect/correlate the modeled random effects across multiple outcomes `y1` and `y2`. In SEM literature this would be akin to a parallel process model if we add a random slope for a time indicator variable.
```{r brms-multivariate, eval=FALSE}
bf(
y1 ~ x + z + (1 | ID1 |g),
y2 ~ x + z + (1 | ID1 |g)
)
```
Such an approach would also make sense for zero-inflated models for example, where we want random effects for the same clustering to be correlated for both the count model and zero-inflated model.
```{r brms-mv-zero, eval=FALSE}
bf(y ~ x * z + (1 + x | ID1 | g),
zi ~ x + (1 | ID1 | g))
```
#### Additive mixed models
Much of the basic functionality of <span class="pack">mgcv</span> is incorporated for generalized additive models, and works with the same syntax.
```{r brms-gam, eval=FALSE}
brm(y ~ s(x) + z + (1 | g))
```
#### Nonlinear mixed models
We can model similar situations where the functional form is known, as with <span class="pack">nlme</span>.
```{r, eval=FALSE}
bf(
y ~ a1 - a2^x,
a1 ~ 1,
a2 ~ x + (x | g),
nl = TRUE
)
```
#### Censored and truncated targets
For censored data, just supply the censoring variables as you would typically note in a survival/event-history model.
```{r brms-cens, eval=FALSE}
bf(y | cens(censor_variable) ~ x + z + (1 | g), family = lognormal) # frailty
# see also stan_jm in the rstanarm package for joint models
```
For truncated models, specify the lower bound, upper bound, or both.
```{r brms-trunc, eval=FALSE}
brm(count | trunc(ub = 10) ~ x * z + (1 | g), family = poisson)
```
#### Measurment error
There may be cases where we know one variable is assumed to be measured with error, such as the mean of several trials, or latent variables estimated by some other means. In the following, `sdx` is the known standard deviation for x, which may be constant or vary by observation.
```{r brms-meas-err, eval=FALSE}
brm(y ~ me(x, sdx) + z + (1 | g))
```
#### Mixture models
Two clusters specified by multiple families along with `mixture`. So I guess this is technically a mixture mixed model.
```{r brms-mixture, eval=FALSE}
brm(y ~ x + z + (1 | g), family = mixture(gaussian, gaussian))
```
A 'growth mixture model'.
```{r brms-growth-mix, eval=FALSE}
brm(y ~ time + z + (1 + time | g), family = mixture(gaussian, gaussian))
```
#### Missing Values
We can construct the model formula for missing values as follows, including using a mixed model as the imputation model (for `x`).
```{r, eval=FALSE}
f =
bf(y ~ mi(x) + z + (1 | g)) +
bf(x | mi() ~ z + (1 | g)) +
set_rescor(FALSE)
```
## Beyond the Model
The development of Stan and packages like <span class="pack">rstanarm</span> and <span class="pack">brms</span> is rapid, and with the combined powers of those involved, there are a lot of useful tools for exploring the model results. Even if one found a specialty package for a specific type of mixed model, it is doubtful you would have as many tools for model exploration such as posterior predictive checks, marginal effects, model comparison, basic model diagnostics and more. That said, the Stan ecosystem of R packages is notable at this point, and so use what works for your situation. For mixed models specifically, you should turn to these tools if <span class="pack" style = "">lme4</span> can't do it for you.