-
Notifications
You must be signed in to change notification settings - Fork 0
/
binary-models.Rmd
580 lines (389 loc) · 16.3 KB
/
binary-models.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
---
title: "Models for Binary Data"
description: |
Logistic regression / item response models
date: 06-11-2021
author:
- first_name: "Andrew"
last_name: "Ellis"
url: https://github.com/awellis
affiliation: Cognitive psychology, perception & methods, Univerity of Bern
affiliation_url: https://www.kog.psy.unibe.ch
orcid_id: 0000-0002-2788-936X
citation_url: https://awellis.github.io/learnmultilevelmodels/item-response-models.html
bibliography: ./bibliography.bib
output:
distill::distill_article:
toc: true
toc_float: true
toc_depth: 2
code_folding: false
css: ./css/style.css
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
cache = TRUE,
warning = FALSE,
message = FALSE)
# recall <- bcogsci::df_recall_complete |>
# select(subj, session, block, trial, set_size, response, rt, correct, response_category) |>
# drop_na()
# recall |> write_csv(file = "data/recall-oberauer.csv")
```
```{r packages}
library(knitr)
library(kableExtra)
library(tidyverse)
library(patchwork)
library(brms)
# set ggplot theme
theme_set(theme_grey(base_size = 14) +
theme(panel.grid = element_blank()))
# set rstan options
rstan::rstan_options(auto_write = TRUE)
options(mc.cores = 4)
```
# Logistic Regression
This example is based on[Bayesian Data Analysis for Cognitive Science](https://vasishth.github.io/bayescogsci/book/) and uses data from @oberauerWorkingMemoryCapacity2019.
In this study, the effect of word list length on working memory capacity was assessed. Subjects were shown word lists of varying length (set size: 2, 4, 6, or 8 words) and were required to recall the correct word according to its position in the list. A single trial is shown in Figure \@ref(fig:oberauer-2019).
```{r oberauer-2019, echo=FALSE, fig.cap="Subjects were required to indicate the correct word from the studied list according to its cued position."}
knitr::include_graphics("images/oberauer-2019.png")
```
```{r}
recall <- read_csv(file = "https://raw.githubusercontent.com/awellis/learnmultilevelmodels/main/data/recall-oberauer.csv")
recall <- recall |>
mutate(subj = as_factor(subj))
```
```{r}
glimpse(recall)
```
Count the number of subjects and number of trials per set size for each subjetc.
```{r}
recall |> distinct(subj) |> count()
recall |> count(subj, set_size)
```
The latter is equivalent to:
```{r}
recall |> group_by(subj, set_size) |>
summarize(N = n())
```
Before fitting any models, we will center the `set_size` variable.
```{r}
recall <- recall |>
mutate(c_set_size = set_size - mean(set_size))
```
We are interested in subjects' probability of responding correctly based on the set size. First, we can get a maximum likelihood estimate:
```{r}
recall_sum <- recall %>%
group_by(subj, set_size) %>%
summarise(accuracy = mean(correct),
sd = sd(correct))
```
```{r}
recall_sum |>
ggplot(aes(set_size, accuracy)) +
geom_line() +
geom_point() +
facet_wrap(~subj)
```
## Likelihood
Instead of computing point estimates, we can model these data using a generalized linear model. We know that each response is either correct or an error, and thus the most sensible distribution is a Bernoulli.
$$ correct_i \sim Bernoulli(\theta_{i})$$
Each response $i$ is drawn from a Bernoulli distribution, with a probability of being correct of $\theta_i$. THe probability of an error response is $1-\theta_i$.
We want to predict the probability of getting a correct response using a linear predictor. Since the linear predictor in on the real line, and the parameter we are trying to predict lies in the interval $[0, 1]$, we need to transform the linear predictor. This is conventionally written as
$$ g(\theta_i) = logit(\theta_i) = b_0 + b_{set\_size} \cdot set\_size_i $$
with the link function $g()$ being applied to the parameter of the likelihood distribution. An alternative way of writing this is
$$ \theta_i = g^{-1}(b_0 + b_{set\_size} \cdot set\_size_i) $$
where $g^{-1}$ is the inverse link function.
In a logistic regression, the link function is the `logit` function:
$$ logit(\theta) = log \bigg( \frac{\theta}{(1-\theta)} \bigg) $$
and gives the log-odds, i.e. the natural logarithm of the odds of a success.
Therefore(for a single subject)
$$ log \bigg( \frac{\theta_i}{1-\theta_i} \bigg) = b_0 + b_{set\_size} \cdot set\_size_i $$
The coefficients $b_0$ and $b_{set\_size}$ have an additive effect on the log-odds scale. If we want this effect on the probability scale, we need to apply the inverse link function. This is is function $f(x) = 1/(1 + exp(-x))$, and is known a the logistic function. It is also the cumulative distribution function of the logistic distribution, and exist in R under the name `plogis()`.
For example, if the log-odds are 0.1, then the probability is 0.525:
```{r}
logodds <- 0.1
prob <- plogis(logodds)
prob
```
We can plot the log-odds on the x axis, and the cumulative distribution on the y axis. Is it noticeable that the log-odds with an absolute value greater than approximately 5 lead to probabilities of 0 and 1, asymptotically. This is relevant when considering prior distributions.
```{r}
d1 <- tibble(x = seq(-5, 5, by = 0.01),
y = plogis(x))
d1 %>%
ggplot(aes(x, y)) +
geom_hline(yintercept = 0.5, linetype = 3) +
geom_vline(xintercept = 0, linetype = 3) +
geom_line(size = 2, color = "steelblue") +
xlab("log-odds") + ylab("probability of success")
```
## Prior distributions
Since we centered our predictor variable `c_set_size`, the zero point represents the average set size. The intercept will therefore represent the expected log-odds for the average set size. With minimal prior knowledge, we may assume that at the average set size, a subject may be equally likely to give a correct or error response. In that case, $\theta = 0.5$, and the log-odds are $0$.
We can express our uncertainty using a normal distribution centred at 0, with a standard deviation of 1, i.e. we are 95% certain that the log-odds will lie between $[-2, 2]$.
```{r}
qnorm(c(.025, .975), mean = 0, sd = 1.0)
```
YOu can plot the prior by using random draws from a normal distribution. The figure on the left shows the normal distribution on the log-odds scale, and the figure on the right shows the transformed values, on the probability scale.
```{r}
library(patchwork)
samples <- tibble(Intercept = rnorm(1e5, 0, 1.0),
p = plogis(Intercept))
p_logodds <- samples %>%
ggplot(aes(Intercept)) +
geom_density() +
ggtitle("Log-odds")
p_prob <- samples %>%
ggplot(aes(p)) +
geom_density() +
ggtitle("Probability")
p_logodds + p_prob
```
Our `normal(0, 1.0)` prior is very similar to using a `Beta(2, 2)` on the probability scale..
```{r}
tibble(x = seq(0, 1, by = 0.01),
dens = dbeta(x, shape1 = 2, shape2 = 2)) %>%
ggplot(aes(x, dens)) +
geom_line()
```
For the average effect of `set_size` we will choose a prior on the log-odds scale that is uninformative on the probability scale. A `normal(0, 0.5)` prior expresses the belief that the an increase of the set size by one unit will lead to an change in the log-odds of somewhere between $[-1, 1]$, with a probability of 95%. The figure on the right again shows the effect transformed onto the probability scale.
```{r}
samples2 <- tibble(b_set_size = rnorm(1e5, 0, 0.5),
p = plogis(b_set_size))
p_logodds <- samples2 %>%
ggplot(aes(b_set_size)) +
geom_density() +
ggtitle("Log-odds")
p_prob <- samples2 %>%
ggplot(aes(p)) +
geom_density() +
ggtitle("Probability")
p_logodds + p_prob
```
## Fitting models
We have set our priors on the average (population-level) effects of the set size.
Let's look at the default priors in a multilevel model
```{r}
library(brms)
```
```{r}
get_prior(correct ~ 1 + c_set_size + (1 + c_set_size | subj),
family = bernoulli(link = logit),
data = recall)
```
```{r}
priors <- prior(normal(0, 1.0), class = Intercept) +
prior(normal(0, .1), class = b, coef = c_set_size)
fit_recall_1 <- brm(correct ~ 1 + c_set_size + (1 + c_set_size | subj),
family = bernoulli(link = logit),
prior = priors,
data = recall,
file = "models/fit_recall-1",
file_refit = "on_change")
```
```{r}
fit_recall_1
```
```{r}
fit_recall_1 |> mcmc_plot()
```
We can plot the expected effect of set size on the probability of giving a correct response using `conditional_effects. The default shows the exptected value of the posterior predictive distribution.
```{r}
fit_recall_1 |>
conditional_effects()
```
If you want the expected log-odds, you can use the argument `method = 'posterior_linpred'`.
```{r}
fit_recall_1 |>
conditional_effects(method = 'posterior_linpred')
```
# Item Response Models
We will look at an example from @burknerBayesianItemResponse2020a. I would thoroughly recommend reading this if you are interested in logistics regression models, and Item Response Theory (IRT) models in particular.
IRT models are widely applied in the human sciences to model persons’ responses on a set of items measuring one or more latent constructs.
## Response distributions {#respdists}
The response format of the items will critically determine which distribution
is appropriate to model individuals' responses on the items. The possibility
of using a wide range of response distributions within the same framework and
estimating all of them using the same general-purpose algorithms is an
important advantage of Bayesian statistics.
If the response $y$ is a binary success (1) vs. failure (0) indicator, the
canonical family is the *Bernoulli* distribution with density
$$
y \sim \text{Bernoulli}(\psi) = \psi^y (1-\psi)^{1-y},
$$
where $\psi \in [0, 1]$ can be interpreted as the success probability. Common
IRT models that can be built on top of the Bernoulli distribution are the 1, 2,
and 3 parameter logistic models (1PL, 2PL, and 3PL models).
This results in what is known as a
*generalized linear model* (GLM). That is, the predictor term $\eta = \theta_p +
\xi_i$ is still linear but transformed, as a whole, by a non-linear function
$f$, which is commonly called 'response function'. For Bernoulli distributions,
we can canonically use the logistic response function
$$
f(\eta) = \text{logistic}(\eta) = \frac{\exp(\eta)}{1 + \exp(\eta)},
$$
which yields values $f(\eta) \in [0, 1]$ for any real value $\eta$. As a result,
we could write down the model of $\psi$ as
$$
\psi = \frac{\exp(\theta_p + \xi_i)}{1 + \exp(\theta_p + \xi_i)},
$$
which is known as the Rasch or 1PL model. Under the above model,
we can interprete $\theta_p$ as the ability of person $p$ (higher values of $\theta_p$ imply higher success probabilities regardless of the
administered item). $\xi_i$ can be interpreted as the easiness of item
$i$ (higher values of $\xi_i$ imply higher success probabilities regardless of
the person to which the item is administered).
## Data
The dataset is taken from @deboeck2011 and is included in the `lme4` package.
There are 24 items, based on four frustrating situations, two of which where someone else is to be blamed (e.g., “A bus fails to stop for me”), and two of which where one is to be blamed oneself (e.g., “I am entering a grocery store when it is about to close”). Each of these situations is combined with each of three behaviors, cursing, scolding, and shouting, leading to 4x3 combinations. These 12 combinations are formulated in two modes, a wanting mode and a doing mode, so that in total there are 24 items. An example is "A bus fails tostop for me. I would want to curse".
```{r}
data("VerbAgg", package = "lme4")
VerbAgg <- VerbAgg |> as_tibble()
head(VerbAgg, 10)
```
The variables are:
```{r echo=FALSE}
tribble(~Variable, ~Description,
"Anger", "the subject's Trait Anger score as measured on the State-Trait Anger Expression Inventory (STAXI)",
"Gender", "the subject's gender - a factor with levels M and F",
"item", "the item on the questionaire, as a factor",
"resp", "the subject's response to the item - an ordered factor with levels no < perhaps < yes",
"id", "the subject identifier, as a factor",
"btype", "behavior type - a factor with levels curse, scold and shout",
"situ", "situation type - a factor with levels other and self indicating other-to-blame and self-to-blame",
"mode", "behavior mode - a factor with levels want and do",
"r2", "dichotomous version of the response - a factor with levels N and Y") |>
kbl() |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
## Partial pooling for items
```{r}
formula_1pl <- bf(r2 ~ 1 + (1 | item) + (1 | id))
```
```{r}
get_prior(formula_1pl,
data = VerbAgg,
family = bernoulli())
```
To impose a small amount of regularization on the model, we'll set
$\text{half-normal}(0, 3)$ priors on the hierarchical standard deviations of
person and items parameters. Given the scale of the logistic response function,
this can be regarded as a weakly informative prior.
```{r}
prior_1pl <-
prior("normal(0, 3)", class = "sd", group = "id") +
prior("normal(0, 3)", class = "sd", group = "item")
```
```{r}
fit_1pl <- brm(formula_1pl,
prior = prior_1pl,
data = VerbAgg,
family = bernoulli(),
file = "models/fit_va_1pl") |>
add_criterion("loo")
```
```{r}
summary(fit_1pl)
plot(fit_1pl, ask = FALSE)
```
## No pooling for items
```{r}
fit_1pl_nopooling <- brm(r2 ~ 0 + item + (1 | id),
prior = prior("normal(0, 3)", class = "sd", group = "id"),
data = VerbAgg,
family = bernoulli(),
file = "models/fit_1pl_nopooling") |>
add_criterion("loo")
```
```{r}
summary(fit_1pl_nopooling)
```
## Person and item parameters
```{r}
# extract person parameters
ranef_1pl <- ranef(fit_1pl)
person_pars_1pl <- ranef_1pl$id
head(person_pars_1pl, 10)
```
```{r}
# extract item parameters
item_pars_1pl <- coef(fit_1pl)$item
head(item_pars_1pl, 10)
```
```{r}
# plot item parameters
item_pars_1pl[, , "Intercept"] %>%
as_tibble() %>%
rownames_to_column() %>%
rename(item = "rowname") %>%
mutate(item = as.numeric(item)) %>%
ggplot(aes(item, Estimate, ymin = Q2.5, ymax = Q97.5)) +
geom_pointrange() +
coord_flip() +
labs(x = "Item Number")
```
```{r}
# plot person parameters
person_pars_1pl[, , "Intercept"] %>%
as_tibble() %>%
rownames_to_column() %>%
arrange(Estimate) %>%
mutate(id = seq_len(n())) %>%
ggplot(aes(id, Estimate, ymin = Q2.5, ymax = Q97.5)) +
geom_pointrange(alpha = 0.4) +
coord_flip() +
labs(x = "Person Number (Sorted)")
```
```{r}
library(tidybayes)
fit_1pl |>
spread_draws(r_id[id,Intercept])
```
```{r eval=FALSE, include=FALSE}
varying_subject_effects <- fit_1pl |>
spread_draws(b_Intercept, r_id[id, Intercept]) |>
median_qi(ability = b_Intercept + r_id, .width = c(.95)) |>
arrange(ability)
varying_subject_effects |>
mutate(id = as_factor(id),
id = fct_reorder(id, ability, .fun = 'min')) |>
ggplot(aes(y = id, x = ability, xmin = .lower, xmax = .upper)) +
geom_pointinterval(alpha = 0.4)
```
```{r}
varying_item_effects <- fit_1pl |>
spread_draws(b_Intercept, r_item[item, Intercept]) |>
median_qi(difficulty = b_Intercept + r_item, .width = c(.95)) |>
arrange(difficulty)
varying_item_effects |>
ggplot(aes(y = item, x = difficulty, xmin = .lower, xmax = .upper)) +
geom_pointinterval(alpha = 0.4) +
ggtitle("Partial pooling estimates")
```
```{r eval=FALSE, include=FALSE}
r2 ~ Anger + Gender + btype + situ + mode + mode:Gender +
(0 + Gender | item) + (0 + mode | id)
```
```{r}
population_item_effects <- fit_1pl_nopooling |>
gather_draws(`b_item.*`, regex = TRUE) |>
median_qi(.width = c(.95))
```
```{r}
population_item_effects |>
ggplot(aes(y = .variable, x = .value, xmin = .lower, xmax = .upper)) +
geom_pointinterval(alpha = 0.4) +
ggtitle("No pooling estimates")
```
```{r eval=FALSE, include=FALSE}
library(lme4)
# specify a 1PL model with lme4 for comparison
lme4_va_1pl <- glmer(
r2 ~ 1 + (1 | item) + (1 | id),
data = VerbAgg,
family = binomial()
)
summary(lme4_va_1pl)
# person and item parameters are similar to those obtained by brms
coef(lme4_va_1pl)$item
ranef(lme4_va_1pl)$id
```