```{r, echo = FALSE, cache = FALSE}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 110)
```
# Ordinal Predicted Variable
> This chapter considers data that have an ordinal predicted variable. For example, we might want to predict people's happiness ratings on a $1$-to-$7$ scale as a function of their total financial assets. Or we might want to predict ratings of movies as a function of the year they were made.
>
> One traditional treatment of this sort of data structure is called *ordinal or ordered probit regression*. We will consider a Bayesian approach to this model. As usual, in Bayesian software, it is easy to generalize the traditional model so it is robust to outliers, allows different variances within levels of a nominal predictor, or has hierarchical structure to share information across levels or factors as appropriate.
>
In the context of the generalized linear model (GLM) introduced in Chapter 15, this chapter's situation involves an inverse-link function that is a thresholded cumulative normal with a categorical distribution for describing noise in the data, as indicated in the fourth row of Table 15.2 (p. 443). For a reminder of how this chapter's combination of predicted and predictor variables relates to other combinations, see Table 15.3 (p. 444). [@kruschkeDoingBayesianData2015, p. 671, *emphasis* in the original]
We might follow Kruschke's advice and rewind a little before tackling this chapter. The cumulative normal function is denoted $\Phi(x; \mu, \sigma)$, where $x$ is some real number and $\mu$ and $\sigma$ have their typical meaning. The function is cumulative in that it produces values ranging from 0 to 1. Figure 15.8 showed an example of the normal distribution atop the cumulative normal. Here it is, again.
```{r, warning = F, message = F}
library(tidyverse)
d <-
tibble(z = seq(from = -3, to = 3, by = .1)) %>%
# add the density values
mutate(`p(z)` = dnorm(z, mean = 0, sd = 1),
# add the CDF values
`Phi(z)` = pnorm(z, mean = 0, sd = 1))
head(d)
```
It's time to talk color and theme. For this chapter, we'll take our color palette from the [**scico** package](https://CRAN.R-project.org/package=scico) [@R-scico], which provides 17 perceptually-uniform and colorblind-safe palettes based on the work of [Fabio Crameri](https://twitter.com/fcrameri). Our palette of interest will be `"lajolla"`.
```{r, warning = F, message = F, fig.height = 3}
library(scico)
sl <- scico(palette = "lajolla", n = 9)
scales::show_col(sl)
```
Our overall plot theme will be based on `ggplot2::theme_linedraw()` with just a few adjustments.
```{r, warning = F, message = F}
theme_set(
theme_linedraw() +
theme(panel.grid = element_blank(),
strip.background = element_rect(color = sl[9], fill = sl[9]),
strip.text = element_text(color = sl[1]))
)
```
Now plot!
```{r, fig.width = 3.5, fig.height = 5.5}
p1 <-
d %>%
ggplot(aes(x = z, y = `p(z)`)) +
geom_area(aes(fill = z <= 1),
show.legend = F) +
geom_line(linewidth = 1, color = sl[3]) +
scale_fill_manual(values = c("transparent", sl[2])) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
labs(title = "Normal Density",
y = expression(p(italic(z))))
p2 <-
d %>%
ggplot(aes(x = z, y = `Phi(z)`)) +
geom_area(aes(fill = z <= 1),
show.legend = F) +
geom_line(linewidth = 1, color = sl[3]) +
scale_fill_manual(values = c("transparent", sl[2])) +
scale_y_continuous(expand = expansion(mult = c(0, 0)), limits = 0:1) +
labs(title = "Cumulative Normal",
y = expression(Phi(italic(z))))
# combine and adjust with patchwork
library(patchwork)
p1 / p2 &
scale_x_continuous(breaks = -2:2) &
coord_cartesian(xlim = c(-2.5, 2.5))
```
For both plots, `z` is in a standardized metric (i.e., $z$-score). With the cumulative normal function, the cumulative probability $\Phi(z)$ increases nonlinearly with the $z$-scores such that, much like with the logistic curve, the greatest change occurs around $z = 0$ and tapers off in the tails.
The inverse of $\Phi(x)$ is the *probit* function. As indicated in the above block quote, we'll be making extensive use of the probit function in this chapter for our Bayesian models.
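As a quick sanity check before we move on, note that in R terms `pnorm()` plays the role of $\Phi$ and `qnorm()` plays the role of the probit function, so applying one after the other should return the original $z$-score.

```{r}
# Phi(1) is the cumulative probability at z = 1
pnorm(1, mean = 0, sd = 1)

# the probit function inverts Phi(), recovering z = 1
qnorm(pnorm(1, mean = 0, sd = 1), mean = 0, sd = 1)
```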
## Modeling ordinal data with an underlying metric variable
> You can imagine that the distribution of ordinal values might not resemble a normal distribution, even though the underlying metric values are normally distributed. Figure 23.1 shows some examples of ordinal outcome probabilities generated from an underlying normal distribution. The horizontal axis is the underlying continuous metric value. Thresholds are plotted as vertical dashed lines, labeled $\theta$. In all examples, the ordinal scale has $7$ levels, and hence, there are $6$ thresholds. The lowest threshold is set at $\theta_1 = 1.5$ (to separate outcomes $1$ and $2$), and the highest threshold is set at $\theta_6 = 6.5$ (to separate outcomes $6$ and $7$). The normal curve in each panel shows the distribution of underlying continuous values. What differs across panels are the settings of means, standard deviations, and remaining thresholds. (p. 672)
The various Figure 23.1 subplots require a lot of ins and outs. We'll start with the top panel and build from there. Here is how we might make the values necessary for the density curve.
```{r}
den <-
# define the parameters for the underlying normal distribution
tibble(mu = 4,
sigma = 1.5) %>%
mutate(strip = str_c("mu==", mu, "~~sigma==", sigma)) %>%
# this will allow us to rescale the density in terms of the bar plot
mutate(multiplier = 26 / dnorm(mu, mu, sigma)) %>%
# we need values for the x-axis
expand_grid(y = seq(from = -1, to = 9, by = .1)) %>%
# compute the density values
mutate(density = dnorm(y, mu, sigma)) %>%
# use that multiplier value from above to rescale the density values
mutate(percent = density * multiplier)
head(den)
```
Before making the data for the bar portion of the plot, we'll need to define the $\theta$-values they'll be placed between. We also need to define the exact points on the $x$-axis from which we'd like those bars to originate. Those points, which we'll call `label_1`, will double as names for the individual bars.
```{r}
(theta_1 <- seq(from = 1.5, to = 6.5, by = 1))
(label_1 <- 1:7)
```
Now we can define the data for the bars.
```{r}
bar <-
# define the parameters for the underlying normal distribution
tibble(mu = 4,
sigma = 1.5) %>%
mutate(strip = str_c("mu==", mu, "~~sigma==", sigma)) %>%
# take random draws from the underlying normal distribution
mutate(draw = map2(mu, sigma, ~rnorm(1e4, mean = .x, sd = .y))) %>%
unnest(draw) %>%
# bin those draws into ordinal categories defined by `theta_1`
# and named by `label_1`
mutate(y = case_when(
draw < theta_1[1] ~ label_1[1],
draw < theta_1[2] ~ label_1[2],
draw < theta_1[3] ~ label_1[3],
draw < theta_1[4] ~ label_1[4],
draw < theta_1[5] ~ label_1[5],
draw < theta_1[6] ~ label_1[6],
draw >= theta_1[6] ~ label_1[7]
)) %>%
# summarize
count(y) %>%
mutate(percent = (100 * n / sum(n)) %>% round(0)) %>%
mutate(percent_label = str_c(percent, "%"),
percent_max = max(percent))
head(bar)
```
Make the top subplot.
```{r, fig.width = 5, fig.height = 2.25}
bar %>%
ggplot(aes(x = y, y = percent)) +
geom_area(data = den,
fill = sl[3]) +
geom_vline(xintercept = theta_1, color = "white", linetype = 3) +
geom_col(width = .5, alpha = .85, fill = sl[7]) +
geom_text(aes(y = percent + 2, label = percent_label),
size = 3.5) +
annotate(geom = "text",
x = theta_1, y = -5.55, label = theta_1,
size = 3) +
scale_x_continuous(NULL, expand = c(0, 0),
breaks = theta_1,
labels = parse(text = str_c("theta[", 1:6, "]"))) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.06))) +
coord_cartesian(ylim = c(0, 28.5), clip = F) +
theme(plot.margin = margin(5.5, 5.5, 11, 5.5)) +
facet_wrap(~ strip, labeller = label_parsed)
```
This method works okay for plotting one or two panels, but the sheer number of code lines and moving parts would be unwieldy for plotting four. It'd be convenient if we could save the density information for all four panels in one data object. Here's one way.
```{r}
den <-
tibble(panel = 1:4,
mu = c(4, 1, 4, 4),
sigma = c(1.5, 2.5, 1, 3)) %>%
mutate(strip = factor(panel,
labels = str_c("mu==", mu, "~~sigma==", sigma),
ordered = T)) %>%
mutate(multiplier = c(26, 58, 24, 26) / dnorm(mu, mu, sigma)) %>%
expand_grid(y = seq(from = -1, to = 9, by = .1)) %>%
mutate(density = dnorm(y, mu, sigma)) %>%
mutate(percent = density * multiplier)
head(den)
```
Notice we added a `panel` column for indexing the subplots. Next we'll need to define `theta_[i]` and `label_[i]` values for the remaining plots.
```{r}
theta_3 <- c(1.5, 3.1, 3.7, 4.3, 4.9, 6.5)
theta_4 <- c(1.5, 2.25, 3, 5, 5.75, 6.5)
label_3 <- c(1, 2.2, 3.4, 4, 4.6, 5.7, 7)
label_4 <- c(1, 1.875, 2.625, 4, 5.375, 6.125, 7)
```
Since the values are the same for the top two panels, we didn't bother defining a `theta_2` or `label_2`. Now that we have all the `theta_[i]` and `label_[i]` values, we'll want to make a function that can use them within `case_when()` for any of the four panels. Here's one way to make such a function, which we'll call `make_ordinal()`.
```{r}
make_ordinal <- function(x, panel) {
if (panel < 3) {
case_when(
x < theta_1[1] ~ label_1[1],
x < theta_1[2] ~ label_1[2],
x < theta_1[3] ~ label_1[3],
x < theta_1[4] ~ label_1[4],
x < theta_1[5] ~ label_1[5],
x < theta_1[6] ~ label_1[6],
x >= theta_1[6] ~ label_1[7]
)
} else if (panel == 3) {
case_when(
x < theta_3[1] ~ label_3[1],
x < theta_3[2] ~ label_3[2],
x < theta_3[3] ~ label_3[3],
x < theta_3[4] ~ label_3[4],
x < theta_3[5] ~ label_3[5],
x < theta_3[6] ~ label_3[6],
x >= theta_3[6] ~ label_3[7]
)
} else {
case_when(
x < theta_4[1] ~ label_4[1],
x < theta_4[2] ~ label_4[2],
x < theta_4[3] ~ label_4[3],
x < theta_4[4] ~ label_4[4],
x < theta_4[5] ~ label_4[5],
x < theta_4[6] ~ label_4[6],
x >= theta_4[6] ~ label_4[7]
)
}
}
```
Now put those values and our `make_ordinal()` function to work to make the data for the bar plots.
```{r, fig.width = 5, fig.height = 8}
set.seed(23)
bar <-
tibble(panel = 1:4,
mu = c(4, 1, 4, 4),
sigma = c(1.5, 2.5, 1, 3)) %>%
mutate(strip = factor(panel,
labels = str_c("mu==", mu, "~~sigma==", sigma),
ordered = T)) %>%
mutate(draw = map2(mu, sigma, ~rnorm(1e5, mean = .x, sd = .y))) %>%
unnest(draw) %>%
mutate(y = map2_dbl(draw, panel, make_ordinal)) %>%
group_by(panel, strip) %>%
count(y) %>%
mutate(percent = (100 * n / sum(n)) %>% round(0)) %>%
mutate(percent_label = str_c(percent, "%"),
percent_max = max(percent))
head(bar)
```
Like before, we added a `panel` index. As our final preparatory step, we will make something of a super function with which we'll plug the desired information into **ggplot2**, which will then make each subplot. Much of the plotting and data wrangling code will be the same across subplots. As far as I can tell, we only need to vary four parameters. First, we'll want to be able to subset the data by `panel` index. We'll do that with the `panel_n` argument. Second, we'll want to select which of the `theta_[i]` values we'd like to use in `geom_vline()`, `annotate()`, and `scale_x_continuous()`. We'll do that with the `theta` argument. Third, we'll make a `y_second_x` argument to pin down exactly where below the $x$-axis we'd like to put those secondary axis values defined by the `theta_[i]` values. Finally, we'll want a `ylim_ub` argument to set the upper limit of the $y$-axis. The name of our four-parameter super function will be `plot_bar_den()`.
```{r}
plot_bar_den <- function(panel_n, theta, y_second_x, ylim_ub) {
bar %>%
filter(panel == panel_n) %>%
ggplot(aes(x = y)) +
geom_area(data = den %>% filter(panel == panel_n),
aes(y = percent),
fill = sl[3]) +
geom_vline(xintercept = theta, color = "white", linetype = 3) +
geom_linerange(aes(ymin = 0, ymax = percent),
color = sl[7], alpha = .85, linewidth = 8) +
geom_text(aes(y = percent + (percent_max / 15), label = percent_label),
size = 3.5) +
annotate(geom = "text", x = theta, y = y_second_x,
label = theta, size = 3) +
scale_x_continuous(NULL,
breaks = theta,
labels = parse(text = str_c("theta[", 1:6, "]")),
expand = c(0, 0)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
coord_cartesian(ylim = c(0, ylim_ub),
clip = F) +
theme(plot.margin = margin(5.5, 5.5, 11, 5.5)) +
facet_wrap(~ strip, labeller = label_parsed)
}
```
Finally, make all four subplots and combine them with **patchwork** syntax!
```{r, eval = F, echo = F}
# approximate values for the bookdown 0.4.0 approach
bar %>%
group_by(panel) %>%
filter(percent == max(percent)) %>%
mutate(y_second_x = (percent + (percent_max / 15)) / -4.1)
# suggested updates
bar %>%
group_by(panel) %>%
filter(percent == max(percent)) %>%
mutate(y_second_x = (percent + (percent_max / 15)) / -5)
```
```{r, fig.width = 5, fig.height = 9}
p1 <- plot_bar_den(panel_n = 1,
theta = theta_1,
# y_second_x = -6.75,
y_second_x = -5.55,
ylim_ub = 28)
p2 <- plot_bar_den(panel_n = 2,
theta = theta_1,
# y_second_x = -15.5 * 3/4,
y_second_x = -12.37,
ylim_ub = 63)
p3 <- plot_bar_den(panel_n = 3,
theta = theta_3,
# y_second_x = -6.25 * 3/4,
y_second_x = -5.12,
ylim_ub = 25.75)
p4 <- plot_bar_den(panel_n = 4,
theta = theta_4,
# y_second_x = -6.75 * 3/4,
y_second_x = -5.55,
ylim_ub = 28)
p1 / p2 / p3 / p4
```
Oh mamma. "*The crucial concept in Figure 23.1 is that the probability of a particular ordinal outcome is the area under the normal curve between the thresholds of that outcome*" (p. 672, *emphasis* in the original). In each of the subplots, we used six thresholds to discretize the continuous data into seven categories. More generally, we need $K$ thresholds to make $K + 1$ ordinal categories. To make this work,
> the idea is that we consider the cumulative area under the normal up to the high-side threshold, and subtract away the cumulative area under the normal up to the low-side threshold. Recall that the cumulative area under the standardized normal is denoted $\Phi(z)$, as was illustrated in Figure 15.8 [which we remade at the top of this chapter]. Thus, the area under the normal to the left of $\theta_k$ is $\Phi((\theta_k - \mu) / \sigma)$, and the area under the normal to the left of $\theta_{k - 1}$ is $\Phi((\theta_{k - 1} - \mu) / \sigma)$. Therefore, the area under the normal curve between the two thresholds, which is the probability of outcome $k$, is
>
> $$p(y = k | \mu, \sigma, \{ \theta_j \}) = \Phi((\theta_k - \mu) / \sigma) - \Phi((\theta_{k - 1} - \mu) / \sigma)$$
>
> [This equation] applies even to the least and greatest ordinal values if we append two "virtual" thresholds at $- \infty$ and $+ \infty$...
>
> Thus, a normally distributed underlying metric value can yield a clearly non-normal distribution of discrete ordinal values. This result does not imply that the ordinal values can be treated as if they were themselves metric and normally distributed; in fact it implies the opposite: We might be able to model a distribution of ordinal values as consecutive intervals of a normal distribution on an underlying metric scale with appropriately positioned thresholds. (pp. 674--675)
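We can check that equation directly in R. Here's a minimal sketch using the parameters from the top panel of Figure 23.1 ($\mu = 4$, $\sigma = 1.5$, thresholds $1.5, 2.5, \dots, 6.5$), with the two "virtual" thresholds appended at $\pm \infty$.

```{r}
# thresholds from the top panel of Figure 23.1,
# with virtual thresholds at -Inf and Inf appended
theta <- c(-Inf, seq(from = 1.5, to = 6.5, by = 1), Inf)
mu    <- 4
sigma <- 1.5

# p(y = k) is the area between consecutive thresholds:
# Phi((theta_k - mu) / sigma) - Phi((theta_{k - 1} - mu) / sigma)
p_k <- pnorm((theta[-1] - mu) / sigma) - pnorm((head(theta, -1) - mu) / sigma)

round(p_k, 3)

# the seven probabilities sum to 1
sum(p_k)
```

These values should line up closely with the bar heights we simulated for the first panel, above.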
## The case of a single group
Given a model with no predictors, "if there are $K$ ordinal values, the model has $K + 1$ parameters: $\theta_1, \dots,\theta_{K - 1}, \mu$, and $\sigma$. If you think about it a moment, you’ll realize that the parameter values trade-off and are undetermined" (p. 675). The solution Kruschke took throughout this chapter was to fix the two thresholds at the ends, $\theta_1$ and $\theta_{K - 1}$, to the constants
\begin{align*}
\theta_1 \equiv 1 + 0.5 && \text{and} && \theta_{K - 1} \equiv K - 0.5.
\end{align*}
For example, all four subplots from Figure 23.1 had $K = 7$ categories, ranging from 1 to 7. Following Kruschke's convention would mean setting the endmost thresholds to
\begin{align*}
\theta_1 \equiv 1.5 && \text{and} && \theta_6 \equiv 6.5.
\end{align*}
As we'll see, there are other ways to parameterize these models.
### Implementation in ~~JAGS~~ **brms**.
The syntax to fit a basic ordered probit model with `brms::brm()` is pretty simple.
```{r, eval = F}
fit <-
brm(data = my_data,
family = cumulative(probit),
y ~ 1,
prior(normal(0, 4), class = Intercept))
```
The `family = cumulative(probit)` argument tells **brms** you'd like to use the probit link for the ordered-categorical data. It's important to specify `probit` because the **brms** default is the `logit` link. We'll talk more about that approach at the end of this chapter.
Remember how, at the end of the last section, we said there are other ways to parameterize the ordered probit model? As it turns out, **brms** does not follow Kruschke's approach of fixing the thresholds on the ends. Rather, **brms** freely estimates all thresholds, $\theta_1, \dots, \theta_{K - 1}$, by fixing $\mu = 0$ and $\sigma = 1$. That is, instead of estimating $\mu$ and $\sigma$ within the normal cumulative distribution function $\Phi(x; \mu, \sigma)$, `brms::brm()` uses the standard normal cumulative distribution function $\Phi(z)$.
This all probably seems abstract. We'll get a lot of practice comparing the two approaches as we go along. Each has its strengths and weaknesses. At this point, the thing to get is that when fitting a single-group ordered-probit model with the `brm()` function, there will be no priors for $\mu$ and $\sigma$. We only have to worry about setting the priors for all $K - 1$ thresholds. And because those thresholds are conditional on $\Phi(z)$, we should think about their priors with respect to the scale of standard normal distribution. Thus, to continue on with Kruschke's minimally-informative prior approach, something like `prior(normal(0, 4), class = Intercept)` might be a good starting place. Do feel free to experiment with different settings.
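If you'd like to see what **brms** would use in the absence of a user-defined prior, the `get_prior()` function lists the default priors for a given formula, data set, and family. A quick sketch, assuming the hypothetical `my_data` object from the pseudocode above:

```{r, eval = F}
library(brms)

# list the default priors brms would assign to this model;
# all K - 1 thresholds share the `Intercept` class, so a single
# prior() statement covers them
get_prior(y ~ 1,
          data = my_data,
          family = cumulative(probit))
```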
### Examples: Bayesian estimation recovers true parameter values.
The data for Kruschke's first example come from his `OrdinalProbitData-1grp-1.csv` file. Load the data.
```{r, warning = F, message = F}
my_data_1 <- read_csv("data.R/OrdinalProbitData-1grp-1.csv")
glimpse(my_data_1)
```
Plot the distribution for `Y`.
```{r, fig.width = 4, fig.height = 2.5}
my_data_1 %>%
mutate(Y = factor(Y)) %>%
ggplot(aes(x = Y)) +
geom_bar(fill = sl[5]) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
```
It looks a lot like the distribution of the data from one of the panels from Figure 23.1. Load **brms**.
```{r, warning = F, message = F}
library(brms)
```
Fit the first cumulative-probit model.
```{r fit23.1}
fit23.1 <-
brm(data = my_data_1,
family = cumulative(probit),
Y ~ 1,
prior(normal(0, 4), class = Intercept),
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 23,
file = "fits/fit23.01")
```
Examine the model summary.
```{r}
print(fit23.1)
```
The **brms** output for these kinds of models names the thresholds $\theta_{[i]}$ as `Intercept[i]`. Again, whereas Kruschke identified his model by fixing $\theta_1 = 1.5$ (i.e., $1 + 0.5$) and $\theta_6 = 6.5$ (i.e., $7 - 0.5$), we freely estimated all six thresholds by using the cumulative distribution function for the standard normal. As a result, our thresholds are in a different metric from Kruschke's.
Let's extract the posterior draws.
```{r}
draws <- as_draws_df(fit23.1)
glimpse(draws)
```
Wrangle the `draws` a bit.
```{r, warning = F}
draws <-
draws %>%
select(.draw, `b_Intercept[1]`:`b_Intercept[6]`)
```
Here's our **brms** version of the bottom plot of Figure 23.2.
```{r, fig.width = 5, fig.height = 2}
# compute the posterior means for each threshold
means <-
draws %>%
summarise_at(vars(`b_Intercept[1]`:`b_Intercept[6]`), mean) %>%
pivot_longer(everything(),
values_to = "mean")
# wrangle
draws %>%
pivot_longer(-.draw, values_to = "threshold") %>%
group_by(.draw) %>%
mutate(theta_bar = mean(threshold)) %>%
# finally we plot
ggplot(aes(x = threshold, y = theta_bar, color = name)) +
geom_vline(data = means,
aes(xintercept = mean, color = name),
linetype = 2) +
geom_point(alpha = 1/10) +
scale_color_scico_d(palette = "lajolla", begin = .25) +
ylab("mean threshold") +
theme(legend.position = "none")
```
The initial `means` data at the top contains the $\theta_i$-specific means, which we used to make the dashed vertical lines with `geom_vline()`. Did you see what we did there with those `group_by()` and `mutate()` lines? That's how we computed the mean threshold within each step of the HMC chain, what Kruschke (p. 680) denoted as $\bar \theta (s) = \sum_k^{K-1} \theta_k (s) / (K - 1)$, where $s$ refers to particular steps in the HMC chain.
Perhaps of greater interest, you might have noticed how different our plot is from the one in the text. We should compare the results of our **brms** parameterization of $\theta_{[i]}$ with one based on the parameterization in the text in an expanded version of the bottom plot of Figure 23.2. To convert our **brms** output to match Kruschke's, we'll rescale our $\theta_{[i]}$ draws with help from the `scales::rescale()` function, about which you might learn more [here](https://www.rdocumentation.org/packages/scales/versions/0.4.1/topics/rescale).
```{r, fig.width = 5, fig.height = 4, message = F}
# primary data wrangling
p <-
bind_rows(
# brms parameterization
draws %>%
pivot_longer(-.draw, values_to = "threshold") %>%
group_by(.draw) %>%
mutate(theta_bar = mean(threshold)),
# Kruschke's parameterization
draws %>%
pivot_longer(-.draw, values_to = "threshold") %>%
group_by(.draw) %>%
mutate(threshold = scales::rescale(threshold, to = c(1.5, 6.5))) %>%
mutate(theta_bar = mean(threshold))
) %>%
# add an index
mutate(model = rep(c("brms parameterization", "Kruschke's parameterization"), each = n() / 2))
# compute the means by model and threshold for the vertical lines
means <-
p %>%
ungroup() %>%
group_by(model, name) %>%
summarise(mean = mean(threshold))
# plot!
p %>%
ggplot(aes(x = threshold, y = theta_bar)) +
geom_vline(data = means,
aes(xintercept = mean, color = name),
linetype = 2) +
geom_point(aes(color = name),
alpha = 1/10, size = 1/2) +
scale_color_scico_d(palette = "lajolla", begin = .25) +
ylab("mean threshold") +
theme(legend.position = "none") +
facet_wrap(~ model, ncol = 1, scales = "free")
```
We can take our rescaling approach further to convert the posterior distributions for $\mu$ and $\sigma$ from the **brms** $\operatorname{Normal}(0, 1)$ constants to the metric from Kruschke's approach. Say $y_1$ and $y_2$ are two draws from some Gaussian and $z_1$ and $z_2$ are their corresponding $z$-scores. Here's how to solve for $\sigma$.
\begin{align*}
z_1 - z_2 & = \frac{(y_1 - \mu)}{\sigma} - \frac{(y_2 - \mu)}{\sigma} \\
& = \frac{(y_1 - \mu) - (y_2 - \mu)}{\sigma} \\
& = \frac{y_1 - \mu - y_2 + \mu}{\sigma} \\
& = \frac{y_1 - y_2}{\sigma}, \;\; \text{therefore} \\
\sigma & = \frac{y_1 - y_2}{z_1 - z_2}.
\end{align*}
If you'd like to compute $\mu$, it's even simpler.
\begin{align*}
z_1 & = \frac{y_1 - \mu}{\sigma} \\
z_1 \sigma & = y_1 - \mu \\
z_1 \sigma + \mu & = y_1, \;\; \text{therefore} \\
\mu & = y_1 - z_1 \sigma
\end{align*}
Big shout out to my math-stats savvy friends on academic twitter, especially [Ph.Demetri](https://twitter.com/PhDemetri), [Lukas Neugebauer](https://twitter.com/lukasneugebau), and [Brenton Wiernik](https://twitter.com/bmwiernik), for working out the formulas (see [this twitter thread](https://twitter.com/SolomonKurz/status/1219649489016958976)). For our application, `Intercept[1]` and `Intercept[6]` will be our two $z$-scores and Kruschke's 1.5 and 6.5 will be their corresponding $y$-values.
```{r, fig.width = 8, fig.height = 2.25, message = F, warning = F}
library(tidybayes)
draws %>%
select(.draw, `b_Intercept[1]`, `b_Intercept[6]`) %>%
mutate(`y[1]` = 1.5,
`y[6]` = 6.5) %>%
mutate(mu = `y[1]` - `b_Intercept[1]` * 1,
sigma = (`y[1]` - `y[6]`) / (`b_Intercept[1]` - `b_Intercept[6]`)) %>%
mutate(`(mu-2)/sigma` = (mu - 2) / sigma) %>%
pivot_longer(mu:`(mu-2)/sigma`) %>%
mutate(name = factor(name, levels = c("mu", "sigma", "(mu-2)/sigma"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(point_interval = mode_hdi, .width = .95,
fill = sl[4], color = sl[8],
normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
facet_wrap(~ name, scales = "free", labeller = label_parsed)
```
Our results are similar to Kruschke's. Given we used a different algorithm, a different parameterization, and different priors, I'm not terribly surprised they're a little different. If you have more insight on the matter or have spotted a flaw in this method, [please share with the rest of us](https://github.com/ASKurz/Doing-Bayesian-Data-Analysis-in-brms-and-the-tidyverse/issues).
It's unclear, to me, how we'd interpret the effect size. The difficulty isn't that Kruschke's comparison of $C = 2.0$ is arbitrary, but that we can only interpret the comparison given the model assumption of $\theta_1 = 1.5$ and $\theta_6 = 6.5$. If your theory doesn't allow you to understand the meaning of those constants and why you'd prefer them to slightly different ones, you'd be fooling yourself if you attempted to interpret any effect sizes conditional on those values. Proceed with caution, friends.
In the large paragraph on the lower part of page 679, Kruschke discussed why the thresholds tend to have nontrivial covariances. This is what he was trying to convey with the bottom subplot in Figure 23.2. Just for practice, we might also explore the correlation matrix among the thresholds with a customized `GGally::ggpairs()` plot.
```{r, fig.width = 6, fig.height = 6, warning = F, message = F}
# load
library(GGally)
# customize
my_upper <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping) +
geom_point(size = 1/5, alpha = 1/5, color = sl[7])
}
my_diag <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping) +
geom_density(linewidth = 0, fill = sl[3]) +
scale_x_continuous(NULL, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL)
}
my_lower <- function(data, mapping, ...) {
# get the x and y data to use the other code
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# compute the correlations
corr <- cor(x, y, method = "p", use = "pairwise")
# plot the cor value
ggally_text(
label = formatC(corr, digits = 2, format = "f") %>% str_replace(., "0\\.", "."),
mapping = aes(),
color = "black",
size = 4) +
scale_x_continuous(NULL, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL)
}
# wrangle
as_draws_df(fit23.1) %>%
select(contains("Intercept")) %>%
set_names(str_c("theta[", 1:6, "]")) %>%
# plot!
ggpairs(upper = list(continuous = my_upper),
diag = list(continuous = my_diag),
lower = list(continuous = my_lower),
labeller = label_parsed) +
ggtitle("The thresholds are highly correlated")
```
```{r, echo = F, eval = F}
# Valid distributional parameters are: 'mu', 'disc'
# fitted(fit23.1)
```
Kruschke didn't do this in the text, but it might be informative to plot the probability distributions for the seven categories from `Y`, $p(y = k | \mu = 0, \sigma = 1, \{ \theta_i \})$.
```{r, fig.width = 6, fig.height = 2.25, warning = F, message = F}
library(tidybayes)
draws %>%
select(-.draw) %>%
mutate_all(.funs = ~ pnorm(., mean = 0, sd = 1)) %>%
transmute(`p[Y==1]` = `b_Intercept[1]`,
`p[Y==2]` = `b_Intercept[2]` - `b_Intercept[1]`,
`p[Y==3]` = `b_Intercept[3]` - `b_Intercept[2]`,
`p[Y==4]` = `b_Intercept[4]` - `b_Intercept[3]`,
`p[Y==5]` = `b_Intercept[5]` - `b_Intercept[4]`,
`p[Y==6]` = `b_Intercept[6]` - `b_Intercept[5]`,
`p[Y==7]` = 1 - `b_Intercept[6]`) %>%
set_names(1:7) %>%
pivot_longer(everything(), names_to = "Y") %>%
ggplot(aes(x = value, y = Y)) +
stat_halfeye(point_interval = mode_hdi, .width = .95,
fill = sl[4], color = sl[8], size = 1/2, height = 2.5) +
scale_x_continuous(expression(italic(p)*"["*Y==italic(i)*"]"),
breaks = 0:5 / 5,
expand = c(0, 0), limits = c(0, 1))
```
Happily, the model produces data that look a lot like those from which it was generated.
```{r, fig.width = 4, fig.height = 2.5}
set.seed(23)
draws %>%
mutate(z = rnorm(n(), mean = 0, sd = 1)) %>%
mutate(Y = case_when(
z < `b_Intercept[1]` ~ 1,
z < `b_Intercept[2]` ~ 2,
z < `b_Intercept[3]` ~ 3,
z < `b_Intercept[4]` ~ 4,
z < `b_Intercept[5]` ~ 5,
z < `b_Intercept[6]` ~ 6,
z >= `b_Intercept[6]` ~ 7
) %>% as.factor(.)) %>%
ggplot(aes(x = Y)) +
geom_bar(fill = sl[5]) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
```
Along similar lines, we can use the `pp_check()` function to make a version of the upper-right panel of Figure 23.2. The `type = "bars"` argument summarizes the posterior predictions as a dot (the mean) with standard-error bars, superimposed on a bar plot of the original data. Note how this differs a little from Kruschke's use of the posterior median and 95% HDIs. The `ndraws = 1000` argument controls how many posterior draws we summarize over. The rest is just formatting.
```{r, fig.width = 4, fig.height = 2.5, message = F, warning = F}
bayesplot::color_scheme_set(sl[2:7])
set.seed(23)
pp_check(fit23.1, type = "bars", ndraws = 1000, fatten = 2) +
scale_x_continuous("y", breaks = 1:7) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
ggtitle("Data with posterior predictions",
subtitle = "N = 100") +
theme(legend.background = element_blank(),
legend.position = c(.9, .8))
```
Load the data for the next model.
```{r, warning = F, message = F}
my_data_2 <- read_csv("data.R/OrdinalProbitData-1grp-2.csv")
```
Since we're reusing all the specifications from the last model for this one, we can just use `update()`.
```{r fit23.2}
fit23.2 <-
update(fit23.1,
newdata = my_data_2,
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 23,
file = "fits/fit23.02")
```
```{r}
print(fit23.2)
```
Extract and wrangle the posterior draws.
```{r, warning = F}
draws <-
as_draws_df(fit23.2) %>%
select(.draw, `b_Intercept[1]`:`b_Intercept[6]`)
```
Now we might compare the **brms** parameterization of $\theta_{[i]}$ with Kruschke's parameterization in an expanded version of the bottom plot of Figure 23.3. As we'll be making a lot of these plots throughout this chapter, it might be worthwhile to just make a custom function. We'll call it `compare_thresholds()`.
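Inside the function, `scales::rescale(threshold, to = c(lb, ub))` maps the smallest threshold in each draw to `lb` and the largest to `ub`, interpolating the middle thresholds linearly. Here's a quick base-R check of that behavior, using made-up threshold values.

```{r}
# made-up brms-style thresholds from a single posterior draw
theta <- c(-1.6, -0.9, -0.1, 0.5, 1.2, 2.0)

# linear rescaling: min -> to[1], max -> to[2]
rescale <- function(x, to) {
  (x - min(x)) / (max(x) - min(x)) * (to[2] - to[1]) + to[1]
}

rescale(theta, to = c(1.5, 6.5))
```

The first and last values come out at exactly 1.5 and 6.5, which is how Kruschke pins the first and last thresholds in his parameterization.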
```{r}
compare_thresholds <- function(data, lb = 1.5, ub = 6.5) {
# we have two parameters:
# lb = lower bound
# ub = upper bound
# primary data wrangling
p <-
bind_rows(
data %>%
pivot_longer(-.draw, values_to = "threshold") %>%
group_by(.draw) %>%
mutate(theta_bar = mean(threshold)),
data %>%
pivot_longer(-.draw, values_to = "threshold") %>%
group_by(.draw) %>%
mutate(threshold = scales::rescale(threshold, to = c(lb, ub))) %>%
mutate(theta_bar = mean(threshold))
) %>%
mutate(model = rep(c("brms parameterization", "Kruschke's parameterization"), each = n() / 2))
# compute the means by model and threshold for the vertical lines
means <-
p %>%
ungroup() %>%
group_by(model, name) %>%
summarise(mean = mean(threshold))
# plot!
p %>%
ggplot(aes(x = threshold, y = theta_bar)) +
geom_vline(data = means,
aes(xintercept = mean, color = name),
linetype = 2) +
geom_point(aes(color = name),
alpha = 1/10, size = 1/2) +
scale_color_scico_d(palette = "lajolla", begin = .25) +
ylab("mean threshold") +
theme(legend.position = "none") +
facet_wrap(~ model, ncol = 1, scales = "free")
}
```
Take that puppy for a spin.
```{r, fig.width = 5, fig.height = 4, message = F}
draws %>%
compare_thresholds(lb = 1.5, ub = 6.5)
```
Oh man, that works sweet. Now let's use the same parameter-transformation approach from before to get our un-standardized posteriors for $\mu$, $\sigma$, and the effect size.
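To make the algebra concrete, here's the transformation on a single made-up draw. Pinning the first and sixth thresholds to $1.5$ and $6.5$ on the data scale implies $\sigma = (y_1 - y_6) / (\theta_1 - \theta_6)$ and $\mu = y_1 - \theta_1 \sigma$. The threshold values below are hypothetical, not posterior draws.

```{r}
# hypothetical draw of the first and sixth latent thresholds
theta_1 <- -1.6
theta_6 <-  1.9

# Kruschke's pinned values on the data scale
y_1 <- 1.5
y_6 <- 6.5

sigma <- (y_1 - y_6) / (theta_1 - theta_6)
mu    <- y_1 - theta_1 * sigma

# sanity check: mu + sigma * theta recovers the pinned data-scale values
c(mu + sigma * theta_1, mu + sigma * theta_6)
```

The check works because the **brms** thresholds live on the standardized scale, $\theta = (y - \mu) / \sigma$, so un-standardizing is just the inverse linear map.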
```{r, fig.width = 8, fig.height = 2.25}
draws %>%
select(.draw, `b_Intercept[1]`, `b_Intercept[6]`) %>%
mutate(`y[1]` = 1.5,
`y[6]` = 6.5) %>%
mutate(sigma = (`y[1]` - `y[6]`) / (`b_Intercept[1]` - `b_Intercept[6]`)) %>%
mutate(mu = `y[1]` - `b_Intercept[1]` * sigma,
`(mu-4)/sigma` = (mu - 4) / sigma) %>%
pivot_longer(c(mu, sigma, `(mu-4)/sigma`)) %>%
mutate(name = factor(name, levels = c("mu", "sigma", "(mu-4)/sigma"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(point_interval = mode_hdi, .width = .95,
fill = sl[4], color = sl[8],
normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
facet_wrap(~ name, scales = "free", labeller = label_parsed)
```
Use `pp_check()` to make our version of the upper-right panel of Figure 23.3.
```{r, fig.width = 4, fig.height = 2.5, message = F, warning = F}
set.seed(23)
pp_check(fit23.2, type = "bars", ndraws = 1000, fatten = 2) +
scale_x_continuous("y", breaks = 1:7) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
ggtitle("Data with posterior predictions",
subtitle = "N = 70") +
theme(legend.background = element_blank(),
legend.position = c(.9, .8))
```
Just as in the text, "the posterior predictive distribution in the top-right subpanel accurately describes the bimodal distribution of the outcomes" (p. 680).
Here are the probability distributions for each of the 7 categories of `Y`.
```{r, fig.width = 6, fig.height = 2.25}
draws %>%
select(-.draw) %>%
mutate_all(.funs = ~ pnorm(., 0, 1)) %>%
transmute(`p[Y==1]` = `b_Intercept[1]`,
`p[Y==2]` = `b_Intercept[2]` - `b_Intercept[1]`,
`p[Y==3]` = `b_Intercept[3]` - `b_Intercept[2]`,
`p[Y==4]` = `b_Intercept[4]` - `b_Intercept[3]`,
`p[Y==5]` = `b_Intercept[5]` - `b_Intercept[4]`,
`p[Y==6]` = `b_Intercept[6]` - `b_Intercept[5]`,
`p[Y==7]` = 1 - `b_Intercept[6]`) %>%
set_names(1:7) %>%
pivot_longer(everything(), names_to = "Y") %>%
ggplot(aes(x = value, y = Y)) +
stat_halfeye(point_interval = mode_hdi, .width = .95,
fill = sl[4], color = sl[8], size = 1/2) +
scale_x_continuous(expression(italic(p)*"["*Y==italic(i)*"]"),
breaks = 0:5 / 5,
expand = c(0, 0), limits = c(0, 1))
```
Before we move on, it might be helpful to nail down what the thresholds mean within the context of our **brms** parameterization. To keep things simple, we'll focus on their posterior means.
```{r, fig.width = 5, fig.height = 3}
tibble(x = seq(from = -3.5, to = 3.5, by = .01)) %>%
mutate(d = dnorm(x)) %>%
ggplot(aes(x = x, ymin = 0, ymax = d)) +
geom_ribbon(fill = sl[3]) +
geom_vline(xintercept = fixef(fit23.2)[, 1], color = "white", linetype = 3) +
scale_x_continuous(NULL, breaks = fixef(fit23.2)[, 1],
labels = parse(text = str_c("theta[", 1:6, "]"))) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
ggtitle("Standard normal distribution underlying the ordinal Y data:",
subtitle = "The dashed vertical lines mark the posterior means for the thresholds.") +
coord_cartesian(xlim = c(-3, 3))
```
Compare that to Figure 23.1.
#### Not the same results as pretending the data are metric.
"In some conventional approaches to ordinal data, the data are treated as if they were metric and normally distributed" (p. 681). Here's what that `brms::brm()` model might look like using methods from back in Chapter 16. First, we'll define our `stanvars`.
```{r}
mean_y <- mean(my_data_1$Y)
sd_y <- sd(my_data_1$Y)
stanvars <-
stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y")
```
Fit the model.
```{r fit23.3}
fit23.3 <-
brm(data = my_data_1,
family = gaussian,
Y ~ 1,
prior = c(prior(normal(mean_y, sd_y * 100), class = Intercept),
prior(normal(0, sd_y), class = sigma)),
chains = 4, cores = 4,
stanvars = stanvars,
seed = 23,
file = "fits/fit23.03")
```
Check the results.
```{r}
print(fit23.3)
```
As Kruschke indicated in the text, it yielded a distributional mean of about 1.95 and a standard deviation of about 1.41. Here we'll use a posterior predictive check to compare histograms of data generated from this model to that of the original data.
```{r, fig.width = 6, fig.height = 3.5}
pp_check(fit23.3, type = "hist", ndraws = 10, binwidth = 1) +
scale_x_continuous(breaks = seq(from = -3, to = 7, by = 2)) +
theme(legend.position = c(.9, .15))
```
Yeah, that's not a good fit. We won't be conducting a $t$-test like Kruschke did on page 681. But we might compromise and take a look at the marginal distribution of the intercept (i.e., for $\mu$) and its difference from 2, the reference value.
```{r, fig.width = 6, fig.height = 2.25, warning = F}
as_draws_df(fit23.3) %>%
mutate(`2 - b_Intercept` = 2 - b_Intercept,
`effect size` = (2 - b_Intercept) / sigma) %>%
pivot_longer(`2 - b_Intercept`:`effect size`) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(point_interval = mode_hdi, .width = .95,
fill = sl[4], color = sl[8], normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
facet_wrap(~ name, scales = "free")
```
Yes indeed, 2 is a credible value for the intercept. And as reported in the text, we got a very small $d$ effect size. Now we repeat the process for the second data set.
```{r fit23.4}
mean_y <- mean(my_data_2$Y)
sd_y <- sd(my_data_2$Y)
stanvars <-
stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y")
fit23.4 <-
update(fit23.3,
newdata = my_data_2,
chains = 4, cores = 4,
stanvars = stanvars,
seed = 23,
file = "fits/fit23.04")
```
Let's just jump to the plot. This time we're comparing the `b_Intercept` to the value of 4.0.
```{r, fig.width = 6, fig.height = 2.25, warning = F}
as_draws_df(fit23.4) %>%
mutate(`4 - b_Intercept` = 4 - b_Intercept,
`effect size` = (4 - b_Intercept) / sigma) %>%
pivot_longer(`4 - b_Intercept`:`effect size`) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(point_interval = mode_hdi, .width = .95,
fill = sl[4], color = sl[8], normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
facet_wrap(~ name, scales = "free")
```
As in the text, our $d$ is centered around 0.3. Let's use a posterior predictive check to see how well `fit23.4` summarized these data.
```{r, fig.width = 6, fig.height = 3.5}
pp_check(fit23.4, type = "hist", ndraws = 10, binwidth = 1) +
scale_x_continuous(breaks = seq(from = -3, to = 7, by = 2)) +
theme(legend.position = c(.9, .15))
```
The histograms aren't as awful as the ones from the previous model. But they're still not great. We might further inspect the model misspecification with an empirical cumulative distribution function (ECDF) overlay, this time comparing `fit23.2` directly to `fit23.4`.
```{r, fig.width = 6.25, fig.height = 2.75, warning = F, message = F}
p1 <-
pp_check(fit23.2, type = "ecdf_overlay", ndraws = 50) +
ggtitle("Cumulative-normal (fit23.2)")
p2 <-
pp_check(fit23.4, type = "ecdf_overlay", ndraws = 50) +
ggtitle("Conventional-normal (fit23.4)")
(p1 + p2 &
scale_x_continuous(breaks = 0:7, limits = c(0, 7)) &
scale_y_continuous(expand = c(0, 0)) &
theme(title = element_text(size = 10.5))) +
plot_layout(guides = 'collect')
```
"Which of the analyses yields the more trustworthy conclusion? The one that describes the data better. In these cases, there is no doubt that the cumulative-normal model is the better description of the data" than the conventional Gaussian model (p. 682).
#### Ordinal outcomes versus Likert scales.
Just for fun,
> rate how much you agree with the statement, "Bayesian estimation is more informative than null-hypothesis significance testing," by selecting one option from the following: $1$ = strongly disagree; $2$ = disagree; $3$ = undecided; $4$ = agree; $5$ = strongly agree. This sort of ordinal response interface is often called a Likert-type response [@likertTechniqueMeasurementAttitudes1932] (pronounced LICK-ert, not LIKE-ert). Sometimes, it is called a Likert "scale," but the term "scale" in this context is more properly reserved for referring to an underlying metric variable that is indicated by the arithmetic mean of several meaningfully related Likert-type responses [e.g., @carifio2007ten; @carifioResolving50yearDebate2008; @normanLikertScalesLevels2010]. (p. 681)
Kruschke then briefly introduced how one might combine several such meaningfully-related Likert-type responses with latent variable methods. He then clarified that this text will not explore that approach further. The current version of **brms** (i.e., [2.12.0](https://CRAN.R-project.org/package=brms/news/news.html)) has very limited latent variable capacities. However, they are in the works. Interested modelers can follow Bürkner's progress in [GitHub issue #304](https://github.com/paul-buerkner/brms/issues/304). He also has a [-@burknerBayesianItemResponse2020] [paper](https://arxiv.org/pdf/1905.09501.pdf) on how one might use **brms** to fit item response theory models, which can be viewed as a special family of latent variable models. One can also fit Bayesian latent variable models with the [**blavaan** package](https://faculty.missouri.edu/~merklee/blavaan/).
## The case of two groups
> In both examples in the preceding text, the two groups of outcomes were on the same ordinal scale. In the first example, both questionnaire statements were answered on the same disagree–agree scale. In the second example, both groups responded on the same very unhappy–very happy scale. Therefore, we assume that both groups have the same underlying metric variable with the same thresholds. (p. 682)
### Implementation in ~~JAGS~~ **brms**.
The `brm()` syntax for adding a single categorical predictor to an ordered-probit model is much like that for any other likelihood. We just add the variable name to the right side of the `~` in the `formula` argument. If you're like me and like to use the verbose `1` syntax for your model intercepts--thresholds in these models--just use the `+` operator between them. For example, this is what it'd look like for an ordered-categorical criterion `y` and a single categorical predictor `x`.
```{r, eval = F}
fit <-
brm(data = my_data,
family = cumulative(probit),
y ~ 1 + x,
prior = c(prior(normal(0, 4), class = Intercept),
prior(normal(0, 4), class = b)))
```
Also of note, we've expanded the `prior` section to include a line for `class = b`. As with the thresholds, interpret this prior in the context of the underlying standard normal cumulative distribution, $\Phi(z)$. The interpretation deserves emphasis: by **brms** defaults, the underlying Gaussian for the reference category of `x` will be $\operatorname{Normal}(0, 1)$. Thus, whatever parameter values you get for the other categories in `x` will be standardized mean differences, making them a kind of effect size.
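To see why such a coefficient works as a standardized mean difference, shift the latent mean for the non-reference group by $b$ and watch the category probabilities move: $p(y \leq k \mid x) = \Phi(\theta_k - b x)$. The threshold and coefficient values below are hypothetical, chosen only to illustrate the mechanics.

```{r}
# hypothetical thresholds and a hypothetical group coefficient
theta <- c(-1.5, -0.5, 0.5, 1.5)
b <- 0.5  # a half-standard-deviation shift in the latent mean

# category probabilities for the reference group (latent Normal(0, 1))
# and the shifted group (latent Normal(b, 1))
p_ref   <- diff(c(0, pnorm(theta), 1))
p_shift <- diff(c(0, pnorm(theta - b), 1))

rbind(p_ref, p_shift)
```

Because $b > 0$ lowers every cumulative probability $\Phi(\theta_k - b)$, the shifted group puts more mass in the higher categories, exactly what a positive standardized mean difference should do.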