-
Notifications
You must be signed in to change notification settings - Fork 0
/
19.Rmd
2743 lines (2199 loc) · 120 KB
/
19.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
```{r, echo = FALSE, cache = FALSE}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 100)
```
# Metric Predicted Variable with One Nominal Predictor
> This chapter considers data structures that consist of a metric predicted variable and a nominal predictor.... This type of data structure can arise from experiments or from observational studies. In experiments, the researcher assigns the categories (at random) to the experimental subjects. In observational studies, both the nominal predictor value and the metric predicted value are generated by processes outside the direct control of the researcher. In either case, the same mathematical description can be applied to the data (although causality is best inferred from experimental intervention).
>
> The traditional treatment of this sort of data structure is called single-factor analysis of variance (ANOVA), or sometimes one-way ANOVA. Our Bayesian approach will be a hierarchical generalization of the traditional ANOVA model. The chapter will also consider the situation in which there is also a metric predictor that accompanies the primary nominal predictor. The metric predictor is sometimes called a covariate, and the traditional treatment of this data structure is called analysis of covariance (ANCOVA). The chapter also considers generalizations of the traditional models, because it is straight forward in Bayesian software to implement heavy-tailed distributions to accommodate outliers, along with hierarchical structure to accommodate heterogeneous variances in the different groups, etc. [@kruschkeDoingBayesianData2015, pp. 553--554]
## Describing multiple groups of metric data
> Figure 19.1 illustrates the conventional description of grouped metric data. Each group is represented as a position on the horizontal axis. The vertical axis represents the variable to be predicted by group membership. The data are assumed to be normally distributed within groups, with equal standard deviation in all groups. The group means are deflections from overall baseline, such that the deflections sum to zero. Figure 19.1 provides a specific numerical example, with data that were randomly generated from the model. (p. 554)
We'll want a custom data-generating function for our primary group data.
```{r, warning = F, message = F}
library(tidyverse)
generate_data <- function(seed, mean) {
set.seed(seed)
rnorm(n, mean = grand_mean + mean, sd = 2)
}
n <- 20
grand_mean <- 101
d <-
tibble(group = 1:5,
deviation = c(4, -5, -2, 6, -3)) %>%
mutate(d = map2(group, deviation, generate_data)) %>%
unnest(d) %>%
mutate(iteration = rep(1:n, times = 5))
glimpse(d)
```
Here we'll make a tibble containing the necessary data for the rotated Gaussians. As far as I can tell, Kruschke's Gaussians only span to the bounds of percentile-based 98% intervals. We partition off those bounds for each `group` by the `ll` and `ul` columns in the first `mutate()` function. In the second `mutate()`, we expand the dataset to include a sequence of 100 values between those lower- and upper-limit points. In the third `mutate()`, we feed those points into the `dnorm()` function, with group-specific means and a common `sd`.
```{r}
densities <-
d %>%
distinct(group, deviation) %>%
mutate(ll = qnorm(.01, mean = grand_mean + deviation, sd = 2),
ul = qnorm(.99, mean = grand_mean + deviation, sd = 2)) %>%
mutate(d = map2(ll, ul, seq, length.out = 100)) %>%
mutate(density = map2(d, grand_mean + deviation, dnorm, sd = 2)) %>%
unnest(c(d, density))
head(densities)
```
We'll need two more supplementary tibbles to add the flourishes to the plot. The `arrow` tibble will specify our light-gray arrows. The `text` tibble will contain our annotation information.
```{r}
arrow <-
tibble(d = grand_mean,
group = 1:5,
deviation = c(4, -5, -2, 6, -3),
offset = .1)
head(arrow)
text <-
tibble(d = grand_mean,
group = c(0:5, 0),
deviation = c(0, 4, -5, -2, 6, -3, 10),
offset = rep(c(1/4, 0), times = c(6, 1)),
angle = rep(c(90, 0), times = c(6, 1)),
label = c("beta[0]==101", "beta['[1]']==4","beta['[2]']==-5", "beta['[3]']==-2", "beta['[4]']==6", "beta['[5]']==3", "sigma['all']==2"))
head(text)
```
We're almost ready to plot. Before we do, let's talk color and theme. For this chapter, we'll take our color palette from the [**palettetown** package](https://CRAN.R-project.org/package=palettetown) [@R-palettetown], which provides an array of color palettes inspired by [Pokémon](https://www.pokemon.com/us/). Our color palette will be #17, which is based on [Pidgeotto](https://www.pokemon.com/us/pokedex/pidgeotto).
```{r, warning = F, message = F, fig.height = 4}
library(palettetown)
scales::show_col(pokepal(pokemon = 17))
pp <- pokepal(pokemon = 17)
pp
```
Our overall plot theme will be based on the default `theme_grey()` with a good number of adjustments.
```{r}
theme_set(
theme_grey() +
theme(text = element_text(color = pp[4]),
axis.text = element_text(color = pp[4]),
axis.ticks = element_line(color = pp[4]),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.key = element_rect(fill = pp[9]),
panel.background = element_rect(fill = pp[9], color = pp[9]),
panel.grid = element_blank(),
plot.background = element_rect(fill = pp[12], color = pp[12]),
strip.background = element_rect(fill = alpha(pp[2], 1/3), color = "transparent"),
strip.text = element_text(color = pp[4]))
)
```
Now make Figure 19.1.
```{r, fig.width = 6, fig.height = 2.75, warning = F, message = F}
library(ggridges)
d %>%
ggplot(aes(x = d, y = group, group = group)) +
geom_vline(xintercept = grand_mean, color = pp[12]) +
geom_jitter(height = .05, alpha = 4/4, shape = 1, color = pp[10]) +
# the Gaussians
geom_ridgeline(data = densities,
aes(height = -density),
min_height = NA, scale = 3/2, size = 3/4,
fill = "transparent", color = pp[7]) +
# the small arrows
geom_segment(data = arrow,
aes(xend = d + deviation,
y = group + offset, yend = group + offset),
color = pp[5], size = 1,
arrow = arrow(length = unit(.2, "cm"))) +
# the large arrow on the left
geom_segment(aes(x = 80, xend = grand_mean,
y = 0, yend = 0),
color = pp[5], size = 3/4,
arrow = arrow(length = unit(.2, "cm"))) +
# the text
geom_text(data = text,
aes(x = grand_mean + deviation, y = group + offset,
label = label, angle = angle),
size = 4, color = pp[4], parse = T) +
scale_y_continuous(NULL, breaks = 1:5,
labels = c("<1,0,0,0,0>", "<0,1,0,0,0>", "<0,0,1,0,0>", "<0,0,0,1,0>", "<0,0,0,0,1>")) +
xlab(NULL) +
coord_flip(xlim = c(90, 112),
ylim = c(-0.2, 5.5))
```
> The descriptive model presented in Figure 19.1 is the traditional one used by classical ANOVA (which is described a bit more in the next section). More general models are straight forward to implement in Bayesian software. For example, outliers could be accommodated by using heavy-tailed noise distributions (such as a $t$ distribution) instead of a normal distribution, and different groups could be given different standard deviations. (p. 556)
## Traditional analysis of variance
> The terminology, "analysis of variance," comes from a decomposition of overall data variance into within-group variance and between-group variance [@fisherStatisticalMethodsResearch1925]. Algebraically, the sum of squared deviations of the scores from their overall mean equals the sum of squared deviations of the scores from their respective group means plus the sum of squared deviations of the group means from the overall mean. In other words, the total variance can be partitioned into within-group variance plus between-group variance. Because one definition of the word "analysis" is separation into constituent parts, the term ANOVA accurately describes the underlying algebra in the traditional methods. That algebraic relation is not used in the hierarchical Bayesian approach presented here. The Bayesian method can estimate component variances, however. Therefore, the Bayesian approach is not ANOVA, but is analogous to ANOVA. (p. 556)
## Hierarchical Bayesian approach
"Our goal is to estimate its parameters in a Bayesian framework. Therefore, all the parameters need to be given a meaningfully structured prior distribution" (p. 557). However, our approach will depart a little from the one in the text. All our parameters will **not** "have generic noncommittal prior distributions" (p. 557). Most importantly, we will not follow the example in [@gelmanPriorDistributionsVariance2006] of putting a broad uniform prior on $\sigma_y$. Rather, we will continue using the half-Gaussian prior, as [recommended by the Stan team](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations). However, we will follow Kruschke's lead for the overall intercept and use a Gaussian prior "made broad on the scale of the data" (p. 557). And like Kruschke, we will estimate $\sigma_\beta$ from the data. To further get a sense of this, let's make our version of the hierarchical model diagram of Figure 19.2.
```{r, fig.width = 6.5, fig.height = 5.75, message = F}
library(patchwork)
# bracket
p1 <-
tibble(x = .99,
y = .5,
label = "{_}") %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = 10, hjust = 1, color = pp[8], family = "Times") +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
## plain arrow
# save our custom arrow settings
my_arrow <- arrow(angle = 20, length = unit(0.35, "cm"), type = "closed")
p2 <-
tibble(x = .72,
y = 1,
xend = .68,
yend = .25) %>%
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# normal density
p3 <-
tibble(x = seq(from = -3, to = 3, by = .1)) %>%
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = .2,
label = "normal",
size = 7, color = pp[4]) +
annotate(geom = "text",
x = c(0, 1.45), y = .6,
hjust = c(.5, 0),
label = c("italic(M)[0]", "italic(S)[0]"),
size = 7, color = pp[4], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(size = 0.5, color = pp[4]))
# second normal density
p4 <-
tibble(x = seq(from = -3, to = 3, by = .1)) %>%
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = .2,
label = "normal",
size = 7, color = pp[4]) +
annotate(geom = "text",
x = c(0, 1.45), y = .6,
hjust = c(.5, 0),
label = c("0", "sigma[beta]"),
size = 7, color = pp[4], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(size = 0.5, color = pp[4]))
# two annotated arrows
p5 <-
tibble(x = c(.16, .81),
y = c(1, 1),
xend = c(.47, .77),
yend = c(0, 0)) %>%
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
annotate(geom = "text",
x = c(.25, .74, .83), y = .5,
label = c("'~'", "'~'", "italic(j)"),
size = c(10, 10, 7),
color = pp[4], family = "Times", parse = T) +
xlim(0, 1) +
theme_void()
# likelihood formula
p6 <-
tibble(x = .99,
y = .25,
label = "beta[0]+sum()[italic(j)]*beta['['*italic(j)*']']*italic(x)['['*italic(j)*']'](italic(i))") %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(hjust = 1, size = 7, color = pp[4], parse = T, family = "Times") +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
# half-normal density
p7 <-
tibble(x = seq(from = 0, to = 3, by = .01)) %>%
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 1.5, y = .2,
label = "half-normal",
size = 7, color = pp[4]) +
annotate(geom = "text",
x = 1.5, y = .6,
label = "0*','*~italic(S)[sigma]",
size = 7, color = pp[4], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(size = 0.5, color = pp[4]))
# annotated arrow
p8 <-
tibble(x = .38,
y = .65,
label = "'='") %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = 10, color = pp[4], parse = T, family = "Times") +
geom_segment(x = .5, xend = .5,
y = 1, yend = .25,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# the third normal density
p9 <-
tibble(x = seq(from = -3, to = 3, by = .1)) %>%
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = .2,
label = "normal",
size = 7, color = pp[4]) +
annotate(geom = "text",
x = c(0, 1.45), y = .6,
hjust = c(.5, 0),
label = c("mu[italic(i)]", "sigma[italic(y)]"),
size = 7, color = pp[4], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(size = 0.5, color = pp[4]))
# another annotated arrow
p10 <-
tibble(x = .55,
y = .6,
label = "'~'") %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = 10, color = pp[4], parse = T, family = "Times") +
geom_segment(x = .82, xend = .38,
y = 1, yend = .2,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# the final annotated arrow
p11 <-
tibble(x = c(.375, .625),
y = c(1/3, 1/3),
label = c("'~'", "italic(i)")) %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = c(10, 7),
color = pp[4], parse = T, family = "Times") +
geom_segment(x = .5, xend = .5,
y = 1, yend = 0,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# some text
p12 <-
tibble(x = .5,
y = .5,
label = "italic(y[i])") %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = 7, color = pp[4], parse = T, family = "Times") +
xlim(0, 1) +
theme_void()
# define the layout
layout <- c(
area(t = 1, b = 1, l = 6, r = 7),
area(t = 2, b = 3, l = 6, r = 7),
area(t = 3, b = 4, l = 1, r = 3),
area(t = 3, b = 4, l = 5, r = 7),
area(t = 6, b = 7, l = 1, r = 7),
area(t = 5, b = 6, l = 1, r = 7),
area(t = 6, b = 7, l = 9, r = 11),
area(t = 9, b = 10, l = 5, r = 7),
area(t = 8, b = 9, l = 5, r = 7),
area(t = 8, b = 9, l = 5, r = 11),
area(t = 11, b = 11, l = 5, r = 7),
area(t = 12, b = 12, l = 5, r = 7)
)
# combine and plot!
(p1 + p2 + p3 + p4 + p6 + p5 + p7 + p9 + p8 + p10 + p11 + p12) +
plot_layout(design = layout) &
ylim(0, 1) &
theme(plot.margin = margin(0, 5.5, 0, 5.5))
```
Later on in the text, Kruschke opined:
> A crucial pre-requisite for estimating $\sigma_\beta$ from all the groups is an assumption that all the groups are representative and informative for the estimate. It only makes sense to influence the estimate of one group with data from the other groups if the groups can be meaningfully described as representative of a shared higher-level distribution. (p. 559)
Although I agree with him in spirit, this doesn't appear to strictly be the case. As odd and paradoxical as this sounds, partial pooling can be of use even when the some of the cases are of a different kind. For more on the topic, see Efron and Morris's classic [-@efronSteinParadoxStatistics1977] paper, [*Stein's paradox in statistics*](http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf), and [my blog post](https://solomonkurz.netlify.com/post/stein-s-paradox-and-what-partial-pooling-can-do-for-you/) walking out one of their examples in **brms**.
### Implementation in ~~JAGS~~ brms.
The **brms** setup, of course, differs a bit from JAGS.
```{r, eval = F}
fit <-
brm(data = my_data,
family = gaussian,
y ~ 1 + (1 | categirical_variable),
prior = c(prior(normal(0, x), class = Intercept),
prior(normal(0, x), class = b),
prior(cauchy(0, x), class = sd),
prior(cauchy(0, x), class = sigma)))
```
The noise standard deviation $\sigma_y$ is depicted in the prior statement including the argument `class = sigma`. The grand mean is depicted by the first `1` in the model formula and its prior is indicated by the `class = Intercept` argument. We indicate we'd like group-based deviations from the grand mean with the `(1 | categirical_variable)` syntax, where the `1` on the left side of the bar indicates we'd like our intercepts to vary by group and the `categirical_variable` part simply represents the name of a given categorical variable we'd like those intercepts to vary by. The **brms** default is to do this with deviance scores, the mean for which will be zero. Although it's not obvious in the formula syntax, the model presumes the group-based deviations are normally distributed with a mean of zero and a standard deviation, which Kruschke termed $\sigma_\beta$. There is no prior for the mean. It's set at zero. But there is a prior for $\sigma_\beta$, which is denoted by the argument `class = sd`. We, of course, are not using a uniform prior on any of our variance parameters. But in order to be weakly informative, we will use the half-Cauchy. Recall that since the **brms** default is to set the lower bound for any variance parameter to 0, there's no need to worry about doing so ourselves. So even though the syntax only indicates `cauchy`, it's understood to mean Cauchy with a lower bound at zero; since the mean is usually 0, that makes is a half-Cauchy.
Kruschke set the upper bound for his $\sigma_y$ to 10 times the standard deviation of the criterion variable. The tails of the half-Cauchy are sufficiently fat that, in practice, I've found it doesn't matter much what you set the $SD$ of its prior to. One is often a sensible default for reasonably-scaled data. But if we want to take a more principled approach, we can set it to the size of the criterion's $SD$ or perhaps even 10 times that.
Kruschke suggested using a gamma on $\sigma_\beta$, which is a sensible alternative to half-Cauchy often used within the Stan universe. Especially in situations in which you would like to (a) keep the variance parameter above zero, but (b) still allow it to be arbitrarily close to zero, and also (c) let the likelihood dominate the posterior, the Stan team recommends the gamma(2, 0) prior, based on the paper by Chung and colleagues [-@chungNondegeneratePenalizedLikelihood2013, click [here](http://www.stat.columbia.edu/~gelman/research/published/chung_etal_Pmetrika2013.pdf)]. But you should note that I don't mean a literal 0 for the second parameter in the gamma distribution, but rather some small value like 0.1 or so. This is all clarified in @chungNondegeneratePenalizedLikelihood2013. Here's what $\operatorname{gamma}(2, 0.1)$ looks like.
```{r, fig.width = 6, fig.height = 2}
tibble(x = seq(from = 0, to = 110, by = .1)) %>%
ggplot(aes(x = x, y = dgamma(x, 2, 0.1))) +
geom_area(fill = pp[10]) +
annotate(geom = "text",
x = 14.25, y = 0.015, label = "'gamma'*(2*', '*0.1)",
parse = T, color = pp[1], size = 4.25) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 110)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05)))
```
If you'd like that prior be even less informative, just reduce it to like $\operatorname{gamma}(2, 0.01)$ or so. Kruschke goes further to recommend "the shape and rate parameters of the gamma distribution are set so its mode is `sd(y)/2` and its standard deviation is `2*sd(y)`, using the function `gammaShRaFromModeSD` explained in [Section 9.2.2][A realistic model with MCMC.]." (pp. 560--561). Let's make that function.
```{r}
gamma_a_b_from_omega_sigma <- function(mode, sd) {
if (mode <= 0) stop("mode must be > 0")
if (sd <= 0) stop("sd must be > 0")
rate <- (mode + sqrt(mode^2 + 4 * sd^2)) / (2 * sd^2)
shape <- 1 + mode * rate
return(list(shape = shape, rate = rate))
}
```
So in the case of standardized data where `sd(1)` = 1, we'd use our `gamma_a_b_from_omega_sigma()` function like so.
```{r}
sd_y <- 1
omega <- sd_y / 2
sigma <- 2 * sd_y
(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
```
And that produces the following gamma distribution.
```{r, fig.width = 6, fig.height = 2}
tibble(x = seq(from = 0, to = 21, by = .01)) %>%
ggplot(aes(x = x, y = dgamma(x, s_r$shape, s_r$rate))) +
geom_area(fill = pp[8]) +
annotate(geom = "text",
x = 2.75, y = 0.02, label = "'gamma'*(1.283196*', '*0.5663911)",
parse = T, color = pp[7], size = 2.75) +
scale_x_continuous(breaks = c(0, 1, 5, 10, 20), expand = c(0, 0), limits = c(0, 21)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05)))
```
In the parameter space that matters, from zero to one, that gamma is pretty noninformative. It peaks between the two, slopes very gently rightward, but has the nice steep slope on the left keeping the estimates off the zero boundary. And even though that right slope is very gentle given the scale of the data, it's aggressive enough that it should keep the MCMC chains from spending a lot of time in ridiculous parts of the parameter space. I.e., when working with finite numbers of iterations, we want our MCMC chains wasting exactly zero iterations investigating what the density might be for $\sigma_\beta \approx 1e10$ for standardized data.
### Example: Sex and death.
Let's load and `glimpse()` at Hanley and Shapiro's [-@hanleySexualActivityLifespan1994] fruit-fly data.
```{r, message = F}
my_data <- read_csv("data.R/FruitflyDataReduced.csv")
glimpse(my_data)
```
We can use `geom_density_ridges()` to help get a sense of how our criterion `Longevity` is distributed across groups of `CompanionNumber`.
```{r, fig.width = 6, fig.height = 2.5, message = F}
my_data %>%
group_by(CompanionNumber) %>%
mutate(group_mean = mean(Longevity)) %>%
ungroup() %>%
mutate(CompanionNumber = fct_reorder(CompanionNumber, group_mean)) %>%
ggplot(aes(x = Longevity, y = CompanionNumber, fill = group_mean)) +
geom_density_ridges(scale = 3/2, size = .2, color = pp[9]) +
scale_fill_gradient(low = pp[4], high = pp[2]) +
scale_x_continuous(expand = expansion(mult = c(0, 0.05)), limits = c(0, NA)) +
scale_y_discrete(NULL, expand = expansion(mult = c(0, 0.4))) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
legend.position = "none")
```
Let's fire up **brms**.
```{r, warning = F, message = F}
library(brms)
```
We'll want to do the preparatory work to define our `stanvars`.
```{r}
(mean_y <- mean(my_data$Longevity))
(sd_y <- sd(my_data$Longevity))
omega <- sd_y / 2
sigma <- 2 * sd_y
(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
```
With the prep work is done, here are our `stanvars`.
```{r}
stanvars <-
stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta")
```
Now fit the model, our hierarchical Bayesian alternative to an ANOVA.
```{r fit19.1}
fit19.1 <-
brm(data = my_data,
family = gaussian,
Longevity ~ 1 + (1 | CompanionNumber),
prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(gamma(alpha, beta), class = sd),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
control = list(adapt_delta = 0.99),
stanvars = stanvars,
file = "fits/fit19.01")
```
Much like Kruschke's JAGS chains, our **brms** chains are well behaved.
```{r, fig.width = 8, fig.height = 4, message = F, warning = F}
library(bayesplot)
color_scheme_set(scheme = pp[c(10, 8, 12, 5, 1, 4)])
plot(fit19.1)
```
Also like Kruschke, our chains appear moderately autocorrelated.
```{r, fig.width = 7, fig.height = 4}
post <- posterior_samples(fit19.1, add_chain = T)
mcmc_acf(post, pars = c("b_Intercept", "sd_CompanionNumber__Intercept", "sigma"), lags = 10)
```
Here's the model summary.
```{r}
print(fit19.1)
```
With the `ranef()` function, we can get the summaries of the group-specific deflections.
```{r}
ranef(fit19.1)
```
And with the `coef()` function, we can get those same group-level summaries in a non-deflection metric.
```{r}
coef(fit19.1)
```
Those are all estimates of the group-specific means. Since it wasn't modeled, all have the same parameter estimates for $\sigma_y$.
```{r}
posterior_summary(fit19.1)["sigma", ]
```
To prepare for our version of the top panel of Figure 19.3, we'll use `sample_n()` to randomly sample from the posterior draws.
```{r}
# how many random draws from the posterior would you like?
n_draws <- 20
set.seed(19)
post_draws <-
post %>%
sample_n(size = n_draws, replace = F)
glimpse(post_draws)
```
Before we make our version of the top panel, let's make a corresponding plot of the fixed intercept, the grand mean. The most important lines in the code, below are the ones where we used `stat_function()` within `mapply()`.
```{r, fig.height = 2.75, fig.width = 4}
tibble(x = c(0, 150)) %>%
ggplot(aes(x = x)) +
mapply(function(mean, sd) {
stat_function(fun = dnorm,
args = list(mean = mean, sd = sd),
alpha = 2/3,
size = 1/3,
color = pp[4])
},
# enter means and standard deviations here
mean = post_draws[, "b_Intercept"],
sd = post_draws[, "sigma"]
) +
geom_jitter(data = my_data, aes(x = Longevity, y = -0.001),
height = .001,
alpha = 3/4, color = pp[10]) +
scale_x_continuous("Longevity", breaks = 0:4 * 25,
limits = c(0, NA), expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = "Posterior Predictive Distribution",
subtitle = "The jittered dots are the ungrouped Longevity data. The\nGaussians are posterior draws depicting the overall\ndistribution, the grand mean.") +
coord_cartesian(xlim = c(0, 110))
```
Unfortunately, we can't extend our `mapply(stat_function())` method to the group-level estimates. To my knowledge, there isn't a way to show the group estimates at different spots along the y-axis. And our `mapply(stat_function())` approach has other limitations, too. Happily, we have some great alternatives. To use them, we'll need a little help from **tidybayes**.
```{r, warning = F, message = F}
library(tidybayes)
```
For the first part, we'll take `tidybayes::add_fitted_draws()` for a whirl.
```{r}
densities <-
my_data %>%
distinct(CompanionNumber) %>%
add_fitted_draws(fit19.1, n = 20, seed = 19, dpar = c("mu", "sigma"))
glimpse(densities)
```
With the first two lines, we made a $5 \times 1$ tibble containing the five levels of the experimental grouping variable, `CompanionNumber`. The `add_fitted_draws()` function comes from **tidybayes** [see the [Posterior fits](https://mjskay.github.io/tidybayes/articles/tidy-brms.html#posterior-fits) section of @kayExtractingVisualizingTidy2020]. The first argument of the `add_fitted_draws()` is `newdata`, which works much like it does in `brms::fitted()`; it took our $5 \times 1$ tibble. The next argument took our **brms** model fit, `fit19.1`. With the `n` argument, we indicated we just wanted 20 random draws from the posterior. The `seed` argument makes those random draws reproducible. With `dpar`, we requested distributional regression parameters in the output. In our case, those were the $\mu$ and $\sigma$ values for each level of `CompanionNumber`. Since we took 20 draws across 5 groups, we ended up with a 100-row tibble.
The next steps are a direct extension of the method we used to make our Gaussians for our version of Figure 19.1.
```{r}
densities <-
densities %>%
mutate(ll = qnorm(.025, mean = mu, sd = sigma),
ul = qnorm(.975, mean = mu, sd = sigma)) %>%
mutate(Longevity = map2(ll, ul, seq, length.out = 100)) %>%
unnest(Longevity) %>%
mutate(density = dnorm(Longevity, mu, sigma))
glimpse(densities)
```
If you look at the code we used to make `ll` and `ul`, you'll see we used 95% intervals, this time. Our second `mutate()` function is basically the same. After unnesting the tibble, we just needed to plug in the `Longevity`, `mu`, and `sigma` values into the `dnorm()` function to compute the corresponding density values.
```{r, fig.height = 4.5, fig.width = 4}
densities %>%
ggplot(aes(x = Longevity, y = CompanionNumber)) +
# here we make our density lines
geom_ridgeline(aes(height = density, group = interaction(CompanionNumber, .draw)),
fill = NA, color = adjustcolor(pp[4], alpha.f = 2/3),
size = 1/3, scale = 25) +
# the original data with little jitter thrown in
geom_jitter(data = my_data,
height = .04, alpha = 3/4, color = pp[10]) +
# pretty much everything below this line is aesthetic fluff
scale_x_continuous(breaks = 0:4 * 25, limits = c(0, 110),
expand = expansion(mult = c(0, 0.05))) +
labs(title = "Data with Posterior Predictive Distrib.",
y = NULL) +
coord_cartesian(ylim = c(1.25, 5.25)) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
```
Do be aware that when you use this method, you may have to fiddle around with the `geom_ridgeline()` `scale` argument to get the Gaussian's heights on reasonable-looking relative heights. Stick in different numbers to get a sense of what I mean. I also find that I'm often not a fan of the way the spacing on the y axis ends up with default `geom_ridgeline()`. It's easy to overcome this with a little `ylim` fiddling.
To return to the more substantive interpretation, the top panel of
> Figure 19.3 suggests that the normal distributions with homogeneous variances appear to be reasonable descriptions of the data. There are no dramatic outliers relative to the posterior predicted curves, and the spread of the data within each group appears to be reasonably matched by the width of the posterior normal curves. (Be careful when making visual assessments of homogeneity of variance because the visual spread of the data depends on the sample size; for a reminder see the [see the right panel of Figure 17.1, p. 478].) The range of credible group means, indicated by the peaks of the normal curves, suggests that the group Virgin8 is clearly lower than the others, and the group Virgin1 might be lower than the controls. To find out for sure, we need to examine the differences of group means, which we do in the next section. (p. 564)
For clarity, the "see the right panel of Figure 17.1, p. 478" part was changed following Kruschke's [Corrigenda](https://sites.google.com/site/doingbayesiandataanalysis/corrigenda).
### Contrasts.
> It is straight forward to examine the posterior distribution of credible differences. Every step in the MCMC chain provides a combination of group means that are jointly credible, given the data. Therefore, every step in the MCMC chain provides a credible difference between groups...
>
>To construct the credible differences of group 1 and group 2, at every step in the MCMC chain we compute
>
> \begin{align*}
> \mu_1 - \mu_2 & = (\beta_0 + \beta_1) - (\beta_0 + \beta_2) \\
> & = (+1) \cdot \beta_1 + (-1) \cdot \beta_2
> \end{align*}
>
> In other words, the baseline cancels out of the calculation, and the difference is a sum of weighted group deflections. Notice that the weights sum to zero. To construct the credible differences of the average of groups 1-3 and the average of groups 4-5, at every step in the MCMC chain we compute
>
> \small{
> \begin{align*}
> (\mu_1 + \mu_2 + \mu_3) / 3 - (\mu_4 + \mu_5) / 2 & = ((\beta_0 + \beta_1) + (\beta_0 + \beta_2) + (\beta_0 + \beta_3) ) / 3 - ((\beta_0 + \beta_4) + (\beta_0 + \beta_5) ) / 2 \\
> & = (\beta_1 + \beta_2 + \beta_3) / 3 - (\beta_4 + \beta_5) / 2 \\
> & = (+ 1/3) \cdot \beta_1 + (+ 1/3) \cdot \beta_2 + (+ 1/3) \cdot \beta_3 + (- 1/2) \cdot \beta_4 + (- 1/2) \cdot \beta_5
> \end{align*}
> }
>
> Again, the difference is a sum of weighted group deflections. The coefficients on the group deflections have the properties that they sum to zero, with the positive coefficients summing to +1 and the negative coefficients summing to −1. Such a combination is called a contrast. The differences can also be expressed in terms of effect size, by dividing the difference by $\sigma_y$ at each step in the chain. (pp. 565--566)
To warm up, here's how to compute the first contrast shown in the lower portion of Kruschke's Figure 19.3--the contrast between the two pregnant conditions and the none-control condition.
```{r, fig.width = 3, fig.height = 2.5}
post %>%
transmute(c = (`r_CompanionNumber[Pregnant1,Intercept]` + `r_CompanionNumber[Pregnant8,Intercept]`) / 2 - `r_CompanionNumber[None0,Intercept]`) %>%
ggplot(aes(x = c, y = 0)) +
stat_dotsinterval(point_interval = mode_hdi, .width = .95,
slab_color = pp[5], slab_fill = pp[5], color = pp[4]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Pregnant1.Pregnant8 vs None0",
x = expression(Difference~(mu[1]+mu[2])/2-mu[3]))
```
Up to this point, our primary mode of showing marginal posterior distributions has either been minute variations on Kruschke's typical histogram approach or with densities. We'll use those again in the future, too. In this chapter and the next, we'll veer a little further from the source material and depict our marginal posteriors with dot plots and their very close relatives, quantile plots. In the dot plot, above, each of the 4,000 posterior draws is depicted by one of the stacked brown dots. To stack the dots in neat columns like that, **tidybayes** has to round a little. Though we lose something in the numeric precision, we gain a lot in interpretability. We'll have more to say in just a moment.
In case you were curious, here are the HMC-based posterior mode and 95% HDIs.
```{r}
post %>%
transmute(difference = (`r_CompanionNumber[Pregnant1,Intercept]` + `r_CompanionNumber[Pregnant8,Intercept]`) / 2 - `r_CompanionNumber[None0,Intercept]`) %>%
mode_hdi(difference)
```
Little difference, there. Now let's quantify the same contrast as a standardized mean difference effect size.
```{r, fig.width = 3, fig.height = 2.5}
post %>%
transmute(es = ((`r_CompanionNumber[Pregnant1,Intercept]` + `r_CompanionNumber[Pregnant8,Intercept]`) / 2 - `r_CompanionNumber[None0,Intercept]`) / sigma) %>%
ggplot(aes(x = es, y = 0)) +
stat_dotsinterval(point_interval = mode_hdi, .width = .95,
slab_fill = pp[5], color = pp[4],
slab_size = 0, quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Pregnant1.Pregnant8 vs None0",
x = expression(Effect~Size~(Difference/sigma[italic(y)])))
```
From a standardized-mean-difference perspective, that's tiny. Also note that because our model `fit19.1` did not allow the standard deviation parameter $\sigma_y$ to vary across groups, $\sigma_y$ is effectively a pooled standard deviation ($\sigma_p$).
Did you notice the `quantiles = 100` argument within `stat_dotsinterval()`? Instead of a dot plot with 4,000 tiny little dots, that argument converted the output to a quantile plot. The 4,000 posterior draws are now summarized by 100 dots, each of which represents $1\%$ of the total sample [see @kayWhenIshMy2016, @fernandesUncertaintyDisplaysUsing2018]. This quantile dot-plot method will be our main approach for the rest of the chapter.
Okay, now let's do the rest in bulk. First we'll do the difference scores.
```{r, fig.width = 8, fig.height = 2.5}
differences <-
post %>%
transmute(`Pregnant1.Pregnant8.None0 vs Virgin1` = (`r_CompanionNumber[Pregnant1,Intercept]` + `r_CompanionNumber[Pregnant8,Intercept]` + `r_CompanionNumber[None0,Intercept]`) / 3 - `r_CompanionNumber[Virgin1,Intercept]`,
`Virgin1 vs Virgin8` = `r_CompanionNumber[Virgin1,Intercept]` - `r_CompanionNumber[Virgin8,Intercept]`,
`Pregnant1.Pregnant8.None0 vs Virgin1.Virgin8` = (`r_CompanionNumber[Pregnant1,Intercept]` + `r_CompanionNumber[Pregnant8,Intercept]` + `r_CompanionNumber[None0,Intercept]`) / 3 - (`r_CompanionNumber[Virgin1,Intercept]` + `r_CompanionNumber[Virgin8,Intercept]`) / 2)
differences %>%
gather() %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(point_interval = mode_hdi, .width = .95,
slab_fill = pp[5], color = pp[4],
slab_size = 0, quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Difference") +
facet_wrap(~ key, scales = "free")
```
Because we save our data wrangling labor from above as `differences`, it won't take much more effort to compute and plot the corresponding effect sizes as displayed in the bottom row of Figure 19.3.
```{r, fig.width = 8, fig.height = 2.5}
differences %>%
mutate_all(.funs = ~ . / post$sigma) %>%
gather() %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(point_interval = mode_hdi, .width = .95,
slab_fill = pp[5], color = pp[4],
slab_size = 0, quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Effect Size (Standardized mean difference)") +
facet_wrap(~ key, scales = "free_x")
```
> In traditional ANOVA, analysts often perform a so-called omnibus test that asks whether it is plausible that all the groups are simultaneously exactly equal. I find that the omnibus test is rarely meaningful, however.... In the hierarchical Bayesian estimation used here, there is no direct equivalent to an omnibus test in ANOVA, and the emphasis is on examining all the meaningful contrasts. (p. 567)
Speaking of all meaningful contrasts, if you'd like to make all pairwise comparisons in a hierarchical model of this form, **tidybayes** offers a convenient way to do so [see the [Comparing levels of a factor](https://mjskay.github.io/tidybayes/articles/tidy-brms.html#comparing-levels-of-a-factor) section of @kayExtractingVisualizingTidy2020]. Here we'll demonstrate with `stat_dotsinterval()`.
```{r, fig.height = 4.5, fig.width = 6}
fit19.1 %>%
# these two lines are where the magic is at
spread_draws(r_CompanionNumber[CompanionNumber,]) %>%
compare_levels(r_CompanionNumber, by = CompanionNumber) %>%
ggplot(aes(x = r_CompanionNumber, y = CompanionNumber)) +
geom_vline(xintercept = 0, color = pp[12]) +
stat_dotsinterval(point_interval = mode_hdi, .width = .95,
slab_fill = pp[5], color = pp[4],
slab_size = 0, quantiles = 100) +
labs(x = "Contrast",
y = NULL) +
coord_cartesian(ylim = c(1.5, 10.5)) +
theme(axis.text.y = element_text(hjust = 0))
```
But back to that omnibus test notion. If you really wanted to, I suppose one rough analogue would be to use information criteria to compare the hierarchical model to one that includes a single intercept with no group-level deflections. Here's what the simpler model would look like.
```{r fit19.2}
fit19.2 <-
brm(data = my_data,
family = gaussian,
Longevity ~ 1,
prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
stanvars = stanvars,
file = "fits/fit19.02")
```
Here's the model summary.
```{r}
print(fit19.2)
```
Here are their LOO values and their difference score.
```{r, message = F}
fit19.1 <- add_criterion(fit19.1, criterion = "loo")
fit19.2 <- add_criterion(fit19.2, criterion = "loo")
loo_compare(fit19.1, fit19.2) %>%
print(simplify = F)
```
The hierarchical model has a better LOO. Here are the stacking-based model weights.
```{r}
(mw <- model_weights(fit19.1, fit19.2))
```
If you don't like scientific notation, just `round()`.
```{r}
mw %>%
round(digits = 3)
```
Yep, in complimenting the LOO difference, virtually all the stacking weight went to the hierarchical model. You might think of this another way. The conceptual question we're asking is *Does it make sense to say that the* $\sigma_\beta$ *parameter is zero? Is zero a credible value?* We'll, I suppose we could just look at the posterior to assess for that.
```{r, fig.width = 4, fig.height = 2.5}
post %>%
ggplot(aes(x = sd_CompanionNumber__Intercept, y = 0)) +
stat_dotsinterval(point_interval = mode_hdi, .width = .95,
slab_fill = pp[5], color = pp[4],
slab_size = 0, quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = c(0, 50)) +
labs(title = expression("Behold the fit19.1 posterior for "*sigma[beta]*"."),
subtitle = "This parameter's many things, but zero isn't one of them.",
x = NULL)
```
Yeah, zero and other values close to zero don't look credible for that parameter. 95% of the mass is between 5 and 30, with the bulk hovering around 10. We don't need an $F$-test or even a LOO model comparison to see the writing on wall.
### Multiple comparisons and shrinkage.
> The previous section suggested that an analyst should investigate all contrasts of interest. This recommendation can be thought to conflict with traditional advice in the context on null hypothesis significance testing, which instead recommends that a minimal number of comparisons should be conducted in order to maximize the power of each test while keeping the overall false alarm rate capped at 5% (or whatever maximum is desired).... Instead, a Bayesian analysis can mitigate false alarms by incorporating prior knowledge into the model. In particular, hierarchical structure (which is an expression of prior knowledge) produces shrinkage of estimates, and shrinkage can help rein in estimates of spurious outlying data. For example, in the posterior distribution from the fruit fly data, the modal values of the posterior group means have a range of 23.2. The sample means of the groups have a range of 26.1. Thus, there is some shrinkage in the estimated means. The amount of shrinkage is dictated only by the data and by the prior structure, not by the intended tests. (p. 568)
We may as well compute those ranges by hand. Here's the range of the observed data.
```{r, message = F}
my_data %>%
group_by(CompanionNumber) %>%
summarise(mean = mean(Longevity)) %>%
summarise(range = max(mean) - min(mean))
```
For our hierarchical model `fit19.1`, the posterior means are rank ordered in the same way as the empirical data.
```{r}
coef(fit19.1)$CompanionNumber[, , "Intercept"] %>%
data.frame() %>%
rownames_to_column(var = "companion_number") %>%
arrange(Estimate) %>%
mutate_if(is.double, round, digits = 1)
```
If we compute the range by a difference of the point estimates of the highest and lowest posterior means, we can get a quick number.
```{r}
coef(fit19.1)$CompanionNumber[, , "Intercept"] %>%
as_tibble() %>%
summarise(range = max(Estimate) - min(Estimate))
```
Note that wasn't fully Bayesian of us. Those means and their difference carry uncertainty and that uncertainty can be fully expressed if we use all the posterior draws (i.e., use `summary = F` and wrangle).
```{r}
coef(fit19.1, summary = F)$CompanionNumber[, , "Intercept"] %>%
as_tibble() %>%
transmute(range = Pregnant1 - Virgin8) %>%
mode_hdi(range)
```
Happily, the central tendency of the range is near equivalent with both methods, but now we have 95% intervals, too. Do note how wide they are. This is why we work with the full set of posterior draws.
### The two-group case.
> A special case of our current scenario is when there are only two groups. The model of the present section could, in principle, be applied to the two-group case, but the hierarchical structure would do little good because there is virtually no shrinkage when there are so few groups (and the top-level prior on $\sigma_\beta$ is broad as assumed here). (p. 568)
For kicks and giggles, let's practice. Since `Pregnant1` and `Virgin8` had the highest and lowest empirical means—making them the groups best suited to define our range, we'll use them to fit the 2-group hierarchical model. To fit it with haste, just use `update()`.
```{r fit19.3}
fit19.3 <-
update(fit19.1,
newdata = my_data %>%
filter(CompanionNumber %in% c("Pregnant1", "Virgin8")),
control = list(adapt_delta = 0.999,
max_treedepth = 12),
seed = 19,
file = "fits/fit19.03")
```
Even with just two groups, there were no gross issues with fitting the model.
```{r}
print(fit19.3)
```
If you compare the posteriors for $\sigma_\beta$ across the two models, you'll see how the one for `fit19.3` is substantially larger.
```{r}
posterior_summary(fit19.1)["sd_CompanionNumber__Intercept", ]
posterior_summary(fit19.3)["sd_CompanionNumber__Intercept", ]
```
Here that is in a coefficient plot using `tidybayes::stat_interval()`.
```{r, fig.width = 6, fig.height = 1.25}
bind_rows(posterior_samples(fit19.1) %>% select(sd_CompanionNumber__Intercept),
posterior_samples(fit19.3) %>% select(sd_CompanionNumber__Intercept)) %>%
mutate(fit = rep(c("fit19.1", "fit19.3"), each = n() / 2)) %>%
ggplot(aes(x = sd_CompanionNumber__Intercept, y = fit)) +
stat_interval(point_interval = mode_hdi, .width = c(.5, .8, .95)) +
scale_color_manual(values = pp[c(11, 5, 7)],
labels = c("95%", "80%", "50%")) +
scale_x_continuous(expression(sigma[beta]),
limits = c(0, NA), expand = expansion(mult = c(0, 0.05))) +
ylab(NULL) +
theme(legend.key.size = unit(0.45, "cm"))
```
This all implies less shrinkage and a larger range.
```{r, warning = F, message = F}
coef(fit19.3, summary = F)$CompanionNumber[, , "Intercept"] %>%
as_tibble() %>%
transmute(range = Pregnant1 - Virgin8) %>%
mode_hdi(range)
```
And indeed, the range between the two groups is larger. Now the posterior mode for their difference has almost converged to that of the raw data. Kruschke then went on to recommend using a single-level model in such situations, instead.
> That is why the two-group model in [Section 16.3][Two groups] did not use hierarchical structure, as illustrated in Figure 16.11 (p. 468). That model also used a $t$ distribution to accommodate outliers in the data, and that model allowed for heterogeneous variances across groups. Thus, for two groups, it is more appropriate to use the model of Section 16.3. The hierarchical multi-group model is generalized to accommodate outliers and heterogeneous variances in [Section 19.5][Heterogeneous variances and robustness against outliers]. (p. 568)
As a refresher, here's what the **brms** code for that Chapter 16 model looked like.
```{r, eval = F}
fit16.3 <-
brm(data = my_data,
family = student,
bf(Score ~ 0 + Group,
sigma ~ 0 + Group),
prior = c(prior(normal(mean_y, sd_y * 100), class = b),
prior(normal(0, log(sd_y)), class = b, dpar = sigma),
prior(exponential(one_over_twentynine), class = nu)),
chains = 4, cores = 4,
stanvars = stanvars,
seed = 16,
file = "fits/fit16.03")
```
Let's adjust it for our data. Since we have a reduced data set, we'll need to re-compute our `stanvars` values, which were based on the raw data.
```{r}
# it's easier to just make a reduced data set
my_small_data <-
my_data %>%
filter(CompanionNumber %in% c("Pregnant1", "Virgin8"))
(mean_y <- mean(my_small_data$Longevity))
(sd_y <- sd(my_small_data$Longevity))
omega <- sd_y / 2
sigma <- 2 * sd_y
(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
```
Here we update `stanvars`.
```{r}
stanvars <-
stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta") +
stanvar(1/29, name = "one_over_twentynine")
```
Note that our priors, here, are something of a blend of those from Chapter 16 and those from our hierarchical model, `fit19.1`.