# (PART) Meta-Analysis in R {-}
# Pooling Effect Sizes {#pool}
![](_figs/pooling.jpg)
Now, let us get to the core of every meta-analysis: **pooling your effect sizes** to get one overall effect size estimate of the studies.
```{block,type='rmdinfo'}
When pooling effect sizes in a meta-analysis, there are two approaches we can use: the **Fixed-Effect Model** or the **Random-Effects Model** [@borenstein2011]. There is an extensive debate on which model fits best in which context [@fleiss1993review], with no clear consensus in sight. Although it has been recommended to **only resort to the random-effects pooling model** in clinical psychology and the health sciences [@cuijpers2016meta], we will describe how to conduct both in *R* here.
Both of these models only require an **effect size** and a **dispersion (variance)** estimate for each study, whose inverse is used to weight the studies. This is why the methods are often called **generic inverse-variance methods**.
```
We will describe in-depth how to conduct meta-analyses in *R* with **effect sizes based on continuous outcome data** (such as standardized mean differences) first, as these are the most common ones in psychology and the health science field. Later on, we will extend this knowledge to meta-analyses with **binary outcome** data, which might be important if you are focusing on prevention trials.
For all meta-analyses, we will use the `meta` package [@schwarzer2007meta]. In [Section 2.1](#RStudio), we showed how to install the package. Now, load the package from your library to proceed.
```{r, warning=FALSE,message=FALSE}
library(meta)
```
```{r, echo=FALSE,warning=FALSE,message=FALSE}
library(tidyverse)
library(knitr)
```
<br><br>
---
## Fixed-Effects-Model {#fixed}
### Pre-calculated effect size data {#pre.calc}
```{block, type='rmdinfo'}
**The idea behind the fixed-effects-model**
The fixed-effects-model assumes that all studies along with their effect sizes stem from a single homogeneous population [@borenstein2011]. To calculate the overall effect, we therefore average all effect sizes, but give studies with greater precision a higher weight. In this case, greater precision means that the study has a larger **$N$**, which leads to a smaller **Standard Error** of its effect size estimate.
For this weighting, we use the **inverse of the variance** $1/\hat\sigma^2_k$ of each study $k$. We then calculate a weighted average of all studies, our fixed effect size estimate $\hat\theta_F$:
```
\begin{equation}
\hat\theta_F = \frac{\sum\limits_{k=1}^K \hat\theta_k/ \hat\sigma^2_k}{\sum\limits_{k=1}^K 1/\hat\sigma^2_k}
\end{equation}
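To make this formula more tangible, here is a minimal sketch of how the weighted average could be computed by hand. The effect sizes and standard errors below are made up for illustration and are not part of the datasets used in this guide.
```{r, eval=FALSE}
# Hypothetical effect sizes (theta_k) and standard errors of K = 3 studies
theta <- c(0.30, 0.55, 0.40)
se    <- c(0.15, 0.25, 0.10)

# Inverse-variance weights: 1 / sigma_k^2
w <- 1 / se^2

# Fixed-effect pooled estimate and its standard error
theta.fixed    <- sum(w * theta) / sum(w)
se.theta.fixed <- sqrt(1 / sum(w))
theta.fixed; se.theta.fixed
```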
In [Chapter 3.1](#excel_preparation), we described two ways in which the Excel spreadsheet containing your meta-analysis data can be structured:
* It can either be stored as the **raw data** (including the Mean, N, and SD of every study arm)
* Or it only contains the **calculated effect sizes and the standard error (SE)**.
The functions to pool the results with a fixed-effect model **differ depending on which data format you used**, but not much. First, let us assume you already have a dataset with the **calculated effect sizes and SE** for each study. In my case, this is my `madata` dataset.
```{r,echo=FALSE}
load("_data/Meta_Analysis_Data.RData")
madata<-Meta_Analysis_Data
```
```{r}
str(madata)
```
The effect sizes in this dataset are based on **continuous outcome data**. As our effect sizes are already calculated, we can use the `meta::metagen` function. For this function, we can specify loads of parameters, all of which can be accessed by typing `?metagen` in your Console once the `meta` package is loaded.
**Here is a table with the most important parameters for our code:**
```{r,echo=FALSE}
i<-c("TE","seTE","data=","studlab=paste()","comb.fixed=","comb.random","prediction=","sm=")
ii<-c("This tells R to use the TE column to retrieve the effect sizes for each study",
"This tells R to use the seTE column to retrieve the standard error for each study",
"After =, paste the name of your dataset here",
"This tells the function were the labels for each study are stored. If you named the spreadsheet columns as advised, this should be studlab=paste(Author)",
"Whether to use a fixed-effect model",
"Whether to use a random-effects model",
"Whether to print a prediction interval for the effect of future studies based on present evidence","The summary measure we want to calculate. We can either calculate the mean difference (MD) or Hedges' g/Cohen's d (SMD)")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
Let us code our first fixed-effects model meta-analysis. We will give the results of this analysis the simple name `m`.
```{r}
m <- metagen(TE,
seTE,
data=madata,
studlab=paste(Author),
comb.fixed = TRUE,
comb.random = FALSE,
prediction=TRUE,
sm="SMD")
m
```
We now see the results of our meta-analysis, including
* The **individual effect sizes** for each study, and their weight
* The total **number of included studies** ($k$)
* The **overall effect** (in our case, $g$ = 0.48) and its confidence interval and $p$-value
* Measures of **between-study heterogeneity**, such as $\tau^2$ or $I^2$, and a $Q$-test of heterogeneity
Using the `$` command, we can also have a look at various outputs directly. For example:
```{r, eval=FALSE}
m$lower.I2
```
Gives us the lower bound of the 95% confidence interval for *I^2^*
```{r, echo=FALSE}
m$lower.I2
```
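Other elements stored in the `meta` object can be accessed the same way. The lines below are a sketch; the exact element names may differ slightly between `meta` versions, so use `str(m)` to check which elements your object actually contains.
```{r, eval=FALSE}
m$TE.fixed      # pooled effect size (fixed-effect model)
m$lower.fixed   # lower bound of its 95% confidence interval
m$upper.fixed   # upper bound of its 95% confidence interval
m$I2            # estimate of I^2
str(m)          # overview of all stored elements
```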
We can **save the results of the meta-analysis** to our working directory as a .txt-file using this command:
```{r,eval=FALSE}
sink("results.txt")
print(m)
sink()
```
### Raw effect size data {#fixed.raw}
To conduct a fixed-effects model meta-analysis from **raw data** (i.e., if your data has been prepared the way we describe in [Chapter 3.1.1](#excel_preparation)), we have to use the `meta::metacont()` function instead. The structure of the code, however, looks quite similar.
```{r,echo=FALSE}
i<-c("Ne", "Me", "Se", "Nc", "Mc","Sc","data=","studlab=paste()","comb.fixed=","comb.random","prediction=","sm=")
ii<-c("The number of participants (N) in the intervention group",
"The mean (M) of the intervention group",
"The standard deviation (SD) of the intervention group",
"The number of participants (N) in the control group",
"The mean (M) of the control group",
"The standard deviation (SD) of the control group",
"After '=', paste the name of your dataset here",
"This tells the function were the labels for each study are stored. If you named the spreadsheet columns as advised, this should be studlab=paste(Author)",
"Whether to use a fixed-effects-model",
"Whether to use a random-effects-model",
"Whether to print a prediction interval for the effect of future studies based on present evidence","The summary measure we want to calculate. We can either calculate the mean difference (MD) or Hedges' g (SMD)")
ms2<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms2)<-names
kable(ms2)
```
```{r,echo=FALSE,warning=FALSE}
load("_data/metacont_data.RData")
metacont$Ne<-as.numeric(metacont$Ne)
metacont$Me<-as.numeric(metacont$Me)
metacont$Se<-as.numeric(metacont$Se)
metacont$Mc<-as.numeric(metacont$Mc)
metacont$Sc<-as.numeric(metacont$Sc)
```
For this purpose, I will use my dataset `metacont`, which contains the raw data of all studies I want to synthesize.
```{r}
str(metacont)
```
Now, let us code the meta-analysis function, this time using the `meta::metacont` function, and my `metacont` dataset. I want to name my output `m.raw` now.
```{r}
m.raw <- metacont(Ne,
Me,
Se,
Nc,
Mc,
Sc,
data=metacont,
studlab=paste(Author),
comb.fixed = TRUE,
comb.random = FALSE,
prediction=TRUE,
sm="SMD")
m.raw
```
```{block,type='rmdachtung'}
As you can see, all the calculated effect sizes are **negative** now, including the pooled effect. However, all studies report a positive outcome, meaning that the symptoms in the intervention group (e.g., of depression) were reduced. The negative orientation results from the fact that in **many clinical trials, lower scores indicate better outcomes** (e.g., less depression). It is no problem to report values like this: in fact, it is conventional.
Some readers who are unfamiliar with meta-analysis, however, **might be confused** by this, so you may consider changing the orientation of your values before you report them in your paper.
```
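If you prefer to report positively oriented effect sizes, one simple option is to reverse the sign of the effect sizes before pooling. The following is only a sketch, assuming pre-calculated effect sizes stored in a `TE` column; for raw data, swapping the intervention and control columns has the same effect.
```{r, eval=FALSE}
# Reverse the orientation of pre-calculated effect sizes;
# the standard error (seTE) stays the same
madata$TE <- -madata$TE
```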
We can **save the results of the meta-analysis** to our working directory as a .txt-file using this command.
```{r,eval=FALSE}
sink("results.txt")
print(m.raw)
sink()
```
<br><br>
---
## Random-Effects-Model {#random}
Previously, we showed how to perform a fixed-effect-model meta-analysis using the `metagen` and `metacont` functions.
However, we can only use the fixed-effect model when we can assume that **all included studies come from the same population**. In practice, this is hardly ever the case: interventions may vary in certain characteristics, and the samples and methods used in each study may differ slightly. In this case, we cannot assume that all studies stem from the same hypothesized "population" of studies.
The same applies once we detect **statistical heterogeneity** in our fixed-effect-model meta-analysis, as indicated by $I^{2}>0\%$.
So, it is very likely that you will actually use a random-effects-model for your meta-analysis. Thankfully, there is not much more we have to think about when conducting a random-effects model meta-analysis in *R* instead of a fixed-effect model meta-analysis.
```{block,type='rmdinfo'}
**The Idea behind the Random-Effects-Model**
In the Random-Effects-Model, we want to account for our assumption that the study effect estimates show more variance than when drawn from a single population [@schwarzer2015meta]. The random-effects-model works under the so-called **assumption of exchangeability**. This means that in random-effects model meta-analyses, we not only assume that effects of individual studies deviate from the true intervention effect of all studies due to sampling error, but that there is another source of variance introduced by the fact that the studies do not stem from one single population, but are drawn from a "universe" of populations.
We therefore assume that there is not only one true effect size, but **a distribution of true effect sizes**. The goal is then to estimate the mean of this distribution of true effect sizes. The fixed-effect-model assumes that when the observed effect size $\hat\theta_k$ of an individual study $k$ deviates from the true effect size $\theta_F$, the only reason for this is that the estimate is burdened by (sampling) error $\epsilon_k$.
$$\hat\theta_k = \theta_F + \epsilon_k$$
The random-effects-model assumes that, in addition, there is **a second source of error** $\zeta_k$. This second source of error is introduced by the fact that the true effect size $\theta_k$ of our study $k$ is itself only part of an over-arching distribution of true effect sizes with mean $\mu$ [@borenstein2011].
```
```{r, echo=FALSE, fig.width=6,fig.height=4,fig.align='center'}
library(png)
library(grid)
img <- readPNG("_figs/density.png")
grid.raster(img)
```
<p style="text-align: center;">
*An illustration of parameters of the random-effects-model*
</p>
```{block,type='rmdinfo'}
The formula for the random-effects-model therefore looks like this:
$$\hat\theta_k = \mu + \epsilon_k + \zeta_k$$
When calculating a random-effects-model meta-analysis, we therefore also have to take the error $\zeta_k$ into account. To do this, we have to **estimate the variance of the distribution of true effect sizes**, which is denoted by $\tau^{2}$, or *tau^2^*. There are several estimators for $\tau^{2}$, many of which are implemented in `meta`. We will give you more details about them in the next section.
```
```{block,type='rmdachtung'}
Even though it is **conventional** to use random-effects-model meta-analyses in psychological outcome research, applying this model is **not undisputed**. The random-effects-model pays **more attention to small studies** when pooling the overall effect in a meta-analysis [@schwarzer2015meta]. Yet, small studies in particular are often fraught with **bias** (see [Chapter 8.1](#smallstudyeffects)). This is why some have argued that the fixed-effects-model should be nearly always preferred [@poole1999; @furukawa2003].
```
### Estimators for $\tau^2$ in the random-effects-model {#tau2}
Operationally, conducting a random-effects-model meta-analysis in *R* is not so different from conducting a fixed-effects-model meta-analysis. Yet, we do have to choose an estimator for $\tau^{2}$. Here are the estimators implemented in `meta`, which we can select using the `method.tau` argument in our meta-analysis code.
```{r,echo=FALSE}
i<-c("DL","PM","REML","ML","HS","SJ","HE","EB")
ii<-c("DerSimonian-Laird","Paule-Mandel","Restricted Maximum-Likelihood","Maximum-likelihood","Hunter-Schmidt","Sidik-Jonkman","Hedges","Empirical Bayes")
ms<-data.frame(i,ii)
names<-c("Code", "Estimator")
colnames(ms)<-names
kable(ms)
```
```{block,type='rmdinfo'}
**Which estimator should I use?**
All of these estimators derive $\tau^{2}$ using a slightly different approach, leading to somewhat different pooled effect size estimates and confidence intervals. Whether one of these approaches is more or less biased often depends on the context and on parameters such as the number of studies $k$, the number of participants $n$ in each study, how much $n$ varies from study to study, and how big $\tau^{2}$ is.
An overview paper by Veroniki and colleagues [@veroniki2016methods] provides an excellent summary on current evidence on which estimator might be more or less biased in which situation. The article is openly accessible, and you can read it [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4950030/).
Especially in medical and psychological research, by far the most frequently used estimator is the **DerSimonian-Laird estimator** [@dersimonian1986meta]. Part of this widespread use might be attributable to the fact that programs such as *RevMan* or *Comprehensive Meta-Analysis* (older versions) only use this estimator. It is also the default option in the `meta` package in *R*. Simulation studies, however, have shown that the **Maximum-Likelihood**, **Sidik-Jonkman**, and **Empirical Bayes** estimators have better properties in estimating the between-study variance [@sidik2007comparison;@viechtbauer2005bias].
```
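If you want to see how much the choice of estimator matters for your own data, you can simply rerun the pooling with different `method.tau` values. The following is a sketch using the `madata` dataset; it assumes that the `meta` object stores the estimated between-study standard deviation in the `tau` element (check `str()` of your output if your `meta` version differs).
```{r, eval=FALSE}
estimators <- c("DL", "PM", "REML", "ML", "HS", "SJ", "HE", "EB")

sapply(estimators, function(est) {
  m.tmp <- metagen(TE, seTE, data = madata, studlab = paste(Author),
                   comb.fixed = FALSE, comb.random = TRUE,
                   method.tau = est, sm = "SMD")
  round(m.tmp$tau^2, 4)   # estimated between-study variance tau^2
})
```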
```{block,type='rmdinfo'}
**The Hartung-Knapp-Sidik-Jonkman method**
Another criticism of the **DerSimonian-Laird** method is that when estimating the variance of our pooled effect $var(\hat\theta_F)$, this method is prone to producing false positives [@inthout2014hartung]. This is especially the case when the **number of studies** is small and when there is substantial **heterogeneity** [@hartung1999alternative;@hartung2001refined;@hartung2001tests;@follmann1999valid;@makambi2004effect]. Unfortunately, this is very often the case when we do meta-analyses in the medical field or in psychology. This is quite a problem, as we don't want to find pooled effects to be statistically significant when in fact they are not!
The **Hartung-Knapp-Sidik-Jonkman (HKSJ) method** was thus proposed as a way to produce more robust estimates of $var(\hat\theta_F)$. It has been shown that this method substantially outperforms the DerSimonian-Laird method in many cases [@inthout2014hartung]. The HKSJ method can also be applied very easily in *R*, while other programs do not have this option yet. This is another big plus of doing a meta-analysis in *R*. The HKSJ method usually leads to more **conservative** results, indicated by wider confidence intervals.
```
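For reference, here is a sketch of the underlying formula: the Hartung-Knapp adjustment replaces the usual variance estimate of the pooled effect by a weighted sum of squared deviations, and bases the confidence interval on a $t$-distribution with $K-1$ degrees of freedom (see [@inthout2014hartung] for details):
\begin{equation}
\widehat{var}_{HKSJ}(\hat\mu) = \frac{\sum\limits_{k=1}^K w_k (\hat\theta_k - \hat\mu)^2}{(K-1)\sum\limits_{k=1}^K w_k}, \qquad w_k = \frac{1}{\hat\sigma^2_k + \hat\tau^2}
\end{equation}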
```{block,type='rmdachtung'}
**Residual concerns with the Hartung-Knapp-Sidik-Jonkman method**
It should be noted, however, that the HKSJ method is not uncontroversial. Some authors argue that other (standard) pooling models should also be used **in addition** to the HKSJ as a **sensitivity analysis** [@wiksten2016hartung]. Jackson and colleagues [@jackson2017hartung] present four residual concerns with this method, which you may take into account before selecting your meta-analytic method. The paper can be read [here](https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7411).
```
### Pre-calculated effect size data {#random.precalc}
After all this input, you will see that even random-effects model meta-analyses are very easy to code in *R*. Compared to the fixed-effect model in [Chapter 4.1](#fixed), there are just three extra parameters we have to define. In particular, as we have described before, we have to tell *R* which estimator for the **between-study variance** $\tau^{2}$ we want to use, and whether we want to apply the **Knapp-Hartung(-Sidik-Jonkman)** adjustment.
**Here's a table of all parameters we have to define in our code to perform a random-effects-model meta-analysis with pre-calculated effect sizes:**
```{r,echo=FALSE}
i<-c("TE","seTE","data=","studlab=paste()","comb.fixed=","comb.random=","method.tau=","hakn=", "prediction=","sm=")
ii<-c("This tells R to use the TE column to retrieve the effect sizes for each study",
"This tells R to use the seTE column to retrieve the standard error for each study",
"After =, paste the name of your dataset here",
"This tells the function were the labels for each study are stored. If you named the spreadsheet columns as advised, this should be studlab=paste(Author)",
"Whether to use a fixed-effects model",
"Whether to use a random-effects model. This has to be set to TRUE",
"Which estimator to use for the between-study variance",
"Whether to use the Knapp-Hartung method",
"Whether to print a prediction interval for the effect of future studies based on present evidence","The summary measure we want to calculate. We can either calculate the mean difference (MD) or Hedges' g (SMD)")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
I will use my `madata` dataset again to do the meta-analysis. For illustrative purposes, let us use the Sidik-Jonkman estimator (`"SJ"`) and the HKSJ method. To do this analysis, make sure that `meta` as well as `metafor` are loaded in *R*.
```{r,eval=FALSE,warning=FALSE}
library(meta)
library(metafor)
```
Now, let us code our random-effects-model meta-analysis. Remember, as our effect size data are precalculated, I will use the `meta::metagen()` function.
```{r,echo=FALSE}
load("_data/Meta_Analysis_Data.RData")
madata<-Meta_Analysis_Data
```
```{r}
m.hksj <- metagen(TE,
seTE,
data = madata,
studlab = paste(Author),
comb.fixed = FALSE,
comb.random = TRUE,
method.tau = "SJ",
hakn = TRUE,
prediction = TRUE,
sm = "SMD")
m.hksj
```
The output shows that our estimated effect is $g=0.59$, and the 95% confidence interval stretches from $g=0.39$ to $0.80$ (rounded). It also becomes clear that this effect is different (and larger) than the one we found in the fixed-effects-model meta-analysis in [Chapter 4.1](#fixed) ($g=0.48$).
Let us compare this to the output using the **DerSimonian-Laird** estimator, and when setting `hakn = FALSE`. As this estimator is the **default**, I do not have to define `method.tau` this time.
```{r}
m.dl <- metagen(TE,
seTE,
data=madata,
studlab=paste(Author),
comb.fixed = FALSE,
comb.random = TRUE,
hakn = FALSE,
prediction=TRUE,
sm="SMD")
m.dl
```
We see that the overall effect size estimate using this estimator is similar to the previous one ($g=0.57$), but the confidence interval **is narrower because we did not adjust it** using the HKSJ method.
```{r,echo=FALSE,fig.height=2}
TE<-c(m.hksj$TE.random,m.dl$TE.random)
seTE<-c(m.hksj$seTE.random,m.dl$seTE.random)
Method<-c("Knapp-Hartung-Sidik-Jonkman","DerSimonian-Laird")
frst.data<-data.frame(Method,TE,seTE)
m.frst<-metagen(TE,
seTE,
data=frst.data,
studlab=paste(Method),
comb.fixed = FALSE,
comb.random = FALSE,
hakn = FALSE,
prediction=FALSE)
meta::forest.meta(m.frst,xlim = c(0.34,0.85))
```
### Raw effect size data {#random.raw}
If we use raw effect size data, such as the data stored in my `metacont` dataset, we can use the `meta::metacont()` function again. The parameters stay the same as before. I will name the meta-analysis results `m.hksj.raw` this time.
```{r,echo=FALSE,warning=FALSE}
load("_data/metacont_data.RData")
metacont$Ne<-as.numeric(metacont$Ne)
metacont$Me<-as.numeric(metacont$Me)
metacont$Se<-as.numeric(metacont$Se)
metacont$Mc<-as.numeric(metacont$Mc)
metacont$Sc<-as.numeric(metacont$Sc)
save(metacont, file = "_data/metacont_data.RData")
```
```{r}
m.hksj.raw <- metacont(Ne,
Me,
Se,
Nc,
Mc,
Sc,
data = metacont,
studlab = paste(Author),
comb.fixed = FALSE,
comb.random = TRUE,
method.tau = "SJ",
hakn = TRUE,
prediction = TRUE,
sm = "SMD")
m.hksj.raw
```
<br><br>
---
## Binary outcomes {#binary}
![](_figs/compare.jpg)
In some cases, researchers will have to work with **binary outcome data** (e.g., dead/alive, depressive disorder/no depressive disorder) instead of continuous outcome data. In such a case, you will probably be more interested in outcomes like the pooled **Odds Ratio**, the **Relative Risk** (which the [Cochrane Handbook](https://handbook-5-1.cochrane.org/chapter_9/9_2_2_2_measures_of_relative_effect_the_risk_ratio_and_odds.htm) advises to use instead of Odds Ratios, because they are easier to interpret), or the **Incidence Rate Ratio**. There are two common types of binary outcome data:
* **Event rate data**. This is data in which we are only dealing with the number of persons experiencing an event in each group, and the total sample size in each group. Effect sizes we can calculate from such data are the Odds Ratio, the Relative Risk or the Risk Difference, among others.
* **Incidence rate data**. Event rate data usually does not contain any information on the time span during which events did or did not occur. Given that studies often have drastically different follow-up times (e.g., 8 weeks vs. 2 years), it often makes sense to also take the time interval during which events occurred into account. In epidemiology, incidence rates are often used to signify how many events occurred within a standardized timeframe (e.g., one year). The corresponding effect size is the incidence rate ratio (IRR), which compares the incidence rate in the intervention group to the one in the control group.
For both event rate data and incidence rate data, there are again two options to pool effect sizes using the `meta` package:
* **The effect sizes are already calculated**. In this case, we can use the `metagen` function as we did before (see [Chapter 4.1](#fixed) and [Chapter 4.2](#random)), with a few other specifications. We describe how to use the `metagen` function for pre-calculated binary outcome data in [Chapter 4.3.3](#binarymetagen).
* **We only have the raw outcome data**. If this is the case, we will have to use the `meta::metabin()` or `meta::metainc()` function instead. We'll show you how to do this below.
```{block,type='rmdinfo'}
**The Mantel-Haenszel method**
In [Chapter 4.1](#pre.calc), we described the **inverse-variance** method through which effect sizes are pooled in meta-analyses based on continuous outcome data. For binary outcome data, it is common to use a slightly different approach, the **Mantel-Haenszel method** [@mantel1959statistical; @robins1986general], to pool results. The formula for this method looks like this:
\begin{equation}
\hat\theta_{MH} = \frac{\sum\limits_{k=1}^K w_{k} \hat\theta_k}{\sum\limits_{k=1}^K w_{k}}
\end{equation}
This calculation comes very close to the **inverse-variance** method, but the calculation of the study weights $w_k$ is different. The formulae for the weights for different effect sizes, with $a_k$ being the number of events in the intervention group, $c_k$ the number of events in the control group, $b_k$ the number of non-events in the intervention group, $d_k$ the number of non-events in the control group, and $n_k$ being the total sample size, look like this:
**Odds Ratio**
$$w_k = \frac{b_kc_k}{n_k}$$
**Risk Ratio**
$$w_k = \frac{(a_k+b_k)c_k}{n_k}$$
**Risk Difference**
$$w_k = \frac{(a_k+b_k)(c_k+d_k)}{n_k}$$
For binary outcome data, it is generally recommended to use the Mantel-Haenszel method, and it is the default method in *RevMan* [@schwarzer2015meta]. It is also the default used for binary outcome data in the `meta` package.
```
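To illustrate, here is a minimal sketch (with made-up 2×2 cell counts, not data used elsewhere in this guide) showing that the Mantel-Haenszel pooled Odds Ratio is exactly this weighted average of the study-wise Odds Ratios:
```{r, eval=FALSE}
# Hypothetical 2x2 cell counts for K = 3 studies
a.k <- c(10, 20, 15)    # events in the intervention group
b.k <- c(90, 180, 85)   # non-events in the intervention group
c.k <- c(20, 30, 25)    # events in the control group
d.k <- c(80, 170, 75)   # non-events in the control group
n.k <- a.k + b.k + c.k + d.k

theta.k <- (a.k * d.k) / (b.k * c.k)   # study-wise odds ratios
w.k     <- (b.k * c.k) / n.k           # Mantel-Haenszel weights for the OR

sum(w.k * theta.k) / sum(w.k)          # pooled Mantel-Haenszel odds ratio
```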
### Event rate data
For meta-analyses of event rate data, we need our data in the following format:
```{r,echo=FALSE, message=FALSE, error=FALSE}
library(kableExtra)
Package<-c("Author","Ee","Ne","Ec","Nc","Subgroup")
Description<-c(
"This signifies the column for the study label (i.e., the first author)",
"Number of events in the experimental treatment arm",
"Number of participants in the experimental treatment arm",
"Number of events in the control arm",
"Number of participants in the control arm",
"This is the label for one of your subgroup codes. It is not that important how you name it, so you can give it a more informative name (e.g. population). In this column, each study should then be given a subgroup code, which should be exactly the same for each subgroup, including upper/lowercase letters. Of course, you can also include more than one subgroup column with different subgroup codings, but the column name has to be unique")
m<-data.frame(Package,Description)
names<-c("Column", "Description")
colnames(m)<-names
kable(m)
```
I will use my dataset `binarydata`, which also has this format.
```{r}
load("_data/binarydata.RData")
str(binarydata)
```
The other parameters are like the ones we used in the meta-analyses with continuous outcome data, with two exceptions:
* **sm**: As we want to have a pooled effect for binary data, we have to choose another summary measure now. We can choose from **"OR"** (Odds Ratio), **"RR"** (Risk Ratio), or **"RD"** (Risk Difference), among others.
* **incr**. This lets us define if and how we want a **continuity correction** to be performed. Such a correction is necessary in cases where one of the cells in your data is zero (e.g., because no one in the intervention arm died). This can be a frequent phenomenon in some contexts, and it **distorts our effect size estimates**. By default, the `metabin` function adds the value **0.5** to cells with zero entries [@gart1967bias]. This value can be changed using the `incr` parameter (e.g., `incr=0.1`). If your trial arms are very uneven in terms of their total $n$, we can also use the **treatment arm continuity correction** [@j2004add]. This can be done by using `incr="TACC"`.
**Here is the code for a meta-analysis with raw binary data**
I have decided to run a random-effects-model meta-analysis. I want the summary measure to be the Risk Ratio (RR).
```{r}
m.bin <- metabin(Ee,
Ne,
Ec,
Nc,
data = binarydata,
studlab = paste(Author),
comb.fixed = FALSE,
comb.random = TRUE,
method.tau = "SJ",
hakn = TRUE,
prediction = TRUE,
incr = 0.1,
sm = "RR")
m.bin
```
<br><br>
**L'Abbé Plots**
So-called **L'Abbé plots** [@labbe] are a good way to visualize data based on event rates. In a L'Abbé plot, the event rate of a study's intervention group is plotted against the event rate in the control group, and the $N$ of the study is signified by the size of the bubble in the plot. Despite its simplicity, this plot allows us to check for three important aspects of our meta-analysis with binary outcomes:
* **The overall trend of our meta-analysis**. If we are expecting a type of intervention to have a protective effect (i.e., making an adverse outcome such as death or depression onset less likely) the studies should mostly lie in the bottom-right corner of the L'Abbé plot, because the control group event rate should be higher than the intervention group event rate. If there is no effect of the intervention compared to the control group, the event rates are identical and the study is shown on the diagonal of the L'Abbé plot.
* **Heterogeneity of effect sizes**. The plot also allows us to eyeball for single studies or groups of studies which contribute to the heterogeneity of the effect we found. It could be the case, for example, that most studies lie in the bottom-right part of the plot as they report positive effects, while a few studies lie in the top-left sector, indicating negative effects. Especially if such studies have low precision (i.e., a small $N$ of participants, indicated by small bubbles in the plot), they could have distorted our pooled effect and may contribute to the between-study heterogeneity.
* **Heterogeneity of event rates**. It may also be the case that some of the heterogeneity in our meta-analysis was introduced by the fact that the event rates "per se" are higher or lower in some studies compared to the others. The L'Abbé plot provides us with this information, as studies with higher event rates will naturally tend towards the top-right corner of the plot.
The results of the `metabin` function can be easily used to generate L'Abbé plots using the `labbe.metabin` function included in the `meta` package. We can specify the following parameters:
```{r,echo=FALSE}
library(kableExtra)
Package<-c("x","bg","col","studlab","col.fixed","col.random")
Description<-c(
"This signifies our metabin meta-analysis output",
"The background color of the studies",
"The line color of the studies",
"Wether the names of the studies should be printed in the plot (TRUE/FALSE)",
"The color of the dashed line symbolizing the pooled effect of the meta-analysis, if the fixed-effect-model was used",
"The color of the dashed line symbolizing the pooled effect of the meta-analysis, if the random-effects model was used"
)
m<-data.frame(Package,Description)
names<-c("Parameter", "Description")
colnames(m)<-names
kable(m)
```
For this example, I will use the `m.bin` output I previously generated using the `metabin` function.
```{r}
labbe.metabin(x = m.bin,
bg = "blue",
studlab = TRUE,
col.random = "red")
```
Works like a charm! We see that the **dashed red line** signifying the **pooled effect estimate** of my meta-analysis runs through the bottom-right sector of my L'Abbé plot, meaning that the overall effect size is positive (i.e., that the intervention has a preventive effect).
However, it also becomes clear that **not all studies** clearly follow this trend: we see that most studies lie tightly in the bottom-left corner of the plot, meaning that these studies had **small event rates** (i.e., the event in these studies was very rare irrespective of group assignment). We also see that two of our included studies do not fall into this pattern: **Schmidthauer** and **van der Zee**. Those two studies have higher event rates, and both favor the intervention group more clearly than the others did.
### Incidence rates
To conduct meta-analyses using incidence rate data, so-called **person-time** data has to be collected or calculated by hand. What is basically needed to calculate person-time data is the **number of events** and the **timeframe** during which they occurred. You can find a general introduction into this topic [here](https://sph.unc.edu/files/2015/07/nciph_ERIC4.pdf), and this [quick course](https://www.cdc.gov/ophss/csels/dsepd/ss1978/lesson3/section2.html) by the Centers for Disease Control and Prevention gives a hands-on introduction on how person-time data is calculated.
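As a very small sketch of the underlying logic (with made-up numbers): the person-time of a group is simply the time at risk summed over all its participants, and the incidence rate is the number of events divided by this person-time.
```{r, eval=FALSE}
# Hypothetical group: 200 participants followed for 2 years each,
# plus 50 participants who dropped out after 1 year
person.years <- 200 * 2 + 50 * 1   # 450 person-years at risk

events <- 18                       # e.g., 18 depression onsets observed

events / person.years              # incidence rate per person-year
```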
As an example, I will use my `IRR.data` dataset, in which the person-time (in my case, person-year) is stored in the `time.e` and `time.c` column for the experimental and control group, respectively. The `event.e` and `event.c` column contains the number of **events** (in my case, depression onsets) for both groups.
```{r, echo=FALSE}
load("_data/IRR.data.RData")
IRR.data
```
To pool the data, we use the `metainc` function included in the `meta` package. We can set the following parameters:
```{r,echo=FALSE}
i<-c("event.e","time.e","event.c","time.c",
"data=","studlab=paste()","comb.fixed=","comb.random=","method.tau=","hakn=", "prediction=","sm=")
ii<-c("The number of events in the intervention group",
"The person-time at risk in the intervention group",
"The number of events in the control group",
"The person-time at risk in the control group",
"After =, paste the name of your dataset here",
"This tells the function were the labels for each study are stored. If you named the spreadsheet columns as advised, this should be studlab=paste(Author)",
"Whether to use a fixed-effects-model",
"Whether to use a random-effects-model. This has to be set to TRUE",
"Which estimator to use for the between-study variance",
"Whether to use the Knapp-Hartung-Sidik-Jonkman method",
"Whether to print a prediction interval for the effect of future studies based on present evidence",
"The summary measure we want to calculate. We want to calculate the Incidence Rate Ratio (IRR)")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
```{r}
metainc(event.e,
time.e,
event.c,
time.c,
studlab = paste(Author),
data = IRR.data,
sm = "IRR",
method.tau = "DL",
comb.random = TRUE,
comb.fixed = FALSE,
hakn = TRUE)
```
### Pre-calculated effect sizes {#binarymetagen}
We can also synthesize pre-calculated effect sizes based on binary outcome data using the `metagen` function. However, there is an important caveat: binary outcome effect sizes are **not on a natural scale**. For effect sizes such as the Risk Ratio or Odds Ratio, our "zero" for no effect is not 0, but 1 instead, because a value of 1 indicates that the event rates, or the like, are equal between groups. Furthermore, effect size differences are not directly comparable. A Risk Ratio of $0.5$, for example, is not simply the "opposite" of a Risk Ratio of $1.5$. While the first effect size means that the event rate has been halved, the latter does **not** mean that it has doubled.
To bring such effect sizes onto a natural scale, it is common to **log-transform** them before they are pooled. While this is done automatically by the `metabin` and `metainc` functions, we have to do this step ourselves once we use the `metagen` function. Let us have a look at my `bin.metagen` dataset:
```{r, echo=FALSE}
load("_data/bin.metagen.rda")
```
```{r}
bin.metagen
```
This dataset contains the Risk Ratio as well as the lower and upper confidence interval of seven studies. To produce the input needed for the `metagen` function, we first log-transform all the effect size data.
```{r}
bin.metagen$RR <- log(bin.metagen$RR)
bin.metagen$lower <- log(bin.metagen$lower)
bin.metagen$upper <- log(bin.metagen$upper)
```
We can calculate the **Standard Error** `seTE` from the width of the (log-transformed) confidence interval. Because a 95% confidence interval spans roughly $2 \times 1.96 = 3.92$ standard errors, we divide the difference between the upper and lower limit by 3.92:
```{r}
bin.metagen$seTE <- (bin.metagen$upper - bin.metagen$lower)/3.92
```
We now have everything we need to use the `metagen` function for our effect size data. To reconvert effect sizes to their original scale, we specify `sm="RR"` in the function:
```{r}
metagen(RR,
seTE,
studlab = Author,
method.tau = "SJ",
sm = "RR",
data = bin.metagen)
```
```{block,type='rmdinfo'}
Please be aware that the `metagen` function can only use the generic inverse-variance method for pooling, and not the Mantel-Haenszel method. If possible, you should therefore always use the `metabin` (or `metainc`) function.
```
---
## Correlations
![](_figs/birds.jpg)
```{block,type='rmdinfo'}
Performing a meta-analysis of **correlations** is not too different from the methods we described before. Commonly, the **generic inverse-variance pooling** method is also used to combine correlations from different studies into one pooled correlation estimate. When pooling correlations, it is advised to perform **Fisher's $z$-transformation** to obtain accurate weights for each study. Luckily, we do not have to do this transformation ourselves. There is an additional function for meta-analyses of correlations included in the `meta` package, the `metacor` function, which does most of the calculations for us.
```
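For the curious, here is a sketch of the standard formulas applied when the $z$-transformation is used (illustrative calculations with made-up values, not the internal code of `metacor`): each correlation is transformed to Fisher's $z$, pooled on that scale with standard error $1/\sqrt{n-3}$, and the pooled result is transformed back to a correlation.
```{r, eval=FALSE}
r <- 0.456   # hypothetical correlation
n <- 150     # sample size of that study

# Fisher's z-transformation and its approximate standard error
z    <- 0.5 * log((1 + r) / (1 - r))   # equivalently: atanh(r)
se.z <- 1 / sqrt(n - 3)

# Back-transformation of a (pooled) z value to the correlation scale
(exp(2 * z) - 1) / (exp(2 * z) + 1)    # equivalently: tanh(z)
```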
The parameters of the `metacor` function are mostly identical to those of the `metagen` and `metacont` functions we described before (see [Chapter 4.1](#fixed) and [4.2](#random)), as are the statistics behind it. To use the function, we again need a dataset with the study label (`Author`), the **correlation** $r$ (`cor`) reported for each study, and the **sample size** (`n`) for each study. Let us have a look at my `cordata` dataset, which follows this structure:
```{r, echo=FALSE}
cordata = data.frame(Author = c("Mantel1999", "Haenszel1987", "Altman2011", "Olkin1999",
"Borenstein2013", "Egger2001", "Glass2007", "Ioannidis2010"),
cor = c(0.456, 0.234, 0.235, 0.297, 0.374, 0.561, 0.102, 0.377),
n = c(150, 67, 45, 80, 230, 110, 78, 45))
```
```{r}
str(cordata)
```
We can use this dataset to calculate a meta-analysis using the `metacor` function. Like in the chapters before, we will use a **random-effects model** with the **Sidik-Jonkman** estimator for the between-study heterogeneity $\tau^2$. In addition, we set `sm` to `"ZCOR"` to use the $z$-transformed correlations for the meta-analysis (as is advised). The rest of the syntax remains identical. I will call the results of the meta-analysis `m.cor`.
```{r}
m.cor <- metacor(cor,
n,
data = cordata,
studlab = cordata$Author,
sm = "ZCOR",
method.tau = "SJ")
```
Let us have a look at the **output**, which has the same structure as the one of the `metagen`, `metabin` and `metainc` functions.
```{r, echo=FALSE}
m.cor
```
As can be seen from the output, the pooled correlation in this dataset is $r=0.35$ (for the random-effects model), which is significant ($p<0.0001$). The $I^2$-heterogeneity in this analysis is substantial (58%), supporting the use of a random-effects model.
---