
# Moderation

-   Spotlight Analysis: Compare the mean of the dependent variable between the two groups (treatment and control) at specific values of the moderator ([Simple Slopes Analysis])
-   Floodlight Analysis: Spotlight analysis carried out over the whole range of the moderator ([Johnson-Neyman intervals])

Other Resources:

-   `BANOVA` : floodlight analysis on Bayesian ANOVA models

-   `cSEM` : `doFloodlightAnalysis` in SEM model

-   [@spiller2013]

Terminology:

-   Main effects (slopes): coefficients that do not involve interaction terms

-   Simple slope: when a continuous independent variable interacts with a moderating variable, its slope at a particular level of the moderating variable

-   Simple effect: when a categorical independent variable interacts with a moderating variable, its effect at a particular level of the moderating variable.

Example:

$$
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 X \times M
$$

where

-   $\beta_0$ = intercept

-   $\beta_1$ = simple effect (slope) of $X$ (independent variable) when $M = 0$

-   $\beta_2$ = simple effect (slope) of $M$ (moderating variable) when $X = 0$

-   $\beta_3$ = interaction of $X$ and $M$
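
As a short worked step, the conditional slope of $X$ can be read off by collecting the terms that multiply $X$ (the value $m$ and the hats on the estimates are notation introduced here for illustration):

$$
Y = (\beta_0 + \beta_2 M) + (\beta_1 + \beta_3 M) X
$$

so at $M = m$ the simple slope of $X$ is $\beta_1 + \beta_3 m$, and its estimated sampling variance is

$$
\widehat{\text{Var}}(\hat{\beta}_1 + m \hat{\beta}_3) = \widehat{\text{Var}}(\hat{\beta}_1) + m^2 \widehat{\text{Var}}(\hat{\beta}_3) + 2 m \widehat{\text{Cov}}(\hat{\beta}_1, \hat{\beta}_3)
$$

Spotlight analysis evaluates this slope and its standard error at a few chosen values of $m$; floodlight analysis evaluates them over the whole range of $M$.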

Three types of interactions:

1.  [Continuous by continuous]
2.  [Continuous by categorical]
3.  [Categorical by categorical]

When interpreting three-way interactions, one can use the slope difference test [@dawson2006probing].

## emmeans package

```{r, eval=FALSE}
install.packages("emmeans")
```

```{r}
library(emmeans)
```

The data set is from the [UCLA seminar](https://stats.oarc.ucla.edu/r/seminars/interactions-r/), where `gender` and `prog` are categorical.

```{r}
library(dplyr) # provides %>% and mutate()
dat <- readRDS("data/exercise.rds") %>%
    mutate(prog = factor(prog, labels = c("jog", "swim", "read"))) %>%
    mutate(gender = factor(gender, labels = c("male", "female")))
```

### Continuous by continuous

```{r}
contcont <- lm(loss~hours*effort,data=dat)
summary(contcont)
```

Simple slopes for a continuous by continuous model

Spotlight analysis [@aiken2005interaction]: usually pick 3 values of the moderating variable:

-   Mean of the moderating variable + 1 SD of the moderating variable

-   Mean of the moderating variable

-   Mean of the moderating variable - 1 SD of the moderating variable

```{r}
effar <- round(mean(dat$effort) + sd(dat$effort), 1)
effr  <- round(mean(dat$effort), 1)
effbr <- round(mean(dat$effort) - sd(dat$effort), 1)
```

```{r}
# specify list of points
mylist <- list(effort = c(effbr, effr, effar))

# get the estimates
emtrends(contcont, ~ effort, var = "hours", at = mylist)

# plot (a separate `at` list, so the slope test below
# still uses the effort-only list)
plot_list <- list(hours = seq(0, 4, by = 0.4),
                  effort = c(effbr, effr, effar))
emmip(contcont, effort ~ hours, at = plot_list, CIs = TRUE)

# statistical test for slope difference
emtrends(
    contcont,
    pairwise ~ effort,
    var = "hours",
    at = mylist,
    adjust = "none"
)
```

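The 3 p-values of the pairwise slope differences are the same as the p-value of the interaction term (`hours:effort`), because the model is linear in `effort`.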
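
This equality can be checked by hand. The following is a minimal sketch, not part of the original analysis: it uses base R and the built-in `mtcars` data (with `hp` as the predictor and `wt` as the moderator, both chosen purely for illustration) to compute simple slopes from the coefficients and to show that the difference between two simple slopes is a multiple of the interaction coefficient.

```{r}
# Illustrative sketch: mtcars and all object names here are assumptions
spot_fit <- lm(mpg ~ hp * wt, data = mtcars)
spot_b   <- coef(spot_fit)
spot_V   <- vcov(spot_fit)

# Spotlight values of the moderator: mean - 1 SD, mean, mean + 1 SD
wt_vals <- mean(mtcars$wt) + c(-1, 0, 1) * sd(mtcars$wt)

# Simple slope of hp at each value of wt, and its standard error
spot_slope <- unname(spot_b["hp"] + spot_b["hp:wt"] * wt_vals)
spot_se <- sqrt(spot_V["hp", "hp"] +
                wt_vals^2 * spot_V["hp:wt", "hp:wt"] +
                2 * wt_vals * spot_V["hp", "hp:wt"])
data.frame(wt = wt_vals, slope = spot_slope, se = spot_se)

# A difference between two simple slopes is b3 * (difference in wt),
# so its t statistic equals that of the interaction term
slope_diff <- spot_slope[3] - spot_slope[1]
se_diff    <- abs(wt_vals[3] - wt_vals[1]) * sqrt(spot_V["hp:wt", "hp:wt"])
c(t_diff = slope_diff / se_diff,
  t_interaction = unname(coef(summary(spot_fit))["hp:wt", "t value"]))
```

The same t statistic on the same residual degrees of freedom gives the same unadjusted p-value, whichever pair of moderator values is compared.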

For a publication-quality figure, we extract the plotting data from `emmip` and build the plot with `ggplot2`:

```{r}
library(ggplot2)

# data
mylist <- list(hours = seq(0, 4, by = 0.4),
               effort = c(effbr, effr, effar))
contcontdat <-
    emmip(contcont,
          effort ~ hours,
          at = mylist,
          CIs = TRUE,
          plotit = FALSE)
contcontdat$feffort <- factor(contcontdat$effort)
levels(contcontdat$feffort) <- c("low", "med", "high")

# plot
p  <-
    ggplot(data = contcontdat, 
           aes(x = hours, y = yvar, color = feffort)) +  
    geom_line()
p1 <-
    p + 
    geom_ribbon(aes(ymax = UCL, ymin = LCL, fill = feffort), 
                    alpha = 0.4)
p1  + labs(x = "Hours",
           y = "Weight Loss",
           color = "Effort",
           fill = "Effort")
```

### Continuous by categorical

```{r}
# use female as baseline
dat$gender <- relevel(dat$gender, ref = "female")

contcat <- lm(loss ~ hours * gender, data = dat)
summary(contcat)
```

Get simple slopes by each level of the categorical moderator

```{r}
emtrends(contcat, ~ gender, var = "hours")

# test difference in slopes
emtrends(contcat, pairwise ~ gender, var = "hours")
# which is the same as the interaction term
```
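
The link between the coefficients and the group-specific slopes can be verified by hand. This is a minimal sketch using base R and the built-in `mtcars` data (`wt` as the predictor and `am` as a two-level moderator, both assumptions made for illustration): the slope in the reference group is the coefficient on the predictor, and the slope in the other group adds the interaction coefficient.

```{r}
# Illustrative sketch: mtcars and all object names here are assumptions
cc_demo <- transform(mtcars,
                     am = factor(am, labels = c("automatic", "manual")))
cc_fit  <- lm(mpg ~ wt * am, data = cc_demo)
cc_b    <- coef(cc_fit)

# Slope of wt in the reference group (automatic) and in the other group
slope_by_group <- c(
    automatic = unname(cc_b["wt"]),
    manual    = unname(cc_b["wt"] + cc_b["wt:ammanual"])
)
slope_by_group
```

The difference between the two slopes is the interaction coefficient itself, which is why the pairwise test of slopes reproduces the test of the interaction term.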

```{r}
# plot
(mylist <- list(
    hours = seq(0, 4, by = 0.4),
    gender = c("female", "male")
))
emmip(contcat, gender ~ hours, at = mylist, CIs = TRUE)
```

### Categorical by categorical

```{r}
# relevel baseline
dat$prog   <- relevel(dat$prog, ref = "read")
dat$gender <- relevel(dat$gender, ref = "female")
```

```{r}
catcat <- lm(loss ~ gender * prog, data = dat)
summary(catcat)
```

Simple effects

```{r}
emcatcat <- emmeans(catcat, ~ gender*prog)

# differences in predicted values
contrast(emcatcat, 
         "revpairwise", 
         by = "prog", 
         adjust = "bonferroni")
```
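
What these contrasts measure can be sketched with base R. The example below is illustrative only: it uses the built-in `mtcars` data, with `am` and `vs` standing in for the two categorical variables (an assumption, not the exercise data), and computes the simple effect of one factor within each level of the other from the model's predicted cell values.

```{r}
# Illustrative sketch: mtcars and all object names here are assumptions
cars_demo <- transform(
    mtcars,
    am = factor(am, labels = c("automatic", "manual")),
    vs = factor(vs, labels = c("v-shaped", "straight"))
)
catcat_demo <- lm(mpg ~ am * vs, data = cars_demo)

# Predicted value for every cell of the 2 x 2 design
cells <- expand.grid(am = levels(cars_demo$am),
                     vs = levels(cars_demo$vs))
cells$fit <- predict(catcat_demo, newdata = cells)

# Simple effect of am (manual - automatic) within each level of vs
simple_eff <- with(cells, tapply(fit, vs, function(f) f[2] - f[1]))
simple_eff
```

Because the model includes the full interaction, the predicted cell values equal the observed cell means, so each simple effect is a difference between two cell means.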

Plot

```{r}
emmip(catcat, prog ~ gender,CIs=TRUE)
```

Bar graph

```{r}
catcatdat <- emmip(catcat,
                   gender ~ prog,
                   CIs = TRUE,
                   plotit = FALSE)
p <-
    ggplot(data = catcatdat,
           aes(x = prog, y = yvar, fill = gender)) +
    geom_bar(stat = "identity", position = "dodge")

p1 <-
    p + geom_errorbar(
        position = position_dodge(.9),
        width = .25,
        aes(ymax = UCL, ymin = LCL),
        alpha = 0.3
    )
p1  + labs(x = "Program", y = "Weight Loss", fill = "Gender")
```

## probemod package

-   Not recommended: the package has serious problems with subscripts.

```{r, eval = FALSE}
install.packages("probemod")
```

```{r, eval = FALSE}
library(probemod)

myModel <-
    lm(loss ~ hours * gender, data = dat %>% 
           select(loss, hours, gender))
jnresults <- jn(myModel,
                dv = 'loss',
                iv = 'hours',
                mod = 'gender')


pickapoint(
    myModel,
    dv = 'loss',
    iv = 'hours',
    mod = 'gender',
    alpha = .01
)

plot(jnresults)
```

## interactions package

-   Recommended

```{r, eval = FALSE}
install.packages("interactions")
```

### Continuous interaction

-   (at least one of the two variables is continuous)

```{r}
library(interactions)
library(jtools) # for summ()
states <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder + `HS Grad`, data = states)
summ(fiti)
```

For a continuous moderator, the three values chosen are:

-   1 SD above the mean

-   The mean

-   1 SD below the mean

```{r}
interact_plot(fiti,
              pred = Illiteracy,
              modx = Murder,
              
              # if you don't want the plot to mean-center
              # centered = "none", 
              
              # exclude the mean value of the moderator
              # modx.values = "plus-minus", 
              
              # split moderator's distribution into 3 groups
              # modx.values = "terciles" 
              
              plot.points = TRUE, # overlay data
              
              
              # different shape for different levels of the moderator
              point.shape = TRUE, 
              
              # if two data points are on top of one another, 
              # this moves them apart by a small amount
              jitter = 0.1, 
              
              # other appearance option
              x.label = "X label", 
              y.label = "Y label",
              main.title = "Title",
              legend.main = "Legend Title",
              colors = "blue",
              
              # include confidence band
              interval = TRUE, 
              int.width = 0.9, 
              robust = TRUE # use robust SE
              ) 
```

To include weights from the regression in the plot

```{r}
fiti <- lm(Income ~ Illiteracy * Murder,
           data = states,
           weights = Population)

interact_plot(fiti,
              pred = Illiteracy,
              modx = Murder,
              plot.points = TRUE)
```

Partial Effect Plot

```{r}
library(ggplot2) # provides the mpg data set
fitc <- lm(cty ~ year + cyl * displ + class + fl + drv, 
           data = mpg)
summ(fitc)

interact_plot(
    fitc,
    pred = displ,
    modx = cyl,
    # the observed data is based on displ, cyl, and model error
    partial.residuals = TRUE, 
    modx.values = c(4, 5, 6, 8)
)
```

Check the linearity assumption in the model

Plot the lines based on the subsample (red line) and on the whole sample (black line); if the linearity assumption holds, the two lines should roughly coincide.

```{r}
set.seed(123) # make the simulated data reproducible
x_2 <- runif(n = 200, min = -3, max = 3)
w   <- rbinom(n = 200, size = 1, prob = 0.5)
err <- rnorm(n = 200, mean = 0, sd = 4)
y_2 <- 2.5 - x_2 ^ 2 - 5 * w + 2 * w * (x_2 ^ 2) + err

data_2 <- as.data.frame(cbind(x_2, y_2, w))

model_2 <- lm(y_2 ~ x_2 * w, data = data_2)
summ(model_2)
interact_plot(
    model_2,
    pred = x_2,
    modx = w,
    linearity.check = TRUE,
    plot.points = TRUE
)
```

#### Simple Slopes Analysis

-   continuous by continuous variable interaction (also works with a binary moderator)

-   conditional slope of the variable of interest (i.e., the slope of $X$ when we hold $M$ constant at a value)

`sim_slopes` will

-   mean-center all variables except the variable of interest

-   For a moderator that is

    -   Continuous, pick the mean and plus/minus 1 SD

    -   Categorical, use all factor levels

`sim_slopes` requires

-   A regression model with an interaction term

-   Variable of interest (`pred =`)

-   Moderator (`modx =`)

```{r}
sim_slopes(fiti,
           pred = Illiteracy,
           modx = Murder,
           johnson_neyman = FALSE)

# plot the coefficients
ss <- sim_slopes(fiti,
                 pred = Illiteracy,
                 modx = Murder,
                 modx.values = c(0, 5, 10))
plot(ss)

# table 
ss <- sim_slopes(fiti,
                 pred = Illiteracy,
                 modx = Murder,
                 modx.values = c(0, 5, 10))
library(huxtable)
as_huxtable(ss)
```

#### Johnson-Neyman intervals

To find all values of the moderator for which the slope of the variable of interest is statistically significant, we can use the Johnson-Neyman interval [@johnson1936tests].

It has long been known that the nominal alpha level of the Johnson-Neyman interval is not correct [@bauer2005probing], but a correction for the type I and II errors became available only recently [@esarey2018marginal].

Because the Johnson-Neyman procedure makes comparisons across all values of the moderator, it inflates the type I error rate; setting `control.fdr = TRUE` applies the correction:

```{r}
sim_slopes(
    fiti,
    pred = Illiteracy,
    modx = Murder,
    johnson_neyman = TRUE,
    
    # correction for type I and II errors
    control.fdr = TRUE,
    
    # include conditional intercepts
    # cond.int = TRUE, 
    
    # robust SE
    robust = "HC3",
    
    # don't mean-center non-focal variables
    # centered = "none",
    jnalpha = 0.05
)
```

For plotting, we can use `johnson_neyman`

```{r}
johnson_neyman(fiti,
               pred = Illiteracy,
               modx = Murder,
               
               # correction for type I and II
               control.fdr = TRUE, 
               alpha = .05)
```

Note:

-   y-axis is the **conditional slope** of the variable of interest
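
The boundaries of a Johnson-Neyman interval can also be approximated by hand, which may help when reading the output above. The sketch below is illustrative only: it uses base R and the built-in `mtcars` data (`hp` as the predictor, `wt` as the moderator, both assumptions), it is not the code that `interactions` runs, and it omits the false discovery rate adjustment. It solves for the moderator values at which the simple slope's t statistic equals the critical value.

```{r}
# Illustrative sketch: mtcars and all object names here are assumptions
jn_fit <- lm(mpg ~ hp * wt, data = mtcars)
jn_b   <- coef(jn_fit)
jn_V   <- vcov(jn_fit)
jn_tc  <- qt(0.975, df = df.residual(jn_fit)) # two-sided, alpha = 0.05

# Slope of hp at wt = m:  b1 + b3 * m
# Its variance:           V11 + 2 * m * V13 + m^2 * V33
# Boundaries solve (b1 + b3 * m)^2 = tc^2 * variance,
# i.e. the quadratic qa * m^2 + qb * m + qc = 0
qa <- unname(jn_b["hp:wt"]^2) - jn_tc^2 * jn_V["hp:wt", "hp:wt"]
qb <- 2 * (unname(jn_b["hp"] * jn_b["hp:wt"]) - jn_tc^2 * jn_V["hp", "hp:wt"])
qc <- unname(jn_b["hp"]^2) - jn_tc^2 * jn_V["hp", "hp"]
disc <- qb^2 - 4 * qa * qc

jn_bounds <- if (disc >= 0) {
    sort((-qb + c(-1, 1) * sqrt(disc)) / (2 * qa))
} else {
    numeric(0) # no real roots: significance does not change over m
}
jn_bounds
```

Whether the slope is significant inside or outside the two boundaries depends on the sign of `qa`, and boundaries that fall outside the observed range of the moderator are of little practical interest.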

#### 3-way interaction

```{r}
# fita3 <-
#     lm(rating ~ privileges * critical * learning, 
#        data = attitude)
# 
# probe_interaction(
#     fita3,
#     pred = critical,
#     modx = learning,
#     mod2 = privileges,
#     alpha = .1
# )


mtcars$cyl <- factor(mtcars$cyl,
                     labels = c("4 cylinder", "6 cylinder", "8 cylinder"))
fitc3 <- lm(mpg ~ hp * wt * cyl, data = mtcars)
interact_plot(fitc3,
              pred = hp,
              modx = wt,
              mod2 = cyl) +
    theme_apa(legend.pos = "bottomright")
```

Johnson-Neyman 3-way interaction

```{r}
library(survey)
data(api)

dstrat <- svydesign(
    id = ~ 1,
    strata = ~ stype,
    weights = ~ pw,
    data = apistrat,
    fpc = ~ fpc
)

regmodel3 <-
    survey::svyglm(api00 ~ avg.ed * growth * enroll, design = dstrat)

sim_slopes(
    regmodel3,
    pred = growth,
    modx = avg.ed,
    mod2 = enroll,
    jnplot = TRUE
)
```

Report

```{r}
ss3 <-
    sim_slopes(regmodel3,
               pred = growth,
               modx = avg.ed,
               mod2 = enroll)
plot(ss3)
as_huxtable(ss3)
```

### Categorical interaction

```{r}
library(ggplot2) # provides the mpg data set
library(dplyr)   # provides %>% and mutate()
mpg2 <- mpg %>% 
    mutate(cyl = factor(cyl))

mpg2["auto"] <- "auto"
mpg2$auto[mpg2$trans %in% c("manual(m5)", "manual(m6)")] <- "manual"
mpg2$auto <- factor(mpg2$auto)
mpg2["fwd"] <- "2wd"
mpg2$fwd[mpg2$drv == "4"] <- "4wd"
mpg2$fwd <- factor(mpg2$fwd)
## Drop the cars with 5 cylinders (rest are 4, 6, or 8)
mpg2 <- mpg2[mpg2$cyl != "5", ]
## Fit the model
fit3 <- lm(cty ~ cyl * fwd * auto, data = mpg2)

library(jtools) # for summ()
summ(fit3)
```

```{r}
cat_plot(fit3,
         pred = cyl,
         modx = fwd,
         plot.points = TRUE)
# line plots
cat_plot(
    fit3,
    pred = cyl,
    modx = fwd,
    geom = "line",
    point.shape = TRUE,
    # colors = "Set2", # choose color
    vary.lty = TRUE
)


# bar plot
cat_plot(
    fit3,
    pred = cyl,
    modx = fwd,
    geom = "bar",
    interval = T,
    plot.points = TRUE
)
```

## interactionR package

-   For publication purposes
-   Following
    -   [@knol2012recommendations] for presentation

    -   [@hosmer1992confidence] for confidence intervals based on the delta method

    -   [@zou2008estimation] for variance recovery "mover" method

    -   [@assmann1996confidence] for bootstrapping

```{r, eval = FALSE}
install.packages("interactionR")
```

## sjPlot package

-   For publication purposes (recommended, but more advanced)

-   [sjPlot vignette on plotting interactions](https://strengejacke.github.io/sjPlot/articles/plot_interactions.html)