extensions.Rmd

```{r chunk_setup-ext, include=FALSE, eval=TRUE}
knitr::opts_chunk$set(
  # cache
  cache.rebuild = FALSE,
  cache         = TRUE
)
```


# Common Extensions

<br>

```{r top-plot-crossed, echo=FALSE, cache.rebuild=TRUE}
tags$div(
  style = "width:50%; margin:auto auto; font-size:50%",
  DiagrammeR::grViz(
    'scripts/crossed.gv', 
    width = '100%', 
    height = '33%'
  )
)
```

<br>

## Additional Grouping Structure


### Cross-classified models

Oftentimes there will be additional sources of variance beyond one grouping factor. Consider as an example, a visual perception experiment where there are multiple trials for each individual, to go along with specific images displayed to all individuals.  Such data might look like this.

```{r demodata_crossed, echo=FALSE}
crossing(Person = 1:20, Image = letters[1:10]) %>%
  mutate(Score = sample(1:10, 200, replace = TRUE)) %>%
  DT::datatable(
    options = list(
      dom = 'tp',
      autoWidth = F,
      columnDefs = list(
        list(width = '10px', targets = 0:2),
        list(className = 'dt-center', targets = 0:2)
      )
    ),
    rownames = F,
    width = 250
  )
```

<br>
<br>

In this scenario, we have observations clustered within both person and image, but person and image are not nested within one another- all participants see all 10 images.  Such a situation is typically referred to as having *crossed* random effects, which pretty much just means non-nested, or even more to the point, 'default'.  The larger issue for the situations we'll look at next is that we will have multiple sources variances to consider.


#### Example: Student achievement

For this demonstration we'll look at achievement scores for students.  The sources of dependency are due to students having gone to the same primary or secondary schools.  However, in this example, going to a primary school doesn't necessarily mean you'll go to a specific secondary school.  Note also that there are no repeated measures, we see each student only once. Here's a quick look a the data, and for more detail, check the [appendix][Data].


```{r pupil_nurses_setup, echo=FALSE, eval=FALSE}
pupils = read_sav('data/raw_data/joop_hox_data2/9 CrossClass/pupcross.sav') %>%
  as_factor() %>%
  mutate(ACHIEV = as.numeric(as.character(ACHIEV)),
         PUPSEX = factor(PUPSEX, labels = c('male', 'female'))) %>%
  rename(
    achievement = ACHIEV,
    primary_school_id = PSCHOOL,
    secondary_school_id = SSCHOOL,
    sex = PUPSEX,
    ses = PUPSES,
    primary_denominational = PDENOM,
    secondary_denominational = SDENOM
  )  

save(pupils, file='data/pupils.RData')

nurses = read_sav('data/raw_data/joop_hox_data2/2 Basic Model/nurses.sav') %>%
  as_factor() %>%
  rename(experience = experien,
         treatment = expcon,
         sex = gender) %>%
  mutate(
    treatment = factor(treatment, labels = c('Ctrl', 'Training')),
    age = as.numeric(as.character(age)),
    sex = factor(sex, labels = c('Male', 'Female'))
  ) %>%
  select(-starts_with('Z'),-starts_with('C'))

save(nurses, file = 'data/nurses.RData')
```


```{r examine_pupil_data, echo=1}
load('data/pupils.RData')

DT::datatable(pupils,
              options = list(
                dom = 'tp',
                scrollX = T,
                autoWidth = T
              ),
              # columnDefs = list(list(width = '150px', targets = 1),
              #                   list(width = '100px', targets = 3))),
              rownames = F)
```
<br>
<br>

For our mixed model, we'll look at the effects of `sex` and socioeconomic status, `ses`, a six level variable from low to high, on scholastic achievement.  The range of achievement scores is roughly `r round(min(pupils$achievement))` to `r round(max(pupils$achievement))`, with mean of `r round(mean(pupils$achievement), 1)` and standard deviation `r round(sd(pupils$achievement), 1)`.  We'll take into account the clustering at primary school and secondary school.  To incorporate the additional structure in <span class="pack">lme4</span> syntax is very easy- we just do as we did before, though now for both grouping factors[^crossed_notation].

```{r cross_classified, eval=1}
pupils_crossed = lmer(
  achievement ~ 
    sex 
  + ses 
  + (1|primary_school_id) 
  + (1|secondary_school_id),
  data = pupils
)

summary(pupils_crossed, correlation = FALSE)
```


```{r cross_classified_fixed, echo=FALSE}
pupils_crossed %>% 
  extract_fixed_effects() %>% 
  select(-(t:p_value)) %>% 
  kable_df()
```

The fixed effects tell us there is a positive effect of being female on achievement, and in general, relative to lowest SES category, being in the upper categories of SES also has a positive effect.  Now for the random effects.

```{r  cross_classified_random, echo=FALSE}
crossed_var_cor = extract_vc(pupils_crossed, ci_level = 0) 

crossed_var_cor %>%
  kable_df(digits = 2)
```

When we look at the variance components, we see that primary and secondary school contributes about `r perc(sum(crossed_var_cor$variance[1:2])/sum(crossed_var_cor$variance), digits = 0)` of the total variance.  Most of the variance attributable to school comes from the primary school.

If we inspect the random effects, we can see that we now have two sets of effects- `r n_distinct(pupils$primary_school_id)` for the primary schools, and `r n_distinct(pupils$secondary_school_id)` for the secondary.  Both would be incorporated into any pupil-specific prediction.

```{r crossed_re}
glimpse(ranef(pupils_crossed))
```

Let's look at them visually using <span class="pack">merTools</span>.

```{r crossed_re_plot, echo=FALSE}
p = merTools::plotREsim(merTools::REsim(pupils_crossed)) +
  theme_clean()
p

# multiplots with patchwork still do non-transparent background
# library(patchwork)
# p1 = plot_coefficients(pupils_crossed, ranef=T, which_ranef='primary_school_id') +
#   labs(x='Primary School') +
#   lims(y = c(-1.5,1.5)) +
#   theme(axis.ticks = element_blank(),
#         axis.text.x = element_blank())
# p2 = plot_coefficients(pupils_crossed, ranef=T, which_ranef='secondary_school_id') +
#   labs(x='Secondary School', y='') +
#   lims(y = c(-1.5,1.5)) +
#   theme(axis.ticks = element_blank(),
#         axis.text.x = element_blank(),
#         axis.text.y = element_blank())
# {p1 + p2} 
```


The above visualization allows us to actually see less variability among students for secondary schools. Note that we have the usual extensions here if desired.  As an example, we could also do random slopes for student characteristics.

### Hierarchical structure

Now that we have looked at cross-classified models, we can proceed to examine *hierarchical* cluster structuring.  In this situation, we have clusters nested within other clusters, which may be nested within still other clusters.  A typical example might be cities within counties, and counties within states.

#### Example: Nurses and stress

For our demonstration we'll use the nurses data set. Here we are interested in the effect of a training program (`treatment`) on stress levels (on a scale of 1-7) of nurses.  In this scenario, nurses are nested within wards, which themselves are nested within hospitals, so we will have random effects pertaining to ward (within hospital) and hospital. For more information see the [appendix][Data].

```{r nurses_data, echo=1}
load('data/nurses.RData')

DT::datatable(nurses,
              options = list(
                dom = 'tp',
                scrollX = T,
                autoWidth = T
              ),
              # columnDefs = list(list(width = '150px', targets = 1),
              #                   list(width = '100px', targets = 3))),
              rownames = F)
```

<br>
<br>

For the model, we examine effects of the treatment, as well as several other covariates, at least one  each at the nurse, ward, and hospital levels. Again, when it comes to the fixed effects portion, you can simply think about that part as you would any standard regression, we just add covariates as theory/exploration would suggest.  To incorporate this type of random effects structure is not too different from the cross-classified approach, but does have a slight change to the syntax.

```{r hierarchical, eval=1:2}
nurses_hierarchical = lmer(
  stress ~ 
    age 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize
  + (1 | hospital) 
  + (1 | hospital:ward),
  data = nurses
)

# same thing!
nurses_hierarchical = lmer(
  stress ~ 
    age 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1 | hospital/ward), 
  data = nurses
) 

summary(nurses_hierarchical, correlation = F)
```

```{r hierarchical_fixed, echo=FALSE}
nurses_hierarchical %>% 
  extract_fixed_effects() %>% 
  select(-(t:p_value)) %>% 
  kable_df()
```

<br>

As far as the fixed effects go, about the only thing that doesn't have a statistical effect is ward type[^signflip].  

<br>

```{r hierarchical_random, echo=FALSE}
# note tidy doesn't work with multiple random effects and conf.int
hierarch_var_cor = extract_vc(nurses_hierarchical, ci_level = 0)

hierarch_var_cor %>%
  kable_df()
```

Concerning the random effects, there appears to be quite a bit of variability from ward to ward especially, but also hospital.  Recall that stress is a 7 point scale, so from ward to ward we can expect scores to bounce around about half a  point on average, which is quite dramatic in my opinion. Again we inspect it visually.

```{r hierarchical_re_plot, echo=FALSE}
p = merTools::plotREsim(merTools::REsim(nurses_hierarchical)) +
  theme_clean()
p
```


### Crossed vs. nested

The following shows the difference in the results from treating ward as a nested (within hospital) vs. crossed random effect. What do you notice is different in the result of these two models?

```{r crossed_vs_nested, echo=1:3}
nurses_hierarchical = lmer(
  stress ~ 
    age  
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1 | hospital)
  + (1 | hospital:wardid),  #**#
  data = nurses
)

nurses_crossed = lmer(
  stress ~ 
    age 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1 | hospital) 
  + (1 | wardid),           #**#
  data = nurses
)

hierarch_var_cor %>%  
  kable_df()

crossed_var_cor = extract_vc(nurses_crossed, ci_level = 0)

crossed_var_cor %>%  
  kable_df()
```


Nothing? Good, you're not crazy.  Here's a quote from the [lme4 text](http://lme4.r-forge.r-project.org/book/Ch2.pdf), section 2.2.1.1, which is definitely worth your time.

> The blurring of mixed-effects models with the concept of multiple,
hierarchical levels of variation results in an unwarranted emphasis on 'levels'
when defining a model and leads to considerable confusion. It is perfectly
legitimate to define models having random effects associated with non-nested
factors. The reasons for the emphasis on defining random effects with respect to
nested factors only are that such cases do occur frequently in practice, and that
some of the computational methods for estimating the parameters in the models
can only be easily applied to nested factors. 
> 
This is not the case for the methods used in the lme4 package. *Indeed there is
nothing special done for models with random effects for nested factors*. When
random effects are associated with multiple factors, exactly the same
computational methods are used whether the factors form a nested sequence or are
partially crossed or are completely crossed.

You might have noticed that we were using `wardid` rather than the `ward` grouping variable as in our first example. Even though every ward is unique, the `ward` column labels them with an arbitrary sequence starting with 1 within each hospital.  While this might seem natural, ward 1 in hospital 1 is not the same as ward 1 in hospital 2, so it's probably not a good idea to give them the same label. The `wardid` column properly distinguishes the wards with unique values (e.g. 11, 12).

What would have happened had we used that variable as a crossed random effect?

```{r bad_cross, echo=1}
nurses_crossed_bad_data = lmer(
  stress ~ 
    age 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1|hospital) 
  + (1|ward), 
  data = nurses
)

extract_vc(nurses_crossed_bad_data, ci_level = 0) %>%
  kable_df()
```

This is certainly not the result we want. The variance in `ward` is already captured by treatment and type. However, as demonstrated, this can be avoided with the proper syntax, or proper labeling in the data to allow unique clusters to have unique identifiers.

For more, see this [discussion on Stack Exchange](https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified), as well as [this from the GLMM FAQ](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#nested-or-crossed) from one of the <span class="pack">lme4</span> developers. Josh Errickson at the University of Michigan's CSCAR group also has a nice [write-up with visual depiction](http://errickson.net/stats-notes/vizrandomeffects.html) of the underlying matrices of interest, which served as inspiration for some of the visualization in the next section.

So there you have it.  When it comes to <span class="pack">lme4</span>, or mixed models more generally, crossed vs. nested is simply a state of mind (data)[^crossnest].


## Residual Structure

Sometimes we will want to obtain more specific estimates regarding the residual covariance/correlation structure. This is especially the case in the longitudinal setting, where we think that observations closer in time would be more strongly correlated than those further apart, or that the variance changes over time.  What does this sort of model look like? 

Let's begin by thinking about the covariance/correlation matrix for the entire set of observations for our target variable, and how we want to represent the dependency in those observations.  I'll show a visualization of the first 5 people from our GPA data and modeling situation. Recall that each person has 6 observations.  This display regards the results from our random intercepts (only) model for GPA. 

```{r residual_varcov, echo=FALSE}
rescov <- function(model, data) {
  var.d <- crossprod(getME(model,"Lambdat"))
  Zt <- getME(model,"Zt")
  vr <- sigma(model)^2
  var.b <- vr*(t(Zt) %*% var.d %*% Zt)
  sI <- vr * Diagonal(nrow(data))
  var.y <- var.b + sI
  invisible(var.y)
}
gpa_vis_cov = lmer(gpa ~ occasion + (1|student), data=gpa)
# summary(gpa_vis_cov)

rc1 <- rescov(gpa_vis_cov, gpa)
# image(rc1[1:60,1:60])
rc1[1:30, 1:30] %>%
  as.matrix() %>%
  reshape2::melt() %>%
  mutate(value = factor(round(value, 5))) %>%
  ggplot(aes(Var1, Var2, fill = value)) +
  geom_tile(color = 'gray92', size = 1) +
  scale_y_continuous(trans = 'reverse') +
  scale_fill_manual(values = c('gray92', '#00aaff80', '#ff550080')) +
  theme_void() +
  theme(
    legend.key = ggplot2::element_rect(fill = 'transparent', colour = NA),
    legend.background = ggplot2::element_rect(fill = 'transparent', colour = NA),
    panel.background = ggplot2::element_blank(),
    panel.grid = ggplot2::element_blank(),
    strip.background = ggplot2::element_blank(),
    plot.background = ggplot2::element_rect(fill = "transparent", colour = NA)
  )

  
# ggplotly()

# library(plotly)
# library(viridis)
# plot_ly(z =~ as.matrix(rc1[1:30, 1:30]), type='contour', colors = c('gray92','#00aaff80', '#ff550080')) %>%
#   layout(yaxis = list(autorange = "reversed"))
```

Each block represents the covariance matrix pertaining to observations within an individual. Within the person, there are variances on the diagonal and covariances of observations on the off-diagonal.  When considering the whole data, we can see that observations from one person have no covariance with another person (gray). Furthermore, the covariance within a person is a constant value, the variance is also a constant value.  Where did those values come from? Let's look again at the result from our random intercept model.

```{r residual_varcov2, echo=F}
vc = extract_vc(gpa_vis_cov, ci_level = 0)

totvar = sum(vc$variance)

vc %>%  
  kable_df()
```

Remember that there are two sources of variance in this model, the residual observation level variance, and that pertaining to person. Combined they provide the total residual variance that we aren't already capturing with our model features (i.e. fixed effect variables).  In this case, it's about `r round(totvar, 2)`, the value displayed on our diagonal. The off-diagonal is the variance attributable to student, which we alternately interpreted as an intraclass correlation (dividing by the total variance converts it to the correlation metric).

More generically, and referring to previous notation for our estimated variances, we can see the covariance matrix (for a cluster) as follows.

$$\Sigma = 
\left[
\begin{array}{ccc} 
\color{orange}{\sigma^2 + \tau^2} & \tau^2   & \tau^2  & \tau^2 & \tau^2 & \tau^2   \\
\tau^2   & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2    \\
\tau^2   & \tau^2   & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2  \\
\tau^2   & \tau^2   & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2\\
\tau^2   & \tau^2   & \tau^2  & \tau^2 & \color{orange}{\sigma^2 + \tau^2}  & \tau^2 \\
\tau^2   & \tau^2   & \tau^2  & \tau^2   & \tau^2  & \color{orange}{\sigma^2 + \tau^2} \\
\end{array}\right]$$
<br>

This represents a covariance structure called *compound symmetry*, where correlations, the off-diagonal values, are constant. It is the default in most mixed model settings, and the same as what is shown visually above. Now let's start to think about other types of covariance structures. 

Consider the following model for an individual and just three time points to keep things simpler to show.

$$\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

So we have three observations of $y$ that are multivariate normally distributed.  The mean $\mu$ is a function of covariates just like in standard regression.

$$\mu = b_0 + b_1\cdot \mathrm{time} + b_2\cdot x_1 ...$$

However, instead of just plopping an $\epsilon$ at the end, we want to go further in defining the entire residual variance/covariance structure for all three time points.

In the simplest setting of a standard linear regression model, we have constant variance and no covariance.

$$\Sigma = 
\left[
\begin{array}{ccc} 
\sigma^2 & 0   & 0   \\
0   & \sigma^2 & 0   \\
0   & 0   & \sigma^2 \\
\end{array}\right]$$

Next, we can relax the assumption of equal variances, and estimate each separately. In this case of *heterogeneous variances*, we might see more or less variance over time, for example.

$$\Sigma = 
\left[
\begin{array}{ccc} 
\sigma_1^2 & 0   & 0   \\
0   & \sigma_2^2 & 0   \\
0   & 0   & \sigma_3^2 \\
\end{array}\right]$$

Now let's say we actually want to get at the underlying covariance/correlation. I'll switch to the correlation representation, but you can still think of the variances as constant or separately estimated.  So now we have something like this, where $\rho$ represents the residual correlation among observations.

$$\Sigma = \sigma^2
\left[
\begin{array}{ccc} 
1 & \rho_1   & \rho_2   \\
\rho_1   & 1 & \rho_3   \\
\rho_2   & \rho_3   & 1 \\
\end{array}\right]$$


In this case we'd estimate a different correlation for all time point pairs (with constant variance).  This is typically described as an *unstructured*, or simply 'symmetric', correlation structure. 

If you are familiar with repeated measures ANOVA, which is a [special case of a mixed model](https://m-clark.github.io/docs/mixedModels/anovamixed.html), you may recall that the usual assumption is a *sphericity*. Sphericity is a relaxed form of *compound symmetry*, where all the correlations have the same value, i.e. $\rho_1=\rho_2=\rho_3$, and all variances are equal.

Another very commonly used correlation structure (for time-based settings) is an *autocorrelation* structure, of lag order one, for the residuals.  What this means is that we assume the residuals at one time point apart correlate with some value $\rho$, observations at two time points apart correlate $\rho^2$, and so on.  As such we only need to estimate $\rho$, while the rest are then automatically determined.  Here's what it'd look like for four time points.

$$\Sigma = \sigma^2
\left[
\begin{array}{cccc} 
1 & \rho     & \rho^2   & \rho^3   \\
\rho     & 1 & \rho     & \rho^2   \\
\rho^2   & \rho     & 1 & \rho     \\
\rho^3   & \rho^2   & \rho     & 1 \\
\end{array}\right]$$


If our model estimated $\rho$ to be .5, the resulting correlation matrix would look like the following within each cluster.

$$\Sigma = \sigma^2
\left[
\begin{array}{cccc} 
1 & .5       & .25      & .06   \\
.5       & 1 & .5       & .25  \\
.25      & .5       & 1 & .5    \\
.06      & .25      &  .5      & 1 \\
\end{array}\right]$$

Again, the main point is that points further apart in time are assumed to have less correlation. 

Know that there are many patterns and possibilities to potentially consider, and that they are not limited to the repeated measures scenario.  For example, the correlation could represent spatial structure, where units closer together geographically would be more correlated.  And as noted, we could also have variances that are different at each time point[^residstruct].  We'll start with that for the next example.

### Heterogeneous variance

Unfortunately, <span class="pack">lme4</span> does not provide the ability to model the residual covariance structure, at least [not in a straightforward fashion](https://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html), though many other mixed model packages do[^lmerho].  In fact, two packages that come with the basic R installation do so, <span class="pack">mgcv</span> and <span class="pack">nlme</span>.  We'll demonstrate with the latter. 

The <span class="pack">nlme</span> package will have a different random effect specification, though not *too* different.  In addition, to estimate heterogeneous variances, we'll need to use an additional `weights` argument.  The following will allow each time point of occasion to have a unique estimate.


```{r heterovar, echo=1:5, eval=-5}
library(nlme)

heterovar_res = lme(
  gpa ~ occasion,
  data = gpa,
  random = ~ 1 | student,
  weights = varIdent(form = ~ 1 | occasion)
)

summary(heterovar_res)

extract_fixed_effects(heterovar_res) %>% 
  kable_df()

vc = extract_vc(heterovar_res, ci_level = 0)
vc %>% 
  kable_df()
```


At this point we're getting the same stuff we're used to. Now the not-so fun part.  For the values we're interested in for this example, i.e. the variances at each occasion, <span class="pack">nlme</span> does not make it easy on someone to understand initially, as the output regards the way things are for estimation, not for what one would usually have to report.  The variances are scaled relative to the first variance estimate, which is actually the reported residual variance in the random effects part. Additionally the values are also on the standard deviation rather than variance scale.  From the default output display, we can see that variance decreases over time in this case, but the actual values are not provided.

```{r heterovar_variances}
summary(heterovar_res$modelStruct)
```

Relative values are fine, I guess, but what we'd want are the actual estimates.  Here's how you can get them using the residual standard deviation to scale those values, then square them to get on the variance scale.

```{r heterovar_extract_vars_from_nlmes_cold_dead_hands}
(c(1.0000000, coef(heterovar_res$modelStruct$varStruct, unconstrained = FALSE))*heterovar_res$sigma)^2
```

Yeah.  You'll have to look this up every time you want to do it, or just make your own function that takes the model input.  Here is how my function would extract the information.

```{r mixedup-extract_het_var}
mixedup::extract_het_var(heterovar_res, scale = 'var')
```

Another alternative to keep in mind is <span class="pack">glmmTMB</span>, which is provided by the primary developer of <span class="pack" style = "">lme4</span>.  It would allow one to stay more in the <span class="pack">lme4</span> syntax style for similar models, and provide similar looking output. In this case we have a special syntax for the case of heterogeneous variances, specifically using tthe <span class="func" style = "">diag</span> function within the model formula.

```{r glmmTMB_hetero}
library(glmmTMB)

heterovar_res2 = glmmTMB(
  gpa ~ occasion + (1|student) + diag(0 + occas |student), 
  data = gpa
)

summary(heterovar_res2)
```

Note that the variances displayed for each time point are not conflated with the residual variance. To compare with <span class="pack">nlme</span>, just add the residual variance to those estimates.  As with every mixed model package (apparently), it still takes a bit to get the variances in a usable form from the <span class="func">VarCorr</span> object. The following shows how though, and compares to the <span class="pack">nlme</span> result.

```{r extract-het-var-glmmtmb, eval = FALSE}
vc_glmmtmb = VarCorr(heterovar_res2)
vc_glmmtmb = attr(vc_glmmtmb$cond$student.1, 'stddev')^2 + sigma(heterovar_res2)^2
```

Here is the output from my function:

```{r mixedup-extract_het_var-tmb}
mixedup::extract_het_var(heterovar_res2, scale = 'var')
```

In any case, putting these together shows we get the same result.

```{r glmmTMB_hetero_compare_nlme, echo=F}
vc_glmmtmb = extract_het_var(heterovar_res2, scale = 'var')[-1] # remove group id
vc_nlme = extract_het_var(heterovar_res, scale = 'var')

names(vc_glmmtmb) = names(vc_nlme) = 
  unite(expand_grid(paste('year', 1:3), paste('sem', 1:2)), 'x', everything(), sep = ' ')$x

vc_compare = rbind(glmmTMB = vc_glmmtmb, nlme = vc_nlme) 

vc_compare %>% 
  kable_df()
```


### Autocorrelation

The following example demonstrates the autocorrelation structure we described previously. In <span class="pack">nlme</span> we use the built-in <span class="func">corAR1</span> function and `correlation` argument similar to how we did with the `weights` argument for heterogeneous variances.  

```{r corr_residual, echo=1:5, eval=-5}
library(nlme)

corr_res = lme(
  gpa ~ occasion,
  data = gpa,
  random = ~ 1 | student,
  correlation = corAR1(form = ~ occasion)
)

summary(corr_res)

extract_fixed_effects(corr_res) %>% 
  kable_df()

vc = extract_vc(corr_res, ci_level = 0)
vc %>% 
  kable_df()
```
<br>


Notice first that the fixed effect for occasion is the same as [before][Mixed model]. The variance estimates have changed slightly along with the variances of the fixed effects (i.e. the standard errors). The main thing is that we have a new parameter called `Phi` in the <span class="pack">nlme</span> output that represents our autocorrelation, with value of `r round(coef(corr_res$model$corStruct, unconstrained = F), 3)`. This suggests at least some correlation exists among the residuals for observations next to each other in time, though it diminishes quickly as observations grow further apart.

Note that glmmTMB will analyze such structure as well. However, we need the factor form for occasion for this specification `occas`, and also note how it is part of model formula, like another random effect. More on this in the [supplemental section][Correlation Structure Revisited].

```{r corr_residual_glmmTMB, eval=FALSE, echo=1:4}
corr_res_tmb = glmmTMB(
  gpa ~ occasion +  ar1(0 + occas | student) + (1 | student),
  data = gpa
)

corr_res_brm = brms::brm(
  gpa ~ occasion +  ar(occasion, student) + (1 | student),
  data = gpa,
  cores = 4
)

summarise_model(corr_res)
summary(corr_res_tmb)
summarise_model(corr_res_brm)

extract_cor_structure(corr_res_tmb, which_cor = 'ar1')
extract_cor_structure(corr_res_brm)
extract_cor_structure(corr_res)
```

We can use my package function to extract the autocorrelation parameter from these types of models.

```{r extract-corr}
mixedup::extract_cor_structure(corr_res)
```

```{r glmmTMB_autocor, eval=FALSE, echo=FALSE}
# currently as of 2018-2 there is a bug in VarCorr at the end where it calls
# structure() when adding random intercept. Thus the model runs but one can't do
# anything with it because practically everything calls VarCorr; Inspection
# suggests the estimate (0.80877943) duplicates
# https://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html and brms
# gets .83

# update 2020-09  brms and nlme are notably closer with gpa data using brms new approach to ar structure, while glmmTMB estimates no student variance and correlation value almost double the others with the gpa data

simCor1 <- function(
  phi = 0.8,
  sdgrp = 2,
  sdres = 1,
  npergrp = 20,
  ngrp = 20,
  seed = NULL,
  ## set linkinv/simfun for GLMM sims
  linkinv = identity,
  simfun = identity
) {
  
  if (!is.null(seed))
    set.seed(seed)
  cmat <- sdres * phi ^ abs(outer(0:(npergrp - 1), 0:(npergrp - 1), "-"))
  errs <- MASS::mvrnorm(ngrp, mu = rep(0, npergrp), Sigma = cmat)
  ranef <- rnorm(ngrp, mean = 0, sd = sdgrp)
  d <- data.frame(f = rep(1:ngrp, each = npergrp))
  eta <-
    ranef[as.numeric(d$f)] + c(t(errs)) ## unpack errors by row
  mu <- linkinv(eta)
  d$y <- simfun(mu)
  d$tt <- factor(rep(1:npergrp, ngrp))
  return(d)
}

d <- simCor1(
  phi = 0.8,
  sdgrp = 2,
  sdres = 1,
  seed = 101
)

corr_res2 = glmmTMB(gpa ~ occasion + (1 | student) + ar1(0 + occasion | student),
                    data = gpa)
summary(corr_res2)
VarCorr(corr_res2, sigma = .16707)
corr_res2$obj$env$report(corr_res2$fit$parfull)$corr[[2]]

test = brm(
  gpa ~ occasion + (1 | student),
  data = gpa,
  autocor = cor_ar(form = ~ occas),
  cores = 4
)

(lme_simple_fit <- lme(
  y ~ 1,
  random =  ~ 1 | f,
  data = d,
  correlation = corAR1()
))
glmmTMB_simple_fit <-
  glmmTMB(y ~ 1 + (1 | f) + ar1(tt - 1 | f), data = d, family = gaussian)
glmmTMB_simple_fit

glmmTMB_simple_fit$obj$env$report(glmmTMB_simple_fit$fit$parfull)$corr[[2]][2, 1]
corr_res2$obj$env$report(corr_res2$fit$parfull)$corr[[2]][2, 1]
```


## Generalized Linear Mixed Models

Just as generalized linear models extend the standard linear model, we can generalize (linear) mixed models to *generalized linear mixed models*.  Furthermore, there is nothing restricting us to only the exponential family, as other packages would potentially allow for many other target distributions.

For this example, we'll do a logistic mixed model with a binary outcome. In this case, we'll use the speed dating data set. In this demonstration, the experimental setting had randomly assigned each participant to ten short (four minutes) dates with other participants. For each date, each person rated six attributes (attractive, sincere, intelligent, fun, ambitious, shared interests) of the other person on a 10-point scale and wrote down whether he or she would like to see the other person again.

Our target variable is whether the participant would be willing to date the person again (`decision`, recorded as Yes or No).  To keep things simple, the predictors will be limited to the sex of the participant (`sex`), whether the partner was of the same race (`samerace`), and three of the attribute ratings the participant gave of their partner- attractiveness (`attractive`), sincerity (`sincere`), and intelligence (`intelligent`). The latter have been scaled to have zero mean and standard deviation of .5, which puts them on a more even footing with the binary covariates (`_sc`)[^scalecontbin].


```{r speed_dating, echo=FALSE, eval=FALSE}
speed_dating0 = readr::read_csv('data/raw_data/ARM_Data/Speed Dating Data.csv')

speed_dating = speed_dating0 %>%
  select(1:17, attr, sinc, intel, fun, amb, shar, dec) %>%
  rename(
    id_win_wave = id,
    sex = gender,
    partner_id = pid,
    n_met_in_wave = round,
    partner_age = age_o,
    partner_race = race_o,
    attractive = attr,
    sincere = sinc,
    intelligent = intel,
    fun = fun,
    ambitious = amb,
    shared_interests = shar,
    decision = dec
  ) %>%
  mutate(
    decision = factor(decision, labels = c('No', 'Yes')),
    sex = factor(sex, labels = c('Female', 'Male')),
    samerace = factor(samerace, labels = c('No', 'Yes')),
    attractive_sc = scale(attractive, scale = .5)[, 1],
    sincere_sc = scale(sincere, scale = .5)[, 1],
    intelligent_sc = scale(intelligent, scale = .5)[, 1],
    fun_sc = scale(fun, scale = .5)[, 1],
    ambitious_sc = scale(ambitious, scale = .5)[, 1],
    shared_interests_sc = scale(shared_interests, scale = .5)[, 1]
  ) %>%
  group_by(iid) %>%
  mutate(never_always = if_else(all(decision == 'Yes') |
                                  all(decision == 'No'), 1, 0)) %>%
  ungroup() %>%
  filter(never_always == 0) %>%  # as in Fahrmeier
  select(-never_always)
# describeAll(speed_dating)

save(speed_dating, file = 'data/speed_dating.RData')
```

```{r glmm_init, eval=FALSE, echo=FALSE}
# pretty much dupes fahrmeier although their table has a typo, and their would be 500, not 390 individuals after getting rid of constant
# sd_model = glmer(decision ~ sex*attractive_sc + sex*shared_interests_sc
#                  + (1|iid), data=speed_dating, family=binomial)   
load('data/speed_dating.RData')
sd_model = glmer(
  decision ~ 
    sex 
  + samerace 
  + attractive_sc 
  + sincere_sc
  + intelligent_sc
  + (1 | iid),
  data = speed_dating,
  family = binomial
)
summary(sd_model, correlation = F)
glmm_var_cor = tidy(VarCorr(sd_model)) %>%   # because for some reason knitr can't find an object it just used in the previous chunk.
  select(-var2) %>%
  rename(variance = vcov, sd = sdcor) %>%
  mutate_if(is.numeric, arm::fround, digits = 2)
save(sd_model, glmm_var_cor, file = 'data/speed_dating_model.RData')
```

To fit the model, we'll use the <span class="func" style = "">glmer</span> function and specify the family as `binomial`, just as we would for the usual logistic setting using <span class="func" style = "">glm</span> in base R.

```{r glmm_speed_dating, eval=FALSE}
load('data/speed_dating.RData')

sd_model = glmer(
  decision ~ 
    sex 
  + samerace 
  + attractive_sc 
  + sincere_sc 
  + intelligent_sc
  + (1 | iid),
  data   = speed_dating,
  family = binomial
)

summary(sd_model, correlation = FALSE)
```


```{r glmm_fixed, echo=FALSE}
load('data/speed_dating_model.RData')

extract_fixed_effects(sd_model) %>% 
  kable_df()
```
<br>

The fixed effects results are as expected for the attributes, with attractiveness being a very strong effect in particular, while sex of the participant was statistically negligible.  You are free to exponentiate the coefficients to get the odds ratios if desired, just as you would with standard logistic regression. Now for the random effects result.

<br>

```{r glmm_random, echo=FALSE, eval=TRUE}
extract_vc(sd_model, ci_level = 0) %>% 
  select(-var_prop) %>% 
  kable_df()
```

<br>

For the variance components, notice that there is no residual variance. This is because we are not modeling with the normal distribution for the target variable, thus there is no $\sigma$ to estimate.  However, the result suggests that there is quite a bit of variability from person to person on the logit scale.


## Exercises for Extensions


### Sociometric data

In the following data, kids are put into different groups and rate each other in terms of how much they would like to share some activity with the others. We have identifying variables for the person doing the rating (sender), the person being rated (receiver), what group they are in, as well as age and sex for both sender and receiver, as well as group size.


```{r socio_setup, echo=FALSE, eval=FALSE}
soc = read_spss('data/raw_data/joop_hox_data2/9 CrossClass/SocsLong.sav')
glimpse(soc)


sociometric = soc %>% 
  mutate(sexsend = factor(sexsend, labels = c('Male', 'Female')),  # from text 0 male, 1 female
         sexrec = factor(sexrec, labels = c('Male', 'Female')))

save(sociometric, file='data/sociometric.RData')
```

To run a mixed model, we will have three sources of structure to consider:

- senders (within group)
- receivers (within group)
- group

First, load the sociometric data. 

```{r load_socio}
load('data/sociometric.RData')
```


To run the model, we will proceed with the following modeling steps. For each, make sure you are creating a separate model object for each model run.

- Model 1: No covariates, only sender and receiver random effects. Note that even though we don't add group yet, still use the nesting approach to specify the effects (e.g. `1|group:receiver`)
- Model 2: No covariates, add group random effect
- Model 3: Add all covariates: `agesend/rec`, `sexsend/rec`, and `grsize` (group size)
- Model 4: In order to examine sex matching effects, add an interaction of the sex variables to the model `sexsend:sexrec`.
- Compare models with AIC (see the note about [model comparison][model comparison]), e.g. `AIC(model_1, model_2, ...)`. A lower value would indicate the model is preferred.


```{r socio, echo=F, eval=FALSE}
model1 = lmer(rating ~ (1|group:sender) + (1|group:receiver), 
              data = sociometric)
summary(model1, correlation = FALSE) 

model2 = lmer(rating ~ (1|group:sender) + (1|group:receiver) + (1|group), 
              data = sociometric)
summary(model2, correlation = F) 

model3 = lmer(rating ~ sexsend + sexrec + agesend + agerec + grsize + (1|group:sender) + (1|group:receiver) + (1|group), 
             data = sociometric)
summary(model3, correlation = FALSE)

model4 = lmer(
  rating ~ sexsend*sexrec + agesend + agerec + grsize +
    (1|group:sender) + (1|group:receiver) + (1|group), 
  data = sociometric)
summary(model4, correlation = FALSE)

c(AIC(model1), AIC(model2), AIC(model3), AIC(model4))
```


### Patents

Do a Poisson mixed effect model using the [patent data][Data].  Model the number of citations (`ncit`) based on whether there was opposition (`opposition`) and if it was for the biotechnology/pharmaceutical industry (`biopharm`). Use year as a random effect to account for unspecified economic conditions.  

```{r patent_setup, echo=FALSE, eval=FALSE}
patents0 = readr::read_tsv('data/raw_data/patent.raw')
patents = patents0 %>% 
  rename(opposition = opp) 
save(patents, file='data/patents.RData')
glmer(ncit ~ opposition +  biopharm + (1|year), data=patents, family='poisson')
```


```{r patent_starter, eval=FALSE}
load('data/patents.RData')
```


Interestingly, one can model overdispersion in a Poisson model by specifying an random intercept for each observation (`subject` in the data).  In other words, no specific clustering or grouped structure is necessary, but we can use the random effect approach to get at the extra variance beyond what would be assumed with a Poisson distribution.


[^residstruct]: One reason to do so would be that you expect variability to decrease over time, e.g. due to experience.  You might also allow that variance to be different due to some other grouping factor entirely (e.g. due to treatment group membership).

[^crossed_notation]: I don't show the formal model here as we did before, but this is why depicting mixed models solely as 'multilevel' becomes a bit problematic in my opinion. In the standard mixed model notation it's straightforward though, you just add an additional random effect term, just as we do in the actual model syntax.

[^signflip]: Setting aside our discussion to take a turn regarding regression modeling more generally, this is a good example of 'surprising' effects not being so surprising when you consider them more closely.  Take a look at the effect of experience. More experience means less stress, this is probably not surprising.  Now look at the age effect. It's positive! But wouldn't older nurses have more experience? What's going on here? When interpreting experience, it is with age *held constant*, thus more experience helps with lowering stress no matter what your age.  With age, we're holding experience constant.  If experience doesn't matter, being older is affiliated with more stress, which might be expected given the type of very busy and high pressure work often being done (the mean age is `r median(nurses$age)`).  A good way to better understand this specifically is to look at predicted values when age is young, middle, and older vs. experience levels at low, middle, and high experience, possibly explicitly including the interaction of the two in the model.  Also note that if you take experience out of the model, the age effect is negative, which is expected, as it captures experience also.

[^crossnest]: Just a reminder, it *does* matter if you label your data in a less than optimal fashion.  For example, if in the nesting situation you start your id variable at 1 for each nested group, then you have to use the nested notation in <span class="pack">lme4</span>, otherwise, e.g. it won't know that id = 1 in group 1 is different from id 1 in group 2.  In our hospital example, this would be akin to using `ward` instead of `wardid` as we did.  Again though, this wouldn't be an issue if one practices good data habits.  Note also the `:` syntax. In other modeling contexts in R this denotes an interaction, and that is no different here.  In some contexts, typically due to experimental designs, one would want to explore random effects of the sort 1|A, 1|B and 1|A:B.  However, this is relatively rare.

[^lmerho]: This feature request has been made by its users for over a decade at this point- it's not gonna happen.  The issue is that the way <span class="pack">lmer</span> works by default employs a method that won't allow it (this is why it is faster and better performing than other packages). Unfortunately the common response to this issue is 'use <span class="pack">nlme</span>'.  However many other packages work with <span class="pack">lme4</span> model results, but not <span class="pack">nlme</span>, and if you aren't going to use <span class="pack">lme4</span> for mixed models you might as well go Bayesian with <span class="pack">rstanarm</span> or <span class="pack">brms</span> instead of <span class="pack">nlme</span>.  I would even prefer <span class="pack">mgcv</span> to <span class="pack">nlme</span> because of the other capabilities it provides, and the model objects created are easier to work with in my opinion.

[^scalecontbin]:  Note that for a balanced binary variable, the mean `p=.5` and standard deviation is `sqrt(p*(1-p)) = .5`