# Linear discriminant analysis {#sec-lda}
\index{classification!linear discriminant analysis (LDA)}
Linear discriminant analysis (LDA) dates to the early 1900s. It is one of the most elegant and simple techniques for modeling the separation between groups and, as a bonus, it produces a low-dimensional representation of the differences between the groups. LDA makes two strong assumptions: each group is a sample from a multivariate normal distribution, and all groups have the same variance-covariance matrix. If the latter assumption is relaxed, a slightly less elegant solution results: quadratic discriminant analysis.
Useful explanations can be found in @VR02 and @Ri96. A good general treatment of parametric methods for supervised classification can be found in @JW02 or a similar multivariate analysis textbook. It is also useful to know that hypothesis testing for a difference in multivariate means using multivariate analysis of variance (MANOVA) rests on similar assumptions to LDA, and that model-based clustering, which assumes each cluster arises from a multivariate normal distribution, is also related to LDA. The methods described here can be used to check the assumptions when applying those methods, too.\index{classification!multivariate analysis of variance (MANOVA)} \index{cluster analysis!model-based}
::: {.content-visible when-format="html"}
::: info
Because LDA is a parametric model, it is important to check that these assumptions are reasonably satisfied:
- the shape of each cluster is elliptical.
- the spread of the observations is the same in each group.
:::
:::
::: {.content-visible when-format="pdf"}
\infobox{Because LDA is a parametric model, it is important to check that these assumptions are reasonably satisfied:
\begin{itemize} \itemsep 0in
\item the shape of each cluster is elliptical.
\item the spread of the observations is the same in each group.
\end{itemize}
}
:::
<!-- Algorithmic methods have overtaken parametric methods in the practice of supervised classification. A parametric method such as linear discriminant analysis (LDA) yields a set of interpretable output parameters, so it leaves a clear trail helping us to understand what was done to produce the results. An algorithmic method, on the other hand, is more or less a black box, with various input parameters that are adjusted to tune the algorithm. The algorithm's input and output parameters do not always correspond in any obvious way to the interpretation of the results. Because of its simplicity, if the assumptions (approximately) hold, LDA can provide an elegant classification solution.-->
## Extracting the key elements of the model
LDA builds the model on the between-group sum-of-square matrix
$$B=\sum_{k=1}^g n_k(\bar{X}_k-\bar{X})(\bar{X}_k-\bar{X})^\top$$
which measures the differences between the class means,
compared with the overall data mean $\bar{X}$ and the within-group sum-of-squares matrix,
$$
W =
\sum_{k=1}^g\sum_{i=1}^{n_k}
(X_{ki}-\bar{X}_k)(X_{ki}-\bar{X}_k)^\top
$$
which measures the variation of values around each class mean. The linear discriminant space is generated by computing the eigenvectors (canonical coordinates) of $W^{-1}B$; this is the $(g-1)$-D space in which the group means are most separated relative to the pooled variance-covariance. For each class we compute
\index{classification!discriminant space}
$$
\delta_k(x) = x^\top W^{-1}\mu_k - \frac{1}{2}\mu_k^\top W^{-1}\mu_k + \log \pi_k
$$
where $\pi_k$ is a prior probability for class $k$ that might be based on unequal sample sizes, or cost of misclassification. The LDA classifier rule is to *assign a new observation to the class with the largest value* of $\delta_k(x)$.
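To make these formulas concrete, here is a small hand computation of $B$, $W$, the eigenvectors of $W^{-1}B$, and the $\delta_k(x)$ rule, checked against `MASS::lda()`. This is a sketch only; it uses the built-in `iris` data rather than the penguins so that it runs stand-alone.

```r
# Hand computation of B, W, the discriminant directions, and the
# classification rule, checked against MASS::lda() on the iris data.
library(MASS)

X  <- as.matrix(iris[, 1:4])
cl <- iris$Species
n  <- nrow(X); g <- nlevels(cl)
xbar <- colMeans(X)

B  <- matrix(0, 4, 4)
W  <- matrix(0, 4, 4)
mu <- matrix(0, g, 4, dimnames = list(levels(cl), colnames(X)))
for (k in seq_len(g)) {
  Xk <- X[cl == levels(cl)[k], , drop = FALSE]
  mu[k, ] <- colMeans(Xk)
  B <- B + nrow(Xk) * tcrossprod(mu[k, ] - xbar)   # between-group SS
  W <- W + crossprod(sweep(Xk, 2, mu[k, ]))        # within-group SS
}

# The discriminant directions are the eigenvectors of W^{-1}B; they
# match lda()'s scaling up to a scale factor (|correlation| near 1).
fit <- lda(Species ~ ., data = iris)
ev  <- Re(eigen(solve(W) %*% B)$vectors)
abs(cor(ev[, 1], fit$scaling[, "LD1"]))

# The rule: assign x to the class maximising delta_k(x), using the
# pooled covariance W/(n - g) and the sample-proportion priors.
S_inv <- solve(W / (n - g))
delta <- X %*% S_inv %*% t(mu)
delta <- sweep(delta, 2, 0.5 * diag(mu %*% S_inv %*% t(mu)))
delta <- sweep(delta, 2, log(as.vector(table(cl) / n)), FUN = "+")
manual <- levels(cl)[max.col(delta)]
mean(manual == as.character(predict(fit)$class))
```

The class assignments from the hand-computed rule agree with `predict()` on the fitted model, which is the point of the check.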
We can fit an LDA model using the `lda()` function from the `MASS` package. Here we have used the `penguins` data, assuming equal prior probability, to illustrate.
```{r}
#| message: false
#| code-fold: false
# Code to fit the model
library(dplyr)
library(mulgar)
library(MASS)
load("data/penguins_sub.rda")
p_lda <- lda(species~bl+bd+fl+bm,
             data=penguins_sub,
             prior=c(1/3, 1/3, 1/3))
options(digits=2)
# p_lda
```
Because there are three classes, the discriminant space is 2D. We can easily extract the group means from the model.
```{r}
#| code-fold: false
# Extract the sample means
p_lda$means
```
The coefficients that project the data into the discriminant space, that is, the eigenvectors of $W^{-1}B$, are:
```{r}
#| code-fold: false
# Extract the discriminant space
p_lda$scaling
```
and the predicted values, which include the class predictions and the coordinates in the discriminant space, are generated as:
```{r}
#| code-fold: false
# Extract the fitted values
p_lda_pred <- predict(p_lda, penguins_sub)
```
The best separation between classes can be viewed from this object, which can be shown to match the original data projected using the `scaling` component of the model object (see @fig-p-lda).
```{r}
#| label: fig-p-lda
#| warning: false
#| fig-cap: "Penguins projected into the 2D discriminant space, done two ways: (a) using the predicted values, (b) directly projecting using the model component. The scale is not quite the same but the projected data is identical in shape."
#| message: false
#| fig-width: 8
#| fig-height: 4
#| code-summary: "Code to generate LDA plots"
# Check calculations from the fitted model, and equations
library(colorspace)
library(ggplot2)
library(ggpubr)
# Using the predicted values from the model object
p_lda_pred_x1 <- data.frame(p_lda_pred$x)
p_lda_pred_x1$species <- penguins_sub$species
p_lda1 <- ggplot(p_lda_pred_x1,
                 aes(x=LD1, y=LD2, colour=species)) +
  geom_point() +
  xlim(-6, 8) + ylim(-6.5, 5.5) +
  scale_color_discrete_divergingx("Zissou 1") +
  ggtitle("(a)") +
  theme_minimal() +
  theme(aspect.ratio = 1, legend.title = element_blank())
# matches the calculations done manually
p_lda_pred_x2 <- data.frame(as.matrix(penguins_sub[,1:4]) %*%
                              p_lda$scaling)
p_lda_pred_x2$species <- penguins_sub$species
p_lda2 <- ggplot(p_lda_pred_x2,
                 aes(x=LD1, y=LD2, colour=species)) +
  geom_point() +
  xlim(-6, 8) + ylim(-7, 5.5) +
  scale_color_discrete_divergingx("Zissou 1") +
  ggtitle("(b)") +
  theme_minimal() +
  theme(aspect.ratio = 1, legend.title = element_blank())
ggarrange(p_lda1, p_lda2, ncol=2,
          common.legend = TRUE, legend = "bottom")
```
The $W$ and $B$ matrices cannot be extracted from the model object, so we need to compute them separately. In fact, we only need $W$. It is useful to think of it as the pooled variance-covariance matrix: because LDA assumes that the population group variance-covariances are identical, we estimate this by computing the variance-covariance of each class and then averaging them. It's laborious, but easy.
```{r}
#| code-fold: false
# Compute pooled variance-covariance
p_vc_pool <- mulgar::pooled_vc(penguins_sub[,1:4],
                               penguins_sub$species)
p_vc_pool
```
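The pooled variance-covariance calculation can be written out in a few lines of base R, which makes the definition explicit: average the group sample covariances, weighted by their degrees of freedom. This sketch uses the built-in `iris` data so that it runs stand-alone; `pooled_cov` is our own name for the helper, not a `mulgar` function.

```r
# Pooled variance-covariance: weighted average of the group sample
# covariances, with weights n_k - 1 (the degrees of freedom).
pooled_cov <- function(X, cl) {
  X  <- as.matrix(X)
  cl <- factor(cl)
  S  <- matrix(0, ncol(X), ncol(X))
  for (k in levels(cl)) {
    Xk <- X[cl == k, , drop = FALSE]
    S  <- S + (nrow(Xk) - 1) * cov(Xk)
  }
  S / (nrow(X) - nlevels(cl))
}

S_pool <- pooled_cov(iris[, 1:4], iris$Species)
S_pool
```

The result is a single symmetric matrix, the same shape as each group's own variance-covariance.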
This can be used to draw an ellipse corresponding to the pooled variance-covariance that is used by the LDA model.
```{r}
#| echo: false
#| eval: false
ginv(as.matrix(p_vc_pool))
```
## Checking assumptions
This LDA approach is widely applicable, but it is useful
to check the underlying assumptions on which it depends: (1) that the cluster structure corresponding to each class forms an ellipse, consistent with a sample from a multivariate normal distribution, and (2) that the spread of values around each mean is nearly the same. @fig-lda-assumptions1 and @fig-lda-assumptions2 illustrate two datasets, of which only one is consistent with these assumptions. Other parametric models, such as quadratic discriminant analysis or logistic regression, also depend on assumptions about the data which should be validated. \index{classification!quadratic discriminant analysis (QDA)} \index{classification!logistic regression}
::: {.content-visible when-format="html"}
::: info
To check the equal and elliptical variance-covariance assumption, generate points on the surface of an ellipse corresponding to the variance-covariance of each group. When watching these ellipses in a tour, they should look similar in all projections.
:::
:::
::: {.content-visible when-format="pdf"}
\infobox{To check the equal and elliptical variance-covariance assumption, generate points on the surface of an ellipse corresponding to the variance-covariance of each group. When watching these ellipses in a tour, they should look similar in all projections.
}
:::
\index{classification!variance-covariance}
\index{variance-covariance}
```{r}
# Generate ellipses for each group's variance-covariance
p_ell <- NULL
for (i in unique(penguins_sub$species)) {
  x <- penguins_sub %>% dplyr::filter(species == i)
  e <- gen_xvar_ellipse(x[,1:2], n=150, nstd=1.5)
  e$species <- i
  p_ell <- bind_rows(p_ell, e)
}
```
```{r echo=knitr::is_html_output()}
#| label: fig-lda-assumptions1
#| fig-cap: "Scatterplot of flipper length by bill length of the penguins data, and corresponding variance-covariance ellipses. There is a small amount of difference between the ellipses, but they are similar enough to be confident in assuming the population variance-covariances are equal."
#| message: false
#| fig-width: 8
#| fig-height: 4
#| code-summary: "Code for penguins data and ellipse plots"
#| code-fold: true
lda1 <- ggplot(penguins_sub, aes(x=bl, y=bd,
                                 colour=species)) +
  geom_point() +
  scale_color_discrete_divergingx("Zissou 1") +
  xlim(-2.5, 3) + ylim(-2.5, 2.5) +
  ggtitle("(a)") +
  theme_minimal() +
  theme(aspect.ratio = 1)
lda2 <- ggplot(p_ell, aes(x=bl, y=bd,
                          colour=species)) +
  geom_point() +
  scale_color_discrete_divergingx("Zissou 1") +
  xlim(-2.5, 3) + ylim(-2.5, 2.5) +
  ggtitle("(b)") +
  theme_minimal() +
  theme(aspect.ratio = 1)
ggarrange(lda1, lda2, ncol=2,
          common.legend = TRUE, legend = "bottom")
```
```{r echo=knitr::is_html_output()}
#| label: fig-lda-assumptions2
#| fig-cap: "Scatterplot of distance to cfa and road for the bushfires data, and corresponding variance-covariance ellipses. There is a lot of difference between the ellipses, so it cannot be assumed that the population variance-covariances are equal."
#| message: false
#| fig-width: 8
#| fig-height: 4
#| code-summary: "Code for bushfires data and ellipse plots"
#| code-fold: true
# Now repeat for a data set that violates assumptions
data(bushfires)
lda3 <- ggplot(bushfires, aes(x=log_dist_cfa,
                              y=log_dist_road,
                              colour=cause)) +
  geom_point() +
  scale_color_discrete_divergingx("Zissou 1") +
  xlim(6, 11) + ylim(-1, 10.5) +
  ggtitle("(a)") +
  theme_minimal() +
  theme(aspect.ratio = 1)
b_ell <- NULL
for (i in unique(bushfires$cause)) {
  x <- bushfires %>% dplyr::filter(cause == i)
  e <- gen_xvar_ellipse(x[,c(57, 59)], n=150, nstd=2)
  e$cause <- i
  b_ell <- bind_rows(b_ell, e)
}
lda4 <- ggplot(b_ell, aes(x=log_dist_cfa,
                          y=log_dist_road,
                          colour=cause)) +
  geom_point() +
  scale_color_discrete_divergingx("Zissou 1") +
  xlim(6, 11) + ylim(-1, 10.5) +
  ggtitle("(b)") +
  theme_minimal() +
  theme(aspect.ratio = 1)
ggarrange(lda3, lda4, ncol=2,
          common.legend = TRUE, legend = "bottom")
```
\index{data!bushfires}
::: {.content-visible when-format="html"}
::: insight
The equal and elliptical variance-covariance assumption is reasonable for the penguins data because the ellipse shapes roughly match the spread of the data. It is not a suitable assumption for the bushfires data, because the spread is not elliptically-shaped and varies in size between groups.
:::
:::
::: {.content-visible when-format="pdf"}
\insightbox{The equal and elliptical variance-covariance assumption is reasonable for the penguins data because the ellipse shapes roughly match the spread of the data. It is not a suitable assumption for the bushfires data, because the spread is not elliptically-shaped and varies in size between groups.}
:::
This approach extends to any dimension. We would use the same projection sequence to view both the data and the variance-covariance ellipses, as in `r ifelse(knitr::is_html_output(), '@fig-penguins-lda-ellipses-html', '@fig-penguins-lda-ellipses-pdf')`. There is some difference in the shape and size of the ellipses between species in some projections, and also in the spread of points in the projected data. However, the differences are small, so it would be safe to assume that the population variance-covariances are equal.
```{r echo=knitr::is_html_output()}
#| message: false
#| eval: false
#| code-summary: "Code for making animated gifs"
#| code-fold: true
library(tourr)
p_ell <- NULL
for (i in unique(penguins_sub$species)) {
  x <- penguins_sub %>% dplyr::filter(species == i)
  e <- gen_xvar_ellipse(x[,1:4], n=150, nstd=1.5)
  e$species <- i
  p_ell <- bind_rows(p_ell, e)
}
p_ell$species <- factor(p_ell$species)
load("data/penguins_tour_path.rda")
animate_xy(p_ell[,1:4], col=factor(p_ell$species))
render_gif(penguins_sub[,1:4],
           planned_tour(pt1),
           display_xy(half_range=0.9, axes="off",
                      col=penguins_sub$species),
           gif_file="gifs/penguins_lda1.gif",
           frames=500,
           loop=FALSE)
render_gif(p_ell[,1:4],
           planned_tour(pt1),
           display_xy(half_range=0.9, axes="off",
                      col=p_ell$species),
           gif_file="gifs/penguins_lda2.gif",
           frames=500,
           loop=FALSE)
```
::: {.content-visible when-format="html"}
::: {#fig-penguins-lda-ellipses-html layout-ncol=2}
![Data](gifs/penguins_lda1.gif){#fig-lda-4D-assumptions1 fig-alt="Animation showing a tour of the penguins data, with colour indicating species. The spread of points in each group is reasonably similar regardless of projection." width=300}
![Variance-covariance ellipses](gifs/penguins_lda2.gif){#fig-lda-4D-assumptions2 fig-alt="Animation showing a tour of the ellipses corresponding to variance-covariance matrices for each species. The shape of the ellipse for each group is reasonably similar regardless of projection." width=300}
Checking the assumption of equal variance-covariance matrices for the 4D penguins data. Each ellipse corresponds to the sample variance-covariance for each species.
:::
:::
::: {.content-visible when-format="pdf"}
::: {#fig-penguins-lda-ellipses-pdf layout-ncol=2}
![Data](images/penguins_lda1.png){#fig-lda-4D-assumptions1 fig-alt="Single frame from a tour of the penguins data, with colour indicating species. Each group has slightly negatively elliptical shaped spread."}
![Variance-covariance ellipses](images/penguins_lda2.png){#fig-lda-4D-assumptions2 fig-alt="Single frame from a tour of the ellipses corresponding to variance-covariance matrices for each species. Each ellipse has a north-west to south-east orientation. The red group is slightly smaller and more elliptical than the other two groups."}
Checking the assumption of equal variance-covariance matrices for the 4D penguins data. Each ellipse corresponds to the sample variance-covariance for each species.
:::
:::
As a further check, we could generate three ellipses corresponding to the pooled variance-covariance matrix, as used in the model, centered at each of the means. Overlay these with the data, as done in `r ifelse(knitr::is_html_output(), '@fig-penguins-lda-pooled-html', '@fig-penguins-lda-pooled-pdf')`. Now compare the spread of the observations in the data with the elliptical shape of the pooled variance-covariance. If they match reasonably well, we can safely use LDA. This can also be done group by group, when multiple groups make it difficult to view everything together.
::: {.content-visible when-format="html"}
::: info
To check the fit of the equal variance-covariance assumption, simulate points on the ellipse corresponding to the **pooled sample variance-covariance matrix**. Generate one for each group centered at the group mean, and compare with the data.
:::
:::
::: {.content-visible when-format="pdf"}
\infobox{To check the fit of the equal variance-covariance assumption, simulate points on the ellipse corresponding to the \emph{pooled sample variance-covariance matrix}. Generate one for each group centered at the group mean, and compare with the data.}
:::
\index{classification!pooled variance-covariance}
\index{pooled variance-covariance}
```{r}
#| message: false
#| code-summary: "Code for adding ellipses to data"
#| code-fold: true
# Create an ellipse corresponding to pooled vc
pool_ell <- gen_vc_ellipse(p_vc_pool,
                           xm=rep(0, ncol(p_vc_pool)))
# Add means to produce ellipses for each species
p_lda_pool <- data.frame(rbind(
  pool_ell + matrix(rep(p_lda$means[1,],
                        each=nrow(pool_ell)), ncol=4),
  pool_ell + matrix(rep(p_lda$means[2,],
                        each=nrow(pool_ell)), ncol=4),
  pool_ell + matrix(rep(p_lda$means[3,],
                        each=nrow(pool_ell)), ncol=4)))
# Create one data set with means, data, ellipses
p_lda_pool$species <- factor(rep(levels(penguins_sub$species),
                                 rep(nrow(pool_ell), 3)))
p_lda_pool$type <- "ellipse"
p_lda_means <- data.frame(
  p_lda$means,
  species=factor(rownames(p_lda$means)),
  type="mean")
p_data <- data.frame(penguins_sub[,1:5],
                     type="data")
p_lda_all <- bind_rows(p_lda_means,
                       p_data,
                       p_lda_pool)
p_lda_all$type <- factor(p_lda_all$type,
                         levels=c("mean", "data", "ellipse"))
shapes <- c(3, 4, 20)
p_pch <- shapes[p_lda_all$type]
```
```{r echo=knitr::is_html_output()}
#| eval: false
#| message: false
#| code-summary: "Code to generate animated gifs"
#| code-fold: true
# Code to run the tour
animate_xy(p_lda_all[,1:4], col=p_lda_all$species, pch=p_pch)
load("data/penguins_tour_path.rda")
render_gif(p_lda_all[,1:4],
           planned_tour(pt1),
           display_xy(col=p_lda_all$species, pch=p_pch,
                      axes="off", half_range = 0.7),
           gif_file="gifs/penguins_lda_pooled1.gif",
           frames=500,
           loop=FALSE)
# Focus on one species
render_gif(p_lda_all[p_lda_all$species == "Gentoo",1:4],
           planned_tour(pt1),
           display_xy(col="#F5191C",
                      pch=p_pch[p_lda_all$species == "Gentoo"],
                      axes="off", half_range = 0.7),
           gif_file="gifs/penguins_lda_pooled2.gif",
           frames=500,
           loop=FALSE)
```
::: {.content-visible when-format="html"}
::: {#fig-penguins-lda-pooled-html layout-ncol=2}
![All species](gifs/penguins_lda_pooled1.gif){#fig-lda-pooled1 fig-alt="Animation showing a tour of the pooled variance-covariance ellipse, computed for each species, overlaid on the data. The shape of the ellipse for each group is reasonably similar to the spread of the points in all projections." width=300}
![Gentoo](gifs/penguins_lda_pooled2.gif){#fig-lda-pooled2 fig-alt="Animation showing a tour of the pooled variance-covariance ellipse overlaid on the data for the Gentoo penguins (red). The shape of the ellipse is reasonably similar to the spread of the points in all projections." width=300}
Checking how the pooled variance-covariance matches the spread of points in each group.
:::
:::
::: {.content-visible when-format="pdf"}
::: {#fig-penguins-lda-pooled-pdf layout-ncol=2}
![All species](images/penguins_lda_pooled1.png){#fig-lda-pooled1 fig-alt="A single frame from a tour of the pooled variance-covariance ellipse, computed for each species, overlaid on the data. The ellipse has a north-east to south-west orientation, and is almost circular, and reasonably matched the spread of the points in each group."}
![Gentoo](images/penguins_lda_pooled2.png){#fig-lda-pooled2 fig-alt="A single frame from a tour of the pooled variance-covariance ellipse overlaid on the data for the Gentoo penguins (red). The ellipse has a north-west to south-east orientation, and is relatively skinny. This roughly matches the spread of the points."}
Checking how the pooled variance-covariance matches the spread of points in each group.
:::
:::
::: {.content-visible when-format="html"}
::: insight
From the tour, we can see that equal elliptical variance-covariance is a reasonable assumption for the penguins data: in all projections the ellipses reasonably match the spread of the observations.
:::
:::
::: {.content-visible when-format="pdf"}
\insightbox{From the tour, we can see that equal elliptical variance-covariance is a reasonable assumption for the penguins data: in all projections the ellipses reasonably match the spread of the observations.
}
:::
## Examining results
The boundaries of a classification model can be examined by:
1. generating a large number of test points in the domain of the data, and
2. predicting the class of each test point.
We'll look at this in 2D using an LDA model fitted to `bl` and `bd` of the `penguins` data.
```{r}
#| message: false
#| code-fold: false
p_bl_bd_lda <- lda(species~bl+bd, data=penguins_sub,
                   prior = c(1/3, 1/3, 1/3))
```
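The two steps listed above are essentially what `classifly::explore()` automates. A minimal hand-rolled version, shown here on the built-in `iris` data so that it runs stand-alone, generates a grid over two predictors and predicts the class at every grid point:

```r
# Generate a grid of test points over two predictors and predict
# the class of each, to trace out the LDA prediction regions.
library(MASS)

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
grid <- expand.grid(
  Sepal.Length = seq(min(iris$Sepal.Length),
                     max(iris$Sepal.Length), length.out = 100),
  Sepal.Width  = seq(min(iris$Sepal.Width),
                     max(iris$Sepal.Width),  length.out = 100))
grid$pred <- predict(fit, grid)$class

# The counts give the relative size of each prediction region
table(grid$pred)
```

Plotting `grid` with small points coloured by `pred`, and overlaying the observed data, gives the same kind of display as the figures below.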
The fitted model means
$\bar{x}_{Adelie} = ($ `r p_bl_bd_lda$means[1,1]`, `r p_bl_bd_lda$means[1,2]`$)^\top$, $\bar{x}_{Chinstrap} = ($ `r p_bl_bd_lda$means[2,1]`, `r p_bl_bd_lda$means[2,2]`$)^\top$, and $\bar{x}_{Gentoo} = ($ `r p_bl_bd_lda$means[3,1]`, `r p_bl_bd_lda$means[3,2]`$)^\top$ can be added to the plots.
The boundaries can be examined using the `explore()` function from the `classifly` package, which generates observations in the range of all values of `bl` and `bd` and predicts their class. @fig-lda-2D-boundary shows the resulting prediction regions, with the observed data and the sample means overlaid.
```{r}
#| message: false
#| label: fig-lda-2D-boundary
#| fig-cap: "Prediction regions of the LDA model for two variables of the three species of penguins indicated by the small points. Large points are the observations, and the sample mean of each species is represented by the plus. The boundaries between groups can be seen to be roughly half-way between the means, taking the elliptical spread into account, and mostly distinguishes the three species."
#| fig-alt: "Square divided into three regions by the coloured points, primarily a wedge of yellow (Chinstrap) extending from the top right divides the other two groups. The regions mostly contain the observed values, with just a few over the boundary in the wrong region."
#| fig-width: 3
#| fig-height: 3
#| out-width: 70%
#| code-fold: false
# Compute points in domain of data and predict
library(classifly)
p_bl_bd_lda_boundaries <- explore(p_bl_bd_lda, penguins_sub)
p_bl_bd_lda_m1 <- ggplot(p_bl_bd_lda_boundaries) +
  geom_point(aes(x=bl, y=bd,
                 colour=species,
                 shape=.TYPE), alpha=0.8) +
  scale_color_discrete_divergingx("Zissou 1") +
  scale_shape_manual(values=c(46, 16)) +
  theme_minimal() +
  theme(aspect.ratio = 1, legend.position = "none")
p_bl_bd_lda_means <- data.frame(p_bl_bd_lda$means,
                                species=rownames(p_bl_bd_lda$means))
p_bl_bd_lda_m1 +
  geom_point(data=p_bl_bd_lda_means,
             aes(x=bl, y=bd),
             colour="black",
             shape=3,
             size=3)
```
\index{classification!boundaries}
\index{tour!slice}
This approach readily extends to higher dimensions. One first fits the model with all four variables, and uses `explore()` to generate points in the 4D space with predictions, giving a representation of the prediction regions. `r ifelse(knitr::is_html_output(), '@fig-penguins-lda-boundaries-html', '@fig-penguins-lda-boundaries-pdf')`(a) shows the results using a slice tour [@slicetour]. Points inside the slice are shown at a larger size. The slice is made at the centre of the data, to show the boundaries in this neighbourhood. As the tour progresses we see a thin slice through the centre of the data, parallel to the projection plane. In most projections there is some small overlap of points between groups, which happens because we are examining a 4D object in 2D projections. The slice helps to alleviate this, allowing a focus on the boundaries in the centre of the cube. In all projections the boundaries between groups are linear, as would be expected from LDA. We can also see that the model roughly divides the cube into three relatively equally-sized regions.
`r ifelse(knitr::is_html_output(), '@fig-penguins-lda-boundaries-html', '@fig-penguins-lda-boundaries-pdf')`(b) shows the three prediction regions, represented by points in 4D, projected into the discriminant space. Linear boundaries neatly divide the full space, which is to be expected because the LDA model computes its classification rule in this 2D space.
\index{classification!discriminant space}
```{r echo=knitr::is_html_output()}
#| code-fold: false
p_lda <- lda(species ~ ., penguins_sub[,1:5], prior = c(1/3, 1/3, 1/3))
p_lda_boundaries <- explore(p_lda, penguins_sub)
```
```{r echo=knitr::is_html_output()}
#| eval: false
#| code-summary: "Code for generating slice tour"
# Code to run the tour
p_lda_boundaries$species
animate_slice(p_lda_boundaries[p_lda_boundaries$.TYPE == "simulated",1:4],
              col=p_lda_boundaries$species[p_lda_boundaries$.TYPE == "simulated"],
              v_rel=0.02, axes="bottomleft")
render_gif(p_lda_boundaries[p_lda_boundaries$.TYPE == "simulated",1:4],
           planned_tour(pt1),
           display_slice(v_rel=0.02,
                         col=p_lda_boundaries$species[p_lda_boundaries$.TYPE == "simulated"],
                         axes="bottomleft"),
           gif_file="gifs/penguins_lda_boundaries.gif",
           frames=500,
           loop=FALSE)
```
```{r echo=knitr::is_html_output()}
#| code-summary: "Code for projecting into LDA space"
# Project the boundaries into the 2D discriminant space
p_lda_b_sub <- p_lda_boundaries[
  p_lda_boundaries$.TYPE == "simulated",
  c(1:4, 6)]
p_lda_b_sub_ds <- data.frame(as.matrix(p_lda_b_sub[,1:4]) %*%
                               p_lda$scaling)
p_lda_b_sub_ds$species <- p_lda_b_sub$species
p_lda_b_sub_ds_p <- ggplot(p_lda_b_sub_ds,
                           aes(x=LD1, y=LD2, colour=species)) +
  geom_point(alpha=0.5) +
  geom_point(data=p_lda_pred_x1,
             aes(x=LD1, y=LD2, shape=species),
             inherit.aes = FALSE) +
  scale_color_discrete_divergingx("Zissou 1") +
  scale_shape_manual(values=c(1, 2, 3)) +
  theme_minimal() +
  theme(aspect.ratio = 1,
        legend.position = "bottom",
        legend.title = element_blank())
```
::: {.content-visible when-format="html"}
::: {#fig-penguins-lda-boundaries-html layout-ncol=2}
![4D](gifs/penguins_lda_boundaries.gif){#fig-lda-4D-boundaries fig-alt="Animation plot, where three groups of coloured points can be seen. They roughly break the square into three regions with linear boundaries, which varies in each projection." width=300}
```{r}
#| echo: false
#| label: fig-lda-2D-boundaries
#| fig-width: 5
#| fig-height: 5
#| fig-cap: "Discriminant space"
#| fig-alt: "Scatterplot of points divided neatly into three colour groups by linear boundaries. The regions are split at the middle, and with the boundary between blue and red, blue and yellow, and yellow and red, starting near 11, 4, and o'clock, respectively."
p_lda_b_sub_ds_p
```
Examining the boundaries produced by the LDA model in the full 4D with a slice tour (left) and in the discriminant space (right). Large points indicate observations within the slice, and dots are observations outside the slice. Focusing on the within-slice points, there is some overlap of points between regions in most views, which reflects the occlusion of 4D shapes when examining projections with thin slices. The linear boundaries are seen exactly in the discriminant space; that is, they are orthogonal to these two dimensions.
:::
:::
::: {.content-visible when-format="pdf"}
::: {#fig-penguins-lda-boundaries-pdf layout-ncol=2 fig-alt="FIX-ME"}
![4D](images/penguins_lda_boundaries.png){#fig-lda-4D-boundaries fig-alt="A scatterplot with large points showing roughly linear boundaries between three groups." width=200}
![Discriminant space](images/fig-lda-2D-boundaries-1.png){#fig-lda-4D-ds fig-alt="Scatterplot of points divided neatly into three colour groups by linear boundaries. The regions are split at the middle, and with the boundary between blue and red, blue and yellow, and yellow and red, starting near 11, 4, and o'clock, respectively." width=200}
```{r}
#| echo: false
#| eval: false
#| fig-width: 5
#| fig-height: 5
#| out-width: 50%
#| fig-cap: "Discriminant space"
# Note: had to use the image above between quarto compile
# broke when this chunk was evaluated.
p_lda_b_sub_ds_p
```
Examining the boundaries produced by the LDA model in the full 4D with a slice tour (left shows a single frame) and in the discriminant space (right). Large points indicate observations within the slice, and dots are observations outside the slice. Focusing on the within-slice points, there is some overlap of points between regions in most views, which reflects the occlusion of 4D shapes when examining projections with thin slices. The linear boundaries are seen exactly in the discriminant space; that is, they are orthogonal to these two dimensions.
:::
:::
::: {.content-visible when-format="html"}
::: insight
From the tour, we can see that the LDA boundaries divide the classes only in the discriminant space. It is not using the space orthogonal to the 2D discriminant space. You can see this because the boundary is sharp in just one 2D projection, while most of the projections show some overlap of regions.
:::
:::
::: {.content-visible when-format="pdf"}
\insightbox{From the tour, we can see that the LDA boundaries divide the classes only in the discriminant space. It is not using the space orthogonal to the 2D discriminant space. You can see this because the boundary is sharp in just one 2D projection, while most of the projections show some overlap of regions.}
:::
## Exercises {-}
1. For the `simple_clusters` data, compute the LDA model, and make a plot of the data, with points coloured by class. Overlay variance-covariance ellipses, and a $+$ indicating the sample mean of each class. Is it reasonable to assume that the two classes are sampled from populations with the same variance-covariance?
2. Examine the clusters corresponding to the classes in the `clusters` data set, using a tour. Based on the shape of the data is the assumption of equal variance-covariance reasonable?
3. Examine the pooled variance-covariance for the `clusters` data, overlaid on the data in a tour of the 5D space. Does it fit the variance of each cluster nicely?
4. Fit an LDA model to the `simple_clusters` data. Examine the boundaries produced by the model, in 2D.
5. Fit an LDA model to the `clusters` data. Examine the boundaries produced by the model in 5D.
6. Assess the LDA assumptions for the `multicluster` data. Is LDA an appropriate model?
7. Compute the first 12 PCs of the `sketches` data. Check the assumption of equal, elliptical variance-covariance of the 6 groups. Regardless of whether you decide that the assumption is satisfied or not, fit an LDA to the 12 PCs. Extract the discriminant space (the `x` component of the `predict` object), and examine the separation (or not) of the 6 groups in this 5D space. Is LDA providing a good classification model for this data?
8. Even though the `bushfires` data does not satisfy the assumptions for LDA, fit LDA to the first five PCs. Examine the class differences in the 3D discriminant space.
9. Compute the boundary between classes, for the LDA model where the prior probability reflects the sample size, and the LDA model where the priors are equal for all groups. How does the boundary between lightning caused fires and the other groups change?
```{r}
#| eval: false
#| echo: false
library(mulgar)
library(dplyr)
library(tourr)
library(MASS)
data("sketches_train")
sketches_pca <- prcomp(sketches_train[,1:784])
ggscree(sketches_pca, q=25, guide=FALSE)
sketches_pc <- as.data.frame(sketches_pca$x[,1:12])
sketches_pc$word <- sketches_train$word
animate_xy(sketches_pc[,1:12], col=sketches_pc$word)
sketches_lda <- lda(word~., data=sketches_pc)
sketches_pred <- predict(sketches_lda, sketches_pc)
sketches_ds <- data.frame(sketches_pred$x) %>%
mutate(word = sketches_pc$word)
animate_xy(sketches_ds[,1:5], col=sketches_ds$word)
# bushfires
data(bushfires)
bushfires_std <- bushfires %>%
dplyr::select(`rf`:`log_dist_road`) %>%
mutate_if(is.numeric, function(x) (x-
mean(x, na.rm=TRUE))/
sd(x, na.rm=TRUE))
bushfires_pca <- prcomp(bushfires_std)
bushfires_pc <- data.frame(bushfires_pca$x[,1:5]) %>%
mutate(cause = bushfires$cause)
bushfires_lda <- lda(cause~., data=bushfires_pc)
bushfires_pred <- predict(bushfires_lda, bushfires_pc)
bushfires_ds <- data.frame(bushfires_pred$x) %>%
mutate(cause = factor(bushfires$cause))
animate_xy(bushfires_ds[,1:3], col=bushfires_ds$cause)
```