---
title: "Workshop template"
author:
  name: "Quinn Asena, Jack Williams, and Tony Ives"
date: today
bibliography: refs.bib
from: markdown+emoji
format:
  html:
    code-fold: false
    toc: true
    embed-resources: true
execute:
  cache: false
theme:
  light: flatly
  dark: darkly
---
### Install packages {.unlisted .hidden}
```{r}
source("./install_workshop.R")
```
# Introduction
One of the primary goals of this model is to test multiple hypotheses about the data and lend statistical support to them. For example, which environmental drivers are having the strongest effects on which taxa? Or, are interactions among taxa or abiotic variables driving change in the system? This state-space approach allows us to estimate coefficients for taxa interactions and driver-taxa relationships, which we don't get from methods such as ordination or cluster analysis. We recommend this method as complementary to other methods, as each has its advantages.
## This template
This document is the template we will work through today to learn how to parameterize and fit `multinomialTS`. A more detailed walkthrough is also available in this repository in the file 'state-space-walkthrough.qmd'.
### Sequence for today
This document will take us through:
- Choosing a resolution for `mnTS()`
- Finding initial conditions with `mnGLMM()`
- Fitting `mnTS()` with and without species interactions
- Assessing the resulting models
# Fitting `mnTS()`
For today, we will be working with the data that have focal taxa and groups already wrangled. This part is covered in the slides and the code is all in the extended walkthrough.
```{r}
# Read-in the wrangled data
# Read in our X variables
story_char_wide <- readRDS("./data/story_char_wide.rds")
# Read in the long data for plotting
state_variables_wide <- readRDS("./data/state_variables_wide.rds")
```
Here is what the state variable ($Y$) data look like:
```{r}
head(state_variables_wide)
```
And our covariate ($X$):
```{r}
head(story_char_wide)
```
For the moment, we are keeping the age and depth columns in both tibbles. These columns give us a common variable for matching the two tibbles and will be removed a little later.
## State variables $Y$
We will start off by choosing a resolution for the model predictions (_e.g.,_ predicting ecological processes at 50- or 100-year intervals).
```{r}
# Time-span of the data divided by the number of observations
max(state_variables_wide$age) / nrow(state_variables_wide)
# Age differences between successive observations
diff(state_variables_wide$age)
# The range of the age differences
range(diff(state_variables_wide$age))
# the average of the age differences
mean(diff(state_variables_wide$age))
sd(diff(state_variables_wide$age))
```
This all looks great! If we bin the data at around 50-75 years, we will still have most of our data in independent bins.
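As a quick optional check (a sketch using the same `cut()` approach we use for binning below), we can count how many pollen samples would share a bin at a 50-year resolution:
```{r}
# Optional sketch: how many pollen samples would fall into each 50-year bin?
# A count greater than 1 means those samples would be summed together.
table(cut(state_variables_wide$age,
          breaks = seq(from = min(state_variables_wide$age),
                       to = max(state_variables_wide$age) + 50, by = 50),
          include.lowest = TRUE, labels = FALSE))
```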
Now that we know roughly what bin-width to use to maximise the number of observations in the state variables, we will apply the same bin-width to both the state variables and the covariate data.
::: {.callout-tip}
## Ask me questions!
What questions do you have so far?
:::
## Handling the covariates $X$
Covariates need to be handled according to the decisions made for the state variables: if the state variables are binned at a century-level resolution, then the covariates must match that resolution. We _can_ have a finer temporal resolution for the covariates than for the state variables. In this dataset, our state variables have `r nrow(state_variables_wide)` observations, and our covariate has `r nrow(story_char_wide[story_char_wide$age <= max(state_variables_wide$age), ])` observations over the same time-span. Having more observations of environmental covariates is common in palaeo-data and works well when fitting the model.
Since we need to apply the same binning procedure to both the state variables and the covariate, I like to join both datasets. Charcoal was sampled to a greater depth than the pollen data, so we are going to clip the covariate to the extent of the state variables and join the two datasets by their common variable, age.
```{r}
#| warning: false
# Clip covariate data to the extent of the state variables
# Join the two by age so that we get a square matrix
story_join <- story_char_wide %>%
filter(age <= max(state_variables_wide$age)) %>%
left_join(state_variables_wide)
# Always double check dimensions before/after joining!
# dim(story_join)
# tail(story_join)
```
Now we have all our data joined. Note that the covariate $X$ (`char_acc`) has observations every cm, whereas pollen was sampled at less frequent intervals (this is OK! :smiley:).
```{r}
head(story_join, n = 10)
```
I always do lots of quick checks like `head()`, `tail()`, and `dim()` (and many more!) to catch any coding mistakes early!
```{r}
tail(story_join, n = 10)
```
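A couple more optional checks along the same lines (a sketch; it assumes ages are unique so the join adds no rows):
```{r}
# Optional extra checks after the join
dim(story_join) # row count should match the clipped charcoal data (assuming ages are unique)
colSums(is.na(story_join)) # NAs in the pollen columns mark depths with no pollen sample
```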
This all looks good, and we can now bin the data to an appropriate temporal resolution for fitting the model. Choosing a bin-width may take some trial and error; I'm going with a bin-width of 50 years.
```{r}
# This code chunks the data into bins and gives us a grouping variable "bins"
bin_width <- 50
bins <- cut(story_join$age,
breaks = seq(from = min(story_join$age),
to = max(story_join$age + bin_width),
by = bin_width), include.lowest = TRUE, labels = FALSE)
```
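If you want to experiment with other bin widths, here is a rough sketch (assuming, as above, that `NA` in the pollen columns marks depths without a pollen sample) that counts how many bins would contain at least one pollen sample:
```{r}
# Optional sketch: number of bins containing at least one pollen sample,
# for a few candidate bin widths
sapply(c(25, 50, 75, 100), function(bw) {
  b <- cut(story_join$age,
           breaks = seq(from = min(story_join$age),
                        to = max(story_join$age) + bw, by = bw),
           include.lowest = TRUE, labels = FALSE)
  length(unique(b[!is.na(story_join$Quercus)]))
})
```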
We now apply the bins to our data. For most types of covariates we want the _average_ value of the covariate in each bin. But for the pollen count data, we want the _sum_. This is because of the underlying multinomial distribution in the model.
```{r}
# The following code bins the data and summarises each variable per bin
story_binned <- bind_cols(bins = bins, story_join) %>%
group_by(bins) %>% # Group the data by the bins so that we calculate per time bin
summarise(
age = mean(age, na.rm = T), # the center of the bin
char_acc = mean(char_acc, na.rm = T), # mean charcoal accumulation rate per bin
other = sum(other, na.rm = T), # the sums of the count data
hardwood = sum(hardwood, na.rm = T),
`Fagus grandifolia` = sum(`Fagus grandifolia`, na.rm = T),
Ulmus = sum(Ulmus, na.rm = T),
Quercus = sum(Quercus, na.rm = T)
) %>%
arrange(desc(age))
# Be aware that the gaps in the pollen data are now filled with 0's not NA
```
Let's see what the data look like binned at a 50 year resolution:
```{r}
head(story_binned, n = 10)
```
This is looking good: we have reduced the number of time-intervals over which to run the model. The data are now `r nrow(story_binned)` rows with complete covariate data, containing `r nrow(story_binned[which(rowSums(story_binned[ ,4:7]) != 0), ])` state variable observations. The original number of pollen observations was `r nrow(state_variables_wide)`, so we have not summed many observations within the same bins.
`multinomialTS` requires two matrices, a site-by-species matrix $Y$, and a covariate matrix $X$, so we will leave `tibbles` behind and split the data. $X$ variables may be of different types (e.g., continuous, categorical...) but should be scaled relative to each other.
```{r}
story_pollen_matrix <- story_binned %>% # Select taxa
select(other, hardwood, Ulmus, `Fagus grandifolia`, Quercus) %>%
rename("Fagus" = "Fagus grandifolia") %>%
as.matrix()
# Replacing 0 with NA is not strictly necessary for the way we use the data today,
# but it is a good safeguard against 0-count data where there should be no observation
story_pollen_matrix[which(rowSums(story_pollen_matrix) == 0), ] <- NA
head(story_pollen_matrix)
story_char_matrix_scaled <- story_binned %>% # select covariates
select(char_acc) %>%
as.matrix() %>%
scale() # Scale covariates
head(story_char_matrix_scaled)
```
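As a quick optional check that `scale()` behaved as expected (each column should have a mean of roughly 0 and a standard deviation of 1):
```{r}
# Optional check: scaled covariates should have mean ~0 and sd 1
round(colMeans(story_char_matrix_scaled), 3)
apply(story_char_matrix_scaled, 2, sd)
```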
::: {.callout-tip}
## Ask me questions!
What questions do you have so far?
:::
## Fitting `multinomialTS`
Ok, now we have:
- Chosen a temporal resolution for the model of `r bin_width` year bins
- Organised the data into a site-by-species matrix $Y$ at `r bin_width` year bins
- Binned and scaled the covariate matrix $X$
With that, we can fit the model.
### Finding initial conditions using `mnGLMM()`
The `mnTS()` function will provide estimates for biotic interactions ($C$ matrix), taxa-driver relationships ($B$ matrix), and cross-correlated error ($V$ matrix). But the model needs initial starting values for these parameters. We get initial starting conditions from the data by running the `mnGLMM()` function. `mnGLMM()` returns estimates for the $B$ and $V$ matrices (not the $C$ matrix) and assumes that there are no time gaps in the data.
::: {.callout-tip}
The arguments of the _starting_ values in both the `mnGLMM()` and `mnTS()` are all suffixed with `.start` (e.g., `B.start` will be the starting values for the $B$ matrix).
The arguments specifying which parameters are _estimated_ are all suffixed with `.fixed` (e.g., `B.fixed` specifies which elements of the $B$ matrix are estimated).
:::
#### Setting-up parameters for `mnGLMM()`
Now, let's set up the parameters for `mnGLMM()`. We need:
- A vector that indicates the row indices where there are state variable observations: `sample_idx`
- An integer number of covariates (+ 1 for the intercept in `mnGLMM()`): `p`
- An integer of the number of state variables: `n`
- A matrix of starting values for $B$: `B.start.glmm`
- A matrix of $B$ parameters to estimate: `B.fixed.glmm`
- A matrix of $V$ parameters to estimate: `V.fixed.glmm`
```{r}
# set-up sample index
sample_idx <- which(rowSums(story_pollen_matrix) != 0)
# make sure it works
head(story_pollen_matrix[sample_idx, ])
```
Set-up the remaining parameters:
```{r}
# Set-up parameters
p <- ncol(story_char_matrix_scaled) + 1 # Number of independent variables plus intercept
n <- ncol(story_pollen_matrix) # number of taxa
V.fixed.glmm <- diag(n)
diag(V.fixed.glmm) <- NA
V.fixed.glmm[1] <- 1
B.fixed.glmm <- matrix(c(rep(0,p),rep(NA, (n - 1) * p)), p, n) # reference taxa [,1] are set to 0
B.start.glmm <- matrix(c(rep(0,p),rep(.01, (n - 1) * p)), p, n) # reference taxa [,1] are set to 0
```
These parameters are used as arguments to the `mnGLMM()` function. Check them out by printing them in your console. Each matrix needs to have the correct dimensions given the number of taxa and the number of covariates. The positions of the elements in each matrix correspond to the species and/or the covariates, as we will see later in the model output.
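Here is one way to do that check (an optional sketch); the dimensions should line up with `p` (covariates plus intercept) and `n` (taxa):
```{r}
# Optional check: print the set-up matrices and confirm their dimensions
B.start.glmm # p x n: starting values; the reference taxon column is 0
B.fixed.glmm # p x n: NA marks parameters to be estimated
V.fixed.glmm # n x n: NA marks parameters to be estimated; element [1, 1] is fixed at 1
dim(B.start.glmm) # should be c(p, n)
dim(V.fixed.glmm) # should be c(n, n)
```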
::: {.callout-tip}
## Ask me questions!
What questions do you have so far?
:::
#### Fitting `mnGLMM()`
Remember that `mnGLMM()` does not handle gaps in the data and only fits complete $X$ and $Y$ matrices. We have created a variable for our observation indices (`sample_idx`), so for `mnGLMM()` we will index the matrices by this variable: e.g., `story_pollen_matrix[sample_idx, ]`.
```{r}
#| warning: false
#| eval: false
# fit glmm
start_time <- Sys.time()
glmm_mod <- mnGLMM(Y = story_pollen_matrix[sample_idx, ],
X = story_char_matrix_scaled[sample_idx, ,drop = F],
B.start = B.start.glmm,
B.fixed = B.fixed.glmm,
V.fixed = V.fixed.glmm)
end_time <- Sys.time()
end_time - start_time
saveRDS(glmm_mod, "./outputs/glmm_mod.rds")
```
The outputs of `mnGLMM()` can be examined with the `summary(glmm_mod)` and `coef(glmm_mod)` functions.
```{r}
glmm_mod <- readRDS("./outputs/glmm_mod.rds")
summary(glmm_mod)
```
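And `coef()` shows the coefficient estimates on their own:
```{r}
coef(glmm_mod)
```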
#### Setting-up parameters for `mnTS()`
Now, let's set up the parameters for `mnTS()`. `mnTS()` needs:
- row indices as before: `sample_idx`
- the number of covariates: `p`
- the number of state variables: `n`
- starting values for each matrix:
- $B0$: `B0.start.mnTS` (intercept)
- $B$: `B.start.mnTS` (driver-taxa)
- $C$: `C.start.mnTS` (taxa interactions)
- $V$: `V.start.mnTS` (cross-correlated error)
- parameters to estimate for each matrix:
- $B0$: `B0.fixed.mnTS`
- $B$: `B.fixed.mnTS`
- $C$: `C.fixed.mnTS`
- $V$: `V.fixed.mnTS`
We are using the output from `mnGLMM()` as starting values for the matrices $B0$, $B$, and $V$. `mnGLMM()` does not provide estimates for $C$, so we handle $C$ a little differently: we input values close to zero for each parameter to be estimated and let `mnTS()` do the rest.
```{r}
# B.start etc
B0.start.mnTS <- glmm_mod$B[1, , drop = F]
B.start.mnTS <- glmm_mod$B[2, , drop = F]
sigma.start.mnTS <- glmm_mod$sigma
V.fixed.mnTS <- matrix(NA, n, n) # Covariance matrix of environmental variation in the process equation
V.fixed.mnTS[1] <- 1 # Fix element [1, 1] at 1; NA elements are estimated
V.start.mnTS <- glmm_mod$V # Use the V matrix estimated by mnGLMM() as starting values
B.fixed.mnTS <- matrix(NA, p-1, n)
B.fixed.mnTS[,1] <- 0
B0.fixed.mnTS = matrix(c(0, rep(NA, n - 1)), nrow = 1, ncol = n)
# Set-up C without interactions
C.start.mnTS = .5 * diag(n)
C.fixed.mnTS <- C.start.mnTS
C.fixed.mnTS[C.fixed.mnTS != 0] <- NA
# Set-up C with interactions between Fagus and Quercus
C.start.int.mnTS = .5 * diag(n)
C.start.int.mnTS[5, 4] <- .001
C.start.int.mnTS[4, 5] <- .001
C.fixed.int.mnTS <- C.start.int.mnTS
C.fixed.int.mnTS[C.fixed.int.mnTS != 0] <- NA
```
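A quick optional look at the two `C.fixed` set-ups confirms which elements will be estimated:
```{r}
# Optional check: NA marks the elements of C that will be estimated
C.fixed.mnTS # diagonal elements only (each taxon's effect on itself)
C.fixed.int.mnTS # diagonal plus the Fagus-Quercus elements [4, 5] and [5, 4]
```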
#### Fitting `mnTS()`
Remember, `mnTS()` _does_ handle gaps in the state variables where there are data in the covariate matrix. In the following code, we use the complete (no gaps) $Y$ matrix `story_pollen_matrix[sample_idx, ]` with dimensions `r dim(story_pollen_matrix[sample_idx, ])`, and the full $X$ matrix `story_char_matrix_scaled` with dimensions `r dim(story_char_matrix_scaled)`.
We will fit the model twice:
- without taxa interactions
- and with taxa interactions.
::: {.panel-tabset}
## Without interactions
```{r}
#| eval: false
start_time <- Sys.time()
mnTS_mod <- mnTS(Y = story_pollen_matrix[sample_idx, ],
X = story_char_matrix_scaled, Tsample = sample_idx,
B0.start = B0.start.mnTS, B0.fixed = B0.fixed.mnTS,
B.start = B.start.mnTS, B.fixed = B.fixed.mnTS,
C.start = C.start.mnTS, C.fixed = C.fixed.mnTS,
V.start = V.start.mnTS, V.fixed = V.fixed.mnTS,
dispersion.fixed = 1, maxit.optim = 1e+6)
# maxit.optim is the max number of iterations the optimiser will complete before stopping.
# increase maxit.optim if the model needs a lot of time to fit.
end_time <- Sys.time()
end_time - start_time
saveRDS(mnTS_mod, "./outputs/mnTS_mod.rds")
```
## With interactions
```{r}
#| eval: false
start_time <- Sys.time()
mnTS_mod_int <- mnTS(Y = story_pollen_matrix[sample_idx, ],
X = story_char_matrix_scaled, Tsample = sample_idx,
B0.start = mnTS_mod$B0, B0.fixed = B0.fixed.mnTS,
B.start = mnTS_mod$B, B.fixed = B.fixed.mnTS,
C.start = mnTS_mod$C, C.fixed = C.fixed.int.mnTS,
V.start = mnTS_mod$V, V.fixed = V.fixed.mnTS,
dispersion.fixed = 1, maxit.optim = 1e+6)
end_time <- Sys.time()
end_time - start_time
saveRDS(mnTS_mod_int, "./outputs/mnTS_mod_int.rds")
```
```{r}
mnTS_mod <- readRDS("./outputs/mnTS_mod.rds")
mnTS_mod_int <- readRDS("./outputs/mnTS_mod_int.rds")
```
:::
::: {.callout-tip}
## Ask me questions!
What questions do you have so far?
:::
### Interpreting outputs
The `coef()` and `summary()` functions will show the model outputs. Let's check out the coefficients of the interaction model:
```{r}
coef(mnTS_mod_int)
```
We have found that bootstrapping provides better estimates of standard errors (and subsequent P-values). The `boot.mnTS()` function will bootstrap the model, but it may take a very long time. We won't do this today, but we strongly recommend bootstrapping your final models.
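For reference, a bootstrap call might look something like the sketch below. It is not run here, and the `reps` argument name is an assumption; check `?boot.mnTS` for the actual interface.
```{r}
#| eval: false
# Not run: bootstrapping a full model can take a very long time.
# The `reps` argument name is an assumption; see ?boot.mnTS for the actual arguments.
mnTS_boot <- boot.mnTS(mnTS_mod_int, reps = 10000)
```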
### Comparing models
The `summary` of the model provides both the log likelihood and the AIC (Akaike information criterion). These can be used for comparing models. Today we will use the AIC. AIC penalises the number of estimated parameters, and so is better suited for comparing models that estimate different numbers of parameters (i.e., the interaction model estimates more parameters than the model without interactions).
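For reference, AIC combines the maximised likelihood $\hat{L}$ with a penalty on the number of estimated parameters $k$:
$$
\mathrm{AIC} = 2k - 2\ln(\hat{L})
$$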
The AIC (and log likelihood) values can be accessed directly with `mnTS_mod_int$AIC`.
```{r}
data.frame(without_interactions = mnTS_mod$AIC,
with_interactions = mnTS_mod_int$AIC)
```
::: {.callout-tip}
## Feedback
Before the final section, [please fill out a quick one up (positive comment) one down (something we can do better next time) feedback](https://forms.gle/e3RGnZQqJUYfjYWL6). This helps us a lot (makes reporting and grant applications look good)! It will also help me refine future workshops.
:::
# Your turn!
## Question 1
Ok, let's try estimating a different interaction in the data. The answer is in the folded block. Give it a crack yourself before peeking at the answer!
::: {.callout-tip}
## Question
Create new input matrices for `C.start` and `C.fixed` that estimate an interaction between only _Ulmus_ and _Quercus_.
Remember that filled elements of the `.start` matrix need to match the locations of estimated (`NA`) elements of the `.fixed` matrix.
:::
::: {.callout-note collapse="true"}
## Answer
```{r}
C.start.mnTS53 <- 0.5 * diag(n) # creates an n x n matrix with 0.5 on the diagonal
C.start.mnTS53[5, 3] <- 0.5 # the Ulmus-Quercus interaction element
C.fixed.mnTS53 <- C.start.mnTS53
C.fixed.mnTS53[C.fixed.mnTS53 != 0] <- NA # non-zero elements will be estimated
```
:::
## Question 2
Ok, we have a new `C.fixed` argument. Let's use it:
::: {.callout-tip}
## Question
Fit `mnTS()` using the new inputs to `C.start` and `C.fixed`.
:::
::: {.callout-note collapse="true"}
## Answer, no peeking... :grin:
```{r}
start_time <- Sys.time()
mnTS_mod53 <- mnTS(Y = story_pollen_matrix[sample_idx, ],
X = story_char_matrix_scaled, Tsample = sample_idx,
B0.start = B0.start.mnTS, B0.fixed = B0.fixed.mnTS,
B.start = B.start.mnTS, B.fixed = B.fixed.mnTS,
C.start = C.start.mnTS53, C.fixed = C.fixed.mnTS53,
V.start = V.start.mnTS, V.fixed = V.fixed.mnTS,
dispersion.fixed = 1, maxit.optim = 1e+6)
end_time <- Sys.time()
end_time - start_time
```
:::
## Question 3
We now have three models, two from the tutorial, and the one you just ran. Let's have a look at all three model AIC values.
::: {.callout-tip}
## Question
Create a `dataframe`, `vector`, or `tibble` (whatever you prefer) that displays each model's AIC.
:::
::: {.callout-note collapse="true"}
## Answer
```{r}
data.frame(without_interactions = mnTS_mod$AIC,
with_interactions = mnTS_mod_int$AIC,
with_interaction53 = mnTS_mod53$AIC)
```
:::
Which model is giving the lowest AIC? Does this agree with what you might expect from the data, and what hypothesis component are we testing?
## Question 4 (if time permits!)
We can take a look at running the model with an additional covariate. For ease, I'm going to bind an increasing trend to the existing data.
```{r}
# Run me!
new_X <- cbind(story_char_matrix_scaled,
trend = as.vector(scale(1:nrow(story_char_matrix_scaled))))
head(new_X)
```
With another covariate, both `mnGLMM()` and `mnTS()` need to be re-parameterised and re-run:
::: {.callout-tip}
## Question 4a
Create new parameters for `mnGLMM()`. You can follow the code earlier in the template.
:::
::: {.callout-note collapse="true"}
## Answer 4a
```{r}
p <- ncol(new_X) + 1 # Number of independent variables plus intercept
n <- ncol(story_pollen_matrix) # number of taxa
V.fixed.glmm.new_X <- diag(n)
diag(V.fixed.glmm.new_X) <- NA
V.fixed.glmm.new_X[1] <- 1
B.fixed.glmm.new_X <- matrix(c(rep(0,p),rep(NA, (n - 1) * p)), p, n) # reference taxa [,1] are set to 0
B.start.glmm.new_X <- matrix(c(rep(0,p),rep(.01, (n - 1) * p)), p, n) # reference taxa [,1] are set to 0
```
:::
::: {.callout-tip}
## Question 4b
Fit `mnGLMM()`.
:::
::: {.callout-note collapse="true"}
## Answer 4b
```{r}
#| warning: false
#| eval: true
# fit glmm
start_time <- Sys.time()
glmm_mod_new_X <- mnGLMM(Y = story_pollen_matrix[sample_idx, ],
X = new_X[sample_idx, ,drop = F],
B.start = B.start.glmm.new_X,
B.fixed = B.fixed.glmm.new_X,
V.fixed = V.fixed.glmm.new_X)
end_time <- Sys.time()
end_time - start_time
```
:::
::: {.callout-tip}
## Question 4c
Set up parameters for `mnTS()`. We will not include interactions in this one.
:::
::: {.callout-note collapse="true"}
## Answer 4c
Note the change to `glmm_mod_new_X$B[2:3, , drop = F]`: with the additional $X$ variable, the $B$ matrix now has two covariate rows (plus the intercept row), so we take rows 2:3.
```{r}
#| warning: false
# Set up mnTS() parameters using the new mnGLMM() fit
B0.start.mnTS.new_X <- glmm_mod_new_X$B[1, , drop = F]
B.start.mnTS.new_X <- glmm_mod_new_X$B[2:3, , drop = F] # note the change in row indices here to 2:3
sigma.start.mnTS.new_X <- glmm_mod_new_X$sigma
V.fixed.mnTS.new_X <- matrix(NA, n, n) # Covariance matrix of environmental variation in the process equation
V.fixed.mnTS.new_X[1] <- 1 # Fix element [1, 1] at 1; NA elements are estimated
V.start.mnTS.new_X <- glmm_mod_new_X$V # Use the V matrix estimated by mnGLMM() as starting values
B.fixed.mnTS.new_X <- matrix(NA, p-1, n)
B.fixed.mnTS.new_X[,1] <- 0
B0.fixed.mnTS.new_X = matrix(c(0, rep(NA, n - 1)), nrow = 1, ncol = n)
# Set-up C without interactions
C.start.mnTS.new_X = .5 * diag(n)
C.fixed.mnTS.new_X <- C.start.mnTS.new_X
C.fixed.mnTS.new_X[C.fixed.mnTS.new_X != 0] <- NA
```
:::
::: {.callout-tip}
## Question 4d
Fit `mnTS()`.
:::
::: {.callout-note collapse="true"}
## Answer 4d
```{r}
#| eval: true
start_time <- Sys.time()
mnTS_mod_new_X <- mnTS(Y = story_pollen_matrix[sample_idx, ],
X = new_X, Tsample = sample_idx,
B0.start = B0.start.mnTS.new_X, B0.fixed = B0.fixed.mnTS.new_X,
B.start = B.start.mnTS.new_X, B.fixed = B.fixed.mnTS.new_X,
C.start = C.start.mnTS.new_X, C.fixed = C.fixed.mnTS.new_X,
V.start = V.start.mnTS.new_X, V.fixed = V.fixed.mnTS.new_X,
dispersion.fixed = 1, maxit.optim = 1e+6)
# maxit.optim is the max number of iterations the optimiser will complete before stopping.
# increase maxit.optim if the model needs a lot of time to fit.
end_time <- Sys.time()
end_time - start_time
```
:::
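As a final optional check, you could compare this fit to the earlier models by AIC in the same way as before:
```{r}
data.frame(without_interactions = mnTS_mod$AIC,
           with_interactions = mnTS_mod_int$AIC,
           with_trend_covariate = mnTS_mod_new_X$AIC)
```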