-
Notifications
You must be signed in to change notification settings - Fork 0
/
db.qmd
293 lines (257 loc) · 11.2 KB
/
db.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
# Dataset {#sec-dataset}
```{r,include=FALSE,echo=FALSE}
Echo=FALSE
Eval=TRUE
Cache=TRUE
Warng=FALSE
Msg=FALSE
Incl=FALSE
library(tidyverse)
library(cmdstanr)
library(posterior)
library(ggdist)
library(ggpubr)
library(ggrepel)
library(ggmap)
library(DT)
library(gt)
```
```{r,echo=Echo,eval=Eval,cache=Cache,warning=Warng,message=Msg}
data_path <- readLines('_data.path')
treeData <- readRDS(file.path(data_path, 'treeData.RDS'))
spIds <- read_csv(file.path(data_path, 'species_id.csv')) |>
filter(sp_to_analyze)
```
This chapter outlines the two data sources and the steps to organize and create a tidy dataset for our analysis.
We focus on two open inventory data from eastern North America: the Forest Inventory and Analysis (FIA) dataset in the United States [@OConnell2007] and the Forest Inventory of Québec [@Naturelles2016].
From the FIA dataset, we utilized information collected from 37 states out of the total 50: Alabama, Arkansas, Connecticut, Delaware, Florida, Georgia, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Nebraska, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Vermont, Virginia, West Virginia, and Wisconsin.
At the plot level, we retained plots sampled at least twice, excluding those that had undergone harvesting to focus solely on natural dynamics.
Specifically, we selected surveys conducted for the FIA dataset using the modern standardized methodology implemented since 1999.
After applying these filters, our final dataset encompassed nearly 26,000 plots spanning a latitude range from 26° to 53° (@fig-plotCoverage).
Each plot within the dataset was measured between 1970 and 2021, with observation frequencies ranging from 2 to 7 times and an average of 3 measurements per plot.
The time intervals between measurements varied from 1 to 40 years, with a median interval of 7 years (@fig-plotCoverage).
```{r,echo=Echo,eval=Eval,cache=Cache,warning=Warng,message=Msg}
#| label: fig-plotCoverage
#| fig-width: 9
#| fig-height: 7
#| fig-cap: "Spatial (top left) and temporal (top right) coverage of the dataset incorporating data from the USA and Quebec. The top right panel shows the distribution of observations per class of latitude for the 31 species used in this study."
treeData |>
filter(species_id %in% spIds$species_id_old) |>
group_by(plot_id) |>
slice_head(n = 1) |>
mutate(
db_origin = case_match(
db_origin,
'qc' ~ 'Quebec',
'fia' ~ 'FIA'
)
) ->
plot_dt
# get Eas North America map
# brds <- c(left = -109, bottom = 25, right = -53, top = 53)
# my_map <- get_stamenmap(brds, zoom = 3, maptype = 'terrain-background')
# saveRDS(my_map, file.path('data', 'eastNorthAmericaMap.RDS'))
readRDS(file.path('data', 'eastNorthAmericaMap.RDS')) |>
ggmap() +
geom_point(
data = plot_dt,
aes(
x = longitude, y = latitude,
color = db_origin
),
size = 0.05,
alpha = 0.6
) +
scale_color_manual(
values = c(rgb(158, 35, 137, maxColorValue = 255), rgb(86, 43, 143, maxColorValue = 255))
) +
labs(x = '', y = '', color = '') +
theme_minimal() +
theme(
legend.position = 'none',
axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()
) +
guides(color = guide_legend(override.aes = list(size=2, alpha = 1))) ->
p1
treeData |>
filter(species_id %in% spIds$species_id_old) |>
filter(!is.na(latitude)) |>
mutate(
latc = cut(
latitude,
seq(min(latitude), max(latitude), length.out = 15),
include.lowest = TRUE
),
db_origin = case_match(
db_origin,
'qc' ~ 'Quebec',
'fia' ~ 'FIA'
)
) |>
ggplot() +
aes(year_measured, latc) +
aes(fill = db_origin) +
ggridges::geom_density_ridges2(color = 'transparent', alpha = 0.8) +
theme_classic() +
labs(
x = 'Year of measurement',
y = 'Latitude classes',
fill = ''
) +
scale_fill_manual(
values = c(rgb(158, 35, 137, maxColorValue = 255), rgb(86, 43, 143, maxColorValue = 255))
) +
theme(legend.position = 'none') ->
p2
treeData |>
group_by(plot_id) |>
reframe(nbYear = length(unique(year_measured))) |>
filter(nbYear > 1) |>
pull(plot_id) ->
plots_to_keep
treeData |>
filter(plot_id %in% plots_to_keep) |>
group_by(plot_id) |>
reframe(
db_origin = unique(db_origin),
nbYear = length(unique(year_measured))
) |>
mutate(
db_origin = case_match(
db_origin,
'qc' ~ 'Quebec',
'fia' ~ 'FIA'
)
) |>
ggplot() +
aes(nbYear) +
aes(fill = db_origin) +
geom_histogram(alpha = 0.8) +
theme_classic() +
labs(
x = 'Number of measurements per plot',
y = '',
fill = ''
) +
scale_fill_manual(
values = c(rgb(158, 35, 137, maxColorValue = 255), rgb(86, 43, 143, maxColorValue = 255))
) ->
p3
readRDS(file.path(data_path, 'fec_dt.RDS')) |>
group_by(plot_id) |>
reframe(
db_origin = unique(db_origin),
uq = unique(deltaYear_plot)
) |>
ggplot() +
aes(uq) +
aes(fill = db_origin) +
geom_histogram(alpha = 0.8) +
theme_classic() +
labs(
x = 'Time interval between measurements (years)',
y = ''
) +
scale_fill_manual(
values = c(rgb(158, 35, 137, maxColorValue = 255), rgb(86, 43, 143, maxColorValue = 255))
) +
theme(legend.position = 'none') ->
p4
ggarrange(
p1, p2, p3, p4,
nrow = 2, ncol = 2
)
```
These datasets provide individual-level information on the diameter at breast height (dbh) and the status (dead or alive) of more than 200 species.
From this pool, we selected the 31 most abundant species (@tbl-tabSpecies).
This selection comprises 9 conifer species and 21 hardwood species.
We ensured an even distribution of species across the continuous shade tolerance trait, with 3 species classified as very intolerant, 9 as intolerant, 8 as intermediate, 8 as tolerant, and 5 as very tolerant [@burns1990silvics].
While our analysis focuses on these 31 selected species, we retained the data for the remaining species for potential future exploration and analysis.
We initially included the *Ostrya virginiana* species in our analysis; however, we encountered substantial challenges with several growth, survival, and recruitment parameters, with substantial confidence intervals.
As a result, we decided to remove this species from our analysis, even though the fit files are still accessible for potential future investigations.
```{r,echo=Echo,eval=Eval,cache=Cache,warning=Warng,message=Msg}
#| label: tbl-tabSpecies
#| tbl-cap: "List of species and their frequency across the dataset."
treeData |>
filter(species_id %in% spIds$species_id_old) |>
left_join(
spIds,
by = c('species_id' = 'species_id_old')
) |>
group_by(species_name) |>
reframe(
`Species ID` = unique(species_id),
`Number of\n plots` = length(unique(plot_id)),
`Number of\n individual` = length(unique(tree_id)),
`Number of\n observation` = n()
) |>
rename(
`Species` = species_name
) |>
mutate(
Species = paste0('*', Species, '*')
) |>
arrange(desc(`Number of\n plots`)) |>
gt() |>
fmt_markdown(columns = Species) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
)
```
## Covariates
### Competition index
To assess competition effects, we calculated the basal area (BA) of living individuals for each species-plot-year combination based on the dbh of each $i$.
We first calculated individual BA in $m^2$ as follows:
$$
BA^{ind}_i = \pi * (\frac{dbh_i}{2 * 1000})^2
$$
where the dbh is measured in millimeters.
For a given plot $j$, the total plot basal area in square meters per hectare ($m^2 ha^{-1}$) is defined as:
$$
BA^{plot}_j = \sum_i^{n_j}{BA^{ind}_i} \times \frac{10000}{plot~area_j}
$$
To compute the basal area of larger individuals (BAL), we summed the total basal area of individuals whose dbh exceeded that of the focal individual $i$ ($n_j > BA_i$):
$$
BAL^{plot}_{ij} = \sum_i^{n_j > BA_i}{BA^{ind}_i} \times \frac{10000}{plot~area_j}
$$
Note that BAL is now specific to the plot ($j$) and the individual ($i$).
### Climate
We obtained the 19 bioclimatic variables with a 10 $km^2$ (300 arcsec) resolution grid, covering the period from 1970 to 2018.
These climate variables were modeled using the ANUSPLIN interpolation method [@mckenney2011].
We used the plot's longitude and latitude coordinates to extract the mean annual temperature (MAT) and mean annual precipitation (MAP).
To incorporate the climate covariates in the dataset, we used each plot's latitude and longitude coordinates to extract the mean annual temperature (MAT) and mean annual precipitation (MAP).
In cases where plots did not fall within a valid pixel of the climate variable grid, we interpolated the climate condition using the eight neighboring cells.
Due to the transitional nature of the dataset, we considered both the average and standard deviation of MAT and MAP over the years within each time interval.
This approach allows us to account for climate's averaged and temporal variability effects on tree demography.
Finally, we normalized MAT and MAP within the 0 and 1 range, facilitating comparisons of optimal climate conditions and climate breadth among species.
## R objects
All the scripts for dataset preparation are available at [https://github.com/willvieira/TreesDemography](https://github.com/willvieira/TreesDemography) within the `R` folder.
These scripts detail the complete process of downloading, cleaning, and merging the data, resulting in the final datasets used in our study.
Metadata associated with each script is described in Table @tbl-tabMetadata.
- `treeData.RDS`: This file contains the complete dataset in a tidy format, with one observation per row.
From the `treeData.RDS` dataset, we derived four additional datasets, each representing the transition between time $t$ and time $t + \Delta t`:
- `growth_dt.RDS`: Contains growth rate information between time $t$ and $t + \Delta t$.
- `status_dt.RDS`: Contains individuals' status (alive or dead) at both time $t$ and time $t + \Delta t$.
- `fec_dt.RDS`: Contains ingrowth rate (number of individuals entering the population at 127 mm) between time $t$ and $t + \Delta t$.
- `sizeIngrowth_dt.RDS`: Contains the size distribution of all individuals that entered the population for each remeasurement.
You can access all the data objects mentioned in the ownCloud folder via this link: [doc.ielab.usherbrooke.ca/s/83YSAiLAGeLm682](https://doc.ielab.usherbrooke.ca/s/83YSAiLAGeLm682).
In addition to the data objects, this folder contains three extra files:
- `climate_scaleRange.RDS` contains the maximum and minimum values required to scale the climate covariates.
- `db_metadata.csv` contains the csv description used in the table below.
- `species_id.csv` contains species descriptions and additional traits used in the analysis.
```{r,echo=Echo,eval=Eval,cache=Cache,warning=Warng,message=Msg}
#| label: tbl-tabMetadata
#| tbl-cap: "Description of metadata for each variable linked to its corresponding dataset object (R object). Note that replicated variables shared among the R objects have been excluded. Consequently, there is no detailed description for the `sizeIngrowth_dt` dataset."
read_csv(file.path(data_path, 'db_metadata.csv')) |>
gt() |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
)
```