---
title: Analysis of Southern Poverty <br><small>An Examination with Spatial Regression</small>
author: "Drs. Trevor Brooks, Kadi Bliss, Melissa Gomez, and Christopher Gentry"
output:
  html_notebook:
    df_print: paged
    highlight: breezedark
    number_sections: yes
    rows.print: 10
    theme: cosmo
    toc: yes
    toc_float:
      collapsed: no
      smooth_scroll: yes
  html_document:
    df_print: paged
    toc: yes
  pdf_document: default
editor_options:
  chunk_output_type: inline
---
<style type="text/css">
h1.title {
font-size: 40px;
font-family: "Times New Roman", Times, serif;
color: DarkBlue;
text-align: center;
}
h4.author { /* Header 4 - and the author and data headers use this too */
font-size: 20px;
font-family: "Times New Roman", Times, serif;
color: DarkBlue;
text-align: center;
}
</style>
# Packages used in this project
There are several specialty packages used in this project due to the specific nature of the analyses. Some of them you may need to install, while others you may already have installed.
<p align="center">
```cowplot``` | ```dplyr``` | ```geosphere``` | ```ggplot2``` | ```ggExtra``` | ```maps``` <br>
```maptools``` | ```readxl``` | ```rgdal``` | ```rgeos``` | ```sf``` | ```sp``` <br>
```spatialreg``` | ```spdep``` | ```stringr``` | ```tidyr``` | ```viridis```
</p>
To begin, we will load the following:
```{r Packages, message=FALSE, warning=FALSE, results='hide'}
packages<-c("cowplot", "dplyr", "geosphere", "ggplot2", "ggExtra", "maps", "maptools", "readxl", "rgdal", "rgeos", "sf", "sp", "spatialreg", "spdep", "stringr","tidyr", "viridis")
sapply(packages, require, character.only=T)
```
> If any of these packages are missing, you can install them all at once with ```install.packages(packages)```, since ```install.packages()``` accepts the character vector defined above. Afterwards, rerun the ```sapply``` line with **require** (or **library**) to load them.
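If you would rather install only the packages you are missing, a minimal sketch (optional, not part of the original workflow) is:

```{r Install Missing Packages, eval=FALSE}
#Install only the packages that are not already present, then load everything
missing <- packages[!packages %in% rownames(installed.packages())]
if (length(missing) > 0) install.packages(missing)
invisible(sapply(packages, require, character.only = TRUE))
```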
# Introduction to the data
This dataset, collected by Dr. Brooks, examines how numerous socioeconomic factors relate to the percentage of children under the age of 18 living below the poverty line in the Southern United States. The data were collected from the U.S. Census Bureau, the American Community Survey, the Bureau of Labor Statistics, the U.S. Department of Agriculture Economic Research Service, and the County Health Rankings. Independent variables include: rural, urban, manufacturing, agriculture, retail, healthcare, construction, less than high school degree, unemployment, income ratio, teen birth, unmarried, single mother, uninsured, as well as Black and Hispanic race variables. The data are stored as a comma-delimited file and can be found in the *data folder* of this project [repository](https://github.com/chrismgentry/southeast_poverty). To load the data, we will use the following script to ensure all of the variables are appropriately formatted.
```{r Data, messages=FALSE, warning=FALSE}
se.data <- read.csv("./Data/childpov18_southfull.csv",
colClasses = c("character", "character", "character",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric"))
#format variables
names(se.data)[names(se.data)=="X2016.child.poverty"] <- "child.pov.2016"
se.data$FIPS <- str_pad(se.data$FIPS, 5, "left", pad = "0")
```
In the formatting step, a "0" was added to any FIPS code that was only four digits long, creating a consistent 5-digit code with a leading zero.
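As a quick illustration (the values below are hypothetical, not drawn from the dataset), the padding behaves like this:

```{r FIPS Pad Example, eval=FALSE}
#A hypothetical 4-digit code gains a leading zero; 5-digit codes are unchanged
str_pad(c("1001", "48001"), 5, "left", pad = "0")
#> [1] "01001" "48001"
```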
Using this primary dataset, we can create subsets for each of the regions represented: the South Atlantic, East South Central, and West South Central states.
```{r Data Subsets, message=FALSE, warning=FALSE}
#create subsets
satl.data <- subset(se.data, State %in% c("DE", "DC", "FL", "GA", "MD", "NC", "SC", "VA", "WV"))
esc.data <- subset(se.data, State %in% c("AL", "KY", "MS", "TN"))
wsc.data <- subset(se.data, State %in% c("AR", "LA", "OK", "TX"))
```
Next, we import shapefiles from an ESRI dataset to use as the basemap for the region. We initially attempted to use a dataset from R; however, the number of counties did not match the number of counties in the \*.csv file, so we resorted to an external data source. These shapefiles can be found in the *shapefiles folder* of the project [repository](https://github.com/chrismgentry/southeast_poverty).
```{r Polygon Subsets, message=FALSE, warning=FALSE, results='hide'}
#creating counties from shapefiles
se.shape<-readOGR(dsn="./Shapefiles",layer="SE_Counties_2016", stringsAsFactors = FALSE)
se.shape@data <- se.shape@data[-c(6:54,57:63)]
names(se.shape@polygons) <- se.shape@data$FIPS
#create subsets
satl.shape <- subset(se.shape, STATE_NAME %in% c("Delaware", "District of Columbia", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "West Virginia"))
esc.shape <- subset(se.shape, STATE_NAME %in% c("Alabama", "Kentucky", "Mississippi", "Tennessee"))
wsc.shape <- subset(se.shape, STATE_NAME %in% c("Arkansas", "Louisiana", "Oklahoma", "Texas"))
```
Because this shapefile contained extraneous data, we removed several columns and set the polygon names to the FIPS code for each county. Additionally, we created subsets for each of the three regions.
# Spatial Regression
For this analysis we will:
1. Explore the data using OLS
2. Determine the presence of autocorrelation using Moran's I and Lagrange multipliers
3. Create spatial weights based on contiguity and distance for the full dataset as well as the separate regions
4. Run various spatial models to determine the most appropriate analysis
5. Create a bivariate map of the results
We will try to use straight-forward naming conventions to easily keep track of the data and specific analyses.
# Creating Spatial Weights
In order to determine whether there are any underlying spatial relationships in our data, we can run various tests; however, to do this we need to provide a spatial weights matrix. These spatial weights can be based on *contiguity* (shared boundaries) or *distance* (k-nearest neighbors).
```{r Spatial Weights Matrix, message=FALSE, warning=FALSE}
#Create contiguity neighbors
se.neighbors <- poly2nb(se.shape, queen = TRUE, row.names = se.shape$FIPS)
names(se.neighbors) <- se.shape$FIPS
#Create list of neighbors based on contiguity
se.neighbors.list <- nb2listw(se.neighbors, style = "W", zero.policy = TRUE)
#Create xy for all counties for distance weights
county.xy<-cbind(se.shape$Longitude,se.shape$Latitude)
colnames(county.xy)<-cbind("x","y")
rownames(county.xy)<-se.shape$FIPS
#Create distance centroid for the k=1 nearest neighbor
county.k1 <-knn2nb(knearneigh(county.xy, k=1, longlat = TRUE))
#Determine max k distance value for nearest neighbor
county.max.k1 <-max(unlist(nbdists(county.k1, county.xy, longlat=TRUE)))
#Calculate neighbors based on max k=1 distance
county.dist.k1 <-dnearneigh(county.xy, d1=0, d2=1 * county.max.k1, longlat = TRUE)
#Create neighbor list for each county based on k=1 distance
county.k1.neighbors <-nb2listw(county.dist.k1, style="W", zero.policy = TRUE)
```
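As an optional sanity check (not part of the original workflow), the neighbor objects can be summarized to confirm that every county received neighbors under each definition:

```{r Weights Check, eval=FALSE}
#Number of regions, total links, and distribution of neighbors per county
summary(se.neighbors)
summary(county.dist.k1)
```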
With both the contiguity and distance weights created, we can now move on with the analyses. To begin with, we will determine if the model variables have any ability to explain child poverty in the southern United States by using an OLS.
To make further analyses easier, we can create an object that contains the model equation. Additionally, to make the results easier to read, we will raise R's penalty against scientific notation so that small values print as decimals where possible.
```{r Equation, message=FALSE, warning=FALSE}
#Creating the equation
equation <- child.pov.2016 ~ rural + urban + lnmanufacturing + lnag +
lnretail + lnhealthss + lnconstruction + lnlesshs + lnunemployment +
lnsinglemom + lnblack + lnhispanic + lnuninsured + lnincome_ratio +
lnteenbirth + lnunmarried
#scientific notation
options(scipen = 5)
```
# OLS
```{r}
ols <- lm(equation, data=se.data)
summary(ols)
```
From the OLS we can see that not only are there a number of significant variables, but the model itself is significant, with an r<sup>2</sup> value of 0.64. To test for spatial dependency in the residuals, we can run a Global Moran's I, which requires that we include the spatial weights matrix in the analysis.
```{r Morans Contiguity, message=FALSE, warning=FALSE}
#Morans test, weights based on contiguity
cont.morans <- lm.morantest(ols, se.neighbors.list)
cont.morans
```
```{r Morans Distance, message=FALSE, warning=FALSE}
#Morans test, weights based on distance
dist.morans <- lm.morantest(ols, county.k1.neighbors)
dist.morans
```
From the analysis above, we can conclude there is some spatial autocorrelation in the residuals when we use the distance-based weights. However, we can also use Lagrange multipliers to assist with model selection.
```{r LM for Contiguity, message=FALSE, warning=FALSE}
#LaGrange Multiplier
cont.lm.tests <- lm.LMtests(ols, se.neighbors.list, test="all")
cont.lm.tests
```
From the LM tests with the contiguity weights, we see no significant spatial autocorrelation, so next we run the LM tests with the distance matrix.
```{r LM for Distance, message=FALSE, warning=FALSE}
dist.lm.tests <- lm.LMtests(ols, county.k1.neighbors, test="all")
dist.lm.tests
```
Following this analysis, we can see that there are spatial dependencies in the spatial error (```r dist.lm.tests[["LMerr"]][["p.value"]]```) and spatial lag (```r dist.lm.tests[["LMlag"]][["p.value"]]```) models. When comparing the two, the spatial lag distance model has a much smaller p-value and therefore should provide the better model for these data.
# Spatial Lag Model with Distance Weights (k-nearest neighbors)
Continuing the analysis using the distance lag model, we can examine the results of the new model.
```{r Dist Lag Model, message=FALSE, warning=FALSE}
dist.lag.model <- spatialreg::lagsarlm(equation, data=se.data,
county.k1.neighbors)
dist.lag.summary <- summary(dist.lag.model, Nagelkerke = TRUE)
dist.lag.summary
```
Similar to the OLS, we can see that there are a number of significant variables. Additionally, the model has a pseudo r<sup>2</sup> of 0.64546 and a p-value of ```r dist.lag.summary[["LR1"]][["p.value"]]```. To examine the results of the distance lag model, we can create a bivariate map comparing the observed poverty values with the modeled values.
# Mapping the Data
Because there is no built-in function for creating bivariate maps, we will need to create a number of custom objects. A brief description precedes each step below, but specific details can be found in the [Spatial Regression](https://chrismgentry.github.io/Spatial-Regression/) exercise created for the Advanced Data Analytics course.
To begin making the map we will need to combine data from the original dataset with results from the model.
```{r Combine the data, warning=FALSE, message=FALSE}
dist.lag.data <- cbind.data.frame(se.data$FIPS,
dist.lag.summary$fitted.values,
dist.lag.summary$residual,
se.data$child.pov.2016,
se.data$urban,
se.data$lnretail,
se.data$lnhealthss,
se.data$lnconstruction,
se.data$lnlesshs,
se.data$lnunemployment,
se.data$lnsinglemom,
se.data$lnhispanic,
se.data$lnuninsured,
se.data$lnincome_ratio,
se.data$lnunmarried,
stringsAsFactors = FALSE)
#Renaming columns
colnames(dist.lag.data) <- c("fips","fitted","resid","childpov",
                             "urban","retail","healthcare","construction",
                             "less_hs","unemployed","single_mom","hispanic",
                             "uninsured","income_ratio","unmarried")
```
Next, we need to break the variables we want to map, in this case the observed and predicted values, into quantiles and rank each value to create a scale.
```{r Breaks, warning=FALSE, message=FALSE}
#quantiles
quantiles_fit <- dist.lag.data %>%
pull(fitted) %>%
quantile(probs = seq(0, 1, length.out = 4), na.rm = TRUE)
quantiles_pov <- dist.lag.data %>%
pull(childpov) %>%
quantile(probs = seq(0, 1, length.out = 4), na.rm = TRUE)
#ranks
fit_rank <- cut(dist.lag.data$fitted,
                breaks = quantiles_fit,
                labels = c("1", "2", "3"),
                include.lowest = TRUE)
pov_rank <- cut(dist.lag.data$childpov,
                breaks = quantiles_pov,
                labels = c("1", "2", "3"),
                include.lowest = TRUE)
#Join ranks and combined column to dataset
dist.lag.data$fit_score <- as.numeric(fit_rank)
dist.lag.data$pov_score <- as.numeric(pov_rank)
dist.lag.data$fit_pov <- paste(as.numeric(dist.lag.data$fit_score),
"-",
as.numeric(dist.lag.data$pov_score))
```
We will need to build a custom legend for this map that includes custom labels.
```{r}
#legend
legend_colors <- tibble(
x = c(3,2,1,3,2,1,3,2,1),
y = c(3,3,3,2,2,2,1,1,1),
z = c("#574249", "#627f8c", "#64acbe", "#985356", "#ad9ea5", "#b0d5df", "#c85a5a", "#e4acac", "#e8e8e8"))
xlabel <- "Modeled,Low \u2192 High"
xlabel <- gsub(",", "\n", xlabel)
ylabel <- "Observed,Low \u2192 High"
ylabel <- gsub(",", "\n", ylabel)
legend <- ggplot(legend_colors, aes(x,y)) +
geom_tile(aes(fill=z)) +
theme_minimal() + theme(legend.position = "none") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
labs(x = xlabel, y = ylabel) +
scale_fill_identity() +
ggtitle("Legend") +
theme(axis.title.y = element_text(face = "italic", hjust = 0.5, size = 8)) +
theme(axis.title.x = element_text(face = "italic", hjust = 0.5, size = 8)) +
theme(plot.title = element_text(face="bold", hjust = 0.5, size = 10))
```
We will also need to join the data with the county polygons and attach the color codes for each ranked value. Note that the ```southern_counties``` object used in this join is created in the base-layer chunk that follows, so that chunk must be run first.
```{r data match, warning=FALSE, message=FALSE}
#attach data to counties
county.data <- southern_counties %>%
left_join(dist.lag.data, by = c("fips" = "fips")) %>% fortify
#attach colors
bivariate_color_scale <- tibble(
"3 - 3" = "#574249",
"2 - 3" = "#627f8c",
"1 - 3" = "#64acbe",
"3 - 2" = "#985356",
"2 - 2" = "#ad9ea5",
"1 - 2" = "#b0d5df",
"3 - 1" = "#c85a5a",
"2 - 1" = "#e4acac",
"1 - 1" = "#e8e8e8") %>%
gather("group", "fill")
county.data <- county.data %>%
left_join(bivariate_color_scale, by = c("fit_pov" = "group"))
```
We are now ready to create the map. Because we want it to have certain cartographic qualities, we first need to obtain additional base layers before assembling the final plot.
```{r fitted map, warning=FALSE, message=FALSE}
#The FIPS
fips <- county.fips
fips$fips <- str_pad(fips$fips, 5, "left", pad = 0)
#The Polygons
world <- map_data("world")
states <- map_data("state")
counties <- map_data("county")
#The Join
counties$polyname <- paste(counties$region, counties$subregion, sep = ",")
counties <- counties %>% left_join(fips, by = c("polyname" = "polyname"))
counties <- counties %>% left_join(se.data, by = c("fips" = "FIPS"))
#Subsets
southern_states <- subset(states, region %in%
c("texas", "arkansas", "louisiana", "mississippi",
"alabama", "georgia", "florida", "north carolina", "south carolina", "tennessee", "oklahoma", "kentucky", "west virginia", "virginia", "maryland", "delaware", "district of columbia"))
southern_counties <- subset(counties, region %in%
c("texas", "arkansas", "louisiana", "mississippi", "alabama", "georgia", "florida", "north carolina", "south carolina", "tennessee", "oklahoma", "kentucky", "west virginia", "virginia", "maryland", "delaware", "district of columbia"))
#fit map
fit_pov_map <- ggplot() +
geom_polygon(data = world, aes(x=long,y=lat, group=group), fill = "gray95", color = "gray30") +
geom_polygon(data = states, aes(x=long,y=lat, group=group), fill = "gray", color = "gray30") +
geom_polygon(data = county.data, aes(x=long, y=lat, group=group, fill = fill)) +
geom_polygon(data = southern_counties, aes(x=long,y=lat, group=group), fill = NA, color = "black", size = 0.05) +
geom_polygon(data = southern_states, aes(x=long,y=lat, group=group), fill = NA, color = "white") +
coord_map("conic", lat0 = 30, xlim=c(-106,-77), ylim=c(24.5,40.5)) +
scale_fill_identity() +
theme_grey() + theme(legend.position="bottom") + theme(legend.title.align=0.5) +
theme(panel.background = element_rect(fill = 'deepskyblue'),
panel.grid.major = element_line(colour = NA)) +
labs(x = "Longitude", y = "Latitude", fill = "Child Poverty",
title = "Bivariate Map of Observed vs Modeled Poverty Values") +
theme(plot.title = element_text(face = "bold", hjust = 0.5))
```
And finally, we can combine the map and the legend to create the final output.
```{r final map, message=FALSE, warning=FALSE, fig.height = 6, fig.width = 8}
#final map
final_map <- ggdraw() +
draw_plot(fit_pov_map, x = 0, y = 0, width = 1, height = 1) +
draw_plot(legend, x = 0.55, y = 0.12, width = 0.15, height = 0.25)
final_map
```
# Model Comparisons and the Spatial Durbin Models
**Table 1.** Model comparisons (p value and pseudo-r2) for the full Southeast and for the East South Central (ESC), South Atlantic (SATL), and West South Central (WSC) regions.

| *Neighbors* | *Model* | Southeast *P Value* | Southeast *r2* | ESC *P Value* | ESC *r2* | SATL *P Value* | SATL *r2* | WSC *P Value* | WSC *r2* |
|-------------|---------|---------------------|----------------|---------------|----------|----------------|-----------|---------------|----------|
| OLS | - | 2.20E-16 | 0.6397 | 2.20E-16 | 0.6477 | 2.20E-16 | 0.6789 | 2.20E-16 | 0.5708 |
| Contiguity | Lag | - | - | - | - | - | - | - | - |
| Contiguity | Err | - | - | - | - | - | - | - | - |
| Contiguity | Pre-Lag | - | - | - | - | - | - | - | - |
| Contiguity | Pre-Err | 0.02186 | 0.55169 | - | - | 0.02580 | 0.74402 | - | - |
| K = 1 | Lag | **0.00000** | **0.65721** | - | - | **0.00012** | **0.69546** | 0.00002 | 0.60085 |
| K = 1 | Err | 0.00000 | 0.65611 | - | - | 0.00256 | 0.69249 | **0.00001** | **0.60206** |
| K = 1 | Pre-Lag | 0.00000 | 0.55732 | - | - | 0.00060 | 0.74697 | 0.01166 | 0.46203 |
| K = 1 | Pre-Err | **0.00000** | **0.58516** | - | - | 0.00064 | 0.74692 | 0.00000 | 0.48571 |
| K = 2 | Lag | 0.00000 | 0.65269 | 0.02404 | 0.66789 | 0.00045 | 0.69416 | 0.00076 | 0.59529 |
| K = 2 | Err | 0.00000 | 0.65232 | - | - | 0.00611 | 0.69167 | 0.00298 | 0.59312 |
| K = 2 | Pre-Lag | 0.00005 | 0.55521 | - | - | 0.00141 | 0.74629 | 0.00776 | 0.46286 |
| K = 2 | Pre-Err | 0.00000 | 0.58475 | 0.01674 | 0.65211 | **0.00002** | **0.74967** | **0.00000** | **0.48681** |
| K = 3 | Lag | 0.00000 | 0.65024 | 0.00062 | 0.67389 | 0.00135 | 0.69311 | 0.00414 | 0.59259 |
| K = 3 | Err | 0.00000 | 0.65046 | - | - | - | - | 0.01582 | 0.59051 |
| K = 3 | Pre-Lag | 0.00160 | 0.55317 | - | - | 0.01561 | 0.74441 | 0.00984 | 0.46237 |
| K = 3 | Pre-Err | 0.00000 | 0.58085 | 0.00310 | 0.65499 | 0.00212 | 0.74596 | 0.00000 | 0.48368 |
| K = 4 | Lag | 0.00000 | 0.64993 | 0.00060 | 0.67393 | 0.00681 | 0.69156 | 0.00335 | 0.59293 |
| K = 4 | Err | 0.00000 | 0.65036 | - | - | - | - | 0.00686 | 0.59181 |
| K = 4 | Pre-Lag | 0.00323 | 0.55277 | - | - | - | - | 0.01272 | 0.46185 |
| K = 4 | Pre-Err | 0.00000 | 0.57933 | **0.00271** | **0.65523** | 0.00372 | 0.74552 | 0.00000 | 0.48213 |
| K = 5 | Lag | 0.00000 | 0.64978 | **0.00060** | **0.67395** | 0.01313 | 0.69095 | 0.00341 | 0.59290 |
| K = 5 | Err | 0.00000 | 0.64983 | 0.02000 | 0.66826 | - | - | 0.00754 | 0.59166 |
| K = 5 | Pre-Lag | 0.00284 | 0.55284 | - | - | - | - | 0.00752 | 0.46292 |
| K = 5 | Pre-Err | 0.00000 | 0.57890 | 0.00281 | 0.65516 | 0.00544 | 0.74522 | 0.00000 | 0.48345 |
*See the emailed Excel sheet for full model comparisons.*
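Only the k = 1 distance-weighted models are fit explicitly in this notebook; the remaining rows of Table 1 come from analogous fits with other weight definitions. A rough sketch of how the additional k-nearest-neighbor lag and error models might be generated (the loop, the object name ```knn.fits```, and the direct use of ```knn2nb()``` are assumptions, not the authors' original workflow):

```{r Table 1 Sketch, eval=FALSE}
#Sketch: refit the spatial lag and spatial error models for k = 1 through 5 nearest neighbors
knn.fits <- lapply(1:5, function(k) {
  nb.k <- knn2nb(knearneigh(county.xy, k = k, longlat = TRUE))
  lw.k <- nb2listw(nb.k, style = "W", zero.policy = TRUE)
  list(lag = summary(spatialreg::lagsarlm(equation, data = se.data, listw = lw.k),
                     Nagelkerke = TRUE),
       err = summary(spatialreg::errorsarlm(equation, data = se.data, listw = lw.k),
                     Nagelkerke = TRUE))
})
```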
## Model Equations
### Spatially Lagged X Variable(s)
In a non-spatial OLS model, $y = X \beta + \varepsilon$, we examine the relationship between one or more independent variables and a dependent variable. With a spatially lagged, or SLX model, we see an addition of a spatial component to the equation that accounts for the average value of neighboring X values.
> $y = X \beta + WX\theta + \varepsilon$, where <span style="font-size:12px;">$WX\theta$</span> is the average value of our neighbors' independent variable(s)
So the SLX equation examines whether variables that impact our neighbors also impact us. This is considered a *local model*, as there is only one-way interaction between our neighbors and us.
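The likelihood ratio tests later in this notebook compare the Spatial Durbin Model against an SLX fit named ```se.SLX.model``` that is not created elsewhere, so a minimal sketch is included here using ```spatialreg::lmSLX()``` with the k = 1 distance weights (the choice of weights is an assumption):

```{r SLX Model, message=FALSE, warning=FALSE}
#SLX model: spatially lagged X variables only, using the k = 1 distance weights
se.SLX.model <- spatialreg::lmSLX(equation, data = se.data, listw = county.k1.neighbors)
summary(se.SLX.model)
```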
### Spatial Lag Model
In the Spatial Lag model, we lose the lagged X values that were used in the SLX model and instead add a spatially lagged y value.
> $y = \rho W y + X \beta + \varepsilon$, where <span style="font-size:12px;">$\rho W y$</span> is the spatially lagged y value of our neighbors
In this *global model*, the dependent variable among our neighbors influences our dependent variable. Therefore, a feedback loop occurs in which effects on our neighbors' y affect our y, which in turn affects our neighbors' y.
### Spatial Error Model
The Spatial Error model does not include lagged dependent or independent variables, but instead includes a function of our unexplained error and that of our neighbors.
> $y = X \beta + u$, $u = \lambda W u + \varepsilon$, where <span style="font-size:12px;">$\lambda W u + \varepsilon$</span> is the function of our unexplained error (residuals) and our neighbors residual values
This is a *global model* in which higher-than-expected residual values suggest a missing explanatory variable that is spatially correlated. This would lead to unexplained clusters of spatially correlated values within our neighborhood that were not captured by the predictor variable(s).
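Similarly, the likelihood ratio tests below reference a spatial error fit named ```dist.err.model``` that does not appear elsewhere in the notebook, so a minimal sketch is included here using ```spatialreg::errorsarlm()``` with the k = 1 distance weights (again, the exact specification is an assumption):

```{r Dist Err Model, message=FALSE, warning=FALSE}
#Spatial error model using the k = 1 distance weights
dist.err.model <- spatialreg::errorsarlm(equation, data = se.data, listw = county.k1.neighbors)
summary(dist.err.model, Nagelkerke = TRUE)
```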
### Spatial Durbin Model
The Spatial Durbin model includes both lagged y and lagged x values. This model can be simplified into *global* SLX, Spatial Lag, Spatial Error models, or OLS.
> $y = \rho Wy + X\beta + WX\theta + \varepsilon$
Because of the inclusion of the lagged y, this model would be used if the relationships in your data are global, meaning that if the y variable is impacted in one region that impact will spill over to every region in the dataset.
### Spatial Durbin Error Model
This nested model does not contain a lagged y and can be simplified into a *local* Spatial Error, SLX, or OLS model.
> $y = X\beta + WX\theta + u$, $u = \lambda Wu + \varepsilon$
This nested model would be used if the relationships in your data are local, meaning changes in one region will only impact the immediate neighbors of the region and not all of the regions in the dataset.
## Spatial Durbin Model
```{r SDM, echo=TRUE, message=FALSE, warning=FALSE}
sdm <- spatialreg::lagsarlm(formula = equation, data = se.data, listw = county.k1.neighbors, type = "mixed")
summary(sdm, Nagelkerke = TRUE)
```
### Results of the Impacts Matrix for the Spatial Durbin Model
```{r SDM Impacts, echo=TRUE, message=FALSE, warning=FALSE}
sdm.impacts <- summary(spatialreg::impacts(sdm, listw = county.k1.neighbors, R = 100), zstats = TRUE) #[["pzmat"]]
sdm.impacts
```
### Likelihood Ratio Results
```{r SDM v Dist Lag, echo=TRUE, message=FALSE, warning=FALSE}
spatialreg::LR.sarlm(sdm,dist.lag.model)
```
```{r SDM v Dist Err, echo=TRUE, message=FALSE, warning=FALSE}
spatialreg::LR.sarlm(sdm,dist.err.model)
```
```{r SDM v SLX, echo=TRUE, message=FALSE, warning=FALSE}
spatialreg::LR.sarlm(sdm,se.SLX.model)
```
<p align="center">
![Plot Example](plot.png "Plot Example")
</p>
### Moran's I and Local Indicators of Spatial Association (LISA)
```{r LISA, message=FALSE, warning=FALSE, fig.height = 6, fig.width = 8}
mc.morans <- spdep::moran.mc(se.data$child.pov.2016, county.k1.neighbors, nsim = 99)
mc.morans
lc.morans <- spdep::localmoran(se.data$child.pov.2016, county.k1.neighbors)
lc.morans
pov.rate <- scale(se.data$child.pov.2016) %>% as.vector()
pov.lag.rate <- lag.listw(county.k1.neighbors, pov.rate)
lisa.data <- se.data %>%
mutate(quad_sig = case_when(
pov.rate >= 0 & pov.lag.rate >= 0 & lc.morans[, 5] <= 0.05 ~ "high-high",
pov.rate <= 0 & pov.lag.rate <= 0 & lc.morans[, 5] <= 0.05 ~ "low-low",
pov.rate >= 0 & pov.lag.rate <= 0 & lc.morans[, 5] <= 0.05 ~ "high-low",
pov.rate <= 0 & pov.lag.rate >= 0 & lc.morans[, 5] <= 0.05 ~ "low-high",
lc.morans[, 5] > 0.05 ~ "not significant"
))
lisa <- lisa.data %>% fortify()
```
<p align = "center">
![LISA Example](LISA.png "LISA Example")
</p>
## Spatial Durbin Error Model
```{r SDEM, echo=TRUE, message=FALSE, warning=FALSE}
sdem <- spatialreg::errorsarlm(equation, se.data, county.k1.neighbors, etype = "emixed")
summary(sdem, Nagelkerke = TRUE)
```
### SDEM Impacts Table
```{r SDEM Impacts Table, echo=TRUE, message=FALSE, warning=FALSE}
sdem.impacts <- summary(spatialreg::impacts(sdem, listw = county.k1.neighbors, R = 100), zstats = TRUE)#[["pzmat"]]
sdem.impacts
```