forked from Robinlovelace/Creating-maps-in-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
r-spatial-ref-card.Rmd
235 lines (175 loc) · 10.3 KB
/
r-spatial-ref-card.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
---
title: "R Spatial Reference Card"
author: "Robin Lovelace and Rachel Oldroyd"
output: pdf_document
geometry: margin=0.3in
---
```{r global_options, include=FALSE}
library(knitr)
opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
warning=FALSE, message=FALSE, results='hide')
# remove or change these options to see the output of the commands
```
*Getting Started**
R has a suite of powerful and effective spatial data processing tools. This reference card
summarises the fundamentals of R for geographical applications, in terms of packages and functions. The code has been tested on data from the
'Creating-maps-in-R' [GitHub repository](http://www.github.com/Robinovelace/Creating-maps-in-R). Run the commands from the [unzipped folder](https://github.com/Robinlovelace/Creating-maps-in-R/archive/master.zip) to see the commands in action!
R has extensive help documentation which can be utilised by using the `?` key; try:
```{r}
?summary #(for help on the summary function),
??summary #(to search for word in R's documentation)
example(summary) #(for pre-made examples of the summary function).
```
If you are working from within an R project, the working directory will be set automatically to the parent folder. Optionally, the working directory can be set using the `setwd` command:
```{r global_options, include=FALSE}
setwd("C:/.../.../Creating-maps-in-R/data")
To check the current working directory use `getwd`:
getwd() #Returns path of working directory
```
**Key packages**
Additional packages must be installed (using `install.packages('packageName')`) on top of
R's [default packages](http://www.hep.by/gnu/r-patched/r-admin/R-admin_33.html) to utilise R's spatial capabilities.
Load a package using the `library` function, for example, `library(rgdal)` will load the **rgdal** package which is R's interface to the 'Geospatial Abstraction Library' (GDAL):
```{r}
library(rgdal)
```
The most important and frequently used *general purpose* spatial packages are:
- **sp**: basis of most other spatial R packages - provides Spatial* classes
- **rgdal**: R's interface to the GDAL data import library
- **rgeos**: provides a number of spatial functions
- **maptools**: tools for handling spatial objects
- **raster**: A matrue and extremely useful spatial package, providing functions for raster applications.
- **spatstat**: provides functions for point-pattern applications.
There are hundreds of additional R packages described on the [CRAN website](http://cran.r-project.org) in the [*spatial view*](http://cran.r-project.org/web/views/Spatial.html).
**R's Spatial* classes**
To work with spatial data, R uses *classes* that are more complex than the default data classes
in R's base package; *S3* `vector`, `list` and `data.frame`. Spatial object classes can be
identified with the function `class(objectName)`. Each contains a number of different data
*slots*: sub-classes of specific types. Some important spatial classes provided by the `sp`
package include:
- `SpatialPoints`: point data, each point represented by an x and y coordinate
- `SpatialLines`: line data, composed of lists of the `Lines` sub-class or *slot*
- `SpatialPolygons`: polygon data, with `Polygons` and `Polygon` *slots*
- `SpatialPixels`: a spatial class for raster data comprised of pixels
The above classes cannot contain non-spatial attribute data (for example, an attribute table in a
conventional GIS package). Therefore, they must first be converted into a class of the same name, but suffixed with `DataFrame`. Thus `london <- SpatialPointsDataFrame(sp, df)` will create a new object called london containing points with spatial and non-spatial information. Likewise `SlDf <- SpatialLinesDataFrame` will create a new object called SlDf containing lines with spatial and non-spatial information.
The attribute data of the london object can be accessed in this instance by using `london@data`. The `@data` notation refers to the `data` slot of the new object. More fundamental spatial classes, which
are in fact subsets of the `Spatial*` type classes listed above, include `bbox` (the
object's bounding box) and `proj4string` (the projection of the object, which may be `NA`).
**Loading spatial data**
The `readOGR` function from the **rgdal** packages can load a wide variety of spatial data types. The following line loads a shapefile of sports participation in London:
```{r, warning=FALSE, message=FALSE, results='hide'}
london <- readOGR("data", "london_sport")
```
Plot the london object:
plot(london)
**Analysing Attribute Data**
Use the following functions to return basic attribute data:
```{r}
names(london) # Print column headings of the london object
print(london) # Print the entire london object:
head(london@data, n=2) # Print a defined number of rows from the london object (2 in this case):
```
Use the following functions to calculate some basic column statistics:
```{r}
mean(london$Partic_Per) #calculate the mean value of the Partic_Per column (replace mean with other functions such as min, max and sum)
nrow(london) #Return the number of rows in the dataset
ncol(london) #Return the number of columns in the dataset
```
Use the summary function to view additional information about the london object:
```{r}
summary(london)
```
Undertake attribute queries on the london object, note the use of the square brackets to select a subset of the data:
```{r}
london@data[london$Partic_Per < 15, ] # Select rows where sports participation is less than 15.
london@data[london$Pop_2001 > 250000, ] #Select rows where population exceeds 250000.
```
**Allocating and changing projection**
Before undertaking any further analysis it is useful to know the Coordinate Reference System(CRS) of the london object:
```{r}
proj4string(london)
```
Change the Coordinate Reference System (CRS) if it has been incorrectly assigned:
```{r}
proj4string(london) <-CRS("+init=epsg:27700")
```
Note the warning above which states that the coordinate reference system has been changed but the data has not been transformed. To transform the data use:
```{r}
london.wgs84 <-spTransform(london, CRS("+init=epsg:4326"))
```
**Attribute Joins**
Attribute joins are used to link additional pieces of information to pre-existing polygons. In the london object, for example, there are 5 attribute variables (`names(london)`) and we may want to know the number of crimes occuring in each one. Firstly the additonal dataset will be loaded and prepared so it can be joined to the existing london object.
Load the additonal dataset and the 'plyr' package:
```{r}
crime <-read.csv("data/mps-recordedcrime-borough.csv") #Create an object named crime to store the data
library(plyr)
```
Extract the "Theft & Handling" crimes and save to new object (Note the use of square brackets to extract a subset of the crime dataset and the use of the $ to query a specific column):
```{r}
crime_theft <- crime[crime$CrimeType == "Theft & Handling", ]
```
Calculate the sum of the crime count at district level and save to a new object (Note the use of ~ which means 'by'):
```{r}
crime_ag<- aggregate(CrimeCount~ Borough, FUN = sum, data = crime_theft)
```
Now the crime data is at the borough level it can be joined to the london object:
Firstly check that the 'name' column in the london object matches the 'Borough' column in the crime_ag
object:
```{r}
london$name %in% crime_ag$Borough
```
Rename 'Borough' to 'Name' so the two objects can be joined:
```{r}
crime_ag<- rename(crime_ag, replace = c("Borough" = "name"))
```
Join the datasets together:
```{r}
london@data <- join(london@data, crime_ag)
```
**Clipping and Spatial Joins**
In addition to joining by attributes, it is also possible to undertake spatial joins in R. We are going to join transport infrastructure points (tube stations etc.) to the London object with the aim of finding how many points fall within each London borough.
Load the transport infrastructure points and create a new object called stations:
```{r}
stations <- readOGR("data", "lnd-stns")
```
Transform the stations coordinate system so it matches the London coordinate system:
```{r}
stations <- spTransform(stations, CRSobj = CRS(proj4string(london)))
```
Take a spatially determined subset of the stations object. This subset will only contain stations that are completely within the London object bounds:
```{r}
stations <- stations[london, ] #Reassign the stations object to contain the next spatial subset
```
Alternatively complete the spatial join using the 'over' function which accepts two variable; the target layer (the layer to be altered) and the source layer (the reference layer). Over will assign a value of 'NA' to all points in the target which do not fall within the bounds of the refence object. The stations variable name is then reassigned so it contains only the points which do not have a value of 'NA':
```{r}
selection <- over(stations, london) #Create a new object to store the output of the over function.
stations <- stations[!is.na(selection[,1]),] #Reassign the stations object to contain only points which do not have a value of NA.
```
**Spatial aggregation**
R's base function `aggregate` provides summaries of variables based on a particular grouping variable, in this case how many stations lie in each London borough:
```{r}
stations_agg<- aggregate(x=stations["CODE"], by = london, FUN = length)
```
This will create a new object, if we simply want to append a variable which contained the station count data to the original london object we could use:
```{r}
london$n_points <- stations_agg$CODE
```
The spatial implementation of `aggregate` can also provide summary statistics of variables, as well as simple counts:
```{r}
lnd_n <- aggregate(stations["NUMBER"] , by = london, FUN = mean) #Take the 'number' variable and find its mean value for the stations in each ward.
```
Analyse and plot the results:
```{r}
q <- cut(lnd_n$NUMBER, breaks = c(quantile(lnd_n$NUMBER)), include.lowest=T)
summary(q)
clr <- as.character(factor(q, labels = paste0("grey", seq(from = 20, to = 80, by = 20))))
plot(lnd_n, col = clr)
legend(legend = paste0("q", 1:4), fill = paste0("grey", seq(from = 20,to = 80, by = 20)), "topright")
```
**Map Making with ggplot2**
ggplot2 is one of the most well documented R packages, it is an implementation of the Grammar of Graphics scheme for data visualisation and can serve as an alternative to R's base graphics.
Load the ggplot2 library:
```{r}
library(ggplot2)
```