-
Notifications
You must be signed in to change notification settings - Fork 4
/
03-plotting-data.Rmd
124 lines (90 loc) · 6.02 KB
/
03-plotting-data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
---
title: "Plotting Data"
output:
html_document:
code_download: true
toc: true
toc_float: false
highlight: tango
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
library(tidyverse)
penguins <- read_csv("data/penguins.csv")
```
Visualising data is useful not only to explore the data and identify the relationship between different variables, but also to communicate the result of the analysis. The **ggplot2** package allows us you to generate high quality plots in just a few lines of code. Any ggplot plot will have at least 3 components: the **data**, a **coordinate system** and a **geometry** (the visual representation of the data) and will be built in layers.
Let's start by making plots!
## First layer: the area of the graphic
The main function of ggplot2 is precisely `ggplot()`, which allows us to start the graph and also to define the global characteristics. The first argument of this function will be the data we want to visualize, always in a data.frame. In this case we use `penguins`.
The second argument is called `mapping` because it's where we define how columns of the data "map" to visual properties of the geometries. This mapping is defined by the `aes()` function. In this case we indicate that in the x axis we want to plot the variable `bill_length_mm` and in the y-axis the variable `bill_depth_mm`.
But this alone is not enough, it only generates the first layer: the area of the graph.
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm))
```
## Second layer: geometries
We need to add a new layer to our chart, the geometric elements or "geoms" that will represent the data. To do this we add a geom function, for example if we want to represent the data with points we will use `geom_point()`.
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point()
```
We have our first plot!
You may have noticed that the dots are clustered in groups. Perhaps some other variable explains this behaviour.
To include information from other variables in our plot we can take advantage of the aesthetic characteristics of the geometries. In this case, we can "paint" the points according to the penguin species.
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species))
```
Again, we use the `aes()` function to map a variable in our data to an element of the plot. And aha! Each species of penguins has different characteristics!
## Adding geometries
Very often it is not enough to look at the raw data to identify the relationship between variables; it is necessary to use some statistical transformation to highlight those relationships, either by fitting a model or calculating some statistics.
For this, ggplot2 has geoms that calculate some common statistical transformations. Let's try with `geom_smoth()` to fit a linear model to each species.
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm")
```
By default `geom_smooth()` fits the data using the loess method (local linear regression) when there are less than 1000 data. But it is very common that you want to fit a global linear regression. In that case, you have to set `method = "lm"`.
## Let's talk about the look of the plot
For now we used the default ggplot look. We could change the look of your plot to match the style of the institution where we work, of the journal where we are going to publish it or simply to make it more eye-catching.
Let's start with colour. To change the aesthetic appearance of a plot element, we add a new layer with the `scale_*` function. In this case we'll use `scale_color_manual()` to choose the colours of the points manually. We could also use previously defined colour palettes like Viridis or Color Brewer.
We'll need 3 colors for the 3 species, let's use `"darkorange"`, `"purple"` and `"cyan4"` following the beautiful visualizations made by Allison Horst.
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm") +
scale_color_manual(values = c("darkorange","purple","cyan4"))
```
We are getting there! Now, let's add some text elements with a new ggplot layer: `labs()`.
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm") +
scale_color_manual(values = c("darkorange","purple","cyan4")) +
labs(title = "Penguin bill dimensions",
subtitle = "Bill length and depth for Adelie, Chinstrap and Gentoo, Penguins at Palmer Station LTER",
x = "Bill length (mm)",
y = "Bill depth (mm)",
color = "Penguin species",
shape = "Penguin species")
```
Now the axes labels are more legible and we have a title and subtitle that explains what the plot is about.
We could keep changing this endlessly but we'll finish with the general look of your plot.
The overall look of a plot is defined by its theme. ggplot2 has many themes available and for all tastes. But there are also other packages that extend the possibilities, for example ggthemes. By default ggplot2 uses `theme_grey()`, let's try `theme_minimal()`:
```{r}
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species)) +
geom_smooth(aes(color = species), method = "lm") +
scale_color_manual(values = c("darkorange","purple","cyan4")) +
labs(title = "Penguin bill dimensions",
subtitle = "Bill length and depth for Adelie, Chinstrap and Gentoo, Penguins at Palmer Station LTER",
x = "Bill length (mm)",
y = "Bill depth (mm)",
color = "Penguin species",
shape = "Penguin species") +
theme_minimal()
```
> Now it's your turn. Choose a theme you like and try it out. Also, if you can think of a better title, modify it!