forked from acatlin/SPRING2021TIDYVERSE
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Eric_Hirsch_607_GGPLOT2.Rmd
167 lines (119 loc) · 7.4 KB
/
Eric_Hirsch_607_GGPLOT2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
title: "Eric_Hirsch_607_TidyVerse"
author: "Eric Hirsch"
date: "`r Sys.Date()`"
output: html_document
vignette: >
%\VignetteIndexEntry{Vignette Title}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
# ggplot2 - Creating Elegant and Useful Graphs in R
`ggplot2`, a package in the core tidyverse, is a system for easily and efficiently creating graphics in R. It is based on the "Grammar of Graphics", the notion that all graphs can be built from the same components: a data set, a coordinate system, and visual marks that represent data points. You specify these details to ggplot2 and the package creates the graph.
The possibilities with ggplot2 are endless. You can learn more at https://ggplot2.tidyverse.org/. I will describe here some of the basic core capabilities, and then do a deeper dive into themes.
## Loading the library and our sample data
We start by loading the tidyverse library. If tidyverse is not installed you must first install the tidyverse package ( install.packages("tidyverse"))
```{r, warning=F}
library(tidyverse)
```
Next we load the data that will be used for our examples. This data is from the article, "Marriage Isn't Dead - Yet", published on fivethirtyeight.com. The data represents the share of the population, aged 25 to 34, that has never been married, broken down by year and by other factors - we will be looking at level of education for the years from 2000 forward:
```{r load data}
dfMarriage <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master//marriage/both_sexes.csv", header= TRUE)
dfMarriage_Subset <- dfMarriage %>%
subset(select = c("year", "all_2534", "HS_2534", "SC_2534", "BAo_2534", "GD_2534")) %>%
rename(c(all_individuals = "all_2534", High_School_Or_Less = "HS_2534", Some_College = "SC_2534", Bachelors_NoGradDegree="BAo_2534", Graduate_Degree="GD_2534")) %>%
filter(year >= 2000)
knitr::kable(head(dfMarriage_Subset))
```
## Using the `ggplot2` library
`ggplot2` can create a vast array of graphs with many features. We will focus here on a few: creating different types of graphs, adding simple features like titles and labels, and creating themes.
#### Simple Boxplot
We begin with a boxplot of the share of never-married adults in the population with a high school education or less, for all the years in the dataset:
```{r boxplot3}
ggplot(data = dfMarriage_Subset, aes(x ="", y=High_School_Or_Less)) +
geom_boxplot()
```
Note the grammar of graphics at work - we supply a dataset (*dfMarriage_subset*), a coordinate system, or "aesthetic" (*aes(x = "", y=High_School_Or_Less))*) and a type of graph (*geom_boxplot()*) - and ggplot2 does the rest.
We can change or add to any of the parameters for this plot and generate a new graph. For example, we can improve the y axis label by specifying new text with *ylab()*, add a title with *ggtitle* and a new font size with *theme(text = element_text(size = 9))*.
```{r boxplot4}
ggplot(data = dfMarriage_Subset, aes(x ="", y=High_School_Or_Less)) +
geom_boxplot() +
ylab("Share of adults who have never married") +
ggtitle("Marriage Rates for Adults with a High School Education or Less") +
theme(text = element_text(size = 9))
```
#### Using Our Boxplot Specifications to Make A Bar Chart
What about a bar chart of the share of 'never-married adults with a high school education or less' against 'year'? True to the grammar of graphics, we can use virtually the same code with just two adjustments - change the x axis to 'year' and the plot type to 'geom_col.' The plot will retain our y axis label, title and other features:
```{r boxplot7}
g1 <- ggplot(data = dfMarriage_Subset, aes(x =year, y=High_School_Or_Less)) +
geom_col() +
ylab("Share of adults who have never married") +
xlab("Year") +
ggtitle("Marriage Rates for Adults with a High School Education or Less") +
theme(text = element_text(size = 10))
g1
```
We can see that among adults between 24 and 34 with a high school education or less, the share of those who have never been married has increased steadily over time.
There are many other plots possible with ggplot2 - column, scatter, histogram - virtually any plot a data scientist might need. We will not explore them here.
## Exploring the 'Themes' Component
We have already seen how much we can do with 'theme' in order to change font size. But there are over 100 elements that can be controlled using 'theme' to control the look of your graph. The best way to understand it is to see it in action.
Let's use theme elements on the above graph to
- Increase the size of the title and bold it
- Remove the gray background
- Remove the vertical gridlines
- Soften the horizontal gridlines
- Add a colored border
- Change and bold the color of the axis elements
- Change the angle of the x axis elements
- Decrease the size of and bold the axis titles
- Move the y axis title back off the axis elements
- move the x axis title up closer to the axis elements
- Remove the x axis ticks
```{r abc}
g2 <- g1 +
theme(plot.title = element_text(size = 12, face="bold"),
panel.background = element_rect(fill = "white"),
panel.grid.major.x = element_blank() ,
panel.grid.major.y = element_line( size=.2, color="gray" ),
plot.background = element_rect(fill = "cornsilk3"),
axis.text = element_text(colour = "darkslateblue", face="bold"),
axis.text.x = element_text(angle=65),
axis.title = element_text(size = 10, face="bold"),
axis.title.y = element_text(vjust=3),
axis.title.x = element_text(vjust=7),
axis.ticks.x = element_blank())
g2
```
Beware of chartjunk. It is always recommended to use ornament sparingly and only when it contributes to the overall communication of data. But many of these elements can come in handy to make charts more readable and understandable.
## Reusing your Custom Theme
If we want to keep consistent with this theme, we can put it in a function and call it again within our markdown file:
```{r theme function}
my_theme <- function()
{
theme(plot.title = element_text(size = 12, face="bold"),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "cornsilk3"),
panel.grid.major.x = element_blank() ,
panel.grid.major.y = element_line( size=.2, color="gray" ),
axis.text = element_text(colour = "darkslateblue", face="bold"),
axis.text.x = element_text(angle=65),
axis.title = element_text(size = 10, face="bold"),
axis.title.y = element_text(vjust=3),
axis.title.x = element_text(vjust=7),
axis.ticks.x = element_blank())
}
```
Now if we look at marriage rates for individuals with a Graduate degree we can use our theme for a consistent look like this:
```{r graduate degree}
ggplot(data = dfMarriage_Subset, aes(x =year, y=Graduate_Degree)) +
geom_col() +
ylab("Share of adults who have never married") +
xlab("Year") +
ggtitle("Marriage Rates for Adults with a Graduate Degree") +
my_theme()
```
Here we see that the pattern for adults with a graduate degree is different - marriage rates increased until 2005 and then began to decline.
This ends are introduction to ggplot2 and themes. There is so much more we can do! But this introduction should get you started.