-
Notifications
You must be signed in to change notification settings - Fork 9
/
Part2-workingWithFactors.Rmd
162 lines (106 loc) · 4.58 KB
/
Part2-workingWithFactors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
---
title: "Part 2 - Working with Factors"
author: "Ted Laderas (Solutions)"
date: "5/18/2017"
output:
html_document:
code_download: true
code_folding: hide
df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```
In this section, we'll learn some skills about manipulating factor (categorical) data.
We'll do this by making a bar plot and a box plot and progressively making it more complex.
## Reviewing Factors
Factors are how R represents categorical data.
There are two kinds of factors:
+ `factor` - used for *nominal* data ("Ducks","Cats","Dogs")
+ `ordered`- used for *ordinal* data ("10-30","31-40","41-60")
We'll manipulate our barplot and add more information using factors.
Here's the simple dataset we'll use to investigate how to work with factors in `ggplot`.
For the `factor` and `ordered` variables, find the categories for each using `levels()`.
```{r}
library(tidyverse)
load("data/pets.rda")
pets
```
## A Basic Barplot using `geom_bar()`
The `geom_bar()` default is to count the number of values with each factor level. Note that you don't map to a y-aesthetic here, because the y values are the counts.
Given this dataset, we might want to ask how many pets have the same name. What is the most popular name?
Try mapping another variable to `fill` (try both `weight` and `animal`). Are the results what you expected?
```{r}
ggplot(data=pets, mapping=aes(x=name)) + geom_bar()
```
## Faceting
Say you have another `factor` variable and you want to stratify the plots based on that.
You can do that by supplying the name of that variable as a facet. Here, we facet our barplot by `shotsCurrent`.
```{r}
ggplot(data=pets, mapping=aes(x=name)) + geom_bar() +
## have to specify facets using notation
## try out facets=~ageCategory!
facet_wrap(facets=~shotsCurrent) +
## we make the x axis x angled for better legibility
theme(axis.text.x = element_text(angle=45))
```
You might notice that there are blank spots for the categories in each facet. We can restrict the factors in each by using `scale="free_x"` argument in `facet_wrap()`.
How many animals named "Morris" did not receive shots?
What happens when you replace the `scale` argument with "free_y"?
```{r}
ggplot(pets, aes(x=name)) + geom_bar() +
facet_wrap(facets=~shotsCurrent, scale="free_x") +
theme(axis.text.x = element_text(angle=45))
```
## Stacked Bars
Let's see how many of each animal got shots. We can do this by mapping `shotsCurrent` to `fill`.
```{r}
#we map color (the outline of the plot) to black to make it look prettier
ggplot(pets, aes(x=animal, fill=shotsCurrent)) +
geom_bar(color="black")
```
## Proportional Barchart
We may only be interested in the relative proportions between the different categories. Visualizing this is useful for various 2 x 2 tests on proportions.
What percent of dogs did not receive shots?
```{r}
ggplot(pets, aes(x=animal,fill=shotsCurrent)) +
geom_bar(position = "fill", color = "black")
```
## Dodge those bars!
Instead of stacking, we can also dodge the bars (move the bars so they're beside each other).
```{r}
ggplot(pets, aes(x=animal,fill=shotsCurrent)) +
geom_bar(position="dodge", color="black")
```
## Your Task: Bar Charts
Given the `pets` `data.frame`, plot a stacked proportional barchart that shows the age category counts by animal type. Is the proportion of animals receiving shots the same across each age category?
Hints: think about what to map to `x`, and what to map to `fill`.
Intermediate Folks: facet this plot by `shotsCurrent`.
```{r}
#Space for your answer here.
```
```{r}
#Space for your answer here.
```
## Boxplots
Boxplots allow us to assess distributions of a continuous variable conditioned on categorical variables.
What does this tell us?
```{r}
ggplot(pets, aes(x=shotsCurrent, y=weight)) + geom_boxplot()
```
## Violin Plots
Violin plots are another useful way to visualize the data. They provide a more nuanced look at the data. They're a density plot that's mirrored around the y-axis.
```{r}
ggplot(pets, aes(x=ageCategory, y=weight, fill=ageCategory)) + geom_violin()
```
## Your task: How heavy are our pets?
Visualize `weight` by `animal` type as both a boxplot and a violin plot. What do you conclude? Which kind of animal weighs more on average than the other?
Intermediate challenge: How would we plot both boxplots and a violin plot on the same graph?
```{r}
##Space for your answer here
```
## What you learned in this section
- Visualizing factor data
- Simple, stacked, stacked proportional, and dodged barplots
- Faceting a graph
- Boxplots and violin plots