---
output:
  html_document: default
  pdf_document: default
---
# Plotting (I) {#plotting1}
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, fig.align='center')
library(dplyr)
library(yarrr)
```
```{r, echo = FALSE}
text.outline <- function(x, y,
labels = 'test',
col = 'black',
font = 1,
bg = 'white',
r = 0.02,
h = 1,
w = 1,
cex = 1,
adj = .5,
pos = NULL){
# Draw background
is <- seq(0, 2 * pi, length = 72)
for(i in is){
xn = x + cos(i) * r * w
yn = y + sin(i) * r * h
text(xn, yn, labels = labels, col = bg, cex = cex, adj = adj, pos = pos, font = font)
}
# Foreground
text(x, y, labels = labels, col = col, cex = cex, adj = adj, pos = pos, font = font)
}
```
```{r sammy, fig.cap= "The great Sammy Davis Jr. Do yourself a favor and spend an evening watching videos of him performing on YouTube. Image used entirely without permission.", fig.margin = TRUE, echo = FALSE, out.width = "50%", fig.align='center'}
knitr::include_graphics(c("images/sammy.jpg"))
```
Sammy Davis Jr. was one of the greatest American performers of all time. If you don't know him already, Sammy was an American entertainer who lived from 1925 to 1990. The range of his talents was just incredible. He could sing, dance, act, and play multiple instruments with ease. So how is R like Sammy Davis Jr.? Like Sammy Davis Jr., R is incredibly good at doing many different things. R does data analysis like Sammy dances, and creates plots like Sammy sings. If Sammy and R did just one of these things, they'd be great. The fact that they can do both is pretty amazing.
When you evaluate plotting functions in R, R can build the plot in different locations. The default location is a temporary plotting window within your R programming environment. In RStudio, plots show up in the Plots pane (typically in the bottom right-hand corner of the window). In the base R application on a Mac, plots show up in a separate Quartz window.
You can think of these plotting locations as canvases. You only have one canvas active at any given time, and any plotting command you run will put more plotting elements on your active canvas. Certain high--level plotting functions like `plot()` and `hist()` create brand new canvases, while other low--level plotting functions like `points()` and `segments()` place elements on top of existing canvases.
Don't worry if that's confusing for now -- we'll go over all the details soon.
Let's start by looking at a basic scatterplot in R using the `plot()` function. When you execute the following code, you should see a plot open in a new window:
```{r basicplot}
# A basic scatterplot
plot(x = 1:10,
y = 1:10,
xlab = "X Axis label",
ylab = "Y Axis label",
main = "Main Title")
```
Let's take a look at the result. We see an x--axis, a y--axis, 10 data points, an x--axis label, a y--axis label, and a main plot title. Some of these items, like the labels and data points, were entered as arguments to the function. For example, the main arguments `x` and `y` are vectors indicating the x and y coordinates of the (in this case, 10) data points. The arguments `xlab`, `ylab`, and `main` set the labels of the plot. However, there were many elements that I did not specify -- from the x and y axis limits, to the color of the plotting points. As you'll discover later, you can change all of these elements (and many, many more) by specifying additional arguments to the `plot()` function. Because I did not specify them, R used **default** values -- values that R uses unless you tell it to use something else.
For the rest of this chapter, we'll go over the main plotting functions, along with the most common arguments you can use to customize the look of your plot.
## Colors
Most plotting functions have a color argument (usually `col`) that allows you to specify the color of whatever you're plotting. There are many ways to specify colors in R. Let's start with the easiest ones.
### Colors by name
The easiest way to specify a color is to enter its name as a string. For example `col = "red"` is R's default version of the color red. Of course, all the basic colors are there, but R also has tons of quirky colors like `"snow"`, `"papayawhip"` and `"lawngreen"`. Figure \@ref(fig:randomcolors) shows 100 randomly selected named colors.
```{r randomcolors, fig.width = 8, fig.height = 7, echo = FALSE, fig.cap="100 random named colors (out of all 657) in R."}
set.seed(100)
par(mar = c(0, 0, 0, 0))
plot(1, xlim = c(0, 11), ylim = c(0, 11),
type = "n", bty = "n", xaxt = "n", yaxt = "n",
xlab = "", ylab = "")
loc <- expand.grid(x = 1:10, y = 1:10)
col.vec <- colors()[sample(1:length(colors()), size = 100)]
for(i in 1:nrow(loc)) {
x.i <- loc$x[i]
y.i <- loc$y[i]
rect(x.i - .5, y.i - .5,
x.i + .5, y.i + .5,
col = col.vec[i], border = "white")
text(x.i, y.i, labels = col.vec[i], cex = .6)
}
```
To see all 657 color names in R, run the code `colors()`. Or to see an interactive demo of colors, run `demo("colors")`.
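Because `colors()` returns an ordinary character vector, you can search it like any other vector. The chunk below is a small sketch of that idea. The search term `"green"` and the object names `all.cols` and `green.cols` are just illustrative choices. It also uses `col2rgb()` to show the red, green and blue values behind a color name.

```{r}
# colors() returns a character vector of every built-in color name
all.cols <- colors()
length(all.cols)

# Find every color name that contains "green"
green.cols <- grep("green", all.cols, value = TRUE)
head(green.cols)

# Look up the red, green and blue components (0 to 255) of a named color
col2rgb("papayawhip")
```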
### gray()
| Argument| Description|
|:------------|:-------------------------------------------------|
|`level`|Lightness: `level = 1` = totally white, `level = 0` = totally black|
|`alpha`|Transparency: `alpha = 0` = totally transparent, `alpha = 1` = not transparent at all.|
Table: (\#tab:gray) `gray()` function arguments
```{r graylevels, echo = FALSE, fig.width = 7, fig.height = 6, fig.align='center', fig.cap = "Examples of gray(level, alpha)"}
par(mar = c(0, 4, 4, 4))
gray.dm <- expand.grid("level" = seq(0, 1, .1), "alpha" = seq(0, 1, .1))
plot(x = rep(1:11 - .1, times = 11),
y = rep(11:1, each = 11),
xlim = c(0, 12),
ylim = c(0, 12),
pch = 21,
bg = gray(gray.dm$level, gray.dm$alpha),
col = gray(.5),
bty = "n",
cex = 2.5,
ylab = "",
xlab = "",
xaxt = "n",
yaxt = "n")
points(x = rep(1:11 + .1, times = 11),
y = rep(11:1, each = 11),
pch = 21,
col = gray(.5),
bg = gray(gray.dm$level, gray.dm$alpha),
cex = 2.5
)
text(1:11, rep(12, 11), labels = seq(0, 1, .1))
text(rep(0, 11), 1:11, labels = seq(1, 0, -.1))
mtext("level = x", side = 3, line = 1.5, cex = 1.5)
mtext("Black", side = 3, las = 1, at = 1, cex = .8)
mtext("White", side = 3, las = 1, at = 11, cex = .8)
mtext("alpha = x", side = 2, line = 2, cex = 1.5)
mtext("Not at all\ntransparent", side = 2, las = 1, at = 1, cex = .8)
mtext("Completely\ntransparent", side = 2, las = 1, at = 11, cex = .8)
par("xpd" = TRUE)
arrows(-1.25, 10.5, -1.25, 1.5, lty = 1, length = .15)
arrows(1.5, 12.75, 10.5, 12.75, lty = 1, length = .15)
par("xpd" = FALSE)
#
# text(x = 1:11,
# y = rep(.5, 11),
# labels = seq(0, 1, .1),
# pos = 1,
# cex = .8
# )
```
If you're into erotic romance and BDSM, then you might be interested in [Shades of Gray](https://en.wikipedia.org/wiki/Fifty_Shades_of_Grey). If so, the `gray()` function is your answer. It takes two arguments, `level` and `alpha`, and returns a shade of gray. For example, `gray(level = 1)` will return white and `gray(level = 0)` will return black. The second argument, `alpha`, specifies how transparent to make the color on a scale from 0 (completely transparent) to 1 (not transparent at all). If you don't specify `alpha`, the color is not transparent at all. See Figure \@ref(fig:graylevels) for examples.
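To get a feel for what `gray()` actually returns, it can help to print a few values in the console. The following is a minimal sketch. The specific levels are arbitrary examples, and the results are just hexadecimal color strings that you can pass to any `col` argument.

```{r}
# Lightness only: level runs from 0 (black) to 1 (white)
gray(level = 0)    # "#000000"
gray(level = 1)    # "#FFFFFF"

# A vector of levels returns a vector of grays
gray(level = seq(0, 1, by = .25))

# Adding alpha returns a semi-transparent gray
# (the hex code gains two extra digits for the alpha channel)
gray(level = .5, alpha = .5)
```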
### `yarrr::transparent()`
I don't know about you, but I almost always find transparent colors to be more appealing than solid colors. Not only do they help you see when multiple points are overlapping, but they're just much nicer to look at. Just look at the overlapping circles in the plot below.
```{r fig.width = 5, fig.height = 3.5, echo = FALSE}
par(mar = c(0, 0, 0, 0))
plot(1, xlim = c(0, 6), ylim = c(0, 1),
type = "n", xaxt = "n", yaxt = "n",
xlab = "", ylab = "", bty = "n")
text(1.5, .9, "Standard", cex = 1.5)
points(x = c(1, 2, 1.5),
y = c(.5, .5, .3),
col = c("red", "blue", "yellow"),
pch = 16, cex = 20)
text(4.5, .9, "Transparent", cex = 1.5)
library(yarrr)
points(x = c(4, 5, 4.5),
y = c(.5, .5, .3),
col = c(transparent("red", .5),
transparent("blue", .5),
transparent("yellow", .5)),
pch = 16, cex = 20)
```
Unfortunately, as far as I know, base R does not make it easy to make transparent colors. Thankfully, there is a function in the `yarrr` package called `transparent()` that makes it very easy to make any color transparent. To use it, just enter the original color as the main argument `orig.col`, then enter how transparent you want to make it as the second argument `trans.val`, from 0 (not transparent at all) to 1 (completely transparent).
Here is a basic scatterplot with standard (non-transparent) colors:
```{r, echo = TRUE}
# Plot with Standard Colors
plot(x = pirates$height,
y = pirates$weight,
col = "blue",
pch = 16,
main = "col ='blue'")
```
Now here's the same plot using the `transparent()` function in the `yarrr` package:
```{r}
# Plot with transparent colors using the transparent() function in the yarrr package
plot(x = pirates$height,
y = pirates$weight,
col = yarrr::transparent("blue", trans.val = .9),
pch = 16,
main = "col = yarrr::transparent('blue', .9)")
```
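For readers who would rather not depend on `yarrr`, base R's `adjustcolor()` function (in the `grDevices` package) can produce a similar effect. The chunk below is only a sketch under one assumption: that `trans.val = .9` means "90% see-through", so the matching opacity is `alpha.f = 1 - .9 = .1`. It uses simulated data rather than the `pirates` data so that it runs on its own. The object name `blue.faint` is hypothetical.

```{r}
# Assumption: a transparency of .9 corresponds to an opacity (alpha) of .1
blue.faint <- adjustcolor("blue", alpha.f = 0.1)
blue.faint   # An 8-digit hex code: the last two digits are the alpha channel

# Simulated heights and weights (not the pirates data)
set.seed(1)
x <- rnorm(500, mean = 170, sd = 10)
y <- x * 0.5 + rnorm(500, sd = 5)

plot(x = x,
     y = y,
     col = blue.faint,
     pch = 16,
     main = "col = adjustcolor('blue', alpha.f = .1)")
```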
Later on in the book, we'll cover more advanced ways to come up with colors using color palettes (using the RColorBrewer package or the `piratepal()` function in the yarrr package) and functions that generate shades of colors based on numeric data (like the `colorRamp2()` function in the `circlize` package).
<!-- ## High vs. low-level plotting commands -->
<!-- There are two general types of plotting commands in R: high and low-level. High level plotting commands, like `plot()`, `hist()` and `pirateplot()` create entirely new plots. Within these high level plotting commands, you can define the general layout of the plot - like the axis limits and plot margins. -->
<!-- Low level plotting commands, like `points()`, `segments()`, and `text()` add elements to existing plots. These low level commands don't change the overall layout of a plot - they just add to what you've already created. Once you are done creating a plot, you can export the plot to a pdf or jpeg using the `pdf()` or `jpeg()` functions. Or, if you're creating documents in Markdown or Latex, you can add your plot directly to your document. -->
## Plotting arguments
Most plotting functions have *tons* of optional arguments (also called parameters) that you can use to customize virtually everything in a plot. To see all of them, look at the help menu for `par` by executing `?par`. However, the good news is that you don't need to specify all possible parameters when you create a plot. Instead, there are only a few critical arguments that you must specify - usually one or two vectors of data. For any optional arguments that you do not specify, R will use either a default value, or choose a value that makes sense based on the data you specify.
In the following examples, I will cover the main plotting parameters for each plotting type. However, the best way to learn what you can, and can't, do with plots, is to try to create them yourself!
I think the best way to learn how to create plots is to see some examples. Let's start with the main high-level plotting functions.
## Scatterplot: `plot()`
The most common high-level plotting function is `plot(x, y)`. The `plot()` function makes a scatterplot from two vectors x and y, where the x vector indicates the x (horizontal) values of the points, and the y vector indicates the y (vertical) values.
| Argument| Description|
|:------------|:-------------------------------------------------|
|`x, y`|Vectors of equal length specifying the x and y values of the points|
|`type`| Type of plot. `"l"` means lines, `"p"` means points, `"b"` means lines and points, `"n"` means no plotting|
|`main`, `xlab`, `ylab`| Strings giving labels for the plot title, and x and y axes|
|`xlim`, `ylim`| Limits to the axes. For example, `xlim = c(0, 100)` will set the minimum and maximum of the x-axis to 0 and 100.|
|`pch` | An integer indicating the type of plotting symbols (see `?points` and section below), or a string specifying symbols as text. For example, `pch = 21` will create a two-color circle, while `pch = "P"` will plot the character `"P"`.|
|`col`| Main color of the plotting symbols. For example `col = "red"` will create red symbols.|
|`cex`| A numeric vector specifying the size of the symbols (from 0 to Inf). The default size is 1. `cex = 4` will make the points very large, while `cex = .5` will make them very small. |
Table: (\#tab:plot) `plot()` function arguments
```{r}
plot(x = 1:10, # x-coordinates
y = 1:10, # y-coordinates
type = "p", # Just draw points (no lines)
main = "My First Plot",
xlab = "This is the x-axis label",
ylab = "This is the y-axis label",
xlim = c(0, 11), # Min and max values for x-axis
ylim = c(0, 11), # Min and max values for y-axis
col = "blue", # Color of the points
pch = 16, # Type of symbol (16 means Filled circle)
cex = 1) # Size of the symbols
```
Aside from the x and y arguments, all of the arguments are optional. If you don't specify a specific argument, then R will use a default value, or try to come up with a value that makes sense. For example, if you don't specify the `xlim` and `ylim` arguments, R will set the limits so that all the points fit inside the plot.
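Here is a short sketch of both ideas: the limits R chooses when you leave out `xlim` and `ylim`, and the effect of the `type` argument on the same data. The object name `lims` and the choice of plotting `(1:10)^2` are just for illustration. `par("usr")` reports the limits of the current canvas.

```{r}
# Without xlim and ylim, R picks limits that contain all of the points
plot(x = 1:10, y = (1:10)^2)
lims <- par("usr")   # c(x.min, x.max, y.min, y.max) of the current canvas
lims

# The type argument changes how the same data are drawn
par(mfrow = c(2, 2))             # 2 x 2 grid of plots
for (type.i in c("p", "l", "b", "n")) {
  plot(x = 1:10,
       y = (1:10)^2,
       type = type.i,
       main = paste0("type = '", type.i, "'"),
       xlab = "x",
       ylab = "x squared")
}
par(mfrow = c(1, 1))             # Back to one plot per canvas
```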
### Symbol types: `pch`
When you create a plot with `plot()` (or add points with `points()`), you can specify the type of symbol with the `pch` argument. You can specify the symbol type in one of two ways: with an integer, or with a string. If you use a string (like `"p"`), R will use that text as the plotting symbol. If you use an integer, you'll get the symbol that corresponds to that number. The figure below shows all the symbol types you can specify with an integer.
Symbols differ in their shape and how they are colored. Symbols 1 through 14 only have borders and are always empty, while symbols 15 through 20 don't have a border and are always filled. Symbols 21 through 25 have both a border and a filling. To specify the color of symbols 1 through 20 (the border of the empty ones, the filling of the filled ones), use the `col` argument. For symbols 21 through 25, you set the color of the border with `col`, and the color of the filling with `bg`.
```{r echo = FALSE, fig.cap = "The symbol types associated with the pch plotting parameter.", fig.width = 3, fig.height = 3, fig.align= 'center'}
par(mar = c(1, 1, 3, 1))
plot(x = rep(1:5 + .1, each = 5),
y = rep(5:1, times = 5),
pch = 1:25,
xlab = "", ylab = "", xaxt = "n", yaxt = "n",
xlim = c(.5, 5.5),
ylim = c(0, 6),
bty = "n", bg = "gray", cex = 1.4,
main = "pch = _"
)
text(x = rep(1:5, each = 5) - .35,
y = rep(5:1, times = 5),
labels = 1:25, cex = 1.2
)
```
Let's look at some different symbol types in action when applied to the same data:
```{r echo = FALSE}
par(mfrow = c(2, 2))
par(mar = c(0, 1, 6, 1))
x.data <- rnorm(25)
y.data <- x.data + rnorm(25)
# Plot 1
plot(x = x.data, y = y.data, xaxt = "n", yaxt = "n", xlab = "", ylab = "", main = "pch = 2,\ncol = 'blue'",
pch = 2, col = "blue", cex = 1.5, cex.main = 1.2)
# Plot 2
plot(x = x.data, y = y.data, xaxt = "n", yaxt = "n", xlab = "", ylab = "", main = "pch = 16,\ncol = 'orchid2'",
pch = 16, col = "orchid2", cex= 1.5, cex.main = 1.2)
# Plot 3
plot(x = x.data, y = y.data, xaxt = "n", yaxt = "n", xlab = "", ylab = "", main = "pch = 21,\ncol = 'black',\nbg = 'orangered2'",
cex= 1.5, cex.main = 1.2,
pch = 21, col = "black", bg = "orangered2")
# Plot 4
plot(x = x.data, y = y.data, xaxt = "n", yaxt = "n", xlab = "", ylab = "", main = "pch = 25,\ncol = 'pink3',\nbg = 'plum3'",
cex= 1.5, cex.main = 1.2,
pch = 25, col = "pink3", bg = "plum3")
```
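If you want to try a few of these combinations yourself, a minimal sketch looks like the chunk below. The data are simulated, and the seed and panel layout are my own arbitrary choices. Note how `col` colors the whole symbol for `pch = 2` and `pch = 16`, but only the border for `pch = 21`, where `bg` sets the filling.

```{r}
# Simulated data for illustration
set.seed(10)
x.data <- rnorm(25)
y.data <- x.data + rnorm(25)

par(mfrow = c(1, 3))   # Three plots side by side

# Border-only symbol: col sets the border
plot(x = x.data, y = y.data, pch = 2, col = "blue",
     cex = 1.5, main = "pch = 2")

# Filled symbol: col sets the filling
plot(x = x.data, y = y.data, pch = 16, col = "orchid2",
     cex = 1.5, main = "pch = 16")

# Two-color symbol: col sets the border, bg sets the filling
plot(x = x.data, y = y.data, pch = 21, col = "black", bg = "orangered2",
     cex = 1.5, main = "pch = 21")

par(mfrow = c(1, 1))   # Back to one plot per canvas
```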
## Histogram: `hist()`
| Argument| Description|
|:------------|:-------------------------------------------------|
|`x`|Vector of values|
|`breaks`| How should the bin sizes be calculated? Can be specified in many ways (see `?hist` for details)|
|`freq`| Should frequencies or densities be plotted? `freq = TRUE` shows frequencies (counts), `freq = FALSE` shows probability densities, so that the bar areas sum to 1.|
|`col`, `border`| Colors of the bin filling (`col`) and border (`border`)|
Table: (\#tab:hist) `hist()` function arguments
Histograms are the most common way to plot a vector of numeric data. To create a histogram we'll use the `hist()` function. The main argument to `hist()` is `x`, a vector of numeric data. If you want to specify how the histogram bins are created, you can use the `breaks` argument. To change the color of the filling or border of the bins, use `col` and `border`.
Let's create a histogram of the weights in the ChickWeight dataset:
```{r}
hist(x = ChickWeight$weight,
main = "Chicken Weights",
xlab = "Weight",
xlim = c(0, 500))
```
We can get fancier by adding arguments like `breaks = 20` to ask for roughly 20 bins (R treats a single number as a suggestion and picks nearby round break points), and `col = "papayawhip"` and `border = "hotpink"` to make it a bit more colorful:
```{r}
hist(x = ChickWeight$weight,
main = "Fancy Chicken Weight Histogram",
xlab = "Weight",
ylab = "Frequency",
breaks = 20, # About 20 bins (a suggestion)
xlim = c(0, 500),
col = "papayawhip", # Filling Color
border = "hotpink") # Border Color
```
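To see what `breaks` and `freq` actually do, you can ask `hist()` for its numbers without drawing anything by using `plot = FALSE`. The following is a sketch only. The break points `seq(0, 400, by = 50)` are an arbitrary choice that happens to cover all of the chicken weights, and the object names `h.suggested` and `h.exact` are hypothetical.

```{r}
# plot = FALSE returns the histogram's numbers without drawing it
h.suggested <- hist(x = ChickWeight$weight, breaks = 20, plot = FALSE)
length(h.suggested$counts)   # The number of bins R actually used

# A vector of break points is used exactly as given
h.exact <- hist(x = ChickWeight$weight,
                breaks = seq(0, 400, by = 50),
                plot = FALSE)
h.exact$breaks
h.exact$counts

# freq = FALSE plots densities: bar areas (height times width) sum to 1
hist(x = ChickWeight$weight,
     breaks = seq(0, 400, by = 50),
     freq = FALSE,
     main = "Chicken Weights (density scale)",
     xlab = "Weight")
sum(h.exact$density * diff(h.exact$breaks))
```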
If you want to plot two histograms on the same plot, for example, to show the distributions of two different groups, you can use the `add = TRUE` argument to the second plot.
```{r}
hist(x = ChickWeight$weight[ChickWeight$Diet == 1],
main = "Two Histograms in one",
xlab = "Weight",
ylab = "Frequency",
breaks = 20,
xlim = c(0, 500),
col = gray(0, .5))
hist(x = ChickWeight$weight[ChickWeight$Diet == 2],
breaks = 30,
add = TRUE, # Add plot to previous one!
col = gray(1, .8))
```
## Barplot: `barplot()`
A barplot typically shows summary statistics for different groups. The primary argument to a barplot is `height`: a vector of numeric values which will generate the height of each bar. To add names below the bars, use the `names.arg` argument. For additional arguments specific to `barplot()`, look at the help menu with `?barplot`:
```{r}
barplot(height = 1:5, # A vector of heights
names.arg = c("G1", "G2", "G3", "G4", "G5"), # A vector of names
main = "Example Barplot",
xlab = "Group",
ylab = "Height")
```
Of course, you should plot more interesting data than just a vector of integers with a barplot. In the plot below, I create a barplot with the average weight of chickens at each measurement time (`Time` in the `ChickWeight` data is recorded in days since birth):
```{r}
# Calculate mean weights for each time period
time.weights <- aggregate(weight ~ Time,
                          data = ChickWeight,
                          FUN = mean)
# Create barplot
barplot(height = time.weights$weight,
        names.arg = time.weights$Time,
        xlab = "Time (days)",
        ylab = "Average Weight",
        main = "Average Chicken Weights by Time",
        col = "mistyrose")
```
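One detail that is easy to miss: `barplot()` also returns the x-coordinates of the bar midpoints, which is handy if you want to write something on top of each bar. The chunk below is a sketch of that idea, using `tapply()` instead of `aggregate()` to get a named vector of means. The object names `time.means` and `bar.mids`, and the `ylim` values, are my own choices.

```{r}
# Mean weight at each time point, as a named vector
time.means <- tapply(ChickWeight$weight, ChickWeight$Time, mean)

# barplot() returns the x-coordinates of the bar midpoints
bar.mids <- barplot(height = time.means,
                    ylim = c(0, 260),
                    xlab = "Time (days)",
                    ylab = "Average Weight",
                    main = "Average Chicken Weights by Time",
                    col = "mistyrose")

# Use the midpoints to print each (rounded) mean just above its bar
text(x = bar.mids,
     y = time.means,
     labels = round(time.means),
     pos = 3,
     cex = .7)
```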
### Clustered barplot
If you want to create a clustered barplot, with different bars for different groups of data, you can enter a matrix as the argument to `height`. R will then plot each column of the matrix as a separate set of bars. For example, let's say I conducted an experiment where I compared how fast pirates can swim under four conditions: wearing clothes versus being naked, and while being chased by a shark versus not being chased by a shark. Let's say I calculated the following average swimming speeds:
```{r, echo = FALSE}
swim.data <- data.frame(c(2.1, 3), c(1.5, 3))
names(swim.data) <- c("Naked", "Clothed")
rownames(swim.data) <- c("No Shark", "Shark")
knitr::kable(swim.data)
```
I can represent these data in a matrix as follows. In order for the final barplot to include the condition names, I'll add row and column names to the matrix with `colnames()` and `rownames()`.
```{r}
swim.data <- cbind(c(2.1, 3), # Naked speeds
c(1.5, 3)) # Clothed speeds
colnames(swim.data) <- c("Naked", "Clothed")
rownames(swim.data) <- c("No Shark", "Shark")
# Print result
swim.data
```
Now, when I enter this matrix as the `height = swim.data` argument to `barplot()`, I'll get multiple bars.
```{r}
barplot(height = swim.data,
beside = TRUE, # Put the bars next to each other
legend.text = TRUE, # Add a legend
col = c(transparent("green", .2),
transparent("red", .2)),
main = "Swimming Speed Experiment",
ylab = "Speed (in meters / second)",
xlab = "Clothing Condition",
ylim = c(0, 4))
```
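To see what `beside = TRUE` is doing, it helps to compare it with `beside = FALSE`, which stacks the rows of the matrix on top of each other within each column. Here is a self-contained sketch that rebuilds the same speeds under the hypothetical name `swim.speeds` and uses R's default bar colors:

```{r}
# The same swimming speeds, with column and row names
swim.speeds <- cbind(Naked = c(2.1, 3),
                     Clothed = c(1.5, 3))
rownames(swim.speeds) <- c("No Shark", "Shark")

par(mfrow = c(1, 2))   # Two plots side by side

# Clustered: one bar per cell of the matrix
barplot(height = swim.speeds,
        beside = TRUE,
        legend.text = TRUE,
        ylim = c(0, 7),
        main = "beside = TRUE")

# Stacked: one bar per column, rows stacked on top of each other
barplot(height = swim.speeds,
        beside = FALSE,
        legend.text = TRUE,
        ylim = c(0, 7),
        main = "beside = FALSE")

par(mfrow = c(1, 1))   # Back to one plot per canvas
```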
## `pirateplot()`
| Argument| Description|
|:------------|:-------------------------------------------------|
|`formula`|A formula specifying a y-axis variable as a function of 1, 2 or 3 x-axis variables. For example, `formula = weight ~ Diet + Time` will plot `weight` as a function of `Diet` and `Time`|
|`data`| A dataframe containing the variables specified in `formula`|
|`theme`| A plotting theme, can be an integer from 1 to 4. Setting `theme = 0` will turn off all plotting elements so you can then turn them on individually.|
|`pal`|The color palette. Can either be a named color palette from the `piratepal()` function (e.g. `"basel"`, `"xmen"`, `"google"`) or a standard R color. For example, to make a black and white plot, set `pal = "black"`|
|`cap.beans` | If `cap.beans = TRUE`, beans will be cut off at the maximum and minimum data values |
Table: (\#tab:pirateplot) `pirateplot()` function arguments
A pirateplot is a plot contained in the `yarrr` package, written specifically by, and for, R pirates. The `pirateplot()` function is easy to use and, unlike barplots and boxplots, can easily show raw data, descriptive statistics, and inferential statistics in one plot. Figure \@ref(fig:pirateplot) shows the four key elements in a pirateplot:
```{r pirateplot, echo = FALSE, fig.cap="The pirateplot(), an R pirate's favorite plot!"}
library(yarrr)
pirateplot(formula = weight ~ Diet,
data = ChickWeight,
theme = 1,
cap.beans = TRUE,
back.col = "white",
gl.col = "white",
bean.f.o = c(0, .1, .7, .1),
# bean.b.o = c(0, .1, 1, .1),
point.o = c(.4, .1, .1, .1),
avg.line.o = c(.3, 1, .3, .3),
inf.f.o = c(.1, .1, .1, .9),
bar.f.o = c(.1, .8, .1, .1),
inf.f.col = c("white", "white", "white", piratepal("xmen")[4]),
main = "4 Elements of a pirateplot",
pal = "xmen")
text(.7, 350, labels = "Points")
text(.7, 345, labels = "Raw Data", pos = 1, cex = .8)
arrows(.7, 310, .97, 270, length = .1)
text(1.4, 200, labels = "Bar/Line")
text(1.4, 200, labels = "Center", pos = 1, cex = .8)
arrows(1.4, 170, 1.54, 125, length = .1)
text(2.4, 250, labels = "Bean")
text(2.4, 250, labels = "Density", pos = 1, cex = .8)
arrows(2.4, 220, 2.85, 200, length = .1)
text(3.55, 300, labels = "Band")
text(3.55, 290, labels = "Inference\n95% HDI or CI", pos = 1, cex = .8)
arrows(3.55, 240, 3.8, 150, length = .1)
```
| Element| Description|
|:------------|:-------------------------------------------------|
|Points|**Raw** data.|
|Bar / Line| **Descriptive** statistic, usually the mean or median|
|Bean| Smoothed density curve showing the full data distribution.|
|Band| **Inference** around the mean, either a Bayesian Highest Density Interval (HDI), or a Confidence Interval (CI)|
Table: (\#tab:pirateplotelements) 4 elements of a `pirateplot()`
The two main arguments to `pirateplot()` are `formula` and `data`. In `formula`, you specify plotting variables in the form `y ~ x`, where `y` is the name of the dependent variable, and `x` is the name of the independent variable. In `data`, you specify the name of the dataframe object where the variables are stored.
Let's create a pirateplot of the ChickWeight data. I'll set the dependent variable to `weight`, and the independent variable to `Time` using the argument `formula = weight ~ Time`:
```{r chickpirateplot, fig.width = 8, fig.height = 6}
yarrr::pirateplot(formula = weight ~ Time, # dv is weight, iv is Time
                  data = ChickWeight,
                  main = "Pirateplot of chicken weights",
                  xlab = "Time",
                  ylab = "Weight")
```
### Pirateplot themes
There are several different pirateplot themes, which dictate the overall look of the plot. To specify a theme, just use the `theme = x` argument, where `x` is the theme number:
```{r fig.width = 8, fig.height = 8, echo = FALSE}
par(mfrow = c(2, 2))
yarrr::pirateplot(formula = weight ~ Diet, # dv is weight, iv is Diet
data = ChickWeight,
main = "theme = 1",
xlab = "Diet",
ylab = "Weight",
theme = 1)
yarrr::pirateplot(formula = weight ~ Diet, # dv is weight, iv is Diet
data = ChickWeight,
main = "theme = 2",
xlab = "Diet",
ylab = "Weight",
theme = 2)
yarrr::pirateplot(formula = weight ~ Diet, # dv is weight, iv is Diet
data = ChickWeight,
main = "theme = 3",
xlab = "Diet",
ylab = "Weight",
theme = 3)
yarrr::pirateplot(formula = weight ~ Diet, # dv is weight, iv is Diet
data = ChickWeight,
main = "theme = 4",
xlab = "Diet",
ylab = "Weight",
theme = 4)
```
For example, here is a pirateplot of height data from the `pirates` dataframe using `theme = 3`. Here, I'll plot pirates' heights as a function of their sex and whether or not they wear a headband. I'll also make the plot all grayscale by using the `pal = "gray"` argument:
```{r}
yarrr::pirateplot(formula = height ~ sex + headband, # DV = height, IV1 = sex, IV2 = headband
data = pirates,
theme = 3,
main = "Pirate Heights",
pal = "gray")
```
### Customizing pirateplots
Regardless of the theme you use, you can always customize the color and opacity of graphical elements. To do this, specify one of the following arguments. Note: arguments with `.f.` correspond to the *filling* of an element, while arguments with `.b.` correspond to the *border* of an element:
```{r echo = FALSE}
pp.elements <- data.frame('element' = c("points", "beans", "bar", "inf", "avg.line"),
'color' = c("point.col, point.bg",
"bean.f.col, bean.b.col",
"bar.f.col, bar.b.col",
"inf.f.col, inf.b.col",
"avg.line.col"
),
"opacity" = c("point.o",
"bean.f.o, bean.b.o",
"bar.f.o, bar.b.o",
"inf.f.o, inf.b.o", "avg.line.o")
)
knitr::kable(pp.elements, caption = "Customising plotting elements")
```
For example, I could create the following pirateplot using `theme = 0` and specifying elements explicitly:
```{r}
pirateplot(formula = weight ~ Time,
data = ChickWeight,
theme = 0,
main = "Fully customized pirateplot",
pal = "southpark", # southpark color palette
bean.f.o = .6, # Bean fill
point.o = .3, # Points
inf.f.o = .7, # Inference fill
inf.b.o = .8, # Inference border
avg.line.o = 1, # Average line
bar.f.o = .5, # Bar
inf.f.col = "white", # Inf fill col
inf.b.col = "black", # Inf border col
avg.line.col = "black", # avg line col
bar.f.col = gray(.8), # bar filling color
point.pch = 21,
point.bg = "white",
point.col = "black",
point.cex = .7)
```
If you don't want to start from scratch, you can also start with a theme, and then make selective adjustments:
```{r}
pirateplot(formula = weight ~ Time,
data = ChickWeight,
main = "Adjusting an existing theme",
theme = 2, # Start with theme 2
inf.f.o = 0, # Turn off inf fill
inf.b.o = 0, # Turn off inf border
point.o = .2, # Turn up points
bar.f.o = .5, # Turn up bars
bean.f.o = .4, # Light bean filling
bean.b.o = .2, # Light bean border
avg.line.o = 0, # Turn off average line
point.col = "black") # Black points
```
Just to drive the point home, as a barplot is a special case of a pirateplot, you can even reduce a pirateplot into a horrible barplot:
```{r}
# Reducing a pirateplot to a (at least colorful) barplot
pirateplot(formula = weight ~ Diet,
data = ChickWeight,
main = "Reducing a pirateplot to a (horrible) barplot",
theme = 0, # Start from scratch
pal = "black",
inf.disp = "line", # Use a line for inference
inf.f.o = 1, # Turn up inference opacity
inf.f.col = "black", # Set inference line color
bar.f.o = .3)
```
There are many additional arguments to `pirateplot()` that you can use to completely customize the look of your plot. To see them all, look at the help menu with `?pirateplot` or look at the package vignette.
|Element |Argument |Examples |
|:---------------------|:---------------------------|:-------------------------------------------------------|
|Background color |back.col |`back.col = gray(.9, .9)` |
|Gridlines |gl.col, gl.lwd, gl.lty |`gl.col = 'gray', gl.lwd = c(.75, 0), gl.lty = 1` |
|Quantiles |quant, quant.lwd, quant.col |`quant = c(.1, .9), quant.lwd = 1, quant.col = 'black'` |
|Average line |avg.line.fun |`avg.line.fun = median` |
|Inference Calculation |inf.method |`inf.method = 'hdi'`, `inf.method = 'ci'` |
|Inference Display |inf.disp |`inf.disp = 'line'`, `inf.disp = 'bean'`, `inf.disp = 'rect'` |
Table: (\#tab:pirateplotcustomisation) Additional `pirateplot()` customizations.
```{r}
# Additional pirateplot customizations
pirateplot(formula = weight ~ Diet,
data = ChickWeight,
main = "Adding quantile lines and background colors",
theme = 2,
cap.beans = TRUE,
back.col = transparent("blue", .95), # Add light blue background
gl.col = "gray", # Gray gridlines
gl.lwd = c(.75, 0),
inf.f.o = .6, # Turn up inf filling
inf.disp = "bean", # Wrap inference around bean
bean.b.o = .4, # Turn down bean borders
quant = c(.1, .9), # 10th and 90th quantiles
quant.col = "black") # Black quantile lines
```
### Saving output
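If you include the `plot = FALSE` argument to a pirateplot, the function will not draw anything, but will instead return some values associated with each bean in the plot. In the next chunk, I'll create a pirateplot and then save the data behind it to an object called `tattoos.pp`: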
```{r}
# Create a pirateplot
pirateplot(formula = tattoos ~ sex + headband,
data = pirates)
# Save data from the pirateplot to an object
tattoos.pp <- pirateplot(formula = tattoos ~ sex + headband,
data = pirates,
plot = FALSE)
```
Now I can access the summary and inferential statistics from the plot in the `tattoos.pp` object. The most interesting element is `$summary`, which shows summary statistics for each bean (that is, each group):
```{r}
# Show me statistics from groups in the pirateplot
tattoos.pp
```
Once you've created a plot with a high-level plotting function, you can add additional elements with *low-level* functions. For example, you can add data points with `points()`, reference lines with `abline()`, text with `text()`, and legends with `legend()`.
## Low-level plotting functions
Low-level plotting functions allow you to add elements, like points or lines, to an existing plot. Here are the most common low-level plotting functions:
|Function |Outcome |
|:---------------------|:---------------------------|
|`points(x, y)` |Adds points |
|`abline()`, `segments()` |Adds lines or segments |
|`arrows()` |Adds arrows |
|`curve()` |Adds a curve representing a function |
|`rect()`,`polygon()` |Adds a rectangle or arbitrary shape |
|`text()`, `mtext()` |Adds text within the plot, or to plot margins |
|`legend()` |Adds a legend |
|`axis()` |Adds an axis |
Table: (\#tab:lowlevelplotting) Common low-level plotting functions.
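As a rough sketch of how several of the functions in the table can be layered onto one plot, here is a small example that uses only base R. The function `f` and all of the coordinates are made up for illustration; the `type = "n"` argument just tells `plot()` to set up the axes without drawing any data.

```{r}
# A made-up function to draw (for illustration only)
f <- function(x) {3 * sqrt(x)}

# Set up the plotting space without drawing any data
plot(x = 1,
     type = "n",
     xlim = c(0, 10),
     ylim = c(0, 10),
     xlab = "x",
     ylab = "y",
     main = "Layering low-level plotting functions")

# Add a gray rectangle without a border
rect(xleft = 6, ybottom = 1,
     xright = 9, ytop = 4,
     col = gray(.9), border = NA)

# Add the curve; add = TRUE draws on the existing plot
curve(f, from = 0, to = 10, add = TRUE, lwd = 2)

# Add an arrow pointing at the curve, and a label at its tail
arrows(x0 = 6.5, y0 = 5,
       x1 = 4.2, y1 = 5.9,
       length = .1)

text(x = 6.5, y = 5,
     labels = "f(x) = 3 * sqrt(x)",
     pos = 4)
```

Each call draws on top of whatever is already there, so the order of the calls determines which elements end up in front.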
### Starting with a blank plot
```{r canvas, fig.cap= "Sometimes it's nice to start with a blank plotting canvas, and then add each element individually with low-level plotting commands", fig.margin = TRUE, echo = FALSE, out.width = "75%", fig.align='center'}
knitr::include_graphics(c("images/canvas.jpg"))
```
Before you start adding elements with low-level plotting functions, it's useful to start with a blank plotting space like the one I have in Figure \@ref(fig:blankplot). To do this, execute the `plot()` function, but use the `type = "n"` argument to tell R that you don't want to plot anything yet. Once you've created a blank plot, you can add additional elements with low-level plotting commands.
```{r blankplot, fig.cap="A blank plotting space, ready for additional elements!"}
# Create a blank plotting space
plot(x = 1,
xlab = "X Label",
ylab = "Y Label",
xlim = c(0, 100),
ylim = c(0, 100),
main = "Blank Plotting Canvas",
type = "n")
```
### `points()`
To add new points to an existing plot, use the `points()` function. The `points()` function shares many arguments with `plot()`, like `x` (for the x-coordinates), `y` (for the y-coordinates), and parameters like `col` (border color), `cex` (point size), and `pch` (symbol type). To see all of them, look at the help menu with `?points`.
Let's use `points()` to create a plot with different colors for different data. I'll use the pirates dataset and plot the relationship between a pirate's height and weight. I'll create separate points for male and female pirates:
```{r pointsexample, fig.cap="Using points() to add points with different colors"}
# Create a blank plot
plot(x = 1,
type = "n",
xlim = c(100, 225),
ylim = c(30, 110),
pch = 16,
xlab = "Height",
ylab = "Weight",
main = "Adding points to a plot with points()")
# Add coral2 points for male data
points(x = pirates$height[pirates$sex == "male"],
y = pirates$weight[pirates$sex == "male"],
pch = 16,
col = transparent("coral2", trans.val = .8))
# Add steelblue points for female data
points(x = pirates$height[pirates$sex == "female"],
y = pirates$weight[pirates$sex == "female"],
pch = 16,
col = transparent("steelblue3", trans.val = .8))
```
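When there are more than two groups, writing one `points()` call per group by hand gets tedious. Here is a minimal sketch of the same idea with a loop, assuming the built-in `mtcars` dataset as a stand-in for the pirates data. The vectors `cyl.levels`, `cyl.pch`, and `cyl.col` are names I made up for this example.

```{r}
# mtcars comes with base R; cyl has three groups (4, 6, and 8)
cyl.levels <- sort(unique(mtcars$cyl))
cyl.pch <- c(16, 17, 15)                           # One symbol per group
cyl.col <- c("coral2", "steelblue3", "gray40")     # One color per group

# Create a blank plot that covers the range of the data
plot(x = 1,
     type = "n",
     xlim = range(mtcars$wt),
     ylim = range(mtcars$mpg),
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "One points() call per group")

# Add the points for each group in turn
for (i in seq_along(cyl.levels)) {

  in.group <- mtcars$cyl == cyl.levels[i]

  points(x = mtcars$wt[in.group],
         y = mtcars$mpg[in.group],
         pch = cyl.pch[i],
         col = cyl.col[i])
}
```

Setting `xlim` and `ylim` from the full data with `range()` makes sure that no group falls outside of the plotting space.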
### `abline()`, `segments()`, `grid()`
|Argument |Outcome |
|:---------|:------------------------------------|
|`h, v` | Locations of horizontal and vertical lines (for `abline()` only) |
|`x0, y0, x1, y1` | Starting and ending coordinates of lines (for `segments()` only) |
|`lty` |Line type. 1 = solid, 2 = dashed, 3 = dotted, ...|
|`lwd` |Width of the lines specified by a number. 1 is the default (.2 is very thin, 5 is very thick) |
|`col` |Line color |
Table: (\#tab:linearguments) Arguments to `abline()` and `segments()`
To add straight lines to a plot, use `abline()` or `segments()`. `abline()` will add a line across the entire plot, while `segments()` will add a line with defined starting and end points.
For example, we can add reference lines to a plot with `abline()`. In the following plot, I'll add horizontal and vertical reference lines showing the means of the variables on the y and x axes. For the horizontal line, I'll specify `h = mean(pirates$height)`. For the vertical line, I'll specify `v = mean(pirates$weight)`.
```{r}
plot(x = pirates$weight,
y = pirates$height,
xlab = "weight",
ylab = "height",
main = "Adding reference lines with abline",
pch = 16,
col = gray(.5, .2))
# Add horizontal line at mean height
abline(h = mean(pirates$height),
lty = 2) # Dashed line
# Add vertical line at mean weight
abline(v = mean(pirates$weight),
lty = 2) # Dashed line
```
To change the look of your lines, use the `lty` argument, which changes the type of line (see Figure \@ref(fig:ltytypes)), `lwd`, which changes its thickness, and `col`, which changes its color.
```{r ltytypes, echo = FALSE, fig.cap="Changing line type with the lty argument.", fig.width = 4, fig.height = 4}
par(mar = c(3, 0, 6, 0))
plot(1,
xlim = c(0, 7),
ylim = c(0, 1),
type = "n",
xlab = "lty values",
ylab = "",
xaxt = "n",
yaxt = "n",
bty = "n",
main = "")
abline(v = 1:6,
lty = 1:6,
lwd = 2)
mtext(1:6,
side = 3,
at = 1:6,
cex = 1.5,
line = 1)
mtext("lty = ...",
side = 3,
at = 3.5,
line = 4,
cex = 2)
```
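To see how `lty`, `lwd`, and `col` work together, here is a small sketch that draws three horizontal lines with different settings. The positions and colors are arbitrary values I picked for illustration. Like most plotting arguments, these accept vectors, so one call to `abline()` can draw several differently styled lines.

```{r}
# Arbitrary line positions and styles (for illustration only)
line.y <- c(8, 5, 2)
line.lty <- c(1, 2, 3)                            # Solid, dashed, dotted
line.lwd <- c(1, 3, 5)                            # Thin to thick
line.col <- c("black", "steelblue3", "coral2")

# Blank plotting space
plot(x = 1,
     type = "n",
     xlim = c(0, 10),
     ylim = c(0, 10),
     xlab = "",
     ylab = "",
     main = "Combining lty, lwd, and col")

# One call draws all three lines, each with its own style
abline(h = line.y,
       lty = line.lty,
       lwd = line.lwd,
       col = line.col)
```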
You can also add a regression line (also called a line of best fit) to a scatterplot by entering a regression object created with `lm()` as the main argument to `abline()`:
```{r}
# Add a regression line to a scatterplot
plot(x = pirates$height,
y = pirates$weight,
pch = 16,
col = transparent("purple", .7),
main = "Adding a regression line to a scatterplot")
# Add the regression line
abline(lm(weight ~ height, data = pirates),
lty = 2)
```
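If you already know the intercept and slope of the line you want, you can also pass them directly to `abline()` with the `a` (intercept) and `b` (slope) arguments. The sketch below uses simulated data, so the numbers are made up for illustration. It draws both the fitted regression line and the "true" line that generated the data.

```{r}
# Simulate data from a known line (values are made up for illustration)
set.seed(100)
x <- 1:50
y <- 10 + 2 * x + rnorm(n = 50, mean = 0, sd = 8)

# Fit a regression model
fit <- lm(y ~ x)

plot(x = x,
     y = y,
     pch = 16,
     col = gray(.3, .5),
     main = "Fitted line versus true line")

# Fitted regression line (dashed)
abline(fit, lty = 2)

# True line, specified by intercept (a) and slope (b)
abline(a = 10, b = 2, col = "red")

# Estimated intercept and slope
coef(fit)
```

The two lines should sit close to each other, because the estimated coefficients will be near the true values of 10 and 2.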
The `segments()` function works very similarly to `abline()` -- however, with the `segments()` function, you specify the beginning and end points of the segments with the arguments `x0`, `y0`, `x1`, and `y1`. In Figure \@ref(fig:segments) I use `segments()` to connect two vectors of data:
```{r segments, fig.cap="Connecting points with segments()."}
# Before and after data
before <- c(2.1, 3.5, 1.8, 4.2, 2.4, 3.9, 2.1, 4.4)
after <- c(7.5, 5.1, 6.9, 3.6, 7.5, 5.2, 6.1, 7.3)
# Create plotting space and before scores
plot(x = rep(1, length(before)),
y = before,
xlim = c(.5, 2.5),
ylim = c(0, 11),
ylab = "Score",
xlab = "Time",
main = "Using segments() to connect points",
xaxt = "n")
# Add after scores
points(x = rep(2, length(after)), y = after)
# Add connections with segments()
segments(x0 = rep(1, length(before)),
y0 = before,
x1 = rep(2, length(after)),
y1 = after,
col = gray(0, .5))
# Add labels
mtext(text = c("Before", "After"),
side = 1, at = c(1, 2), line = 1)
```
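Because the arguments to `segments()` are vectorized, you can also give each segment its own color. As a sketch, here I reuse the before and after scores from above and color each segment by whether the score went up or down. The object `seg.col` and the color choices are my own additions for this example.

```{r}
# Same before and after data as above
before <- c(2.1, 3.5, 1.8, 4.2, 2.4, 3.9, 2.1, 4.4)
after <- c(7.5, 5.1, 6.9, 3.6, 7.5, 5.2, 6.1, 7.3)

# Blue for increases, coral for decreases
seg.col <- ifelse(after > before, "steelblue3", "coral2")

# Blank plotting space
plot(x = 1,
     type = "n",
     xlim = c(.5, 2.5),
     ylim = c(0, 11),
     ylab = "Score",
     xlab = "Time",
     main = "Coloring segments by direction of change",
     xaxt = "n")

# Add one segment per pair of scores, each with its own color
segments(x0 = rep(1, length(before)),
         y0 = before,
         x1 = rep(2, length(after)),
         y1 = after,
         col = seg.col,
         lwd = 2)

# Add labels
mtext(text = c("Before", "After"),
      side = 1, at = c(1, 2), line = 1)
```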
The `grid()` function allows you to easily add grid lines to a plot (you can customize your grid lines further with `lty`, `lwd`, and `col` arguments):
```{r}
# Add gridlines to a plot with grid()
plot(pirates$age,
pirates$beard.length,
pch = 16,
col = gray(.1, .2), main = "Add grid lines to a plot with grid()")
# Add gridlines
grid()
```
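Here is a small sketch of a customized grid, using the built-in `mtcars` dataset as a stand-in for the pirates data. In addition to `lty`, `lwd`, and `col`, `grid()` has `nx` and `ny` arguments that control the number of grid cells in each direction. Setting one of them to `NA` turns off the lines in that direction, while `NULL` aligns the lines with the axis tick marks. The object `grid.col` is just a name I made up for this example.

```{r}
# A light, semi-transparent gray for the gridlines
grid.col <- gray(.5, .5)

plot(x = mtcars$wt,
     y = mtcars$mpg,
     pch = 16,
     col = gray(.1, .5),
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Horizontal gridlines only")

# Solid, thin, horizontal gridlines at the y-axis tick marks
grid(nx = NA,         # No vertical lines
     ny = NULL,       # Horizontal lines at the tick marks
     lty = 1,
     lwd = .5,
     col = grid.col)
```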
### `text()`
|Argument |Outcome |
|:---------|:----------------------------------------------------|
|`x`, `y` |Coordinates of the labels|
|`labels` |Labels to be plotted |
|`cex` |Size of the labels |
|`adj` |Horizontal text adjustment. `adj = 0` is left justified, `adj = .5` is centered, and `adj = 1` is right-justified |
|`pos` |Position of the labels relative to the coordinates. `pos = 1` puts the label below the coordinates, while 2, 3, and 4 put it to the left of, above, and to the right of the coordinates, respectively |
Table: (\#tab:textarguments) Arguments to `text()`
With `text()`, you can add text to a plot. You can use `text()` to highlight specific points of interest in the plot, or to add information (like a third variable) for every point in a plot. For example, the following code adds the three words "Put", "text", and "here" at the coordinates (1, 9), (5, 5), and (9, 1) respectively. See Figure \@ref(fig:puttexthere) for the plot:
```{r puttexthere, fig.cap = "Adding text to a plot with text()"}
plot(1,
xlim = c(0, 10),
ylim = c(0, 10),
type = "n")
text(x = c(1, 5, 9),
y = c(9, 5, 1),
labels = c("Put", "text", "here"))
```
You can do some cool things with `text()`. In Figure \@ref(fig:textlabels), I create a scatterplot of data and add data labels above each point by including the `pos = 3` argument:
```{r textlabels, fig.cap = "Adding labels to points with text()"}
# Create data vectors
height <- c(156, 175, 160, 172, 159, 165, 178)
weight <- c(65, 74, 69, 72, 66, 75, 75)
id <- c("andrew", "heidi", "becki", "madisen", "david", "vincent", "jack")
# Plot data
plot(x = height,
y = weight,
xlim = c(155, 180),
ylim = c(65, 80),
pch = 16,
col = yarrr::piratepal("xmen"))
# Add id labels
text(x = height,
y = weight,
labels = id,
pos = 3) # Put labels above the points
```
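To get a feel for the other values of `pos`, here is a quick sketch that puts four labels around a single point, one for each value from 1 to 4. The coordinates are arbitrary, and `pos.labels` is just a name I made up for this example.

```{r}
# One label for each value of pos
pos.labels <- paste0("pos = ", 1:4)

# Blank plotting space
plot(x = 1,
     type = "n",
     xlim = c(0, 10),
     ylim = c(0, 10),
     xlab = "",
     ylab = "",
     main = "Label positions with pos")

# A single reference point in the middle
points(x = 5, y = 5, pch = 16)

# Four labels at the same coordinates, each with a different pos
text(x = rep(5, 4),
     y = rep(5, 4),
     labels = pos.labels,
     pos = 1:4)
```

The labels should appear below, to the left of, above, and to the right of the point, in that order.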
When entering text in the `labels` argument, keep in mind that R will, by default, plot the entire text in one line. However, if you are adding a long text string (like a sentence), you may want to separate the text into separate lines. To do this, add the text `\n` where you want new lines to start. Look at Figure \@ref(fig:manylines) for an example.