forked from rstudio-education/stat545
-
Notifications
You must be signed in to change notification settings - Fork 0
/
39_appendix.Rmd
516 lines (345 loc) · 27.8 KB
/
39_appendix.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
# (APPENDIX) Appendix {-}
# Deprecated {#oldies}
```{r include = FALSE}
source("common.R")
```
```{r echo = FALSE, message = FALSE, warning = FALSE}
library(gt)
library(glue)
library(readr)
library(dplyr)
oldies_urls <- read_csv("admin/deprecated_stat545_urls.csv")
base_url <- "https://STAT545-UBC.github.io/STAT545-UBC.github.io"
oldies_formatted <- oldies_urls %>%
mutate(url = glue("{base_url}/{stat545_path}"),
md_formatted = glue("[{title}]({url})") %>% md()) %>%
select(md_formatted, last_updated)
oldies_formatted %>%
gt() %>%
tab_header(
title = "Deprecated STAT 545 Content"
) %>%
fmt_markdown(
columns = TRUE
) %>%
cols_label(
md_formatted = "Lesson",
last_updated = "Last Updated"
) %>%
cols_align(
align = "left"
)
```
# Short random things {#random}
## Draw the rest of the owl, a pep talk for building off simple examples {#draw-the-owl}
<!--Original content: https://stat545.com/bit007_draw-the-rest-of-the-owl.html-->
```{r echo = FALSE}
set.seed(328)
```
```{r draw-an-owl, echo = FALSE, fig.cap = '"How to draw an owl" from [imgur](http://imgur.com/gallery/RadSf)"', out.width = "50%"}
knitr::include_graphics("img/how-to-draw-an-own-imgur.jpg")
```
Figure \@ref(fig:draw-an-owl) is an image that is often used to illustrate how hard it can be to go from simple examples to the real thing you actually want.
I recently needed to draw a f\*cking owl in R, so I decided to record the process as an experiment.
When I teach [STAT545][stat-545] or [Software Carpentry][software-carpentry], I try to convey as much about *process* as anything else. You can always look up technical details, e.g., syntax, but you don't usually get to see how other people work. This is also how I approach teaching about [writing R functions](#functions-part1). Newcomers often look at finished code and assume it flowed perfectly formed out of someone's fingertips. It probably did not.
My *modus operandi*: start with something that works and add features in small increments, maniacally checking that everything still works. Other people undoubtedly move faster (and, therefore, travel faster but crash harder), but I'm OK with that.
### Context: writing a function factory
I have an R package [`googlesheets`][googlesheets-github] that gets Google Sheets in and out of R. Lately we've had a lot of trouble with `Internal Server Error (HTTP 500)`, which, as you might expect, is an error on the Google server side. All you can do as a user is try, try again. But this is a showstopper for unattended scripts or multi-step operations, like building and checking the package. A single error renders lots of other work moot, which is completely infuriating.
I want to catch these errors and automatically retry the request after an appropriate delay.
The brute force approach would be to literally drop little `for` or `while` loops all over the package, to inspect the response and retry if necessary. But I try to follow the [DRY principle][wiki-dry], so would prefer to write a new "retry-capable" version of the function that makes these http requests.
It also turns out there's more than one function for making these requests. I'm talking about the [HTTP verbs you use with REST APIs](https://www.restapitutorial.com/lessons/httpmethods.html): GET, POST, PATCH, etc. I potentially need to give them all the "retry" treatment. So what I really need is a *function factory*: an HTTP verb goes in and out comes a retry-capable version of the verb.
It turns out you can write R (or S) for ~20 years and not be very facile with this technique. I certainly am not! But I can read, which is how I got the two circles to start my owl.
### Start at the beginning
My reference is the section of Wickham's [Advanced R][adv-r] [-@wickham2015a] that is about [closures][adv-r-closures], "functions written by functions". Here's one of the two main examples: a function that creates an exponentiation function.
```{r}
power <- function(exponent) {
function(x) {
x ^ exponent
}
}
square <- power(2)
square(2)
square(4)
cube <- power(3)
cube(2)
cube(4)
```
I make myself type all this code in and run it. No shortcuts!
What have I learned? I can write a factory, `power()`, that takes some input, `exponent`, and gives me back a function, such as `square()` or `cube()`.
But let's be honest, this is pretty far from what I need to do.
### Can the input be a function?
My problem is different. My input is a *function*, not an exponent like 2 or 3. Can I even do that?
The simplest thing I could think of that sort of looks like my problem:
* The factory takes a function as input.
* It returns a function that executes the input function twice, with whatever inputs that function had in the first place.
```{r}
call_me_twice <- function(f) {
function(...) {
f(...)
f(...)
}
}
```
Now I need an input function to be the guinea pig. It needs to take input and be chatty, so I can tell if it's getting executed. Make sure it works as expected!
```{r}
jfun <- function(x) cat(x, "\n")
jfun("a")
jfun(1)
```
Put it all together.
```{r}
jfunner <- call_me_twice(jfun)
jfunner("a")
jfunner(1)
```
I won't lie, I'm pleasantly surprised this worked. Morale boost.
### Faux VERB
I need a placeholder for the HTTP verbs with these qualities:
* Takes some input.
* Generates a non-deterministic status.
* Returns a list with the input, status, and some content.
```{r}
VERB <- function(url = "URL!")
list(url = url,
status = sample(c(200, 500), size = 1, prob = c(0.6, 0.4)),
content = rnorm(5))
VERB()
VERB()
```
Oh wait, we have functions in R to do something over and over again.
```{r}
replicate(5, VERB())
```
Send `VERB()` off to the function factory.
```{r}
VERB_twice <- call_me_twice(VERB)
replicate(5, VERB_twice())
```
Hmmmm...I only see one output per call of `VERB_twice()`. But why? Is it because `VERB()` is only getting called once? That means I've screwed up. Or is `VERB()` getting called twice but I'm only seeing evidence of the second call?
### A better faux VERB
```{r}
VERB <- function(url = "URL!") {
req <- list(url = url,
status = sample(c(200, 500), size = 1, prob = c(0.6, 0.4)),
content = rnorm(5))
message(req$status)
req
}
replicate(5, VERB())
```
Why is this better? Each call of `VERB()` causes a message AND returns something.
Send new and improved `VERB()` off to the function factory.
```{r}
VERB_twice <- call_me_twice(VERB)
replicate(5, VERB_twice())
```
I like it! What do I like about it?
* 5 calls produce 10 messages, which tells me `VERB()` is getting called twice.
* 5 calls produce 5 outputs, which is good for my eventual goal, where I will only want to return the value of the last call of the enclosed function.
### Retry `n` times ... and a temporary setback
Instead of hard-wiring 2 calls of the enclosed function `f`, let's call it `n` times via a `for` loop.
```{r}
call_me_n <- function(f, n = 3) {
function(...) for (i in seq_len(n)) f(...)
}
```
Let's try my new function factory.
```{r}
VERB_3 <- call_me_n(VERB)
VERB_3()
```
That's disappointing. I see the message, but get no actual output. Is there really no output coming back? Or is it just invisible?
```{r}
x <- VERB_3()
str(x)
```
Nope, there really is no output. Fix that.
```{r}
call_me_n <- function(f, n = 3) {
function(...) {
for (i in seq_len(n)) out <- f(...)
out
}
}
VERB_3 <- call_me_n(VERB)
VERB_3()
```
YAY! Before I move on, let's make sure I can actually set the `n` argument to something other than 3.
```{r}
VERB_4 <- call_me_n(VERB, 4)
VERB_4()
```
### Conditional retries
Almost done!
My real factory needs to use the output of the enclosed HTTP verb to decide whether a retry is sensible. The new name reflects HTTP verb specificity. I now add the actual logic and behavior I need in real life.
```{r}
VERB_n <- function(VERB, n = 3) {
force(VERB)
force(n)
function(...) {
for (i in seq_len(n)) {
out <- VERB(...)
if (out$status < 499 || i == n) break
backoff <- runif(n = 1, min = 0, max = 2 ^ i - 1)
message("HTTP error ", out$status, " on attempt ", i,
" ... retrying after a back off of ", round(backoff, 2),
" seconds.")
Sys.sleep(backoff)
}
out
}
}
```
I send my existing faux `VERB()` off to the new and improved factory. Start providing input again, just to make sure that all still works.
```{r}
VERB_5 <- VERB_n(VERB, n = 5)
VERB_5("Owls can rotate their necks 270 degrees.")
VERB_5("Owls are cute.")
VERB_5("A group of owls is called a Parliament.")
```
And we have drawn some f\*cking owls, with retries!
```{r echo = FALSE, out.width = "50%", fig.cap = "From [BuzzFeed](https://img.buzzfeed.com/buzzfeed-static/static/2014-03/enhanced/webdr05/7/10/enhanced-22024-1394207918-30.jpg)"}
knitr::include_graphics("https://img.buzzfeed.com/buzzfeed-static/static/2014-03/enhanced/webdr05/7/10/enhanced-22024-1394207918-30.jpg")
```
### The final result is not that exciting
Now in real life, I create retry-capable HTTP verbs like so: `rGET <- VERB_n(httr::GET)`. Then just replace all instances of `httr::GET()` with `rGet()`. It's terribly anticlimactic.
The final version of the function factory is about a dozen lines of fairly pedestrian code. I probably wrote and discarded at least 10x that. This is typical, so don't be surprised if this is how it works for you too. Get a working example and take tiny steps to morph it into the thing you need.
*The __results__ of this effort are, however, pretty gratifying. I have had zero build/check failures locally and on Travis, since I implemented retries on `httr::GET()`. Or, to be honest, I've had failures, but for other reasons. So it was totally worth it! I also thank Konrad Rudolph and Kevin Ushey for [straightening me out](https://gist.github.com/jennybc/65c577f98c2bad7e2b3d0ccb773dfaf8) on the need to use `force()` inside the function factory.*
## How to obtain a bunch of GitHub issues or pull requests with R {#gh-package}
[Using dplyr + purrr + tidyr](https://github.com/jennybc/analyze-github-stuff-with-r) to analyze data about GitHub repos via the [gh package][gh-github]
## How to tame XML with nested data frames and purrr {#tame-google-sheets}
[Using dplyr + purrr + tidyr + xml2](https://github.com/jennybc/manipulate-xml-with-purrr-dplyr-tidyr) to tame the annoying XML from Google Sheets
## Make browsing your GitHub repos more rewarding {#github-browsability}
<!--Original content: https://stat545.com/bit006_github-browsability-wins.html-->
*The unreasonable effectiveness of GitHub browsability*
One of my favorite aspects of GitHub is the ability to inspect a repository's files in a browser. Certain practices make browsing more rewarding and can postpone the day when you must create a proper website for a project. Perhaps indefinitely.
### Be savvy about your files
Keep files in the plainest, web-friendliest form that is compatible with your main goals. Plain text is the very best. GitHub offers special handling for certain types of files:
* Markdown files, which may be destined for conversion into, e.g., HTML
* Markdown files named `README.md`
* HTML files, often the result of compiling Markdown files
* Source code, such as `.R` files
* Delimited files, containing data one might bring into R via `read.table()`
* PNG files
### Get over your hang ups re: committing derived products
Let's acknowledge the discomfort some people feel about putting derived products under version control. Specifically, if you've got an R Markdown document `foo.Rmd`, it can be `knit()` to produce the intermediate product `foo.md`, which can be converted to the ultimate output `foo.html`. Which of those files are you "allowed" to put under version control? Source-is-real hardliners will say only `foo.Rmd` but pragmatists know this can be a serious bummer in real life. Just because I *can* rebuild everything from scratch, it doesn't mean I *want* to.
The taboo of keeping derived products under version control originates from compilation of binary executables from source. Software built on a Mac would not work on Windows and so it made sense to keep these binaries out of the holy source code repository. Also, you could assume the people with access to the repository have the full development stack and relish opportunities to use it. None of these arguments really apply to the `foo.Rmd --> foo.md --> foo.html` workflow. We don't have to blindly follow traditions from the compilation domain!
In fact, looking at the diffs for `foo.md` or `foo-figure-01.png` can be extremely informative. This is also true in larger data analytic projects projects after a `make clean; make all` operation. By looking at the diffs in the downstream products, you often catch unexpected changes. This can tip you off to changes in the underlying data and/or the behavior of packages you depend on.
This is a note about cool things GitHub can do with various file types, if they happen to end up in your repo. I won't ask you how they got there.
### Markdown
You will quickly discover that GitHub renders Markdown files very nicely. By clicking on `foo.md`, you'll get a decent preview of `foo.html`. Yay!
Aggressively exploit this handy feature. Make Markdown your default format for narrative text files and use them liberally to embed notes to yourself and others in a repository hosted on GitHub. It's an easy way to get pseudo-webpages inside a project "for free". You may never even compile these files to HTML explicitly; in many cases, the HTML preview offered by GitHub is all you ever need.
What does this mean for R Markdown files? **Keep intermediate Markdown.** Commit both `foo.Rmd` and `foo.md`, even if you choose to `.gitignore` the final `foo.html`. As of [September 2014](https://github.com/github/markup/pull/343), GitHub renders R Markdown files nicely, like Markdown, and with proper syntax highlighting, which is great. But, of course, the code blocks just sit there un-executed, so my advice about keeping intermediate Markdown still holds. You want YAML frontmatter that looks something like [this](https://gist.github.com/jennybc/402761e30b9be8023af9#file-yaml_frontmatter_rmd_keep_md-yml) for `.Rmd`:
```{r include = FALSE}
rinline <- function(code) {
sprintf('`r %s`', code)
}
```
``` yaml
---
title: "Something fascinating"
author: "Jenny Bryan"
date: "`r rinline("format(Sys.Date())")`"
output:
html_document:
keep_md: TRUE
---
```
or like [this](https://gist.github.com/jennybc/402761e30b9be8023af9#file-yaml_frontmatter_r_keep_md-yml) for `.R`:
``` yaml
#' ---
#' title: "Something fascinating"
#' author: "Jenny Bryan"
#' date: "`r rinline("format(Sys.Date())")`"
#' output:
#' html_document:
#' keep_md: TRUE
#' ---
```
In RStudio, when editing `.Rmd`, click on the gear next to "Knit HTML" for YAML authoring help
For a quick, stand-alone document that doesn't fit neatly into a repository or project (yet), make it a [Gist](https://gist.github.com).
__Example:__ Hadley Wickham's [advice on what you need to do to become a data scientist](https://gist.github.com/hadley/820f09ded347c62c2864). Gists can contain multiple files, so you can still provide the R script or R Markdown source __and__ the resulting Markdown, as I've done in this write-up of [Twitter-sourced tips for cross-tabulation](https://gist.github.com/jennybc/04b71bfaaf0f88d9d2eb).
### `README.md`
You probably already know that GitHub renders `README.md` at the top-level of your repo as the *de facto* landing page. This is analogous to what happens when you point a web browser at a directory instead of a specific web page: if there is a file named `index.html`, that's what the server will show you by default. On GitHub, files named `README.md` play exactly this role for directories in your repo.
Implication: for any logical group of files or mini project-within-your-project, create a sub-directory in your repository. And then create a `README.md` file to annotate these files, collect relevant links, etc. Now when you navigate to the sub-directory on GitHub the nicely rendered `README.md` will simply appear.
Some repositories consist solely of `README.md`. __Examples:__ Jeff Leek's write-ups on [How to share data with a statistician](https://github.com/jtleek/datasharing) or [Developing R packages](https://github.com/jtleek/rpackages). I am becoming a bigger fan of `README`-only repos than gists because repo issues trigger notifications, whereas comments on gists do not.
If you've got a directory full of web-friendly figures, such as PNGs, you can use [code like this](https://gist.github.com/jennybc/0239f65633e09df7e5f4) to generate a `README.md` for a quick DIY gallery, as Karl Broman has done with [his FruitSnacks](https://github.com/kbroman/FruitSnacks/blob/master/PhotoGallery.md). I have also used this device to share Keynote slides on GitHub (*mea culpa!*). Export them as PNGs images and throw 'em into a README gallery: slides on [file organization](https://github.com/Reproducible-Science-Curriculum/rr-organization1/tree/27883c8fc4cdd4dcc6a8232f1fe5c726e96708a0/slides/organization-slides) and some on [file naming](https://github.com/Reproducible-Science-Curriculum/rr-organization1/tree/27883c8fc4cdd4dcc6a8232f1fe5c726e96708a0/slides/naming-slides).
### Finding stuff
OK these are pure GitHub tips but if you've made it this far, you're obviously a keener.
* Press `t` to activate [the file finder](https://github.com/blog/793-introducing-the-file-finder) whenever you're in a repo's file and directory view. AWESOME, especially when there are files tucked into lots of subdirectories.
* Press `y` to [get a permanent link](https://help.github.com/articles/getting-permanent-links-to-files/) when you're viewing a specific file. Watch what changes in the URL. This is important if you are about to *link* to a file or [to specific lines](http://stackoverflow.com/questions/23821235/how-to-link-to-specific-line-number-on-github). Otherwise your links will break easily in the future. If the file is deleted or renamed or if lines get inserted or deleted, your links will no longer point to what you intended. Use `y` to get links that include a specific commit in the URL.
### HTML
If you have an HTML file in a GitHub repository, simply visiting the file shows the raw HTML. Here's a nice ugly example:
* <https://github.com/STAT545-UBC/STAT545-UBC.github.io/blob/master/bit003_api-key-env-var.html>
No one wants to look at that. ~~You can provide this URL to [rawgit.com](http://rawgit.com) to serve this HTML more properly and get a decent preview.~~
~~You can form two different types of URLs with [rawgit.com](http://rawgit.com):~~
* ~~For sharing low-traffic, temporary examples or demos with small numbers of people, do this:~~
- ~~<https://rawgit.com/STAT545-UBC/STAT545-UBC.github.io/master/bit003_api-key-env-var.html>~~
- ~~Basically: replace `https://github.com/` with `https://rawgit.com/`~~
* ~~For use on production websites with any amount of traffic, do this:~~
- ~~<https://cdn.rawgit.com/STAT545-UBC/STAT545-UBC.github.io/master/bit003_api-key-env-var.html>~~
- ~~Basically: replace `https://github.com/` with `https://cdn.rawgit.com/`~~
*2018-10-09 update: RawGit [announced](https://rawgit.com/) that it is in a sunset phase and will soon shut down. They recommended: [jsDelivr](https://www.jsdelivr.com/rawgit), [GitHub Pages](https://pages.github.com/), [CodeSandbox](https://codesandbox.io/), and [unpkg](https://unpkg.com/#/) as alternatives.*
You may also want to check out this [Chrome extension](https://chrome.google.com/webstore/detail/github-html-preview/cphnnfjainnhgejcpgboeeakfkgbkfek?hl=en) or [GitHub & BitBucket HTML Preview](https://htmlpreview.github.io).
This sort of enhanced link might be one of the useful things to put in a `README.md` or other Markdown file in the repo.
Sometimes including HTML files will cause GitHub to think that your R repository is HTML. Besides being slightly annoying, this can make it difficult for people to find your work if they are searching specifically for R repos. You can exclude these files or directories from GitHub's language statistics by [adding a .gitattributes file](https://github.com/github/linguist#using-gitattributes) that marks them as 'documentation' rather than code. [See an example here](https://github.com/jennybc/googlesheets/blob/master/.gitattributes).
### Source code
You will notice that GitHub does automatic syntax highlighting for source code. For example, notice the coloring of this [R script](https://github.com/jennybc/ggplot2-tutorial/blob/master/gapminder-ggplot2-stripplot.r). The file's extension is the primary determinant for if/how syntax highlighting will be applied. You can see information on recognized languages, the default extensions and more at [github/linguist](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml). You should be doing it anyway, but let this be another reason to follow convention in your use of file extensions.
Note you can click on "Raw" in this context as well, to get just the plain text and nothing but the plain text.
### Delimited files
GitHub will nicely render tabular data in the form of `.csv` (comma-separated) and `.tsv` (tab-separated) files. You can read more in the [blog post](https://github.com/blog/1601-see-your-csvs) announcing this feature in August 2013 or in [this GitHub help page](https://help.github.com/articles/rendering-csv-and-tsv-data).
Advice: take advantage of this! If something in your repo can be naturally stored as delimited data, by all means, do so. Make the comma or tab your default delimiter and use the file suffixes GitHub is expecting. I have noticed that GitHub is more easily confused than R about things like quoting, so always inspect the GitHub-rendered `.csv` or `.tsv` file in the browser. You may need to do light cleaning to get the automagic rendering to work properly. Think of it as yet another way to learn about imperfections in your data.
Here's an example of a tab delimited file on GitHub: [lotr_clean.tsv](https://github.com/jennybc/lotr/blob/master/lotr_clean.tsv), originally found ~~here~~ (nope, IBM shut down manyeyes July 2015).
Note you can click on "Raw" in this context as well, to get just the plain text and nothing but the plain text.
### PNGs
PNG is the "no brainer" format in which to store figures for the web. But many of us like a vector-based format, such as PDF, for general purpose figures. Bottom line: PNGs will drive you less crazy than PDFs on GitHub. To reduce the aggravation around viewing figures in the browser, make sure to have a PNG version in the repo.
Examples:
* [This PNG figure](https://github.com/jennybc/STAT545A/blob/master/hw06_scaffolds/01_justR/stripplot_wordsByRace_The_Fellowship_Of_The_Ring.png) just shows up in the browser
* A different figure [stored as PDF](https://github.com/jennybc/ggplot2-tutorial/blob/master/gapminder-country-colors.pdf) ~~produces the dreaded, annoying "View Raw" speed bump. You'll have to click through and, on my OS + browser, wait for the PDF to appear in an external PDF viewer.~~ *2015-06-19 update: since I first wrote this GitHub has [elevated its treatment of PDFs](https://github.com/blog/1974-pdf-viewing) so YAY. It's slow but it works.*
Hopefully we are moving towards a world where you can have "web friendly" and "vector" at the same time, without undue headaches. As of [October 2014](https://github.com/blog/1902-svg-viewing-diffing), GitHub provides enhanced viewing and diffing of SVGs. So don't read this advice as discouraging SVGs. Make them! But consider keeping a PNG around as emergency back up for now.
### Linking to a ZIP archive of your repo
The browsability of GitHub makes your work accessible to people who care about your content but who don't (yet) use Git themselves. What if such a person wants all the files? Yes, there is a clickable "Download ZIP" button offered by GitHub. But what if you want a link to include in an email or other document? If you add `/archive/master.zip` *to the end* of the URL for your repo, you construct a link that will download a ZIP archive of your repository. Click here to try this out on a very small repo:
<https://github.com/jennybc/lotr/archive/master.zip>
Go look in your downloads folder!
### Links and embedded figures
* To link to another page in your repo, just use a relative link:
+ `[admin](courseAdmin/)` will link to the `courseAdmin/` directory inside the current directory.
+ `[admin](/courseAdmin/)` will link to the top-level `courseAdmin/` directory from any where in the repo.
* The same idea also works for images. `![](image.png)` will include `image.png` located in the current directory.
### Let people correct you on the internet
They love that!
You can create a link that takes people directly to an editing interface in the browser. Behind the scenes, assuming the clicker is signed into GitHub but is not you, this will create a fork in their account and send you a pull request. When I click the link below, I am able to actually commit directly to `master` for this repo.
[CLICK HERE to suggest an edit to this page!](https://github.com/rstudio-education/stat545/edit/master/39_appendix.Rmd)
Here's what that link looks like in the Markdown source:
```
[CLICK HERE to suggest an edit to this page!](https://github.com/rstudio-education/stat545/edit/master/39_appendix.Rmd)
```
and here it is with placeholders:
```
[INVITATION TO EDIT](<URL to your repo>/edit/master/<path to your md file>)
```
AFAIK, to do that in a slick automatic way across an entire repo/site, you need to be using Jekyll or some other automated system. But you could easily handcode such links on a small scale.
## How to send a bunch of emails from R {#email-in-r}
[Workflow](https://github.com/jennybc/send-email-with-r) for sending email with R and [`gmailr`](https://CRAN.R-project.org/package=gmailr).
## Store an API key as an environment variable {#store-api-key}
<!--Original content: https://stat545.com/bit003_api-key-env-var.html-->
This can be found [here](https://happygitwithr.com/credential-caching.html).
## Data Carpentry lesson on tidy data {#data-carp-tidy-data}
<!--Original content: https://stat545.com/bit002_tidying-lotr-data.html-->
*A lesson I contributed to [Data Carpentry](https://software-carpentry.org/blog/2014/05/our-first-data-carpentry-workshop.html) on tidying data.*
This is a lesson on tidying data. Specifically, what to do when a conceptual variable is spread out over 2 or more variables in a data frame.
Data used: words spoken by characters of different races and gender in the Lord of the Rings movie trilogy
* [Directory of this lesson](https://github.com/datacarpentry/datacarpentry/tree/master/lessons/tidy-data) in the Data Carpentry GitHub repo.
* [01-intro](https://github.com/datacarpentry/datacarpentry/blob/master/lessons/tidy-data/01-intro.md) shows untidy and tidy data. Then we demonstrate how tidy data is more useful for analysis and visualization. Includes references, resources, and exercises.
* [02-tidy](https://github.com/datacarpentry/datacarpentry/blob/master/lessons/tidy-data/02-tidy.md) shows __how__ to tidy data, using `gather()` from the tidyr package. Includes references, resources, and exercises.
* [03-tidy-bonus-content](https://github.com/datacarpentry/datacarpentry/blob/master/lessons/tidy-data/03-tidy-bonus-content.md) is not part of the lesson but may be useful as learners try to apply the principles of tidy data in more general settings. Includes links to packages used.
Learner-facing dependencies:
* Files in the `tidy-data/` sub-directory of the Data Carpentry `data/` directory.
* tidyr package (only true dependency).
* ggplot2 is used for illustration but is not mission critical.
* dplyr and reshape2 are used in the bonus content.
Instructor dependencies:
* curl if you execute the code to grab the Lord of the Rings data used in examples from GitHub. Note that the files are also included in the `datacarpentry/data/tidy-data/` directory, so data download is avoidable.
* rmarkdown, knitr, and xtable if you want to compile the `Rmd` to `md` and `html`.
```{r links, child="links.md"}
```