forked from dan87134/quobyexample
-
Notifications
You must be signed in to change notification settings - Fork 0
/
QuosuresByExamples.Rmd
422 lines (299 loc) · 17.5 KB
/
QuosuresByExamples.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
---
title: "Quosures By Examples"
author: "Dan Sullivan / danr on community.rstudio.com "
date: "11/2/2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
>First sighting of a quosure in the wild...
>*quo... that isn't a word!*
>1998 Kramer, Sienfeld Season 1, Episode 1
>Scrabble scene
This tutorial is brought to by the rlang functions f_text, enquo, eval_tidy, lang_head, land_tail, expr_interp, and is_lang
##Introduction
This is a tutorial about quosures and the some of the basics of using them. Quosures are objects that are the basis for non-standard evaluation of expressions in the tidyverse.
You may have never heard of quosures before but if you have used packages like dplyr and many others you are using quosures under the covers. It's quosures that make the convienent and compact syntax of a function like dplyr::select possible:
```{r dplyr}
t1 <- tibble::tribble(
~col1, ~col2, ~col3,
1, 2, 3,
4, 5, 6,
7, 8, 9
)
# you just specify the columns you want with literal names
dplyr::select(t1, first=col2, last=col3)
```
`select` is certainly a nicer syntax than something like:
```{r subset}
t1 <- data.frame(
col1 = 1:3,
col2 = 2:4,
col3 = 3:5
)
t2 <- subset(t1, T, select = c(col2, col3))
colnames(t2) <- c("first", "last")
t2
````
This tutorial focuses on the behaviour, *i.e.* the mechanics of how quosures work, and the structure of quosures, not necessarily particular applications of quosures.
However it does include an example of applying quosures to a function that uses non-standard evaluation to build a part number. That, at first glance, might not seem like much but it lets part numbers use their native syntax... you'll see that shortly.
By the end of this tutorial you should be able to think out of the box about some ways to make use of quosures to make function that has a clean and simple syntax for input, similar to the way that dplyr::select does.
But before we dive into the details of quosures we need to get a big picture of why we care about them and non-standard evaluation.
##Big Picture
Most of the time in R expressions are evaluated and replaced with the value that the evaluation produces. This is what R calls standard evaluation. Here is an example of a function takes `exp` as it's sole argument and then just returns it. However it does not return literally what is passed into it... it returns the result of R's standard evaluation of what is passed into it :
```{r standard-evaluation}
a <- 21
b <- 1
f1 <- function(exp){
exp
}
f1(a-b)
```
So calling f1 in this example produces, as you might expect, `20`. But f1 didn't do much to compute that result. It depended on R's standard evaluation of `a - b` and returned the result that produced.
But what if you wanted f1(a-b) to produce the string "21-1", that is the values of `a` and `b` concatonated with a dash? You might want to do this if you wanted a function that produced part numbers in a significant part numbering system.
In a significant part numbering system the part number describes the part it represents as opposed to being the next part number available when the part was added to the system. <https://en.wikipedia.org/wiki/Part_number#Significant_versus_non-significant_part_numbers>
For example the "21" in our made up part number means the part is a hex nut and the "1" means the part class is "expendable". In effect a dash separates the attributes of the part. Of course for a real world part numbering system there would have to be a lot more attributes for a hex nut but we're using just two here to keep the example simple.
So what we want is a `make_part_number` function that works like this:
<div style="display:none">
```{r make-pn, cache=TRUE}
make_part_number <- function(expr) {
"21-1"
}
```
</div>
```{r sig-pn}
part_type <- 21
stock_class <- 1
make_part_number(part_type-stock_class)
```
It produces the string concatonation of values of `part_type` and `stock_class` we were looking for.
But it might seem like it's not possible to write a function like this and not even necessary.
It might seem impossible because R's standard evaluation of arguments, which is what we are all used to, seems to evaluate an argument before the body of the function gets to look at it. However quosures, as we'll see as we go though this tutorial, give you a way to circumvent R's standard evaluation of arguments and work with the literal expression in argument directly.
But it might seem unnecessary to even consider creating a function like this. For example this function could fill the bill:
```{r pn-args}
make_part_number_alt <- function(part_type, stock_class) {
glue::glue("{part_type}-{stock_class}")
}
part_type <- 21
stock_class <- 1
make_part_number_alt(a, b)
```
And as you can see it has no trouble making a part number.
`make_part_number_alt` is an example of a developer centered implementation of a product requirement to have a function that can make a part number. It leverages the developers understanding of the R language but not necessarily the users understanding of part numbers.
The first part number function we looked at, `make_part_number`, is an example of a user centered implementation of a product requirement. It takes into account the users understanding of the syntax of part numbers. For example the documentation for our made up part numbering system has the format of a screw type fastener defined as:
`part type-part class`
... which, again, is just the attributes of the part separated by a dash.
There is a computer science term for what `make_part_number` is doing. `make_part_number`'s argument is an expression from a DSL, or domain specific langauge. All DSL means is that that functions use formats and operations that the user of the functions is familiar with regardless of how difficult the implementation will be for the developer :-)
Now the users of the make_part_number function might be in a purchasing department and just be poking at part data from an R Studio console window. These users work with part numbers all day long, every day, and asking them translate dashes into commas and a little white space is just asking them to make errors.
The user also might be a developer using make_part_number in a package that will be the foundation for a system for tracking the statisics of part usage. Even here having part number arguments that look like the ones in the documentation the developer will be referencing is also a win in terms of minimizing errors.
As a general principle it's better to use a syntax that already exists rather than create a new one. It's better to use a more simple syntax than one that is littered with ceremonial programming text.
Oh.... let's not forget the developer who is going to implement make_part_number. That developer will have a harder task than users will and will have to make use of quosures to create a DSL that reflects the users view of the problem domain. But that is what developers do, spend great effort and time to minimize the effort and time users spend.
And later in this tutorial we'll be that developer and build the make_part_number function for our made up part numbering system.
Now that we had a quick look at the big picture and motivation for quosures let's start digging details.
##Quoting
The concept of quoting in programming refers to taking an expression used in a program and turning into it a string-like object so that it can be directly analyzed in a program rather than being evaluated.
R provides a number of ways to quote an expression. In the tidyverse the function `enquo` is used to quote an expression which is used as an argument to a function. In this tutorial we will focus on `enquo` because we will need it for the `make_part_number` function.
The `enquo` function returns an object called a `quosure`. A `quosure` is a quoted expression.
The rlang package provides a number of functions that let us easily tease apart the pieces of a quosure and we'll see them as we go though this tutorial.
As an example of a simple one of these functions let's make a quosure and use the `f_text` function to get its text value.
```{r quosure1}
# function that makes a quosure
f1 <- function(expr) {
rlang::enquo(expr)
}
# make the quosure
q1 <- f1(a-b-c-d)
# now get as text what was quoted
rlang::f_text(q1)
```
The text value of the quosure is the expression input to `enquo` but prettified a bit by adding whitespace around the dashes.
Note that the best practice for using binary operator like `-` is to put whitespace on both sides of the operator. We didn't do that here on purpose... we want to write a part number exactly in the way the "documentation" for our made up part numbering system says we should.
One of the things you can do with a quosure is evaluate it by using the eval_tidy function from the rlang package. We're using package name prefixes in the examples in this tutorial just so that it is easier to see where functions are coming from.
```{r quosure2}
# assign some values to some varibles
a <- 10
b <- 5
c <- 3
d <- 2
# this is the same function as in the previous example
f1 <- function(expr) {
rlang::enquo(expr)
}
# make the quosure
q1 <- f1(a-b-c-d)
# evaluate the quosure, that is compute the value of a-b-c-d
rlang::eval_tidy(q1)
```
This doesn't seem to be all that interesting. It looks like just a roundabout way to execute `a-b-c-d` in the console like this:
```{r quosure3}
a <- 10
b <- 5
c <- 3
d <- 2
a-b-c-d
```
But it is interesting. We'll need to take a closer look at the structure of a quosure to see why.
It turns out that our `make_part_number` function will not have to evaluate a quoted expession like "a-b-c-d", but it will need to evaluate individually the `a`, `b`, `c` and `d` that make up the part number. We'll have to look how quosures are interpreted in order to see how that can be done, and we'll look at that next.
#Interpretation
Interpretation is the process of taking an expression, like a-b-c-d, and breaking it down in to pieces that can be computed and then putting those computed pieces back together to get the final result.
The preceding definition probably isn't all clear but as we go though this part of the tutorial it will make sense.
First of all you might ask the question "Why do we have to break down anything?". Why can't you just compute `a-b-c-d` and be done with it.
There are a couple of reasons for breaking things down. Some expressions are much more complicated than `a-b-c-d` and there is no reason to think that they could be evaluated in a single gulp. But another reason is an artifact of how almost all modern computers work. In a nutshell a computer has instructions built into to it to compute most binary operations. Here is some psuedo assembly code that does subtraction:
```
sub R2, R1
load R6
```
The first instruction says to subtract R1 from R2 and put the result in the accumlator. R1, R2, R6 and the accumlator are special memory locations, called registers, that are internal to the computer itself.
The second instruction loads the accumlator into R6. So an R expression like:
`
R6 <- R2 - R1
`
... can be easily computed by modern computer hardware.
However there are no assembly instructions like:
`
sub R3, R2, R1
`
So an R expression like:
`
R6 <- R3 - R2 - R1
`
... would have to somehow be broken down into a series of binary operations for it to be evaluated.
It turns out, as luck would have it, that a quosure will do at least part of that breakdown for you.
A quosure can be broken up into a head and a tail. The head contains an operation or a function and the tail is a list of arguments for that operation or function. Let's use some of the functions from `rlang` see how the `quosure` for `a-b-c-d` is broken down.
<div style=display:none>
```{r clr6}
rm(list = ls())
```
</div>
```{r breakdown1}
# here is that function that makes a closure again
f1 <- function(expr) {
rlang::enquo(expr)
}
q1 <- f1(a-b-c-d)
# get the operation/function of q1
rlang::lang_head(q1)
# get the arguments of q1
rlang::lang_tail(q1)
```
From the head you can see that it is a `-` operation being used. A `-` operation should have exactly 2 arguments.
And you can see that the tail has exactly 2 argruments in it. Not only does it have exactly two arguments it looks like quosures have pretty good idea of how breakdown an expression!
The first argument is a pruned off version of the `a-b-c-d` and the second is what was pruned off of it.
The second argument `d` is just a solo symbol. We can evaluate that like this:
<div style=display:none>
```{r clr7}
rm(list = ls())
```
</div>
```{r breakdown8}
a <- 10
b <- 5
c <- 3
d <- 2
f1 <- function(expr) {
rlang::enquo(expr)
}
q1 <- f1(a-b-c-d)
# arg2 is d
arg2 <- rlang::lang_tail(q1)[[2]]
arg2
# which we can evaluate
rlang::eval_tidy(arg2)
```
... and we get a value of 2 for `d`, which is in fact the value that was assigned to `d`. We did this by using `eval_tidy` on the second entry in the tail...remember the tail is a list.
We could evaluate the first argument if we want to and it would compute `a-b-c`, but for our `make_part_number` function we need the individual values for `a`, `b`, and 'c`. Let's see how to do that.
What we want to do is turn the first argument in the tail of the q1 quosure into a quosure itself so we can use the tail of that quosure to prune off the `c`. To do that we need to interpret that first argument. That's what the `rlang` `expr_interp` function does. Like this:
<div style=display:none>
```{r clr8}
rm(list = ls())
```
</div>
```{r breakdown9}
a <- 10
b <- 5
c <- 3
d <- 2
f1 <- function(expr) {
rlang::enquo(expr)
}
q1 <- f1(a-b-c-d)
arg1 <- rlang::lang_tail(q1)[[1]]
# this is the first argument, that is the first element in the tail
arg1
# now make a quosure out of it
q2 <- rlang::expr_interp(arg1)
# now get the second argument of the tail of the q2 quosure
arg2 <- rlang::lang_tail(q2)[[2]]
# this will show that arg2 is c
arg2
# and evaluate it
rlang::eval_tidy(arg2)
```
And we can see now that `expr_interp` was able to make a quosure out of the first argument of `q1`'s tail. Then we were able to prune off the 'c' from that expression and evaluate it. The result of the evaluation was what we expected, 3, the value that was assigned to c.
With a little recursion we can prune off all of the part number attributes individually and evaluate them. Will see that shortly when we make the `make_part_number` function.
Now we know just enough about quosures to make the `make_part_number` function. It won't be a full fledged production version, just to keep the example simple, but it will work.
##`make_part_number`
Below is the make_part_number function.
The one "trick" that `make_part_number` is using that we haven't looked at yet is that only language objects have a head and tail. The only thing you can do with `quosures` that are not language objects is to evaluate them. The `rlang` function `is_lang` will tell you if a quosure is a language object or not.
Comments are scattered thoughout... exercise for the student to understand how `make_part_number` works. :-)
```{r mpn1}
make_part_number <- function(expr) {
# used for recursion
eval_attrs <- function(q, part_attributes = vector(mode = "character")) {
if(!rlang::is_lang(q)) {
# q is not a language object so just evaluate it
# and add it to the list of part attributes
part_attributes <- c(as.character(rlang::eval_tidy(q)), part_attributes)
# and finish the recursion
return(part_attributes)
}
# if we got to here q is a language object so it has a tail
tail <- rlang::lang_tail(q)
# add the second entry in the tail to the part attributes
part_attributes <-
c(as.character(rlang::eval_tidy(rlang::expr_interp(tail[[2]]))),
part_attributes)
# recurse to find the next part number attribute
eval_attrs(rlang::expr_interp(tail[[1]]), part_attributes)
}
# make a quosure out of expr
q <- rlang::enquo(expr)
# recurse to find all the part number attributes
atrs <- eval_attrs(q)
# concatonate all the part number attributes with "-"
stringr::str_c(atrs, collapse="-")
}
```
Now let's try it out.
```{r mpn2}
a <- 10
b <- 9
c <- 4
d <- 7
make_part_number(a-b-c-d)
```
Well that works as expected... Yippee!!!
Actually it works a bit better than you might think. Here we set an attribute to a string value and another to a computation:
```{r mpn3}
a <- "xxZ23"
b <- 9
c <- b + d
d <- 7
make_part_number(a-b-c-d)
```
... and that works too.
And we can even do some computations in line in the part number itself:
```{r mpn4}
a <- 10
b <- 9
c <- 4
d <- 7
make_part_number(a-b-c-b*c)
```
It may seem to be a surprise that these last couple of examples worked. But it really isn't because we are using the same technique that R uses when is does a standard evaluation of an expression. It walks through the expression using the head and tail of each quosure it finds.
At this point we have seen the very basics of how you can use quosures. There is actually a lot more to look at. One of the things this tutorial glossed over was the role of environments in the evaluation of quosures. You might not be familiar with environments but they are really important for quosures, but that's for later tutorial.
There are other ways to make `quosures` and use them... we've only look at a couple of the functions that R has for working with quosures. `Quosures` can handle much more complicated expressions than the part number we've started with here.
`Quosures` can also be used to analyze an expressions. In fact `make_part_number` doesn't do a very good job of this as is. If you experiment with it a little bit you will see that it's easy to give it a part number that it will misinterpret. But all that is for a later tutorials too.
Hope you found this tutorial informative and engaging,
Dan