-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathprogramming.Rmd
976 lines (707 loc) · 31.9 KB
/
programming.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
---
title: "Programming with dplyr"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Programming with dplyr}
%\VignetteEngine{knitr::rmarkdown}
%\usepackage[utf8]{inputenc}
---
```{r setup, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library("dplyr")
library("purrr")
library("rlang")
set.seed(1014)
```
<!-- to hide comments change "contents" to "none" -->
<style>
.r-comment {
color: blue;
display:contents;
background-color:#FCF3CF;
}
</style>
<div class=r-comment>
<p>Comments are in a blue font and in the .rmd file are put inside of div tags like so:
<pre>
<div class=r-comment>
a comment
</div>
</pre>
<p> To turn off the comments just look for the <style> tag at beginning
of .rmd file and change display:contents; to display:none;
<p> To find comments in .rmd file search for <div class=r-comment>
<p> Note everything in here is in IMHO, of course :-)
</div>
<div class=r-comment>
<h2>General Comment so far</h2>
<p>So far I think this paper will be a lot of help to R users that are pretty good at R
already and have some background in programming.
<p>But I think it leaves many R users who want to start using
tidy eval with a lot of questions because many of the points covered imply more R
and programming knowledge than typical R users have... even those making packages.
<p>I comment as I read though the paper... and when I comment I just write down
whatever comes into my head...
because I think a reader may have similar questions pop into their head.
<p>Everything in here is just IMHO, of course :-)
</div>
<div class=r-comment>
<p>General comment
<p>The thing I have found about writing papers/courses is that it is hard to put yourself in a frame of mind of the reader who knows much less about the topic than you.
<p>So with that in mind this is what I try to do, besides checking for technical correctness.
<ul>
<li>point out forward references that will be cleared up later</li>
<li>no undefined terms, even if there is only a quick phrase that points out the details are not necessary. A sees an undefined term will sometimes think they missed something and try to "look it up" and be frustrated when the can't find it. The should be able to get all the basics you are trying to cover from just this paper.</li>
</ul>
<p>"quoting function" is not a standard R term, is it? If a reader searched for R quoting functions they would find no details about them. You should make that clear to reader. It would be better to use a standard R term for this.
<p>Isn't a dplyr verb the same as a dplyr function? For the reader to understand what you are talking about they would have to understand grammars and have read the doc's about how dplyr is a grammer for data access/manipulation. That's a lot to ask and it really isn't a level of knowledge needed to make programs with dplyr.
<p>But since *verb* is standard R term that would be better to use that than quoting function.
<p>BTW when I see the term "quoting functions" I thing of quote(), rlang::quo() and so on. So as is I think quoting functions is a confusing description. Quoting functions quote expressions.
<p>Also I don't think the intro explains why you would want to program with dplyr.
</div>
Programming with dplyr is about making functions that are easier to use. dplyr::select is an example of a kind of function you might want to make because the syntax of calling it is just easier to digest and understand, once you get used to it.
```{r select}
t <- tibble::tribble(
~a, ~b,
1, 2
)
# just literal table and column names are needed
select(t, size = a)
```
You can pick off and even change the names of columns from a data.frame without having to use filters or even quotation marks.
But programming with dplyr requires a bit of special knowledge because most dplyr
functions are not like the *regular* functions you are used to writing. In fact to distinguish them from regular functions we call them verbs.
The thing that distinguishes verbs from regular functions is in the way verbs treat their arguments. Unforunately R documentation doesn't tell you if a function is a verb or regular function. But that distinction doesn't matter except when you are using them to do dplyr programming.
<div class=r-comment>
<p>Isn't "tidy eval" the term typically used to describe how evaluation of quotes is done?.
<p>Changed "tidy evaluation" to "tidy eval"
</div>
In this vignette you will learn about verbs, what challenges they pose for
programming, and how **tidy eval** solves those problems.
## Introduction
<div class=r-comment>
quoting function changed to verb
</div>
### Regular functions versus verbs
<div class=r-comment>
<p>quoting functions changed to verbs
<p>You don't specify what you mean by a value. There are lots of abstractions and
in tidy eval so it pays to me specific.
<p>Changed language so catagory is not used twice in sentence.
<p>I don't think the examples in this section are apropos to the topic. The core
of the topic is how regular functions see their arguments vs. how verbs do. The
examples show how to quote expressions work not the differences between regular
functions and verbs.
<p>I understand using `quote()` to introduce quoting functions because it is the
least complicted of the quoting functions. However it is not really useful for
quoting arguments and neither is `enquote()` which is the focus of the differences
between regular functions and verbs.
<p>And as an aside will quote and enquote really be that useful when programming dplyr?
<p>I think your plan is to show how quote works than compare and contrast it to quo and enquo
latter in the paper. (I'm making comments as I read the paper... I think that does a better
job of reflecting what might be in a reader mind as they read the paper). But I think it is
bad to introduce concepts/functions that in the end will not be necessary for the topic
being taught.
```{r q1}
f1 <- function(arg1) { print(quote(arg1))}
f2 <- function(arg1) { print(enquote(arg1))}
f3 <- function(arg1) { print(rlang::enquo(arg1))}
a <- 1
b <- 2
f1(a + b)
f2(a + b)
f3(a + b)
```
<p>Only f3 produces an unevaluted quote of what is passed into arg1.
<p>So I think this introduction has to introduce rlang::enquo because that is was is
going to be at the heart of using quotes.
<p>I redid the section to reflect this. Notice there is no need to go into any details about
`enquo` just so the results it produces.
I'm glad I went through this... before I did I had missed some of the important details of quote, enquote and enquo.
</div>
R functions can be broadly categorised as regular functions or
verbs. They differ in the way they see their arguments.
<div class=r-comment>
<p>"see" better description than "treat"
</div>
Regular functions only see the value produced after R uses
standard evaluation to compute it's result. So, for example this function:
`sum(a * 2, a + 8)`
only sees the result of computing `a * 2` and `a + 8`, which is number. The function
has no idea of what kind of expression was passed as an argument.
<div class-r-comment>
good to include link to terms that the reader might want to look up... it just makes
things for convienent for the reader.
</div>
We can see that in the following examples. `identical()` is an R function, not a verb, from the base package. It tests to see if two expressions are the same by value. Because `identical` is a regular function it uses standard evaluation of it's arguments before it compares them.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/identical.html
```{r identical1}
a <- 2
b <- 3
identical(c(a,3), c(2, b))
```
In this example R's standard evaluation is done on the two arguments before they are compared. But after evaluation both arguments are the same, `c(2,3)`, so `identical` returns TRUE.
It would be interesting to see a function similar to `identitical` that is a verb. No such function exists so we will have to make one. In order to do this we will need a function from the rlang package named `enquo`. http://rlang.tidyverse.org/reference/quosure.html
`enquo` is one of a number of quoting functions provided by R. It comes from the
rlang package and it is used specifically to quote an argument of a function. Quoting
in programming means taking an expression and turning into an object this is similar to a string.
This is our identicalv verb...meaning is works like `identical` but is does not let R apply it's
standard evaluation to an argument
```{r identicalv}
identicalv <- function(arg1, arg2) {
identical(rlang::enquo(arg1), rlang::enquo(arg2))
}
a <- 2
b <- 3
identicalv(c(a,3), c(2,b))
```
Putting the same arguments into `identicalv` as we did `identical` returns a false because it sees
arg1 and arg2 as being different `strings`. We can change identical` a little bit to see why this is.
```{r identicalv2}
identicalv <- function(arg1, arg2) {
# print out what rlang::enquo produces as a string
print(as.character(rlang::enquo(arg1)))
print(as.character(rlang::enquo(arg2)))
identical(rlang::enquo(arg1), rlang::enquo(arg2))
}
a <- 2
b <- 3
identicalv(c(a,3), c(2,b))
```
As you an see the object produced by `rlang::enquo` appears to be a character
vector. However this object is much more than a character vector and we'll see
that later in this paper.
[^1]: The difference between behavior of `identical` and `identicalv` is why regular functions are said to use standard evaluation unlike
verbs which use non-standard evaluation (NSE).
<div class=r-comment>
Commenting done to here
</div>
### Changing the context of evaluation
A quoted expression can be **evaluated** using the function `eval()`. Let's
quote an expression that represents the subset of lowercase letters from 1 to 5,
and evaluate this:
```{r}
x <- quote(letters[1:5])
x
eval(x)
```
Of course this is not very impressive, you could just type the expression
normally to get this value. But one of R's most important feature is that you
can change the context of evaluation to obtain different results. A context,
also called **environment**, is basically a set that links symbols to
values. The namespaces of packages are such context. For instance, in the
context of the base namespace, the symbol `letters` is given the value of a
character vector of lowercase letters. However it could mean something different
in another context. We could create a context where `letters` represent the
uppercase letters in reverse order! Evaluating a quoted expression in such a
context could return a completely different result:
```{r}
context <- list(letters = rev(LETTERS))
x
eval(x, context)
```
Interestingly, data frames can be used as evaluation contexts. In a data frame
context, the column names represent vectors so that you can refer to those
columns in an expression:
```{r}
data1 <- tibble(mass = c(70, 80, 90), height = 1.6, 1.7, 1.8)
data2 <- tibble(mass = c(75, 85, 95), height = 1.5, 1.7, 1.9)
bmi_expr <- quote(mass / height^2)
eval(bmi_expr, data1)
eval(bmi_expr, data2)
```
In the last snippet we are creating an expression with `quote()` and we evaluate
it manually with `eval()`. However quoting functions typically perform the
quoting and the evaluation for you behind the scene:
```{r}
with(data1, mass / height^2)
with(data2, mass / height^2)
```
For this reason quoting functions usually take a data frame as input in addition
to user expressions so they can be evaluated in the context of the data. This is
a powerful feature that gives R its identity as a data-oriented programming
language. Quoting functions are everywhere in R:
* `with(data, expr)` evaluates `expr` in the context of `data`.
* `lm(formula, data)` creates a design matrix with predictors evaluated in the
context of `data`.
* `mutate(data, new = expr)` creates a `new` column from an expression evaluated
in the context of `data`.
* `ggplot(data, aes(expr))` defines the `x` aesthetic as the value of `expr`
evaluated in the context of `data`.
In the context of the dplyr interface, quoting the arguments has two benefits:
* Operations on data frames can be expressed succinctly because
you don't need to repeat the name of the data frame. For example,
you can write `filter(df, x == 1, y == 2, z == 3)` instead of
`df[df$x == 1 & df$y ==2 & df$z == 3, ]`.
* dplyr can choose to compute results in a different way to base R.
This is important for database backends because dplyr itself doesn't
do any work, but instead generates the SQL that tells the database
what to do.
Unfortunately the benefits of quoting functions do not come for free. While they
simplify direct inputs, they make it harder to program the inputs. Quoting works
for you when *you* use dplyr but works against you when *your functions* use
dplyr.
### Varying quoted inputs
The issue of referential transparency to do with the difficulty of passing
contextual variables in order to vary the inputs of quoting functions. When you
pass variables to quoting functions they get quoted along with the rest of the
expression.
To see the problem more clearly, let's define a simple quoting function [^2]
that pastes its inputs as a string:
```{r}
cement <- function(..., .sep = " ") {
strings <- map(exprs(...), as_string)
paste(strings, collapse = .sep)
}
```
Compared to the regular function `paste()`, the quoting function `cement()`
saves a bit of typing because it performs the string-quoting automatically:
```{r}
paste("it", "is", "rainy")
cement(it, is, rainy)
```
Now what if we wanted to store the weather adjective in a variable? `paste()`
has no issue on that front because it gets the value of the argument rather than
its expression. On the other hand if we pass a variable to `cement()`, it would
be quoted just like the other inputs and `cement()` would never get to see its
contents:
```{r}
x <- "shiny"
paste("it", "is", x)
cement(it, is, x)
```
The solution to this problem is a special syntax that signals the quoting
function that part of the argument is to be unquoted, i.e., evaluated right
away. The ability to mix quoting and evaluation is called **quasiquotation** and
is the main tidy eval feature.
[^2]: As we will see later on `exprs()` captures the expressions of its inputs.
Passing `...` to `exprs()` returns a list of quoted arguments forwarded
through dots.
## Quasiquotation
> Put simply, quasi-quotation enables one to introduce symbols that stand for
> a linguistic expression in a given instance and are used as that linguistic
> expression in a different instance.
--- [Willard van Orman Quine](https://en.wikipedia.org/wiki/Quasi-quotation)
As we have seen, automatic quoting makes R and dplyr very convenient for
interactive use but makes it difficult to refer to variable inputs. The solution
to this problem is __quasiquotation__, which allows you to evaluate directly
inside an expression that is otherwise quoted. Quasiquotation was coined by
Willard van Orman Quine in the 1940s, and was adopted for programming by the
LISP community in the 1970s. Quasiquotation is available (or will soon be) in
all quoting functions of the tidyverse thanks to the tidy evaluation framework.
### The bang! bang! operator
The tidy eval syntax for unquoting is `!!`. Anything supplied to to this
operator is evaluated right away and the result is substituted in place. Let's
see `!!` in action in our `cement()` function:
```{r}
x <- "shiny"
cement(it, is, !! x)
```
Even though the arguments are quoted, `!! x` signals that `x` should be
evaluated right away. From `cement()` perspective, it's as if the user had typed
`"shiny"` instead of `!! x`.
We have seen above that the fundamental quoting function in base R is `quote()`.
In the tidyverse, it is `expr()`. All it does is to quote its argument with
quasiquotation support and returns it right away:
```{r}
expr(x)
expr(!! x)
```
`expr()` is especially useful for debugging quasiquotation. You can wrap it
around any expression in which you use `!!` to examine the effect of unquoting.
Let's try it with `cement()`:
```{r}
expr(cement(it, is, !! x))
```
This technique is essential to work your way around to mastering tidy eval.
### Creating symbols
Now that we are armed with quasiquotation, let's try to program with the dplyr
verb `mutate()`. We'll take a BMI computation as running example.
```{r}
# Rescale height
starwars <- mutate(starwars, height = height / 100)
transmute(starwars, bmi = mass / height^2)
```
Let's say we want to vary the height input. A first intuition might be to store
the column name in a variable and unquote it. But we get an error:
```{r, error = TRUE}
x <- "height"
transmute(starwars, bmi = mass / (!! x)^2)
```
The error message indicates a type error. A binary operator expected a numeric
input but got something else. The error becomes clear if we use `expr()` to
debug the unquoting:
```{r}
expr(transmute(starwars, bmi = mass / (!! x)^2))
```
We are unquoting a string and that's exactly what `transmute()` uses to evaluate
the BMI. This can't work! We need to unquote something that looks like code
instead of a string. What we are looking for is a **symbol**. A symbol is a
string that references an object in a context. Symbols are the meat of R
code. In `foo(bar)`, `foo` is a symbol that references a function and `bar` is a
symbol that references some object.
There are two ways of creating symbolic R code objects: by quotation or by
construction. We already know how to create symbols by quoting. However that
does not help us much because we face the same issue again, namely that the
quoted symbol is a constant that can't be varied:
```{r}
quote(height)
expr(height)
```
The other way is to build it out of a string using the constructor `sym()`.
Constructors are regular functions and can be programmed with variables:
```{r}
sym("height")
x <- "height"
sym(x)
```
Let's build a symbol and try to unquote it in the transmute expression. Using
`expr()` to examine the effect of unquoting, things are looking good:
```{r}
x <- sym("height")
expr(transmute(starwars, bmi = mass / (!! x)^2))
```
And indeed it now works!
```{r}
transmute(starwars, bmi = mass / (!! x)^2)
```
## Creating a wrapper around a dplyr pipeline
Quasiquotation is all we need to write our first wrapper function around a dplyr
pipeline. The goal is to write reliable functions that reduce duplication in
our data analysis code. Let's say that we often take a grouped average using
dplyr and our scripts are littered with little pipelines that look like this:
```{r}
starwars %>%
group_by(species) %>%
summarise(avg = mean(height))
```
It would be a good idea to extract this logic into a function. It would reduce
the risk of writing a typo and would make our code more concise as well as
clearer if we choose a good name for this function.
We know from the previous sections that this kind of naive wrapper will not
work because the variable names will be automatically quoted:
```{r, error = TRUE}
mean_by <- function(data, var, group) {
data %>%
group_by(group) %>%
summarise(avg = mean(var))
}
mean_by(starwars, "species", "height")
```
* In the best case the column names they contain will be ignored. For instance `group_by()`
looks for a column named `group` and doesn't see the string `"species"`.
* In the worst case they will be misused. For instance `summarise()` would try
to take the average of the string `"height"`.
To avoid this, our wrapper simply needs to construct symbols from its inputs and
unquote them in the pipeline:
```{r}
mean_by <- function(data, var, group) {
var <- sym(var)
group <- sym(group)
data %>%
group_by(!! group) %>%
summarise(avg = mean(!! var))
}
mean_by(starwars, "height", "species")
mean_by(starwars, "mass", "eye_color")
```
### Creating your own quoting functions
The wrapper that we just created is a regular function that takes strings and
doesn't quote any of its inputs. This has the advantage that it is easy to
program with but the inconvenient that it doesn't integrate well with the rest
of the tidyverse verbs. Fortunately it is easy to transform the wrapper into a
quoting function.
First we need to choose which of our wrapper arguments should be quoted. Given
the friction that quotation causes for programming, it is best to only quote
arguments when absolutely necessary, i.e. when it makes sense to refer to data
frame columns. In dplyr, the argument that takes a data frame (which is always
the first argument in order to be compatible with pipes) is never quoted. We'll
apply the same logic to our wrapper and only quote the `group` and `var`
arguments.
Tidy eval provides two functions to quote an argument supplied by the caller of
a function. Both of those enable quasiquotation:
* `enexpr()` which returns a raw expression.
* `enquo()` which returns an expression wrapped in a **quosure**.
Let's first try `enexpr()` in a simple function that does nothing but capture
its argument and return it right away:
```{r}
quoting <- function(x) enexpr(x)
x <- sym("foo")
quoting(x)
quoting(!! x)
```
We have in fact just reinvented `expr()`! Indeed `expr()` is a simple wrapper
around `enexpr()`:
<div class=r-comment>
<p> dplyr::expr was caused error because expr is in base, not dplyr
<p>removed dplyr from dplyr::expr
</div>
```{r}
expr
```
In the same vein, `quo()` is a wrapper around `enquo()`. All it does is to
capture the expression of its argument, store it in a quosure, and return it as
is:
```{r}
dplyr::quo
quo(x)
quo(!! x)
```
A quosure is like a raw expression except that it is evaluated in the original
context of its capture. It combines an expression (a quote) and a context (an
enclosure) in a single object. We'll see below why it is important to keep track
of the original context of arguments. For now, let's just use it in our pipeline
wrapper to transform it into a quoting function. As a reminder here is the
current definition of our function:
```{r}
mean_by <- function(data, var, group) {
var <- sym(var)
group <- sym(group)
data %>%
group_by(!! group) %>%
summarise(avg = mean(!! var))
}
```
All we need to do is to replace the `sym()` constructor by `enquo()`:
```{r}
mean_by <- function(data, var, group) {
var <- enquo(var)
group <- enquo(group)
data %>%
group_by(!! group) %>%
summarise(avg = mean(!! var))
}
```
The wrapper now automatically quotes its arguments. This has several
implications:
* First the user no longer has to supply quoted strings:
```{r}
mean_by(starwars, height, species)
```
* Secondly, while `sym()` assumed that the supplied arguments were symbols,
`enquo()` captures arbitrary expressions. This is a good fit for our wrapper
because both `group_by()` and `summarise()` accept complex expressions:
```{r}
mean_by(starwars, height * 100, as.factor(species))
```
* Since our function now quotes its arguments, it is no longer programmable in
the usual way. If another function passes variables to `mean_by()`, it needs
to use quasiquotation itself. A typical composition of quoting functions
thus looks like a chain of quoted and unquoted arguments:
```{r}
mean_by_species <- function(data, var) {
var <- enquo(var)
mean_by(data, !! var, species)
}
mean_by_species(starwars, height)
```
Thanks to `enquo()` we now have a wrapper function that quotes its inputs and
interacts with dplyr verbs via quasiquotation. It is getting pretty close to a
real tidyverse-like user interface! However we could still improve a few things,
like the automatic labelling of column names which could be better. It would
also be nice if the wrapper could accept a variable number of arguments like
other dplyr or tidyr verbs. We'll address the latter issue first.
### Accepting multiple arguments
Whether our wrapper should take multiple grouping variables or multiple
variables to average is a design decision that could go either way depending on
your needs. In this tutorial we'll allow multiple grouping variables.
It is relatively easy to write R functions that accept an unspecified number of
arguments. The function just takes `...` as argument. In the body of the
function `...` are then forwarded to another variadic function that is in charge
of materialising the arguments. The end point is typically the list function:
```{r}
variadic <- function(...) list(...)
variadic("foo", "bar")
```
Passing on arguments through dots to quoting functions is very easy. Unlike
named arguments which need to be repeatedly quoted and unquoted, the `...`
object can just be passed along:
```{r}
mean_by <- function(data, var, ...) {
var <- enquo(var)
data %>%
group_by(...) %>%
summarise(avg = mean(!! var))
}
```
Your users can now create grouped averages for any combination of groups!
```{r}
mean_by(starwars, height, species, eye_color)
```
You can learn about more advanced ways of dealing with multiple arguments with
`exprs()`, `quos()` and `syms()` in the section on variadic quasiquotation
below.
### Labelling inputs
dplyr functions try their best to provide useful column names for new columns.
This is an area where our wrapper could use some improvement:
```{r}
names(mean_by(starwars, height, as.factor(species)))
```
First note that the issue is in fact already solved for the grouping
variables. That's a benefit from taking arguments with `...`, they accept
optional names:
```{r}
mean_by(starwars, height, species_fct = as.factor(species))
```
However for named arguments we need to do a bit more work. We'll make use of two
tidy eval features:
* `quo_name()` which is a helper that transforms an arbitrary expression
(including quosures) to a name that is suitable for data frames:
```{r}
wrapper <- function(x) {
x <- enquo(x)
quo_name(x)
}
wrapper(foo)
wrapper(foo(bar, baz()))
```
* The `:=` operator. It makes it possible to unquote on the left-hand side of
an argument. Since the LHS of `=` is automatically quoted, it makes sense to
have quasiquotation for argument names:
```{r}
x <- "Column Name"
summarise(starwars, !! x := n())
```
We can give a nice default name to the column of averages by transforming the
captured variable to a name and pasting a prefix at its front:
```{r}
mean_by <- function(data, var, ...) {
var <- enquo(var)
name <- quo_name(var)
name <- paste0("avg_", name)
data %>%
group_by(...) %>%
summarise(!! name := mean(!! var))
}
```
We get a good name that reflects the user input, even when the argument is a
complex expression:
```{r}
mean_by(starwars, height, species)
mean_by(starwars, identity(height), species)
```
Overall the most flexible interface is `...` since they let the user specify
custom names. But what if we want to add a prefix to the grouping variables as
well? Then we can't just pass the `...` variable down to `group_by()`, we have
to capture all the variables in the dots and modify their names before passing
them on. This calls for more advanced means of working with multiple arguments.
### Capturing and modifying arguments in `...`
Up until now, we have captured *named arguments* with `enquo()`, we have
forwarded variadic arguments by passing `...` to tidy eval functions, but we
have yet to actually capture those arguments contained in `...`. Getting a hold
on the expressions supplied as `...` arguments is necessary in order to make
modifications such as changing the argument names.
As we have seen arguments transiting through dots need to be materialised with
endpoint functions such as `c()` or `list()`. Tidy eval provides two variadic
endpoints for dots: `exprs()` and `quos()`. These functions quote all of their
inputs and return them in a list of expressions or quosures:
```{r}
exprs(foo, bar)
quos(baz, bam)
```
Thanks to the magic of `...` forwarding, `exprs()` and `quos()` will capture all
arguments passed through dots:
```{r}
quoting <- function(...) {
exprs(foo, ...)
}
quoting(bar(baz))
```
We'll first experiment with a simple `group_by()` wrapper before applying our
new knowledge to the `mean_by()` wrapper. This wrapper will prefix all grouping
variables with `grp_`. To achieve this there are two problems to solve:
modifying the names, and forwarding the list of captured arguments to
`group_by()` once we are done changing the names. Let's start this function with
a bare skeleton. It will take a data frame, a prefix for the group names, and an
undefined number of grouping arguments:
```{r}
prefixed_group_by <- function(data, prefix, ...) {
groups <- quos(...)
groups
}
groups <- prefixed_group_by(starwars, "grp_", as.factor(species), color = eye_color)
groups
names(groups)
```
We have supplied two arguments as grouping variable. The first is an unnamed
complex expression, the second is a named symbol. The first thing to do is to
give a default name to arguments. One way to obtain default names would be to
map `quo_name()` over the relevant elements but there is an easier way. `quos()`
will do it for you if you switch on the `.named` argument:
```{r}
quos <- quos(foo(bar), baz = foo(), .named = TRUE)
names(quos)
```
We are now in a good position for adding a prefix to the names of captured
arguments:
```{r}
prefixed_group_by <- function(data, prefix, ...) {
groups <- quos(..., .named = TRUE)
names(groups) <- paste0(prefix, names(groups))
groups
}
groups <- prefixed_group_by(starwars, "grp_", as.factor(species), color = eye_color)
names(groups)
```
Alright! We only have one last problem to solve. We need a way to forward this
list of arguments to `group_by()`. Unquoting the list with `!!` is not helpful
here because `group_by()` expects separate arguments and wouldn't know what to
do with a whole list. This leads us to `!!!`, one of the most handy features of
tidy eval.
### Unquote-splicing arguments with `!!!`
The __unquote-splicing__ operator `!!!` is a variant of simple unquoting. Just
like `!!`, it evaluates its right-hand side right away. The difference is in the
way it substitutes the result in the surrounding call:
* `!!` substitutes in place:
```{r}
expr(call(!! 1:5))
```
* `!!!` takes a vector and substitutes all its elements in the call:
```{r}
expr(call(!!! 1:5))
```
This is exactly what we need to forward a list of captured arguments to
`group_by()`!
```{r}
prefixed_group_by <- function(data, prefix, ...) {
groups <- quos(..., .named = TRUE)
names(groups) <- paste0(prefix, names(groups))
group_by(data, !!! groups)
}
prefixed_group_by(starwars, "grp_", as.factor(species), color = eye_color)
```
Modifying `mean_by()` to automatically prefix the grouping factors is now
child's play:
```{r}
mean_by <- function(data, var, ...) {
var <- enquo(var)
name <- quo_name(var)
name <- paste0("avg_", name)
data %>%
prefixed_group_by("grp_", ...) %>%
summarise(!! name := mean(!! var))
}
mean_by(starwars, height, species, eye = eye_color)
```
### Wrapping it up
In order to write our little wrapper, we have learned to:
* Quote R code with `quote()` and `expr()` and construct symbols with `sym()`.
* Capture named arguments with `enquo()` and `...` arguments with `quos()`.
* Unquote single arguments with `!!` and multiple arguments with `!!!`.
* Use `:=` to enable `!!` on the left-hand side of a named argument.
* Debug the unquoting by wrapping `expr()` around an expression.
* Use `quo_name()` and `quos(.named = TRUE)` to provide default names to
captured arguments.
This set of techniques will get you a long way as quasiquotation is really the
meat of programming with tidy eval. `enquo()` and `quos()` return quosures that
are more reliable than bare expressions but you don't have to understand how
quosures work or why they are needed to effectively use tidy eval.
When you feel ready, you can learn about the concept of quosures. It will
improve your understanding of R programming and you will gain knowledge that can
be applied to R functions. Quosures and closures (the technical name of R
functions) have a lot in common!
## Where do quoting verbs find things?
### Contexts and hierarchical ambiguity
### Solving ambiguity with quasiquotation and the `.data` pronoun
### Raw expressions versus contextual expressions (quosures)