forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
OO-essentials.rmd
681 lines (481 loc) · 34.8 KB
/
OO-essentials.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
---
title: OO field guide
layout: default
output: bookdown::html_chapter
---
# OO field guide {#oo}
This chapter is a field guide for recognising and working with R's objects in the wild. R has three object oriented systems (plus the base types), so it can be a bit intimidating. The goal of this guide is not to make you an expert in all four systems, but to help you identify which system you're working with and to help you use it effectively. \index{object-oriented programming}
Central to any object-oriented system are the concepts of class and method. A __class__ defines the behaviour of __objects__ by describing their attributes and their relationship to other classes. The class is also used when selecting __methods__, functions that behave differently depending on the class of their input. Classes are usually organised in a hierarchy: if a method does not exist for a child, then the parent's method is used instead; the child __inherits__ behaviour from the parent.
R's three OO systems differ in how classes and methods are defined:
* __S3__ implements a style of OO programming called generic-function OO.
This is different from most programming languages, like Java, C++, and C#,
which implement message-passing OO. With message-passing, messages (methods)
are sent to objects and the object determines which function to call.
Typically, this object has a special appearance in the method call, usually
appearing before the name of the method/message: e.g.,
`canvas.drawRect("blue")`. S3 is different. While computations are still
carried out via methods, a special type of function called a
__generic function__ decides which method to call, e.g.,
`drawRect(canvas, "blue")`. S3 is a very casual system. It has no
formal definition of classes.
* __S4__ works similarly to S3, but is more formal. There are two major
differences to S3. S4 has formal class definitions, which describe the
representation and inheritance for each class, and has special helper
functions for defining generics and methods. S4 also has multiple dispatch,
which means that generic functions can pick methods based on the class of
any number of arguments, not just one.
* __Reference classes__, called RC for short, are quite different from S3
and S4. RC implements message-passing OO, so methods belong to classes,
not functions. `$` is used to separate objects and methods, so method calls
look like `canvas$drawRect("blue")`. RC objects are also mutable: they don't
use R's usual copy-on-modify semantics, but are modified in place. This
makes them harder to reason about, but allows them to solve problems that
are difficult to solve with S3 or S4.
There's also one other system that's not quite OO, but it's important to mention here:
* __base types__, the internal C-level types that underlie the other OO
systems. Base types are mostly manipulated using C code, but they're
important to know about because they provide the building blocks for the
other OO systems.
The following sections describe each system in turn, starting with base types. You'll learn how to recognise the OO system that an object belongs to, how method dispatch works, and how to create new objects, classes, generics, and methods for that system. The chapter concludes with a few remarks on when to use each system.
##### Prerequisites
You'll need the pryr package, `install.packages("pryr")`, to access useful functions for examining OO properties.
##### Quiz
Think you know this material already? If you can answer the following questions correctly, you can safely skip this chapter. Find the answers at the end of the chapter in [answers](#oo-answers).
1. How do you tell what OO system (base, S3, S4, or RC) an object is
associated with?
1. How do you determine the base type (like integer or list) of an object?
1. What is a generic function?
1. What are the main differences between S3 and S4? What are the main
differences between S4 & RC?
##### Outline
* [Base types](#base-types) teaches you about R's base object system. Only
R-core can add new classes to this system, but it's important to know about
because it underpins the three other systems.
* [S3](#s3) shows you the basics of the S3 object system. It's the simplest
and most commonly used OO system.
* [S4](#s4) discusses the more formal and rigorous S4 system.
* [RC](#rc) teaches you about R's newest OO system: reference classes, or
RC for short.
* [Picking a system](#picking-a-system) advises on which OO system to
use if you're starting a new project.
## Base types {#base-types}
Underlying every R object is a C structure (or struct) that describes how that object is stored in memory. The struct includes the contents of the object, the information needed for memory management, and, most importantly for this section, a __type__. This is the __base type__ of an R object. Base types are not really an object system because only the R core team can create new types. As a result, new base types are added very rarely: the most recent change, in 2011, added two exotic types that you never see in R, but are useful for diagnosing memory problems (`NEWSXP` and `FREESXP`). Prior to that, the last type added was a special base type for S4 objects (`S4SXP`) in 2005. \indexc{SEXP} \index{base types} \index{objects!base types}
[Data structures](#data-structures) explains the most common base types (atomic vectors and lists), but base types also encompass functions, environments, and other more exotic objects likes names, calls, and promises that you'll learn about later in the book. You can determine an object's base type with `typeof()`. Unfortunately the names of base types are not used consistently throughout R, and type and the corresponding "is" function may use different names: \indexc{typeof()}
```{r}
# The type of a function is "closure"
f <- function() {}
typeof(f)
is.function(f)
# The type of a primitive function is "builtin"
typeof(sum)
is.primitive(sum)
```
You may have heard of `mode()` and `storage.mode()`. I recommend ignoring these functions because they're just aliases of the names returned by `typeof()`, and exist solely for S compatibility. Read their source code if you want to understand exactly what they do. \indexc{mode()}
Functions that behave differently for different base types are almost always written in C, where dispatch occurs using switch statements (e.g., `switch(TYPEOF(x))`). Even if you never write C code, it's important to understand base types because everything else is built on top of them: S3 objects can be built on top of any base type, S4 objects use a special base type, and RC objects are a combination of S4 and environments (another base type). To see if an object is a pure base type, i.e., it doesn't also have S3, S4, or RC behaviour, check that `is.object(x)` returns `FALSE`.
## S3 {#s3}
S3 is R's first and simplest OO system. It is the only OO system used in the base and stats packages, and it's the most commonly used system in CRAN packages. S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you can't take away any part of it and still have a useful OO system. \index{S3} \index{objects!S3|see{S3}}
### Recognising objects, generic functions, and methods
Most objects that you encounter are S3 objects. But unfortunately there's no simple way to test if an object is an S3 object in base R. The closest you can come is `is.object(x) & !isS4(x)`, i.e., it's an object, but not S4. An easier way is to use `pryr::otype()`: \indexc{otype()}
```{r, message = FALSE}
library(pryr)
df <- data.frame(x = 1:10, y = letters[1:10])
otype(df) # A data frame is an S3 class
otype(df$x) # A numeric vector isn't
otype(df$y) # A factor is
```
In S3, methods belong to functions, called __generic functions__, or generics for short. S3 methods do not belong to objects or classes. This is different from most other programming languages, but is a legitimate OO style. \index{functions!generics|see{generics}} \index{S3!generics} \index{generics!S3}
To determine if a function is an S3 generic, you can inspect its source code for a call to `UseMethod()`: that's the function that figures out the correct method to call, the process of __method dispatch__. Similar to `otype()`, pryr also provides `ftype()` which describes the object system, if any, associated with a function: \indexc{UseMethod()}
```{r}
mean
ftype(mean)
```
Some S3 generics, like `[`, `sum()`, and `cbind()`, don't call `UseMethod()` because they are implemented in C. Instead, they call the C functions `DispatchGroup()` or `DispatchOrEval()`. Functions that do method dispatch in C code are called __internal generics__ and are documented in `?"internal generic"`. `ftype()` knows about these special cases too.
Given a class, the job of an S3 generic is to call the right S3 method. You can recognise S3 methods by their names, which look like `generic.class()`. For example, the Date method for the `mean()` generic is called `mean.Date()`, and the factor method for `print()` is called `print.factor()`. \index{methods!S3} \index{S3!methods}
This is the reason that most modern style guides discourage the use of `.` in function names: it makes them look like S3 methods. For example, is `t.test()` the `t` method for `test` objects? Similarly, the use of `.` in class names can also be confusing: is `print.data.frame()` the `print()` method for `data.frames`, or the `print.data()` method for `frames`? `pryr::ftype()` knows about these exceptions, so you can use it to figure out if a function is an S3 method or generic:
```{r}
ftype(t.data.frame) # data frame method for t()
ftype(t.test) # generic function for t tests
```
You can see all the methods that belong to a generic with `methods()`:
```{r}
methods("mean")
methods("t.test")
```
(Apart from methods defined in the base package, most S3 methods will not be visible: use `getS3method()` to read their source code.)
You can also list all generics that have a method for a given class:
```{r}
methods(class = "ts")
```
There's no way to list all S3 classes, as you'll learn in the following section.
### Defining classes and creating objects
S3 is a simple and ad hoc system; it has no formal definition of a class. To make an object an instance of a class, you just take an existing base object and set the class attribute. You can do that during creation with `structure()`, or after the fact with `class<-()`: \index{S3!classes} \index{classes!S3}
```{r}
# Create and assign class in one step
foo <- structure(list(), class = "foo")
# Create, then set class
foo <- list()
class(foo) <- "foo"
```
S3 objects are usually built on top of lists, or atomic vectors with attributes. (You can refresh your memory of attributes with [attributes](#attributes).) You can also turn functions into S3 objects. Other base types are either rarely seen in R, or have unusual semantics that don't work well with attributes.
You can determine the class of any object using `class(x)`, and see if an object inherits from a specific class using `inherits(x, "classname")`. \index{attributes!class}
```{r}
class(foo)
inherits(foo, "foo")
```
The class of an S3 object can be a vector, which describes behaviour from most to least specific. For example, the class of the `glm()` object is `c("glm", "lm")` indicating that generalised linear models inherit behaviour from linear models. Class names are usually lower case, and you should avoid `.`. Otherwise, opinion is mixed whether to use underscores (`my_class`) or CamelCase (`MyClass`) for multi-word class names.
Most S3 classes provide a constructor function:
```{r}
foo <- function(x) {
if (!is.numeric(x)) stop("X must be numeric")
structure(list(x), class = "foo")
}
```
You should use it if it's available (like for `factor()` and `data.frame()`). This ensures that you're creating the class with the correct components. Constructor functions usually have the same name as the class.
Apart from developer supplied constructor functions, S3 has no checks for correctness. This means you can change the class of existing objects:
```{r}
# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
print(mod)
# Turn it into a data frame (?!)
class(mod) <- "data.frame"
# But unsurprisingly this doesn't work very well
print(mod)
# However, the data is still there
mod$coefficients
```
If you've used other OO languages, this might make you feel queasy. But surprisingly, this flexibility causes few problems: while you _can_ change the type of an object, you never should. R doesn't protect you from yourself: you can easily shoot yourself in the foot. As long as you don't aim the gun at your foot and pull the trigger, you won't have a problem.
### Creating new methods and generics
To add a new generic, create a function that calls `UseMethod()`. `UseMethod()` takes two arguments: the name of the generic function, and the argument to use for method dispatch. If you omit the second argument it will dispatch on the first argument to the function. There's no need to pass any of the arguments of the generic to `UseMethod()` and you shouldn't do so. `UseMethod()` uses black magic to find them out for itself. \indexc{UseMethod()} \index{S3!new generic}
```{r}
f <- function(x) UseMethod("f")
```
A generic isn't useful without some methods. To add a method, you just create a regular function with the correct (`generic.class`) name: \index{S3!new methods}
```{r}
f.a <- function(x) "Class a"
a <- structure(list(), class = "a")
class(a)
f(a)
```
Adding a method to an existing generic works in the same way:
```{r}
mean.a <- function(x) "a"
mean(a)
```
As you can see, there's no check to make sure that the method returns the class compatible with the generic. It's up to you to make sure that your method doesn't violate the expectations of existing code.
### Method dispatch
S3 method dispatch is relatively simple. `UseMethod()` creates a vector of function names, like `paste0("generic", ".", c(class(x), "default"))` and looks for each in turn. The "default" class makes it possible to set up a fall back method for otherwise unknown classes. \index{S3!method dispatch rules}
```{r}
f <- function(x) UseMethod("f")
f.a <- function(x) "Class a"
f.default <- function(x) "Unknown class"
f(structure(list(), class = "a"))
# No method for b class, so uses method for a class
f(structure(list(), class = c("b", "a")))
# No method for c class, so falls back to default
f(structure(list(), class = "c"))
```
Group generic methods add a little more complexity. Group generics make it possible to implement methods for multiple generics with one function. The four group generics and the functions they include are: \index{group generics} \index{S3!group generics}
* Math: `abs`, `sign`, `sqrt`, `floor`, `cos`, `sin`, `log`, `exp`, ...
* Ops: `+`, `-`, `*`, `/`, `^`, `%%`, `%/%`, `&`, `|`, `!`, `==`, `!=`, `<`, `<=`, `>=`, `>`
* Summary: `all`, `any`, `sum`, `prod`, `min`, `max`, `range`
* Complex: `Arg`, `Conj`, `Im`, `Mod`, `Re`
Group generics are a relatively advanced technique and are beyond the scope of this chapter but you can find out more about them in `?groupGeneric`. The most important thing to take away from this is to recognise that `Math`, `Ops`, `Summary`, and `Complex` aren't real functions, but instead represent groups of functions. Note that inside a group generic function a special variable `.Generic` provides the actual generic function called.
If you have complex class hierarchies it's sometimes useful to call the "parent" method. It's a little bit tricky to define exactly what that means, but it's basically the method that would have been called if the current method did not exist. Again, this is an advanced technique: you can read about it in `?NextMethod`. \indexc{NextMethod()}
Because methods are normal R functions, you can call them directly:
```{r}
c <- structure(list(), class = "c")
# Call the correct method:
f.default(c)
# Force R to call the wrong method:
f.a(c)
```
However, this is just as dangerous as changing the class of an object, so you shouldn't do it. Please don't point the loaded gun at your foot! The only reason to call the method directly is that sometimes you can get considerable performance improvements by skipping method dispatch. See [performance](#be-lazy) for details.
You can also call an S3 generic with a non-S3 object. Non-internal S3 generics will dispatch on the __implicit class__ of base types. (Internal generics don't do that for performance reasons.) The rules to determine the implicit class of a base type are somewhat complex, but are shown in the function below: \index{implicit class} \index{base types!implicit class}
```{r}
iclass <- function(x) {
if (is.object(x)) {
stop("x is not a primitive type", call. = FALSE)
}
c(
if (is.matrix(x)) "matrix",
if (is.array(x) && !is.matrix(x)) "array",
if (is.double(x)) "double",
if (is.integer(x)) "integer",
mode(x)
)
}
iclass(matrix(1:5))
iclass(array(1.5))
```
### Exercises
1. Read the source code for `t()` and `t.test()` and confirm that
`t.test()` is an S3 generic and not an S3 method. What happens if
you create an object with class `test` and call `t()` with it?
1. What classes have a method for the `Math` group generic in base R? Read
the source code. How do the methods work?
1. R has two classes for representing date time data, `POSIXct` and
`POSIXlt`, which both inherit from `POSIXt`. Which generics have
different behaviours for the two classes? Which generics share the same
behaviour?
1. Which base generic has the greatest number of defined methods?
1. `UseMethod()` calls methods in a special way. Predict what the following
code will return, then run it and read the help for `UseMethod()` to
figure out what's going on. Write down the rules in the simplest form
possible.
```{r, eval = FALSE}
y <- 1
g <- function(x) {
y <- 2
UseMethod("g")
}
g.numeric <- function(x) y
g(10)
h <- function(x) {
x <- 10
UseMethod("h")
}
h.character <- function(x) paste("char", x)
h.numeric <- function(x) paste("num", x)
h("a")
```
1. Internal generics don't dispatch on the implicit class of base types.
Carefully read `?"internal generic"` to determine why the length of `f`
and `g` is different in the example below. What function helps
distinguish between the behaviour of `f` and `g`?
```{r, eval = FALSE}
f <- function() 1
g <- function() 2
class(g) <- "function"
class(f)
class(g)
length.function <- function(x) "function"
length(f)
length(g)
```
## S4 {#s4}
S4 works in a similar way to S3, but it adds formality and rigour. Methods still belong to functions, not classes, but: \index{objects!S4|see{S4}} \index{S4}
* Classes have formal definitions which describe their fields and
inheritance structures (parent classes).
* Method dispatch can be based on multiple arguments to a generic function,
not just one.
* There is a special operator, `@`, for extracting slots (aka fields)
from an S4 object.
All S4 related code is stored in the `methods` package. This package is always available when you're running R interactively, but may not be available when running R in batch mode. For this reason, it's a good idea to include an explicit `library(methods)` whenever you're using S4.
S4 is a rich and complex system. There's no way to explain it fully in a few pages. Here I'll focus on the key ideas underlying S4 so you can use existing S4 objects effectively. To learn more, some good references are:
* [S4 system development in Bioconductor](http://www.bioconductor.org/help/course-materials/2010/AdvancedR/S4InBioconductor.pdf)
* John Chambers' [_Software for Data Analysis_](http://amzn.com/0387759352?tag=devtools-20)
* [Martin Morgan's answers to S4 questions on stackoverflow](http://stackoverflow.com/search?tab=votes&q=user%3a547331%20%5bs4%5d%20is%3aanswe)
### Recognising objects, generic functions, and methods
Recognising S4 objects, generics, and methods is easy. You can identify an S4 object because `str()` describes it as a "formal" class, `isS4()` returns `TRUE`, and `pryr::otype()` returns "S4". S4 generics and methods are also easy to identify because they are S4 objects with well defined classes.
There aren't any S4 classes in the commonly used base packages (stats, graphics, utils, datasets, and base), so we'll start by creating an S4 object from the built-in stats4 package, which provides some S4 classes and methods associated with maximum likelihood estimation:
```{r}
library(stats4)
# From example(mle)
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
nLL <- function(lambda) - sum(dpois(y, lambda, log = TRUE))
fit <- mle(nLL, start = list(lambda = 5), nobs = length(y))
# An S4 object
isS4(fit)
otype(fit)
# An S4 generic
isS4(nobs)
ftype(nobs)
# Retrieve an S4 method, described later
mle_nobs <- method_from_call(nobs(fit))
isS4(mle_nobs)
ftype(mle_nobs)
```
Use `is()` with one argument to list all classes that an object inherits from. Use `is()` with two arguments to test if an object inherits from a specific class.
```{r}
is(fit)
is(fit, "mle")
```
You can get a list of all S4 generics with `getGenerics()`, and a list of all S4 classes with `getClasses()`. This list includes shim classes for S3 classes and base types. You can list all S4 methods with `showMethods()`, optionally restricting selection either by `generic` or by `class` (or both). It's also a good idea to supply `where = search()` to restrict the search to methods available in the global environment.
### Defining classes and creating objects
In S3, you can turn any object into an object of a particular class just by setting the class attribute. S4 is much stricter: you must define the representation of a class with `setClass()`, and create a new object with `new()`. You can find the documentation for a class with a special syntax: `class?className`, e.g., `class?mle`. \index{S4!classes} \index{classes!S4}
An S4 class has three key properties:
* A __name__: an alpha-numeric class identifier. By convention, S4 class names
use UpperCamelCase.
* A named list of __slots__ (fields), which defines slot names and
permitted classes. For example, a person class might be represented by a
character name and a numeric age: `list(name = "character", age = "numeric")`.
\index{slots}
* A string giving the class it inherits from, or, in S4 terminology,
that it __contains__. You can provide multiple classes for multiple
inheritance, but this is an advanced technique which adds much
complexity.
In `slots` and `contains` you can use S4 classes, S3 classes registered
with `setOldClass()`, or the implicit class of a base type. In `slots`
you can also use the special class `ANY` which does not restrict the input.
S4 classes have other optional properties like a `validity` method that tests if an object is valid, and a `prototype` object that defines default slot values. See `?setClass` for more details.
The following example creates a Person class with fields name and age, and an Employee class that inherits from Person. The Employee class inherits the slots and methods from the Person, and adds an additional slot, boss. To create objects we call `new()` with the name of the class, and name-value pairs of slot values. \indexc{setClass()} \indexc{new()}
```{r}
setClass("Person",
slots = list(name = "character", age = "numeric"))
setClass("Employee",
slots = list(boss = "Person"),
contains = "Person")
alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)
```
Most S4 classes also come with a constructor function with the same name as the class: if that exists, use it instead of calling `new()` directly.
To access slots of an S4 object use `@` or `slot()`: \index{subsetting!S4} \index{S4|subsetting}
```{r}
alice@age
slot(john, "boss")
```
(`@` is equivalent to `$`, and `slot()` to `[[`.)
If an S4 object contains (inherits from) an S3 class or a base type, it will have a special `.Data` slot which contains the underlying base type or S3 object: \indexc{.Data}
```{r}
setClass("RangedNumeric",
contains = "numeric",
slots = list(min = "numeric", max = "numeric"))
rn <- new("RangedNumeric", 1:10, min = 1, max = 10)
rn@min
rn@.Data
```
Since R is an interactive programming language, it's possible to create new classes or redefine existing classes at any time. This can be a problem when you're interactively experimenting with S4. If you modify a class, make sure you also recreate any objects of that class, otherwise you'll end up with invalid objects.
### Creating new methods and generics
S4 provides special functions for creating new generics and methods. `setGeneric()` creates a new generic or converts an existing function into a generic. `setMethod()` takes the name of the generic, the classes the method should be associated with, and a function that implements the method. For example, we could take `union()`, which usually just works on vectors, and make it work with data frames: \index{S4!generics} \index{S4!methods} \index{generics!S4} \index{methods!S4}
```{r}
setGeneric("union")
setMethod("union",
c(x = "data.frame", y = "data.frame"),
function(x, y) {
unique(rbind(x, y))
}
)
```
If you create a new generic from scratch, you need to supply a function that calls `standardGeneric()`:
```{r}
setGeneric("myGeneric", function(x) {
standardGeneric("myGeneric")
})
```
`standardGeneric()` is the S4 equivalent to `UseMethod()`.
### Method dispatch
If an S4 generic dispatches on a single class with a single parent, then S4 method dispatch is the same as S3 dispatch. The main difference is how you set up default values: S4 uses the special class `ANY` to match any class and "missing" to match a missing argument. Like S3, S4 also has group generics, documented in `?S4groupGeneric`, and a way to call the "parent" method, `callNextMethod()`. \index{S4!method dispatch rules}
Method dispatch becomes considerably more complicated if you dispatch on multiple arguments, or if your classes use multiple inheritance. The rules are described in `?Methods`, but they are complicated and it's difficult to predict which method will be called. For this reason, I strongly recommend avoiding multiple inheritance and multiple dispatch unless absolutely necessary.
Finally, there are two methods that find which method gets called given the specification of a generic call:
```{r, eval = FALSE}
# From methods: takes generic name and class names
selectMethod("nobs", list("mle"))
# From pryr: takes an unevaluated function call
method_from_call(nobs(fit))
```
### Exercises
1. Which S4 generic has the most methods defined for it? Which S4 class
has the most methods associated with it?
1. What happens if you define a new S4 class that doesn't "contain" an
existing class? (Hint: read about virtual classes in `?Classes`.)
1. What happens if you pass an S4 object to an S3 generic? What happens
if you pass an S3 object to an S4 generic? (Hint: read `?setOldClass`
for the second case.)
## RC {#rc}
Reference classes (or RC for short) are the newest OO system in R. They were introduced in version 2.12. They are fundamentally different to S3 and S4 because: \index{RC} \index{reference classes|see{RC}} \index{objects!RC|see{RC}}
* RC methods belong to objects, not functions
* RC objects are mutable: the usual R copy-on-modify semantics do not apply
These properties make RC objects behave more like objects do in most other programming languages, e.g., Python, Ruby, Java, and C#. Reference classes are implemented using R code: they are a special S4 class that wraps around an environment.
### Defining classes and creating objects
Since there aren't any reference classes provided by the base R packages, we'll start by creating one. RC classes are best used for describing stateful objects, objects that change over time, so we'll create a simple class to model a bank account. \index{RC!classes} \index{classes!RC}
Creating a new RC class is similar to creating a new S4 class, but you use `setRefClass()` instead of `setClass()`. The first, and only required argument, is an alphanumeric __name__. While you can use `new()` to create new RC objects, it's good style to use the object returned by `setRefClass()` to generate new objects. (You can also do that with S4 classes, but it's less common.) \indexc{setRefClass()}
```{r}
Account <- setRefClass("Account")
Account$new()
```
`setRefClass()` also accepts a list of name-class pairs that define class __fields__ (equivalent to S4 slots). Additional named arguments passed to `new()` will set initial values of the fields. You can get and set field values with `$`: \index{fields}
```{r}
Account <- setRefClass("Account",
fields = list(balance = "numeric"))
a <- Account$new(balance = 100)
a$balance
a$balance <- 200
a$balance
```
Instead of supplying a class name for the field, you can provide a single argument function which will act as an accessor method. This allows you to add custom behaviour when getting or setting a field. See `?setRefClass` for more details.
Note that RC objects are __mutable__, i.e., they have reference semantics, and are not copied-on-modify: \index{copy-on-modify!exceptions}
```{r}
b <- a
b$balance
a$balance <- 0
b$balance
```
For this reason, RC objects come with a `copy()` method that allow you to make a copy of the object:
```{r}
c <- a$copy()
c$balance
a$balance <- 100
c$balance
```
An object is not very useful without some behaviour defined by __methods__. RC methods are associated with a class and can modify its fields in place. In the following example, note that you access the value of fields with their name, and modify them with `<<-`. You'll learn more about `<<-` in [Environments](#binding). \index{RC!methods} \index{methods!RC} \indexc{<<-}
```{r}
Account <- setRefClass("Account",
fields = list(balance = "numeric"),
methods = list(
withdraw = function(x) {
balance <<- balance - x
},
deposit = function(x) {
balance <<- balance + x
}
)
)
```
You call an RC method in the same way as you access a field:
```{r}
a <- Account$new(balance = 100)
a$deposit(100)
a$balance
```
The final important argument to `setRefClass()` is `contains`. This is the name of the parent RC class to inherit behaviour from. The following example creates a new type of bank account that returns an error preventing the balance from going below 0.
```{r, error = TRUE}
NoOverdraft <- setRefClass("NoOverdraft",
contains = "Account",
methods = list(
withdraw = function(x) {
if (balance < x) stop("Not enough money")
balance <<- balance - x
}
)
)
accountJohn <- NoOverdraft$new(balance = 100)
accountJohn$deposit(50)
accountJohn$balance
accountJohn$withdraw(200)
```
All reference classes eventually inherit from `envRefClass`. It provides useful methods like `copy()` (shown above), `callSuper()` (to call the parent field), `field()` (to get the value of a field given its name), `export()` (equivalent to `as()`), and `show()` (overridden to control printing). See the inheritance section in `setRefClass()` for more details.
### Recognising objects and methods
You can recognise RC objects because they are S4 objects (`isS4(x)`) that inherit from "refClass" (`is(x, "refClass")`). `pryr::otype()` will return "RC". RC methods are also S4 objects, with class `refMethodDef`.
### Method dispatch
Method dispatch is very simple in RC because methods are associated with classes, not functions. When you call `x$f()`, R will look for a method f in the class of x, then in its parent, then its parent's parent, and so on. From within a method, you can call the parent method directly with `callSuper(...)`. \index{RC!method dispatch rules}
### Exercises
1. Use a field function to prevent the account balance from being directly
manipulated. (Hint: create a "hidden" `.balance` field, and read the
help for the fields argument in `setRefClass()`.)
1. I claimed that there aren't any RC classes in base R, but that was a
bit of a simplification. Use `getClasses()` and find which classes
`extend()` from `envRefClass`. What are the classes used for? (Hint:
recall how to look up the documentation for a class.)
## Picking a system {#picking-a-system}
Three OO systems is a lot for one language, but for most R programming, S3 suffices. In R you usually create fairly simple objects and methods for pre-existing generic functions like `print()`, `summary()`, and `plot()`. S3 is well suited to this task, and the majority of OO code that I have written in R is S3. S3 is a little quirky, but it gets the job done with a minimum of code. \index{objects!which system?}
```{r, eval = FALSE, echo = FALSE}
packageVersion("Matrix")
library(Matrix)
gs <- getGenerics("package:Matrix")
sum(gs@package == "Matrix")
length(getClasses("package:Matrix", FALSE))
```
If you are creating more complicated systems of interrelated objects, S4 may be more appropriate. A good example is the `Matrix` package by Douglas Bates and Martin Maechler. It is designed to efficiently store and compute with many different types of sparse matrices. As of version 1.1.3, it defines 102 classes and 20 generic functions. The package is well written and well commented, and the accompanying vignette (`vignette("Intro2Matrix", package = "Matrix")`) gives a good overview of the structure of the package. S4 is also used extensively by Bioconductor packages, which need to model complicated interrelationships between biological objects. Bioconductor provides many [good resources](https://www.google.com/search?q=bioconductor+s4) for learning S4. If you've mastered S3, S4 is relatively easy to pick up; the ideas are all the same, it is just more formal, more strict, and more verbose.
If you've programmed in a mainstream OO language, RC will seem very natural. But because they can introduce side effects through mutable state, they are harder to understand. For example, when you usually call `f(a, b)` in R you can assume that `a` and `b` will not be modified. But if `a` and `b` are RC objects, they might be modified in the place. Generally, when using RC objects you want to minimise side effects as much as possible, and use them only where mutable states are absolutely required. The majority of functions should still be "functional", and free of side effects. This makes code easier to reason about and easier for other R programmers to understand.
## Quiz answers {#oo-answers}
1. To determine the OO system of an object, you use a process of elimination.
If `!is.object(x)`, it's a base object. If `!isS4(x)`, it's S3. If
`!is(x, "refClass")`, it's S4; otherwise it's RC.
1. Use `typeof()` to determine the base class of an object.
1. A generic function calls specific methods depending on the class of
it inputs. In S3 and S4 object systems, methods belong to generic
functions, not classes like in other programming languages.
1. S4 is more formal than S3, and supports multiple inheritance and
multiple dispatch. RC objects have reference semantics, and methods
belong to classes, not functions.