-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
New blog post about leaking this references
- Loading branch information
Showing
1 changed file
with
297 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,297 @@ | ||
--- | ||
layout: post | ||
title: "Are you leaking this?" | ||
date: 2024-08-22 10:49:29 +1200 | ||
category: software | ||
tags: | ||
- java | ||
- this | ||
- memory | ||
- constructor | ||
--- | ||
|
||
Regarding memory management, Java is in general a much safer language than C or C++. | ||
However, it is still possible to get in trouble if you are not careful. One common pitfall | ||
is to leak the `this` reference from a constructor, which can produce strange behaviour, | ||
since the `this` reference points to an object that is not fully constructed yet. | ||
|
||
Here is an example of how this can happen (pun intended). Consider a typical implementation | ||
of the Observer pattern. We have a class of objects that are observable: | ||
|
||
```java | ||
public class Observable { | ||
private final Collection<Observer> observers = new ArrayList<>(); | ||
|
||
public void addObserver(Observer observer) { | ||
observers.add(observer); | ||
} | ||
|
||
public void removeObserver(Observer observer) { | ||
observers.remove(observer); | ||
} | ||
|
||
public void notifyObservers() { | ||
for (Observer observer : observers) { | ||
observer.onEvent(); | ||
} | ||
} | ||
} | ||
``` | ||
|
||
And we have the observers, which are objects that want to be notified when something happens: | ||
|
||
```java | ||
public class Observer { | ||
private final int id; | ||
private int eventsObserved = 0; | ||
|
||
public Observer(int id, Observable observable) { | ||
observable.addObserver(this); | ||
this.id = id; | ||
} | ||
|
||
public void onEvent() { | ||
System.out.println("Event received by observer " + id); | ||
eventsObserved++; | ||
} | ||
|
||
public int getEventsObserved() { | ||
return eventsObserved; | ||
} | ||
|
||
// autogenerated by the IDE | ||
@Override | ||
public boolean equals(Object o) { | ||
if (this == o) return true; | ||
if (o == null || getClass() != o.getClass()) return false; | ||
Observer observer = (Observer) o; | ||
return id == observer.id; | ||
} | ||
|
||
// autogenerated by the IDE | ||
@Override | ||
public int hashCode() { | ||
return Objects.hashCode(id); | ||
} | ||
} | ||
``` | ||
|
||
The whole reason the observers exist is to observe the observable, so it makes sense that | ||
they register themselves with the observable in the constructor. | ||
|
||
Finally, let's write a small test to see if everything works as expected: | ||
|
||
```java | ||
class ObserverTest { | ||
@Test | ||
void testObserver() { | ||
// given | ||
Observable observable = new Observable(); | ||
Observer observer0 = new Observer(0, observable); | ||
Observer observer1 = new Observer(1, observable); | ||
|
||
// when | ||
observable.notifyObservers(); | ||
|
||
// then | ||
assertEquals(1, observer0.getEventsObserved()); | ||
assertEquals(1, observer1.getEventsObserved()); | ||
} | ||
} | ||
``` | ||
|
||
The test passes, and the print statements produce the expected output: | ||
|
||
``` | ||
Event received by observer 0 | ||
Event received by observer 1 | ||
``` | ||
|
||
Now let's introduce a small change in the `Observable` class. We change the `ArrayList` of | ||
observers to a `HashSet`: | ||
|
||
```java | ||
private final Collection<Observer> observers = new HashMap<>(); | ||
``` | ||
|
||
We run the test again, and this time it fails: | ||
|
||
``` | ||
Expected :1 | ||
Actual :0 | ||
``` | ||
|
||
and the print statements produce the following output: | ||
|
||
``` | ||
Event received by observer 0 | ||
``` | ||
|
||
Looks like somehow the observer 1 did not receive the event. Our first thought might be that | ||
we made a mistake implementing the `equals/hashCode`. Let's add a couple of assertions to the | ||
test: | ||
|
||
``` | ||
assertNotEquals(observer0, observer1); | ||
assertNotEquals(observer0.hashCode(), observer1.hashCode()); | ||
``` | ||
|
||
Those assertions pass. Given the two observers are different, how come only one of them | ||
received the event? The answer is that the two observers were not always different. In particular, | ||
they were equal when they were added to the `HashSet`, and they only became different later. | ||
Impossible! The `equals/hashCode` implementation uses the `id` field, which is final. It cannot | ||
change its value! | ||
|
||
Let's look again at the constructor of the `Observer`: | ||
|
||
```java | ||
public Observer(int id, Observable observable) { | ||
observable.addObserver(this); | ||
this.id = id; | ||
} | ||
``` | ||
|
||
The `this` reference is leaked from the constructor and passed to the `addObserver` method | ||
of another object. At this point, the object is only partially constructed. The `id` field | ||
has not received its "final" value yet. However, the observable calls `equals/hashCode` to | ||
insert the observers into a set, and those methods operate on partially constructed objects. | ||
|
||
Note that neither the Java compiler nor the JVM issue a warning or an error when a method | ||
accesses an un-initialised field, even if the field is final. | ||
|
||
You may wonder: if `id` has not been initialised yet, what is its value? Programmers with | ||
experience in languages like C or C++ may be getting cold sweats at this point. However, | ||
in Java the value of a field is always the default value for its type. Since `id` is an integer, | ||
its default value is 0. This is why the two observers were equal when they were added to the | ||
`HashSet`. However, after the constructor finished, the `id` field was initialised to the | ||
definitive value, and the observers became different. In other words, despite `id` being a `final` | ||
field, in practice it had two values during its lifetime! | ||
|
||
I know what you are thinking: the constructor was purposely written to leak the `this` reference | ||
before the `id` field was initialised. This is a contrived example. In practice, programmers | ||
are unlikely to make that obvious mistake. Just initialise the fields before calling any other | ||
method, or avoid calling methods of other objects from the constructor altogether, for example, | ||
by separating the registration of the observer from the constructor. Easy! | ||
|
||
Unfortunately, the problem is not always that obvious. What if I told you that it is possible | ||
to leak a reference to a partially constructed `this` even without using the `this` keyword or | ||
calling a different object from the constructor? | ||
|
||
Let's see a different example. Consider the following class: | ||
|
||
```java | ||
public class Customer { | ||
private final int age; | ||
private final double discountRate; | ||
|
||
public Customer(int age) { | ||
this.age = age; | ||
this.discountRate = calculateDiscountRate(); | ||
} | ||
|
||
protected double calculateDiscountRate() { | ||
if (age > 65) { | ||
return 0.5; | ||
} else { | ||
return 0.0; // pay full price | ||
} | ||
} | ||
|
||
public double getDiscountRate() { | ||
return discountRate; | ||
} | ||
} | ||
``` | ||
|
||
Customers get a discount based on their age. However, we plan to have other | ||
types of customers with other types of discounts, | ||
therefore we have declared the `calculateDiscountRate` method protected. | ||
|
||
We now introduce a subclass of customers, the `FrequentCustomer`. Customers who | ||
visit the shop more than 5 times get an extra 10% discount: | ||
|
||
```java | ||
public class FrequentCustomer extends Customer { | ||
public static final int MINIMUM_VISITS = 5; | ||
private final int visits; | ||
|
||
public FrequentCustomer(int age, int visits) { | ||
super(age); | ||
this.visits = visits; | ||
} | ||
|
||
@Override | ||
protected double calculateDiscountRate() { | ||
if (visits >= MINIMUM_VISITS) { | ||
return Math.min(1.0, super.calculateDiscountRate() + 0.10); // frequent customers get 10% extra discount | ||
} else { | ||
return super.calculateDiscountRate(); | ||
} | ||
} | ||
} | ||
``` | ||
|
||
We write a test to check that the discount rate is calculated correctly: | ||
|
||
```java | ||
class FrequentCustomerTest { | ||
static final int MANY_VISITS = 8; | ||
static final int FEW_VISITS = 0; | ||
static final int YOUNG_AGE = 20; | ||
static final int SENIOR_AGE = 70; | ||
|
||
@Test | ||
void youngFrequentCustomerShouldGet10PercentDiscount() { | ||
FrequentCustomer frequentCustomer = new FrequentCustomer(YOUNG_AGE, MANY_VISITS); | ||
assertEquals(0.1, frequentCustomer.getDiscountRate()); // 10% for frequent visits | ||
} | ||
|
||
@Test | ||
void youngInfrequentCustomerShouldGetNoDiscount() { | ||
FrequentCustomer frequentCustomer = new FrequentCustomer(YOUNG_AGE, FEW_VISITS); | ||
assertEquals(0.0, frequentCustomer.getDiscountRate()); | ||
} | ||
|
||
@Test | ||
void elderFrequentCustomerShouldGet60PercentDiscount() { | ||
FrequentCustomer frequentCustomer = new FrequentCustomer(SENIOR_AGE, MANY_VISITS); | ||
assertEquals(0.6, frequentCustomer.getDiscountRate()); // 50% for being a senior + 10% for frequent visits | ||
} | ||
|
||
@Test | ||
void elderFrequentCustomerShouldGet50PercentDiscount() { | ||
FrequentCustomer frequentCustomer = new FrequentCustomer(SENIOR_AGE, FEW_VISITS); | ||
assertEquals(0.5, frequentCustomer.getDiscountRate()); // 50% for being a senior | ||
} | ||
} | ||
``` | ||
|
||
Two of the test cases fail: | ||
|
||
* `youngFrequentCustomerShouldGet10PercentDiscount`: expected 0.1, actual 0.0. | ||
* `elderFrequentCustomerShouldGet60PercentDiscount`: expected 0.6, actual 0.5. | ||
|
||
Clearly the 10% extra discount is not being applied. But why? | ||
|
||
The problem is that the `calculateDiscountRate` method of the `FrequentCustomer` class | ||
accesses the `visits` field, but at the time that code is executed, the field is not | ||
initialised yet. The constructor of `FrequentCustomer` calls the constructor of `Customer`, | ||
which in turn calls the overridden `calculateDiscountRate` method of `FrequentCustomer`. | ||
|
||
In this case, there is no obvious `this` reference being leaked. In fact, the `this` keyword | ||
does not even appear other than on the left hand side of assignments. There is also no obvious | ||
"inversion" of the natural order of a constructor. Fields are initialised before calling methods. | ||
At a first glance, the code looks correct. | ||
|
||
The morale is that it is not safe to assume that when the constructor of `Customer` finishes | ||
its execution, the object is fully initialised. The constructor of `FrequentCustomer` still | ||
has to be executed in order to finish the initialisation of the object. In fact, it is | ||
possible that the `calculateDiscountRate` method of `FrequentCustomer` is called before | ||
the constructor of `FrequentCustomer` has executed! | ||
|
||
It is clear that omitting the keyword `this` or avoiding invoking methods of other | ||
classes is not sufficient to prevent this class of errors. How can we prevent them, then? | ||
A safe recipe is to **never call member methods from the constructor. All methods called | ||
from the constructor should be `static`**. Another option is to only call `final` or `private` | ||
methods, but this is still prone to errors because depending on the order of the calls, | ||
the method may be called before all the fields are initialised. |