Skip to content

Commit

Permalink
New blog post about leaking this references
Browse files Browse the repository at this point in the history
  • Loading branch information
berrueta committed Aug 23, 2024
1 parent 2e440e9 commit e1faf77
Showing 1 changed file with 297 additions and 0 deletions.
297 changes: 297 additions & 0 deletions _posts/2024-08-22-are-you-leaking-this.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,297 @@
---
layout: post
title: "Are you leaking this?"
date: 2024-08-22 10:49:29 +1200
category: software
tags:
- java
- this
- memory
- constructor
---

Regarding memory management, Java is in general a much safer language than C or C++.
However, it is still possible to get in trouble if you are not careful. One common pitfall
is to leak the `this` reference from a constructor, which can produce strange behaviour,
since the `this` reference points to an object that is not fully constructed yet.

Here is an example of how this can happen (pun intended). Consider a typical implementation
of the Observer pattern. We have a class of objects that are observable:

```java
public class Observable {
private final Collection<Observer> observers = new ArrayList<>();

public void addObserver(Observer observer) {
observers.add(observer);
}

public void removeObserver(Observer observer) {
observers.remove(observer);
}

public void notifyObservers() {
for (Observer observer : observers) {
observer.onEvent();
}
}
}
```

And we have the observers, which are objects that want to be notified when something happens:

```java
public class Observer {
private final int id;
private int eventsObserved = 0;

public Observer(int id, Observable observable) {
observable.addObserver(this);
this.id = id;
}

public void onEvent() {
System.out.println("Event received by observer " + id);
eventsObserved++;
}

public int getEventsObserved() {
return eventsObserved;
}

// autogenerated by the IDE
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Observer observer = (Observer) o;
return id == observer.id;
}

// autogenerated by the IDE
@Override
public int hashCode() {
return Objects.hashCode(id);
}
}
```

The whole reason the observers exist is to observe the observable, so it makes sense that
they register themselves with the observable in the constructor.

Finally, let's write a small test to see if everything works as expected:

```java
class ObserverTest {
@Test
void testObserver() {
// given
Observable observable = new Observable();
Observer observer0 = new Observer(0, observable);
Observer observer1 = new Observer(1, observable);

// when
observable.notifyObservers();

// then
assertEquals(1, observer0.getEventsObserved());
assertEquals(1, observer1.getEventsObserved());
}
}
```

The test passes, and the print statements produce the expected output:

```
Event received by observer 0
Event received by observer 1
```

Now let's introduce a small change in the `Observable` class. We change the `ArrayList` of
observers to a `HashSet`:

```java
private final Collection<Observer> observers = new HashMap<>();
```

We run the test again, and this time it fails:

```
Expected :1
Actual :0
```

and the print statements produce the following output:

```
Event received by observer 0
```

Looks like somehow the observer 1 did not receive the event. Our first thought might be that
we made a mistake implementing the `equals/hashCode`. Let's add a couple of assertions to the
test:

```
assertNotEquals(observer0, observer1);
assertNotEquals(observer0.hashCode(), observer1.hashCode());
```

Those assertions pass. Given the two observers are different, how come only one of them
received the event? The answer is that the two observers were not always different. In particular,
they were equal when they were added to the `HashSet`, and they only became different later.
Impossible! The `equals/hashCode` implementation uses the `id` field, which is final. It cannot
change its value!

Let's look again at the constructor of the `Observer`:

```java
public Observer(int id, Observable observable) {
observable.addObserver(this);
this.id = id;
}
```

The `this` reference is leaked from the constructor and passed to the `addObserver` method
of another object. At this point, the object is only partially constructed. The `id` field
has not received its "final" value yet. However, the observable calls `equals/hashCode` to
insert the observers into a set, and those methods operate on partially constructed objects.

Note that neither the Java compiler nor the JVM issue a warning or an error when a method
accesses an un-initialised field, even if the field is final.

You may wonder: if `id` has not been initialised yet, what is its value? Programmers with
experience in languages like C or C++ may be getting cold sweats at this point. However,
in Java the value of a field is always the default value for its type. Since `id` is an integer,
its default value is 0. This is why the two observers were equal when they were added to the
`HashSet`. However, after the constructor finished, the `id` field was initialised to the
definitive value, and the observers became different. In other words, despite `id` being a `final`
field, in practice it had two values during its lifetime!

I know what you are thinking: the constructor was purposely written to leak the `this` reference
before the `id` field was initialised. This is a contrived example. In practice, programmers
are unlikely to make that obvious mistake. Just initialise the fields before calling any other
method, or avoid calling methods of other objects from the constructor altogether, for example,
by separating the registration of the observer from the constructor. Easy!

Unfortunately, the problem is not always that obvious. What if I told you that it is possible
to leak a reference to a partially constructed `this` even without using the `this` keyword or
calling a different object from the constructor?

Let's see a different example. Consider the following class:

```java
public class Customer {
private final int age;
private final double discountRate;

public Customer(int age) {
this.age = age;
this.discountRate = calculateDiscountRate();
}

protected double calculateDiscountRate() {
if (age > 65) {
return 0.5;
} else {
return 0.0; // pay full price
}
}

public double getDiscountRate() {
return discountRate;
}
}
```

Customers get a discount based on their age. However, we plan to have other
types of customers with other types of discounts,
therefore we have declared the `calculateDiscountRate` method protected.

We now introduce a subclass of customers, the `FrequentCustomer`. Customers who
visit the shop more than 5 times get an extra 10% discount:

```java
public class FrequentCustomer extends Customer {
public static final int MINIMUM_VISITS = 5;
private final int visits;

public FrequentCustomer(int age, int visits) {
super(age);
this.visits = visits;
}

@Override
protected double calculateDiscountRate() {
if (visits >= MINIMUM_VISITS) {
return Math.min(1.0, super.calculateDiscountRate() + 0.10); // frequent customers get 10% extra discount
} else {
return super.calculateDiscountRate();
}
}
}
```

We write a test to check that the discount rate is calculated correctly:

```java
class FrequentCustomerTest {
static final int MANY_VISITS = 8;
static final int FEW_VISITS = 0;
static final int YOUNG_AGE = 20;
static final int SENIOR_AGE = 70;

@Test
void youngFrequentCustomerShouldGet10PercentDiscount() {
FrequentCustomer frequentCustomer = new FrequentCustomer(YOUNG_AGE, MANY_VISITS);
assertEquals(0.1, frequentCustomer.getDiscountRate()); // 10% for frequent visits
}

@Test
void youngInfrequentCustomerShouldGetNoDiscount() {
FrequentCustomer frequentCustomer = new FrequentCustomer(YOUNG_AGE, FEW_VISITS);
assertEquals(0.0, frequentCustomer.getDiscountRate());
}

@Test
void elderFrequentCustomerShouldGet60PercentDiscount() {
FrequentCustomer frequentCustomer = new FrequentCustomer(SENIOR_AGE, MANY_VISITS);
assertEquals(0.6, frequentCustomer.getDiscountRate()); // 50% for being a senior + 10% for frequent visits
}

@Test
void elderFrequentCustomerShouldGet50PercentDiscount() {
FrequentCustomer frequentCustomer = new FrequentCustomer(SENIOR_AGE, FEW_VISITS);
assertEquals(0.5, frequentCustomer.getDiscountRate()); // 50% for being a senior
}
}
```

Two of the test cases fail:

* `youngFrequentCustomerShouldGet10PercentDiscount`: expected 0.1, actual 0.0.
* `elderFrequentCustomerShouldGet60PercentDiscount`: expected 0.6, actual 0.5.

Clearly the 10% extra discount is not being applied. But why?

The problem is that the `calculateDiscountRate` method of the `FrequentCustomer` class
accesses the `visits` field, but at the time that code is executed, the field is not
initialised yet. The constructor of `FrequentCustomer` calls the constructor of `Customer`,
which in turn calls the overridden `calculateDiscountRate` method of `FrequentCustomer`.

In this case, there is no obvious `this` reference being leaked. In fact, the `this` keyword
does not even appear other than on the left hand side of assignments. There is also no obvious
"inversion" of the natural order of a constructor. Fields are initialised before calling methods.
At a first glance, the code looks correct.

The morale is that it is not safe to assume that when the constructor of `Customer` finishes
its execution, the object is fully initialised. The constructor of `FrequentCustomer` still
has to be executed in order to finish the initialisation of the object. In fact, it is
possible that the `calculateDiscountRate` method of `FrequentCustomer` is called before
the constructor of `FrequentCustomer` has executed!

It is clear that omitting the keyword `this` or avoiding invoking methods of other
classes is not sufficient to prevent this class of errors. How can we prevent them, then?
A safe recipe is to **never call member methods from the constructor. All methods called
from the constructor should be `static`**. Another option is to only call `final` or `private`
methods, but this is still prone to errors because depending on the order of the calls,
the method may be called before all the fields are initialised.

0 comments on commit e1faf77

Please sign in to comment.