Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for record pattern matching #5185

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

johannescoetzee
Copy link
Contributor

@johannescoetzee johannescoetzee commented Dec 16, 2024

This PR adds support for pattern matching with record patterns. The lowering is effectively done in 2 stages:

First, the instanceof check for the record pattern is created. This is represented as a series of instanceof checks for the whole record tree. There are lots of examples of what this lowering looks like in the unit tests but, since the resulting lowering gets rather large very quickly, I'll include a simple example here.

Record patterns with concrete + exact types

record Foo(String left, Integer right) {}

class Test {
  void test(Object o) {
    if (o instanceof Foo(String s, Integer i)) {}
  }
}

the instanceof check is lowered to

(o instanceof Foo) && ((($obj1 = ((Foo) o).left()) instanceof String) && (($obj0 = ((Foo) o).right()) instanceof Integer))

The first o instanceof Foo call is the same we'd expect for type pattern expressions.

The next (($obj1 = ((Foo) o).left()) instanceof String) && (($obj0 = ((Foo) o).right()) instanceof Integer) introduces the major difference from type pattern matching. For each record pattern in the pattern tree, a temporary variable is created for each record field. In this case:

$obj1 = ((Foo) o).left()
$obj0 = ((Foo) o).right()

This is necessary to avoid doubling-up on side-effects from repeating the getter calls in nested record patterns or in the pattern variable assignment.

The assignments for these temporary variables are added to the CPG where ((Foo) o).left(), for example, is first called and anywhere after that ((Foo) o).left() is replaced by $obj1.

The assignment for the pattern variables are then

s = $obj1
i = $obj0

which look a bit silly in this case since we could've just used the s and i variables directly instead of creating the temporary variables. This is because the return type of the left and right calls already match the pattern variable types exactly. In cases where this is not true, however, the temporary variables are necessary and are cast to the correct types during the variable assignments as needed. This can be seen in the more complex example below.

Records with generics / subtypes

class Bar {}
class Baz extends Bar {}

record Foo<T>(Bar left, T right) {}

class Test {
  void test(Object o) {
    if (o instanceof Foo(Baz s, Integer i)) {} 
  }
}

where the instanceof check and assignments are respectively lowered to

(o instanceof Foo) && ((($obj1 = ((Foo) o).left()) instanceof Baz) && (($obj0 = ((Foo) o).right()) instanceof Integer))

s = (Baz) $obj1
i = (Integer) $obj0

Separately handling the simpler case where no temporary variables are technically needed would lead to other questions. For example, the current lowering

($obj1 = ((Foo) o).left()) instanceof String)
s = $obj1

could be either simplified to

(s = ((Foo) o).left()) instanceof String

or the instanceof call could be omitted entirely. That would then lead to the question of when instanceof calls should even be added and I figured the extra code complexity added by dealing with this isn't justified by the results anyways (since this only applies to leaf patterns and we'd still need temporary variables for everything inbetween).

Casts in instanceof chains for nested generics

Casts in general, but in particular in the instanceof calls, are only added when necessary. For example, in

record Foo(Bar foo) {}
record Bar(String bar) {}

class Test {
  public void test(Object o) {
    if (o instanceof Foo(Bar(String s))) {}
  }
} 

the instanceof call is lowered to

(o instanceof Foo) && ((($obj0 = ((Foo) o).foo()) instanceof Bar) && (($obj1 = $obj0.bar()) instanceof String))

where (Foo) o is the only cast needed since all the other types are already exact matches. However, if these types are not already exact matches as in

class Parent {}
class Child extends Parent {}

record Foo<T>(T foo) {}
record Bar(Parent bar) {}

class Test {
  public void test(Object o) {
    if (o instanceof Foo(Bar(Child s))) {}
  }
} 

the instanceof call is lowered to

(o instanceof Foo) && ((($obj0 = ((Foo) o).foo()) instanceof Bar) && (($obj1 = ((Bar) $obj0).bar()) instanceof Child))

where the (Bar) $obj0 cast is needed as well.

@johannescoetzee johannescoetzee requested a review from ml86 December 16, 2024 15:20
@ml86
Copy link
Contributor

ml86 commented Dec 16, 2024

What is the problem in not emitting instanceof calls if they are not required?
Figuring out when to omit should be easy: If the declared type in the pattern equals the type in the record, no check is required.
I would think this would save us from generating quite a lot of checks because my expectation is that people usually want to type check one of the record element and for the others they just want to create a local variable in the if block.

@johannescoetzee
Copy link
Contributor Author

johannescoetzee commented Dec 16, 2024

The biggest issue I ran into with this was having to handle the case where the pattern match is purely there to extract fields of records and where no type check is actually needed. The 3 options I could think of were:

  1. to not create the condition at all (just using true or not creating an if statement at all)
  2. to settle for some default (maybe only doing the top-level check)
  3. to not handle these special cases

I chose 3 since it keeps the implementation simple, but 2 could be a good alternative. 1 would be too much of a departure from the original source IMO.

@ml86
Copy link
Contributor

ml86 commented Dec 17, 2024

I also think that we should not leave out an if statement which is explicit in the source code and since the if statement needs some kind of condition expression i would say we always generated the explicit top level check and the nested stuff only if necessary.

@johannescoetzee johannescoetzee force-pushed the johannes/java-record-patterns branch from 9ad945f to f09b519 Compare December 19, 2024 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants