
Let blocks #2010

Open. jneem wants to merge 17 commits into master.
Conversation

@jneem (Member) commented Jul 25, 2024

This implements let blocks, but only for "plain" bindings that don't do any destructuring.
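For concreteness, a minimal sketch of what this enables (my illustration, using the comma-separated binding syntax discussed throughout this thread; note that in the plain, non-recursive form the bindings don't refer to each other):

# Two plain bindings in a single let block; evaluates to 3.
let
  a = 1,
  b = 2
in a + b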

I looked into handling destructuring. As long as we don't need to allow recursive bindings, I think destructuring can be handled by desugaring into a match block that matches an array of all the right-hand sides. I.e.,

let
  a = 1,
  { foo } = { foo = 2 }
in ...

becomes

[1, { foo = 2 }] |> match { [a, { foo }] => ... }

Recursive let blocks seem to be rather more tricky with the current pattern compilation strategy.

@jneem marked this pull request as ready for review July 26, 2024 06:15
@jneem requested a review from yannham July 26, 2024 06:15
@yannham (Member) left a comment:

Thanks for getting the ball rolling on this long-awaited feature 🙂 As far as I can tell the code looks good, but for the time being I have a couple of reservations at the design level:

  • We're trading the let-binding structure for something more complicated (a hashmap), even for a single-variable let-binding, which I suppose is the most common case. Do we have an idea of the cost of that? (Maybe that will be easier to answer thanks to your other PR on continuous benchmarks!) Subsidiary question: do we need a hashmap here? I would rather expect a Vec, given that we are going to iterate through the bindings and insert them into the environment.
  • I think we should first pause and think about the following possible combinations: a normal let block as implemented in this PR, a let rec block (but still without patterns), let blocks with patterns, and let rec blocks with patterns. In particular, what are the semantics and the possible desugaring/compilation of each? And more importantly, is each combination meaningful, or can we get away with restricting the possibilities? I'm fine with restricting let blocks, but I don't think the fact that it wouldn't work well with how we currently compile patterns is a good motivation on its own, without either a plan B for how it could work in the future, or a good reason to rule out let rec blocks with patterns because they're expected to be much less useful.

@jneem (Member, Author) commented Jul 30, 2024

> We're trading the let-binding structure for something more complicated

Oh, good point. I think a Vec should be fine, and probably even a SmallVec or similar. I'll benchmark the options.

I think it isn't too difficult to support let blocks with patterns, so let me try to do that. If that works, the forbidden combination would be rec + pattern, which is a bit arbitrary but at least easy to explain. Ultimately, it would be nice to have no restrictions at all. I don't think that's hard to fix; it will just take some work. The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

@jneem (Member, Author) commented Jul 31, 2024

Ok, I've added support for patterns in let blocks.

I also measured the performance. It looks like the IndexMap hurt performance a bit, and changing it to SmallVec is as fast as the original. It's a little hard to be sure, since the benchmarks are pretty noisy.

@yannham (Member) commented Jul 31, 2024

So, the semantics of let pat1 = val1, ..., patn = valn in body is let [pat1, ..., patn] = [val1, ..., valn] in body. I think that's fair, although probably a bit wasteful as an actual desugaring; it's ok for a new feature, though. It's also likely quite easy to compile a sequence of patterns directly in a more efficient manner, without allocating, building, and destructuring an array (I'm thinking of implementing a variant of Compile for something wrapping [Pattern] and taking a sequence of arguments, or something like that).
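Spelled out on a small instance (my own instantiation of the equation above): the let block

let a = 1, { foo } = { foo = 2 } in a + foo

would, under this semantics, mean the same as

let [a, { foo }] = [1, { foo = 2 }] in a + foo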

> The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

I think pattern compilation is already doing something close to what you want. It basically compiles a pattern (given the term to match on) to an expression that evaluates either to null or to a record of the bindings introduced by the pattern. The final step is to call %pattern_branch% or something similar that evaluates the branch in an environment augmented with the bindings. We could change this last step to something more involved, but I fear it's going to be more complicated: before we can bind any of the pattern variables in the environment, we first have to evaluate part of the matched value. I suspect we'll need something similar to what we do for recursive records, that is, patching the environment afterwards to "tie the knot" properly.
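As a rough picture of that intermediate form, here is a hand-written approximation in plain Nickel of matching the open record pattern { foo, .. } against a value (the real compiler emits internal primops rather than these stdlib calls; this is only my illustration of the shape of the result):

# Evaluates to a record of the bindings introduced by the pattern on
# success, and to null on failure.
fun v =>
  if std.typeof v == 'Record && std.record.has_field "foo" v then
    { foo = v.foo }
  else
    null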

In general, for recursive let blocks with patterns, neither the desugaring nor the possible compilation is entirely clear to me right now. Looking at a single recursive destructuring let could be interesting, something like let {x=a, y=b} = ...whatever uses a and b... in body. Once again, before even being able to build the recursive environment for a and b, you first need to partially evaluate the matched value. Maybe we can fill the environment with "bombs" initially, like InfiniteRecursion errors (because obviously, if we need a before we can even bind a, this isn't going to work). But then we will bind a to a closure with those bombs in the environment, and we'll need to patch them later with the actual recursive environment. As far as I can tell, this requires a bit more machinery than what's currently available, or at least repurposing part of the recursive records' machinery to that end.

@jneem (Member, Author) commented Jul 31, 2024

What about let rec {x=a, y = {z = c}} = foo in bar becomes

let rec
  %0 = foo,
  a = %0.x,
  %1 = %0.y,
  c = %1.z
in
bar

I think this extends to handle arrays pretty easily. I'm less sure about constants and ADTs. Maybe those could be handled by extending the "primitive" (i.e. non-pattern) let term to allow constants or patterns of the form 'Foo x.
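For instance (my sketch, reusing the %n notation above for compiler-generated names), let rec [a, {x = b}] = foo in bar could become:

let rec
  %0 = foo,
  a = std.array.at 0 %0,
  %1 = std.array.at 1 %0,
  b = %1.x
in
bar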

@yannham (Member) commented Aug 1, 2024

> What about let rec {x=a, y = {z = c}} = foo in bar becomes
>
>   let rec
>     %0 = foo,
>     a = %0.x,
>     %1 = %0.y,
>     c = %1.z
>   in
>   bar

So, if we have {x=[{y=[c]}]}, we'll need to have c as a top-level binding, I guess. It seems like pattern matching, except that you bubble up all the bound variables as top-level let-bindings.

I think this illustrates my point: we now have to (re?)define the semantics of patterns for pattern matching and destructuring separately, and probably also have separate implementations, as we used to (beyond the fact that I trust you can find a reasonable semantics for recursive destructuring). But as before, the two might diverge or have different evaluation strategies, which can be surprising. It's not necessarily a no-go, but it's not very satisfying. I just wonder if we could unify both in one core compilation scheme (even if the recursive side is never actually used in match expressions but only in destructuring).

I think what the pattern matching compilation is doing is pretty close in spirit already; it just does that in pure Nickel code, by putting stuff in a record in several steps instead of bubbling up all the bindings at the top level. One possibility is to make pattern compilation more like what you propose. Another is to do what I propose, and patch the result at the end. The issue with the latter is the infinite recursion case (the current pattern matching compilation would cause an unbound variable error instead), but I think putting "bombs" in the environment and shadowing them at the end, once we can build the recursive environment, is actually maybe not that bad.

@yannham (Member) left a comment:

I did a more thorough review. It looks good; I'm just requesting changes to block merging, because I think we shouldn't include this in 1.8, which is too close.

I think it could have more tests, in particular for typechecking recursive let-blocks or let-blocks with patterns.

It's also a bit annoying that we often have to do two traversals of the bindings just for the recursive case, as mentioned in a comment. But I guess we can live with it.

Review threads (resolved) on: core/src/parser/utils.rs, core/src/pretty.rs, core/src/transform/desugar_destructuring.rs, core/src/transform/free_vars.rs, lsp/nls/src/position.rs.

// Fill each bound value; recursive bindings see the environment
// extended with the bindings themselves, non-recursive ones see
// the outer environment.
for (_, val) in bindings {
    self.fill(val, if attrs.rec { &new_env } else { env });
}
Member:

It's probably not going to have any important consequence, but it's a bit annoying that we have to do this double iteration each time, just in case the let is recursive. One possibility would be to distinguish both cases in the match and have a different implementation for each, which obviously leads to a bit of code duplication. Or add ifs wherever needed (to do everything in the first for and elide the second in the non-recursive case), but I fear this would make the code not very readable. So I don't have a great idea nor a strong opinion. It's just... itchy 😛

Member Author:

Yes, I definitely agree. This was the clearest version I could come up with, even though it isn't the most efficient.

github-actions bot (Contributor) commented Aug 5, 2024

🐰 Bencher report, Thu, August 8, 2024 at 03:25:21 UTC. Project: nickel. Branch: 2010/merge. Testbed: ubuntu-latest.

Benchmark       Latency (ns)
fibonacci 10      491,960.00
pidigits 100    3,193,300.00
product 30        796,660.00
scalar 10       1,468,900.00
sum 30            795,990.00

@jneem requested a review from yannham August 5, 2024 10:29
@jneem (Member, Author) commented Aug 5, 2024

I took a look at doing recursive pattern bindings by putting "bombs" in the environment, but I had trouble working out when to replace them. Like, in

let rec
  { foo = a, .. } = b,
  b = { foo = 1, c = a }
in b.c

then we want to start off by poisoning "a", then evaluate "b" to normal form, then execute the pattern binding and unpoison "a", and then evaluate "b.c". But doing things in a different order could easily make us evaluate "a" too early, and I don't see a general way to do it...

Also, one thing I noticed about recursive pattern bindings is that they might want to be lazier than non-recursive bindings. For example, we currently eagerly evaluate enough of the bound value to check whether it matches the pattern, but it isn't too hard to come up with examples of recursive bindings where this leads to infinite recursion that could be avoided; one such example is sketched below.
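One shape such an example could take (my construction, not from the PR):

# Checking the match eagerly forces the `if`, which forces `b`, which
# forces `x`, which needs the match to have completed: infinite
# recursion. A lazier desugaring (binding x to the field access on the
# right-hand side) would only loop if x were actually demanded, and
# the body never demands it.
let rec
  { x } = (if b then { x = 1 } else { x = 2 }),
  b = x == 1,
  result = "done"
in result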

@yannham (Member) commented Aug 5, 2024

Hmm. The more we talk about it, the more it seems like recursive destructuring is maybe something we should avoid entirely. Because if we have trouble defining the implementation and laziness of this scheme, it's most likely neither obvious nor natural (in the mathematical sense of canonical) for the user either.

@jneem (Member, Author) commented Aug 6, 2024

Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

let rec
  a = { foo = 3 },
  b = a.foo + 1
in b

but then if I want to refactor it to

let rec
  { foo } = { foo = 3 },
  b = foo + 1
in b

then I need to break up the let block into a sequence of lets. But I agree: if we don't have good semantics in general, then it doesn't make sense to add it.

@jneem (Member, Author) commented Aug 6, 2024

Ok, I tried writing tests, and there are a bunch of problems with typechecking recursive let blocks. We'll need to do something more like what recursive records do, but I need to understand that first...

@jneem marked this pull request as draft August 6, 2024 04:37
@yannham (Member) commented Aug 6, 2024

> Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

One possibility would be to rule out the more complicated cases, because it's not obvious that they are really useful in practice. For example, we could require that the right-hand side of a destructuring let rec must evaluate to a WHNF without needing the recursive bindings; then we could do the poisoning approach without caring about the order.
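To illustrate the proposed restriction (my examples, not from the thread):

# Fine: the right-hand side is already a record literal, i.e. a WHNF;
# the recursive binding bar only occurs inside a field, lazily.
let rec
  { foo } = { foo = bar + 1 },
  bar = 1
in foo

# Ruled out: reaching a WHNF for the right-hand side already requires
# evaluating the recursive binding bar.
let rec
  { foo } = (if bar then { foo = 1 } else { foo = 2 }),
  bar = true
in foo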

Otherwise, I reckon the right semantics for recursive destructuring is probably the old one, where it's just desugared into a sequence of lazy let-bindings. However, I think there were non-trivial issues around default values, laziness (something like let 'Foo 5 = value in ... would be transparent because the pattern doesn't bind anything, while today it's desugared to value |> match { 'Foo 5 => ... }, which does check that value is 'Foo 5), and error reporting. I just wonder if recursive destructuring is useful enough to 1) go back to this more complicated situation, with discrepancies between pattern matching and destructuring and more lines of code in the implementation, and 2) break backward compatibility (again).

// FIXME: if we're recursive, do we need to unify the fresh
// type variables with the (modified by the recursive check)
// binding_type? I feel like we should, but it doesn't seem to make
// a difference.
Member Author:

I'm a bit unsure about this point. I tried unifying the fresh utys, like what RecRecord does, but I didn't find any cases where it made a difference.

Member:

Rereading what RecRecord does, it feels a bit over-complicated. I think the approach of this PR is fine: you store the free unification variable (or the type, when there's an annotation) assigned to each binding in tys, and use those directly for each check later.

The RecRecord case does things a bit differently. It enriches the environment in the same way, either with a fresh unification variable or with a type for fields that have an apparent type. But then it proceeds to build a record type {id1: ?a1, ..., idn: ?an} where the unification variables are entirely free and have no link whatsoever with the ones inserted in the environment previously (the difference with your approach). It unifies this record type with the expected type of the whole expression, to first ensure that we are checking a record literal against a record type.

Then, in a second step, it picks the fields that didn't have any annotation/apparent type, and unifies the free-standing variables from the generated record type with what's in the environment. This step is required, because otherwise something like {foo = 1 + 1, bar = foo ++ "a"} : _ would typecheck: foo would be bound to String in the environment, the inferred type would be {foo : Number, bar : String}, and we'd be missing the constraint that the inferred type foo : Number must agree with the binding foo : String in the environment.

The only reason I can see why we do that, besides the possibility of simply having overlooked a simpler solution, is as a form of optimization. Indeed, in the recursive record case, when fields are annotated (you can expect that to happen quite often), the current approach eschews some useless identity unifications. Typically, in this PR, with let rec foo : SuperComplexType = ..., bar : OtherComplexType = ... in body, we will put (foo, SuperComplexType) and (bar, OtherComplexType) in tys. Then we will check each let-bound variable against its type, which means checking foo : T against T (after desugaring, let foo : T = bound, ... is actually let foo = (bound : T)). This will first unify T with itself, which perhaps surprisingly isn't free, in particular if T is polymorphic. Then it will check foo against T, as expected.

So, the recursive record approach eschews the additional unification for annotated fields. Even then, honestly, I think we can do it in a much simpler way in both cases. In tys (or the hashmap that we are building in the recursive record case), we can use fresh unification variables that are entirely free for fields/bindings that are annotated, because the annotation will take care of the typechecking, and the annotation is also already used for the recursive environment. For fields/bindings that aren't annotated, we would use the unification variable inserted in the environment, instead of first generating entirely free variables for every field/binding and then, in a second step, unifying each variable with the one in the environment.
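The let-block analogue of the record example above shows the same constraint at work (my adaptation):

# Must be rejected: bar's body forces foo's environment variable to
# String, while foo's definition infers Number. Without linking the
# environment's variable to the inferred type, the mismatch would go
# unnoticed.
(let rec foo = 1 + 1, bar = foo ++ "a" in bar) : _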

@jneem marked this pull request as ready for review August 7, 2024 10:45
@yannham (Member) left a comment:

Should we add some tests for the LSP as well? Also, we need to update the documentation, at least for the syntax.

Because of the documentation issue, and because I don't want to ship a last-minute new feature in the next version, I'm going to request changes, but it's just for the sake of delaying the merging of this PR.

Otherwise, we might have a look at what other languages do. I realized, while discussing with other people from the group, that Haskell has something called lazy patterns that might be the semantics we want for destructuring. It also has a form of destructuring (although it's just desugared to pattern matching, so I suspect it doesn't support any form of recursion). I don't know much about Scala, etc., but there might be inspiration there.


@jneem mentioned this pull request Sep 6, 2024