
Let blocks #2010

Open. jneem wants to merge 17 commits into master.
Conversation

@jneem (Member) commented Jul 25, 2024

This implements let blocks, but only for "plain" bindings that don't do any destructuring.
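For concreteness, a minimal sketch of what this enables (my illustration, using the comma-separated binding syntax discussed throughout this thread; note that in the plain, non-recursive form the bindings don't refer to each other):

# Two plain bindings in a single let block; evaluates to 3.
let
  a = 1,
  b = 2
in a + b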

I looked into handling destructuring. As long as we don't need to allow recursive bindings, I think destructuring can be handled by desugaring into a match block that matches an array of all the right-hand sides. I.e.,

let
  a = 1,
  { foo } = { foo = 2 }
in ...

becomes

[1, { foo = 2 }] |> match { [a, { foo }] => ... }

Recursive let blocks seem to be rather more tricky with the current pattern compilation strategy.

@jneem marked this pull request as ready for review July 26, 2024 06:15
@jneem requested a review from yannham July 26, 2024 06:15
@yannham (Member) left a comment:

Thanks for getting the ball rolling on this long-awaited feature 🙂 As far as I can tell the code looks good, but for the time being I have a couple of reservations at the design level:

  • We're trading the let-binding structure for something more complicated (a hashmap), even for a single-variable let-binding, which I suppose is the most common case. Do we have an idea of the cost of that? (Maybe that will be easier to answer thanks to your other PR on continuous benchmarks!) Subsidiary question: do we need a hashmap here? I would rather expect a Vec, given that we are going to iterate through the bindings and insert them into the environment.
  • I think we should first pause and think about the following possible combinations: a normal let block as implemented in this PR, a let rec block (but still without patterns), let blocks with patterns, and let rec blocks with patterns. In particular, what are the semantics and the possible desugaring/compilation of each? And more importantly, is each combination meaningful, or can we get away with restricting the possibilities? I'm fine with restricting let blocks, but I don't think the fact that it wouldn't work well with how we currently compile patterns is a good motivation on its own, without either a plan B for how it could work in the future, or a good reason to rule out let rec blocks with patterns because they're expected to be much less useful.

@jneem (Member, Author) commented Jul 30, 2024

> We're trading the let-binding structure for something more complicated

Oh, good point. I think a Vec should be fine, and probably even a SmallVec or similar. I'll benchmark the options.

I think it isn't too difficult to support let blocks with patterns, so let me try to do that. If that works, the forbidden combination would be rec + pattern, which is a bit arbitrary but at least easy to explain. Ultimately, it would be nice to have no restrictions at all. I don't think that's hard to fix; it will just take some work. The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

@jneem (Member, Author) commented Jul 31, 2024

Ok, I've added support for patterns in let blocks.

I also measured the performance. It looks like the IndexMap hurt performance a bit, and changing it to SmallVec is as fast as the original. It's a little hard to be sure, since the benchmarks are pretty noisy.

@yannham (Member) commented Jul 31, 2024

So, the semantics of let pat1 = val1, ..., patn = valn in body is let [pat1, ..., patn] = [val1, ..., valn] in body. I think that's fair, although probably a bit wasteful as an actual desugaring; it's ok for a new feature, though. It's also likely quite easy to compile a sequence of patterns directly in a more efficient manner, without allocating, building, and destructuring an array (I'm thinking of implementing a variant of Compile for something wrapping [Pattern] and taking a sequence of arguments, or something like that).
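Spelled out on a small instance (my own instantiation of the equation above): the let block

let a = 1, { foo } = { foo = 2 } in a + foo

would, under this semantics, mean the same as

let [a, { foo }] = [1, { foo = 2 }] in a + foo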

> The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

I think pattern compilation is already doing something close to what you want. It basically compiles a pattern (given the term to match on) to an expression that evaluates either to null or to a record of the bindings introduced by the pattern. The final step is to call %pattern_branch% or something similar that evaluates the branch in an environment augmented with the bindings. We could change this last step to something more involved, but I fear it's going to be more complicated: before we can bind any of the pattern variables in the environment, we first have to evaluate part of the matched value. I suspect we'll need something similar to what we do for recursive records, that is, patching the environment afterwards to "tie the knot" properly.
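As a rough picture of that intermediate form, here is a hand-written approximation in plain Nickel of matching the open record pattern { foo, .. } against a value (the real compiler emits internal primops rather than these stdlib calls; this is only my illustration of the shape of the result):

# Evaluates to a record of the bindings introduced by the pattern on
# success, and to null on failure.
fun v =>
  if std.typeof v == 'Record && std.record.has_field "foo" v then
    { foo = v.foo }
  else
    null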

In general, for recursive let blocks with patterns, neither the desugaring nor the possible compilation is entirely clear to me right now. Looking at a single recursive destructuring let could be interesting, something like let {x=a, y=b} = ...whatever uses a and b... in body. Once again, before even being able to build the recursive environment for a and b, you first need to partially evaluate the matched value. Maybe we can fill the environment with "bombs" initially, like InfiniteRecursion errors (because obviously, if we need a before we can even bind a, this isn't going to work). But then we will bind a to a closure with those bombs in the environment, and we'll need to patch them later with the actual recursive environment. As far as I can tell, this requires a bit more machinery than what's currently available, or at least repurposing part of the recursive records' machinery to that end.

@jneem (Member, Author) commented Jul 31, 2024

What about let rec {x=a, y = {z = c}} = foo in bar becomes

let rec
  %0 = foo,
  a = %0.x,
  %1 = %0.y,
  c = %1.z
in
bar

I think this extends to handle arrays pretty easily. I'm less sure about constants and ADTs. Maybe those could be handled by extending the "primitive" (i.e. non-pattern) let term to allow constants or patterns of the form 'Foo x.
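For instance (my sketch, reusing the %n notation above for compiler-generated names), let rec [a, {x = b}] = foo in bar could become:

let rec
  %0 = foo,
  a = std.array.at 0 %0,
  %1 = std.array.at 1 %0,
  b = %1.x
in
bar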

@yannham (Member) commented Aug 1, 2024

> What about let rec {x=a, y = {z = c}} = foo in bar becomes
>
>   let rec
>     %0 = foo,
>     a = %0.x,
>     %1 = %0.y,
>     c = %1.z
>   in
>   bar

So, if we have {x=[{y=[c]}]}, we'll need to have c as a top-level binding, I guess. It seems like pattern matching, except that you bubble up all the bound variables as top-level let-bindings.

I think this illustrates my point: we now have to (re?)define the semantics of patterns for pattern matching and destructuring separately, and probably also have separate implementations, as we used to (beyond the fact that I trust you can find a reasonable semantics for recursive destructuring). But as before, the two might diverge or have different evaluation strategies, which can be surprising. It's not necessarily a no-go, but it's not very satisfying. I just wonder if we could unify both in one core compilation scheme (even if the recursive side is never actually used in match expressions but only in destructuring).

I think what the pattern matching compilation is doing is pretty close in spirit already; it just does that in pure Nickel code, by putting stuff in a record in several steps instead of bubbling up all the bindings at the top level. One possibility is to make pattern compilation more like what you propose. Another is to do what I propose, and patch the result at the end. The issue with the latter is the infinite recursion case (the current pattern matching compilation would cause an unbound variable error instead), but I think putting "bombs" in the environment and shadowing them at the end, once we can build the recursive environment, is actually maybe not that bad.

@yannham (Member) left a comment:

I did a more thorough review. It looks good; I'm just requesting changes to block merging, because I think we shouldn't include this in 1.8, which is too close.

I think it could have more tests, in particular for typechecking recursive let-blocks or let-blocks with patterns.

It's also a bit annoying that we often have to do two traversals of the bindings just for the recursive case, as mentioned in a comment. But I guess we can live with it.

Review threads (resolved) on: core/src/parser/utils.rs, core/src/pretty.rs, core/src/transform/desugar_destructuring.rs, core/src/transform/free_vars.rs, lsp/nls/src/position.rs.

// Fill each bound value; recursive bindings see the environment
// extended with the bindings themselves, non-recursive ones see
// the outer environment.
for (_, val) in bindings {
    self.fill(val, if attrs.rec { &new_env } else { env });
}
Member:

It's probably not going to have any important consequence, but it's a bit annoying that we have to do this double iteration each time, just in case the let is recursive. One possibility would be to distinguish both cases in the match and have a different implementation for each, which obviously leads to a bit of code duplication. Or add ifs wherever needed (to do everything in the first for and elide the second in the non-recursive case), but I fear this would make the code not very readable. So I don't have a great idea nor a strong opinion. It's just... itchy 😛

Member Author:

Yes, I definitely agree. This was the clearest version I could come up with, even though it isn't the most efficient.

github-actions bot (Contributor) commented Aug 5, 2024

🐰 Bencher report, Thu, August 8, 2024 at 03:25:21 UTC. Project: nickel. Branch: 2010/merge. Testbed: ubuntu-latest.

Benchmark       Latency (ns)
fibonacci 10      491,960.00
pidigits 100    3,193,300.00
product 30        796,660.00
scalar 10       1,468,900.00
sum 30            795,990.00

@jneem requested a review from yannham August 5, 2024 10:29
@jneem (Member, Author) commented Aug 5, 2024

I took a look at doing recursive pattern bindings by putting "bombs" in the environment, but I had trouble working out when to replace them. Like, in

let rec
  { foo = a, .. } = b,
  b = { foo = 1, c = a }
in b.c

then we want to start off by poisoning "a", then evaluate "b" to normal form, then execute the pattern binding and unpoison "a", and then evaluate "b.c". But doing things in a different order could easily make us evaluate "a" too early, and I don't see a general way to do it...

Also, one thing I noticed about recursive pattern bindings is that they might want to be lazier than non-recursive bindings. For example, we currently eagerly evaluate enough of the bound value to check whether it matches the pattern, but it isn't too hard to come up with examples of recursive bindings where this leads to infinite recursion that could be avoided; one such example is sketched below.
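One shape such an example could take (my construction, not from the PR):

# Checking the match eagerly forces the `if`, which forces `b`, which
# forces `x`, which needs the match to have completed: infinite
# recursion. A lazier desugaring (binding x to the field access on the
# right-hand side) would only loop if x were actually demanded, and
# the body never demands it.
let rec
  { x } = (if b then { x = 1 } else { x = 2 }),
  b = x == 1,
  result = "done"
in result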

@yannham (Member) commented Aug 5, 2024

Hmm. The more we talk about it, the more it seems like recursive destructuring is maybe something we should avoid entirely. Because if we have trouble defining the implementation and laziness of this scheme, it's most likely neither obvious nor natural (in the mathematical sense of canonical) for the user either.

@jneem (Member, Author) commented Aug 6, 2024

Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

let rec
  a = { foo = 3 },
  b = a.foo + 1
in b

but then if I want to refactor it to

let rec
  { foo } = { foo = 3 },
  b = foo + 1
in b

then I need to break up the let block into a sequence of lets. But I agree: if we don't have good semantics in general, then it doesn't make sense to add it.

@jneem (Member, Author) commented Aug 6, 2024

Ok, I tried writing tests, and there are a bunch of problems with typechecking recursive let blocks. We'll need to do something more like what recursive records do, but I need to understand that first...

@jneem marked this pull request as draft August 6, 2024 04:37
@yannham (Member) commented Aug 6, 2024

> Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

One possibility would be to rule out the more complicated cases, because it's not obvious that they are really useful in practice. For example, we could require that the right-hand side of a destructuring let rec must evaluate to a WHNF without needing the recursive bindings; then we could do the poisoning approach without caring about the order.
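To illustrate the proposed restriction (my examples, not from the thread):

# Fine: the right-hand side is already a record literal, i.e. a WHNF;
# the recursive binding bar only occurs inside a field, lazily.
let rec
  { foo } = { foo = bar + 1 },
  bar = 1
in foo

# Ruled out: reaching a WHNF for the right-hand side already requires
# evaluating the recursive binding bar.
let rec
  { foo } = (if bar then { foo = 1 } else { foo = 2 }),
  bar = true
in foo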

Otherwise, I reckon the right semantics for recursive destructuring is probably the old one, where it's just desugared into a sequence of lazy let-bindings. However, I think there were non-trivial issues around default values, laziness (something like let 'Foo 5 = value in ... would be transparent because the pattern doesn't bind anything, while today it's desugared to value |> match { 'Foo 5 => ... }, which does check that value is 'Foo 5), and error reporting. I just wonder if recursive destructuring is useful enough to 1) go back to this more complicated situation, with discrepancies between pattern matching and destructuring and more lines of code in the implementation, and 2) break backward compatibility (again).

// FIXME: if we're recursive, do we need to unify the fresh
// type variables with the (modified by the recursive check)
// binding_type? I feel like we should, but it doesn't seem to make
// a difference.
Member Author:

I'm a bit unsure about this point. I tried unifying the fresh utys, like what RecRecord does, but I didn't find any cases where it made a difference.

Member:

Rereading what RecRecord does, it feels a bit over-complicated. I think the approach of this PR is fine: you store the free unification variable (or the type, when there's an annotation) assigned to each binding in tys, and use those directly for each check later.

The RecRecord case does things a bit differently. It enriches the environment in the same way, either with a fresh unification variable or with a type for fields that have an apparent type. But then it proceeds to build a record type {id1: ?a1, ..., idn: ?an} where the unification variables are entirely free and have no link whatsoever with the ones inserted in the environment previously (the difference with your approach). It unifies this record type with the expected type of the whole expression, to first ensure that we are checking a record literal against a record type.

Then, in a second step, it picks the fields that didn't have any annotation/apparent type, and unifies the free-standing variables from the generated record type with what's in the environment. This step is required, because otherwise something like {foo = 1 + 1, bar = foo ++ "a"} : _ would typecheck: foo would be bound to String in the environment, the inferred type would be {foo : Number, bar : String}, and we'd be missing the constraint that the inferred type foo : Number must agree with the binding foo : String in the environment.

The only reason I can see why we do that, besides the possibility of simply having overlooked a simpler solution, is as a form of optimization. Indeed, in the recursive record case, when fields are annotated (you can expect that to happen quite often), the current approach eschews some useless identity unifications. Typically, in this PR, with let rec foo : SuperComplexType = ..., bar : OtherComplexType = ... in body, we will put (foo, SuperComplexType) and (bar, OtherComplexType) in tys. Then we will check each let-bound variable against its type, which means checking foo : T against T (after desugaring, let foo : T = bound, ... is actually let foo = (bound : T)). This will first unify T with itself, which perhaps surprisingly isn't free, in particular if T is polymorphic. Then it will check foo against T, as expected.

So, the recursive record approach eschews the additional unification for annotated fields. Even then, honestly, I think we can do it in a much simpler way in both cases. In tys (or the hashmap that we are building in the recursive record case), we can use fresh unification variables that are entirely free for fields/bindings that are annotated, because the annotation will take care of the typechecking, and the annotation is also already used for the recursive environment. For fields/bindings that aren't annotated, we would use the unification variable inserted in the environment, instead of first generating entirely free variables for every field/binding and then, in a second step, unifying each variable with the one in the environment.
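The let-block analogue of the record example above shows the same constraint at work (my adaptation):

# Must be rejected: bar's body forces foo's environment variable to
# String, while foo's definition infers Number. Without linking the
# environment's variable to the inferred type, the mismatch would go
# unnoticed.
(let rec foo = 1 + 1, bar = foo ++ "a" in bar) : _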

@jneem marked this pull request as ready for review August 7, 2024 10:45
@yannham (Member) left a comment:

Should we add some tests for the LSP as well? Also, we need to update the documentation, at least for the syntax.

Because of the documentation issue, and because I don't want to ship a last-minute new feature in the next version, I'm going to request changes, but it's just for the sake of delaying the merging of this PR.

Otherwise, we might have a look at what other languages do. I realized, while discussing with other people from the group, that Haskell has something called lazy patterns that might be the semantics we want for destructuring. It also has a form of destructuring (although it's just desugared to pattern matching, so I suspect it doesn't support any form of recursion). I don't know much about Scala, etc., but there might be inspiration there.


@jneem mentioned this pull request Sep 6, 2024