Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the AsyncContext store be a weak map? #76

Open
andreubotella opened this issue Mar 25, 2024 · 15 comments
Open

Should the AsyncContext store be a weak map? #76

andreubotella opened this issue Mar 25, 2024 · 15 comments

Comments

@andreubotella
Copy link
Member

andreubotella commented Mar 25, 2024

The AsyncContext store is a mapping from AsyncContext.Variable objects to arbitrary JS values, which cannot be iterated, and whose entries can't be observed unless you have the AsyncContext.Variable key. This would allow implementing the store as a weak map, where the values need not be kept alive after the key has been GC'd.

As far as the spec is concerned, an object or symbol in an internal field isn't considered to be live if it cannot be returned to JS (and isn't compared by identity with other values). So this allows implementations to treat the store as a weak map. Implementations are also not required to do so, since the spec doesn't require that any particular non-live object is observed as having been GC'd as far as FinalizationRegistry and WeakRef are concerned. So the AsyncContext spec text as written allows both options.

This issue has previously been discussed in the biweekly meetings or in the Matrix chatroom for the proposal, with the conclusion being that the AsyncContext store could be implemented as a regular (non-weak) map. I think this was mostly advocated for by @littledan and @legendecas, arguing that in almost every use case the AsyncContext.Variable keys would be kept alive for the duration of the program.

However, there are no issues opened (or closed) about this, and no mention of this on the readme, so this seems to be a pretty obscure point. In fact, I recently submitted a design doc for the implementation of AsyncContext in V8, which didn't mention anything about GC with respect to the entries of the store, but it seems like reviewers still assumed that it would behave like a weak map.

Since this affects not only memory usage but also GC performance, I think that if we decide that the store should be a strong map, we should have good and documented reasons for it, and set expectations. This would have to be mentioned in the readme and in a note in the spec text, as well as mentioned in plenary.

@Qard
Copy link

Qard commented Mar 26, 2024

I would be favour of it being a WeakMap. While it is true that generally stores will exist for the life of the program, it's also true that in an ecosystem as large as JS things will inevitably be used wrong all the time. 😅

The AsyncContextFrame design which Cloudflare (and my in-progress change to Node.js) uses is presently a Map, but could equally work as a WeakMap. My only concern would be that we need to measure the performance difference of reading from stores with either implementation in real-world scenarios--reading stores tends to happen a bit more than writing stores, but not nearly as often as propagating the current store. As long as reusing/copying/propagating the store from one async scope to the next is low-cost it should be generally fine, I would expect.

@littledan
Copy link
Member

WeakMaps are more expensive GC-wise than Maps, and I don’t understand what benefit they would bring. Maybe we could find a scenario where we expect WeakMap to enable GC where a Map wouldn’t? I just have trouble picturing it.

@mhofman
Copy link
Member

mhofman commented Apr 1, 2024

I really don't understand why it matters since from what I understand no spec steps iterate or otherwise return the keys. It effectively is a WeakMap and the implementation can take whatever approach it wants there. Would an editorial note highlighting this not be sufficient?

@andreubotella
Copy link
Member Author

andreubotella commented Apr 1, 2024

I really don't understand why it matters since from what I understand no spec steps iterate or otherwise return the keys. It effectively is a WeakMap and the implementation can take whatever approach it wants there. Would an editorial note highlighting this not be sufficient?

As far as the spec text is concerned, yes. But if we expect that implementations would treat the store as a map, that needs to have an effect on developer-facing documentation, in that developers should be made aware that values can leak. That's why I don't think this decision can be completely up to the implementations' discretion.

@mhofman
Copy link
Member

mhofman commented Apr 1, 2024

I think I missing the context of why, if the developer drops the Variable, why we'd want to keep the mapping and associated values alive (until the async context has no more continuations)

@quisido
Copy link

quisido commented Apr 6, 2024

I agree with WeakMap. The following example would be a memory leak, no?

function a(request){
  const aVar = new Variable();

  aVar.run(request.url, () => {
    console.log(aVar.get());
  });
}

server.listen(a);

This would create an entry in the Map for every request, none of which would expire after the request completes.

I don't think "well don't use it that way then" is a good reason not to resolve this for users when WeakMap was invented to solve this.

It's unintuitive to think that each Variable instance leaves behind memory residue after the instance itself has been garbage collected. How many other classes in the language do?

@andreubotella
Copy link
Member Author

andreubotella commented Apr 6, 2024

I agree with WeakMap. The following example would be a memory leak, no?

function a(request){
  const aVar = new Variable();

  aVar.run(request.url, () => {
    console.log(aVar.get());
  });
}

server.listen(a);

This would create an entry in the Map for every request, none of which would expire after the request completes.

I don't think "well don't use it that way then" is a good reason not to resolve this for users when WeakMap was invented to solve this.

It's unintuitive to think that each Variable instance leaves behind memory residue after the instance itself has been garbage collected. How many other classes in the language do?

This depends a lot on implementation decisions, but the way a straightforward implementation of the spec would go (and the way I'm implementing it in V8), this code by itself would not cause a memory leak even if the store is a strong map. This is because calling run creates a clone of the current map with the new entry, and restores the previous one when the function returns.

However, if something inside the callback calls .then() on a promise, or calls an async function, then that promise will keep a reference to that cloned map containing aVar until the promise resolves – and follow-on promises (created by calling .then() or awaiting that promise) will contain clones of that map which might keep the stored value alive, even though the variable needed to access it might not be.

@quisido
Copy link

quisido commented Apr 6, 2024

I'm confused. The run aspect of the example isn't the concern; it's creating a new Variable instance on every request. Per the OP, each request then inserts a key into the context store (variable instance mapped to arbitrary value), and that instance would never clear from memory even after the variable instance's closure has ended.

Smaller example, assuming run doesn't need to be called to be inserted into the context map:

server.listen(() => {
  new Variable();
});

@andreubotella
Copy link
Member Author

andreubotella commented Apr 6, 2024

I'm confused. The run aspect of the example isn't the concern; it's creating a new Variable instance on every request. Per the OP, each request then inserts a key into the context store (variable instance mapped to arbitrary value), and that instance would never clear from memory even after the variable instance's closure has ended.

It's not like there is a single context store that remains alive for the duration of the program. Calling run creates a clone of the store, that is active until the callback returns, and after that, the previous store is restored. If the store active inside the callback is not referenced from anywhere, it will be GC'd, and any other stores will not contain entries for the new variable. But awaiting or calling .then() on promises (or doing other things like creating an AsyncContext.Snapshot or calling setTimeout) will reference that new map, and might keep it alive after the variable is dead.

@littledan
Copy link
Member

What I am asking for is an example of when a variable will be dead while you’re still running code descendant from var.run(). It is hard for me to think of examples because I usually think of AsyncContext.Variable objects being defined at the top level of a module (so, eternal).

@Qard
Copy link

Qard commented Apr 8, 2024

It generally would be eternal, but some may (mistakenly) create a store per-request as per the example. In most cases this should be fine and naturally disappear eventually as snapshots which reference it do, but there are sometimes cases where context loss occurs due to things like connection pools which could result in a snapshot getting recycled over and over as it passes from one request to the next through a broken connection pool propagation. We see this relatively often at Datadog so I think it is something we should consider.

@littledan
Copy link
Member

Huh, OK, variable added per request, then the request gets recycled in a pool... yeah I see how that can cause a leak that could be resolved by this being a WeakMap (though the WeakMap would only work for the cases where you really create your own new variable object; if you reuse an existing one, you still have a leak). I wonder if there's anything we could do to make it less attractive to misuse connection polls this way, with respect to AsyncContext.

@Qard
Copy link

Qard commented Apr 10, 2024

The counterpoint is that while it is a leak, a broken connection pool is going to just keep leaking the same store forever, not a new store every time, so it's likely an inconsequential leak. I'm a bit mixed on which is the better model, they both have trade-offs.

@mhofman
Copy link
Member

mhofman commented Apr 11, 2024

I still don't follow the motivation of the discussion.

The spec defines a lower bound for liveness: if a value can be observed without FR/WR, it is live. A value held in an AsyncContext.Variable clearly cannot be observed if the variable instance is no longer reachable. It effectively is a join on the liveness of the variable and of the async context, which is the same semantics as WeakMap values, which is a join on the liveness of the WM instance and of the key. We would need to change the definition of liveness for the language if you wanted to require implementations to delay collection until a further point. Such a mechanism exists in the form of the kept objects list, which ensures that a WeakRef can be derefed multiple times in the same synchronous execution. I do not believe there is any motivation to do something similar here.

Now if the question is whether an implementation can "simplify" its implementation, use a Map, and thus leak variables and their value, well that's not a spec concern, and I have a laundry list of other places where implementations already leak today (I'm looking at you, promises), and which are not well documented anywhere.

@andreubotella
Copy link
Member Author

andreubotella commented Apr 18, 2024

One of the possibilities for the web integration of AsyncContext is that script-triggered same-origin navigations would use the context in which you start the navigation (e.g. where you set location.href) as the context in which HTML parsing and initial script execution takes place for the navigated page. (This would be observable because you could send AsyncContext.Variable objects to a different same-origin window before the navigation, and get them back afterwards.)

If we end up going with this, then even if most AsyncContext.Variables live for the remainder of the page/realm, contexts that use them can still live past that. So in browsers, this would necessitate a weak map implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants