Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add steps to destroy documents that are ineligible for receiving message when posting message through broadcast channel #8972

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lozy219
Copy link

@lozy219 lozy219 commented Mar 2, 2023

As discussed in #7253, we should clearly spec that when posting message through broadcast channel, if the destination is not eligible for messaging, the associated document should be made unsalvageable. This PR add some more steps to perform this in the 9.5 Broadcasting to other browsing contexts section.

cc @fergald @rakina @domenic @annevk @smaug---- @asutherland please help to take a look 🙇


/browsing-the-web.html ( diff )
/web-messaging.html ( diff )

@lozy219 lozy219 changed the title Add steps to destroy the ineligible document when posting message through broadcast channel Add steps to destroy documents that are ineligible for receiving message when posting message through broadcast channel Mar 2, 2023
@fergald
Copy link

fergald commented Mar 3, 2023

@domenic could you take a look at this please and advise on how to land it?

source Outdated Show resolved Hide resolved
source Outdated
<ul>
<li><p>If <var>destination</var>'s <span>relevant global object</span> <var>o</var> is a
<code>Window</code>, <span data-x="destroy a document">destroy</span> <var>o</var>'s
<span data-x="concept-document-window">associated <code>Document</code></span>.</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth adding a note clarifying that it is OK to trigger "destroy a document" here because the non-fully active document is either detached or is a BFCached document, which shouldn't receive BroadcastChannel messages and can't be restored after missing this message (maybe also link to https://w3ctag.github.io/bfcache-guide/#gate-fully-active)

@domenic
Copy link
Member

domenic commented Mar 3, 2023

@domenic could you take a look at this please and advise on how to land it?

Will do, but it will likely take a few days as my review backlog is quite large at the moment.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with some colleagues and we're not really okay with destroying the cache for this case. We currently deliver the messages, but we are open to not delivering them, if that would be better. I suspect that really depends on how BroadcastChannel is used so it's probably not possible to pick something that works well everywhere. Though perhaps we could pick a default and let websites configure it in the constructor.

Edit: as per discussion below WebKit does not deliver the messages.

@fergald
Copy link

fergald commented Mar 7, 2023

@annevk From my testing (see below) Mozilla and WebKit (desktop) both drop these if the receiver is in BFCache. Do you have a demontration page where they are queued and delivered?

  • BroadcastChannel
  • Chrome destroys (actually will not allow the page to enter BFCache but we want to relax this)
  • FF/WebKit drops

My reading of the spec right now is that these should be dropped because they are only delivered to active documents. A big concern is that BroadcastChannel is used to log out all tabs when one tab logs out. Dropping these will result in logged-in state being restored from BFCache, which is a serious privacy problem

In general, an API that is reliable until BFCacheing occurs will lead to hard to reproduce bugs.

We're fine with queuing but it would be hard to spec limits on queue size. Leaving that implementation dependent is an option. @asutherland how do you feel about that?

If we queue them, I think the timing is a bit ambiguous. Should they run as soon as the document is active again, i.e. before visibilitychange and pageshow events? That's my fairly naive reading of the spec.

@cdumez
Copy link

cdumez commented Mar 7, 2023

WebKit doesn't drop b/f cache entries based on BroadcastChannel (or any other Web APIs for that matter). We also do not prevent entering the b/f cache based on BroadcastChannel.

B/f cache entries are very valuable and I personally do not think it is a good idea to discard its entries based on something like BroadcastChannel messages being broadcasted.

@asutherland
Copy link

Re: @fergald

We're fine with queuing but it would be #7253 (comment) limits on queue size. Leaving that implementation dependent is an option. @asutherland how do you feel about that?

I think we're eventually going to need to spec the size of structured serialization and storage actions, so I don't think we should view that as insurmountable (but I'm also unable to do the spec work at this time, so I won't presume we have that for this discussion).

A compromise if we are going to queue/buffer things is to set a minimum number of messages that will be buffered, like 3. @cdumez would webkit be interested in supporting buffering on a per-BroadcastChannel basis? (That is, each BroadcastChannel would have its own buffer as opposed to having a shared buffer across all BroadcastChannels associated with a global.) Buffering per-channel allows a page to partition the messages they really want reliable-ish transport for to happen on a rarely used channel, like for "log out" versus "spammy heartbeats".

The upper bound could be implementation dependent, which also makes sense because browsers are going to have implementation-dependent heuristics and underlying implementation costs that determine when a bfached thing is no longer worth its cost.

If we queue them, I think the timing is a bit ambiguous. Should they run as soon as the document is active again, i.e. before visibilitychange and pageshow events? That's my fairly naive reading of the spec.

It seems desirable that a page can unambiguously know if it's seeing buffered messages from BroadcastChannel (or other APIs; ex: some discussion re sessionStorage noting that storage APIs have it easier since they have a canonical backing store). It seems like there are some options for this:

  • We add a flag to Events or their bufferable sub-classes that explicitly indicates if they're coming from buffer playback or not. It might be useful to note if the buffer overflowed or not, as one could imagine very complex pages could have a fast path and a slow path that varies on whether there was an overflow.
  • We book-end the buffered events so that we schedule the "pageshow" event task, then all of the buffered tasks, then an "all-done-with-buffered-tasks" event task or a promise that we added to "pageshow" that resolves once all the tasks have been dispatched. This has the advantage of letting page logic defer actions until it knows it is caught up to realtime which seems important. I expect in the absence of this, pages may just use setTimeout(0) which probably would work but limits implementation flexibility in the future. (Although I'm not sure of a situation where it would be desirable to let a freshly created timer interleave with the buffered event playback).

@petervanderbeken
Copy link

* FF/WebKit drops

Firefox doesn't drop messages, it removes the page from the BFCache if a message is sent to the open BroadcastChannel while in the BFCache.

From what I can tell by testing, WebKit keeps the page in the BFCache but drops the messages sent to the open BroadcastChannel while in the BFCache. @annevk, could you confirm that that's actually WebKit's behaviour? That seems quite weird as a behaviour to me. I think it makes more sense to either remove the page from the BFCache, or to queue the messages. We chose the first one because queueing is definitely more complicated (need to find a sensible limit, …).

@fergald
Copy link

fergald commented Mar 7, 2023

Clarifications. When I said "drops" I meant drops the message. So the current state is

  • Chrome: no bfcache with BC
  • Mozilla: evicts if message is received while in BFCache
  • WebKit: drops message

I thought I saw Mozilla behaving the same as WebKit but I cannot reproduce it.

To make life a little easier, I have created a WPT that sends a message to a page in BFCache. It is NOTRUN on Chrome and Mozilla (as expected). I don't have a way to test it on WebKit but I expect it will FAIL.

@fergald
Copy link

fergald commented Mar 8, 2023

Can we agree on something before discussing details?

If we restore a page from BFCache it must either

  • receive all BC messages or
  • have opted-in to dropping messages and can find out when that happened

@cdumez @annevk That is not WebKit's current behaviour, so I would particularly like your response on this.

@petervanderbeken I think you already agree but an explicit response would be helpful.

@asutherland this may be slightly different to your position because it requires an explicit opt-in. This doesn't prevent queuing by default however it means that if the queue overflows, we evict, unless the page explicitly told us that it's OK to drop.

@cdumez
Copy link

cdumez commented Mar 8, 2023

Can we agree on something before discussing details?

If we restore a page from BFCache it must either

  • receive all BC messages or
  • have opted-in to dropping messages and can find out when that happened

@cdumez @annevk That is not WebKit's current behaviour, so I would particularly like your response on this.

I am not opposed to: receive all BC messages when restoring a page from the b/f cache (probably up to a certain limit)
I thought this was what we implemented but it wouldn't surprise me if we were dropping messages instead.

Either way, we haven't noticed yet any breakage from our current behavior.

@petervanderbeken
Copy link

@petervanderbeken I think you already agree but an explicit response would be helpful.

I discussed it a bit with @smaug----, we still prefer no queueing. Queueing is complicated to spec and implement, ideally you'd then want to specify the limits (taking into account message sizes too). We already have compat issues around limits of just the length of the session history itself for example.

So our position would be: no queueing, evict on message with an opt-in for dropping messages instead.

@fergald
Copy link

fergald commented Mar 8, 2023

@petervanderbeken Do we need to spec the details of queueing?

I think you and I agree there should be only 2 outcomes:

  • document is restored from BFCache without missing messages
  • document is reloaded

To achieve that, we can just say that the messages are queued and delivered if the document becomes active again. If a browser wants to evict with no queueing they are in spec because the document never becomes active again.

@fergald
Copy link

fergald commented Apr 18, 2023

Pinging this. Does anyone object to speccing the following?

  • documents with an open BroadcastChannel can go into BFCace
  • documents reactivated from BFCache should receive all the messages that would have arrived
  • (doesn't really need to be said but for clarity here) browsers are free to destroy the document and evict the BFCache entry if they want to e.g. if the queue of messages gets too big

@petervanderbeken
Copy link

We're ok with that proposal. We will probably keep our current behaviour at first, which is essentially a message queue length of 0, but the third bullet point allows for that.

I think the main question is if WebKit is interested in changing their behaviour of dropping messages.

@annevk
Copy link
Member

annevk commented May 8, 2023

That depends on how often it would mean a cache eviction where currently there is none. We don't really want to regress on that as we haven't seen issues with our current behavior.

@fergald
Copy link

fergald commented Jun 1, 2023

@annevk

That depends on how often it would mean a cache eviction where currently there is none. We don't really want to regress on that as we haven't seen issues with our current behavior.

Less than 0.4% of history navigations in Chrome have a BC open. I do not know what fraction of that would receive messages.

I can't find any discussion or documentation of BC being unreliable beyond this thread. Before #7296, the spec seemed to say that messages should always be delivered. I don't think devs are aware that it is not reliable, especially as it is reliable on Firefox and Chrome. Pages using BC as a heartbeat will be fine but sites using BC to send incremental updates are definitely broken by this probably in ways that are very hard to trace back to BC+BFCache.

I also don't know what guidance we could give to sites that want a reliable BC. Rebuilding it from other APIs seems hard. If you don't care about reliability but do care about BFCache, closing your BCs in pagehide is easy.

@domenic
Copy link
Member

domenic commented Jun 1, 2023

We discussed this in the triage call (#9308) today. Summary:

  • WebKit would be open to a queue with an implementation-defined limit while in bfcache, with implementation-defined behavior when the limit is reached. Others on the call were not excited about the implementation-defined post-limit behavior.
  • To reach our standard of "2+ implementers support, no strong implementer objections" on requiring eviction when the limit is reached, @annevk would need to check with the rest of the WebKit team as to whether they would strongly object to evicting on exceeding a limit.
    • What would help here is more-specific numbers, which @fergald could help collect, e.g.: what percent of back navs have a BC channel open and receive 1+ messages, 2+ messages, N+ messages, etc. (Or maybe in KiB...)
    • I pointed out that if WebKit sets a limit on the same order of magnitude as "a bfcached web page", it's very unlikely that limit will ever be reached.
  • There was some discussion of how this is similar to, or not similar to, storage events. It seems like previously the decision was to not fire any storage events for bfcached pages: Session storage and changing browsing contexts storage#119 (comment) although this was never reflected into the spec. @annevk would be comfortable with dropping all BroadcastChannel messages for bfcached pages in this same manner, but others on the call were not excited about making BroadcastChannel a lossy communications channel.
  • In general @annevk was skeptical that sites using BroadcastChannel would correctly respond to queued-and-replayed messages after being taken out of bfcache. Examples of such sites, which would break their functionality with dropped messages but work correctly with replayed messages (especially 2+ replayed messages) would be helpful.

@past past removed the agenda+ To be discussed at a triage meeting label Jun 2, 2023
@fergald
Copy link

fergald commented Jun 15, 2023

I don't have data on the fraction of restores that would be impacted by a queue size of N. Gathering that would require implementing this change. I also can't test that nothing fails with queueing because that requires implementing queueing, I can only look at the code.

In general in the examples below, I cannot see a reason why anything would end up in an incorrect state if messages were delivered after restore. I can see how some cases may result in a lot of work being performed if multiple events are delivered.

There is no timing guarantee on BC messages or guarantees of ordering with respect to other message channels. So I would say that if an app cannot correctly receive 2+ messages on restore, then it was already buggy without BFCache. Although for some apps, the day-to-day likelihood of hitting such a bug could be zero.

If we believe that 2+ messages is commonly problematic, that seems like an argument for a max queue size of 1.

Examples

TL;DR

  • it's easy to find things that will break with dropped messages, including some widely used code (Firebase)
  • there are some examples that looked like they would be OK with dropped messages
  • I couldn't find any that would obviously break with a queue
  • replaying a large queue on restore will be a performance problem for some pages

Sources

I got most of these from https://snyk.io/advisor/npm-package/broadcast-channel/example. This is basically a polyfill for BroadcastChannel, defaulting to native BroadcastChannel. In the first 5 examples, 4 look like a problem and 1 looked like it was using BroadcastChannel to broadcast within the same document, so would not be a problem. I don't know how popular these apps/packages are.

The Firebase example comes from searching httparchive, it is very widely used. I found it very hard to uncover other understandable usages of BC via httparchive, so I stopped.

I also searched Google's internal source but it's hard for me to know what's launched/public and everything gets minified and finding the public version of the JS is a chore, so I'm not sharing any examples from there but there were a couple that looked problematic.

Telegram

TDWeb is telegram.org's library for building web clients. BroadcastChannel is used in the closeOtherClients method. It appears that this is called whenever a client is constructed, I suppose it ensures that you only have 1 live client at a time. This would fail if there was a client in BFCache and the message was dropped. It seems like it would be fine to deliver 2+ of these.

Firebase

https://cs.opensource.google/firebase-sdk/firebase-js-sdk/+/master:packages/installations/src/helpers/fid-changed.ts;l=97?q=%22new%20BroadcastChannel%22
broadcasts changes to the FID which I believe is the user-id. A tab that misses this broadcast will remain on the old user ID. This sounds bad if all tabs are supposed to be on the same FID.
Again, I don't see why delivering 2+ of these would break anything but if dealing with them is costly, it could result in jank.

RXDB

RXDB is a JS database that uses BroadcastChannel to make sure all tabs using the DB are in sync. That's going to fail if messages are dropped.
I don't see any reason for it to fail if messages are delivered after BFCache restore.

Lektor

Lektor is some kind of CMS. It sends a signal to reload all open tabs when changes are made. This signal comes to a worker which then sends it via BroadcastChannel. A tab coming back from BFCache would stay out of sync.
Delivering 2+ of these seems like it would not break, just reload excessively.

@lozy219 lozy219 force-pushed the destroy-ineligible-document-from-broadcast-channel branch from 202f9ae to 7a3925f Compare September 20, 2023 06:44
source Show resolved Hide resolved
browsing context</span>'s <code>WindowProxy</code>'s <span data-x="concept-windowproxy-window">
[[Window]]</span>.</p></li>

<li><p>Let <var>channels</var> be a list of <code>BroadcastChannel</code> objects whose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In step 9 of postMessage, it looks like the channels are sorted by creation order. Should we add that step here as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when broadcasting a message to multiple channel, we have to sort them in order to reserve the order, but here it's flushing all the messages received on different channels on page reactivation, it would be a bit subtle to define what's the correct order of the channel given the messages could be received from different time.

Also, the sorting behavior seems not compulsory with this sentence:

This does not define a complete ordering. Within this constraint, user agents may sort the list in any implementation-defined manner.

I will leave this open first, and change the content if necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The messages from each channel's queue might be from different times, yes. But for each message, the ordering of the channel that got them should be the same right? (e.g. message X is received and there are channels A & B, and X should be received by the older channel first, regardless if X is sent immediately or later on after being queued). I think it would be good to preserve that property in this case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have added a similar sentence here. Since all the channels here should have the same "relevant agents" so maybe we just need to mention the order. Is that good enough?

source Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Show resolved Hide resolved
source Show resolved Hide resolved
source Show resolved Hide resolved
source Show resolved Hide resolved
@lozy219
Copy link
Author

lozy219 commented Sep 22, 2023

Hi @annevk , from #8972 (comment) , it looks like we have reached an agreement that mentioning the queueing behavior with a implementation defined capacity is acceptable. We have made some updates ub this pull request, could you help to take a look and confirm if there is any objection from WebKit side?

Also does @fergald 's examples in #8972 (comment) resolved your concern about if sites are able to handle dropped messages? Do you need us to collect any more data to support the proposal?

Thanks.

@past past added agenda+ To be discussed at a triage meeting and removed agenda+ To be discussed at a triage meeting labels Oct 31, 2023
@annevk
Copy link
Member

annevk commented Nov 16, 2023

I feel like there's still no significant data either way that WebKit's existing behavior is problematic. And while we're open to adjusting it in terms of attempting to replay messages, we'd probably still err on the side of wanting to revive a document over replaying its messages. When forced to make a choice between those that is.

@fergald
Copy link

fergald commented Jun 14, 2024

Getting back to this as we have someone working on it again.

I feel like there's still no significant data either way that WebKit's existing behavior is problematic. And while we're open to adjusting it in terms of attempting to replay messages, we'd probably still err on the side of wanting to revive a document over replaying its messages. When forced to make a choice between those that is.

I've linked to code that definitely breaks, the firebase code will leave a page in the logged-in state when it should be logged out. Nobody has ever debugged this hard-to-repro problem to the point of filing a bug against Safari but the problem is still there.

If we adopt this behaviour then we need to document that BroadcastChannel is not a reliable communication channel.

I am curious what else Safari does. Does it drop or deliver storage events that occur while in BFCache?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

9 participants