Clarify how PLAINTEXT elements may contain child nodes. #10540

Merged
merged 8 commits into from
Aug 14, 2024

dmsnell
Contributor

@dmsnell dmsnell commented Aug 2, 2024

Resolves #8009

If there are active formatting elements when the parser encounters a start tag whose name is PLAINTEXT, subsequent character tokens may reconstruct them. The spec, however, implies that this should not happen, because PLAINTEXT effectively disables HTML parsing for everything that follows it.

> Once a start tag with the tag name "plaintext" has been seen, that
> will be the last token ever seen other than character tokens
> (and the end-of-file token), because there is no way to switch out
> of the PLAINTEXT state.

The note is easy to misread: the tokenizer remains in the PLAINTEXT state, but the tree builder continues to apply the normal rules for its insertion mode, and those rules are where reconstruction of the active formatting elements may be triggered.

Although this seems to contradict the purpose of the PLAINTEXT element, all major browsers behave this way, and a clarified note in the spec could help implementers avoid misreading it (as I did).
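
To make the interaction concrete, here is a minimal sketch of a toy tree builder, not the spec algorithm. It assumes a pre-tokenized input, a hypothetical subset of formatting element names, and a simplified reconstruction step that re-opens every active formatting element missing from the stack of open elements. The names `Node`, `build`, and `FORMATTING` are made up for this illustration. The input models `<p><b>foo</p><plaintext>bar</b>`, where everything after `<plaintext>` arrives as character data.

```python
from dataclasses import dataclass, field

# Hypothetical subset of formatting element names, for illustration only.
FORMATTING = {"a", "b", "i", "em", "strong"}


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    name: str
    children: list = field(default_factory=list)  # child Nodes or str text

    def outline(self, depth=0):
        """Return the subtree as a list of indented lines."""
        lines = ["  " * depth + f"<{self.name}>"]
        for child in self.children:
            if isinstance(child, Node):
                lines.extend(child.outline(depth + 1))
            else:
                lines.append("  " * (depth + 1) + repr(child))
        return lines


def build(tokens):
    root = Node("body")
    stack = [root]  # stack of open elements
    active = []     # list of active formatting elements

    def insert_element(name):
        node = Node(name)
        stack[-1].children.append(node)
        stack.append(node)
        return node

    def reconstruct():
        # Simplified: re-open each active formatting element
        # that is no longer on the stack of open elements.
        for i, entry in enumerate(active):
            if entry not in stack:
                active[i] = insert_element(entry.name)

    for kind, value in tokens:
        if kind == "start" and value in FORMATTING:
            reconstruct()
            active.append(insert_element(value))
        elif kind == "start":
            # p and plaintext: inserted without reconstruction.
            insert_element(value)
        elif kind == "end" and any(n.name == value for n in stack[1:]):
            # Pop up to and including the named element.
            while stack.pop().name != value:
                pass
        elif kind == "chars":
            # Character tokens still go through the normal rules,
            # so reconstruction can happen inside plaintext.
            reconstruct()
            stack[-1].children.append(value)
    return root


tokens = [
    ("start", "p"),
    ("start", "b"),
    ("chars", "foo"),
    ("end", "p"),            # closes p and pops b; b stays in `active`
    ("start", "plaintext"),  # the tokenizer would now be in the PLAINTEXT state
    ("chars", "bar</b>"),    # the rest of the input is only character data
]
print("\n".join(build(tokens).outline()))
```

In this sketch the first character token after `<plaintext>` re-opens `b`, so the `plaintext` node ends up with a `b` child that holds the text `bar</b>`, rather than holding the text directly.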

Before
Screenshot 2024-08-05 at 12 09 29 PM

After
Screenshot 2024-08-13 at 1 34 19 PM


/parsing.html ( diff )

Resolves whatwg#8009

All major HTML parsers reconstruct active formatting elements when
inserting a new PLAINTEXT element, leaving formatting elements as
children of the PLAINTEXT element.

However, the spec implies that this should not happen, because it
doesn't instruct reconstruction. The implication in the spec is that
a PLAINTEXT element may contain no children other than the plaintext
content of the remainder of the HTML document.

> Once a start tag with the tag name "plaintext" has been seen, that
> will be the last token ever seen other than character tokens
> (and the end-of-file token), because there is no way to switch out
> of the PLAINTEXT state.

This patch updates the spec to match the existing implementations
by explicitly stating that reconstruction is triggered.
@zcorpan
Member

zcorpan commented Aug 2, 2024

See #8009 (comment)

@dmsnell dmsnell changed the title Reconstruct active formatting elements for PLAINTEXT element. Clarify how PLAINTEXT elements may contain child nodes. Aug 5, 2024
@dmsnell
Contributor Author

dmsnell commented Aug 5, 2024

thanks @zcorpan - I have updated the patch and included screenshots of the changed section. I think that explicitly calling out that active format reconstruction may take place, and that PLAINTEXT elements may have child nodes, would be a worthwhile addition to the note.

dmsnell and others added 2 commits August 6, 2024 15:12
Co-authored-by: Simon Pieters <zcorpan@gmail.com>
@dmsnell
Contributor Author

dmsnell commented Aug 8, 2024

Thanks @zcorpan!

Co-authored-by: Anne van Kesteren <annevk@annevk.nl>
@annevk
Member

annevk commented Aug 13, 2024

@dmsnell you will also need to make your membership of the "automattic" GitHub organization public to satisfy the IPR bot.

Co-authored-by: Anne van Kesteren <annevk@annevk.nl>
@dmsnell
Contributor Author

dmsnell commented Aug 13, 2024

Thanks again @annevk. I've lower-cased the plaintext element reference, and marked my membership as public. Before today I didn't realize there were public and private memberships on my profile.

@annevk annevk merged commit caf70fa into whatwg:main Aug 14, 2024
2 checks passed
@dmsnell dmsnell deleted the reconstruct-on-plaintext branch August 14, 2024 17:47
Labels
needs implementer interest, topic: parser
Development

Successfully merging this pull request may close these issues.

Surprising parsing behavior with active formatting elements and PLAINTEXT
4 participants