PHP 8.3: "yield /comment/ from" is no longer a parse error ? #14926

jrfnl · 2024-07-11T21:53:25Z

Description

The following code:

<?php
function generator()
{
    yield from gen2();

    yield /* comment */ from gen2();

    yield
    from
    gen2();

    yield // comment
	from gen2();

    yield
    /* comment */
    from
    gen2();
}

https://3v4l.org/2SI2Q#veol

Resulted in this output:

_no output_

But I expected this output instead:

Parse error: syntax error, unexpected identifier "gen2" in /in/2SI2Q on line 7

For a full analysis of the differences in tokenization, see: PHPCSStandards/PHP_CodeSniffer#529 (comment)

While I'm not necessarily challenging this change, I'd like to know if this was a deliberate/intentional change.

I have not been able to find anything in the PHP 8.3 CHANGELOG about this change, nor in the NEWS file. I can't even seem to find the commit which caused this change.

If this change was intentional, this is probably a documentation issue and the change should be annotated in the PHP 8.3 CHANGELOG.

If this change was unintentional, I believe it may be prudent to revert the change (or at least revert the side-effects of the commit which incidentally caused this change).

I'd like to get some clarity about this as it will inform how the PHP_CodeSniffer issue linked above should be fixed.

PHP Version

PHP 8.3.8

Operating System

Not relevant

The text was updated successfully, but these errors were encountered:

damianwadley · 2024-07-11T22:15:50Z

This was changed by @iluuu1994 during f291d37 in response to #10083...

Tip: for grammar questions, one of the first places to check for answers is zend_language_scanner.

nielsdos · 2024-07-11T22:16:43Z

Damnit, beat me on commenting literally by seconds 😂
Indeed, this is a side effect of the bugfix and imo not a bug.

jrfnl · 2024-07-11T22:37:01Z

I do kind of consider it a bug as it is an undocumented change in behaviour - the changelog entry really doesn't cover the potential/actual impact of the change.

Knowing where this is coming from, I'm surprised I haven't seen more breakage due to this change in PHPCS. Then again, that would require the PHPCS tests to hit the changed tokenization behaviour and as this kind of thing was previously a parse error, I can understand why this is not covered by the existing tests.

If this change stays, in my opinion, the changelog entry needs to be updated to more comprehensively state what has actually changed and what the impact of this. I also think it should be mentioned in the CHANGELOG, not just in NEWS.

Additionally, I'd recommend for tests to be added with the above code sample for yield from to prevent a regression in the future. Preferably tests which would not just check that the above code sample is not regarded as a parse error, but tests which also check the tokenization is consistent.

jrfnl · 2024-07-11T22:37:59Z

P.S.: Thanks @damianwadley and @nielsdos for your fast response to my query!

cmb69 · 2024-07-12T12:30:48Z

Since this affects PHP 8.3, I'm not sure whether updating CHANGELOG makes much sense now, but it's certainly worth documenting it in the migration guide of the PHP documentation.

iluuu1994 · 2024-07-12T13:01:37Z

@jrfnl I indeed didn't think about linters, this should have gotten an UPGRADING entry. Sorry about that.

mvorisek · 2024-07-14T00:07:47Z

IMO it is not a bug, as comments are (should be) allowed between any code token in general.

This is probably already documented or at least it is "generally expected". So f291d37 is a fix.

iluuu1994 · 2024-07-14T00:10:48Z

@mvorisek What Juliette references is the patch changing the output of PhpToken::tokenize(), with the tokenizer treating yield /* comment */ from as a singular token. This is the only feasible solution without switching to a GLR parser.

mvorisek · 2024-07-14T00:25:04Z

That is indeed not good and such token should be broken into separate tokens further.

repro: https://3v4l.org/7VmX1

jrfnl · 2024-07-14T00:44:30Z

Forgive me if I don't use the correct terminology for some things, I'm hoping that even if the terminology is not 100% correct for C, you can still understand what I'm trying to say.

IMO it is not a bug, as comments are (should be) allowed between any code token in general.

... which is exactly the reason why I was testing the Tokenizer behaviour with comments between the words.

Having said that, it's also not true....

In nearly all cases, tokens consist of a "single entity", so no comments can exist within a token.
This includes "compound operators", like ??, where no whitespace or comment is allowed between the first and the second ?.

A typical case where the tokenization changed from multiple tokens to single token is the PHP 8.0 "namespaced names" change.
However, that change also explicitly removed the ability to have whitespace or comments within a namespaced name.

Another one which comes to mind are cast tokens, like (bool). These tokens can contain whitespace, i.e. ( bool ), but they cannot contain comments. See: https://3v4l.org/A6Sgj and https://3v4l.org/nE9H8

So, in practice, yield from is (AFAICS) the only "multi-entity" token which can contain comments... (since PHP 8.3).

Retrospectively, it would have been better if yield from would have been introduced as yield_from (note the underscore). This would be in line with other multi-word keywords, like include_once or as yieldfrom, which would be in line with keywords like endif, but considering it has been in the language for such a long time now, changing this now would be a huge BC-break.

I consider the change in f291d37 a bug as it is an undocumented (breaking) change in behaviour for the Tokenizer for projects consuming the token output.
That bug could be fixed by either fixing/updating the documentation or by reverting the change.

So, what remains then is the discussion about how yield from should be tokenized.

I'm not sure about this and you can also see some thoughts about this in the linked PHPCS ticket (as PHPCS needs to ensure the token stream sniffs receive is the same on all PHP versions, so I need to make a choice on how to get round the changed tokenization).

On the one hand I agree that comments between the keywords should be ignored - as they mostly are elsewhere -, on the other hand, comments cause a parse error in other "multi entity" tokens, so this change introduces an inconsistency.

iluuu1994 · 2024-07-14T07:37:31Z

In nearly all cases, tokens consist of a "single entity", so no comments can exist within a token. This includes "compound operators", like ??, where no whitespace or comment is allowed between the first and the second ?.

That's true, but what qualifies as a token is usually very intuitive. yield from is one of the few exceptions, and it's primary rationale for being a single token to avoid making from itself a keyword.

So, in practice, yield from is (AFAICS) the only "multi-entity" token which can contain comments... (since PHP 8.3).

I think that's correct. The other cases only use lookahead for comments.

So, what remains then is the discussion about how yield from should be tokenized.

Unfortunately, for yield from to not cause parser ambiguitites, it must be a single token. Hence, when a comment does appear in that spot, it is implicitly part of the token. Since 8.3 has been out for a while, I don't think removing it now is feasible, and if it is removed from 8.4, then that still leaves 8.3 as an issue for you.

It might be possible for you to artificially split yield /* comment */ from into multiple tokens. It's not great to push that responsibility onto users, but I don't see a great alternative at the moment. If you have any other ideas that might help with your situation, please let me know.

jrfnl · 2024-07-14T09:31:34Z

Unfortunately, for yield from to not cause parser ambiguitites, it must be a single token. Hence, when a comment does appear in that spot, it is implicitly part of the token.

And that's what I don't necessarily agree with - and nor did PHP 7.0 - 8.2 in which this was a parse error.

Since 8.3 has been out for a while, I don't think removing it now is feasible

And again, I don't agree with this. I only discovered this bug now as the change wasn't documented. That doesn't make it any less of a bug/inconsistency with every single other token, as none of them allow for comments within the token.

If it had been included in the changelog, I would probably have questioned the change/introduction of this inconsistency before the release was out.

Note: I'm only challenging the change to the yield from tokenization. Not the other changes in the same commit.

cmb69 · 2024-07-14T10:29:18Z

Well, we can't go back in time, so the question is what would break more code: reverting the change for yield from, or leaving it as is. I'd hope the former, but I'm afraid the latter might actually be the case (coders are so "creative": "let's just put a comment between yield and from, because we can").

iluuu1994 · 2024-07-14T10:58:22Z

I would agree with Christoph. From what I understand, PHP_CodeSniffer would either not correctly format the comment, or mistakenly remove it. While that's not great, it's not breaking. Removing parsing at this point would fatal error on a PHP patch update, and I think that's generally a no-go. Please correct me if my understanding is incorrect.

jrfnl · 2024-07-14T11:17:27Z

The other side of that argument is that the chances of people having discovered that they can place a comment between yield and from now are slim as the change was not documented/publicized and has only been out 7 months...

cmb69 · 2024-07-14T11:29:07Z

Maybe we should ask for opinions on the internals mailing list?

nielsdos · 2024-07-15T11:04:40Z

But what is the "solution" anyway? Complicating the lexer to deal with "yield from" specially?

cmb69 · 2024-07-15T11:13:48Z

But what is the "solution" anyway? Complicating the lexer to deal with "yield from" specially?

I think this is about reverting

<ST_IN_SCRIPTING>"yield"{WHITESPACE_OR_COMMENTS}"from"[^a-zA-Z0-9_\x80-\xff] {

to the previous

<ST_IN_SCRIPTING>"yield"{WHITESPACE}"from"[^a-zA-Z0-9_\x80-\xff] {

or not.

bwoebi · 2024-07-15T16:00:46Z

I think the proper way of fixing this would be introducing T_FROM, in a special lexer mode.

I.e.:

<ST_IN_SCRIPTING>"yield"{WHITESPACE_OR_COMMENTS}"from"[^a-zA-Z0-9_\x80-\xff] {
	yyless(5);
	yy_push_state(ST_LOOKING_FOR_FROM);
	RETURN_TOKEN(T_YIELD);
}

<ST_LOOKING_FOR_FROM>"from" {
	yy_pop_state();
	RETURN_TOKEN(T_FROM);
}

And then add ST_LOOKING_FOR_FROM to the whitespace, comment etc. targets.

Sort of related issue is #14961. Basically needs the same handling than that one.

jrfnl · 2024-07-15T16:50:26Z

@bwoebi Wouldn't that be a much larger BC-break as it would turn from into a (partially?) reserved keyword ? and would remove the T_YIELD_FROM token ?

bwoebi · 2024-07-15T16:56:53Z

@jrfnl It would be a keyword only if preceded by yield, i.e. changing nothing on the status quo. But yes, it would remove the T_YIELD_FROM token (and a T_COMMENT could now be between T_YIELD and T_FROM).

cmb69 · 2024-07-17T13:13:00Z

@bwoebi, I guess that would cause even more trouble for @jrfnl and others.

Anyhow, we need a solution soon; otherwise the solution would be to stick with what we have since PHP 8.3.10 is already on it's way (RC1 has been tagged yesterday).

jrfnl · 2024-07-17T23:33:55Z

I've posted to Internals now: https://externals.io/message/124462 Let's see if we get a response.

Fixes phpGH-14926.

cmb69 · 2024-08-05T10:40:50Z

Hmm, the discussion on the internals mailing list ended without consensus, not even agreement that the current behavior is a bug; as such we can at most keep that as feature request, but I guess that would not help @jrfnl in any way, and may cause even more work(-arounds).

So, what to do, @jrfnl?

jrfnl · 2024-08-05T11:54:19Z

@cmb69 Ha, you beat me to it.

Interesting to see your take on it. The way I read it, is that there is consensus that there is a bug, just not whether the bug is the change in the tokenizer or the missing documentation ;-)

I also see consensus that the current tokenization is not ideal, but no consensus on how it should be changed and whether there should be consistency rules for whitespace/comments in tokens.

So, what to do, @jrfnl?

As I also shared on the list:

I agree the improved tokenization proposed in the PR makes better sense.

Having said that, it will make the breaking change much much larger as
it now would no longer just be a change which can be handled in the
PHPCS token stream, but one which will impact individual sniffs, what
with the removal and introduction of new tokens.

If that PR gets merged, it will mean that PHPCS will need to undo the
PHP 8.3 and PHP 8.4 tokenization for the time being (in the 3.x major)
and will only "polyfill" the new tokenization as of PHPCS 4.0. In
practice, we would be moving the breaking change to PHPCS 4.0 version to
not break sniffs.

cmb69 · 2024-08-06T10:18:45Z

Interesting to see your take on it. The way I read it, is that there is consensus that there is a bug, just not whether the bug is the change in the tokenizer or the missing documentation ;-)

Oh, right. In this case I was referring to an implementation bug, since documenting the issue wouldn't really help you.

Regarding PR #15041: I feel that this is curing the symptoms, but not the desease, namely that from is not a keyword. Changing this in a minor version is off the table, and even changing this in a major may not be accepted. And anyway, it wouldn't help right now with the issue at hand.

jrfnl · 2024-08-06T11:28:11Z

And anyway, it wouldn't help right now with the issue at hand.

Well, until there is clarity where this is going, I can't fix things for PHPCS, so PHPCS will stay broken until a solution is found.

cmb69 · 2024-08-06T11:55:00Z

Well, who is up to decide what we do about this? Maybe the 8.3 release managers have something to say: @adoy, @bukka, @ericmann, thoughts?

bukka · 2024-08-06T12:44:54Z

I don't think changing that in 8.3 is acceptable. It's been out for some time and doing such break in bug fixing release is not a good idea IMHO. We allow some breaks in minor version so this is just a break in minor version even though it is a bit unfortunate.

[ci skip]

cmb69 · 2024-08-07T14:57:27Z

Okay, if we don't change the behavior, we should document the issue. Thus, I've submitted PR #15276.

iluuu1994 · 2024-08-07T15:01:33Z

Thank you! That makes sense to me. Let's not merge #15041 without a further discussion or RFC then.

jrfnl added Bug Status: Needs Triage labels Jul 11, 2024

jrfnl mentioned this issue Jul 11, 2024

Reminder: revisit yield from tokenization PHPCSStandards/PHP_CodeSniffer#529

Open

cmb69 added Category: Engine Extension: tokenizer labels Jul 17, 2024

bwoebi pushed a commit to bwoebi/php-src that referenced this issue Jul 20, 2024

Tokenize yield from as T_YIELD+T_FROM

3af8fdc

Fixes phpGH-14926.

bwoebi linked a pull request Jul 20, 2024 that will close this issue

Tokenize yield from as T_YIELD+T_FROM #15041

Open

bwoebi pushed a commit to bwoebi/php-src that referenced this issue Jul 20, 2024

Tokenize yield from as T_YIELD+T_FROM

42e9f66

Fixes phpGH-14926.

cmb69 added Status: Needs Feedback and removed Status: Needs Triage labels Aug 5, 2024

github-actions bot added Status: Needs Triage and removed Status: Needs Feedback labels Aug 5, 2024

cmb69 added a commit to cmb69/php-src that referenced this issue Aug 7, 2024

Fix phpGH-14926: yield /*comment*/ from is no longer a parse error

5d8e547

cmb69 added a commit to cmb69/php-src that referenced this issue Aug 7, 2024

Fix phpGH-14926: yield /*comment*/ from is no longer a parse error

eda25eb

[ci skip]

cmb69 linked a pull request Aug 7, 2024 that will close this issue

Fix GH-14926: yield /*comment*/ from is no longer a parse error #15276

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PHP 8.3: "yield /comment/ from" is no longer a parse error ? #14926

PHP 8.3: "yield /comment/ from" is no longer a parse error ? #14926

jrfnl commented Jul 11, 2024

damianwadley commented Jul 11, 2024

nielsdos commented Jul 11, 2024 •

edited

Loading

jrfnl commented Jul 11, 2024

jrfnl commented Jul 11, 2024

cmb69 commented Jul 12, 2024

iluuu1994 commented Jul 12, 2024

mvorisek commented Jul 14, 2024 •

edited

Loading

iluuu1994 commented Jul 14, 2024

mvorisek commented Jul 14, 2024 •

edited

Loading

jrfnl commented Jul 14, 2024 •

edited

Loading

iluuu1994 commented Jul 14, 2024

jrfnl commented Jul 14, 2024

cmb69 commented Jul 14, 2024

iluuu1994 commented Jul 14, 2024 •

edited

Loading

jrfnl commented Jul 14, 2024

cmb69 commented Jul 14, 2024

nielsdos commented Jul 15, 2024

cmb69 commented Jul 15, 2024

bwoebi commented Jul 15, 2024 •

edited

Loading

jrfnl commented Jul 15, 2024 •

edited

Loading

bwoebi commented Jul 15, 2024 •

edited

Loading

cmb69 commented Jul 17, 2024

jrfnl commented Jul 17, 2024

cmb69 commented Aug 5, 2024

jrfnl commented Aug 5, 2024

cmb69 commented Aug 6, 2024

jrfnl commented Aug 6, 2024

cmb69 commented Aug 6, 2024

bukka commented Aug 6, 2024 •

edited

Loading

cmb69 commented Aug 7, 2024

iluuu1994 commented Aug 7, 2024

PHP 8.3: "yield /*comment*/ from" is no longer a parse error ? #14926

PHP 8.3: "yield /*comment*/ from" is no longer a parse error ? #14926

Comments

jrfnl commented Jul 11, 2024

Description

PHP Version

Operating System

damianwadley commented Jul 11, 2024

nielsdos commented Jul 11, 2024 • edited Loading

jrfnl commented Jul 11, 2024

jrfnl commented Jul 11, 2024

cmb69 commented Jul 12, 2024

iluuu1994 commented Jul 12, 2024

mvorisek commented Jul 14, 2024 • edited Loading

iluuu1994 commented Jul 14, 2024

mvorisek commented Jul 14, 2024 • edited Loading

jrfnl commented Jul 14, 2024 • edited Loading

iluuu1994 commented Jul 14, 2024

jrfnl commented Jul 14, 2024

cmb69 commented Jul 14, 2024

iluuu1994 commented Jul 14, 2024 • edited Loading

jrfnl commented Jul 14, 2024

cmb69 commented Jul 14, 2024

nielsdos commented Jul 15, 2024

cmb69 commented Jul 15, 2024

bwoebi commented Jul 15, 2024 • edited Loading

jrfnl commented Jul 15, 2024 • edited Loading

bwoebi commented Jul 15, 2024 • edited Loading

cmb69 commented Jul 17, 2024

jrfnl commented Jul 17, 2024

cmb69 commented Aug 5, 2024

jrfnl commented Aug 5, 2024

cmb69 commented Aug 6, 2024

jrfnl commented Aug 6, 2024

cmb69 commented Aug 6, 2024

bukka commented Aug 6, 2024 • edited Loading

cmb69 commented Aug 7, 2024

iluuu1994 commented Aug 7, 2024

PHP 8.3: "yield /comment/ from" is no longer a parse error ? #14926

PHP 8.3: "yield /comment/ from" is no longer a parse error ? #14926

nielsdos commented Jul 11, 2024 •

edited

Loading

mvorisek commented Jul 14, 2024 •

edited

Loading

mvorisek commented Jul 14, 2024 •

edited

Loading

jrfnl commented Jul 14, 2024 •

edited

Loading

iluuu1994 commented Jul 14, 2024 •

edited

Loading

bwoebi commented Jul 15, 2024 •

edited

Loading

jrfnl commented Jul 15, 2024 •

edited

Loading

bwoebi commented Jul 15, 2024 •

edited

Loading

bukka commented Aug 6, 2024 •

edited

Loading