Wiring in Generalized Identifiers #167

bgribaudo · 2024-02-02T15:40:08Z

In the Lexical grammar, under heading Identifiers, there's a definition for generalized-identifier. However, this rule name does not seem to be directly or indirectly referenced by that grammar's root rule (lexical-unit). That is, there is no path from rule lexical-unit to generalized-identifier.
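The reachability claim above can be checked mechanically. The following is a minimal sketch, assuming an abridged, hand-written rendering of the lexical grammar; the `GRAMMAR` table and the helper names are illustrative and not taken from the spec source. Each rule maps to the rule names its alternatives mention, and a breadth-first walk from the root reports which defined rules are never reached.

```python
from collections import deque

# ASSUMPTION: abridged, hypothetical rendering of the lexical grammar.
# Each rule maps to the rule names its alternatives reference; terminals
# and most rules are omitted for brevity.
GRAMMAR = {
    "lexical-unit": ["lexical-elements"],
    "lexical-elements": ["lexical-element", "lexical-elements"],
    "lexical-element": ["whitespace", "token", "comment"],
    "token": ["identifier", "keyword", "literal", "operator-or-punctuator"],
    "identifier": ["regular-identifier", "quoted-identifier"],
    "generalized-identifier": ["generalized-identifier-part", "generalized-identifier"],
    "generalized-identifier-part": ["generalized-identifier-segment"],
}


def reachable(grammar, root):
    """Return every rule name reachable from root by following references."""
    seen = {root}
    queue = deque([root])
    while queue:
        rule = queue.popleft()
        for ref in grammar.get(rule, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def unreachable_rules(grammar, root):
    """Return the defined rules that no path from root ever mentions."""
    return sorted(set(grammar) - reachable(grammar, root))


print(unreachable_rules(GRAMMAR, "lexical-unit"))
```

With this abridged table, the only defined rules left unreached are the generalized-identifier ones, which matches the observation in this comment.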

I think generalized-identifier needs to be wired in under the identifier non-terminal. However, since generalized-identifier can only be used in certain contexts, my suspicion is that it can’t simply be added as an alternative option under identifier, but instead needs some qualifier rule added to it.

The first commit in this PR adds generalized-identifier as an option under identifier, but does not include the (possibly needed) qualifier rule. I haven't figured out exactly what that rule should be. Can someone help?

Thanks!

prmerger-automator · 2024-02-02T15:40:16Z

@bgribaudo : Thanks for your contribution! The author(s) have been notified to review your proposed change.

bgribaudo · 2024-02-02T15:40:26Z

/CC: @ehrenMSFT

learn-build-service-prod · 2024-02-02T15:41:17Z

Learn Build status updates of commit b77e9e4:

✅ Validation status: passed

File	Status	Preview URL	Details
query-languages/m/m-spec-consolidated-grammar.md	✅Succeeded

For more details, please refer to the build report.

For any questions, please:

Try searching the learn.microsoft.com contributor guides
Post your question in the Learn support channel

ehrenMSFT · 2024-02-02T17:44:51Z

Generalized identifiers are used in the context of record fields, either when declaring them or referencing them. See https://learn.microsoft.com/en-us/powerquery-m/m-spec-lexical-structure#generalized-identifiers, as well as the references to generalized-identifier in the page you changed.

I don't think the change you're proposing is correct.

bgribaudo · 2024-02-02T18:18:46Z

Thanks, @ehrenMSFT.

Doesn't the entire lexical structure of a valid M file need to be matchable/reachable from the root lexical rule? :-) That is, shouldn't a lexer be able to start with the root rule and work down through/follow its alternatives until the entire file's contents have been lexed?

The catch with generalized-identifier is that it is defined as part of the lexical grammar, but is not reachable from its root rule (lexical-unit)—so if a lexer follows the lexical grammar as written, it will never match any generalized-identifiers. If it never matches that rule, it will never generate any generalized identifier tokens.

The fact that the syntactical grammar allows generalized identifiers to be used in certain record field contexts then becomes a moot point, as those parser rules will never match because no generalized identifier tokens will ever be handed to the parser from the lexer.
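To make the concern concrete, here is a small sketch of a lexer that only knows the token kinds reachable from the root rule. The token patterns are a heavily simplified, ASCII-only assumption (the names `TOKEN_RE` and `tokenize` are hypothetical), not the real M lexer. It shows a record field name containing a space arriving at the parser as two separate identifier tokens, with no generalized-identifier token anywhere in the stream.

```python
import re

# ASSUMPTION: heavily simplified, ASCII-only token patterns for illustration.
# The real lexical grammar has many more token kinds.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<number>\d+)"
    r"|(?P<punct>[\[\]=,]))"
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs using only the patterns above."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        # Exactly one named alternative matched, so lastgroup is its name.
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


for kind, lexeme in tokenize("[Total Sales = 1]"):
    print(kind, repr(lexeme))
```

The field name `Total Sales` comes out as the two identifier tokens `Total` and `Sales`, so a parser rule that expects a single generalized-identifier token would have nothing to match.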

This makes me think that (like all the other lexical rules defined by the spec) lexical rule generalized-identifier should somehow be reachable from lexical-unit.

But maybe I am missing something here?? :-)

Or maybe generalized-identifier really isn't a lexical rule, so should be refactored to live in the syntactic grammar?

ehrenMSFT · 2024-02-15T00:22:04Z

Unfortunately the people on the team with the domain knowledge to review this are all tied up with other things right now. The published grammar just isn't a high priority at the moment.

bgribaudo · 2024-02-16T20:22:25Z

Thanks, @ehrenMSFT. Do you think there is a chance a review might happen sometime in the next couple of months? :-)

ehrenMSFT · 2024-02-20T18:58:59Z

I can't say for certain, but based on internal discussion it sounds like someone may be able to do that.

bgribaudo · 2024-02-21T15:56:03Z

Thanks, @ehrenMSFT. I understand that is not a promise, but it is still helpful.

Maybe this issue should be left open as a placeholder?

ehrenMSFT · 2024-02-21T16:35:32Z

Yes, please leave this PR open for now.

egorelik93 · 2024-02-22T19:16:54Z

@bgribaudo I believe it is accurate to say that generalized-identifier is not actually part of lexical analysis in practice, despite its documentation being under that section. It is currently referenced by the syntactic grammar, for example by records.

bgribaudo · 2024-02-23T13:24:06Z

Thanks, @egorelik93!

What you said makes sense. Could you propose what the correct syntactic grammar rule(s) would be for it? It would be great to have the grammar reflecting how generalized-identifier actually works/is implemented.

egorelik93 · 2024-02-23T18:31:46Z

@bgribaudo It looks to me that the only change needed is that generalized-identifier should not be part of the definition for identifier. Otherwise, generalized-identifier shows up as part of field-name in every instance, which seems accurate to how it actually works.

bgribaudo · 2024-02-23T20:19:37Z

Thanks, @egorelik93.

If I'm following, you're suggesting removing generalized-identifier from the identifiers section of the lexical grammar, right? If this were done, how would field-name in the syntactic grammar ever match generalized-identifier, given that no generalized-identifier tokens would ever be handed to it from the lexer?

I think generalized-identifier can't be removed altogether but must either be defined somewhere in the lexical or syntactic grammars.

Hmm...should generalized-identifier's definition be moved from the lexical grammar to the syntactic grammar and there be defined as something sort of like "(any tokens)* where the spanned text complies with the generalized identifier format rules"?

Jak-MS · 2024-03-07T22:33:56Z

@DougKlopfenstein @ehrenMSFT

IMPORTANT: When this content is ready to merge, you must add #sign-off in a comment or the approval may get overlooked.

#label:"aq-pr-triaged"
@MicrosoftDocs/public-repo-pr-review-team

learn-build-service-prod · 2024-03-27T16:51:49Z

Learn Build status updates of commit 16b073d:

✅ Validation status: passed

File	Status	Preview URL	Details
query-languages/m/m-spec-consolidated-grammar.md	✅Succeeded

egorelik93 · 2024-03-27T19:18:00Z

@bgribaudo Yes it's weird. As far as I can tell, generalized-identifier only uses tokens to identify a range of text and then it just grabs all the text in that range completely ignoring any tokenization. I don't think this can be described in a traditional grammar.

learn-build-service-prod · 2024-03-28T14:48:16Z

Learn Build status updates of commit 5d5514c:

✅ Validation status: passed

File	Status	Preview URL	Details
query-languages/m/m-spec-consolidated-grammar.md	✅Succeeded

bgribaudo · 2024-03-28T14:48:54Z

> I don't think this can be described in a traditional grammar.

@egorelik93, what do you think of something like how the PR looks now? It's not traditional, but maybe something like it would work?

(Note: PR is in draft form.)

egorelik93 · 2024-03-28T17:24:44Z

@bgribaudo I'm not a fan of this. I think both the explanation and the placement with the other identifiers obscure the fact that this 'lexical rule' really only activates when the syntax rules for field names say it does. Anything else that matches this lexical rule isn't just going to get classified as a generalized identifier.

What makes the most sense, to me at least, would be to either put the explanation of generalized-identifier into the record expression section next to field name, or make it its own section in the lexical part, disjoint from everything else. The 'range of text' explanation is almost accurate, but it also needs Comma as one of the excluded characters.

bgribaudo · 2024-04-25T18:19:42Z

@egorelik93, thanks for the feedback. Is something like the latest more along the lines of what you're thinking?

learn-build-service-prod · 2024-04-25T18:20:08Z

Learn Build status updates of commit 7a0dca5:

✅ Validation status: passed

File	Status	Preview URL	Details
query-languages/m/m-spec-consolidated-grammar.md	✅Succeeded

egorelik93 · 2024-04-25T19:55:06Z

@bgribaudo That's the right location but it lost the correct information when you took out 'generalized-identifier'. 'generalized-identifier' is the intended syntactic rule under field name (I don't remember if we finished verifying generalized-identifier-syntax, but let's ignore that for a moment). The stuff about a range of text is sort of like a retokenization that fires when we apply the generalized-identifier syntactic rule, but it still feeds into the latter.

learn-build-service-prod · 2024-04-30T13:20:23Z

Learn Build status updates of commit f8762b3:

✅ Validation status: passed

File	Status	Preview URL	Details
query-languages/m/m-spec-consolidated-grammar.md	✅Succeeded

bgribaudo · 2024-04-30T13:21:00Z

Thanks, @egorelik93! Is it now closer to what you're thinking?

bgribaudo · 2024-05-20T14:33:28Z

Hi @egorelik93! Hope you are doing well! Did you have a chance to look at this? :-)

egorelik93 · 2024-05-20T20:07:22Z

@bgribaudo No, this doesn't really address my concern that you're describing the lexical rule for generalized identifier as if it were the syntactic rule. I understand the lexical rule needs to go somewhere, but it is not correct to equate the two.

bgribaudo · 2024-06-05T17:48:31Z

Thanks for the feedback, @egorelik93! Can you give me an example of what you think the lexical rule should be?

egorelik93 · 2024-06-05T19:57:43Z

@bgribaudo I thought what you were writing out now is the lexical rule? The syntactic rule is what I remember we were discussing earlier when we were trying to figure out a correct regex. That (once corrected) is the actual syntactic rule, which runs on top of the lexical rule that more or less matches what you currently have written there (possibly with corrections). The weird bit is that said lexical rule is not part of the actual tokenization phase at all, but basically gets run at syntax time when at a generalized-identifier node but right before the corresponding syntactic rule. Or, let me put this a different way - if the syntax phase reaches the generalized-identifier rule, it will identify a range of text using the existing tokenization, then throw away the tokenization within that range and feed the raw text to the syntactic rule. I don't know if there is a way to express this in a more traditional grammar, but this is what the implementation does.

egorelik93 · 2024-06-05T20:05:56Z

@bgribaudo On further reflection, I think the least confusing thing to do is to not refer to this range identification as a lexical rule at all. Let's just say we have a weird syntactic rule that, rather than just being fed tokens, uses those tokens to identify a range of text and then applies the syntactic rule to that range of text rather than the token stream.

bgribaudo · 2024-06-14T11:47:40Z

@egorelik93, thanks! So kind of, sort of, what is in the PR now?

DougKlopfenstein · 2024-09-17T15:20:13Z

@bgribaudo, @egorelik93 - has this pull request reached a conclusion?

bgribaudo · 2024-09-18T15:28:27Z

No. :-) Waiting on feedback from @egorelik93.

egorelik93 · 2024-09-20T20:11:09Z

No, it's the other way around.

This is the tokenization (or maybe retokenization) rule:

The range of text spanned by a sequence of one or more tokens, other than '=', ',', or ']',

whereas what you described as the generalized identifier grammar is the true syntactic rule (though I forget if we verified that the regex was correct already).

Your note about the contextual token rule is correct, but the current PR reverses which rule is the syntactic one.
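The two steps described in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: `Token`, `tokenize_with_spans`, and `read_generalized_identifier` are hypothetical names, the tokenizer is a toy, and `GENERALIZED_IDENTIFIER` is a simplified ASCII stand-in for the real syntactic rule (the thread itself notes that the exact regex had not been verified). Step one uses the existing tokens only to find a range of text that ends before '=', ',', or ']'. Step two discards the tokenization inside that range and applies the syntactic rule to the raw text.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # offset of the first character in the source
    end: int    # offset one past the last character


# Tokens that end the range, per the rule quoted above.
STOP_TOKENS = {"=", ",", "]"}

# ASSUMPTION: simplified ASCII stand-in for the generalized-identifier
# syntactic rule; parts are separated by single blanks.
GENERALIZED_IDENTIFIER = re.compile(
    r"[A-Za-z0-9_][A-Za-z0-9_.]*(?: [A-Za-z0-9_][A-Za-z0-9_.]*)*"
)


def tokenize_with_spans(source):
    """Toy tokenizer (hypothetical) that records each token's text span."""
    pattern = re.compile(r"[A-Za-z_]\w*|\d+|[\[\]=,]")
    return [Token(m.group(), m.start(), m.end()) for m in pattern.finditer(source)]


def read_generalized_identifier(source, tokens, index):
    """Return (name, next_index) for the field name starting at tokens[index]."""
    # Step 1: use the existing tokens only to find where the range ends.
    end_index = index
    while end_index < len(tokens) and tokens[end_index].text not in STOP_TOKENS:
        end_index += 1
    if end_index == index:
        raise ValueError("expected at least one token in the field name")
    # Step 2: ignore the tokenization and take the raw text of the range.
    raw = source[tokens[index].start:tokens[end_index - 1].end]
    if GENERALIZED_IDENTIFIER.fullmatch(raw) is None:
        raise ValueError(f"not a generalized identifier: {raw!r}")
    return raw, end_index


source = "[Total Sales = 1]"
tokens = tokenize_with_spans(source)
print(read_generalized_identifier(source, tokens, 1))
```

In this sketch the parser would call `read_generalized_identifier` only when it is at a field-name position, which reflects the point that the rule is contextual rather than part of ordinary tokenization.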

Tying generalized-identifier to lexical-element

b77e9e4

Merge branch 'main' into patch-4

16b073d

Idea

5d5514c

Update m-spec-consolidated-grammar.md

7a0dca5

Update m-spec-consolidated-grammar.md

f8762b3
