-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wiring in Generalized Identifiers #167
base: main
Are you sure you want to change the base?
Conversation
@bgribaudo : Thanks for your contribution! The author(s) have been notified to review your proposed change. |
/CC: @ehrenMSFT |
@bgribaudo : Thanks for your contribution! The author(s) have been notified to review your proposed change. |
Learn Build status updates of commit b77e9e4: ✅ Validation status: passed
For more details, please refer to the build report. For any questions, please:
|
Generalized identifiers are used in the context of record fields, either when declaring them or referencing them. See https://learn.microsoft.com/en-us/powerquery-m/m-spec-lexical-structure#generalized-identifiers, as well as the references to generalized-identifier in the page you changed. I don't think the change you're proposing is correct. |
Thanks, @ehrenMSFT. Doesn't the entire lexical structure of a valid M file need to be matchable/reachable from the root lexical rule? :-) That is, shouldn't a lexer be able to start with the root rule and work down through/follow its alternatives until the entire file's contents have been lexed? The catch with The fact that the syntactical grammar allows generalized identifiers to be used in certain record field contexts then becomes a moot point, as those parser rules will never match because no generalized identifier tokens will ever be handed to the parser from the lexer. This makes me think that (like all the other lexical rules defined by the spec) lexical rule But maybe I am missing something here?? :-) Or maybe |
Unfortunately the people on the team with the domain knowledge to review this are all tied up with other things right now. The published grammar just isn't a high priority at the moment. |
Thanks, @ehrenMSFT. Do you think there is a chance a review sometime in the next couple months might happen? :-) |
I can't say for certain, but based on internal discussion it sounds like someone may be able to do that. |
Thanks, @ehrenMSFT. I understand that is not a promise, but it is still helpful. Maybe this issue should be left open as a placeholder? |
Yes, please leave this PR open for now. |
@bgribaudo I believe it is accurate to say that generalized-identifier is not actually part of lexical analysis in practice, despite its documentation being under that section. It is currently referenced by the syntactic grammar, for example by records. |
Thanks, @egorelik93! What you said makes sense. Could you propose what the correct syntactic grammar rule(s) would be for it? It would be great to have the grammar reflecting how generalized-identifier actually works/is implemented. |
@bgribaudo It looks to me that the only change needed is that generalized-identifier should not be part of the definition for identifier. Otherwise, generalized-identifier shows up as part of field-name in every instance, which seems accurate to how it actually works. |
Thanks, @egorelik93. If I'm following, you're suggesting to remove I think Hmm...should |
#label:"aq-pr-triaged" |
Learn Build status updates of commit 16b073d: ✅ Validation status: passed
For more details, please refer to the build report. For any questions, please:
|
@bgribaudo Yes it's weird. As far as I can tell, generalized-identifier only uses tokens to identify a range of text and then it just grabs all the text in that range completely ignoring any tokenization. I don't think this can be described in a traditional grammar. |
Learn Build status updates of commit 5d5514c: ✅ Validation status: passed
For more details, please refer to the build report. For any questions, please:
|
@egorelik93, what do you think of something like how the PR looks now? It's not traditional, but maybe something like it would work? (Note: PR is in draft form.) |
@bgribaudo I'm not a fan of this. I think both the explanation and the placement with the other identifiers obscures the fact that this 'lexical rule' really only activates when the syntax rules for field names says it does. Anything else that matches this lexical rule isn't just going to get classified as a generalized identifier. I think what makes most sense to me at least, would be to either stick the explanation of generalized-identifier into the record expression section next to field name, or just make it its own section in the lexical part disjoint from everything else. The 'range of text' explanation is almost accurate, but also needs Comma as one of the excluded characters. |
@egorelik93, thanks for the feedback. Is something like the latest more along the lines of what you're thinking? |
Learn Build status updates of commit 7a0dca5: ✅ Validation status: passed
For more details, please refer to the build report. For any questions, please:
|
@bgribaudo That's the right location but it lost the correct information when you took out 'generalized-identifier'. 'generalized-identifier' is the intended syntactic rule under field name (I don't remember if we finished verifying generalized-identifier-syntax, but let's ignore that for a moment). The stuff about a range of text is sort of like a retokenization that fires when we apply the generalized-identifier syntactic rule, but it still feeds into the latter. |
Learn Build status updates of commit f8762b3: ✅ Validation status: passed
For more details, please refer to the build report. For any questions, please:
|
Thanks, @egorelik93! Is now closer to what you're thinking? |
Hi @egorelik93! Hope you are doing well! Did you have a chance to look at this? :-) |
@bgribaudo No, this doesn't really address my concern that you're describing the lexical rule for generalized identifier as if it were the syntactic rule. I understand the lexical rule needs to go somewhere, but it is not correct to equate the two. |
Thanks for the feedback, @egorelik93! Can you give me an example of what you think the lexical rule should be? |
@bgribaudo I thought what you were writing out now is the lexical rule? The syntactic rule is what I remember we were discussing earlier when we were trying to figure out a correct regex. That (once corrected) is the actual syntactic rule, which runs on top of the lexical rule that more or less matches what you currently have written there (possibly with corrections). The weird bit is that said lexical rule is not part of the actual tokenization phase at all, but basically gets run at syntax time when at a generalized-identifier node but right before the corresponding syntactic rule. Or, let me put this a different way - if the syntax phase reaches the generalized-identifier rule, it will identify a range of text using the existing tokenization, then throw away the tokenization within that range and feed the raw text to the syntactic rule. I don't know if there is a way to express this in a more traditional grammar, but this is what the implementation does. |
@bgribaudo On further reflection, I think the least confusing thing to do is to not refer to this range identification as a lexical rule at all. Let's just say we have a weird syntactic rule that, rather than just being fed tokens, uses those tokens to identify a range of text and then applies the syntactic rule to that range of text rather than the token stream. |
@egorelik93, thanks! So kind of, sort of, what is in the PR now? |
@bgribaudo, @egorelik93 - has this pull request reached a conclusion? |
No. :-) Waiting on feedback from @egorelik93. |
No, it's the other way around. This is the tokenization (or maybe retokenization) rule: The range of text spanned by a sequence of one or more tokens, other than whereas what you described as the generalized identifier grammar is the true syntactic rule (though I forget if we verified that the regex was correct already). Your note about the contextual token rule is correct, but the current PR reverses which rule is the syntactic. |
In the Lexical grammar, under heading Identifiers, there's a definition for
generalized-identifier
. However, this rule name does not seem to be directly or indirectly referenced by that grammar's root rule (lexical-unit
). That is, there is no path from rulelexical-unit
togeneralized-identifier
.I think
generalized-identifier
needs to be wired in under theidentifier
non-terminal. However, sincegeneralized-identifier
can only be used in certain contexts, my suspicion is that it can’t simply be added as an alternative option underidentifier
, but instead needs some qualifier rule added to it.The first commit in this PR adds
generalized-identifier
as an option underidentifier
—but does not include the needed? qualifier rule. I haven't figured out exactly what that rule should be. Can someone help?Thanks!