Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Keyword Question #273

Closed
wchristian opened this issue Jun 28, 2022 · 21 comments
Closed

The Keyword Question #273

wchristian opened this issue Jun 28, 2022 · 21 comments

Comments

@wchristian
Copy link
Member

wchristian commented Jun 28, 2022


Edit: I have implemented a proof-of-concept, please review and comment: #280




This spawned from discussion in #194 and Perl-Critic/Perl-Critic#591

There are a lot of hard problems with Perl parsing that lead people to think there are simple answers, which leave out real problems. Here is a demonstration of not even how PPI parses things, but of how just by looking at code, it is impossible to predict what the Perl Parser itself will do.

strect.pm

package strect;
use strictures; use Import::Into; use Syntax::Keyword::Try (); use Try::Tiny (); 1;

sub import {
    strictures->import::into(1);
    ( @ARGV ? "Try::Tiny" : "Syntax::Keyword::Try" )->import::into(1);
}

legacy.pl

use lib '.';
use strect;

try { die "marp" } catch { print "caught $_" };

print 4;
$ perl legacy.pl 1
caught marp at legacy.pl line 4.
4
$ perl legacy.pl
Use of uninitialized value $_ in concatenation (.) or string at legacy.pl line 4.

modern.pl

use lib '.';
use strect;

try { die "meep" } catch { print "caught $@" }

print 5;
$ perl modern.pl 1
try() encountered an unexpected argument (1) - perhaps a missing semi-colon before or at modern.pl line 6.
5
$ perl modern.pl
caught meep at modern.pl line 5.
5

This demonstrates how in a plausible use case, the parsing of keywords can be affected even on the same interpreter, by something entirely outside the code itself, like command line arguments.

This is problematic because, depending on the imported keywords, this block:

try { die "meep" } catch { print "caught $@" }
print 5;

is seen by the Perl Parser as either 1 statement (with try consuming the return value of the print), or 2 statements (try, followed by print).

NOTE ALSO: Some people might like to mentally short-circuit here with "just assume try is the modern form". That would be useless. I used try here for brevity and reproducability. Other modules might import other keywords.


And to top this off, here is an example within a single file:

stroct.pm

package stroct;
use strictures; use Import::Into; use Syntax::Keyword::Try (); use Try::Tiny (); 1;

sub import {
    strictures->import::into(1);
    ( $_[1] ? "Try::Tiny" : "Syntax::Keyword::Try" )->import::into(1);
}

mixed.pl

use lib '.';

{
    use stroct "a";
    try { die "marp" } catch { print "caught $_" };
    print 4;
}

{
    use stroct;
    try { die "meep" } catch { print "caught $@" }
    print 5;
}
$ perl mixed.pl
caught marp at mixed.pl line 5.
4caught meep at mixed.pl line 11.
5

This demonstrates how even within the same file, based on lexical imports with no sensible hints PPI could possibly predict, try {} catch {} print 5; might be one or two statements.

@wchristian
Copy link
Member Author

wchristian commented Jun 28, 2022

One possible solution that i can think of here, would be for PPI to be import-aware, in regards to keywords, and allow calling code to configure the parser to presume that based on [ module name string, module use regex, module use callback ] matches, certain keywords would be present in the attached block. That way we could define default sets [ core, cpan popular ], but still allow people with custom import modules to configure for their environment.

Thoughts @adamkennedy @oalders @Grinnz @EvanCarroll ?

@wchristian
Copy link
Member Author

<Mithaldu> keywords cannot be enabled by a require statement ... right?
<LeoNerd> Correct
<LeoNerd> But any code inside a BEGIN block could
<Mithaldu> .... fuck
<LeoNerd> BEGIN { require Syntax::Keyword::Try;  Syntax::Keyword::Try->import; }   for one thing.
<LeoNerd> BEGIN { require Syntax::Keyword::Try;  $^H{"Syntax::Keyword::Try/try"}++ }    for more subtle gut-wrenching

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2022

Unfortunately my only conclusion on this is that PPI on its own cannot solve this. Chasing imports is a losing battle and would be a hack at best, literally any use statement or BEGIN block can arbitrarily change the parser state. I haven't thought of an alternative.

@oalders
Copy link
Collaborator

oalders commented Jun 28, 2022

I agree that there are problems which PPI can't solve on its own and, having some experience with chasing imports, that's not a road I'd want to go down.

Having said that, I think there's a use case that says

this is code which I own, and I'm confident that it doesn't do unorthodox or clever things and if it does, I'm ok with PPI not always getting it right.

I think in cases like these if PPI gets it right most of the time, that would probably be ok for most people. It's already not really getting it right in some of these cases, so I do think that having a switch to turn on some more permissive parsing would probably suit a lot of use cases.

If PPI could experimentally support a config file, like Perl::Critic and perltidy do, then it would be easier to switch this behaviour on and off on a per-project basis. Turn it off by default, but maybe allow people to shoot themselves in the foot and see what happens.

One of the places where I'd like to see improvements is in better Perl tooling, especially in my editor. Having PPI take some hints about what is going on in the code, could be a real improvement in some cases.

@wchristian
Copy link
Member Author

I think solving for 90% of cases if it can be done in a useful manner without breaking the remaining 10% of cases should be a goal.

Personally my worry with per-file-config without any import tracking whatsoever is that it makes it very hard to configure how and when to change parse modes if people have something as basic as files with different perl feature imports in the same project, or different imports for different blocks in a file.

That's why i think chasing imports is more helpful than doing file-level parse mode declarations because it's easy to allow people to configure what imports do what, while also allowing the option of doing per-file configurations.

@oalders
Copy link
Collaborator

oalders commented Jun 28, 2022

I think solving for 90% of cases if it can be done in a useful manner without breaking the remaining 10% of cases should be a goal.

👍🏻

Personally my worry with per-file-config without any import tracking whatsoever is that it makes it very hard to configure how and when to change parse modes if people have something as basic as files with different perl feature imports in the same project, or different imports for different blocks in a file.

To be clear, in my case, just a per-project config would cover most of my cases. At $work 99% of the code will contain some pragmas which apply the same imports everywhere. So, in this case it's not complicated and just saying "assume signatures everywhere in this directory or below" is fine.

That's why i think chasing imports is more helpful

Any thoughts on how you would want to go about this?

@zmughal
Copy link

zmughal commented Jun 28, 2022

I can give an example of how that would work with Function::Parameters (which can add user-defined keywords).

I believe a common pattern is to use Import::Into in a shared "setup" package (like the strect.pm example) which normally should only have the side-effects of configuration of the caller. If one is allowed to only load that package, you can have a set of handlers that read in the hints and stash to find what each one is doing. This is what I do here

https://github.com/zmughal-CPAN/p5-Dist-Zilla-PluginBundle-Author-ZMUGHAL/blob/86c8ac100a3679258bbe6f45a0ae4724a7fbf79c/lib/Dist/Zilla/PluginBundle/Author/ZMUGHAL/Babble/FunctionParameters.pm#L8-L28

which lets me know what keywords and parameters were used by Function::Parameters. If configured to be cacheable with the import-list as a key, this only needs to be done once.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2022

The problem is that executing arbitrary code of any sort is not acceptable as a PPI implementation.

@zmughal
Copy link

zmughal commented Jun 28, 2022

That's true. If there is a plugin framework to support various keyword/keyword-like modules from CPAN, then at the very least they would need to have a way to configure which ones are loaded in a given file and with what settings. That should perhaps be a callback. PPI itself can not load the arbitrary code, but the user of PPI can decide if they want to return a static configuration or not. The parser would have to do multiple passes to get all the use/BEGIN callbacks before parsing the rest of the file.

@wchristian
Copy link
Member Author

To be clear, in my case, just a per-project config would cover most of my cases. At $work 99% of the code will contain some pragmas which apply the same imports everywhere. So, in this case it's not complicated and just saying "assume signatures everywhere in this directory or below" is fine.

Yeah, i recognize the value, and maybe we should just do that to start off with and mark it highly experimental. My hesitation stems from having a large $work project with code that applies pragmas in like 40% of the files. :)

Also i think it's better if PPI fails to parse something as a more complex form of what it seems to be, rather than succeeding in parsing a complex structure that is actually a mistake by the code's author and only looks like a complex thing it isn't. But i may be wrong on this.

That's why i think chasing imports is more helpful

Any thoughts on how you would want to go about this?

As a general thought:

  1. implement an attribute for PPI statements that tracks activated features and is passed on during parsing to each subsequent statement or structure so they can modify it and pass the modified version to the next one
  2. allow parsing code to look at said attribute and e.g. parse try {} catch {} as a complete block without semicolon if the relevant feature is activated, or disable parsings such as indirect (if applicable)
  3. at this point, we can already configure features per file by modifying the value of the attribute at the start of a parse
  4. implement code at the end of parsing of an import statement or begin block that hands it off to a callback which is expected to return a list of features-to-be-enabled and a list of features-to-be-disabled, which then modifies the attribute before handing it to the next statement
  5. implement a default callback there that, as per @leonerd, knows about core features https://metacpan.org/dist/perl/source/lib/feature.pm#L33
  6. extend the callback to receive user-defined matchings, such as having use strect enable try/catch parsing
  7. extend the callback with a predefined list of popular cpan matchings that can be enabled by default

The specific contents of the callback can be left up in the air for now, but there's plenty options there, up to and including executing an arbitrary callback passed in by the user, making it a parse-hook, which if absolutely necessary, might also allow things such as what @zmughal suggested there i think.

While i agree with @Grinnz that PPI itself will not execute the code it parses, i think an exception could be made for allowing users to define hooks that might execute parsed code.

Hm.

@wchristian
Copy link
Member Author

Edit: This also enables some interesting "parsing features" such as using PPI to determine which features are likely to be enabled or disabled, and maybe even based on that to ask PPI what the minimum Perl version for a block or a parsed document is.

@EvanCarroll
Copy link

in regards to keywords, and allow calling code to configure the parser to presume that based on [ module name string, module use regex, module use callback ] matches, certain keywords would be present in the attached block. That way we could define default sets [ core, cpan popular ], but still allow people with custom import modules to configure for their environment.

This is the solution that I would propose. This seems like a requirement. You're being more comprehensive by allowing other modules on CPAN to enable these options, but that's a step more than what I think is required for a minimal solution. You're given two options here,

I suggested Keywords over in #194 as just one method there but I don't see these at all as discrete questions.

  • Keywords (as in say and Corina)
  • Syntax: Signatures vs prototypes (and even indirect syntax, and multidimensional array syntax)

Are all related to the same core problem. It's just a parsing problem. The parser should be viewed as a customizable thing which is subject to versioning and modifications with pragmas and not something we can make any sense of divorced of those pragmas.

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2022

in regards to keywords, and allow calling code to configure the parser to presume that based on [ module name string, module use regex, module use callback ] matches, certain keywords would be present in the attached block. That way we could define default sets [ core, cpan popular ], but still allow people with custom import modules to configure for their environment.

This is the solution that I would propose. This seems like a requirement. You're being more comprehensive by allowing other modules on CPAN to enable these options, but that's a step more than what I think is required for a minimal solution. You're given two options here,

To repeat, literally any use statement or BEGIN statement can enable these parser changes without mentioning them. It is in fact extremely common due to modules like Moose and Mojolicious.

@wchristian
Copy link
Member Author

  1. implement an attribute for PPI statements that tracks activated features and is passed on during parsing to each subsequent statement or structure so they can modify it and pass the modified version to the next one

this might need some consideration and thought in regards to what happens when the user changes the tree 😑

@EvanCarroll
Copy link

EvanCarroll commented Jun 28, 2022

To repeat, literally any use statement or BEGIN statement can enable these parser changes without mentioning them. It is in fact extremely common due to modules like Moose and Mojolicious.

This isn't something I don't know; you're not telling me something new there. I addressed it explicitly. I just don't particularly care because,

  • I think it's somewhat reasonable to tell people using Mojo and Moose they must also enable the features the want lexically and as documented by Perl 5 so static analysis can work, because obviously it simply will not work if you're ignorant to these features.
  • I think the solution to infer from Mojo or Moose's existence the compiler options as you would if they were set directly is also a good solution. It's been previously brought up here, and you haven't mentioned any problems with that proposal.
  • I think CORE Perl needs to determine whether or not third party modules should be able to change parser semantics for the caller. It's not clear to me that this is supposed to be supported: go read perldoc feature. No where in there does it mention any other supported mechanism to get the feature enabled by using a third party module or by not using an explicit feature (use feature 'foo'), or a "Feature Bundle" (version string), or the "-E" flag. So that you this doesn't mean it's supported by Perl 5 or needs to be supported here.

Moreover,

  • PPI makes no claim to support everything on CPAN and owns not supporting ACME Bleach and source filters.
  • PPI claims to be a parser for Perl, and without this functionality can't actually parse modern Perl.

It would seem preferable to me to support the methods that Perl supports to the best humanly possible. Including all pragmas that change parsing semantics, and to make modules which push an unsupported method of changing parsing semantics as second class citizens. Even if realistically we realistically can support them above it should never come at the expense of not supporting Perl functionality. You may never be able to support all second-class use cases (like CPAN), but if you can't support things documented and supported by Perl itself everything will suffer

@Grinnz
Copy link
Contributor

Grinnz commented Jun 28, 2022

It's used by thousands of CPAN modules and almost every Darkpan project over a certain size. "support" isn't particularly relevant.

@wchristian
Copy link
Member Author

I must admit i'm not entirely following the disagreement between the two of you.

I think it's true that massive amounts of code use a lot of almost untrackable ways to enable features and keywords.

I also think it's true that we can both make the parser aware of features, and able to automatically track (and particularly so with configuration) enough uses to make it worthwhile?

@wchristian
Copy link
Member Author

Implementation-wise, i remembered that we are indeed operating with a tree, so:

  • the attribute only needs to be on statements that enable/disable features
  • methods can walk the tree to determine feature-enabled-ness per statement
  • since no parsing happens after initial parsing, it needs only be documented that feature-enabled-ness can be unreliable after changing the tree (i don't recall, but i think there's no way to mark a document or parts of it as dirty)

@wchristian
Copy link
Member Author

To keep this documented:

18:54 we basically only need two things:
18:54 1. parse and decide whether an import can activate features
18:54 2. a method on an element to ask whether a feature is active for it, by walking the tree
18:54 and while parsing things that might behave differently if feature active, use the previous two
18:54 and for 1. we can have configurations like perltidy and perlcritic have at different levels to accomodate customizations and shit

@wchristian
Copy link
Member Author

wchristian commented Jul 19, 2022

I have implemented a proof-of-concept, please review and comment: #280

@wchristian
Copy link
Member Author

Closing this, since the POC in #280 is moving closer to being ready to be released. Comments on the specific implementation are very welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants