Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: parse as AST #6788

Draft
wants to merge 35 commits into
base: v5/develop
Choose a base branch
from

Conversation

rasteiner
Copy link
Contributor

@rasteiner rasteiner commented Nov 12, 2024

Description

This PR Introduces a new query parser and "runners" which allow Kirby to "compile" queries so that they can subsequently execute faster.
Based on initial tests, the runners seem to be about 30% faster (28% one, 35% the other "more experimental" one).

Summary of changes

There is a new Kirby\Toolkit\Query namespace which contains all logic for the new query runners. The general process is as follows:

The query string is split into a flat sequence of tokens (see Kirby\Toolkit\Query\Tokenizer).
The tokens are parsed into a recursive abstract syntax tree (see Kirby\Toolkit\Query\Parser).
The AST is then visited by an interpreter (Kirby\Toolkit\Query\Runners\Visitors\Interpreter) that directly evaluates the query, or
a code generator (Kirby\Toolkit\Query\Runners\Visitors\CodeGen) that transpiles it into PHP code.

The whole process and the caching is handled by the two "runner" classes:

Kirby\Toolkit\Query\Runners\Interpreted
Kirby\Toolkit\Query\Runners\Transpiled

The original Kirby\Query\Query class remains in place and chooses which runner to use based on the query.runner option.

Reasoning

Separating the parsing and execution steps allows for more flexibility and better performance, since it allows us to cache the AST, either as PHP code or in memory during a request (or in some other out of process memory cache).

Additional context

The parser is a predictive recursive descent parser, making the parsing step O(n). As such it can't support ambigous code, which leads to the following failing test:

$query = new Query('user.username');
$this->assertSame('homer', $query->resolve(['user.username' => 'homer']));
$query = new Query('user.callback');
$this->assertSame('homer', $query->resolve(['user.callback' => fn () => 'homer']));

Changelog

  • Introduce a new Kirby\Toolkit\Query namespace for query parsing and running.
  • Introduce query.runner option to opt-out of the new parser or switch to the transpiled version (requires File Write access).
  • Closures defined in Kirby queries can now accept arguments

The "no longer needed" classes in Kirby\Query (Argument, Arguments, Expression, Segment, Segments) could be deprecated and removed in a future version when you eventually remove support for the "legacy" runner.

Fixes

No fixes, just performance.

Breaking changes

The above mentioned ability of the original parser to match identifiers like user.username literally as array key "user.username" would no longer be valid, as a query like that, which changes "meaning" based on the data context, can't be compiled to an efficient code. An alternative syntax, like user\.username or maybe something like this['user.username'] would be needed.

Docs

Except for the above breaking change and deprecation, this should be a drop-in replacement.

The only new features are:

  1. the config option query.runner:
<?php 

return [
  // 'query.runner' => 'legacy',
  // 'query.runner' => 'interpreted', // default
  // 'query.runner' => 'transpiled',
];
  1. the ability to receive arguments in closures:
$query = new Query('(foo) => foo.homer');
$data  = [];

$bar = $query->resolve($data);
$bar = $bar(['homer' => 'simpson']);
$this->assertSame('simpson', $bar);

Ready?

No! This is a work in progress. The proposed API could change.
Also, for some reason VS Code decided to merge the main branch into this one... As this PR isn't really ment to be merged "as is" I'll just ignore this mistake: this is mainly a place for dicussion.

  • In-code documentation (wherever needed)
  • Unit tests for fixed bug/feature
  • Tests and CI checks all pass

For review team

  • Add lab and/or sandbox examples (wherever helpful)
  • Add changes & docs to release notes draft in Notion

src/Toolkit/Query/BaseParser.php Fixed Show fixed Hide fixed
src/Toolkit/Query/BaseParser.php Fixed Show fixed Hide fixed
src/Toolkit/Query/Parser.php Fixed Show fixed Hide fixed
src/Toolkit/Query/Parser.php Fixed Show fixed Hide fixed
src/Toolkit/Query/Parser.php Fixed Show fixed Hide fixed
src/Toolkit/Query/Runners/Transpiled.php Fixed Show fixed Hide fixed
src/Toolkit/Query/Runtime.php Fixed Show fixed Hide fixed
src/Toolkit/Query/Tokenizer.php Fixed Show fixed Hide fixed
 - cast data objects to arrays before passing to query runner
 - allow integers as identifiers of array keys when used after a dot
 - don't emit warnings when an array key is missing
- Implements the intercept mechanism for both runners
- Renames the AST Node classes with a "Node" suffix to avoid confusion with some PHP internal classes (like `Closure` -> `ClosureNode`)
@rasteiner
Copy link
Contributor Author

Updates

  • the syntax user\.username has been implemented for now. The corresponding test has been updated to use that instead of the ambiguous user.username.
    Similarly, the following syntax is supported too: user\\\.username, for when the actual array key is really user\.username. (The first backslash escapes the second, the third escapes the dot.
  • intercept has been implemented.

Discussion

In regards to intercept, all evaluated objects and variables are now passed through intercept. Even stuff like literal strings or numbers. It could be worth discussing if this is actually useful or just slow. One idea could be to only intercept objects on which a member field or method is accessed. Like:

  • page('foo') would stay page('foo')
  • page('foo').title would become intercept(page('foo')).title.

@distantnative
Copy link
Member

One idea could be to only intercept objects on which a member field or method is accessed.
This would be an option if it also includes objects down the chain: E.g. page.files.first would also need the option to intercept the Files and the File object.

@rasteiner
Copy link
Contributor Author

rasteiner commented Nov 13, 2024

One idea could be to only intercept objects on which a member field or method is accessed.

This would be an option if it also includes objects down the chain: E.g. page.files.first would also need the option to intercept the Files and the File object.

Sure, page.files.first would be:

intercept(
  intercept(page).files
).first

page.files.first.delete would be:

intercept(
  intercept(
    intercept(page).files
  ).first
).delete

also foo.callMeBack(() => page('bar')).delete would be:

intercept(
  intercept(foo).callMeBack(() => page('bar'))
).delete

@rasteiner
Copy link
Contributor Author

rasteiner commented Nov 15, 2024

Having implemented the "user\.username" syntax, I've now realized that, while passing the unit test, I didn't really solve the underlying problem.
By allowing us to escape the dot, there's now an exception explicitly for the dot, but every other special symbol (even like whitespace) would still break the parsing:

$query = new Query('Shipping Address'); 
$this->assertSame('742 Evergreen Terrace', $query->resolve(['Shipping Address' => '742 Evergreen Terrace']));

I see 4 options:

  1. We ignore this: the only special character allowed in array keys remains the dot
  2. We escape literally everything, kinda like in regex
  3. We introduce an index operator (i.e. foo["Shipping Address"] or {"Shipping Address"})
  4. We introduce a new kind of string literal which is interpreted as identifier, like in MySQL queries the back ticks for complex column names `Shipping Address`

The index operator obviously gives the most bang for the buck as it allows any kind of expression as index (not just strings), but it either requires an unconventional syntax (the {} brackets in the example above) or, for the special case of wanting to access a "root level variable", we need to introduce a reserved keyword for the global context (foo in example)

@distantnative distantnative changed the title [v5] Feature / Perfomance: compiled queries Query: parse as AST Nov 20, 2024
@distantnative distantnative added type: enhancement ✨ Suggests an enhancement; improves Kirby type: performance ⚡️ Is about performance; improves performance labels Nov 20, 2024
 - cast data objects to arrays before passing to query runner
 - allow integers as identifiers of array keys when used after a dot
 - don't emit warnings when an array key is missing
- Implements the intercept mechanism for both runners
- Renames the AST Node classes with a "Node" suffix to avoid confusion with some PHP internal classes (like `Closure` -> `ClosureNode`)
instead of intercepting everything, intercept only objects or array on which the query actually want's to access methods or fields
Dependency Injecting the query cache into runners allows the Query class can manage it, since it's also the Query class which controls the global functions available to queries
@rasteiner
Copy link
Contributor Author

I've implememted the intercept logic like described above (see test here)

Other notable change would be moving the "resolver cache" (the cache that maps query strings to closures) to the Kirby\Query\Query class.
Before this change the cache was hidden away in the Runner classes, but we want that cache to be both static (so that one can simply create a new Query instance and benefit from the cache) as well as under the control of Query (should the "Query::entries" change during a request).

@distantnative
Copy link
Member

@rasteiner we are currently trying to wrap some things up for the v5 beta, that's why I'm not active here much these days, but will get back to it probably next week! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement ✨ Suggests an enhancement; improves Kirby type: performance ⚡️ Is about performance; improves performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants