PHPLIB-1254 Create Yaml for all operators and stages #1180
Very WIP
$filename = $dirname . '/' . $name . '.yaml';

// Add a schema reference to the top of the file
$schema = '# $schema: ../schema.json' . PHP_EOL;
PhpStorm will support it soon; it will be available in the 2023.3 release.
I committed .idea/jsonSchemas.xml for the time being.

public function execute(InputInterface $input, OutputInterface $output): int
{
    $index = file_get_contents('https://docs.google.com/spreadsheets/d/e/2PACX-1vROpGTJGXAKf2SVuSZaw16NwMVtzMVGH9b-YiMtddgZRZOjOO6jK2YLbTUZ0N_qe74nxGY9hYhUe-l2/pubhtml');
I published the Google Sheet publicly as an HTML document because the CSV and TSV exports don't handle newlines inside cells correctly. It also lets me edit in Google Sheets and run my command without the hassle of Google authentication.
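The point about newlines can be illustrated with a small parser for a published HTML table. The PR's command relies on Symfony DomCrawler (see the composer.json diff); this sketch instead uses PHP's built-in DOM extension so it stays self-contained. The function name parseHtmlTable is hypothetical, and it assumes data cells are td elements and that line breaks inside a cell are rendered as br tags.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the rows of the first HTML table as lists of
 * cell strings, turning <br> tags into "\n" so multi-line cells survive.
 */
function parseHtmlTable(string $html): array
{
    $document = new DOMDocument();

    // Scraped pages are rarely valid HTML; silence libxml warnings while parsing
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);

    // Replace each <br> with a newline text node before reading textContent
    foreach (iterator_to_array($xpath->query('//br')) as $br) {
        $br->parentNode->replaceChild($document->createTextNode("\n"), $br);
    }

    $rows = [];
    foreach ($xpath->query('(//table)[1]//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }

        // Header rows (th only) produce no cells and are skipped
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}
```

A CSV export would need quoting rules to carry the embedded newline; here it comes back as an ordinary "\n" inside the cell string.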

use const PHP_EOL;

final class ScrapeCommand extends Command
I'll probably remove this command before merging. The Google Sheet was a step toward generating hundreds of Yaml files, but it is not suitable for day-to-day maintenance and versioning.
Would it potentially come in handy down the line to track necessary changes to YAML files? Drivers typically do not get notified of all updates to MQL (query and aggregation) syntax, so I think it's likely we'll fall behind on this and may want some tooling to compare docs against our own mappings.
It's all done by hand. Unless you use AI, I don't see how you could recreate this file just to detect changes.
The way things are documented varies greatly, and some information was extracted from the server source code. I'd rather maintain the Yaml files and get notified of changes.
generator/config/expressions.php
@@ -20,14 +24,44 @@ function typeFieldPath(string $resolvesTo): array
}

return [
    'mixed' => ['scalar' => true, 'types' => ['mixed']],
mixed is very PHP-ish. Should we rename this to any to make it more applicable to other programming languages?
generator/config/expressions.php
'Regex' => ['scalar' => true, 'types' => [BSON\Regex::class]],
'Constant' => ['scalar' => true, 'types' => ['mixed']],
'Binary' => ['scalar' => true, 'types' => ['string', BSON\Binary::class]],
Any reason these are capitalised? If we're supporting all BSON types here, there's no reason to change the capitalisation of names.
generator/config/expressions.php
],
DateFieldPath::class => typeFieldPath(ResolvesToDate::class),
ResolvesToTimestamp::class => [
    'implements' => [ResolvesToInt::class],
    'types' => ['int', BSON\Int64::class],
This should include the BSON\Timestamp class.
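Applied to the config entry above, the reviewer's suggestion could look roughly like this. It is a sketch only: the namespace used to import ResolvesToInt and ResolvesToTimestamp is an assumption, and the array is assigned to a variable instead of being returned so that the fragment runs on its own. Because ::class is resolved at compile time, it runs even when neither the interfaces nor the mongodb extension are loaded.

```php
<?php
declare(strict_types=1);

use MongoDB\BSON;
// Assumed namespace for the generated interfaces (not confirmed by the PR)
use MongoDB\Builder\Expression\ResolvesToInt;
use MongoDB\Builder\Expression\ResolvesToTimestamp;

$expressions = [
    ResolvesToTimestamp::class => [
        'implements' => [ResolvesToInt::class],
        // Native ints, 64-bit integers, and the dedicated BSON timestamp type
        'types' => ['int', BSON\Int64::class, BSON\Timestamp::class],
    ],
];
```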
@@ -32,7 +32,10 @@ class BuilderEncoder implements Encoder
 */
public function canEncode($value): bool
{
    return $value instanceof Pipeline || $value instanceof StageInterface || $value instanceof ExpressionInterface;
    return $value instanceof Pipeline
We should definitely consider splitting this class.
- "7.4"
- "8.0"
#- "7.4"
#- "8.0"
To be reverted by PHPLIB-1268
Did you come to a solution to still run PHPLIB against 7.4+ but limit the aggregation builder code to 8.1+?
generator/composer.json
@@ -18,6 +18,9 @@
"mongodb/mongodb": "@dev",
"nette/php-generator": "^4",
"symfony/console": "^6.3",
"symfony/css-selector": "^6.3",
"symfony/dom-crawler": "^6.3",
Remove when the scrape command is removed.
};

// Convert arguments to ArgumentDefinition objects
// Optional arguments must be after required arguments
The optional arguments have to move to the end, as PHP doesn't support optional parameters before required ones.
That doesn't matter for "object" encoding, but it is an issue for "array" encoding, since the order of arguments matters.
$slice has an optional argument in the middle of its required arguments.
To be reworked in PHPLIB-1274.
Can we ask the server team to allow an object here?
They certainly can't change the array order due to BC reasons.
Operator arguments are typically passed as lists, so I don't imagine using an object would be consistent. Even if such a change were made, we'd still have to support the legacy array format for older server versions, which in turn means we'd need to implement version checking here. AFAIK, all of the query syntax is modeled irrespective of the server version, so I'd rather not open that can of worms.
I expect $slice will just need to be handled specially.
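Handling $slice specially could be as simple as reordering on output. In aggregation, the server accepts either [<array>, <n>] or [<array>, <position>, <n>], so the optional position sits between two required arguments. The function below is a hypothetical sketch, not the generator's code: PHP forces $position to the end of the signature, and the encoder moves it back into the middle.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical encoder for the aggregation $slice operator.
 * PHP requires optional parameters last, so $position follows $n here,
 * but the server expects it between <array> and <n>.
 */
function encodeSlice(mixed $array, int $n, ?int $position = null): array
{
    $args = $position === null
        ? [$array, $n]              // { $slice: [ <array>, <n> ] }
        : [$array, $position, $n];  // { $slice: [ <array>, <position>, <n> ] }

    return ['$slice' => $args];
}
```

For example, encodeSlice('$items', 3, 1) produces ['$slice' => ['$items', 1, 3]], which keeps the PHP signature natural while preserving the positional order the server requires.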
-
  name: config
  type:
    - object
I did not model the full object. Let's use the object function for that; people will have to read the documentation anyway.
I could have made it a variadic map.
Aye, this agrees with what we do elsewhere for certain operations (e.g. ModifyCollection).
-
  name: range
  type:
    - Range
The range model is a plain object for now; it is not modeled.
For cases where we are choosing not to model the object, can we incorporate links in the YAML so that we get @see references in the generated code? Then, if folks happen to view the generated operator file, they can be directed to the server documentation.
I added an @see comment with the link to the documentation in each operator class header and factory method.
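As a rough illustration of how the link and description fields of a Yaml definition can become an @see docblock, here is a self-contained sketch. The PR uses nette/php-generator for code generation; this renderDocBlock function is hypothetical and builds the comment string by hand.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: render a docblock for a generated operator class
 * from the "description" and "link" fields of its Yaml definition.
 */
function renderDocBlock(array $definition): string
{
    $lines = [];

    $description = trim($definition['description'] ?? '');
    if ($description !== '') {
        foreach (explode("\n", $description) as $line) {
            // rtrim avoids trailing whitespace on blank description lines
            $lines[] = rtrim(' * ' . $line);
        }
    }

    if (isset($definition['link'])) {
        // Separate the description from the tag with an empty comment line
        if ($lines !== []) {
            $lines[] = ' *';
        }
        $lines[] = ' * @see ' . $definition['link'];
    }

    return "/**\n" . implode("\n", $lines) . "\n */";
}
```

The same string could be attached to both the class and its factory method, which matches the placement described above.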
@@ -0,0 +1,23 @@
# $schema: ../schema.json
name: $firstN
I have a name conflict with $firstN, which is both an aggregation accumulator and an array operator. Same for $lastN.
Both of these fall under aggregation, correct? And not the query namespace.
The API for both of these operators looks similar, but I assume they are distinct and there's nothing to say other operators and accumulators with the same name might differ more significantly. In this case, would it make sense to organize accumulators under their own namespace?
I actually notice that there are two groupings for accumulators in the docs:
It's hard to predict how we might encounter naming conflicts in the future, but do you think it'd be worth organizing the many aggregation operators by separate namespaces to group them logically?
We could also do the same for Query and Projection operators, which have far fewer categories, if you like.
If we over-categorize, it could be confusing for developers, who will have to guess which category we've put the operator they want to use in.
Also, I don't want to duplicate Yaml files for operators that would be in multiple categories, like $avg or $count.
Is this file modeling the array operator or the accumulator? The link and description fields below suggest the array operator, but I see type: [ Accumulator ] as well.
I can see how over-categorization could be problematic (assuming users need to reference class names and don't use factory methods). What is your proposed solution?
private function recursiveEncode(mixed $value): mixed
{
    if (is_array($value)) {
        foreach ($value as $key => $val) {
            $value[$key] = $this->recursiveEncode($val);
        }

        return $value;
    }

    if ($value instanceof stdClass) {
        foreach (get_object_vars($value) as $key => $val) {
            $value->{$key} = $this->recursiveEncode($val);
        }
    }

    return $this->encodeIfSupported($value);
This function is all about the ArrayCodec and ObjectCodec that were described in the codec architecture doc:
https://github.com/mongodb/mongo-php-library/pull/1125/files#diff-8bbcff6724826a3ca2b160e0718dbee5520987bbaa22e78eda96d0d448cad5e8R27
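To show the traversal in isolation, here is a runnable sketch of the same recursiveEncode logic. The PR's encodeIfSupported() is not shown in this thread, so the stand-in below, along with the Encodable interface it checks for, is hypothetical: it only encodes objects that implement that interface and returns everything else unchanged.

```php
<?php
declare(strict_types=1);

// Hypothetical interface standing in for the builder's encodable objects
interface Encodable
{
    public function encode(): mixed;
}

final class RecursiveEncoder
{
    public function recursiveEncode(mixed $value): mixed
    {
        // ArrayCodec role: encode each element; arrays are returned as-is afterwards
        if (is_array($value)) {
            foreach ($value as $key => $val) {
                $value[$key] = $this->recursiveEncode($val);
            }

            return $value;
        }

        // ObjectCodec role: encode each public property of a stdClass in place
        if ($value instanceof stdClass) {
            foreach (get_object_vars($value) as $key => $val) {
                $value->{$key} = $this->recursiveEncode($val);
            }
        }

        return $this->encodeIfSupported($value);
    }

    // Hypothetical stand-in: encode supported objects, pass other values through
    private function encodeIfSupported(mixed $value): mixed
    {
        return $value instanceof Encodable ? $value->encode() : $value;
    }
}
```

One behavior the sketch makes visible: arrays are copied on write, but the stdClass branch writes encoded values back into the object it receives, so the caller's object is modified in place.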
@@ -0,0 +1,27 @@
<?xml version="1.0" encoding="UTF-8"?>
Interestingly, this got picked up as a generated file by GitHub ("Some generated files are not rendered by default.").
I realize it's temporary, but that doesn't seem right.
examples/aggregation-builder.php
@@ -48,15 +50,15 @@ function toJSON(object $document): string
totalCount: Aggregation::sum(1),
evenCount: Aggregation::sum(
    Aggregation::mod(
        Expression::fieldPath('randomValue'),
        Expression::numberFieldPath('randomValue'),
Is numberFieldPath required over fieldPath, since the code would expect ResolvesToNumber here? Do users still have the choice of passing an arbitrary field path as a string?
Looking at this now, I can see how these might be annoying if users must use typed field paths in the query builder.
That's a request from @alcaeus in the Technical Design doc:
To complete the type experience, there are also typed fieldPath helpers (e.g. stringFieldPath) which communicate to the type system that the field path resolves to the given type. This can again be used by an ORM to verify that the field will actually resolve to the given type (if known).
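To illustrate the quoted idea, here is a minimal sketch of how a typed field path helper might work. It is an assumption, not the PR's implementation: the FieldPath, NumberFieldPath and ResolvesToNumber declarations and the mod() factory below are hypothetical stand-ins. The typed path encodes exactly like an untyped one. The difference is the extra interface, which a parameter type can require, while a plain string is still accepted for arbitrary field paths.

```php
<?php
declare(strict_types=1);

// Hypothetical marker interface: the expression resolves to a number
interface ResolvesToNumber
{
}

// Hypothetical untyped field path; encodes as "$<name>"
class FieldPath
{
    public function __construct(public readonly string $name)
    {
    }

    public function encode(): string
    {
        return '$' . $this->name;
    }
}

// Same encoding, but the type system now knows the field holds a number
final class NumberFieldPath extends FieldPath implements ResolvesToNumber
{
}

function numberFieldPath(string $name): NumberFieldPath
{
    return new NumberFieldPath($name);
}

// Hypothetical $mod factory: accepts numeric expressions, literals, or a raw string path
function mod(
    ResolvesToNumber|int|float|string $dividend,
    ResolvesToNumber|int|float|string $divisor,
): array {
    $encode = static fn (mixed $v): mixed => $v instanceof FieldPath ? $v->encode() : $v;

    return ['$mod' => [$encode($dividend), $encode($divisor)]];
}
```

Under these assumptions, mod(numberFieldPath('randomValue'), 2) and mod('$randomValue', 2) encode identically; the typed form only adds information that static analysis or an ORM could check.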
tests/Builder/BuilderEncoderTest.php
);

$expected = [
    [
        '$match' => [
            '$or' => [
                ['score' => ['$gt' => 70, '$lt' => 90]],
                // same as ['score' => ['$gt' => 70, '$lt' => 90]],
                ['score' => [['$gt' => 70], ['$lt' => 90]]],
How are these equivalent? One is matching a field against an array, which I expect would be an exact match. Consider:
> db.foo.insertOne({score:10})
> db.foo.insertOne({score:80})
> db.foo.insertOne({score:[{$gt:70},{$lt:90}]})
> db.foo.aggregate([{$match:{$or:[{score:[{$gt:70},{$lt:90}]}]}}])
[
{
_id: ObjectId("6525b7d5f411a41fea45873b"),
score: [ { '$gt': 70 }, { '$lt': 90 } ]
}
]
> db.foo.aggregate([{$match:{$or:[{score:{$gt:70,$lt:90}}]}}])
[ { _id: ObjectId("6525b859f411a41fea45873c"), score: 80 } ]
The document with {score: 10} is never matched, as it doesn't satisfy both range operators. In the case where the range operators are passed separately in an array, the server performs an equality match.
It was too easy... to be correct. I reworked the implementation to generate the expected result.
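One way to produce the expected shape is to merge the separate operator arrays for a field into a single document, so the server sees {score: {$gt: 70, $lt: 90}} instead of an array to match by equality. This is a hypothetical helper, not the reworked implementation from the PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: combine several query operators for the same field
 * into one operator document, e.g. [['$gt' => 70], ['$lt' => 90]]
 * becomes ['$gt' => 70, '$lt' => 90].
 */
function mergeFieldOperators(array ...$operators): array
{
    $merged = [];

    foreach ($operators as $operator) {
        foreach ($operator as $name => $value) {
            // A PHP array cannot hold duplicate keys, so reject them explicitly
            // instead of silently keeping only the last value
            if (array_key_exists($name, $merged)) {
                throw new InvalidArgumentException(sprintf('Operator "%s" is specified more than once.', $name));
            }

            $merged[$name] = $value;
        }
    }

    return $merged;
}
```

With this helper, ['score' => mergeFieldOperators(['$gt' => 70], ['$lt' => 90])] yields the same document as the first $or branch in the test above.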
 */
public function __construct(FieldPath|string ...$field)
{
    if (\count($field) < 1) {
Is it expected that generated files reference global functions this way? It's inconsistent with what we do in the hand-written sources, so if it's easily configurable it may be worth changing.
Added $namespace->addUseFunction calls for this. The code generator doesn't know about the function body, so it cannot detect these function calls automatically.
Replaced by mongodb/mongo-php-builder#1
Fix PHPLIB-1254
Operators and stages specifications
To create Yaml for all the stages and operators (query, pipeline), I started by manually filling in a Google Sheet from the online documentation. Even though that took hours, it was a good learning experience. This document is temporary and will not be used after this PR.
The Yaml files are generated with the scrape command, which will be removed before merging. The schema.json file applies to the Yaml files. All operators and stages have an "encoding" type that describes how to convert the object into BSON.
mongo-php-library/generator/config/schema.json (lines 36 to 40 in 27c9f21)
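The review thread mentions two encodings, "array" and "object". The sketch below shows how they might differ. The encodeOperator function is hypothetical and does not mirror the generated code: it assumes arguments arrive as a name-to-value map in declaration order and that null means "optional argument omitted".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical dispatcher for the two encodings discussed in this PR:
 * "array"  -> argument values as a positional list,
 * "object" -> argument names as keys of an embedded document.
 */
function encodeOperator(string $operator, string $encoding, array $arguments): array
{
    // Treat null as an omitted optional argument
    $arguments = array_filter($arguments, static fn (mixed $value): bool => $value !== null);

    return match ($encoding) {
        'array' => [$operator => array_values($arguments)],
        'object' => [$operator => (object) $arguments],
        default => throw new InvalidArgumentException(sprintf('Unknown encoding "%s".', $encoding)),
    };
}
```

Because "array" encoding is positional, the declaration order of the arguments becomes the wire order, and dropping an omitted argument from the middle shifts the later ones. That is the $slice problem raised in the review. Treating null as "omitted" is also a simplification, since some operators accept an explicit null.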
Types
The types will be reworked in PHPLIB-1251.
AccumulatorInterface to type-hint group accumulator operators. It might be necessary to distinguish the accumulators for each stage ($group, $setWindowFields...).
QueryInterface to distinguish query operators from expressions marked with ExpressionInterface.
Add the MongoDB\object function to create a stdClass from variadic named arguments.
mongo-php-library/src/functions.php (lines 62 to 71 in 27c9f21)
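The PR's actual body for MongoDB\object is at the lines referenced above and is not reproduced in this page. The version below is an assumed sketch of how such a helper can work. A variadic parameter collects named arguments under their names as string keys, and the resulting array is then cast to stdClass. Rejecting positional arguments is a design choice of this sketch.

```php
<?php
declare(strict_types=1);

namespace MongoDB;

use InvalidArgumentException;
use stdClass;

/**
 * Sketch only: build a stdClass from named arguments,
 * e.g. object(name: 'x', count: 3).
 */
function object(mixed ...$values): stdClass
{
    // Named arguments arrive with string keys; positional ones get integer keys
    foreach (array_keys($values) as $key) {
        if (! is_string($key)) {
            throw new InvalidArgumentException('MongoDB\object() accepts only named arguments.');
        }
    }

    return (object) $values;
}
```

This is why object works well for the unmodeled config and range objects discussed earlier: the caller names each field, and the result encodes to BSON as an embedded document.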