-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
evalengine: Add support for enum and set #15783
evalengine: Add support for enum and set #15783
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
The evalengine currently doesn't handle enum and set types properly. The comparison function always returns 0 at the moment and we don't consider ordering of elements etc. Here we add two new native types to the evalengine for set and enum and instantiate those appropriately. We also ensure we can compare them correctly. In case we don't have the schema information with the values, we do a best effort case of depending on the string representation. This is not correct always of course, but at least makes equality comparison work for those cases and only ordering is off in that scenario. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
4cb5f7d
to
b0521ad
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15783 +/- ##
==========================================
+ Coverage 68.42% 68.43% +0.01%
==========================================
Files 1556 1558 +2
Lines 195494 195822 +328
==========================================
+ Hits 133758 134017 +259
- Misses 61736 61805 +69 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice addition! I have one big piece of feedback, which is about the Values []string
type that has been wired up throughout the codebase. It's heavy! It's super heavy with 24-bytes + backing storage + all the string backing storage.
Clearly, neither ENUM
or SET
is a commonly used feature (as evidenced by the fact that we've gotten away without implementing support for it so far). Hence, I think this warrants spending some time optimizing the storage of these values so they don't weight down the rest of the evalengine whenever they're not used (which will be 99% of the time). This is particularly important for the evalengine's internal and external Type
, which are currently pretty well packed (yey!) and they would almost triple in size after adding the []string
values.
I'm not 100% sure of what's the ideal way to optimize this. This is how I would try though: start with a type EnumSetValues string
and store values *EnumSetValues
everywhere where they must be kept. The full set of values is serialized into a single immutable string, and the EnumSetValues
would have helpers to iterate each value individually / get their index/etc without having to understand how they're serialized.
This (or something close to this) feels like the cheapest way to pass along the set of values without requiring the 24 bytes of the slice. Give this a go, or otherwise hit me up and we can pair on it. 😉
705487d
to
b0ba7f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing in Slack, the sweet middle ground is probably *[]string
to hide the slice behind a pointer without having to bother with a packed serialization format. LGTM!
b0ba7f0
to
f59e322
Compare
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
f59e322
to
595ec8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll merge this into #15723 tomorrow.
One tangentially related thing that I also realized in working on that is that for ENUMs we can get the length of the column when building vevents in unit tests this way:
int64(len(col.Type.EnumValues) + 1)
There is no equivalent for Set in the parser.
@@ -303,7 +303,7 @@ func (tp *TablePlan) isOutsidePKRange(bindvars map[string]*querypb.BindVariable, | |||
|
|||
rowVal, _ := sqltypes.BindVariableToValue(bindvar) | |||
// TODO(king-11) make collation aware |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this TODO now if you want 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one still passes in Unknown
for the collation, so I don't think we can remove it (yet)?
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@mattlord A |
The evalengine currently doesn't handle enum and set types properly. The comparison function always returns 0 at the moment and we don't consider ordering of elements etc.
Here we add two new native types to the evalengine for set and enum and instantiate those appropriately. We also ensure we can compare them correctly.
In case we don't have the schema information with the values, we do a best effort case of depending on the string representation. This is not correct always of course, but at least makes equality comparison work for those cases and only ordering is off in that scenario.
Related Issue(s)
Fixes #15782
Checklist