evalengine: Add support for enum and set #15783

dbussink · 2024-04-23T09:21:15Z

The evalengine currently doesn't handle enum and set types properly: the comparison function always returns 0, and the ordering of elements is not taken into account.

Here we add two new native types to the evalengine for set and enum and instantiate those appropriately. We also ensure we can compare them correctly.

When we don't have the schema information with the values, we fall back on a best-effort comparison of the string representation. This is not always correct, of course, but it at least makes equality comparison work in those cases; only ordering is off in that scenario.
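
As a rough illustration of the fallback described above, here is a minimal sketch in Go. The names `enumIndex` and `compareEnum` are hypothetical and are not the evalengine's API, and the sketch ignores collations: with the declared values available, enum values are ordered by their position in the declaration; without them, the comparison falls back to the string representation, which keeps equality correct but can get ordering wrong.

```go
package main

import (
	"fmt"
	"strings"
)

// enumIndex returns the 1-based position of value among the declared
// enum values, or -1 if it is not found. Hypothetical helper.
func enumIndex(declared []string, value string) int {
	for i, v := range declared {
		if v == value {
			return i + 1
		}
	}
	return -1
}

// compareEnum orders two enum values by declaration position when the
// declared values are known. Otherwise it falls back to comparing the
// string representations (assumption: plain byte-wise comparison).
func compareEnum(declared []string, a, b string) int {
	ia, ib := enumIndex(declared, a), enumIndex(declared, b)
	if ia < 0 || ib < 0 {
		return strings.Compare(a, b)
	}
	switch {
	case ia < ib:
		return -1
	case ia > ib:
		return 1
	}
	return 0
}

func main() {
	sizes := []string{"small", "medium", "large"}
	fmt.Println(compareEnum(sizes, "small", "large")) // schema known: declaration order
	fmt.Println(compareEnum(nil, "small", "large"))   // no schema: string order, ordering is off
	fmt.Println(compareEnum(nil, "large", "large"))   // no schema: equality still works
}
```

With the schema, "small" sorts before "large" because it is declared first; without it, the string fallback sorts them the other way around, which is the "only ordering is off" case.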

Related Issue(s)

Fixes #15782

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

codecov · 2024-04-23T09:44:35Z

Codecov Report

Attention: Patch coverage is 71.28713%, with 87 lines in your changes missing coverage. Please review.

Project coverage is 68.43%. Comparing base (5e2a873) to head (89ffe52).
Report is 4 commits behind head on main.

Files	Patch %	Lines
go/vt/vtgate/evalengine/compiler_asm_push.go	0.00%	22 Missing ⚠️
go/vt/vtgate/evalengine/arena.go	9.09%	20 Missing ⚠️
go/vt/vtgate/evalengine/eval_numeric.go	0.00%	16 Missing ⚠️
go/vt/vtgate/evalengine/compiler.go	71.42%	8 Missing ⚠️
go/vt/vtgate/evalengine/expr_column.go	25.00%	6 Missing ⚠️
go/vt/vtgate/evalengine/eval.go	78.94%	4 Missing ⚠️
go/vt/vtgate/evalengine/api_hash.go	70.00%	3 Missing ⚠️
go/vt/vtgate/evalengine/translate.go	0.00%	3 Missing ⚠️
go/vt/wrangler/vdiff.go	83.33%	2 Missing ⚠️
go/vt/vtgate/engine/distinct.go	50.00%	1 Missing ⚠️
... and 2 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #15783      +/-   ##
==========================================
+ Coverage   68.42%   68.43%   +0.01%     
==========================================
  Files        1556     1558       +2     
  Lines      195494   195822     +328     
==========================================
+ Hits       133758   134017     +259     
- Misses      61736    61805      +69


vmg

Very nice addition! I have one big piece of feedback, which is about the Values []string type that has been wired up throughout the codebase. It's heavy! It's super heavy: 24 bytes for the slice header, plus the slice's backing storage, plus the backing storage of every string.

Clearly, neither ENUM nor SET is a commonly used feature (as evidenced by the fact that we've gotten away without implementing support for them so far). Hence, I think this warrants spending some time optimizing the storage of these values so they don't weigh down the rest of the evalengine whenever they're not used (which will be 99% of the time). This is particularly important for the evalengine's internal and external Type, which are currently pretty well packed (yey!) and would almost triple in size after adding the []string values.

I'm not 100% sure what the ideal way to optimize this is. This is how I would try, though: start with a type EnumSetValues string and store values *EnumSetValues everywhere they must be kept. The full set of values is serialized into a single immutable string, and EnumSetValues would have helpers to iterate over each value individually, get their index, etc., without callers having to understand how they're serialized.

This (or something close to this) feels like the cheapest way to pass along the set of values without requiring the 24 bytes of the slice. Give this a go, or otherwise hit me up and we can pair on it. 😉
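
To make the suggestion above concrete, here is a minimal sketch of what a packed EnumSetValues string could look like. Only the type name comes from the comment; the length-prefixed encoding and the helpers `PackValues`, `Each`, `Index`, and `readUvarint` are assumptions made for illustration, and this is not what was merged.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// EnumSetValues holds all declared values in one immutable string.
// Assumed encoding: each value is prefixed with its length as a uvarint.
type EnumSetValues string

// PackValues serializes the declared values into a single string.
func PackValues(values []string) EnumSetValues {
	var buf []byte
	for _, v := range values {
		buf = binary.AppendUvarint(buf, uint64(len(v)))
		buf = append(buf, v...)
	}
	return EnumSetValues(buf)
}

// readUvarint decodes a uvarint from the front of s without allocating.
// It returns the value and the number of bytes read (0 if malformed).
func readUvarint(s string) (n uint64, width int) {
	var shift uint
	for i := 0; i < len(s) && shift < 64; i++ {
		b := s[i]
		n |= uint64(b&0x7f) << shift
		if b < 0x80 {
			return n, i + 1
		}
		shift += 7
	}
	return 0, 0
}

// Each calls fn with the 0-based index and value of every entry,
// stopping early when fn returns false.
func (e EnumSetValues) Each(fn func(i int, v string) bool) {
	s := string(e)
	for i := 0; len(s) > 0; i++ {
		n, w := readUvarint(s)
		if w == 0 || uint64(len(s)-w) < n {
			return // malformed input: stop iterating
		}
		end := w + int(n)
		if !fn(i, s[w:end]) {
			return
		}
		s = s[end:]
	}
}

// Index returns the 0-based index of value, or -1 if it is absent.
func (e EnumSetValues) Index(value string) int {
	idx := -1
	e.Each(func(i int, v string) bool {
		if v == value {
			idx = i
			return false
		}
		return true
	})
	return idx
}

func main() {
	packed := PackValues([]string{"a", "b,c", ""})
	fmt.Println(packed.Index("b,c"), packed.Index(""), packed.Index("zzz"))
}
```

A length prefix is used rather than a separator so that values containing commas, and empty values, round-trip without escaping.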

vmg

After discussing in Slack, the sweet middle ground is probably *[]string to hide the slice behind a pointer without having to bother with a packed serialization format. LGTM!
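
The size argument can be checked with a small sketch. The two structs below are hypothetical stand-ins, not the evalengine's actual Type; they only show how hiding the slice behind a pointer shrinks a struct that carries it.

```go
package main

import (
	"fmt"
	"unsafe"
)

// typeWithSlice embeds the slice header directly (hypothetical layout).
type typeWithSlice struct {
	tag    uint16
	values []string
}

// typeWithPointer hides the slice behind a pointer, which is nil when
// the type is not an enum or set (hypothetical layout).
type typeWithPointer struct {
	tag    uint16
	values *[]string
}

// sizes reports the size in bytes of both layouts on this platform.
func sizes() (withSlice, withPointer uintptr) {
	return unsafe.Sizeof(typeWithSlice{}), unsafe.Sizeof(typeWithPointer{})
}

func main() {
	s, p := sizes()
	// On a typical 64-bit platform the slice header is 24 bytes and the
	// pointer is 8 bytes; exact struct sizes depend on padding.
	fmt.Printf("with []string: %d bytes, with *[]string: %d bytes\n", s, p)
}
```

The trade-off is an extra pointer dereference when the values are actually needed, which only happens for enum and set columns.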

mattlord

Thanks! I'll merge this into #15723 tomorrow.

One tangentially related thing that I also realized in working on that is that for ENUMs we can get the length of the column when building vevents in unit tests this way:

int64(len(col.Type.EnumValues) + 1)

There is no equivalent for Set in the parser.

go/sqltypes/value.go

mattlord · 2024-04-23T23:24:55Z

go/vt/vttablet/tabletmanager/vreplication/replicator_plan.go

@@ -303,7 +303,7 @@ func (tp *TablePlan) isOutsidePKRange(bindvars map[string]*querypb.BindVariable,

 		rowVal, _ := sqltypes.BindVariableToValue(bindvar)
 		// TODO(king-11) make collation aware


I think we can remove this TODO now if you want 🙂

dbussink · 2024-04-24

This one still passes in Unknown for the collation, so I don't think we can remove it (yet)?

dbussink · 2024-04-24T07:35:41Z

One tangentially related thing that I also realized in working on that is that for ENUMs we can get the length of the column when building vevents in unit tests this way:
int64(len(col.Type.EnumValues) + 1)
There is no equivalent for Set in the parser.

@mattlord Storage-wise, a set should always be backed by a single uint64 under the hood, stored as a bitmap. So you can always make that assumption.
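
A minimal sketch of the bitmap idea described above: each declared member gets one bit, in declaration order, so a set value fits in a single uint64. The helper `setBitmap` is hypothetical and not part of Vitess, and it assumes the textual form of a set value is a comma-separated list of members.

```go
package main

import (
	"fmt"
	"strings"
)

// setBitmap converts a comma-separated set value into a bitmap in which
// bit i is set when the i-th declared member is present. It reports
// false for unknown members or more than 64 declared members.
func setBitmap(declared []string, members string) (uint64, bool) {
	if len(declared) > 64 {
		return 0, false
	}
	var bits uint64
	if members == "" {
		return bits, true // the empty set
	}
	for _, m := range strings.Split(members, ",") {
		found := false
		for i, d := range declared {
			if d == m {
				bits |= uint64(1) << uint(i)
				found = true
				break
			}
		}
		if !found {
			return 0, false
		}
	}
	return bits, true
}

func main() {
	declared := []string{"read", "write", "exec"}
	a, _ := setBitmap(declared, "read,exec") // bits 0 and 2
	b, _ := setBitmap(declared, "write")     // bit 1
	fmt.Println(a, b, a > b)
}
```

Comparing the two bitmaps numerically orders "write" before "read,exec", whereas a plain string comparison would order them the other way around.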

dbussink added the Type: Enhancement and Component: Evalengine labels Apr 23, 2024

dbussink requested review from deepthi, mattlord, rohit-nayak-ps, harshit-gangal, systay, shlomi-noach, GuptaManan100, frouioui, arthurschreiber and vmg as code owners April 23, 2024 09:21

github-actions bot added this to the v20.0.0 milestone Apr 23, 2024

dbussink mentioned this pull request Apr 23, 2024

VReplication: Move ENUM and SET mappings from vplayer to vstreamer #15723

Merged

5 tasks

dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from 4cb5f7d to b0521ad Compare April 23, 2024 09:27

dbussink mentioned this pull request Apr 23, 2024

Store Decimal precision and size while normalising #15785

Merged

5 tasks

vmg reviewed Apr 23, 2024

dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from 705487d to b0ba7f0 Compare April 23, 2024 14:47

vmg approved these changes Apr 23, 2024

dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from b0ba7f0 to f59e322 Compare April 23, 2024 15:17

Introduce type and use pointer to reduce size

595ec8f

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from f59e322 to 595ec8f Compare April 23, 2024 15:23

mattlord approved these changes Apr 24, 2024

Fix comments

89ffe52

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

dbussink merged commit 4c2df48 into vitessio:main Apr 24, 2024
104 checks passed

dbussink deleted the dbussink/add-evalengine-enum-set-support branch April 24, 2024 07:39

dbussink mentioned this pull request Apr 24, 2024

Fix Scale and length handling in CASE and JOIN bind variables #15787

Merged

5 tasks
