-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mapping-optimizer): Support in operator for mapping optimizer #5691
feat(mapping-optimizer): Support in operator for mapping optimizer #5691
Conversation
Re-apply #5685 This was a TODO item. But on the spans dataset, one easy to encounter situation is a condition like ``` sentry_tags[key] IN (value1, value2) ``` This results in a sql like ``` in((arrayElement(sentry_tags.value, indexOf(sentry_tags.key, 'key')) AS `_snuba_sentry_tags[key]`), ['value1', 'value1']) ``` which scans the entire `sentry_tags.key` and `sentry_tags.value` columns. The optimization here is to use the tags hash map which gives us a condition like ``` hasAny(_sentry_tags_hash_map, array(cityHash64('key=value1'), cityHash64('key=value1'))) ```
@@ -366,6 +366,7 @@ jobs: | |||
tests/sentry/search/events \ | |||
tests/sentry/event_manager \ | |||
tests/sentry/api/endpoints/test_organization_profiling_functions.py \ | |||
tests/snuba/api/endpoints/test_organization_events_stats_mep.py \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test in here was failing in the last attempt
# tags[foo] IN array('') is not optimizable | ||
if param.value == "": | ||
return ConditionClass.NOT_OPTIMIZABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the cause of the test failure. Due to the way we query for tags, even if foo
is not a tag on the row, it has a value of ''
and won't exist in the tags hash map.
To handle this, we just mark these cases are not optimizable. and run the query as is.
( | ||
Literal(None, "a"), | ||
Literal(None, "b"), | ||
Literal(None, ""), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new test with an empty value in the list to capture this failure case
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found Additional details and impacted files@@ Coverage Diff @@
## master #5691 +/- ##
==========================================
- Coverage 92.17% 89.82% -2.36%
==========================================
Files 859 900 +41
Lines 42080 43770 +1690
Branches 0 301 +301
==========================================
+ Hits 38788 39317 +529
- Misses 3292 4411 +1119
- Partials 0 42 +42 ☔ View full report in Codecov by Sentry. |
PR reverted: 7ed7441 |
…5691) Re-apply #5685 This was a TODO item. But on the spans dataset, one easy to encounter situation is a condition like ``` sentry_tags[key] IN (value1, value2) ``` This results in a sql like ``` in((arrayElement(sentry_tags.value, indexOf(sentry_tags.key, 'key')) AS `_snuba_sentry_tags[key]`), ['value1', 'value1']) ``` which scans the entire `sentry_tags.key` and `sentry_tags.value` columns. The optimization here is to use the tags hash map which gives us a condition like ``` hasAny(_sentry_tags_hash_map, array(cityHash64('key=value1'), cityHash64('key=value1'))) ```
…5691) (#5692) Re-apply #5685 This was a TODO item. But on the spans dataset, one easy to encounter situation is a condition like ``` sentry_tags[key] IN (value1, value2) ``` This results in a sql like ``` in((arrayElement(sentry_tags.value, indexOf(sentry_tags.key, 'key')) AS `_snuba_sentry_tags[key]`), ['value1', 'value1']) ``` which scans the entire `sentry_tags.key` and `sentry_tags.value` columns. The optimization here is to use the tags hash map which gives us a condition like ``` hasAny(_sentry_tags_hash_map, array(cityHash64('key=value1'), cityHash64('key=value1'))) ```
Re-apply #5685
This was a TODO item. But on the spans dataset, one easy to encounter situation is a condition like
This results in a sql like
which scans the entire
sentry_tags.key
andsentry_tags.value
columns. The optimization here is to use the tags hash map which gives us a condition like