feat(mapping-optimizer): Support in operator for mapping optimizer #5691

Merged

Zylphrex
Member

Re-apply #5685

This was a TODO item. On the spans dataset, an easy-to-encounter situation is a condition like
```
sentry_tags[key] IN (value1, value2)
```
This results in SQL like
```
in((arrayElement(sentry_tags.value, indexOf(sentry_tags.key, 'key')) AS `_snuba_sentry_tags[key]`), ['value1', 'value2'])
```
which scans the entire `sentry_tags.key` and `sentry_tags.value` columns. The optimization here is to use the tags hash map instead, which gives us a condition like
```
hasAny(_sentry_tags_hash_map, array(cityHash64('key=value1'), cityHash64('key=value2')))
```
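As a rough illustration of the rewrite described above, the following Python sketch builds the `hasAny` condition string from a tag key and a list of values. The helper names `quote_sql_string` and `build_hash_map_condition` are hypothetical and are not part of Snuba; the sketch only reproduces the `key=value` hashing shape shown in the description and does not model any additional escaping the real optimizer may apply.

```python
from typing import Sequence


def quote_sql_string(text: str) -> str:
    """Quote a Python string as a SQL string literal (hypothetical helper)."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_hash_map_condition(
    hash_map_column: str, key: str, values: Sequence[str]
) -> str:
    """Build a hasAny(...) condition over the tags hash map column.

    Each candidate value becomes cityHash64('key=value'), mirroring the
    shape shown in the PR description. This is an assumption-laden sketch,
    not the optimizer's actual code path.
    """
    hashes = []
    for value in values:
        pair = key + "=" + value
        hashes.append("cityHash64(" + quote_sql_string(pair) + ")")
    return "hasAny(" + hash_map_column + ", array(" + ", ".join(hashes) + "))"


condition = build_hash_map_condition(
    "_sentry_tags_hash_map", "key", ["value1", "value2"]
)
print(condition)
```

The point of the rewrite is that the condition now touches only the hash map column rather than both the key and value columns.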
@Zylphrex Zylphrex requested a review from a team as a code owner March 26, 2024 17:32
@@ -366,6 +366,7 @@ jobs:
tests/sentry/search/events \
tests/sentry/event_manager \
tests/sentry/api/endpoints/test_organization_profiling_functions.py \
tests/snuba/api/endpoints/test_organization_events_stats_mep.py \
Member Author

A test in here was failing in the last attempt

Comment on lines +202 to +204
# tags[foo] IN array('') is not optimizable
if param.value == "":
    return ConditionClass.NOT_OPTIMIZABLE
Member Author

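This was the cause of the test failure. Because of the way we query for tags, a row that does not have foo as a tag still yields a value of '' for tags[foo], but no corresponding entry exists in the tags hash map. A condition that includes '' would therefore match such rows in the original query and miss them in the optimized one.

To handle this, we mark these cases as not optimizable and run the query as is.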
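The mismatch can be reproduced with a small toy model in Python. Everything here is illustrative: rows are plain dicts, the hash map is modeled as a set of `key=value` strings rather than real `cityHash64` values, and `ConditionClass` is a local stand-in with only two members, not Snuba's actual enum.

```python
from enum import Enum
from typing import Dict, Sequence


class ConditionClass(Enum):
    # Toy stand-in for the enum named in the diff; members are assumptions.
    OPTIMIZABLE = "optimizable"
    NOT_OPTIMIZABLE = "not_optimizable"


def original_match(tags: Dict[str, str], key: str, values: Sequence[str]) -> bool:
    """Model of tags[key] IN (...): a missing tag reads as the empty string."""
    return tags.get(key, "") in values


def hash_map_match(tags: Dict[str, str], key: str, values: Sequence[str]) -> bool:
    """Model of the hash map lookup: only tags present on the row have entries."""
    hash_map = {k + "=" + v for k, v in tags.items()}
    return any((key + "=" + value) in hash_map for value in values)


def classify_in_condition(values: Sequence[str]) -> ConditionClass:
    """Refuse to optimize when any value in the IN list is the empty string."""
    if any(value == "" for value in values):
        return ConditionClass.NOT_OPTIMIZABLE
    return ConditionClass.OPTIMIZABLE


row_without_foo = {"bar": "x"}
values = ["a", "b", ""]

# The two forms disagree on a row that lacks the tag when '' is in the list.
print(original_match(row_without_foo, "foo", values))
print(hash_map_match(row_without_foo, "foo", values))
print(classify_in_condition(values))
```

With `''` removed from the list, both forms agree on this row, which is why only the empty-value case needs to fall back to the unoptimized query.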

(
Literal(None, "a"),
Literal(None, "b"),
Literal(None, ""),
Member Author

New test with an empty value in the list to capture this failure case.

codecov bot commented Mar 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.82%. Comparing base (8c6329d) to head (b378e13).
Report is 1 commits behind head on master.

✅ All tests successful. No failed tests found ☺️

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5691      +/-   ##
==========================================
- Coverage   92.17%   89.82%   -2.36%     
==========================================
  Files         859      900      +41     
  Lines       42080    43770    +1690     
  Branches        0      301     +301     
==========================================
+ Hits        38788    39317     +529     
- Misses       3292     4411    +1119     
- Partials        0       42      +42     


@Zylphrex Zylphrex merged commit 87f601a into master Mar 26, 2024
32 checks passed
@Zylphrex Zylphrex deleted the txiao/feat/support-in-operator-for-mapping-optimizers-2 branch March 26, 2024 18:07
@getsentry-bot
Contributor

PR reverted: 7ed7441

getsentry-bot added a commit that referenced this pull request Mar 26, 2024
…mizer (#5691)"

This reverts commit 87f601a.

Co-authored-by: Zylphrex <10239353+Zylphrex@users.noreply.github.com>
Zylphrex added a commit that referenced this pull request Mar 26, 2024
…5691)

Zylphrex added a commit that referenced this pull request Mar 27, 2024
…5691) (#5692)
