Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optimizer rewrite #5676

Merged
merged 18 commits into from
Mar 28, 2024
Merged

feat: optimizer rewrite #5676

merged 18 commits into from
Mar 28, 2024

Conversation

kylemumma
Copy link
Member

@kylemumma kylemumma commented Mar 21, 2024

https://getsentry.atlassian.net/browse/SNS-2669
This PR should get all of riccardos broken tests passing.

What motivated the rewrite?

  • I needed to implement new functionality to get riccardos tests passing
  • It was gunna be a lot of work to do
  • After talking to enoch we thought of an alternative approach, this turned out to a better solution that would be faster to implement

Major Changes:

  • pretty much full optimizer rewrite
  • optimizer feature flag on by default
  • fixed breaking integration tests

heres how it works now:

  1. use a visitor to get all the conditional aggregates (ex. sumMergeIf) in the ast
  2. or all the conditions in these functions, and add this to the where clause.

ex:

SELECT sumIf(val, metric_id in [1,2,3] and status=200),
       avgIf(val, metric_id=7), 
       maxIf(val, metric_id=5 and status=400)
WHERE org_id=1
...

will generate the following

(metric_id in [1,2,3] and status=200) or (metric_id=7) or (metric_id=5 and status=400)

and add it to the where clause

SELECT sumIf(val, metric_id in [1,2,3] and status=200),
       avgIf(val, metric_id=7), 
       maxIf(val, metric_id=5 and status=400)
WHERE org_id=1 and (
        (metric_id in [1,2,3] and status=200) or (metric_id=7) or (metric_id=5 and status=400)
    )
...

Copy link

codecov bot commented Mar 21, 2024

Codecov Report

Attention: Patch coverage is 93.79845% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 89.91%. Comparing base (4788281) to head (c744ea9).

✅ All tests successful. No failed tests found ☺️

Files Patch % Lines
...y/processors/logical/filter_in_select_optimizer.py 86.53% 7 Missing ⚠️
snuba/query/dsl.py 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5676      +/-   ##
==========================================
+ Coverage   89.86%   89.91%   +0.04%     
==========================================
  Files         900      900              
  Lines       43770    43758      -12     
  Branches      301      301              
==========================================
+ Hits        39333    39344      +11     
+ Misses       4395     4372      -23     
  Partials       42       42              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kylemumma kylemumma marked this pull request as ready for review March 21, 2024 23:30
@kylemumma kylemumma requested a review from a team as a code owner March 21, 2024 23:30
"""This function is deprecated please use snuba.query.dsl.binary_condition"""
from snuba.query.dsl import binary_condition as dsl_binary_condition

return dsl_binary_condition(function_name, lhs, rhs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldnt use this function in dsl.py bc of circular import with conditions.py. I thought this belonged in dsl.py anyways so I moved it.

op="processor", description="filter_in_select_optimize"
):
if settings is None:
FilterInSelectOptimizer().process_query(query, HTTPQuerySettings())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to point out that in this context settings can be None, LogicalQueryProcessor interface requires settings not None, thus I had to create dummy object.

Since settings arent used by my processor it doesnt matter, but the fact that i have to create this dummy instead of None smells to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to just get these tests working rather than do a refactor that loosens coupling between parsing and post_processing/optimization. Maybe its something I could think about in the future.

Evan made an interesting point about end-to-end testing as a single source of truth, and the increased confidence it brings in the pipeline,
the tradeoff seems to be that this testing file has to change any time another stage is added. Im not sure what the right answer is.

@kylemumma kylemumma merged commit f746e1d into master Mar 28, 2024
32 checks passed
@kylemumma kylemumma deleted the kylemumma-optimizer-update branch March 28, 2024 16:40
@getsentry-bot
Copy link
Contributor

PR reverted: af29c13

getsentry-bot added a commit that referenced this pull request Mar 28, 2024
This reverts commit f746e1d.

Co-authored-by: kylemumma <24424170+kylemumma@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants