Enhance join prefilter optimizer #24223

Open
kaikalur opened this issue Dec 9, 2024 · 4 comments

@kaikalur
Contributor

kaikalur commented Dec 9, 2024

Currently we only support the join prefilter when the other side is a scan. We should enhance this feature to:

a) support a UNION ALL of scans
b) still apply it if the side has an aggregation on the join key (and skip the distinct in that case)
c) drop any rows where any of the keys is null (especially important for multi-key joins)
d) make it cost-based so that we prefilter the "correct" side, i.e. the one that gives the best results

  • also do it for outer joins, flipping the sides and the left/right outer join type if needed
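
To make the distinct-keys and null-key points concrete, here is a minimal sketch of the general idea over in-memory rows: collect the distinct key tuples from one side, skip any tuple containing a null, and use the set to filter the other side before the join. It assumes an inner equi-join, and every name in it (`PrefilterSketch`, `distinctNonNullKeys`, `prefilter`) is hypothetical; it is not how `JoinPrefilter.java` is written.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Presto.
final class PrefilterSketch
{
    private PrefilterSketch() {}

    // Collect the distinct join-key tuples from one side. Tuples with a null
    // in any key column are skipped: they can never match in an equi-join.
    static Set<List<Object>> distinctNonNullKeys(List<Object[]> rows, int[] keyColumns)
    {
        Set<List<Object>> keys = new HashSet<>();
        for (Object[] row : rows) {
            List<Object> key = extractKey(row, keyColumns);
            if (key != null) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Keep only the rows of the other side whose key tuple is in the set.
    // Assumes an inner join; an outer join must keep unmatched rows on its
    // preserved side, so this filter would not be valid there.
    static List<Object[]> prefilter(List<Object[]> rows, int[] keyColumns, Set<List<Object>> keys)
    {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            List<Object> key = extractKey(row, keyColumns);
            if (key != null && keys.contains(key)) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Returns the key tuple, or null if any key column is null.
    private static List<Object> extractKey(Object[] row, int[] keyColumns)
    {
        List<Object> key = new ArrayList<>(keyColumns.length);
        for (int column : keyColumns) {
            if (row[column] == null) {
                return null;
            }
            key.add(row[column]);
        }
        return key;
    }
}
```

With a two-column key, a row such as `(2, null)` is dropped on both sides, which is why the null check matters most for multi-key joins: one null column is enough to rule out a match.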

CC: @feilong-liu

@Shravanpmk

Can you please give me more details on where exactly we should do this optimization?
PS: I am just starting with Presto, so some details will help me navigate the problem and contribute to the solution. Thanks.

@Shravanpmk

If I am not wrong, it should be in the file: presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/JoinPrefilter.java

@Shravanpmk

@kaikalur Can you please guide me through this issue?
It would be of much help, thanks.

@kaikalur
Contributor Author

> @kaikalur Can you please guide me through this issue Would be of much help thanks

Yes, you found the code. Basically, right now we do it only if the other side is just a scan+filter+project, so extend it to the other cases that I listed above.
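
As a rough sketch of what such an extension could look like, the check below first accepts the scan+filter+project shape described above, then also accepts a UNION ALL of such branches and an aggregation grouped exactly on the join keys. The plan model here (`EligibilitySketch`, `Node`, `Kind`) is a hypothetical simplification and not Presto's `PlanNode` hierarchy; it ignores symbol renaming through projections, and requiring the aggregation's grouping keys to equal the join keys is an assumption about what "aggregation on the join key" means.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical, simplified plan model; these are NOT Presto's PlanNode classes.
final class EligibilitySketch
{
    enum Kind { SCAN, FILTER, PROJECT, UNION_ALL, AGGREGATION, OTHER }

    static final class Node
    {
        final Kind kind;
        final List<String> groupingKeys; // only meaningful for AGGREGATION
        final List<Node> sources;

        Node(Kind kind, List<String> groupingKeys, Node... sources)
        {
            this.kind = kind;
            this.groupingKeys = groupingKeys;
            this.sources = Arrays.asList(sources);
        }
    }

    private EligibilitySketch() {}

    // The shape the thread says is handled today: a scan, optionally
    // under any number of filters and projections.
    static boolean isScanFilterProject(Node node)
    {
        switch (node.kind) {
            case SCAN:
                return true;
            case FILTER:
            case PROJECT:
                return isScanFilterProject(node.sources.get(0));
            default:
                return false;
        }
    }

    // Sketch of the proposed extensions (a) and (b).
    static boolean isEligible(Node node, List<String> joinKeys)
    {
        if (isScanFilterProject(node)) {
            return true;
        }
        if (node.kind == Kind.UNION_ALL) {
            // (a) every branch of the union must itself be scan+filter+project
            return !node.sources.isEmpty()
                    && node.sources.stream().allMatch(EligibilitySketch::isScanFilterProject);
        }
        if (node.kind == Kind.AGGREGATION) {
            // (b) grouping on exactly the join keys means the output is already
            // distinct on those keys, so no extra distinct would be needed
            return new HashSet<>(node.groupingKeys).equals(new HashSet<>(joinKeys))
                    && isScanFilterProject(node.sources.get(0));
        }
        return false;
    }
}
```

The cost-based choice of side and the outer-join flipping are left out here, since they depend on optimizer statistics and join-type handling that this sketch does not model.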
