Replies: 2 comments
-
Posting on behalf of @tlento!
-
Following this thread, looks very promising! 👀
-
Predicate Pushdown improvements, lessons learned, broader implications - a mostly technical retro
Improvements, in no particular order:
Lessons learned:
Broader implications:
In my opinion, we should prototype a two-stage DataflowPlanBuilder to see whether it makes things easier to reason about while producing more readable (and probably more efficient) SQL.
The first pass would construct input sources as CTEs. We'd collect all of the elements requested in the query and determine which joins we need to make, then build a set of CTE nodes for each distinct denormalized metric source. Each CTE could take a union filter (i.e., a where constraint that is effectively a big OR between all of the filters), and we'd push down whatever we could past the join within the CTE.
Note the implicit empty filter: if there is no query filter and one metric filters on booking__is_instant while the other has no filter, we cannot apply the booking__is_instant filter inside the CTE, because the unfiltered metric still needs every row.
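To make the empty-filter rule concrete, here is a minimal Python sketch. It is not MetricFlow code: `union_filter_for_cte` is a hypothetical helper, filters are modeled as plain SQL strings, and `None` stands in for "this metric branch has no filter".

```python
from typing import Optional, Sequence


def union_filter_for_cte(branch_filters: Sequence[Optional[str]]) -> Optional[str]:
    """Return the predicate that is safe to apply inside a shared CTE.

    Hypothetical helper for illustration only. Each entry is the filter of one
    metric branch reading from the CTE; None is the implicit empty filter.
    Returns None when nothing can be applied inside the CTE.
    """
    # An unfiltered branch needs every row, so the OR of all filters
    # degenerates to "match everything" and nothing can be pushed in.
    if not branch_filters or any(f is None for f in branch_filters):
        return None
    # Drop duplicate predicates while keeping their first-seen order.
    distinct = list(dict.fromkeys(branch_filters))
    if len(distinct) == 1:
        return distinct[0]
    # Otherwise the CTE keeps any row that at least one branch wants.
    return " OR ".join(f"({f})" for f in distinct)


# One metric filters on booking__is_instant, the other is unfiltered:
print(union_filter_for_cte(["booking__is_instant", None]))
# Both metrics are filtered, so the OR of the two is safe inside the CTE:
print(union_filter_for_cte(["booking__is_instant", "listing__is_lux"]))
```

The second predicate name, `listing__is_lux`, is made up for the example. Each metric branch would still apply its own filter on top of the CTE; the union filter only trims rows that no branch needs.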
Once we have those nodes we can build the metric branches more or less as we do today. They will point at the CTEs instead of the raw measure sources.
At that point we can eliminate or greatly simplify the source scan optimizer. Predicate pushdown also gets easier to reason about, because we can simply apply all of the filters for a metric branch to the CTE input as needed.
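The two passes described above could be sketched roughly as follows. This is a toy under stated assumptions, not the DataflowPlanBuilder: `MetricRequest`, `build_ctes`, and `build_branches` are hypothetical names, sources and filters are plain strings, and join resolution is skipped entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class MetricRequest:
    """Hypothetical stand-in for one metric in a query."""

    name: str                    # output column name
    source: str                  # denormalized source the metric reads from
    agg_expr: str                # aggregation expression, e.g. SUM(bookings)
    where: Optional[str] = None  # metric-level filter, None if unfiltered


def build_ctes(requests: Sequence[MetricRequest]) -> Dict[str, str]:
    """Pass 1: assign one CTE alias per distinct source."""
    ctes: Dict[str, str] = {}
    for request in requests:
        ctes.setdefault(request.source, f"{request.source}_cte")
    return ctes


def build_branches(requests: Sequence[MetricRequest], ctes: Dict[str, str]) -> List[str]:
    """Pass 2: each metric branch reads its CTE and applies its own filter."""
    branches = []
    for request in requests:
        sql = f"SELECT {request.agg_expr} AS {request.name} FROM {ctes[request.source]}"
        if request.where is not None:
            sql += f" WHERE {request.where}"
        branches.append(sql)
    return branches


requests = [
    MetricRequest("bookings", "bookings_source", "SUM(bookings)"),
    MetricRequest("instant_bookings", "bookings_source", "SUM(bookings)", "booking__is_instant"),
]
ctes = build_ctes(requests)
for branch_sql in build_branches(requests, ctes):
    print(branch_sql)
```

Both branches point at the same CTE alias, so the source is scanned once, and the branch-level filter stays where it is easy to see.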
I think if we do this well we could even allow for things like aggregate awareness in a more natural way, because the CTE builder could provide appropriate measure aggregations off of partially-aggregated inputs defined in the semantic manifest (or similar).
This can be prototyped today off of extensions of what we have in metricflow_semantics. It probably wouldn't be production-viable without the entity graph we keep talking about, but at least we can experiment a little bit.