Support CTEs in the column pruner #1503
@@ -55,6 +55,7 @@ def __init__(self, tagged_column_alias_set: TaggedColumnAliasSet) -> None:
         traverses the SQL-query representation DAG.
         """
         self._column_alias_tagger = tagged_column_alias_set
+        self._cte_alias_to_cte_node: Dict[str, SqlCteNode] = {}
Nit: can we rename this to something clearer, like _required_aliases_for_cte_nodes?
@@ -86,17 +87,32 @@ def _search_for_expressions(

     @override
     def visit_cte_node(self, node: SqlCteNode) -> None:
-        raise NotImplementedError
+        select_statement = node.select_statement
+        # Copy the tagged aliases from the CTE to the SELECT since when visiting a SELECT, the CTE node (not the SELECT
Why is it necessary to tag both the CTE node and CTE's select node with the same aliases?
Before visiting a node, it's assumed that the column aliases required at that node have already been recorded. For a SELECT statement that reads from a CTE, the hierarchy looks like:
SELECT node -> CTE node -> SELECT node of the CTE
Without this copy, the following happens. Say we start with col_0 recorded as required in the SELECT node.
- We visit the SELECT node and record that col_0 is required in the CTE node.
- We then visit the CTE node, which just goes on to visit the SELECT node of the CTE.
- At the SELECT node of the CTE, there is no information about which columns are required.
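The walk described above can be reproduced with a toy model. This is only a sketch: ToyNode, propagate, and the copy_at_cte flag are hypothetical names invented for illustration and are not the project's actual classes or pruner logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class ToyNode:
    """Hypothetical stand-in for a SQL plan node; not the project's real classes."""

    node_id: str
    kind: str  # "select" or "cte"
    child: Optional["ToyNode"] = None  # next node to visit in the chain


def propagate(start: ToyNode, required: Set[str], copy_at_cte: bool) -> Dict[str, Set[str]]:
    """Visit each node in the chain and return the aliases recorded per node ID."""
    tagged: Dict[str, Set[str]] = defaultdict(set)
    tagged[start.node_id] |= required
    node: Optional[ToyNode] = start
    while node is not None and node.child is not None:
        if node.kind == "select" or copy_at_cte:
            # A SELECT records what it needs from the node it reads from.
            # A CTE only does the equivalent when the explicit copy is enabled.
            tagged[node.child.node_id] |= tagged[node.node_id]
        node = node.child
    return dict(tagged)


# SELECT node -> CTE node -> SELECT node of the CTE
cte_select = ToyNode("ss_cte", "select")
cte = ToyNode("cte_0", "cte", child=cte_select)
top = ToyNode("ss_0", "select", child=cte)

without_copy = propagate(top, {"col_0"}, copy_at_cte=False)
with_copy = propagate(top, {"col_0"}, copy_at_cte=True)
```

In this model, without_copy has no entry for ss_cte, which mirrors the last bullet: the inner SELECT is visited with no record of which columns it must keep.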
This was resolved as a part of changes related to another comment.
        if cte_node is not None:
            self._column_alias_tagger.tag_aliases(cte_node, column_aliases)
            # Propagate the required aliases to parents, which could be other CTEs.
            cte_node.accept(self)
Shouldn't CTEs be considered parent nodes? Why is this handled here instead of in _visit_parents()?
This comment is not quite correct, so let me update it. However, CTE nodes are parents only of the top-level SELECT statement. So for something like:
-- node_id=cte_0
WITH some_table AS (
SELECT 1
)
-- node_id=ss_0
SELECT * FROM (
-- node_id=ss_1
SELECT *
FROM
-- node_id=st_0
some_table
) a
The parents of ss_0 are [ss_1, cte_0], but the only parent of ss_1 is st_0.
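A small sketch of why a parent-only traversal is not enough for the nested case. The parent lists come from the example above; the TABLE_NAME and CTE_ALIAS_TO_NODE dictionaries and both helper functions are hypothetical, and only loosely modelled on the _cte_alias_to_cte_node mapping in the diff.

```python
from typing import Dict, List, Optional, Set

# Parent lists taken from the example above.
PARENTS: Dict[str, List[str]] = {"ss_0": ["ss_1", "cte_0"], "ss_1": ["st_0"]}
# Hypothetical lookup tables: st_0 reads from the name some_table,
# and some_table is defined by cte_0.
TABLE_NAME: Dict[str, str] = {"st_0": "some_table"}
CTE_ALIAS_TO_NODE: Dict[str, str] = {"some_table": "cte_0"}


def reachable_via_parents(node_id: str) -> Set[str]:
    """Return every node reached by following parent links only."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(node_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def cte_for_table(table_node_id: str) -> Optional[str]:
    """Resolve a table node to the CTE it names, or None for an ordinary table."""
    return CTE_ALIAS_TO_NODE.get(TABLE_NAME.get(table_node_id, ""))
```

Starting from ss_1, reachable_via_parents never reaches cte_0, so in this model the columns that st_0 needs only get to the CTE through an alias lookup such as cte_for_table("st_0").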
This updates the column pruner optimizer to support CTEs. To support CTEs, the required columns are mapped to the corresponding CTE and propagated like other parent nodes.