Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CTEs in the column pruner #1503

Merged
merged 3 commits into from
Nov 11, 2024
Merged

Support CTEs in the column pruner #1503

merged 3 commits into from
Nov 11, 2024

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Nov 4, 2024

This updates the column pruner optimizer to support CTEs. To support CTEs, the required columns are mapped to the corresponding CTE and propagated like other parent nodes.

@cla-bot cla-bot bot added the cla:yes label Nov 4, 2024
@plypaul plypaul marked this pull request as ready for review November 4, 2024 17:07
@@ -55,6 +55,7 @@ def __init__(self, tagged_column_alias_set: TaggedColumnAliasSet) -> None:
traverses the SQL-query representation DAG.
"""
self._column_alias_tagger = tagged_column_alias_set
self._cte_alias_to_cte_node: Dict[str, SqlCteNode] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we rename this to something more clear like _required_aliases_for_cte_nodes?

@@ -86,17 +87,32 @@ def _search_for_expressions(

@override
def visit_cte_node(self, node: SqlCteNode) -> None:
raise NotImplementedError
select_statement = node.select_statement
# Copy the tagged aliases from the CTE to the SELECT since when visiting a SELECT, the CTE node (not the SELECT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to tag both the CTE node and CTE's select node with the same aliases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before visiting a node, it's assumed that we have already recorded the column aliases required at the node. So for a SELECT statement, the hierarchy looks like:

SELECT node -> CTE node -> SELECT node of the CTE

If we don't do this copy, this is what happens:

Let's say we start the process with having recorded that col_0 is required in SELECT node.

  • We visit the SELECT node and record that col_0 is required in CTE node.
  • Then we visit the CTE node. At the CTE node, we just go visit the SELECT node of the CTE
  • At the SELECT node of the CTE, then there's no information about which columns were required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was resolved as a part of changes related to another comment.

if cte_node is not None:
self._column_alias_tagger.tag_aliases(cte_node, column_aliases)
# Propagate the required aliases to parents, which could be other CTEs.
cte_node.accept(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't CTEs be considered parent nodes? Why is this handled here instead of in _visit_parents()?

Copy link
Contributor Author

@plypaul plypaul Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not quite correct, so let me update. But CTE nodes are only the parents of the top level SELECT statement. So for something like:

-- node_id=cte_0
WITH some_table AS (
  SELECT 1
)

-- node_id=ss_0
SELECT * FROM (
  -- node_id=ss_1
  SELECT * 
  FROM 
   -- node_id=st_0
    some_table
) a

The parents of ss_0 are [ss_1, cte_0], but the only parent of ss_1 is st_0.

@plypaul plypaul merged commit a00d9e3 into main Nov 11, 2024
15 checks passed
@plypaul plypaul deleted the p--cte--08 branch November 11, 2024 21:40
plypaul added a commit that referenced this pull request Nov 13, 2024
Similar to the updates for the other optimizers in
#1503 and
#1504, this PR implements the
CTE case for the `SqlTableAliasSimplifier`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants