Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CTEs in sub-query reducer #1504

Merged
merged 4 commits into from
Nov 11, 2024
Merged

Support CTEs in sub-query reducer #1504

merged 4 commits into from
Nov 11, 2024

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Nov 4, 2024

This updates the sub-query reducer to support CTEs. The SELECT statement in each CTE is treated like a SELECT, and all CTEs are retained. i.e. a CTE is not reduced with another CTE. The CTEs are retained to make mapping to dataflow plan nodes easier in later PRs.

@cla-bot cla-bot bot added the cla:yes label Nov 4, 2024
@plypaul plypaul changed the title Update sub-query reducer to support CTEs. Update sub-query reducer to support CTEs Nov 4, 2024
@plypaul plypaul changed the title Update sub-query reducer to support CTEs Support CTEs in sub-query reducer Nov 4, 2024
@plypaul plypaul marked this pull request as ready for review November 5, 2024 21:34
@@ -196,7 +199,7 @@ def _is_simple_source(node: SqlSelectStatementNode) -> bool:
if select_column.expr.lineage.contains_aggregate_exprs:
return False
return (
len(node.parent_nodes) <= 1
len(node.join_descs) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a behavior change - were there changes to the snapshots from this change alone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were no snapshot changes. This is not a behavior change as without CTEs, the number of parent nodes is 1 + the number of joins.

@@ -612,10 +615,11 @@ def visit_select_statement_node(self, node: SqlSelectStatementNode) -> SqlQueryP
# JOIN dim_listings c
# ON a.listing_id = b.listing_id

assert len(node_with_reduced_parents.parent_nodes) == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason to remove this assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - since CTEs are modeled as a parent node of a SELECT node, this assertion does not hold.

@@ -579,6 +578,13 @@ def _rewrite_node_with_join(node: SqlSelectStatementNode) -> SqlSelectStatementN
select_columns=tuple(clauses_to_rewrite.select_columns),
from_source=from_source,
from_source_alias=from_source_alias,
cte_sources=tuple(
SqlCteNode.create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use cte_source.with_new_select?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. with_new_select got added as a convenience method late in the stack.

Previously, "parent nodes" was used for either nodes in the FROM clause or the
JOIN clause. With the addition of CTEs that are also considered parent nodes,
this renames a few variables / updates some conditionals for clarity.
@plypaul plypaul merged commit 71e144a into main Nov 11, 2024
15 checks passed
@plypaul plypaul deleted the p--cte--09 branch November 11, 2024 23:03
plypaul added a commit that referenced this pull request Nov 13, 2024
Similar to the updates for the other optimizers in
#1503 and
#1504, this PR implements the
CTE case for the `SqlTableAliasSimplifier`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants