Support generation of CTEs in DataflowToSqlQueryPlanConverter #1521

Merged
merged 5 commits into from
Nov 14, 2024

plypaul
Contributor

@plypaul plypaul commented Nov 10, 2024

This PR:

  • Adds SqlGenerationOptionSet to encapsulate the options for how SQL should be generated from the dataflow plan.
  • Adds an O5 level that applies all previous optimizers and also generates CTEs.
  • Updates DataflowToSqlQueryPlanConverter to use SqlGenerationOptionSet. When the allow_cte option is set, the common nodes in a dataflow plan (as identified by "Add a method to figure out common nodes in a dataflow plan" #1520) are converted to CTEs instead of subqueries.

Since CTEs are not generated by default, the generated SQL is the same for test cases and there are no snapshot changes (aside from the ones that specifically test this feature).
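
As a rough illustration of the option set described above, here is a minimal sketch. It is not the code from this PR: `LevelSketch`, `OptionSetSketch`, `ASSUMED_OPTIMIZERS`, and `options_for_level` are hypothetical names, optimizers are represented by plain strings taken from the diff fragment further down (the real list may be longer), and the assumption that O4 and O5 share the same optimizers is inferred only from "uses all previous optimizers".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class LevelSketch(Enum):
    """Hypothetical stand-in for the optimization levels; only two are modeled."""

    O4 = 4
    O5 = 5


@dataclass(frozen=True)
class OptionSetSketch:
    """Bundles the knobs that control SQL generation (illustrative only)."""

    optimizers: Tuple[str, ...]
    allow_cte: bool


# Optimizer names as they appear in the diff fragment; assumed, possibly incomplete.
ASSUMED_OPTIMIZERS: Tuple[str, ...] = (
    "SqlColumnPrunerOptimizer",
    "SqlRewritingSubQueryReducer",
    "SqlTableAliasSimplifier",
)


def options_for_level(level: LevelSketch) -> OptionSetSketch:
    """Return the same optimizers for both levels; only O5 enables CTEs."""
    return OptionSetSketch(
        optimizers=ASSUMED_OPTIMIZERS,
        allow_cte=level is LevelSketch.O5,
    )
```

Keeping `allow_cte` next to the optimizer list means a caller passes one object to the converter, and levels below O5 keep producing the same SQL as before.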

@cla-bot cla-bot bot added the cla:yes label Nov 10, 2024
@plypaul plypaul marked this pull request as ready for review November 12, 2024 01:45
Base automatically changed from p--cte--15 to main November 13, 2024 05:50
optimizers=option_set.optimizers,
)

def convert_using_specifics(
Contributor Author

Need a better name for this one.

SqlColumnPrunerOptimizer(),
SqlRewritingSubQueryReducer(use_column_alias_in_group_bys=use_column_alias_in_group_by),
SqlTableAliasSimplifier(),
)
allow_cte = True
Contributor

Isn't this an optimizer? Why treat it differently? Did you turn it into an optimizer in a later PR or did I hallucinate that? 🤔

Contributor Author

This one is a little different from the SQL optimizers because it changes how the dataflow plan converts to the SQL plan. The SQL optimizers convert a SQL plan to another SQL plan. Might have been a similar class you saw?
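
To make that distinction concrete, here is a toy sketch in which `allow_cte` changes what the dataflow-to-SQL step emits, rather than rewriting an already-built SQL plan. Everything here is hypothetical and far simpler than the real converter: `Node`, `find_common_nodes`, `select_for`, and `convert` are illustrative names, the plan is a bare DAG, and SQL is built as strings with `CROSS JOIN` as a placeholder for real join logic.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Node:
    """Toy dataflow node: a leaf reads a table, otherwise it combines its parents."""

    node_id: str
    parents: Tuple["Node", ...] = ()


def find_common_nodes(root: Node) -> List[Node]:
    """Return nodes reached through more than one edge, dependencies first."""
    visit_counts: Dict[str, int] = {}
    post_order: List[Node] = []

    def visit(node: Node) -> None:
        visit_counts[node.node_id] = visit_counts.get(node.node_id, 0) + 1
        if visit_counts[node.node_id] > 1:
            return  # Already expanded once; only the count matters now.
        for parent in node.parents:
            visit(parent)
        post_order.append(node)

    visit(root)
    return [n for n in post_order if visit_counts[n.node_id] > 1]


def select_for(node: Node, cte_ids: FrozenSet[str]) -> str:
    """Render the SELECT for `node`; parents that are CTEs are referenced by name."""
    if not node.parents:
        return f"SELECT * FROM {node.node_id}_table"
    sources = []
    for parent in node.parents:
        if parent.node_id in cte_ids:
            sources.append(parent.node_id)
        else:
            sources.append(f"({select_for(parent, cte_ids)}) AS {parent.node_id}")
    return "SELECT * FROM " + " CROSS JOIN ".join(sources)


def convert(root: Node, allow_cte: bool) -> str:
    """Convert the toy plan to SQL, emitting CTEs for common nodes if allowed."""
    cte_nodes = find_common_nodes(root) if allow_cte else []
    if not cte_nodes:
        # Nothing to turn into a CTE: use the plain subquery path.
        return select_for(root, frozenset())
    cte_ids = frozenset(n.node_id for n in cte_nodes)
    definitions = ", ".join(
        f"{n.node_id} AS ({select_for(n, cte_ids)})" for n in cte_nodes
    )
    return f"WITH {definitions} {select_for(root, cte_ids)}"
```

For a plan where nodes `a` and `b` both read from `src` and are combined by `root`, `convert(root, allow_cte=False)` repeats the `src` subquery twice, while `convert(root, allow_cte=True)` defines `src` once in a `WITH` clause. A SQL optimizer, by contrast, would take the output plan of this step as its input.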

logger.debug(LazyFormat("Converting to SQL", nodes_to_convert_to_cte=nodes_to_convert_to_cte))

if len(nodes_to_convert_to_cte) == 0:
    # Avoid `DataflowNodeToSqlCteVisitor` code path for better isolation during rollout.
Contributor

Thanks for clarifying this 👍

@plypaul plypaul merged commit 5c6e448 into main Nov 14, 2024
15 checks passed
@plypaul plypaul deleted the p--cte--16 branch November 14, 2024 17:20
2 participants