-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce UserDefinedLogicalNodeUnparser
for User-defined Logical Plan unparsing
#13880
Introduce UserDefinedLogicalNodeUnparser
for User-defined Logical Plan unparsing
#13880
Conversation
Could @phillipleblanc or @sgrebnov take a look at this? Thanks |
/// The child unparsers are called iteratively. | ||
/// There are two methods in [`Unparser`] will be called: | ||
/// - `extension_to_statement`: This method is called when the custom logical node is a custom statement. | ||
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@goldmedal - I'm not sure using the last unparsing result is the expected behavior. As a user, I would expect to get the result from the first udlp_unparser
that supports this node and stop checking the remaining udlp_unparsers
instead.
Is there a specific use case / reason for using the last supported udlp_unparser? They can be dynamically registered and the last one should override perviously registered? To match unparse
behavior where we don't know/track if unparsing is applied so we always apply all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would also expect this to short-circuit and have the first one win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific use case / reason for using the last supported udlp_unparser? They can be dynamically registered and the last one should override perviously registered.
Actually, I don't have a real case for it but yes, I imagined the user can append the new unparser to override the previous.
However, I guess it could be a rare use case (?
As a user, I would expect to get the result from the first udlp_unparser that supports this node and stop checking the remaining udlp_unparsers instead.
Maybe you guys are right. We should make the first one win. It's more efficient and simpler.
It's also how ExprPlanner
worked in the planner.
datafusion/datafusion/sql/src/expr/mod.rs
Lines 222 to 229 in 75202b5
for planner in self.context_provider.get_expr_planners() { | |
match planner.plan_extract(extract_args)? { | |
PlannerResult::Planned(expr) => return Ok(expr), | |
PlannerResult::Original(args) => { | |
extract_args = args; | |
} | |
} | |
} |
Anyway, I'll change it. Thanks!
datafusion/sql/src/unparser/plan.rs
Outdated
select: &mut Option<&mut SelectBuilder>, | ||
relation: &mut Option<&mut RelationBuilder>, | ||
) -> Result<()> { | ||
for unparser in &self.udlp_unparsers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to add indication that unparse
applied to be consistent with unparse_to_statement
and throw error if non of registered udlps applied / successfully processed the node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if none of the extension unparsers were able to process it, it should throw an error IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound greats. I think we could provide some enum
for it like
UnparsedResult::Unparsed()
UnparsedResult::Original()
..
I'll try it
@goldmedal - looks great, two minor questions/comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @goldmedal! I have a few minor comments, but this this a good upgrade for the unparser!
datafusion/sql/src/unparser/mod.rs
Outdated
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned. | ||
/// - `extension_to_sql`: This method is called when the custom logical node is part of a statement. | ||
/// If multiple child unparsers are registered for the same custom logical node, all of them will be called in order. | ||
pub fn with_udlp_unparsers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of this name - udlp
takes effort to understand what it means. How about renaming udlp_*
to extension_*
? i.e. with_extension_unparsers
. It conveys the same meaning in an easier to understand way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great. I'll rename it.
/// The child unparsers are called iteratively. | ||
/// There are two methods in [`Unparser`] will be called: | ||
/// - `extension_to_statement`: This method is called when the custom logical node is a custom statement. | ||
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would also expect this to short-circuit and have the first one win.
/// - `extension_to_statement`: This method is called when the custom logical node is a custom statement. | ||
/// If multiple child unparsers return a non-None value, the last unparsing result will be returned. | ||
/// - `extension_to_sql`: This method is called when the custom logical node is part of a statement. | ||
/// If multiple child unparsers are registered for the same custom logical node, all of them will be called in order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also short-circuit and only do the first one?
datafusion/sql/src/unparser/plan.rs
Outdated
select: &mut Option<&mut SelectBuilder>, | ||
relation: &mut Option<&mut RelationBuilder>, | ||
) -> Result<()> { | ||
for unparser in &self.udlp_unparsers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if none of the extension unparsers were able to process it, it should throw an error IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also rename this file to extension_unparser.rs
if let Some(err) = plan_to_sql(&plan).err() { | ||
assert_eq!( | ||
err.to_string(), | ||
"External error: `relation` must be initialized" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the original error for unsupported relation plan 🤔 I guess it could be improved by #13880 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
} | ||
|
||
/// The result of unparsing a custom logical node. | ||
pub enum UnparseResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use two separate enums here since one variant can never be emitted in one of the functions, and vice-versa - better type safety, and removes one arm of the match clause that just returns an error anyway.
Also perhaps Unmodified
or Unmatched
instead of Original
? It's not immediately obvious what an UnparseResult::Original means, but that is quite minor - also fine to leave as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great. It can avoid the potential user error. I did it in 7b6c37f. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a very brief look at this PR and it looks really nice -- thank you @goldmedal
I suspect part of why it looks great is due to the reviews from @phillipleblanc and @sgrebnov -- thank you both as well for helping to keep the code flowing
/// This example demonstrates how to unparse a custom logical plan as a statement. | ||
/// The custom logical plan is a simple extension of the logical plan that reads from a parquet file. | ||
/// It can be unparse as a statement that reads from the same parquet file. | ||
async fn unparse_my_logical_plan_as_statement() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very cool
Thanks @phillipleblanc , @sgrebnov and @alamb for the reviews 🙇 |
Which issue does this PR close?
Closes #13753 .
Rationale for this change
See the previous discussion for the design: #13753 (comment)
UserDefinedLogicalNodeUnparser
provides two APIs for user-defined behavior:unparse
: Unparse the custom logical node to SQL within a statement.unparse_to_statement
: Unparse the custom logical node to a statement.What changes are included in this PR?
UserDefinedLogicalNodeUnparser
Are these changes tested?
yes
Are there any user-facing changes?
New trait and API