-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[substrait] Add support for ExtensionTable #13772
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,7 +30,7 @@ use datafusion::logical_expr::expr::{Exists, InSubquery, Sort}; | |
|
||
use datafusion::logical_expr::{ | ||
Aggregate, BinaryExpr, Case, Cast, EmptyRelation, Expr, ExprSchemable, Extension, | ||
LogicalPlan, Operator, Projection, SortExpr, Subquery, TryCast, Values, | ||
LogicalPlan, Operator, Projection, SortExpr, Subquery, TableScan, TryCast, Values, | ||
}; | ||
use substrait::proto::aggregate_rel::Grouping; | ||
use substrait::proto::expression as substrait_expression; | ||
|
@@ -86,6 +86,7 @@ use substrait::proto::expression::{ | |
SingularOrList, SwitchExpression, WindowFunction, | ||
}; | ||
use substrait::proto::read_rel::local_files::file_or_files::PathType::UriFile; | ||
use substrait::proto::read_rel::ExtensionTable; | ||
use substrait::proto::rel_common::{Emit, EmitKind}; | ||
use substrait::proto::set_rel::SetOp; | ||
use substrait::proto::{ | ||
|
@@ -438,6 +439,22 @@ pub trait SubstraitConsumer: Send + Sync + Sized { | |
user_defined_literal.type_reference | ||
) | ||
} | ||
|
||
fn consume_extension_table( | ||
&self, | ||
extension_table: &ExtensionTable, | ||
_schema: &DFSchema, | ||
_projection: &Option<MaskExpression>, | ||
) -> Result<LogicalPlan> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I like where your head is at with this, but I almost want to go further. You already called out:
Maybe the interface for this should just be: fn consume_extension_table(
&self,
read_rel: &ReadRel
extension_table: &ExtensionTable) -> Result<LogicalPlan> which will be future proofed for if fields are ever added to the ReadRel, and also provides access to common fields on the ReadRel. We could even go further and add fn consume_named_table(
&self,
read_rel: &ReadRel
named_table: &NamedTable) -> Result<LogicalPlan>
fn consume_virtual_table(
&self,
read_rel: &ReadRel
named_table: &VirtualTable) -> Result<LogicalPlan> to make it easier to customize behaviour for specific read_types. This last idea might be better as it's own PR, as we would need to factor out some of the code in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can see your point, especially as I've been toying with the same idea myself. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess this is maybe slightly against the stated goal of "to allow custom implementations to use that information for fully restoring their custom tables if needed", but I'm not sure why someone would need custom impl for that behavior based on the read type? How about something like:
That way the ReadRel handling doesn't need to happen in multiple places, its projection and schema are by default handled the same way for all relations (which I'd think they should?), but if a user wants they can easily override the whole from_read_rel (or more specifically, SubstraitConsumer::consume_read) and compose their desired result. (Actually, dunno if there's reason to have |
||
if let Some(ext_detail) = extension_table.detail.as_ref() { | ||
substrait_err!( | ||
"Missing handler for extension table: {}", | ||
&ext_detail.type_url | ||
) | ||
} else { | ||
substrait_err!("Unexpected empty detail in ExtensionTable") | ||
} | ||
} | ||
} | ||
|
||
/// Convert Substrait Rel to DataFusion DataFrame | ||
|
@@ -559,6 +576,32 @@ impl SubstraitConsumer for DefaultSubstraitConsumer<'_> { | |
let plan = plan.with_exprs_and_inputs(plan.expressions(), inputs)?; | ||
Ok(LogicalPlan::Extension(Extension { node: plan })) | ||
} | ||
|
||
fn consume_extension_table( | ||
&self, | ||
extension_table: &ExtensionTable, | ||
schema: &DFSchema, | ||
projection: &Option<MaskExpression>, | ||
) -> Result<LogicalPlan> { | ||
if let Some(ext_detail) = &extension_table.detail { | ||
let source = self | ||
.state | ||
.serializer_registry() | ||
.deserialize_custom_table(&ext_detail.type_url, &ext_detail.value)?; | ||
let table_name = ext_detail | ||
.type_url | ||
.rsplit_once('/') | ||
.map(|(_, name)| name) | ||
.unwrap_or(&ext_detail.type_url); | ||
let table_scan = TableScan::try_new(table_name, source, None, vec![], None)?; | ||
let plan = LogicalPlan::TableScan(table_scan); | ||
ensure_schema_compatibility(plan.schema(), schema.clone())?; | ||
let schema = apply_masking(schema.clone(), projection)?; | ||
apply_projection(plan, schema) | ||
} else { | ||
substrait_err!("Unexpected empty detail in ExtensionTable") | ||
} | ||
} | ||
} | ||
|
||
// Substrait PrecisionTimestampTz indicates that the timestamp is relative to UTC, which | ||
|
@@ -1449,8 +1492,11 @@ pub async fn from_read_rel( | |
) | ||
.await | ||
} | ||
_ => { | ||
not_impl_err!("Unsupported ReadType: {:?}", read.read_type) | ||
Some(ReadType::ExtensionTable(ext)) => { | ||
consumer.consume_extension_table(ext, &substrait_schema, &read.projection) | ||
} | ||
None => { | ||
substrait_err!("Unexpected empty read_type") | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
SerializerRegistry
trait now has two more methods for handling tables (with default implementations for backwards compatibility), so it makes sense for the existing methods to have default implementations as well.This will allow implementors to conveniently implement the trait for user-defined logical nodes only or for tables only.
Since the implementations here are perfect as trait defaults, this PR just moves them into the trait itself.