-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for emit in aggregate relations #122
Conversation
01631be
to
c31787d
Compare
src/substrait/textplan/parser/SubstraitPlanSubqueryRelationVisitor.cpp
Outdated
Show resolved
Hide resolved
e2f54bf
to
522e999
Compare
auto relationData = | ||
ANY_CAST(std::shared_ptr<RelationData>, actualScope->blob); | ||
const SymbolInfo* symbol{nullptr}; | ||
if (isAggregate(currentScope)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much nicer, thanks. It might be worthwhile to include the type of currentScope (and therefore the range of fieldReferences which would have been valid) below in the out of range error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that better error messages are useful but I believe the substrait validator is better suited for helping developers figure what is wrong with their plan generation (allowing this code to be simpler). I've added the type here though.
// TODO -- Use a more explicit check. | ||
if (!relationData->outputFieldReferences.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What sort of more explicit check were you thinking?
// TODO -- Use a more explicit check. | |
if (!relationData->outputFieldReferences.empty()) { | |
if (!relationData->outputFieldReferences.empty()) { | |
assert(relationData->relation.rel_type_case() == ::substrait::proto::Rel::kAggregate)); |
I'm not familiar enough with substrait to know at what level aggregate nodes are the only relations with output field references. Do you mean they are the only ones supported by substrait-cpp? The only relation for which textplan will ever provide a representation? I don't think they're the only relations which substrait itself allows to have an output mapping...
I think the comment should explain this in greater depth. If it's the last thing (IE, part of general substrait knowledge) maybe a link to an appropriate section of the substrait doc is in order.
In any case, if this block of code is specific to relationData->relation.rel_type_case() == ::substrait::proto::Rel::kAggregate
, it should probably be moved into the switch above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was relation into the comment below. I've replaced the output field references with a check on the incoming proto's type. I did not move it above as the function is currently separated into a section where the fields are populated with a later emit handling step. I can see some potential refactoring where those two parts of this rather long function could be split up.
@@ -316,6 +316,13 @@ bool isRelationEmitDetail(SubstraitPlanParser::Relation_detailContext* ctx) { | |||
nullptr; | |||
} | |||
|
|||
bool isAggregate(const SymbolInfo* symbol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there two implementations of this function (PlanPrinterVisitor.cpp:74)? The definitions are even different. If the difference is intentional, that's definitely worth a comment or at least renaming the functions to clarify their distinct behaviors. If the difference is not intentional, could this become SymbolInfo::isAggregate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long term there shouldn't be this function at all and just a single type in the representation. Right now the intermediate form can have both types (depending on how it was parsed) which needs to be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, a comment would definitely be advisable just to give future readers a clue that this is a stopgap and that in the future there should only be RelationType::kAggregate
// TODO -- Add support for aggregate relations. | ||
errorListener_->addError( | ||
token, "Aggregate relations do not yet support emit sections."); | ||
if (relationData->relation.has_aggregate()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very similar to the code at the end of InitialPlanProtoVisitor::updateLocalSchema
. Any chance it could be extracted to a helper function?
@@ -374,6 +372,13 @@ comparisonToProto(const std::string& text) { | |||
Expression_Subquery_SetComparison_ComparisonOp_COMPARISON_OP_UNSPECIFIED; | |||
} | |||
|
|||
bool isAggregate(const SymbolInfo* symbol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
// TODO -- Add support for aggregate relations. | ||
errorListener_->addError( | ||
token, "Aggregate relations do not yet support emit sections."); | ||
if (relationData->relation.has_aggregate()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more ditto
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
@@ -316,6 +316,13 @@ bool isRelationEmitDetail(SubstraitPlanParser::Relation_detailContext* ctx) { | |||
nullptr; | |||
} | |||
|
|||
bool isAggregate(const SymbolInfo* symbol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, a comment would definitely be advisable just to give future readers a clue that this is a stopgap and that in the future there should only be RelationType::kAggregate
This adds support for emit in aggregations where there is no more than one grouping section.
This addresses #121 .