[native] Add row expression optimizer #22927
822f79f
to
2352cd4
Compare
@pramodsatya : Have done a first round of comments. Will read your tests more closely once you address the comments here.
2352cd4
to
496bbc5
Compare
Thanks for the feedback @aditi-pandit, addressed the comments. Could you please take another look?
presto-native-execution/presto_cpp/main/expression/RowExpressionEvaluator.cpp
void RowExpressionEvaluator::evaluateExpression(
    const std::vector<std::unique_ptr<folly::IOBuf>>& body,
    proxygen::ResponseHandler* downstream) {
  try {
Thanks for the suggestion, added helper functions and updated accordingly. I could not find a similar utility function in the codebase. Could you please take another look?
496bbc5
to
d9c3dcd
Compare
d9c3dcd
to
e19ae51
Compare
e19ae51
to
747ef08
Compare
Summary:
prestodb/presto#23331 adds a native expression optimizer that delegates expression evaluation to the native sidecar. It is used to constant fold expressions on the Presto native sidecar instead of on the Presto Java coordinator (which is the current behavior). prestodb/presto#22927 implements a proxygen endpoint that accepts `RowExpression`s from `NativeSidecarExpressionInterpreter`, optimizes them if possible (rewriting special form expressions), and compiles each `RowExpression` to a Velox expression with constant folding enabled. This Velox expression is then converted back to a `RowExpression` and returned by the sidecar to the coordinator.

When the constant folded Velox expression is of type `velox::exec::ConstantExpr`, we need to return a `RowExpression` of type `ConstantExpression`. This requires serializing the constant value from `velox::exec::ConstantExpr` into `protocol::ConstantExpression::valueBlock`. This can be done by serializing the constant value vector to the Presto `SerializedPage::column` format and then base64 encoding the result (reverse engineered from the logic in `Base64Util.cpp::readBlock`).

This PR adds a new function, `serializeSingleColumn`, to `PrestoVectorSerde`. It can be used to serialize input data from vectors containing a single element into a single PrestoPage column format (without the PrestoPage header). The function is not added to `PrestoBatchVectorSerializer` alongside the existing `serialize` function because that would require adding it as a virtual function in `BatchVectorSerializer` as well, which is not desirable since the `PrestoPage` format is not relevant to that base class. `PrestoVectorSerde` already has a function, `deserializeSingleColumn`, that deserializes data from a single column. Since `serializeSingleColumn` performs the inverse operation, it is added alongside it in `PrestoVectorSerde`.

Pull Request resolved: #10657
Reviewed By: amitkdutta
Differential Revision: D66044754
Pulled By: pedroerp
fbshipit-source-id: e509605067920f8207e5a3fa67552badc2ce0eba
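To make the "serialize a single column, then base64 encode" step concrete, here is a minimal standalone sketch for a one-row, non-null INTEGER constant. It is not the Velox `serializeSingleColumn` implementation. The byte layout (length-prefixed encoding name `INT_ARRAY`, row count, a null-flag byte, then the little-endian value) is an assumption inferred by decoding the `valueBlock` strings quoted in the coalesce example in this conversation, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a 32-bit integer in little-endian byte order.
void appendInt32(std::vector<uint8_t>& out, int32_t value) {
  const auto bits = static_cast<uint32_t>(value);
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>((bits >> shift) & 0xFF));
  }
}

// Assumed layout of a one-row, non-null INTEGER column: length-prefixed
// encoding name, row count, a "may have nulls" byte, then the value.
std::vector<uint8_t> encodeSingleIntColumn(int32_t value) {
  const std::string encoding = "INT_ARRAY";
  std::vector<uint8_t> out;
  appendInt32(out, static_cast<int32_t>(encoding.size()));
  out.insert(out.end(), encoding.begin(), encoding.end());
  appendInt32(out, 1); // Number of rows.
  out.push_back(0); // No nulls present.
  appendInt32(out, value);
  return out;
}

// Standard base64 alphabet with '=' padding.
std::string base64Encode(const std::vector<uint8_t>& bytes) {
  static const char* kAlphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  for (size_t i = 0; i < bytes.size(); i += 3) {
    const size_t remaining = bytes.size() - i;
    uint32_t chunk = static_cast<uint32_t>(bytes[i]) << 16;
    if (remaining > 1) {
      chunk |= static_cast<uint32_t>(bytes[i + 1]) << 8;
    }
    if (remaining > 2) {
      chunk |= bytes[i + 2];
    }
    out += kAlphabet[(chunk >> 18) & 0x3F];
    out += kAlphabet[(chunk >> 12) & 0x3F];
    out += remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=';
    out += remaining > 2 ? kAlphabet[chunk & 0x3F] : '=';
  }
  return out;
}

// Hypothetical helper: produces the text stored in a valueBlock field.
std::string toValueBlock(int32_t value) {
  return base64Encode(encodeSingleIntColumn(value));
}
```

Under these assumptions, `toValueBlock(2)` and `toValueBlock(3)` reproduce the two integer `valueBlock` strings that appear in the Presto expression JSON for `2 * 3` in this thread.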
747ef08
to
10bfc08
Compare
presto-native-execution/presto_cpp/main/expression/RowExpressionOptimizer.cpp
json res;
res["@type"] = "call";
protocol::Signature signature;
std::string exprName = expr->name();
Should this do a dynamic_pointer_cast to CallExpr to ensure the expression is a valid Velox call expression?
The expression here can be either a special form or a call expression, so a dynamic_pointer_cast will likely fail. Also, core::CallExpr derives from core::IExpr and not exec::Expr, so I'm not sure a dynamic_pointer_cast could be done here. Please let me know if I am missing something.
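The point about the two class hierarchies can be illustrated with a small standalone sketch. The types below are toy stand-ins (hypothetical names), not the Velox classes: a cast between unrelated polymorphic hierarchies compiles, but it returns a null pointer at runtime unless the object happens to derive from both.

```cpp
#include <memory>

// Toy stand-ins for the two hierarchies discussed above; not Velox types.
struct PlanExpr { // Plays the role of core::IExpr.
  virtual ~PlanExpr() = default;
};
struct PlanCall : PlanExpr {}; // Plays the role of core::CallExpr.

struct CompiledExpr { // Plays the role of exec::Expr.
  virtual ~CompiledExpr() = default;
};
struct CompiledSpecialForm : CompiledExpr {};

bool isPlanCall(const std::shared_ptr<CompiledExpr>& expr) {
  // Cross-hierarchy cast: this compiles because CompiledExpr is polymorphic,
  // but the result is nullptr unless the object also derives from PlanCall.
  return std::dynamic_pointer_cast<PlanCall>(expr) != nullptr;
}
```

With these toy types, the check is false for every compiled expression, so it would reject valid calls and special forms alike rather than validate them.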
10bfc08
to
37ef905
Compare
Thanks for the feedback @aditi-pandit, addressed the review comments. Could you please take another look?
37ef905
to
7425558
Compare
416d768
to
b244385
Compare
Thanks @pramodsatya. Have reviewed the RowExpressionConverter class so far.
} else {
  std::vector<TypePtr> childTypes;
  if (type->isRow()) {
    typeSignature = "row(";
It would be a bit cleaner to introduce a std::string variable, say "complexTypeString", and only set it to "row", "array", or "map" in the if condition.
Then, after the if condition, when assembling typeSignature for the complex type with children, you can start with
typeSignature = complexTypeString + "("; and then add the typeSignatures for the children, followed by the closing parenthesis.
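A minimal sketch of that suggestion, assuming a simplified type node in place of Velox's `TypePtr`. `SimpleType`, `Kind`, and `toTypeSignature` are hypothetical names, row field names are ignored, and the comma separator is an assumption:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified type node standing in for a Velox TypePtr.
enum class Kind { kBigint, kVarchar, kRow, kArray, kMap };

struct SimpleType {
  Kind kind;
  std::vector<SimpleType> children; // Empty for scalar types.
};

std::string toTypeSignature(const SimpleType& type) {
  // Only the name of the complex type is chosen per kind.
  std::string complexTypeString;
  switch (type.kind) {
    case Kind::kBigint:
      return "bigint";
    case Kind::kVarchar:
      return "varchar";
    case Kind::kRow:
      complexTypeString = "row";
      break;
    case Kind::kArray:
      complexTypeString = "array";
      break;
    case Kind::kMap:
      complexTypeString = "map";
      break;
  }
  // Shared assembly: name, opening parenthesis, children, closing parenthesis.
  std::string typeSignature = complexTypeString + "(";
  for (size_t i = 0; i < type.children.size(); ++i) {
    if (i > 0) {
      typeSignature += ",";
    }
    typeSignature += toTypeSignature(type.children[i]);
  }
  return typeSignature + ")";
}
```

The benefit is that the loop over children is written once instead of once per complex type.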
Would be sufficient to add this code in presto_cpp/main/types along with the other code like VeloxPlanConversion and FunctionMetadata that is used by the native engine. If you prefer, you can add expression as a separate folder under presto_cpp/main/types
}

json toVariableReferenceExpression(
    const std::shared_ptr<const exec::FieldReference>& fieldReference) {
Nit: rename fieldReference to field.
json::array_t getInputExpressions(
    const std::vector<std::unique_ptr<folly::IOBuf>>& body) {
  std::ostringstream oss;
There is a function in presto_cpp/main/common/Utils.h called extractMessageBody to get a std::string from the body variable. Please use it here.
Move RowExpressionConverter and RowExpressionOptimizer to separate files, as this file has become very big.
It would also be good to write RowExpressionConverterTest separately from RowExpressionOptimizerTest so that their functionality is clear.
} else {
  json::array_t inputArguments = input["arguments"];
  const auto numInputs = exprInputs.size();
  VELOX_USER_CHECK_LE(numInputs, inputArguments.size());
Why is this LE and not EQ?
For coalesce expressions containing NULLs, such as:
coalesce(2 * 3 * unbound_long, 1 - 1, null)
The velox expression does not have NULLs whereas the Presto expression has NULL inputs as well:
Velox expression:
coalesce(presto.default.multiply(6:BIGINT, unbound_long), 0:BIGINT)
Presto expression:
{"@type":"special","arguments":[{"@type":"call","arguments":[{"@type":"call","arguments":[{"@type":"call","arguments":[{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAgAAAA=="},{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAwAAAA=="}],"displayName":"MULTIPLY","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer","integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$multiply","returnType":"integer","typeVariableConstraints":[],"variableArity":false}},"returnType":"integer"}],"displayName":"CAST","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$cast","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint"},{"@type":"variable","name":"unbound_long","type":"bigint"}],"displayName":"MULTIPLY","functionHandle":{"@type":"$static","signature":{"argumentTypes":["bigint","bigint"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$multiply","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint","sourceLocation":{"column":18,"line":1}},{"@type":"call","arguments":[{"@type":"call","arguments":[{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAQAAAA=="},{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAQAAAA=="}],"displayName":"SUBTRACT","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer","integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$subtract","returnType":"integer","typeVariableConstraints":[],"variableArity":false}},"returnType":"integer"}],"displayName":"CAST","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$cast","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint"},{"@type":"call","arguments":[{"@type":"constant","sourceLocation":{"column":39,"line":1},"type":"unknown","valueBlock":"AwAAAFJMRQEAAAAKAAAAQllURV9BUlJBWQEAAAABgA=="}],"displayName":"CAST","functionHandle":{"@type":"$static","signature":{"argumentTypes":["unknown"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$cast","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint","sourceLocation":{"column":39,"line":1}}],"form":"COALESCE","returnType":"bigint","sourceLocation":{"column":18,"line":1}}
LE was used here to cover this case. Changed it to EQ check for non-coalesce expressions.
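The resulting rule (LE for coalesce, EQ for everything else) could look roughly like the sketch below. `argumentCountIsValid` is a hypothetical helper written for illustration; the actual code uses the `VELOX_USER_CHECK_*` macros and is not reproduced here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. 'form' is the upper-case special form name,
// 'numVeloxInputs' is the input count of the folded Velox expression, and
// 'numPrestoArguments' is the argument count of the original RowExpression.
bool argumentCountIsValid(
    const std::string& form,
    size_t numVeloxInputs,
    size_t numPrestoArguments) {
  if (form == "COALESCE") {
    // Constant folding may drop NULL arguments, so the Velox expression can
    // have fewer inputs than the Presto expression, but never more.
    return numVeloxInputs <= numPrestoArguments;
  }
  // All other forms must keep their arguments one-to-one.
  return numVeloxInputs == numPrestoArguments;
}
```

For the example above, the folded Velox coalesce has two inputs while the Presto expression has three arguments, which passes only under the relaxed check.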
// Presto requires the field form to be in upper case.
std::transform(form.begin(), form.end(), form.begin(), ::toupper);
res["form"] = form;
auto exprInputs = expr->inputs();
Move this variable closer to its use.
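A side note on the upper-casing line in the snippet above: passing `::toupper` directly to `std::transform` works for ASCII form names, but calling `toupper` with a negative `char` value is undefined behavior. A common guard, sketched here with a hypothetical helper name, is to route each character through `unsigned char`:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases a special form name such as "coalesce".
std::string toUpperForm(std::string form) {
  std::transform(form.begin(), form.end(), form.begin(), [](unsigned char c) {
    // std::toupper expects a value representable as unsigned char (or EOF).
    return static_cast<char>(std::toupper(c));
  });
  return form;
}
```

Since form names are plain ASCII identifiers, this is optional hardening rather than a bug fix.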
std::transform(form.begin(), form.end(), form.begin(), ::toupper);
res["form"] = form;
auto exprInputs = expr->inputs();
res["arguments"] = json::array();
Move this line after the comment and just above the if condition that involves setting it.
Moved it inside the else clause, closer to its usage. Please let me know if that is fine.
  return std::make_shared<protocol::ConstantExpression>(cexpr);
}

// When the second value in the returned pair is true, the arguments for switch
Can you give examples of this?
I have updated this function to return a struct SwitchFormArguments with distinct fields for these cases, and added examples in comments. Could you PTAL?
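For readers following along, a struct with distinct fields might look roughly like the sketch below. Only the struct name and the `arguments` field come from this thread. The `simplifiedResult` field, the `WhenClause` type, the string representation, and the simplification rule (drop clauses whose condition folds to false, collapse when the first remaining clause folds to true) are assumptions made for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical input: one WHEN clause after constant folding.
struct WhenClause {
  std::optional<bool> foldedCondition; // Empty if the condition is not constant.
  std::string condition;
  std::string result;
};

struct SwitchFormArguments {
  // Set when the switch collapses to a single value (assumed field name).
  std::optional<std::string> simplifiedResult;
  // Otherwise, the clauses still needed as switch arguments.
  std::vector<std::string> arguments;
};

SwitchFormArguments getSwitchFormArguments(
    const std::vector<WhenClause>& clauses) {
  SwitchFormArguments out;
  for (const auto& clause : clauses) {
    if (clause.foldedCondition.has_value()) {
      if (!*clause.foldedCondition) {
        continue; // Never matches, so the clause is dropped.
      }
      if (out.arguments.empty()) {
        // The first live clause always matches: the switch is this value.
        out.simplifiedResult = clause.result;
        return out;
      }
    }
    out.arguments.push_back(
        "WHEN " + clause.condition + " THEN " + clause.result);
  }
  return out;
}
```

Compared with `std::pair<json::array_t, bool>`, the named fields make it clear at the call site which of the two outcomes is being handled.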
// special form are returned. Otherwise, the switch expression has been
// simplified and the first value corresponding to the switch case that always
// evaluates to true is returned.
std::pair<json::array_t, bool> RowExpressionConverter::getSwitchSpecialFormArgs(
This function is too long. Please break this into smaller more readable pieces.
6297ece
to
7cbdc9c
Compare
Thanks for the suggestions @aditi-pandit. Refactored RowExpressionConverter and RowExpressionOptimizer into separate files in presto_cpp/main/types, and addressed the remaining comments. Could you please take another look?
if (expr->inputs().empty()) {
  return toConstantRowExpression(expr);
} else {
  // Inputs to constant expressions are constant, eg: divide(1, 2).
This else condition was added to handle cases like 0 / 0 that would throw if evaluated in Velox. One of the test cases is:
assertOptimizedMatches("if(false, 1, 0 / 0)", "cast(fail(8, 'ignored failure message') as integer)");
Removing this condition causes the test to fail with the error below, so the input JSON is returned unchanged in this case:
com.facebook.presto.spi.PrestoException: / by zero
at com.facebook.presto.type.IntegerOperators.divide(IntegerOperators.java:108)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:649)
at com.facebook.presto.sql.InterpretedFunctionInvoker.invoke(InterpretedFunctionInvoker.java:109)
at com.facebook.presto.sql.InterpretedFunctionInvoker.invoke(InterpretedFunctionInvoker.java:60)
at com.facebook.presto.sql.planner.RowExpressionInterpreter$Visitor.visitCall(RowExpressionInterpreter.java:285)
at com.facebook.presto.spi.relation.CallExpression.accept(CallExpression.java:131)
at com.facebook.presto.sql.planner.RowExpressionInterpreter.optimize(RowExpressionInterpreter.java:189)
at com.facebook.presto.sql.planner.RowExpressionInterpreter.optimize(RowExpressionInterpreter.java:180)
at com.facebook.presto.sql.relational.RowExpressionOptimizer.optimize(RowExpressionOptimizer.java:52)
at com.facebook.presto.sql.relational.DelegatingRowExpressionOptimizer.optimize(DelegatingRowExpressionOptimizer.java:78)
I have updated the comment with this example. Please let me know whether this check is fine or should be modified.
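The behavior described here, folding when evaluation succeeds and otherwise handing back the input untouched, can be sketched in isolation. This is a toy model with hypothetical names, using integer division on plain values rather than Velox expressions or JSON:

```cpp
#include <stdexcept>
#include <string>

// Toy evaluator: throws on division by zero, like evaluating 0 / 0 would.
int divideOrThrow(int numerator, int denominator) {
  if (denominator == 0) {
    throw std::domain_error("/ by zero");
  }
  return numerator / denominator;
}

// Returns the folded constant as text, or the original expression text
// unchanged when evaluating it would throw.
std::string foldDivide(
    const std::string& inputExpr,
    int numerator,
    int denominator) {
  try {
    return std::to_string(divideOrThrow(numerator, denominator));
  } catch (const std::domain_error&) {
    // Leave the failure to surface at execution time, if the branch runs.
    return inputExpr;
  }
}
```

Returning the input unchanged keeps an expression such as `if(false, 1, 0 / 0)` from failing during optimization when the failing branch is never taken.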
    const exec::ExprPtr& expr,
    const json& inputRowExpr);

 protected:
No, this was unintentional. Changed the access specifier to private.
for (const auto& entry : kPrestoOperatorMap) {
  veloxToPrestoOperatorMap[entry.second] = entry.first;
}
veloxToPrestoOperatorMap.insert({"cast", "presto.default.$operator$cast"});
This is needed for the cast e2e tests. It was added separately so that the existing map kPrestoOperatorMap is not modified. It is also needed only in this function, veloxToPrestoOperatorMap(), which returns an inverse map, and is not needed in the original kPrestoOperatorMap.
Please let me know if this is fine or whether it should also be added to kPrestoOperatorMap.
7cbdc9c
to
ad0a9e2
Compare
@@ -1546,6 +1546,26 @@ void PrestoServer::registerSidecarEndpoints() {
        proxygen::ResponseHandler* downstream) {
      http::sendOkResponse(downstream, getFunctionsMetadata());
    });
rowExpressionOptimizer_ =
It might be better to have a single global function wrapping RowExpressionOptimizer->optimize for registration here (it could construct a RowExpressionOptimizer internally). Let's avoid constructing the object here.
for (const auto& entry : kPrestoOperatorMap) {
  veloxToPrestoOperatorMap[entry.second] = entry.first;
}
veloxToPrestoOperatorMap.insert({"cast", "presto.default.$operator$cast"});
If this is for specific test functions, then it might be better to make this change in the test function logic instead of here. It's not really server-side logic in that case.
// case that can be simplified since 'a' is a variable here, so the WHEN clauses
// that are required by Presto as switch expression arguments are returned in
// the field 'arguments'.
struct SwitchFormArguments {
This needn't be defined in the header here. It seems to have local usage only in the cpp file. Please move it there.
};

// Helper class to convert Velox expressions of type exec::Expr to their
// corresponding type of RowExpression in Presto.
These are protocol expressions, right? That is an external format defined in Presto. Please state that explicitly.
class RowExpressionConverter {
 public:
  explicit RowExpressionConverter(memory::MemoryPool* pool)
      : pool_(pool), veloxToPrestoOperatorMap_(veloxToPrestoOperatorMap()) {}
Any reason you are using veloxToPrestoOperatorMap_ as a member variable? You can always call the veloxToPrestoOperatorMap() function directly, or add a local function that returns a static variable.
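The "local function that returns a static variable" option could be sketched as follows. This is a standalone illustration, not the PR's code: `prestoOperatorMap()` is a hypothetical stand-in for `kPrestoOperatorMap`, and its two entries are illustrative rather than the full map.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for kPrestoOperatorMap (Presto name -> Velox name).
const std::unordered_map<std::string, std::string>& prestoOperatorMap() {
  static const std::unordered_map<std::string, std::string> kMap = {
      {"presto.default.$operator$add", "presto.default.plus"},
      {"presto.default.$operator$multiply", "presto.default.multiply"},
  };
  return kMap;
}

// Builds the inverse map once, on first use, and returns the same instance
// on every later call, so no member variable is needed.
const std::unordered_map<std::string, std::string>& veloxToPrestoOperatorMap() {
  static const auto kInverse = [] {
    std::unordered_map<std::string, std::string> inverse;
    for (const auto& entry : prestoOperatorMap()) {
      inverse[entry.second] = entry.first;
    }
    // Extra entry that exists only in the inverse direction.
    inverse.insert({"cast", "presto.default.$operator$cast"});
    return inverse;
  }();
  return kInverse;
}
```

Initialization of a function-local static is thread-safe since C++11, so callers can use `veloxToPrestoOperatorMap().at(name)` directly without each converter instance holding its own copy.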
  resultJson = rowExpressionConverter_.veloxToPrestoRowExpression(
      compiledExpr, input[i]);
} else {
  // Velox does not evaluate expressions that are non-deterministic during
Abstract the else part in a separate function.
  std::unique_ptr<expression::RowExpressionConverter> rowExpressionConverter_;
};

TEST_F(RowExpressionConverterTest, constant) {
Let's add more tests for constants of all simple types and a few constants of complex types.
TEST_F(RowExpressionConverterTest, variable) {
  auto field = std::make_shared<exec::FieldReference>(
      VARCHAR(), std::vector<exec::ExprPtr>{}, "c0");
  auto result =
Would be great to add tests for field references for complex types as well.
// Velox expression to Presto RowExpression conversion for different types of
// expressions can be found in TestDelegatingExpressionOptimizer.java in
// presto-native-sidecar-plugin.
class RowExpressionConverterTest
Separate RowExpressionConverterTest to a separate file as well.
  EXPECT_EQ(result, json::parse(expected));
}

TEST_F(RowExpressionConverterTest, variable) {
Add more tests for call expressions and special forms for RowExpressionConverter as well.
Description
Introduces capability to optimize and constant fold row expressions in the Presto native sidecar.
Motivation and Context
Please refer to #24126 for the full context of the changes, which are as described in RFC-0006.
Test Plan
Unit tests for simple cases are added in RowExpressionOptimizerTest.cpp. End-to-end tests will be added as in PR #24126.
Release Notes