[native] Add row expression optimizer #22927

Open
wants to merge 1 commit into
base: master

Conversation

pramodsatya
Contributor

@pramodsatya pramodsatya commented Jun 5, 2024

Description

Introduces the capability to optimize and constant-fold row expressions in the Presto native sidecar.

Motivation and Context

Please refer to #24126 for the full context of these changes, which are described in RFC-0006.

Test Plan

Unit tests for simple cases are added in RowExpressionOptimizerTest.cpp. End-to-end tests will be added in PR #24126.

Release Notes

== NO RELEASE NOTE ==

@pramodsatya pramodsatya changed the title from "[WIP] Add proxygen endpoint for expression evaluation" to "[native] Add proxygen endpoint for expression evaluation" Aug 5, 2024
@tdcmeehan tdcmeehan self-assigned this Aug 5, 2024
@pramodsatya pramodsatya force-pushed the expr_endpt branch 2 times, most recently from 822f79f to 2352cd4, September 11, 2024 14:52
Contributor

@aditi-pandit aditi-pandit left a comment

@pramodsatya : Have done a first round of comments. Will read your tests more closely once you address the comments here.

Contributor Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @aditi-pandit, addressed the comments. Could you please take another look?

void RowExpressionEvaluator::evaluateExpression(
const std::vector<std::unique_ptr<folly::IOBuf>>& body,
proxygen::ResponseHandler* downstream) {
try {
Contributor Author

Thanks for the suggestion, added helper functions and updated accordingly. I could not find a similar utility function in the codebase. Could you please take another look?

facebook-github-bot pushed a commit to facebookincubator/velox that referenced this pull request Nov 16, 2024
Summary:
prestodb/presto#23331 adds a native expression optimizer to delegate expression evaluation to the native sidecar. This is used to constant fold expressions on the presto native sidecar, instead of on the presto java coordinator (which is the current behavior). prestodb/presto#22927 implements a proxygen endpoint to accept `RowExpression`s from `NativeSidecarExpressionInterpreter`, optimize them if possible (rewrite special form expressions), and compile the `RowExpression` to a velox expression with constant folding enabled. This velox expression is then converted back to a `RowExpression` and returned by the sidecar to the coordinator.

When the constant folded velox expression is of type `velox::exec::ConstantExpr`, we need to return a `RowExpression` of type `ConstantExpression`. This requires us to serialize the constant value from `velox::exec::ConstantExpr` into `protocol::ConstantExpression::valueBlock`. This can be done by serializing the constant value vector to presto SerializedPage::column format, followed by base 64 encoding the result (reverse engineering the logic from `Base64Util.cpp::readBlock`).

This PR adds a new function, `serializeSingleColumn`, to `PrestoVectorSerde`. This can be used to serialize input data from vectors containing a single element into a single PrestoPage column format (without the PrestoPage header).
This function is not added to `PrestoBatchVectorSerializer` alongside the existing `serialize` function since that would require adding it as a virtual function in `BatchVectorSerializer` as well, and this is not desirable since the `PrestoPage` format is not relevant in this base class. `PrestoVectorSerde` already has a function, `deserializeSingleColumn`, that deserializes data from a single column. Since `serializeSingleColumn` performs the inverse operation, it is added alongside it in `PrestoVectorSerde`.

Pull Request resolved: #10657

Reviewed By: amitkdutta

Differential Revision: D66044754

Pulled By: pedroerp

fbshipit-source-id: e509605067920f8207e5a3fa67552badc2ce0eba
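
As a rough illustration of the valueBlock encoding described in the commit summary above, the sketch below builds the bytes for a single non-null integer constant and base64-encodes them. The byte layout (little-endian length of the encoding name, the name INT_ARRAY, a position count, a has-nulls flag, then the value) is an assumption inferred by decoding the valueBlock strings quoted elsewhere in this conversation. It is not the serializeSingleColumn implementation, and appendInt32LE, base64Encode, and encodeIntegerConstant are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a 32-bit value in little-endian byte order.
void appendInt32LE(std::vector<uint8_t>& out, int32_t value) {
  const auto bits = static_cast<uint32_t>(value);
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>((bits >> shift) & 0xFF));
  }
}

// Standard base64 with '=' padding.
std::string base64Encode(const std::vector<uint8_t>& data) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::size_t i = 0;
  for (; i + 2 < data.size(); i += 3) {
    const uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }
  const std::size_t rest = data.size() - i;
  if (rest == 1) {
    const uint32_t n = data[i] << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  } else if (rest == 2) {
    const uint32_t n = (data[i] << 16) | (data[i + 1] << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// Assumed layout for one non-null integer constant (hypothetical helper).
std::string encodeIntegerConstant(int32_t value) {
  const std::string encoding = "INT_ARRAY";
  std::vector<uint8_t> bytes;
  appendInt32LE(bytes, static_cast<int32_t>(encoding.size()));
  bytes.insert(bytes.end(), encoding.begin(), encoding.end());
  appendInt32LE(bytes, 1);  // Position count: a single value.
  bytes.push_back(0);       // Has-nulls flag: no nulls.
  appendInt32LE(bytes, value);
  return base64Encode(bytes);
}
```

Under this assumed layout, encodeIntegerConstant(2) yields CQAAAElOVF9BUlJBWQEAAAAAAgAAAA==, which matches the valueBlock shown for the integer constant 2 in the Presto expression quoted in the coalesce discussion.
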
json res;
res["@type"] = "call";
protocol::Signature signature;
std::string exprName = expr->name();
Contributor

Should a dynamic_pointer_cast to CallExpr be done here to ensure that this is a valid Velox call expression?

Contributor Author

The expression here can be either a special form or a call expression so a dynamic_pointer_cast will likely fail here. Also, core::CallExpr derives from core::IExpr and not exec::Expr, so I'm not sure if a dynamic_pointer_cast could be done here. Please let me know if I am missing something.
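
To make the point about unrelated hierarchies concrete, here is a minimal sketch using toy classes. ExecExpr, ExecSpecialForm, CoreIExpr, and CoreCallExpr are hypothetical stand-ins that only mirror the inheritance relationships described in the reply above; they are not the Velox classes, and castSucceeds is an illustrative helper. With polymorphic types the cast compiles, but it returns a null pointer at runtime when the object is not of the target type.

```cpp
#include <memory>

// Toy stand-ins for two unrelated, polymorphic class hierarchies.
struct ExecExpr {
  virtual ~ExecExpr() = default;
};
struct ExecSpecialForm : ExecExpr {};

struct CoreIExpr {
  virtual ~CoreIExpr() = default;
};
struct CoreCallExpr : CoreIExpr {};

// Returns true when the dynamic type of 'expr' is (or derives from) Target.
template <typename Target>
bool castSucceeds(const std::shared_ptr<ExecExpr>& expr) {
  return std::dynamic_pointer_cast<Target>(expr) != nullptr;
}
```

In this toy setup, castSucceeds<ExecSpecialForm> is true for a special form object, while castSucceeds<CoreCallExpr> is false because the object does not belong to the second hierarchy.
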

@pramodsatya pramodsatya changed the title from "[native] Add proxygen endpoint for expression evaluation" to "[native] Add row expression optimizer" Dec 10, 2024
Contributor Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @aditi-pandit, addressed the review comments. Could you please take another look?

@pramodsatya pramodsatya marked this pull request as ready for review December 10, 2024 17:49
@pramodsatya pramodsatya requested a review from a team as a code owner December 10, 2024 17:49
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 10, 2024
@prestodb-ci prestodb-ci requested review from a team and imjalpreet and removed request for a team December 10, 2024 17:49
@pramodsatya pramodsatya force-pushed the expr_endpt branch 4 times, most recently from 416d768 to b244385, December 11, 2024 17:28
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. Have reviewed the RowExpressionConverter class so far.

} else {
std::vector<TypePtr> childTypes;
if (type->isRow()) {
typeSignature = "row(";
Contributor

Would be a bit cleaner to abstract a std::string variable called "complexTypeString" say, and only set it to "row", "array", "map" in the if condition.

Then, after the if condition, when assembling typeSignature for the complex type with children, you can start with

typeSignature = complexTypeString + "("; and then add the typeSignatures for children and then closing parentheses.
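
The suggestion above could look roughly like the following sketch. ToyType, scalarOf, complexOf, and toTypeSignature are hypothetical names for a toy type tree, not the Velox Type API, and the toy deliberately ignores row field names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a type tree; not the Velox Type class.
struct ToyType {
  enum class Kind { kScalar, kRow, kArray, kMap };
  Kind kind;
  std::string name;               // Used only for scalar types.
  std::vector<ToyType> children;  // Used only for complex types.
};

ToyType scalarOf(std::string name) {
  return ToyType{ToyType::Kind::kScalar, std::move(name), {}};
}

ToyType complexOf(ToyType::Kind kind, std::vector<ToyType> children) {
  return ToyType{kind, "", std::move(children)};
}

std::string toTypeSignature(const ToyType& type) {
  // Only the name of the complex type differs between the branches.
  std::string complexTypeString;
  if (type.kind == ToyType::Kind::kRow) {
    complexTypeString = "row";
  } else if (type.kind == ToyType::Kind::kArray) {
    complexTypeString = "array";
  } else if (type.kind == ToyType::Kind::kMap) {
    complexTypeString = "map";
  } else {
    return type.name;
  }

  // Shared assembly: name, opening parenthesis, children, closing parenthesis.
  std::string typeSignature = complexTypeString + "(";
  for (std::size_t i = 0; i < type.children.size(); ++i) {
    if (i > 0) {
      typeSignature += ",";
    }
    typeSignature += toTypeSignature(type.children[i]);
  }
  typeSignature += ")";
  return typeSignature;
}
```

For a map from bigint to an array of varchar, this toy version produces map(bigint,array(varchar)).
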

Contributor

Would be sufficient to add this code in presto_cpp/main/types along with the other code like VeloxPlanConversion and FunctionMetadata that is used by the native engine. If you prefer, you can add expression as a separate folder under presto_cpp/main/types

}

json toVariableReferenceExpression(
const std::shared_ptr<const exec::FieldReference>& fieldReference) {
Contributor

Nit: rename fieldReference to field.


json::array_t getInputExpressions(
const std::vector<std::unique_ptr<folly::IOBuf>>& body) {
std::ostringstream oss;
Contributor

There is a function in presto_cpp/main/common/Utils.h called extractMessageBody to get a std::string from the body variable. Please use it here.

Contributor

Move RowExpressionConverter and RowExpressionOptimizer to separate files as this file has become very big.

It would also be good to keep RowExpressionConverterTest separate from RowExpressionOptimizerTest so that the functionality each one covers is clear.

} else {
json::array_t inputArguments = input["arguments"];
const auto numInputs = exprInputs.size();
VELOX_USER_CHECK_LE(numInputs, inputArguments.size());
Contributor

Why is this LE and not EQ?

Contributor Author

For coalesce expressions containing NULLs such as:

coalesce(2 * 3 * unbound_long, 1 - 1, null)

The velox expression does not have NULLs whereas the Presto expression has NULL inputs as well:
Velox expression:

coalesce(presto.default.multiply(6:BIGINT, unbound_long), 0:BIGINT)

Presto expression:

{"@type":"special","arguments":[{"@type":"call","arguments":[{"@type":"call","arguments":[{"@type":"call","arguments":[{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAgAAAA=="},{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAwAAAA=="}],"displayName":"MULTIPLY","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer","integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$multiply","returnType":"integer","typeVariableConstraints":[],"variableArity":false}},"returnType":"integer"}],"displayName":"CAST","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$cast","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint"},{"@type":"variable","name":"unbound_long","type":"bigint"}],"displayName":"MULTIPLY","functionHandle":{"@type":"$static","signature":{"argumentTypes":["bigint","bigint"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$multiply","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint","sourceLocation":{"column":18,"line":1}},{"@type":"call","arguments":[{"@type":"call","arguments":[{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAQAAAA=="},{"@type":"constant","type":"integer","valueBlock":"CQAAAElOVF9BUlJBWQEAAAAAAQAAAA=="}],"displayName":"SUBTRACT","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer","integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$subtract","returnType":"integer","typeVariableConstraints":[],"variableArity":false}},"returnType":"integer"}],"displayName":"CAST","functionHandle":{"@type":"$static","signature":{"argumentTypes":["integer"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$cast","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint"},{"@type":"call","arguments":[{"@type":"constant","sourceLocation":{"column":39,"line":1},"type":"unknown","valueBlock":"AwAAAFJMRQEAAAAKAAAAQllURV9BUlJBWQEAAAABgA=="}],"displayName":"CAST","functionHandle":{"@type":"$static","signature":{"argumentTypes":["unknown"],"kind":"SCALAR","longVariableConstraints":[],"name":"presto.default.$operator$cast","returnType":"bigint","typeVariableConstraints":[],"variableArity":false}},"returnType":"bigint","sourceLocation":{"column":39,"line":1}}],"form":"COALESCE","returnType":"bigint","sourceLocation":{"column":18,"line":1}}

LE was used here to cover this case. Changed it to EQ check for non-coalesce expressions.
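
The resulting rule can be summarized in a small sketch. The function name argumentCountsAreConsistent and its parameters are hypothetical; the sketch only restates the check described above (less-than-or-equal for coalesce, equality otherwise) and is not the code in this PR.

```cpp
#include <cstddef>
#include <string>

// Returns true when the number of inputs in the optimized Velox expression is
// consistent with the number of arguments in the original Presto expression.
bool argumentCountsAreConsistent(
    const std::string& form,
    std::size_t numVeloxInputs,
    std::size_t numPrestoArguments) {
  if (form == "COALESCE") {
    // Constant folding may drop NULL inputs, so Velox can have fewer inputs.
    return numVeloxInputs <= numPrestoArguments;
  }
  // Every other form is expected to keep the same number of inputs.
  return numVeloxInputs == numPrestoArguments;
}
```

For the coalesce example above, the Velox expression has two inputs and the Presto expression has three arguments, so the check passes only under the relaxed rule.
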

// Presto requires the field form to be in upper case.
std::transform(form.begin(), form.end(), form.begin(), ::toupper);
res["form"] = form;
auto exprInputs = expr->inputs();
Contributor

Move this variable closer to its use.

std::transform(form.begin(), form.end(), form.begin(), ::toupper);
res["form"] = form;
auto exprInputs = expr->inputs();
res["arguments"] = json::array();
Contributor

Move this line after the comment and just above the if condition that involves setting it.

Contributor Author

Moved it inside the else clause, closer to its usage. Please let me know if that is fine.

return std::make_shared<protocol::ConstantExpression>(cexpr);
}

// When the second value in the returned pair is true, the arguments for switch
Contributor

Can you give examples of this?

Contributor Author

I have updated this function to return a struct SwitchFormArguments with distinct fields for these cases, and added examples in comments. Could you PTAL?
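
A struct with distinct fields for the two cases might look like the sketch below. Only the struct name SwitchFormArguments and the field name arguments appear in this conversation; WhenClause, simplifiedResult, toSwitchFormArguments, and the string-based representation of expressions are assumptions made purely for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one WHEN clause.
struct WhenClause {
  std::string condition;              // For example "a = 1".
  std::optional<bool> constantValue;  // Set when the condition folded to a constant.
  std::string result;
};

struct SwitchFormArguments {
  // Set when the whole switch collapses to a single result.
  std::optional<std::string> simplifiedResult;
  // Otherwise, the WHEN clauses (plus ELSE) that still need evaluation.
  std::vector<std::string> arguments;
};

SwitchFormArguments toSwitchFormArguments(
    const std::vector<WhenClause>& clauses,
    const std::string& elseResult) {
  SwitchFormArguments out;
  std::string fallback = elseResult;
  for (const auto& clause : clauses) {
    if (clause.constantValue.has_value()) {
      if (!*clause.constantValue) {
        continue;  // Never taken, so the clause can be dropped.
      }
      fallback = clause.result;  // Always taken: later clauses are unreachable.
      break;
    }
    out.arguments.push_back(
        "WHEN " + clause.condition + " THEN " + clause.result);
  }
  if (out.arguments.empty()) {
    out.simplifiedResult = fallback;
  } else {
    out.arguments.push_back("ELSE " + fallback);
  }
  return out;
}
```

Compared with a pair whose bool changes the meaning of the first element, the two named fields let a caller tell directly whether the switch was simplified.
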

// special form are returned. Otherwise, the switch expression has been
// simplified and the first value corresponding to the switch case that always
// evaluates to true is returned.
std::pair<json::array_t, bool> RowExpressionConverter::getSwitchSpecialFormArgs(
Contributor

This function is too long. Please break this into smaller more readable pieces.

@pramodsatya pramodsatya force-pushed the expr_endpt branch 2 times, most recently from 6297ece to 7cbdc9c, January 6, 2025 23:31
Contributor Author

@pramodsatya pramodsatya left a comment

Thanks for the suggestions @aditi-pandit. Refactored RowExpressionConverter and RowExpressionOptimizer into separate files in presto_cpp/main/types, and addressed the remaining comments. Could you please take another look?

if (expr->inputs().empty()) {
return toConstantRowExpression(expr);
} else {
// Inputs to constant expressions are constant, eg: divide(1, 2).
Contributor Author

This else condition was added to handle cases like 0 / 0 that would throw if evaluated in Velox. One of the test cases is:

assertOptimizedMatches("if(false, 1, 0 / 0)", "cast(fail(8, 'ignored failure message') as integer)");

Removing this condition causes the test to fail with the error below, so the input JSON is returned unchanged in this case:

com.facebook.presto.spi.PrestoException: / by zero
	at com.facebook.presto.type.IntegerOperators.divide(IntegerOperators.java:108)
	at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
	at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:649)
	at com.facebook.presto.sql.InterpretedFunctionInvoker.invoke(InterpretedFunctionInvoker.java:109)
	at com.facebook.presto.sql.InterpretedFunctionInvoker.invoke(InterpretedFunctionInvoker.java:60)
	at com.facebook.presto.sql.planner.RowExpressionInterpreter$Visitor.visitCall(RowExpressionInterpreter.java:285)
	at com.facebook.presto.spi.relation.CallExpression.accept(CallExpression.java:131)
	at com.facebook.presto.sql.planner.RowExpressionInterpreter.optimize(RowExpressionInterpreter.java:189)
	at com.facebook.presto.sql.planner.RowExpressionInterpreter.optimize(RowExpressionInterpreter.java:180)
	at com.facebook.presto.sql.relational.RowExpressionOptimizer.optimize(RowExpressionOptimizer.java:52)
	at com.facebook.presto.sql.relational.DelegatingRowExpressionOptimizer.optimize(DelegatingRowExpressionOptimizer.java:78)

I have updated the comment with this example. Please let me know whether this check is fine or should be modified.
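
The idea of leaving an expression unchanged when folding it would fail can be sketched as follows. This is a standalone illustration on integer division with hypothetical helpers tryFoldDivide and optimizeDivide. It stands in for the behavior described above and is not the check used in this PR, which inspects whether the constant expression still has inputs.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Returns the folded quotient, or nullopt when evaluating would fail.
std::optional<int32_t> tryFoldDivide(int32_t dividend, int32_t divisor) {
  if (divisor == 0) {
    return std::nullopt;  // Division by zero.
  }
  if (dividend == std::numeric_limits<int32_t>::min() && divisor == -1) {
    return std::nullopt;  // Result would overflow.
  }
  return dividend / divisor;
}

// Returns the folded constant as text, or the input expression unchanged when
// folding is not possible, so that any failure surfaces later, only if the
// expression is actually evaluated.
std::string optimizeDivide(
    const std::string& inputExpression,
    int32_t dividend,
    int32_t divisor) {
  const auto folded = tryFoldDivide(dividend, divisor);
  return folded.has_value() ? std::to_string(*folded) : inputExpression;
}
```

With this shape, 4 / 2 folds to 2, while 0 / 0 is passed through as written.
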

const exec::ExprPtr& expr,
const json& inputRowExpr);

protected:
Contributor Author

No, this was unintentional. Changed the access specifier to private.

for (const auto& entry : kPrestoOperatorMap) {
veloxToPrestoOperatorMap[entry.second] = entry.first;
}
veloxToPrestoOperatorMap.insert({"cast", "presto.default.$operator$cast"});
Contributor Author

This is needed for the cast e2e tests. It was added separately so that the existing map kPrestoOperatorMap is not modified. It is needed only in this function, veloxToPrestoOperatorMap(), which returns an inverse map, and not in the original kPrestoOperatorMap.
Please let me know if this is fine or whether it should also be added in kPrestoOperatorMap.
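
The approach described above, inverting a map and adding one extra entry only to the inverse, can be sketched as below. OperatorMap and invertWithCast are hypothetical names, and the sample entry used in the usage note is illustrative rather than a statement about the contents of kPrestoOperatorMap.

```cpp
#include <string>
#include <unordered_map>

using OperatorMap = std::unordered_map<std::string, std::string>;

// Builds the Velox-to-Presto map from a Presto-to-Velox map. The source map is
// taken by const reference, so it is left untouched.
OperatorMap invertWithCast(const OperatorMap& prestoToVelox) {
  OperatorMap inverse;
  for (const auto& entry : prestoToVelox) {
    inverse[entry.second] = entry.first;
  }
  // Extra entry that exists only in the inverse map.
  inverse.insert({"cast", "presto.default.$operator$cast"});
  return inverse;
}
```

Given a source map with the single entry presto.default.$operator$multiply to presto.default.multiply, the result has two entries, the inverted multiply mapping and the cast mapping, while the source map still has one.
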

@@ -1546,6 +1546,26 @@ void PrestoServer::registerSidecarEndpoints() {
proxygen::ResponseHandler* downstream) {
http::sendOkResponse(downstream, getFunctionsMetadata());
});
rowExpressionOptimizer_ =
Contributor

It might be better to have a single global function for RowExpressionOptimizer->optimize for registration here (it could construct a RowExpressionOptimizer internally). Let's avoid constructing the object here.

for (const auto& entry : kPrestoOperatorMap) {
veloxToPrestoOperatorMap[entry.second] = entry.first;
}
veloxToPrestoOperatorMap.insert({"cast", "presto.default.$operator$cast"});
Contributor

If this is for specific test functions, then it might be better to make this change in the test function logic instead of here. It's not particularly server-side logic then.

// case that can be simplified since 'a' is a variable here, so the WHEN clauses
// that are required by Presto as switch expression arguments are returned in
// the field 'arguments'.
struct SwitchFormArguments {
Contributor

This needn't be defined in the header here. It seems to have local usage only in the cpp file. Please move it there.

};

// Helper class to convert Velox expressions of type exec::Expr to their
// corresponding type of RowExpression in Presto.
Contributor

These are protocol expressions, right? That is an external format defined in Presto. Please state that explicitly.

class RowExpressionConverter {
public:
explicit RowExpressionConverter(memory::MemoryPool* pool)
: pool_(pool), veloxToPrestoOperatorMap_(veloxToPrestoOperatorMap()) {}
Contributor

Any reason you are using veloxToPrestoOperatorMap_ as a member variable? You can always call the veloxToPrestoOperatorMap() function directly, or add a local function that returns a static variable.
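
The second option, a function that returns a static variable, could be sketched like this. The names OperatorMap, buildVeloxToPrestoOperators, and veloxToPrestoOperators are hypothetical, and the single map entry is only a placeholder for whatever the real inverse map contains.

```cpp
#include <string>
#include <unordered_map>

using OperatorMap = std::unordered_map<std::string, std::string>;

// Hypothetical builder; a real version would derive the entries from the
// existing operator map instead of hard-coding them.
OperatorMap buildVeloxToPrestoOperators() {
  OperatorMap operators;
  operators.emplace(
      "presto.default.multiply", "presto.default.$operator$multiply");
  return operators;
}

// The function-local static is initialized once, on first use, and that
// initialization is thread-safe since C++11. Every caller gets a reference to
// the same map, so no member variable is needed.
const OperatorMap& veloxToPrestoOperators() {
  static const OperatorMap kOperators = buildVeloxToPrestoOperators();
  return kOperators;
}
```

Callers would then write veloxToPrestoOperators().find(name) wherever the member variable was used, and the map is built only once regardless of how many converter objects exist.
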

resultJson = rowExpressionConverter_.veloxToPrestoRowExpression(
compiledExpr, input[i]);
} else {
// Velox does not evaluate expressions that are non-deterministic during
Contributor

Abstract the else part in a separate function.

std::unique_ptr<expression::RowExpressionConverter> rowExpressionConverter_;
};

TEST_F(RowExpressionConverterTest, constant) {
Contributor

Let's add more tests for constants of all simple types and a few constants of complex types.

TEST_F(RowExpressionConverterTest, variable) {
auto field = std::make_shared<exec::FieldReference>(
VARCHAR(), std::vector<exec::ExprPtr>{}, "c0");
auto result =
Contributor

Would be great to add tests for field references for complex types as well.

// Velox expression to Presto RowExpression conversion for different types of
// expressions can be found in TestDelegatingExpressionOptimizer.java in
// presto-native-sidecar-plugin.
class RowExpressionConverterTest
Contributor

Move RowExpressionConverterTest to a separate file as well.

EXPECT_EQ(result, json::parse(expected));
}

TEST_F(RowExpressionConverterTest, variable) {
Contributor

Add more tests for call expressions and special forms for RowExpressionConverter as well.

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025