Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function serializeSingleColumn to PrestoVectorSerde #10657

Closed
wants to merge 1 commit into from

Conversation

pramodsatya
Copy link
Collaborator

@pramodsatya pramodsatya commented Aug 5, 2024

prestodb/presto#23331 adds a native expression optimizer to delegate expression evaluation to the native sidecar. This is used to constant fold expressions on the presto native sidecar, instead of on the presto java coordinator (which is the current behavior). prestodb/presto#22927 implements a proxygen endpoint to accept RowExpressions from NativeSidecarExpressionInterpreter, optimize them if possible (rewrite special form expressions), and compile the RowExpression to a velox expression with constant folding enabled. This velox expression is then converted back to a RowExpression and returned by the sidecar to the coordinator.

When the constant folded velox expression is of type velox::exec::ConstantExpr, we need to return a RowExpression of type ConstantExpression. This requires us to serialize the constant value from velox::exec::ConstantExpr into protocol::ConstantExpression::valueBlock. This can be done by serializing the constant value vector to presto SerializedPage::column format, followed by base 64 encoding the result (reverse engineering the logic from Base64Util.cpp::readBlock).

This PR adds a new function, serializeSingleColumn, to PrestoVectorSerde. This can be used to serialize input data from vectors containing a single element into a single PrestoPage column format (without the PrestoPage header).
This function is not added to PrestoBatchVectorSerializer alongside the existing serialize function since that would require adding it as a virtual function in BatchVectorSerializer as well, and this is not desirable since the PrestoPage format is not relevant in this base class. There is an existing function deserializeSingleColumn in PrestoVectorSerde which is used to deserialize data from a single column, since serializeSingleColumn performs the inverse operation to this function, it is added alongside it in PrestoVectorSerde.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2024
Copy link

netlify bot commented Aug 5, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 6e3d6dd
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67358db55634190008ca6af1

@aditi-pandit
Copy link
Collaborator

@pramodsatya : The changes look okay. But please can you explain what specifically in the linked PR needed these changes. I do recall our discussion. But it would be good for other reviewers as well.

@pramodsatya
Copy link
Collaborator Author

@pramodsatya : The changes look okay. But please can you explain what specifically in the linked PR needed these changes. I do recall our discussion. But it would be good for other reviewers as well.

Thanks @aditi-pandit, updated the PR description to explain why this change is needed.

@aditi-pandit
Copy link
Collaborator

@kevinwilfong : What do you think of these changes ? Please can you help with this review.

@pedroerp
Copy link
Contributor

Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as PrestoPage using this API? It will maintain the encoding, then they can be deserialized by the Presto Java code.

https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.h#L41-L43

@pramodsatya
Copy link
Collaborator Author

Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as PrestoPage using this API? It will maintain the encoding, then they can be deserialized by the Presto Java code.

https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.h#L41-L43

Hi @pedroerp, thanks for the suggestion. Initially I did try to use the serialize function in PrestoBatchVectorSerializer, which serializes the velox vector in PrestoPage format, to serialize the constant encoded vector from exec::ConstantExpr into protocol::Block::data (for constructing the protocol::ConstantExpression::valueBlock). However, this was found to result in a deserialization error on the Presto coordinator. Upon further investigation, we noticed this was because the Presto coordinator expects just the serialized column from PrestoPage, without the header, in the field protocol::ConstantExpression::valueBlock::data.
We also noticed that in PrestoToVeloxExpr.cpp, the valueBlock from Presto protocol::ConstantExpression is converted to velox vector by deserializing just the column from PrestoPage (without the header) in function VeloxExprConverter::getConstantValue() -> deserializeSingleColumn (we are attempting to do an inverse operation to this function when constructing a Presto ConstantExpression from velox ConstantExpr here).

Using serializeColumn to construct valueBlock in protocol::ConstantExpression fixed the error (ref: function RowExpressionEvaluator::getConstantRowExpression in RowExpressionEvaluator.cpp) and we were able to test it end to end. We were also able to solve the serialization error by just skipping over the fixed number of bytes occupied by the header in PrestoPage, but since there already is a serializeColumn function present in the codebase, we figured it would be better to expose it for use via velox.

Please let me know if I am missing something.

@pedroerp
Copy link
Contributor

@pramodsatya thank you for clarifying. Could we instead provide an option in PrestoOptions to make the serializer skip the header instead?

I think what I'm slightly concerned is moving VectorStream and all that stuff to the API (header) of this class, making the client more verbose, and the API a bit less intuitive.

velox/serializers/PrestoSerializer.cpp Outdated Show resolved Hide resolved
velox/serializers/PrestoSerializer.cpp Outdated Show resolved Hide resolved
velox/serializers/PrestoSerializer.cpp Show resolved Hide resolved
Copy link
Collaborator Author

@pramodsatya pramodsatya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @aditi-pandit, addressed the comments and modified the existing unit-test for deserializeSingleColumn to test serializeSingleColumn as well. Could you please take another look?

velox/serializers/PrestoSerializer.cpp Show resolved Hide resolved
velox/serializers/PrestoSerializer.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya. Looks good minus the comments.

velox/serializers/tests/PrestoSerializerTest.cpp Outdated Show resolved Hide resolved
velox/serializers/PrestoSerializer.cpp Show resolved Hide resolved
@pramodsatya pramodsatya force-pushed the ser_col branch 2 times, most recently from 3d90559 to bc9f0c0 Compare November 5, 2024 07:42
@pramodsatya pramodsatya changed the title Add serializeColumn to PrestoVectorSerde Add function serializeSingleColumn to PrestoVectorSerde Nov 5, 2024
Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya. Minor comment.

velox/serializers/tests/PrestoSerializerTest.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya

@pedroerp pedroerp added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 6, 2024
@facebook-github-bot
Copy link
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pedroerp merged this pull request in 00e8149.

Copy link

Conbench analyzed the 1 benchmark run on commit 00e81494.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
…bator#10657)

Summary:
prestodb/presto#23331 adds a native expression optimizer to delegate expression evaluation to the native sidecar. This is used to constant fold expressions on the presto native sidecar, instead of on the presto java coordinator (which is the current behavior). prestodb/presto#22927 implements a proxygen endpoint to accept `RowExpression`s from `NativeSidecarExpressionInterpreter`, optimize them if possible (rewrite special form expressions), and compile the `RowExpression` to a velox expression with constant folding enabled. This velox expression is then converted back to a `RowExpression` and returned by the sidecar to the coordinator.

When the constant folded velox expression is of type `velox::exec::ConstantExpr`, we need to return a `RowExpression` of type `ConstantExpression`. This requires us to serialize the constant value from `velox::exec::ConstantExpr` into `protocol::ConstantExpression::valueBlock`. This can be done by serializing the constant value vector to presto SerializedPage::column format, followed by base 64 encoding the result (reverse engineering the logic from `Base64Util.cpp::readBlock`).

This PR adds a new function, `serializeSingleColumn`, to `PrestoVectorSerde`. This can be used to serialize input data from vectors containing a single element into a single PrestoPage column format (without the PrestoPage header).
This function is not added to `PrestoBatchVectorSerializer` alongside the existing `serialize` function since that would require adding it as a virtual function in `BatchVectorSerializer` as well, and this is not desirable since the `PrestoPage` format is not relevant in this base class. There is an existing function `deserializeSingleColumn` in `PrestoVectorSerde` which is used to deserialize data from a single column, since `serializeSingleColumn` performs the inverse operation to this function, it is added alongside it in `PrestoVectorSerde`.

Pull Request resolved: facebookincubator#10657

Reviewed By: amitkdutta

Differential Revision: D66044754

Pulled By: pedroerp

fbshipit-source-id: e509605067920f8207e5a3fa67552badc2ce0eba
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants