Add function serializeSingleColumn to PrestoVectorSerde #10657
✅ Deploy Preview for meta-velox canceled.
@pramodsatya: The changes look okay, but could you please explain what specifically in the linked PR needs these changes? I do recall our discussion, but it would be good for other reviewers as well.
Thanks @aditi-pandit, updated the PR description to explain why this change is needed.
@kevinwilfong: What do you think of these changes? Could you please help with this review?
Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as a PrestoPage using this API? That would maintain the encoding, and the result could then be deserialized by the Presto Java code.
Hi @pedroerp, thanks for the suggestion. Initially I did try to use the Using Please let me know if I am missing something.
@pramodsatya thank you for clarifying. Could we instead provide an option in PrestoOptions to make the serializer skip the header? I'm slightly concerned about moving VectorStream and all that stuff into the API (header) of this class, which makes the client more verbose and the API a bit less intuitive.
Thanks for the feedback @aditi-pandit. I addressed the comments and modified the existing unit test for `deserializeSingleColumn` to test `serializeSingleColumn` as well. Could you please take another look?
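The round trip that this test exercises can be illustrated with a self-contained sketch. It does not call Velox: `writeSingleBigint` and `readSingleBigint` are hypothetical helpers that assume Presto's `LONG_ARRAY` block layout (a length-prefixed encoding name, a 32-bit position count, a may-have-null flag followed by an MSB-first null bitmap when set, then little-endian 64-bit values for non-null rows only). The real `serializeSingleColumn` / `deserializeSingleColumn` pair covers many more types and encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical writer: a one-row LONG_ARRAY column; std::nullopt means a null row.
std::vector<uint8_t> writeSingleBigint(std::optional<int64_t> value) {
  std::vector<uint8_t> out;
  // Appends the low `bytes` bytes of v in little-endian order.
  auto putLE = [&out](uint64_t v, int bytes) {
    for (int i = 0; i < bytes; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
  };
  const std::string name = "LONG_ARRAY";
  putLE(name.size(), 4);                            // encoding name length
  out.insert(out.end(), name.begin(), name.end());  // encoding name bytes
  putLE(1, 4);                                      // positionCount
  if (!value.has_value()) {
    out.push_back(1);     // mayHaveNull = true
    out.push_back(0x80);  // null bitmap, MSB first: position 0 is null
    return out;           // null rows carry no value bytes
  }
  out.push_back(0);                         // mayHaveNull = false
  putLE(static_cast<uint64_t>(*value), 8);  // the value as little-endian int64
  return out;
}

// Hypothetical reader: the inverse of writeSingleBigint, with bounds checks.
std::optional<int64_t> readSingleBigint(const std::vector<uint8_t>& in) {
  std::size_t pos = 0;
  // Reads `bytes` little-endian bytes, failing if the buffer is too short.
  auto getLE = [&](int bytes) {
    if (pos + bytes > in.size()) throw std::runtime_error("truncated column");
    uint64_t v = 0;
    for (int i = 0; i < bytes; ++i) v |= uint64_t(in[pos++]) << (8 * i);
    return v;
  };
  const std::size_t nameLen = getLE(4);
  if (pos + nameLen > in.size()) throw std::runtime_error("truncated column");
  const std::string name(in.begin() + pos, in.begin() + pos + nameLen);
  pos += nameLen;
  if (name != "LONG_ARRAY") throw std::runtime_error("unexpected encoding: " + name);
  if (getLE(4) != 1) throw std::runtime_error("expected exactly one position");
  const bool mayHaveNull = getLE(1) != 0;
  // Only read the bitmap byte when the flag says one is present.
  if (mayHaveNull && (getLE(1) & 0x80) != 0) return std::nullopt;
  return static_cast<int64_t>(getLE(8));
}
```

Testing both the null and non-null paths matters here because a null row changes the byte layout: the bitmap byte appears and the value bytes disappear.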
Thanks @pramodsatya. Looks good minus the comments.
Force-pushed from 3d90559 to bc9f0c0.
Thanks @pramodsatya. Minor comment.
Thanks @pramodsatya
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
…bator#10657)

Summary:
prestodb/presto#23331 adds a native expression optimizer that delegates expression evaluation to the native sidecar. It is used to constant fold expressions on the Presto native sidecar instead of on the Presto Java coordinator (the current behavior). prestodb/presto#22927 implements a Proxygen endpoint that accepts `RowExpression`s from `NativeSidecarExpressionInterpreter`, optimizes them where possible (rewriting special form expressions), and compiles each `RowExpression` to a Velox expression with constant folding enabled. The Velox expression is then converted back to a `RowExpression` and returned by the sidecar to the coordinator.

When the constant-folded Velox expression is of type `velox::exec::ConstantExpr`, we need to return a `RowExpression` of type `ConstantExpression`. This requires serializing the constant value from `velox::exec::ConstantExpr` into `protocol::ConstantExpression::valueBlock`. This can be done by serializing the constant value vector to the Presto SerializedPage column format and then base64-encoding the result (reverse engineering the logic in `Base64Util.cpp::readBlock`).

This PR adds a new function, `serializeSingleColumn`, to `PrestoVectorSerde`. It serializes a vector containing a single element into the PrestoPage column format, without the PrestoPage header.

The function is not added to `PrestoBatchVectorSerializer` alongside the existing `serialize` function because that would also require adding it as a virtual function in `BatchVectorSerializer`, which is undesirable since the `PrestoPage` format is not relevant to that base class. `PrestoVectorSerde` already has `deserializeSingleColumn`, which deserializes data from a single column; since `serializeSingleColumn` performs the inverse operation, it is added alongside it.

Pull Request resolved: facebookincubator#10657
Reviewed By: amitkdutta
Differential Revision: D66044754
Pulled By: pedroerp
fbshipit-source-id: e509605067920f8207e5a3fa67552badc2ce0eba
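To make the target format concrete, here is a minimal sketch, assuming a single non-null `BIGINT` constant and the layout of Presto's `LONG_ARRAY` block encoding: a 32-bit length-prefixed encoding name, a 32-bit position count, a one-byte may-have-null flag, then the value as a little-endian 64-bit integer, all without a page header. `serializeBigintColumn` and `base64Encode` are hypothetical names, not Velox APIs, and for constant vectors the real serializer may choose a different encoding (for example `RLE`).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a 32-bit integer in little-endian byte order.
void appendInt32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Appends a 64-bit integer in little-endian byte order.
void appendInt64(std::vector<uint8_t>& out, int64_t v) {
  const uint64_t u = static_cast<uint64_t>(v);
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(u >> (8 * i)));
}

// Hypothetical helper: one non-null BIGINT as a one-row LONG_ARRAY column,
// with no PrestoPage header in front of it.
std::vector<uint8_t> serializeBigintColumn(int64_t value) {
  std::vector<uint8_t> out;
  const std::string encoding = "LONG_ARRAY";
  appendInt32(out, static_cast<uint32_t>(encoding.size()));  // name length
  out.insert(out.end(), encoding.begin(), encoding.end());    // name bytes
  appendInt32(out, 1);      // positionCount: a constant has one row
  out.push_back(0);         // mayHaveNull = false, so no null bitmap follows
  appendInt64(out, value);  // the value itself
  return out;
}

// Standard base64 (RFC 4648 alphabet) with '=' padding.
std::string base64Encode(const std::vector<uint8_t>& in) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::size_t i = 0;
  // Each full 3-byte group becomes four 6-bit characters.
  for (; i + 3 <= in.size(); i += 3) {
    const uint32_t n =
        (uint32_t(in[i]) << 16) | (uint32_t(in[i + 1]) << 8) | uint32_t(in[i + 2]);
    for (int shift = 18; shift >= 0; shift -= 6) out += kAlphabet[(n >> shift) & 63];
  }
  // Encode a trailing 1- or 2-byte remainder and pad to a multiple of four.
  const std::size_t rest = in.size() - i;
  if (rest > 0) {
    uint32_t n = uint32_t(in[i]) << 16;
    if (rest == 2) n |= uint32_t(in[i + 1]) << 8;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += (rest == 2 ? kAlphabet[(n >> 6) & 63] : '=');
    out += '=';
  }
  return out;
}
```

A string produced this way is the kind of payload that would go into `protocol::ConstantExpression::valueBlock`; reading it back (base64-decode, then parse the block) is the reverse direction referred to above as `Base64Util.cpp::readBlock`. Omitting the page header is what lets the column bytes stand alone as a block.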