[native] Add native plan checker and native endpoint for Velox plan conversion #23596
Making this PR a draft, as recent discussions will likely require a change to send the
I have separated out the first part of this (adding the session property to enable building and validating the plan eagerly) into #23649.
[Changes to SPI to support custom PlanChecker](https://github.com/prestodb/presto/pull/23596/commits/27b0c2ef2b4758de723712d8ad0cc142d9757b02)
Changes to Java Presto main and server to support custom PlanChecker SPI
@@ -371,6 +377,8 @@ else if (serverConfig.isCoordinator()) {
        driftClientBinder(binder).bindDriftClient(ThriftServerInfoClient.class, ForNodeManager.class)
                .withAddressSelector(((addressSelectorBinder, annotation, prefix) ->
                        addressSelectorBinder.bind(AddressSelector.class).annotatedWith(annotation).to(FixedAddressSelector.class)));
        // NodeManager instance for plugins to use
        binder.bind(NodeManager.class).to(ConnectorAwareNodeManager.class).in(Scopes.SINGLETON);
Rather than inject this, I would just construct it where needed.
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
...cebook/presto/sql/planner/sanity/plancheckerprovidermanagers/PlanCheckerProviderManager.java
Add plan checker provider and plugin for NativePlanChecker
.../src/main/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanChecker.java
...n/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanCheckerProvider.java
// Create static taskId and empty TableWriteInfo needed for plan conversion
protocol::TaskId taskId = "velox-plan-conversion.0.0.0";
auto tableWriteInfo = std::make_shared<protocol::TableWriteInfo>();
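The static id above appears to follow Presto's dot-separated task id layout, `queryId.stageId.stageExecutionId.id`, so `velox-plan-conversion` stands in for the query id and the three zeros fill the remaining positions. A minimal sketch of splitting such an id, assuming only that the last three components are integers; `ParsedTaskId` and `parseTaskId` are hypothetical names, not presto_cpp types:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical holder for the components of a Presto-style task id,
// laid out as queryId.stageId.stageExecutionId.id (not a presto_cpp type).
struct ParsedTaskId {
  std::string queryId;
  int stageId;
  int stageExecutionId;
  int id;
};

// Hypothetical parser: splits from the right so that only the last three
// components need to be integers; everything before them is the query id.
ParsedTaskId parseTaskId(const std::string& taskId) {
  int numbers[3];
  std::string::size_type end = taskId.size();
  for (int i = 2; i >= 0; --i) {
    if (end == 0) {
      throw std::invalid_argument("malformed task id: " + taskId);
    }
    auto dot = taskId.rfind('.', end - 1);
    if (dot == std::string::npos || dot + 1 >= end) {
      throw std::invalid_argument("malformed task id: " + taskId);
    }
    // std::stoi throws std::invalid_argument if no digits can be parsed.
    numbers[i] = std::stoi(taskId.substr(dot + 1, end - dot - 1));
    end = dot;
  }
  if (end == 0) {
    throw std::invalid_argument("empty query id: " + taskId);
  }
  return {taskId.substr(0, end), numbers[0], numbers[1], numbers[2]};
}
```

Splitting from the right keeps the parser tolerant of hyphens or other punctuation in the query-id part, which is exactly where a synthetic name like `velox-plan-conversion` lives.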
TODO: this causes a problem for create table, need to address
This doesn't seem to be a problem anymore; correct me if I'm missing anything here.
I just encountered this again, so it looks like there is still an issue.
I discussed this with @tdcmeehan, and since `TableWriteInfo` is not built until after the planning stage, we need an alternative way to handle fragments with a `TableWriterNode`. Some alternatives discussed:
1. Add a field to the `TableWriteInfo` to indicate that the converter should proceed and skip the required fields, like `ExecutionWriterTarget`.
2. Make a dummy `TableWriteInfo` with just enough fields populated to pass the conversion.
3. Skip the `TableWriterNode` portion of the fragment, so the source node can continue with the conversion.

All of these options would result in an incomplete conversion of a fragment with a `TableWriterNode`, but since that information is not available at the time plan validation happens, there isn't much we can do. Here I implemented option (3), which is the least intrusive to the plan conversion.
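Option (3) can be pictured with a toy recursive converter. This is a sketch over a hypothetical `PlanNode` type and a hypothetical `convertSkippingTableWriter` function, not the actual `PrestoToVeloxQueryPlan.cpp` code: the writer node is dropped while its sources are still converted, which is why the result is an incomplete but still useful validation of the fragment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified plan node: a node kind plus its sources.
// The real presto_cpp protocol types are far richer than this.
struct PlanNode {
  std::string kind;
  std::vector<std::shared_ptr<PlanNode>> sources;
};

// Sketch of option (3): walk the fragment and "convert" every node except
// a TableWriterNode, whose TableWriteInfo does not exist yet at planning
// time. The writer's sources are still visited, so the rest of the
// fragment is validated. The output vector records converted node kinds
// in pre-order as a stand-in for the real Velox plan.
void convertSkippingTableWriter(
    const std::shared_ptr<PlanNode>& node,
    std::vector<std::string>& converted) {
  if (node->kind != "TableWriterNode") {
    converted.push_back(node->kind);
  }
  for (const auto& source : node->sources) {
    convertSkippingTableWriter(source, converted);
  }
}
```

For a fragment `TableWriterNode -> ProjectNode -> TableScanNode`, only the project and scan are converted, so an unsupported expression in the projection would still be caught.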
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
if (error.empty()) {
  http::sendOkResponse(downstream, json(R"({ "status": "ok" })"));
} else {
  http::sendErrorResponse(downstream, json(R"({ "status": "error", "message": ")" + error + R"("})"));
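Building the error body by concatenating `error` into a raw string yields invalid JSON whenever the message contains a double quote, a backslash, or a control character (for example, a message that quotes a SQL identifier). Setting the message as a field on a JSON object would let the JSON library handle escaping; the sketch below instead shows a hand-rolled escaper, assuming the string-concatenation approach is kept. `escapeJsonString` and `makeErrorBody` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes a string for embedding inside a JSON string literal (RFC 8259):
// quotes, backslashes and control characters must be escaped.
std::string escapeJsonString(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (unsigned char c : input) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          // Other control characters use the \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Hypothetical helper producing the same shape as the reviewed code,
// {"status": "error", "message": "..."}, but with the message escaped.
std::string makeErrorBody(const std::string& error) {
  return R"({"status": "error", "message": ")" + escapeJsonString(error) + R"("})";
}
```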
What does this look like on the user side? For example, can we report different kinds of error messages depending on what the failure was? For regular tasks, I think we send an OkResponse for most kinds of failures and put the error information in the TaskInfo. Maybe we should do something similar here.
We can make the response however you think is best. I didn't put too much thought into the message since we discussed that this would eventually be sending the converted plan back. Are you saying if the plan validation fails you would send an OkResponse and not an ErrorResponse?
Yeah, I think we should follow the model for how errors get passed along in general for native workers, which I think is an OkResponse with the error information in the TaskInfo.
@rschlussel if I understand correctly, you are asking for the response to be a `TaskInfo` message with `ExecutionFailureInfo` populated if the conversion fails?
On the Java side, `TaskInfo` and many of the related classes are not part of the SPI, so things will need to be moved around. Is that what we want here?
On the C++ side, there is a lot of information in the `TaskInfo` struct. I'm not sure how much is really useful in this case, or whether it would need to be populated by the `TaskManager` and have a corresponding `PrestoTask`.
Would it make sense to define a simple response message? cc @tdcmeehan
To me, the difference with tasks is that 4xx is returned when we cannot create the task. What the TaskInfo reports is whether the already successfully created task ended up failing. That should be returned as a 200, because the call to check on the task status itself succeeded, and it is merely reporting the underlying failure. When we go to update or create the task, 4xx is returned only when the task cannot be created.
In this case, what is being returned is the converted plan. That should probably return code 422, because we are reporting whether or not the conversion failed. Like all 4xx error codes, this indicates the problem is with the input. It's similar to a task update or creation failing.
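The status-code rule proposed above (200 when conversion succeeds, 422 when a well-formed request carries a plan that cannot be converted) can be sketched as a small mapping. `ConversionFailure`, `ConversionReply` and `makeConversionReply` are hypothetical; the failure fields loosely follow the "error type, code and message" described for the endpoint's failure info, but their names and types here are assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Simplified stand-in for the failure details (error type, code, message)
// that the endpoint reports; names and types are assumptions.
struct ConversionFailure {
  std::string type;
  std::string code;
  std::string message;
};

// Hypothetical reply produced by a plan-conversion endpoint.
struct ConversionReply {
  int httpStatus;
  std::optional<ConversionFailure> failure;  // set only when conversion failed
};

constexpr int kHttpOk = 200;
// 422: the request itself was well-formed, but the plan fragment it carries
// could not be processed, so the problem lies with the input.
constexpr int kHttpUnprocessableContent = 422;

ConversionReply makeConversionReply(std::optional<ConversionFailure> failure) {
  if (!failure) {
    return {kHttpOk, std::nullopt};
  }
  return {kHttpUnprocessableContent, std::move(failure)};
}
```

Keeping the failure details in the body while the status code alone signals success or failure lets the caller branch on the status without parsing the payload first.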
String responseBody = response.body() != null ? response.body().string() : "{}";
LOG.error("Native plan checker failed with code: %d, response: %s", response.code(), responseBody);
if (config.isQueryFailOnError()) {
    throw new PrestoException(QUERY_REJECTED, "Query failed by native plan checker with code: " + response.code() + ", response: " + responseBody);
QUERY_REJECTED seems like the wrong error code. We should propagate the error from the side car that caused the failure.
So you mean the response from the sidecar should have an error code that we use here instead of `QUERY_REJECTED`? The error message is propagated and shown to the user.
The error that the user sees shouldn't be QUERY_REJECTED. It should be whatever error caused the plan checking to fail.
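On the sidecar side, surfacing the error that actually caused the failure means recording its code and message when conversion throws, rather than collapsing everything into one generic code. A minimal sketch; `ConversionError`, `FailureInfo` and `runConversion` are hypothetical, and falling back to a code named `GENERIC_INTERNAL_ERROR` (the name of a Presto standard error code) for unexpected exceptions is an assumption:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception carrying a machine-readable error code, standing
// in for whatever error the real conversion throws.
class ConversionError : public std::runtime_error {
 public:
  ConversionError(std::string code, const std::string& message)
      : std::runtime_error(message), code_(std::move(code)) {}
  const std::string& code() const { return code_; }

 private:
  std::string code_;
};

// Hypothetical failure details returned to the coordinator.
struct FailureInfo {
  std::string type;
  std::string code;
  std::string message;
};

// Runs the conversion and, on failure, keeps the original error's code and
// message so the coordinator can report that error to the user instead of
// a generic code such as QUERY_REJECTED.
std::optional<FailureInfo> runConversion(const std::function<void()>& convert) {
  try {
    convert();
    return std::nullopt;
  } catch (const ConversionError& e) {
    return FailureInfo{"ConversionError", e.code(), e.what()};
  } catch (const std::exception& e) {
    // Assumed fallback for errors that carry no code of their own.
    return FailureInfo{"std::exception", "GENERIC_INTERNAL_ERROR", e.what()};
  }
}
```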
.../src/main/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanChecker.java
try (Response response = httpClient.newCall(request).execute()) {
    if (!response.isSuccessful()) {
        String responseBody = response.body() != null ? response.body().string() : "{}";
        LOG.error("Native plan checker failed with code: %d, response: %s", response.code(), responseBody);
I don't think we need to log the error if we're failing the query with that error.
That is fine. I just want to make sure it's clear that the query failed due to the native plan checker; if the exception message says so and is logged, then that should be fine.
If it doesn't, it should be changed so that it does. It's much more helpful to have the failure reason show up in the query error than to have to scour the coordinator logs for any relevant error messages.
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
presto-spi/src/main/java/com/facebook/presto/spi/plan/PlanCheckerProvider.java
LGTM! (docs)
Pull branch, new local doc build, looks good. Thanks!
This adds a new SPI to support plugin custom plan checkers to be provided for validating intermediate, final, and fragment stages of a logical plan. The motivation for this is to allow for a native plan checker that will eagerly validate a plan on the native sidecar to quickly fail the query if there are incompatibilities. Add unit test to verify that a queued query can be validated and fail while queued. This is done by using the new custom plan checker SPI to add plugin that will trigger a failure when validating the plan. See also: prestodb#23596 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
Apologies for the delay in responding, @tdcmeehan and @rschlussel; I was focused on getting the other PR ready. I think I have addressed most of this feedback, with just a couple of remaining things. I'll do some testing and then remove this from draft soon.
@BryanCutler: Please also prefix the title of your second commit with [native].
Few more minor comments.
@@ -75,6 +78,16 @@ void sendErrorResponse(
      .sendWithEOM();
}

void sendErrorJsonResponse(
There are both JSON and Thrift versions of the OK response, but the error response has only JSON. Should we consider adding a Thrift version as well?
I'm not quite sure what this means, but to be clear, we'll only send JSON back from this endpoint, never Thrift.
Yes, the error response matters only if the native sidecar will work with both JSON and Thrift.
Prestissimo supports both formats for the control API. It's likely the Thrift usage is legacy.
But if it's not too much work, let's add a Thrift error response version for consistency.
@aditi-pandit, just so I understand what you're asking for a little better: are you asking for an additional API in HttpServer that can send an error formatted as Thrift, e.g. `void sendErrorThriftResponse(proxygen::ResponseHandler* downstream, const std::string& body)`, or are you also asking that the `/v1/velox/plan` endpoint could send responses back in Thrift format?
If it's the latter, I think there would have to be some additional conversion of the message to Thrift, and it doesn't sound like it would be useful since only JSON is being used for this case.
@BryanCutler : It is the former.
It's not worth changing /v1/velox/plan to send responses in Thrift format.
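One way to add the Thrift error variant without the two paths drifting apart is to route both through a single helper that differs only in content type. This is a sketch against a hypothetical `OutgoingResponse` struct rather than `proxygen::ResponseHandler`; the `...Sketch` function names are hypothetical, and the Thrift content-type string is an assumption for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an outgoing HTTP response; the real HttpServer
// helpers write to a proxygen::ResponseHandler instead.
struct OutgoingResponse {
  int status = 0;
  std::string contentType;
  std::string body;
};

// Shared implementation: the JSON and Thrift variants differ only in the
// content type, so keeping one helper stops them from drifting apart.
static void sendErrorWithContentType(
    OutgoingResponse& out,
    int status,
    const std::string& contentType,
    const std::string& body) {
  out.status = status;
  out.contentType = contentType;
  out.body = body;
}

// Hypothetical signatures, loosely mirroring the names in the discussion.
void sendErrorJsonResponseSketch(OutgoingResponse& out, int status, const std::string& body) {
  sendErrorWithContentType(out, status, "application/json", body);
}

// The Thrift content type below is an assumption for illustration.
void sendErrorThriftResponseSketch(OutgoingResponse& out, int status, const std::string& body) {
  sendErrorWithContentType(out, status, "application/x-thrift+binary", body);
}
```

With this shape, the `/v1/velox/plan` endpoint can keep calling only the JSON variant, while the Thrift variant exists purely for API symmetry, matching the conclusion of the thread.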
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "VeloxPlanConversion.h"
The .cpp files always include the complete path of the headers, even if the headers are in the local directory.
Thanks @BryanCutler for working through the iterations.
This adds a provider for a native plan checker that will send a plan fragment to the native sidecar where it is validated by performing a conversion to a Velox plan. If the conversion succeeds the query will continue, if it fails then the query will fail with an error from the native sidecar. The provider is added to the native sidecar plugin and is enabled with the config `native-plan-checker.plan-validation-enabled=true` from filename `etc/plan-checker-providers/native-plan-checker.properties`. See also: prestodb#23649 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
This adds an endpoint to the native Presto server that will convert a Presto plan fragment to Velox. If the conversion is successful, the server will send an ok response. If it fails, the server will send an error response with a 422 status code as unprocessable. The error message will contain a PlanConversionFailureInfo struct with error type, code and message. See also prestodb#23649 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
Thanks @tdcmeehan @rschlussel and @aditi-pandit for reviewing!
The previous commit to add the native plan checker mistakenly removed the binding for creating a SessionPropertyManager. This adds it back into PrestoServer. See prestodb#23596
The previous commit to add the native plan checker mistakenly removed the loading of SessionPropertyManager. This adds back the call to loadSessionPropertyProviders in PrestoServer. See prestodb#23596
This has broken our usage of the plan checker SPI, because the method to get the plan checkers was moved from Plugin to CoordinatorPlugin, but the fix in prestodb/presto-maven-plugin#19 hasn't been released yet. I may revert this change temporarily, depending on how quickly that release is ready.
@rschlussel can you add the service descriptor yourself manually? The
I see the native sidecar module uses a dummy plugin to work around the issue (that's needed in addition to adding the META-INF resources file). We can do the same.
@rschlussel yes, unfortunately that would be required if you were loading the plugin as part of a multi-module build and loading it by reading its POM file. If the plugin were loaded as a JAR, it wouldn't be required.
Description
This adds a provider for a native plan checker that sends a plan fragment to the native sidecar, where it is validated by performing a conversion to a Velox plan. It is added to the native sidecar plugin and is enabled with the config `native-plan-checker.plan-validation-enabled=true` in the file `etc/plan-checker-providers/native-plan-checker.properties`.

A new endpoint is added to the native worker to accept the plan node fragment. It returns an OK response if the conversion is successful, or an error response with HTTP code 422 and the failure information if it fails.

This follows the addition from #23649, so that when `eager-plan-validation-enabled=true`, a query can be validated for correctness on a native worker while it is queued.

The RFC for this addition has been discussed and merged at https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
Motivation and Context
This is a useful addition to help fail queries that are incompatible with Velox before execution begins and cluster resources have been allocated.
Impact
The native plan checker is enabled with the config `native-plan-checker.plan-validation-enabled=true` in the file `etc/plan-checker-providers/native-plan-checker.properties`. The native plan checker is disabled by default.
Test Plan
Added unit tests for Java and C++, as well as manual testing of queries that currently fail on a native worker.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.