Set HTTP timeouts as downstream errors #1063

Merged
wbrowne merged 6 commits into main from http-timeout-downstream-errorsource
Aug 28, 2024

Conversation

wbrowne
Member

@wbrowne wbrowne commented Aug 27, 2024

Fixes #1064

Contributor

@marefr marefr left a comment

We might need similar checks around here as well?

if isCancelledError(r.Error) {

backend/error_source.go Outdated Show resolved Hide resolved
backend/error_source.go Outdated Show resolved Hide resolved
@wbrowne
Member Author

wbrowne commented Aug 27, 2024

We might need similar checks around here as well?

if isCancelledError(r.Error) {

Yeah so what should the behaviour be in this case:

  • If HTTP timeout error for DataResponse, set downstream error source and return RequestStatusError?
  • If HTTP timeout error from adapter, return RequestStatusError?

@wbrowne wbrowne changed the title Add timeout checks for downstream errors Add HTTP timeout checks for downstream errors Aug 27, 2024
@marefr
Contributor

marefr commented Aug 27, 2024

  • If HTTP timeout error for DataResponse, set downstream error source and return RequestStatusError?

Yeah I guess so

  • If HTTP timeout error from adapter, return RequestStatusError?

Yes and let the error wrapper handle that
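
For context, a timeout check of the kind discussed here can be sketched with the standard library alone. The helper name `isHTTPTimeoutError` appears later in this thread, but the body below is an assumption and not necessarily the SDK's actual implementation; `fakeTimeout` is a hypothetical stand-in for an error returned by an HTTP client.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
)

// isHTTPTimeoutError reports whether err looks like a network timeout.
// Assumption: illustrative body only; the SDK's real check may differ.
func isHTTPTimeoutError(err error) bool {
	if err == nil {
		return false
	}
	// Timeouts from net/http clients surface as a net.Error whose Timeout() is true.
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	// Deadlines set on connections or files surface as os.ErrDeadlineExceeded.
	return errors.Is(err, os.ErrDeadlineExceeded)
}

// fakeTimeout is a hypothetical error type that satisfies net.Error.
type fakeTimeout struct{}

func (fakeTimeout) Error() string   { return "request timed out" }
func (fakeTimeout) Timeout() bool   { return true }
func (fakeTimeout) Temporary() bool { return true }

func main() {
	// Wrapping with %w keeps the timeout discoverable through errors.As.
	wrapped := fmt.Errorf("query failed: %w", fakeTimeout{})
	fmt.Println(isHTTPTimeoutError(wrapped))
	fmt.Println(isHTTPTimeoutError(errors.New("bad request")))
	fmt.Println(isHTTPTimeoutError(context.Canceled))
}
```

Using `errors.As` rather than a type assertion means the check still works when the plugin wraps the original error before returning it.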

@wbrowne wbrowne changed the title Add HTTP timeout checks for downstream errors Set HTTP timeouts as downstream errors Aug 27, 2024
@wbrowne wbrowne self-assigned this Aug 27, 2024
@wbrowne wbrowne requested a review from marefr August 27, 2024 10:27
backend/error_source_test.go Show resolved Hide resolved
backend/error_source_test.go Show resolved Hide resolved
@wbrowne wbrowne marked this pull request as ready for review August 27, 2024 12:22
@wbrowne wbrowne requested a review from a team as a code owner August 27, 2024 12:22
@wbrowne wbrowne requested review from marefr, oshirohugo and xnyo and removed request for a team August 27, 2024 12:22
Contributor

@marefr marefr left a comment

LGTM 🎉

backend/request_status.go Outdated Show resolved Hide resolved
Co-authored-by: Giuseppe Guerra <giuseppe.guerra@grafana.com>
Member

@xnyo xnyo left a comment

LGTM!

@wbrowne
Member Author

wbrowne commented Aug 27, 2024

@ivanahuckova Would you like to take this for a spin to make sure it's resolving the issue you've experienced?

Member

@ivanahuckova ivanahuckova left a comment

Works like a charm ✨ THANK YOU!!

Could we also add this to api server instrumentation? 😇

@wbrowne wbrowne merged commit e1e82c0 into main Aug 28, 2024
3 checks passed
@wbrowne wbrowne deleted the http-timeout-downstream-errorsource branch August 28, 2024 12:11
@wbrowne
Member Author

wbrowne commented Aug 28, 2024

@ivanahuckova Thank you! I think we may have also solved this one with this change too BTW:

  • mark DeadlineExceeded desc = context deadline exceeded as downstream errors

Let me know if it's still an issue if you can 🙏

Could we also add this to api server instrumentation? 😇

If the error is returned from the handler, this is the classic problem of not having enough information on the other side since it's stripped over the wire. IIUC we'd need to add more to the plugin response (either directly in the protobuf or as gRPC metadata) to indicate the appropriate status source.

For a data response error though, maybe we should be setting the error source on the response over here if it's not already set, i.e.:

if isCancelledError(r.Error) {
+  r.ErrorSource = ErrorSourceDownstream
  hasCancelledError = true
}
if isHTTPTimeoutError(r.Error) {
+  r.ErrorSource = ErrorSourceDownstream
  hasHTTPTimeoutError = true
}

It's probably safer to only do this if not already set. But that should then use the appropriate status source (would need to confirm though).
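
The "only if not already set" variant can be sketched as follows. `DataResponse`, `ErrorSource`, and the constants below are simplified local stand-ins rather than the SDK's own types, and `errTimeout` and `markDownstreamIfUnset` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorSource and DataResponse are simplified local stand-ins (assumptions),
// not the SDK's own types.
type ErrorSource string

const (
	ErrorSourcePlugin     ErrorSource = "plugin"
	ErrorSourceDownstream ErrorSource = "downstream"
)

type DataResponse struct {
	Error       error
	ErrorSource ErrorSource
}

// errTimeout is a hypothetical sentinel standing in for an HTTP timeout.
var errTimeout = errors.New("http timeout")

// markDownstreamIfUnset reports whether the response carries a timeout error.
// It sets the downstream source only when no source has been chosen yet,
// so an explicit value set by the plugin author is never overwritten.
func markDownstreamIfUnset(r *DataResponse) bool {
	if r.Error == nil || !errors.Is(r.Error, errTimeout) {
		return false
	}
	if r.ErrorSource == "" {
		r.ErrorSource = ErrorSourceDownstream
	}
	return true
}

func main() {
	unset := &DataResponse{Error: fmt.Errorf("query A: %w", errTimeout)}
	preset := &DataResponse{Error: errTimeout, ErrorSource: ErrorSourcePlugin}
	markDownstreamIfUnset(unset)
	markDownstreamIfUnset(preset)
	fmt.Println(unset.ErrorSource, preset.ErrorSource)
}
```

Guarding on the empty value keeps the change additive: responses that already declare a source behave exactly as before.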

The more we touch here though in its current state, the more brittle it all feels 😬 Instrumenting the same thing across client and server is tricky.

Anything to add or correct @marefr?

@ivanahuckova
Member

ivanahuckova commented Aug 28, 2024

mark DeadlineExceeded desc = context deadline exceeded as downstream errors
Let me know if it's still an issue if you can 🙏

I will try on Friday

The more we touch here though in its current state, the more brittle it all feels 😬 Instrumenting the same thing across client and server is tricky.

Yeah, I agree. One solution could be to fully focus on updating the SDK in all plugins and skip the API server metrics for now. This would help resolve all the issues. 🤔 I initially wanted us to use the API server metrics until all plugins had updated their SDKs, but I've been realizing that using it temporarily requires too much effort. Instead, it seems we should just concentrate on updating the SDK in the plugins.

@marefr
Contributor

marefr commented Aug 29, 2024

For a data response error though, maybe we should be setting the error source on the response over here if it's not already set, i.e.:

Isn't it what we do here

if hasCancelledError {
  if err := WithDownstreamErrorSource(ctx); err != nil {
    return RequestStatusError, fmt.Errorf("failed to set downstream status source: %w", errors.Join(innerErr, err))
  }
  return RequestStatusCancelled, nil
}
if hasHTTPTimeoutError {
  if err := WithDownstreamErrorSource(ctx); err != nil {
    return RequestStatusError, fmt.Errorf("failed to set downstream status source: %w", errors.Join(innerErr, err))
  }
  return RequestStatusError, nil
}
// A plugin error has higher priority than a downstream error,
// so set to downstream only if there's no plugin error
if hasDownstreamError && !hasPluginError {
  if err := WithDownstreamErrorSource(ctx); err != nil {
    return RequestStatusError, fmt.Errorf("failed to set downstream status source: %w", errors.Join(innerErr, err))
  }
  return RequestStatusError, nil
}
and what we partially fixed in this PR? 🤔

One potential problem is that, for these two cases, we don't check whether there's also a plugin error:

if hasCancelledError {
  if err := WithDownstreamErrorSource(ctx); err != nil {
    return RequestStatusError, fmt.Errorf("failed to set downstream status source: %w", errors.Join(innerErr, err))
  }
  return RequestStatusCancelled, nil
}
if hasHTTPTimeoutError {
  if err := WithDownstreamErrorSource(ctx); err != nil {
    return RequestStatusError, fmt.Errorf("failed to set downstream status source: %w", errors.Join(innerErr, err))
  }
  return RequestStatusError, nil
}

We do that for hasDownstreamError
https://github.com/grafana/grafana-plugin-sdk-go/blob/e1e82c05cc52e8fb712dbf581a63e6f96f04965c/backend/data_adapter.go#L83C3-L85C45
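
One possible way to apply the same plugin-error guard to all three cases is sketched below. This is not what the PR does; `Source`, `flags`, and `resolveSource` are hypothetical names, and the fallback to the plugin source is an assumption based on the default behaviour described in this thread.

```go
package main

import "fmt"

// Source is a simplified local stand-in (assumption) for the SDK's status source.
type Source string

const (
	SourcePlugin     Source = "plugin"
	SourceDownstream Source = "downstream"
)

// flags summarises what was seen while scanning the data responses.
type flags struct {
	cancelled, httpTimeout, downstream, plugin bool
}

// resolveSource applies one guard to every downstream-like case:
// a plugin error always wins, otherwise any downstream signal applies.
func resolveSource(f flags) Source {
	if f.plugin {
		return SourcePlugin
	}
	if f.cancelled || f.httpTimeout || f.downstream {
		return SourceDownstream
	}
	// No signal at all: fall back to the plugin source (assumed default).
	return SourcePlugin
}

func main() {
	fmt.Println(resolveSource(flags{httpTimeout: true}))
	fmt.Println(resolveSource(flags{httpTimeout: true, plugin: true}))
}
```

Whether a cancelled or timed-out request should really yield to a plugin error in another response is a design question the thread leaves open; the sketch only shows what a uniform rule would look like.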

@marefr
Contributor

marefr commented Aug 29, 2024

Nevermind I think I see what you mean now. If there's a data response error and there's no ErrorSource set we currently default to plugin source, right?

@marefr
Contributor

marefr commented Aug 29, 2024

I'll have a look at this and the new tasks added to #1050

@wbrowne
Member Author

wbrowne commented Aug 29, 2024

Nevermind I think I see what you mean now. If there's a data response error and there's no ErrorSource set we currently default to plugin source, right?

Exactly!

I'll have a look at this and the new tasks added to #1050

Awesome - thank you 👍

@marefr
Contributor

marefr commented Aug 29, 2024

PTAL #1066

@ivanahuckova
Member

I think it also solves marking DeadlineExceeded errors as downstream, so I'm marking that as resolved in #1050.

Development

Successfully merging this pull request may close these issues.

httpClient should mark net/http timeouts as downstream errors, so every plugin does not have to do that
4 participants