-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix experimental_retry #6338
base: dev
Are you sure you want to change the base?
Fix experimental_retry #6338
Conversation
Signed-off-by: Benjamin <5719034+bnjjj@users.noreply.github.com>
Signed-off-by: Benjamin <5719034+bnjjj@users.noreply.github.com>
✅ Docs Preview ReadyNo new or changed pages found. |
This comment has been minimized.
This comment has been minimized.
CI performance tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things to consider changing. Definitely change "retry" to "retries" though.
self.budget.deposit(); | ||
None | ||
Ok(resp) => { | ||
if resp.response.status() >= http::StatusCode::BAD_REQUEST { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? I mean, what if the subgraph does something "weird" like returning a redirect?
i.e.: should this be:
if resp.response.status() != http::StatusCode::OK {
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently saw users using 302 status code as a correct status code and they were making some stuffs in rhai about this. So I prefer to be conservative and use 400 status code as a threshold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to me. If we go with this, then it means that we are encoding "weird" behaviour into the router and that might not be what some customers want.
My preference is to do the expected thing, rather than produce something which fits into a "weird" behaviour of a customer. See what other people think, since I may really be in the minority here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a subgraph returns a redirect, I would not expect it to be retried
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to me. If we go with this, then it means that we are encoding "weird" behaviour into the router and that might not be what some customers want.
I agree, but I also agree with what @goto-bus-stop why would we retry on 3XX ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest: I find the concept of the retry module a bit weird. Maybe the best thing to do is to define exactly which codes we will retry on and document that?
@goto-bus-stop Says definitely on a 429. I'd probably also want to retry a 503. Apart for those two, I probably wouldn't want to retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like starting with a small selection of codes.
I'm second-guessing 429 because we don't support Retry-After
in this implementation and maybe that's something that clients should be doing, rather than us? If we just retry instantly then there is a good chance that we get a 429 again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after mentions 3xx is legit for retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it says that. I read that the user agent should wait Retry-After
seconds before following the redirect, not that it should retry the original request:
When sent with any 3xx (Redirection) response, Retry-After indicates the minimum time that the user agent is asked to wait before issuing the redirected request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to do that? Retry to the redirected request?
assert!(retry | ||
.retry( | ||
&subgraph_req, | ||
Ok::<&subgraph::Response, &BoxError>( | ||
&subgraph::Response::fake_builder() | ||
.status_code(StatusCode::BAD_REQUEST) | ||
.build() | ||
) | ||
) | ||
.is_some()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test for a redirect: MOVED_PERMANENTLY
Signed-off-by: Benjamin <5719034+bnjjj@users.noreply.github.com>
Fix the behavior of
experimental_retry
and make sure both the feature and metrics are working.An entry in the context was also added, which would be useful later to implement a new standard attribute and selector for advanced telemetry.
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩