Adaptive retries #257
25c41cf to 726dc1a
I have a few questions regarding this draft.
What about adding a token on failure/giving up?
Isn't the "adaptiveness" or "token bucket" an orthogonal concern to the other pieces of configuration that we have currently: As for your questions:
I think that it would contradict what TokenBucketRetryCondition and AdaptiveRetryStrategy do. If we added a token on failure, we would effectively take a token for the retry and add it back after it fails again. I'm not sure I understood the question correctly.
Thanks for the changes, but I've got yet another variant to consider. Up until now, retries were stateless - it was just a config saying how many times to retry, regardless of other invocations. With the adaptive variant, retries become stateful - the token bucket must be shared. State is usually hard and we have to be cautious not to introduce "surprising" APIs. That's why maybe it would be better to make the state explicit. That is - keep
// this will be defined at the top-level, shared among request handlers etc.
val adaptiveRetry = AdaptiveRetry(TokenBucket(...))
// usage:
adaptiveRetry(RetryConfig.backoff(maxRetries, initialDelay, maxDelay))(...)
As for the implementation, we might use the callbacks in the retry policy (respecting the callbacks that are already there). Also, shouldn't we specify both What do you think? :)
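To illustrate why the state has to be shared and explicit, here is a minimal sketch. It is not the API from this PR: `Tokens`, `SharedRetry`, and the use of `Either[String, T]` as the result type are all hypothetical stand-ins. It only shows that two call sites using the same instance draw from the same budget of retries.

```scala
import java.util.concurrent.atomic.AtomicInteger

object SharedStateSketch {
  // Hypothetical stand-in for the shared token bucket: a plain counter of tokens.
  final class Tokens(initial: Int) {
    private val left = new AtomicInteger(initial)

    // Takes one token if any remain; the count never drops below zero.
    def tryTake(): Boolean = left.getAndUpdate(t => if (t > 0) t - 1 else t) > 0

    def remaining: Int = left.get()
  }

  // Hypothetical retry wrapper: retries at most maxRetries times,
  // and only while the shared bucket still has tokens.
  final class SharedRetry(tokens: Tokens) {
    def apply[T](maxRetries: Int)(operation: => Either[String, T]): Either[String, T] = {
      @annotation.tailrec
      def loop(retriesLeft: Int): Either[String, T] =
        operation match {
          case Left(_) if retriesLeft > 0 && tokens.tryTake() => loop(retriesLeft - 1)
          case result                                          => result
        }
      loop(maxRetries)
    }
  }
}
```

With a bucket of 3 tokens, a first failing call site can retry three times; a second call site sharing the same `SharedRetry` then gets no retries at all, which is exactly the cross-invocation coupling that a per-call config cannot express.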
This WIP is mostly based on the mechanism implemented in aws-sdk. There is only exceptionCost, but we can also add successReward. Maybe with new retry
Hm, I think that's a good feature of the AWS SDK - if the call didn't even make it over the network, then we shouldn't count this as an "adaptive failure". So just as we have "isSuccess" etc. in the policy, here we could also have one piece of config for adaptive retries - whether the exception should incur the retry penalty.
I am trying two approaches to writing the AdaptiveRetry class. One defines methods like def apply(config: RetryConfig[E, T])(operation: => T): T because we can't narrow T to Second approach pushes the definitions of these methods to apply. This also has some drawbacks. We can't have multiple apply methods with the same defaults, so we need to force the user to define these, or hardcode the defaults. Two argument lists also make it hard for the compiler to pick the proper overloaded method. In the version I pushed I needed to specify E and T explicitly with config, because without it the compiler would pick the method that returns That means that we would probably need to get rid of the overloaded apply methods, because if the compiler picks the wrong one we only find out when it blows up at runtime. These are my thoughts after trying this variant, or maybe there are more ways to define this class.
But isn't the failure / success cost constant? Having them variable seems like a very advanced scenario.
I'd imagine them simply as
These can be easily changed. I was wondering more whether we are okay with a signature like this:
def apply[E, T, F[_]](
    config: RetryConfig[E, T],
    failureCost: Int,
    successReward: Int,
    isFailure: E => Boolean,
    errorMode: ErrorMode[E, F]
)(operation: => F[T]): F[T]
Because in this scenario overloading apply methods is not a good idea: things like the ones I mentioned in the comment can sneak by and blow up at runtime. But writing a few methods, like
I think I'd push all the "adaptive" configuration to the
I think that would be a nice middle ground for how much configuration should be provided at every invocation. Maybe running operations on I didn't add more tests in
17a3462 to e2616e6
*/
def apply[E, T, F[_]](
    config: RetryConfig[E, T],
    isFailure: E => Boolean = (_: E) => true,
I think if we get an error value (E), that's always a failure. The additional flexibility that we provide in retries is that success values T might also cause a retry - so there's no exception thrown, for example (if E == Throwable), but we get back a value which signals a retry anyway. But if we get an E, that's a failure for sure.
Changed signature to isFailure: T => Boolean
I pushed changes according to this comment. shouldPayPenaltyCost: T => Boolean = (_: T) => true decides whether the penalty will be paid for a result T that is going to be retried. But I am not sure about the use case. What if we use a library that has client-side throttling and the operation ends with an error? It was throttled, so we didn't make the call to the service and should not pay for this retry, but now we can't handle this case. Maybe it should be Either[E, T] => Boolean. Just a thought, I don't know which case might be more useful.
Yeah, you're right, Either[E, T] => Boolean would be much more useful. Also let's rename this to shouldPayFailureCost to keep the vocabulary consistent.
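As a rough illustration of the Either[E, T] => Boolean shape discussed in this thread, the sketch below covers the client-side throttling case. The error hierarchy (`ClientError`, `ThrottledLocally`, `ServiceFailure`) and the rule that an empty response body counts as a costly retry are assumptions made up for the example; only the name shouldPayFailureCost and its function type come from the review comments.

```scala
object FailureCostExample {
  // Hypothetical error type for a client library that throttles on the client side.
  sealed trait ClientError
  case object ThrottledLocally extends ClientError // the request never left the process
  final case class ServiceFailure(status: Int) extends ClientError

  // Decides whether a retried outcome should take tokens from the bucket.
  val shouldPayFailureCost: Either[ClientError, String] => Boolean = {
    case Left(ThrottledLocally) => false        // no call reached the service, so no penalty
    case Left(_)                => true         // a real failure came back from the service
    case Right(body)            => body.isEmpty // assumption: an empty body is a costly retry
  }
}
```

A predicate typed on T alone could not express the first case, because the throttling outcome lives on the error side.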
367fc4c to 04980f2
The code examples in the docs didn't have
Closes #252
Implementation of adaptive retries, based on the implementation in aws-sdk-java-v2. It is backed by a simple token bucket: every retry takes tokens, and every successful operation replenishes them. This way, if we are making requests to, for example, a very resource-constrained set of resources, then after a system failure we don't generate more load by retrying every operation that failed.
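The take-on-retry, replenish-on-success mechanics can be sketched as follows. This is a simplified illustration and not the TokenBucket class from this PR or from aws-sdk-java-v2: the class name `SimpleTokenBucket`, its method names, and the capacity and cost values used with it are assumptions.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

// Hypothetical, simplified token bucket shared by all retrying callers.
final class SimpleTokenBucket(capacity: Int) {
  private val tokens = new AtomicInteger(capacity)

  // Tries to take `cost` tokens before a retry.
  // Returns false, and takes nothing, if fewer than `cost` tokens remain.
  @tailrec
  def tryAcquire(cost: Int): Boolean = {
    val current = tokens.get()
    if (current < cost) false
    else if (tokens.compareAndSet(current, current - cost)) true
    else tryAcquire(cost) // another thread changed the count; try again
  }

  // Gives `reward` tokens back after a successful operation, capped at the capacity.
  def release(reward: Int): Unit = {
    tokens.updateAndGet(t => math.min(capacity, t + reward))
    ()
  }

  def available: Int = tokens.get()
}
```

With a capacity of 10 and a retry cost of 5, two retries drain the bucket and a third is refused until successful operations return tokens. During an outage most calls fail, so the bucket empties quickly and retries stop adding load.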