-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Circuit breakers #566
base: develop
Are you sure you want to change the base?
Circuit breakers #566
Conversation
no changes have been made yet!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good refactor 👍
just some questions.
|
||
def make_object_for_context(self, name: str, span: Span) -> Any: | ||
client = self.client_factory.make_object_for_context(name, span) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra line?
# they will be None and passed through to the Breaker() constructor | ||
# which will override the defaults set in Breaker() | ||
assert prefix.endswith(".") | ||
parser = config.SpecParser( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is difference between config.parse_config
vs config.SpecParser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/reddit/baseplate.py/blob/develop/baseplate/lib/config.py#L625-L635
parse_config
is essentially SpecParser
(returned by Parser.from_spec
) and then parser.parse("", app_config)
, so it seems it doesn't let you specify the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in the parser.parse{"", {prefix: }} can set the prefix
in the dict right
|
||
def __init__(self, client_factory: ContextFactory, breaker_box: "CircuitBreakerBox"): | ||
self.client_factory = client_factory | ||
self.breaker_box = breaker_box |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in this way, people will have to initialize the breaker_box in. init file and pass it in to factory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah exactly. that's similar to how we do other clients in baseplate.py, and will make sure that all workers/greenlets share the same breakers.
if not success: | ||
self.failures += 1 | ||
|
||
if self._is_bucket_full and not self.results_bucket[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔕 can we grab this comment or put something like it here?
:param trip_for: how long to remain tripped before resetting the breaker | ||
:param fuzz_ratio: how much to randomly add/subtract to the trip_for time | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔕 My linter complains about variables initialized outside init, feel free to disregard
failures: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's a little funky that it's only initialized in reset()
(which is called from the constructor), but it does prevent some duplication. I'm leaning towards leaving it as is because this is copied exactly from the gql implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be good to add the type annotation just so everything knows it can expect the attribute to be there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all I think this is good and I like the ergonomics. Stoked to have this in baseplate! Thank you for putting in the time, Brian
Note from Manisha: only request is some docs that show how you set it up in the tests in the docstring somewhere. like add this to your init.py file
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super cool! i've left a few comments. also, it'd be good to add docs/api/baseplate/lib/circuit_breaker.rst
to make this show up in docs. that's a great place to include example usage and any exposition on how it works. see the docs/api/baseplate/clients/
docs for some good examples.
:param trip_for: how long to remain tripped before resetting the breaker | ||
:param fuzz_ratio: how much to randomly add/subtract to the trip_for time | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be good to add the type annotation just so everything knows it can expect the attribute to be there
WORKING = "working" | ||
TRIPPED = "tripped" | ||
# trip immediately after failure | ||
TESTING = "testing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the sequence of states described below, it's a bit weird that testing is after tripped in this enum
self.name = name | ||
self.samples = samples | ||
self.results_bucket: Deque = deque([], self.samples) | ||
self.tripped_until: datetime = datetime.utcnow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔕 would time.monotonic()
be better for this? much less baggage and no worries about the clock shifting under us
also happens to be a bit faster
$ python3 -m timeit -s 'import datetime' -- 'datetime.datetime.utcnow()'
2000000 loops, best of 5: 150 nsec per loop
$ python3 -m timeit -s 'import time' -- 'time.monotonic()'
5000000 loops, best of 5: 60.8 nsec per loop
|
||
|
||
logger = logging.getLogger(__name__) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to add a config.Parser
subclass like https://github.com/reddit/baseplate.py/blob/develop/baseplate/clients/thrift.py#L24-L44 here so this circuit breaker can be used with baseplate.configure_context()
|
||
breaker = self.breaker_box.breaker_box["get_something"] | ||
assert breaker.name == "test_breaker.get_something" | ||
assert breaker.results_bucket == deque([True, True, False, False]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the dequeue is an implementation detail, right? if we assert the exact internal state of the breaker like this it means we have to change all these tests if we change the implementation at all. it'd be less brittle to only test the externally visible state.
logger = logging.getLogger(__name__) | ||
|
||
|
||
class CircuitBreakerClientWrapperFactory(ContextFactory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you also be exposing a report_runtime_metrics
that calls one on the client_factory
if it exists? Or is the idea that you would still attach the bare client factory to the context?
return CircuitBreakerWrappedClient(span, self.breaker_box, client) | ||
|
||
|
||
class CircuitBreakerWrappedClient: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way may be ideally out of scope for this PR but it could be nice to have a ThriftCircuitBreakerWrappedClient
that acts like a proxy class and just decorates the thrift interface functions with the breaker_context
logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a little implementation in our codebase for this, works fine. I wonder if a patch to integrate a circuit breaker with the thrift client would be accepted after this merges.
Also, since retry logic can cause N requests to be sent before it lets the last exception bubble up to the breaker any approach not integrated with the thrift client doesn't really have an accurate picture of failures.
🏃♂️ |
This is largely copied from reddit/reddit-service-graphql/tree/master/graphql_api/lib/circuit_breaker. That code is great and running in production in a few service.
I added docstrings and types where missing and then added the
CircuitBreakerClientWrapperFactory
interface to hopefully make it a bit easier to integrate with baseplate service.I'm going to copy/paste this code into thing service and r2 to ensure it works as expected and isn't missing anything. Then we can consider merging this PR into baseplate.
👓 @suzie-su @pnovotnak @spladug @manishapme @jennLam