Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circuit breakers #566

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Circuit breakers #566

wants to merge 2 commits into from

Conversation

bsimpson63
Copy link
Contributor

@bsimpson63 bsimpson63 commented Feb 26, 2021

This is largely copied from reddit/reddit-service-graphql/tree/master/graphql_api/lib/circuit_breaker. That code is great and running in production in a few service.

I added docstrings and types where missing and then added the CircuitBreakerClientWrapperFactory interface to hopefully make it a bit easier to integrate with baseplate service.

I'm going to copy/paste this code into thing service and r2 to ensure it works as expected and isn't missing anything. Then we can consider merging this PR into baseplate.

👓 @suzie-su @pnovotnak @spladug @manishapme @jennLam

@bsimpson63 bsimpson63 requested a review from a team as a code owner February 26, 2021 18:41
Copy link
Contributor

@suzie-su suzie-su left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good refactor 👍
just some questions.


def make_object_for_context(self, name: str, span: Span) -> Any:
client = self.client_factory.make_object_for_context(name, span)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line?

# they will be None and passed through to the Breaker() constructor
# which will override the defaults set in Breaker()
assert prefix.endswith(".")
parser = config.SpecParser(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is difference between config.parse_config vs config.SpecParser

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/reddit/baseplate.py/blob/develop/baseplate/lib/config.py#L625-L635

parse_config is essentially SpecParser (returned by Parser.from_spec) and then parser.parse("", app_config), so it seems it doesn't let you specify the prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in the parser.parse{"", {prefix: }} can set the prefix in the dict right


def __init__(self, client_factory: ContextFactory, breaker_box: "CircuitBreakerBox"):
self.client_factory = client_factory
self.breaker_box = breaker_box
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in this way, people will have to initialize the breaker_box in. init file and pass it in to factory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah exactly. that's similar to how we do other clients in baseplate.py, and will make sure that all workers/greenlets share the same breakers.

if not success:
self.failures += 1

if self._is_bucket_full and not self.results_bucket[0]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔕 can we grab this comment or put something like it here?

:param trip_for: how long to remain tripped before resetting the breaker
:param fuzz_ratio: how much to randomly add/subtract to the trip_for time
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔕 My linter complains about variables initialized outside init, feel free to disregard

Suggested change
failures: int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a little funky that it's only initialized in reset()(which is called from the constructor), but it does prevent some duplication. I'm leaning towards leaving it as is because this is copied exactly from the gql implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be good to add the type annotation just so everything knows it can expect the attribute to be there

Copy link
Contributor

@pnovotnak pnovotnak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all I think this is good and I like the ergonomics. Stoked to have this in baseplate! Thank you for putting in the time, Brian

@bsimpson63
Copy link
Contributor Author

Note from Manisha:

only request is some docs that show how you set it up in the tests in the docstring somewhere. like add this to your init.py file

breaker_box = breaker_box_from_config(
            app_config={
                "brkr.samples": "4",
                "brkr.trip_failure_ratio": "0.75",
                "brkr.trip_for": "1 minute",
                "brkr.fuzz_ratio": "0.1",
            }
wrapped_client_factory = CircuitBreakerClientWrapperFactory(
            client_factory, self.breaker_box
        )
baseplate.add_to_context("wrapped_client", wrapped_client_factory)

Copy link
Contributor

@spladug spladug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super cool! i've left a few comments. also, it'd be good to add docs/api/baseplate/lib/circuit_breaker.rst to make this show up in docs. that's a great place to include example usage and any exposition on how it works. see the docs/api/baseplate/clients/ docs for some good examples.

:param trip_for: how long to remain tripped before resetting the breaker
:param fuzz_ratio: how much to randomly add/subtract to the trip_for time
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be good to add the type annotation just so everything knows it can expect the attribute to be there

WORKING = "working"
TRIPPED = "tripped"
# trip immediately after failure
TESTING = "testing"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the sequence of states described below, it's a bit weird that testing is after tripped in this enum

self.name = name
self.samples = samples
self.results_bucket: Deque = deque([], self.samples)
self.tripped_until: datetime = datetime.utcnow()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔕 would time.monotonic() be better for this? much less baggage and no worries about the clock shifting under us

also happens to be a bit faster

$ python3 -m timeit -s 'import datetime' -- 'datetime.datetime.utcnow()'
2000000 loops, best of 5: 150 nsec per loop

$ python3 -m timeit -s 'import time' -- 'time.monotonic()'
5000000 loops, best of 5: 60.8 nsec per loop



logger = logging.getLogger(__name__)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to add a config.Parser subclass like https://github.com/reddit/baseplate.py/blob/develop/baseplate/clients/thrift.py#L24-L44 here so this circuit breaker can be used with baseplate.configure_context()


breaker = self.breaker_box.breaker_box["get_something"]
assert breaker.name == "test_breaker.get_something"
assert breaker.results_bucket == deque([True, True, False, False])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dequeue is an implementation detail, right? if we assert the exact internal state of the breaker like this it means we have to change all these tests if we change the implementation at all. it'd be less brittle to only test the externally visible state.

logger = logging.getLogger(__name__)


class CircuitBreakerClientWrapperFactory(ContextFactory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also be exposing a report_runtime_metrics that calls one on the client_factory if it exists? Or is the idea that you would still attach the bare client factory to the context?

return CircuitBreakerWrappedClient(span, self.breaker_box, client)


class CircuitBreakerWrappedClient:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way may be ideally out of scope for this PR but it could be nice to have a ThriftCircuitBreakerWrappedClient that acts like a proxy class and just decorates the thrift interface functions with the breaker_context logic.

Copy link
Contributor

@curioussavage curioussavage Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a little implementation in our codebase for this, works fine. I wonder if a patch to integrate a circuit breaker with the thrift client would be accepted after this merges.

Also, since retry logic can cause N requests to be sent before it lets the last exception bubble up to the breaker any approach not integrated with the thrift client doesn't really have an accurate picture of failures.

@manishapme
Copy link
Contributor

🏃‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants