Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(ddm): Dual write to Sentry metrics as well as Datadog #5474

Merged
merged 6 commits into from
Jan 31, 2024

Conversation

evanh
Copy link
Member

@evanh evanh commented Jan 30, 2024

This PR wraps the Datadog and Sentry metrics into a single backend, so we can
start dogfooding Sentry metrics. Currently the Sentry portion is feature
flagged and has a sampling rate.

Sentry also provides a way to specify the unit and record distributions, so add
that to the abstract so we can start using it later.

This PR wraps the Datadog and Sentry metrics into a single backend, so we can
start dogfooding Sentry metrics. Currently the Sentry portion is feature
flagged and has a sampling rate.
@evanh evanh requested a review from a team as a code owner January 30, 2024 21:19
if self._use_sentry():
self.sentry.gauge(name, value, tags, unit)

def timing(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timings are distributions -- i don't think we need both. the implementation in the SDK is identical

Copy link
Member Author

@evanh evanh Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're not identical, because in the SDK they accept different units. The timing ones specifically only accept timing units, and distributions are more flexible.

So we should be using distribution when we are logging things that aren't timing based. See https://github.com/getsentry/snuba/blob/master/snuba/clickhouse/http.py#L253 for an example that should be changed to distribution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a default impl for timing that uses distribution internally or the other way around? The impl really is identical in the SDK (the only thing that differs is type hints and return value) -- i get that you want to have different defaults for unit in each case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not both? Both exist in the Sentry SDK and in Datadog. Why not use the SDKs as intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because in both SDKs they're literally just aliases and now you're forcing every backend to implement those aliases.

the purpose of sentry_sdk.metrics.timing is to provide a decorator and context manager to measure a codeblock -- we're not using that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forcing every backend to implement those aliases.

This is true, but we are missing the distribution functionality from all our backends, so I would have to add that no matter what (timing is a subset of distribution). Why not leave timing in place, add distributions, and then our backends line up with both of the production SDKs (DD and Sentry).

literally just aliases

If the aliases are already available, why not leverage them? Why in turn make timing another alias on distribution? The DD and Sentry SDKs already provide that alias for us.

Copy link
Member

@untitaker untitaker Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to add that no matter what

you really don't, you could continue to use timing or rename it to distribution if you want to. timing is not a subset of distribution in master branch, it's just distribution with maybe an unfortunate (not-technically-correct) name. the semantic difference was created by adding unit to the interface. i don't see the point in adding more methods that do the same thing but only differ in their default for unit, but I don't feel too strongly about all of this so i'm approving

Copy link

codecov bot commented Jan 30, 2024

Codecov Report

Attention: 64 lines in your changes are missing coverage. Please review.

Comparison is base (cffe46b) 90.19% compared to head (bea4cba) 90.02%.

Files Patch % Lines
snuba/utils/metrics/backends/dualwrite.py 0.00% 33 Missing ⚠️
snuba/utils/metrics/backends/sentry.py 0.00% 17 Missing ⚠️
snuba/utils/metrics/backends/testing.py 33.33% 6 Missing ⚠️
snuba/utils/metrics/util.py 0.00% 3 Missing ⚠️
snuba/utils/metrics/backends/datadog.py 0.00% 2 Missing ⚠️
snuba/utils/metrics/backends/abstract.py 66.66% 1 Missing ⚠️
snuba/utils/metrics/wrapper.py 75.00% 1 Missing ⚠️
tests/backends/metrics.py 92.85% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5474      +/-   ##
==========================================
- Coverage   90.19%   90.02%   -0.18%     
==========================================
  Files         889      891       +2     
  Lines       43447    43530      +83     
  Branches      288      288              
==========================================
- Hits        39189    39188       -1     
- Misses       4216     4300      +84     
  Partials       42       42              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dbanda
Copy link
Contributor

dbanda commented Jan 30, 2024

Do we need to do something to blacklist the metrics consumer to avoid going into a loop or is this handled by s4s?

@untitaker
Copy link
Member

Do we need to do something to blacklist the metrics consumer to avoid going into a loop or is this handled by s4s?

at this sample rate probably not -- but as we increase it I would override the sample rate to 0 for those particular deployments

@untitaker
Copy link
Member

to be clear, there's sample rate setting that can be overridden with envvars, I suggest to set that envvar for the particular consumers where we see this feedback loop effect, either set it back to 0 or just a lower value

@evanh evanh merged commit a32d014 into master Jan 31, 2024
29 of 30 checks passed
@evanh evanh deleted the evanh/fix/use-sentry-metrics branch January 31, 2024 15:18
@getsentry-bot
Copy link
Contributor

PR reverted: 0b1242d

getsentry-bot added a commit that referenced this pull request Jan 31, 2024
)"

This reverts commit a32d014.

Co-authored-by: evanh <2727124+evanh@users.noreply.github.com>
evanh added a commit that referenced this pull request Jan 31, 2024
This PR wraps the Datadog and Sentry metrics into a single backend, so we can
start dogfooding Sentry metrics. Currently the Sentry portion is feature
flagged and has a sampling rate.

Sentry also provides a way to specify the unit and record distributions, so add
that to the abstract so we can start using it later.
evanh added a commit that referenced this pull request Feb 1, 2024
)

This PR wraps the Datadog and Sentry metrics into a single backend, so we can
start dogfooding Sentry metrics. Currently the Sentry portion is feature
flagged and has a sampling rate.

Sentry also provides a way to specify the unit and record distributions, so add
that to the abstract so we can start using it later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants