Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(generic-metrics): Add generic metrics interface #46957

Merged
merged 29 commits into from
Apr 13, 2023

Conversation

ayirr7
Copy link
Member

@ayirr7 ayirr7 commented Apr 5, 2023

This is a draft of the (very initial) interface that product teams would use for sending the different metrics. The Kafka implementation of this would actually construct metrics messages and produce them to the ingest topic via the Producer API.

The use case registry is also included in this, which is just an Enum currently.

I'm wondering if there's any obvious things I'm missing in this initial scaffolding code, as well as if there's anything I should look out for as we build upon this.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 5, 2023
from sentry.sentry_metrics.use_case_id_registry import UseCaseID


class GenericMetricsBackend(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this interface is going to produce on Kafka it will be asynchronous. Still you will likely want to allow the client to wait for the message to be acknowledged (callback). How do you plan to support this use case?

You will likely want an interface that allows you to swap the backend (relay or kafka) but still allow delivery guarantee. How will it look like in relay ? Will the call be synchronous ?

There are multiple ways to achieve this:

  • have a sync api and an async api
  • have only an async api and simulate the async behavior in relay with a background thread that wait for the HTTP response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, to start, maybe we can have an aysnc api and a sync api. I would start working on this while building the Kafka backend, which will be the next PR. Is there anything else specific to this PR that would need to change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should figure out how the async interface looks like (how the client receives the callback, is that a future?) at this point as you will need such interface immediately for Kafka.

Copy link
Member Author

@ayirr7 ayirr7 Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am thinking about that for the Kafka implementation. Since this PR implements the UseCaseID though, it's a bit blocking for some other work. How would it be if we merged the current state of the interface and, in another PR, revisit it as needed + build the Kafka implementation?

src/sentry/sentry_metrics/metrics_interface.py Outdated Show resolved Hide resolved
Comment on lines 10 to 11
PERFORMANCE = "performance"
RELEASE_HEALTH = "release-health"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MRI namespaces used for these cases today are transactions and sessions, what's the reason for changing the names ?

Copy link
Member Author

@ayirr7 ayirr7 Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This UseCaseID is what is ultimately populating the use_case_id field in messages' payloads. It's also what Clickhouse knows to be the use_case_id. Is there a reason why we should change these from performance and release-health to transactions and sessions? (Also to be clear, we are not changing the namespaces. We are just introducing the UseCaseID enum to keep track of the "true" use cases - e.g. performance, RH, escalating_issues - that identify messages and that Clickhouse recognizes ultimately. We also have a mapping between UseCaseID and namespace. You can also see UseCaseKey which is what currently gets added onto message payloads before sending them to Snuba.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the true use cases should be transactions and sessions not performance and release_health so you avoid the mapping altogether.
I don't think we are storing anything as use case in Clickhouse today, so you would not be changing the existing value.

Another reason for not doing the pamming. Let's say you map what comes from MRI (transactions) to what the indexer consider the true use case (performance). What if somebody tomorow introduces a new MRI namespace called performance in the intent of creating a new use case? Now you would have a naming clash.

I'd just take transactions and sessions as use case ids and avoid the mapping.

from sentry.sentry_metrics.use_case_id_registry import UseCaseID


class GenericMetricsBackend(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate from Filippo's comment, I think on the base class we should have docstrings for each method.

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Merging #46957 (7f20f17) into master (8bdffb9) will decrease coverage by 0.02%.
The diff coverage is n/a.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #46957      +/-   ##
==========================================
- Coverage   80.61%   80.60%   -0.02%     
==========================================
  Files        4749     4774      +25     
  Lines      200714   201782    +1068     
  Branches    11657    11727      +70     
==========================================
+ Hits       161808   162643     +835     
- Misses      38648    38880     +232     
- Partials      258      259       +1     

see 421 files with indirect coverage changes

@untitaker
Copy link
Member

For the query interface, we'll be using the Snuba API

does this mean that people will resolve strings from the metrics indexer themselves? Or are you saying we will still go through the metrics abstraction layer that the TET team built

@ayirr7
Copy link
Member Author

ayirr7 commented Apr 12, 2023

For the query interface, we'll be using the Snuba API

does this mean that people will resolve strings from the metrics indexer themselves? Or are you saying we will still go through the metrics abstraction layer that the TET team built

I mean the metrics abstraction layer, thanks for helping clarify that

@ayirr7 ayirr7 merged commit 4ee50b2 into master Apr 13, 2023
@ayirr7 ayirr7 deleted the expand-metrics-interface branch April 13, 2023 20:19
@github-actions github-actions bot locked and limited conversation to collaborators Apr 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants