Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(generic-metrics): Add generic metrics interface #46957

Merged
merged 29 commits into from
Apr 13, 2023
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions src/sentry/sentry_metrics/kafka_metrics_interface.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
from typing import Mapping

from sentry.sentry_metrics.metrics_interface import GenericMetricsBackend
from sentry.sentry_metrics.use_case_id_registry import UseCaseID


class KafkaGenericMetricsBackend(GenericMetricsBackend):
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
def __init__(self) -> None:
super().__init__()

def counter(
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
self,
use_case_id: UseCaseID,
org_id: int,
project_id: int,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite an opinionated interface. Why is org and project ID needed at all? Seems like it can't be used for any use cases that aren't within the context of a single org and project then?

Copy link
Contributor

@onewland onewland Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (a required) part of the back-end schema so I don't think we should drop these.

Copy link
Member

@lynnagara lynnagara Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because the backend schema is a multi-tenant system and we need to keep individual customer data separate from other customers. For Sentry, this seems like it should always be org_id = 1 and project_id = 1. Why asking the user to pass it every time here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wouldn't be the Sentry org_id and project_id since this would be customer data. @onewland am I missing anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to clarify, the metrics platform is for aggregating customer data received during ingest only? I think this restriction wasn't super clear to me before, and I assumed it was valid to use metrics for anything at all.

Copy link
Contributor

@onewland onewland Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from both a query efficiency (the back-end tables are ordered first by org_id and project_id) and cost accounting perspective we should aim to include customer project IDs and org IDs where they make sense, and we can pass Sentry org IDs and project IDs everywhere else.

For example, on the escalating issues project, we're trying to keep a count of error by group ID on ingest. This has an obvious correspondence to a customer's project_id and org_id, and therefore that data should propagate through.

If we wanted to track something like the number of times a celery task is executed, and this is independent from customer data/behavior, then I think it would make more sense to do org_id = 1 and project_id = 1.

Lyn, how would you feel if these parameters were optional and the backends did some resolution if org_id/project_id aren't passed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just hardcode org_id 1 and and project_id 1 though. Is there a plan for how these values will get resolved?

Does this interface make using Relay a non-starter? Or does there need to be other changes there to make this happen? The implication of this interface seems to be that a project can send data onto some other org/project not just itself. I am guessing this is not something generally allowed in Relay as it might have all kinds of implications in terms of security, quotas, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that fallback values could be configured in Sentry options so they could vary by deployment.

I don't see why this interface makes Relay a non-starter, nor why you think it implies that projects would send data on behalf of other projects. Keep in mind that this interface will only be supported by an internal Relay endpoint, and we would not be exposing this interface to anybody outside Sentry. If we ever reach the point where we want to accept metrics externally, it will be through a DSN which does auth and so there will always be a customer-specific org/project_id attached.

Copy link
Member

@untitaker untitaker Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes are planned on Relay-side? Without any, you won't be able to implement this interface for Relay. If you only want to implement this interface for Kafka all is well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, so just to clarify - the internal Relay endpoint mentioned above would allow us to skip authenticating as Sentry, and would allow us to send customer org_id and project_id to Relay via this interface

metric_name: str,
value: int,
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
tags: Mapping[str, str],
) -> None:
pass

def set(
self,
use_case_id: UseCaseID,
org_id: int,
project_id: int,
metric_name: str,
value: int,
tags: Mapping[str, str],
) -> None:
pass

def distribution(
self,
use_case_id: UseCaseID,
org_id: int,
project_id: int,
metric_name: str,
value: int,
tags: Mapping[str, str],
) -> None:
pass
38 changes: 38 additions & 0 deletions src/sentry/sentry_metrics/metrics_interface.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
from typing import Mapping

from sentry.sentry_metrics.use_case_id_registry import UseCaseID


class GenericMetricsBackend:
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
def counter(
self,
use_case_id: UseCaseID,
org_id: int,
project_id: int,
metric_name: str,
value: int,
tags: Mapping[str, str],
) -> None:
raise NotImplementedError()

def set(
self,
use_case_id: UseCaseID,
org_id: int,
project_id: int,
metric_name: str,
value: int,
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
tags: Mapping[str, str],
) -> None:
raise NotImplementedError()

def distribution(
self,
use_case_id: UseCaseID,
org_id: int,
project_id: int,
metric_name: str,
value: int,
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
tags: Mapping[str, str],
) -> None:
raise NotImplementedError()
49 changes: 49 additions & 0 deletions src/sentry/sentry_metrics/use_case_id_registry.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
from __future__ import annotations

from typing import Any, Iterator

_HARDCODED_USE_CASES = {"PERFORMANCE": "performance"}

_REGISTERED_USE_CASES: dict[str, str] = {}
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved


class _UseCaseID(type):
def __getattr__(self, attr: str) -> UseCaseID:
if attr not in _HARDCODED_USE_CASES and attr not in _REGISTERED_USE_CASES:
raise AttributeError(attr)

return UseCaseID(attr.lower())

def __iter__(self) -> Iterator[UseCaseID]:
return iter(
UseCaseID(value)
for value in {
**_HARDCODED_USE_CASES,
**_REGISTERED_USE_CASES,
}.values()
)

john-z-yang marked this conversation as resolved.
Show resolved Hide resolved

class UseCaseID(metaclass=_UseCaseID):
ayirr7 marked this conversation as resolved.
Show resolved Hide resolved
def __init__(self, value: str):
if (
value not in _HARDCODED_USE_CASES.values()
and value not in _REGISTERED_USE_CASES.values()
):
raise ValueError("Passed use case has not been registered")

self.value = value

def __hash__(self) -> int:
return hash(self.value)

def __eq__(self, other: Any) -> bool:
return isinstance(other, UseCaseID) and other.value == self.value

def __repr__(self) -> str:
return f"UseCaseID.{self.value.upper()}"


def register_use_case(key: str) -> UseCaseID:
_REGISTERED_USE_CASES[key.upper()] = key.lower()
return UseCaseID(key)