-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
small updates to Telemetry #833
base: main
Are you sure you want to change the base?
Conversation
CI is failing btw |
Same error happens when I reverted all my changes, it seems to be related to something else. I will try to debug it further. |
Okay, all tests are passing now, I think I fixed original problem by setting SHELL env. I also noticed |
lib/stripe/webhook_plug.ex
Outdated
Event data: #{inspect(event)} | ||
""" | ||
end | ||
:telemetry.span(~w[stripe webhook]a, %{}, fn -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be done at the WebServer Telemetry level? That was the feedback given from the OTEL folks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, Phoenix and Plug already add their own events around requests, but these are very generic and we can do it better here by annotating with Stripe event type. Having this together with other part gives us more complete view of Stripe subsystem, or that's how I see it from monitoring perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am with you! I see the point of adding event.type
since it is domain-specific.
But wouldn't the WebServer handle the handler_status
based on the response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I'm actually working on integrating OTEL and added Telemetry here to create spans and traces similar to how it's done in phoenix in otel-erlang-contrib. I hope to have spans that include stripe api calls by name, I think that will be super helpful when browsing traces, but would be difficult to do using generic telemetry events from phoenix or other lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not opposed to anything, but since these are architectural components, we are better off doing research and making sure we are following "standards" (OTEL practices for now) so we all benefit
I came across particular telemetry requirements that required the Operator to change their mindset and workflow to accomplish what they wanted; I am not saying that is the case here! I am just asking questions to make sure we are doing the right thing. I hope you understand.
This is a tricky domain; everyone wants their way of observing things...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But wouldn't the WebServer handle the handler_status based on the response?
Well, sort of. WebServer will report it in its own telemetry as completely different event so in a sense yes, but it's not always available. I think OTEL traces will be able to connect to parent span and present this info but other systems, for example Prometheus, won't.
Adding handler_status to this event makes it possible to monitor error rates by Stripe event type and use RED method to get useful insights with little effort, that's the main reason why I decided to add it here.
Alternatively we could add exactly same tracing to handler module that WebhookHandler
is calling. It will work the same but this is additional effort to build custom observability versus having it provided ootb by stripity-stripe.
I personally prefer ootb experience because otherwise it's difficult to get in and do it right on your own.
I am not opposed to anything, but since these are architectural components, we are better off doing research and making sure we are following "standards" (OTEL practices for now) so we all benefit
[...]
Yup, understood, happy to have this conversation and to look for best solution together, thanks for taking the time to discuss it too!
Added telemetry support to webhook handler and changed metadata included in API requests. API requests metadata contains some low cardinality fields that can be useful when visualising these metrics in other tools. For example using 'stripe.request.stop' event we can infer metrics for RED method and have visibility into all Stripe API endpoints. 'url' passed in to 'do_perform_request_and_retry' contains query params which can contain sensitive data such as emails so it is URI-parsed and all params are dropped. Webhooks are often important part of integration with Stripe and having telemetry in place for that is good idea too imo.
Maybe fix problem with erlexec failing to start: ``` SHELL environment variable not set! [c_src/exec.cpp:609] Notice: 15:48:19.772 [notice] Application erlexec exited: :exec_app.start(:normal, []) returned an error: shutdown: failed to start child: :exec ** (EXIT) {:port_exited_with_status, 4} ```
Run mix compile before tests so the output is separate. Trace will make it easier to understand if some tests were excluded and such.
Some tests are failing because query params are in wrong order. My fix is to ignore order and just checking all were set with correct value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, just few minors adjustment, primarily around HTTP metadata and aligned a bit more with common OTEL lingos.
lib/stripe/api.ex
Outdated
@@ -376,14 +376,22 @@ defmodule Stripe.API do | |||
end | |||
|
|||
defp do_perform_request_and_retry(method, url, headers, body, opts, {:attempts, attempts}) do | |||
telemetry_meta = %{ | |||
endpoint: URI.parse(url).path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
endpoint: URI.parse(url).path, | |
http_url: URI.parse(url), |
Pass the entire URL, let processors to filter the info, HTTP OTEL packages would do something around the line of https://github.com/open-telemetry/opentelemetry-erlang-contrib/blob/3f13bc8b2042071b723d0ea8c7b118fd10f42d26/instrumentation/opentelemetry_req/lib/opentelemetry_req.ex#L143-L148
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay added and I renamed original field to stripe_api_endpoint
. I can remove it because you're right that it can be extracted from http_url
via processor, I kept it for better out-ot-the-box experience.
lib/stripe/api.ex
Outdated
endpoint: URI.parse(url).path, | ||
method: method, | ||
attempt: attempts, | ||
stripe_api_version: headers["Stripe-Version"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stripe_api_version: headers["Stripe-Version"], | |
http_headers: headers, |
lib/stripe/api.ex
Outdated
case http_module().request(method, url, Map.to_list(headers), body, opts) do | ||
{:ok, status, _, _} = resp -> | ||
{resp, %{status: status}} | ||
{resp, %{telemetry_meta | status: status}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{resp, %{telemetry_meta | status: status}} | |
{resp, %{telemetry_meta | http_status_code: status, http_response_content_length: http_response_content_length }} |
If content-length
is present add http_response_content_length
Updated naming and changed where metadata is constructed in order to keep original function short and focused. URL is stripped of query params to avoid leaking of sensitive data such as emails to logging systems.
Hii, I updated names per your feedback, but not completely.
I also moved setting metadata one level higher so the original function is less complicated. Just experimenting and trying to fit in with current code patterns. Lemme know what you think, if it looks good this way I will squash this commit with the original. |
Hey, I'm having second thoughts about the usefulness of this PR. Everything I added here can be easily instrumented client-side too, just by wrapping calls to |
I am OK taking these changes; I left a few comments the last time, primarily around the name of the tags and the value of them. |
|
||
opts | ||
|> Keyword.put(:telemetry_metadata, %{ | ||
stripe_api_endpoint: parsed_uri.path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be extracted from http_url
if somebody really wants to map the data to something else
Okay, I think I addressed earlier feedback in code or in GH and also left some followup questions for you, I'm not sure you noticed it 😅 |
This pull request has been automatically marked as "stale:discard". If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated!. |
Hi there, I added some telemetry around webhook handler and changed metadata included in all API requests. It's all just a few small changes to make monitoring for stripity-stripe users awesome out of the box 🙃
Webhooks are often important part of integration with Stripe so having telemetry in there is good idea imo.
API requests metadata contains some low cardinality fields that can be useful when visualising these metrics in other tools. For example using
stripe.request.stop
event we can infer metrics for RED method and also have visibility into individual Stripe API endpoints.There are no breaking changes in here, only telemetry metadata is different (
url
field is gone). There is assumption thatevent
object always containstype
property, which is always a case to my knowledge and Stripe docs.Here is example how it looks in PhoenixLiveDashboard:
In order to enable this visualisation I only needed to add few lines to
telemetry.ex
:Other systems such as prom_ex, telemetry_ui, otel, etc can be easily integrated too.
Hope you like it, happy to take in any and all feedback.