Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hierarchical and Rate Limited Sampler Implementations #717

Conversation

gillesbergerp
Copy link

The hierarchical sampler allows for specifying a list of samplers that will be invoked in order. It will sample a request if any of its children would.
The Rate Limited sampler on the other hand samples requests at a fixed rate.

This is a prerequisite for #714

Copy link

linux-foundation-easycla bot commented Nov 11, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@gillesbergerp gillesbergerp force-pushed the feature/gillesbergerp/hierarchical-rate-limited branch from a40750d to be451b2 Compare November 11, 2023 05:30
Copy link
Collaborator

@arielvalentin arielvalentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```

Or, if you use [bundler][bundler-home], include `opentelemetry-sampling-hierarchical` in your `Gemfile`.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to add example usage in the README and/or an example directory with a script that shows how to configure it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there is no way yet to use it. This will require an update to https://github.com/open-telemetry/opentelemetry-ruby/blob/d365dfeb3982f09cd62a17c6e67eb3792f56e460/sdk/lib/opentelemetry/sdk/trace/tracer_provider.rb#L167. I will open a PR to support the X-Ray sampler there as well and will adjust the README afterwards accordingly if that's fine with you


module OpenTelemetry
module Sampling
module Hierarchical
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Hierarchical Sampler a term used in X-Ray to describe this pattern? As opposed to a Any or All sampler?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not but I find Any sampler to be a bit misleading since it doesn't imply a fixed order.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this name is not intuitive to me. Would you be able to help me understand what makes it Hierarchical or Ranked ordering vs "If any of the items in this list return true regardless of order"?

Copy link
Author

@gillesbergerp gillesbergerp Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of samplers is iterated left to right. If any returns a positive response, we stop and say that the span should be sampled. Any doesn't give you this guarantee (in my interpretation).
The order can make a difference. A contrived example would be having two samplers:

  • Sampler A: one per second, only if kind == :internal
  • Sampler B: one per second, only if kind in [:server, :internal]

Let's say the application produces two traces in the same second, one with kind = :internal and one with kind = :server (in this order). If sampler A was applied first, both traces would be sampled. On the other hand, if sampler B was first, only the first (:internal) trace would be sampled since B already spent all its credits when the second trace arrives.

#
# SPDX-License-Identifier: Apache-2.0

module OpenTelemetry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the package the package naming should be symmetric with the SDK:

OpenTelemetry::SDK::Trace::Samplers

Or more use the API namespace

OpenTelemetry::Trace::Samplers

@open-telemetry/ruby-contrib-maintainers WDYT?

sampling/hierarchical/test/opentelemetry/test_helper.rb Outdated Show resolved Hide resolved
# @param [Hash<String, Object>] attributes
# @return [OpenTelemetry::SDK::Trace::Samplers::Result] The sampling result.
def should_sample?(trace_id:, parent_context:, links:, name:, kind:, attributes:)
if can_spend?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sampling is done per span so if you run out of credits_per_second mid trace you will start dropping spans, and potentially end up with broken traces.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be better solved by instantiating a ParentBased sampler with the RateLimited sampler as root? I'D rather avoid making these distinctions here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by making the parent based sampler as root?

Wouldn't that make it so the rate limited sampling result is always ignored since the Hierarchical Sampler will always check the other samplers to see if it should override the result?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot follow. Are we talking about the hierarchical or rate-limited sampler?

def can_spend?
return false if @credits_per_second <= 0

@lock.synchronize do
Copy link
Contributor

@robertlaurin robertlaurin Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sampler is set on the tracer provider so you'll be locking every time you create a span, this may not be appropriate depending on how much span volume your service is generating.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest being optimistic and not locking at all?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a suggestion as much as we're doing is raising concerns about the performance implications of adding another lock around span creation.

I'm not sure how much latency this introduces for applications that generate thousands of spans in a single transaction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since default thread counts are fairly low in most server implementations (in Puma it's 5, for instance for MRI). We can also provide a configuration option for that

private

# @return [Boolean]
def can_spend?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about how this performs in the case where an app is configured with Puma, with Unicorn? Forking vs Pre-forking web servers will result in different spend rates. What's the expected outcome here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point that I haven't considered. I believe that the best option is to not make any assumptions about this at all. Configuring the rate accordingly could restore the parity between forking and pre-forking servers. Should we note this in the README?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more that I think we want clarity on what the expectations are for the scope of the rate limiter.

Does the rate limit apply per process, per thread, or per fiber?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per process makes most sense, I believe. I don't see why different processes should have this knowledge of each other. It also makes the handling easier as we don't need to know the details of different servers

@gillesbergerp gillesbergerp force-pushed the feature/gillesbergerp/hierarchical-rate-limited branch from be451b2 to 431f98e Compare November 17, 2023 10:06
@gillesbergerp gillesbergerp force-pushed the feature/gillesbergerp/hierarchical-rate-limited branch from 431f98e to d071542 Compare November 20, 2023 03:58
Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Dec 21, 2023
@github-actions github-actions bot closed this Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Marks an issue/PR stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants