From d2fb9d0964ec63eb90191c291d4e2536b304e0dd Mon Sep 17 00:00:00 2001 From: Peter Findeisen Date: Wed, 4 Dec 2024 13:58:33 -0800 Subject: [PATCH 1/9] Migrating from the old repository at https://github.com/open-telemetry/oteps/pull/250 --- oteps/0250-Composite_Samplers.md | 297 +++++++++++++++++++++++++++++++ 1 file changed, 297 insertions(+) create mode 100644 oteps/0250-Composite_Samplers.md diff --git a/oteps/0250-Composite_Samplers.md b/oteps/0250-Composite_Samplers.md new file mode 100644 index 00000000000..49debadf3b0 --- /dev/null +++ b/oteps/0250-Composite_Samplers.md @@ -0,0 +1,297 @@ +# Composite Samplers Proposal + +This proposal addresses head-based sampling as described by the [OpenTelemetry SDK](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling). +It introduces additional _composite samplers_. +Composite samplers use other samplers (_delegates_ or _children_) to make sampling decisions. +The composite samplers invoke the delegate samplers, but ultimately make the final decision themselves. + +Some of the new samplers proposed here have been designed to work with Consistent Probability Samplers. For a detailed description of this concept, see [probability sampling (OTEP 235)](https://github.com/open-telemetry/oteps/blob/main/text/trace/0235-sampling-threshold-in-trace-state.md). +Also see Draft PR 3910 [Probability Samplers based on W3C Trace Context Level 2](https://github.com/open-telemetry/opentelemetry-specification/pull/3910). + +## Motivation + +The need for configuring head sampling has been explicitly or implicitly indicated in several discussions, both within the [Sampling SIG](https://docs.google.com/document/d/1gASMhmxNt9qCa8czEMheGlUW2xpORiYoD7dBD7aNtbQ) and in the wider community. 
+Some of the discussions go back a number of years; see for example: + +- issue [173](https://github.com/open-telemetry/opentelemetry-specification/issues/173): Way to ignore healthcheck traces when using automatic tracer across all languages? +- issue [1060](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060): Exclude URLs from Tracing +- issue [1844](https://github.com/open-telemetry/opentelemetry-specification/issues/1844): Composite Sampler + +Unfortunately, some of the valuable ideas raised at the Sampling SIG meetings were never recorded at the time of their inception, but see [Sampling SIG Research Notes](https://github.com/open-telemetry/oteps/pull/213) or the comments under [OTEP 240: A Sampling Configuration proposal](https://github.com/open-telemetry/oteps/pull/240) for some examples. + +## The Goal + +The goal of this proposal is to help create advanced sampling configurations using pre-defined building blocks. Let's consider the following example of sampling requirements. It is believed that many users will have requirements following a similar pattern. The most notable elements here are trace classification based on target URL, some spans requiring special handling, and putting a sanity cap on the total volume of exported spans. + +### Example + +Head-based sampling requirements. + +- for root spans: + - drop all `/healthcheck` requests + - capture all `/checkout` requests + - capture 25% of all other requests +- for non-root spans: + - follow the parent sampling decision + - however, capture all calls to service `/foo` (even if the trace will be incomplete) +- in any case, do not exceed 1000 spans/minute + +We present two quite different approaches to composite samplers. The first one uses only the current [sampling API](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling). 
+It can be applied to a large variety of samplers, but may not work correctly for Consistent Probability Samplers. It is also neither very efficient nor elegant. + +The second approach is applicable exclusively to Consistent Probability Samplers, and is more efficient and less prone to misconfiguration. It requires an additional API to be provided by the delegate samplers. + +__Note__: both approaches call for calculating _unions_ of Attribute sets. +Whenever such a union is constructed, in case of conflicting attribute keys, the attribute definition from the last set that uses that key takes effect. Similarly, whenever modifications of `Tracestate` are performed in sequence, in case of conflicting keys, the last modification erases the previous values. + +## Approach One + +The following new composite samplers are proposed. + +### AnyOf + +`AnyOf` is a composite sampler which takes a non-empty list of Samplers (delegates) as the argument. The intention is to make a `RECORD_AND_SAMPLE` decision if __any of__ the delegates decides to `RECORD_AND_SAMPLE`. + +Upon invocation of its `shouldSample` method, it MUST go through the whole list and invoke the `shouldSample` method on each delegate sampler, passing the same arguments as received, and collecting the delegates' sampling Decisions. + +`AnyOf` sampler MUST return a `SamplingResult` with the following elements. + +- If all of the delegate Decisions are `DROP`, the composite sampler MUST return `DROP` Decision as well. +If any of the delegate Decisions is `RECORD_AND_SAMPLE`, the composite sampler MUST return `RECORD_AND_SAMPLE` Decision. +Otherwise, if any of the delegate Decisions is `RECORD_ONLY`, the composite sampler MUST return `RECORD_ONLY` Decision. +- The set of span `Attributes` to be added to the `Span` is the union of the sets of `Attributes` as provided by those delegate samplers which produced a sampling Decision other than `DROP`. 
+- The `Tracestate` to be used with the new `Span` is obtained by cumulatively applying all the potential modifications of the parent `Tracestate` by the delegate samplers. + +Each delegate sampler MUST be given a chance to participate in the sampling decision as described above and MUST see the same _parent_ state. The resulting sampling Decision does not depend on the order of the delegate samplers. + +### Conjunction + +`Conjunction` is a composite sampler which takes two Samplers (delegates) as the arguments. These delegate samplers will be hereby referenced as First and Second. This kind of composition forms conditional chaining of both samplers. + +Upon invocation of its `shouldSample` method, the Conjunction sampler MUST invoke `shouldSample` method on the First sampler, passing the same arguments as received, and examine the received sampling Decision. +Upon receiving `DROP` or `RECORD_ONLY` decision it MUST return the `SamplingResult` (which includes a set of `Attributes` and `Tracestate` in addition to the sampling Decision) from the First sampler, and it MUST NOT proceed with querying the Second sampler. +If the sampling decision from the First sampler is `RECORD_AND_SAMPLE`, the Conjunction sampler MUST invoke `shouldSample` method on the Second sampler, effectively passing the `Tracestate` received from the First sampler as the parent trace state. + +If the sampling Decision from the Second sampler is `RECORD_AND_SAMPLE`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows: + +- The sampling Decision is `RECORD_AND_SAMPLE`. +- The set of span `Attributes` to be added to the `Span` is the union of the sets of `Attributes` as provided by both samplers. +- The `Tracestate` to be used with the new `Span` is as provided by the Second sampler. 
+ +If the sampling Decision from the Second sampler is `RECORD_ONLY`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows: + +- The sampling Decision is `RECORD_ONLY`. +- The set of span `Attributes` to be added to the `Span` is the set of `Attributes` returned by the First sampler. +- The `Tracestate` to be used with the new `Span` is the `Tracestate` provided by the Second sampler. + +If the sampling Decision from the Second sampler is `DROP`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows: + +- The sampling Decision is `DROP`. +- The set of span `Attributes` to be added to the `Span` is empty. +- The `Tracestate` to be used with the new `Span` is the `Tracestate` provided by the Second sampler. + +### RuleBased + +`RuleBased` is a composite sampler which performs `Span` categorization (e.g. when sampling decision depends on `Span` attributes) and sampling. +The Spans can be grouped into separate categories, and each category can use a different Sampler. +Categorization of Spans is aided by `Predicates`. + +#### Predicate + +The Predicates represent logical expressions which can access `Span` `Attributes` (or anything else available when the sampling decision is to be made), and perform tests on the accessible values. +For example, one can test if the target URL for a SERVER span matches a given pattern. +`Predicate` interface allows users to create custom categories based on information that is available at the time of making the sampling decision. + +##### SpanMatches + +This is a routine/function/method for `Predicate`, which returns `true` if a given `Span` matches, i.e. belongs to the category described by the Predicate. + +##### Required Arguments for Predicates + +The arguments represent the values that are made available for `ShouldSample`. + +- `Context` with parent `Span`. +- `TraceId` of the `Span` to be created. +- Name of the `Span` to be created. 
+- Initial set of `Attributes` of the `Span` to be created. +- Collection of links that will be associated with the `Span` to be created. + +#### Required Arguments for RuleBased + +- `SpanKind` +- list of pairs (`Predicate`, `Sampler`) + +For making the sampling decision, if the `Span` kind matches the specified kind, the sampler goes through the list in the provided order and calls `SpanMatches` on `Predicate`s passing the same arguments as received by `ShouldSample`. If a call returns `true`, the corresponding `Sampler` will be called to make the final sampling decision. If the `SpanKind` does not match, or none of the calls to `SpanMatches` yields `true`, the final decision is `DROP`. + +The order of `Predicate`s is essential. If more than one `Predicate` matches a `Span`, only the Sampler associated with the first matching `Predicate` will be used. + +## Summary - Approach One + +### Example - sampling configuration + +Going back to our example of sampling requirements, we can now configure the head sampler to support this particular case, using an informal notation of samplers and their arguments. +First, let's express the requirements for the ROOT spans as follows. + +``` +S1 = RuleBased(ROOT, { + (http.target == /healthcheck) => AlwaysOff, + (http.target == /checkout) => AlwaysOn, + true => TraceIdRatioBased(0.25) + }) +``` + +Note: technically, `ROOT` is not a `SpanKind`, but is a special token matching all Spans with invalid parent context (i.e. the ROOT spans, regardless of their kind). + +In the next step, we can build the sampler to handle non-root spans as well: + +``` +S2 = ParentBased(S1) +``` + +The special case of calling service `/foo` can now be supported by: + +``` +S3 = AnyOf(S2, RuleBased(CLIENT, { (http.url == /foo) => AlwaysOn })) +``` + +Finally, the last step is to put a limit on the stream of exported spans. 
One of the available rate limiting samplers that we can use is the Jaeger [RateLimitingSampler](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/jaeger-remote-sampler/src/main/java/io/opentelemetry/sdk/extension/trace/jaeger/sampler/RateLimitingSampler.java): + +``` +S4 = Conjunction(S3, RateLimitingSampler(1000 * 60)) +``` + +### Limitations of composite samplers in Approach One + +Not all samplers can participate as components of composite samplers without undesired or unexpected effects. Some samplers require that they _see_ each `Span` being created, even if the span is going to be dropped. Some samplers update the trace state or maintain internal state, and for their correct behavior it is assumed that their sampling decisions will be honored by the tracer at face value in all cases. A good example of this is rate limiting samplers, which have to keep track of the rate of created spans and/or the rate of positive sampling decisions. + +The need to encode and decode the `Tracestate` multiple times degrades the performance of the composite samplers. This drawback is eliminated in Approach Two. + +## Approach Two + +A principle of operation for Approach Two is that `ShouldSample` is invoked only once, on the root of the tree formed by composite samplers. All the logic provided by the composition of samplers is handled by calculating the threshold values, delegating the calculation downstream as necessary. + +### New API + +To make this approach possible, all Consistent Probability Samplers which participate in the samplers composition need to implement the following API, in addition to the standard Sampler API. We will use the term _Composable Sampler_ to denote Consistent Probability Samplers which provide the new API and conform to the rules described here. +The composite samplers in Approach Two are Composable Samplers as well. + +#### GetSamplingIntent + +This is a routine/function/method for all Composable Samplers. 
Its purpose is to query the sampler about the activities it would perform had it been asked to make a sampling decision for a given span, however, without constructing the actual sampling Decision. + +#### Required Arguments for GetSamplingIntent + +The arguments are the same as for [`ShouldSample`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#shouldsample) except for the `TraceId`. + +- `Context` with parent `Span`. +- Name of the `Span` to be created. +- `SpanKind` of the `Span` to be created. +- Initial set of `Attributes` of the `Span` to be created. +- Collection of links that will be associated with the `Span` to be created. + +#### Return value + +The return value is a structure (`SamplingIntent`) with the following elements: + +- The THRESHOLD value represented as a 14-character hexadecimal string, with value of `null` representing non-probabilistic `DROP` decision (implementations MAY use different representation, if it appears more performant or convenient), +- A function (`GetAttributes`) that provides a set of `Attributes` to be added to the `Span` in case of positive final sampling decision, +- A function (`UpdateTraceState`) that given an input `Tracestate` and sampling Decision provides a `Tracestate` to be associated with the `Span`. The samplers SHOULD NOT add or modify the `th` value for the `ot` key within these functions. + +#### Constructing `SamplingResult` + +The process of constructing the final `SamplingResult` in response to a call to `ShouldSample` on the root sampler of the composite samplers tree consists of the following steps. 
+ +- The sampler gets its own `SamplingIntent`, it is a recursive process as described below (unless the sampler is a leaf), +- The sampler compares the received THRESHOLD value with the trace Randomness value to arrive at the final sampling `Decision`, +- In case of a positive sampling decision the sampler calls the received `GetAttributes` function to determine the set of `Attributes` to be added to the `Span`, in most cases it will be a recursive step, +- The sampler calls the received `UpdateTraceState` function passing the parent `Tracestate` and the final sampling `Decision` to get the new `Tracestate` to be associated with the `Span` - again, in most cases this is a recursive step, +- The sampler modifies (or removes) the `th` value for the `ot` key in the `Tracestate` according to the final sampling `Decision` and the THRESHOLD used in the second step above. + +### ConsistentRuleBased + +This composite sampler re-uses the concept of Predicates from Approach One. + +#### Required Arguments for ConsistentRuleBased + +- `SpanKind` +- list of pairs (`Predicate`, `ComposableSampler`) + +For calculating the `SamplingIntent`, if the `Span` kind matches the specified kind, the sampler goes through the list in the provided order and calls `SpanMatches` on `Predicate`s passing the same arguments as received. If a call returns `true`, the result is as returned by `GetSamplingIntent` called on the corresponding `ComposableSampler`. If the `SpanKind` does not match, or none of the calls to `SpanMatches` yield `true`, the result is obtained by calling `GetSamplingIntent` on `ConsistentAlwaysOffSampler`. + +### ConsistentAnyOf + +`ConsistentAnyOf` is a composite sampler which takes a non-empty list of ComposableSamplers (delegates) as the argument. The intention is to make a positive sampling decision if __any of__ the delegates would make a positive decision. 
+ +Upon invocation of its `GetSamplingIntent` function, it MUST go through the whole list and invoke `GetSamplingIntent` function on each delegate sampler, passing the same arguments as received. + +`ConsistentAnyOf` sampler MUST return a `SamplingIntent` which is constructed as follows: + +- If any of the delegates returned a non-`null` threshold value, the resulting threshold is the lexicographical minimum value from the set of those non-`null` values, otherwise `null`. +- The `GetAttributes` function calculates the union of `Attribute` sets as returned by the calls to `GetAttributes` function for each delegate, in the declared order. +- The `UpdateTraceState` function makes a chain of calls to the `UpdateTraceState` functions as returned by the delegates, passing the received `Tracestate` as argument to subsequent calls and returning the last value received. + +Each delegate sampler MUST be given a chance to participate in calculating the `SamplingIntent` as described above and MUST see the same argument values. The order of the delegate samplers does not affect the final sampling `Decision`. + +### ConsistentRateLimiting + +`ConsistentRateLimiting` is a composite sampler that helps control the average rate of sampled spans while allowing another sampler (the delegate) to provide sampling hints. + +#### Required Arguments for ConsistentRateLimiting + +- ComposableSampler (delegate) +- maximum sampling (throughput) target rate + +The sampler SHOULD measure and keep the average rate of incoming spans, and therefore also of the desired ratio between the incoming span rate to the target span rate. +Upon invocation of its `GetSamplingIntent` function, the composite sampler MUST get the `SamplingIntent` from the delegate sampler, passing the same arguments as received. + +The returned `SamplingIntent` is constructed as follows. 
+ +- If using the obtained threshold value as the final threshold would entail sampling more spans than the declared target rate, the sampler SHOULD increase the threshold to a value that would meet the target rate. Several algorithms can be used for threshold adjustment; the specification does not prescribe any particular one. +- The `GetAttributes` function returns the union of the set of `Attributes` returned by calling the delegate's `GetAttributes` and the sampler's own `Attributes`. +- The `UpdateTraceState` function returns the `Tracestate` as returned by calling `UpdateTraceState` from the delegate's `SamplingIntent`. + +TO DO: consider introducing a `ConsistentConjunction` sampler (similar to `Conjunction` from Approach One) that would generalize the relationship between the delegate and the principal sampler, and remove the explicit delegate from `ConsistentRateLimiting`. + +## Summary - Approach Two + +### Example - sampling configuration with Approach Two + +With the samplers introduced by Approach Two, our example requirements can be coded in much the same way as with Approach One. However, the work of the samplers configured this way forms a tree of `GetSamplingIntent` invocations rather than `ShouldSample` invocations as in Approach One. 
+ +``` +S = ConsistentRateLimiting( + ConsistentAnyOf( + ConsistentParentBased( + ConsistentRuleBased(ROOT, { + (http.target == /healthcheck) => ConsistentAlwaysOff, + (http.target == /checkout) => ConsistentAlwaysOn, + true => ConsistentFixedThreshold(0.25) + })), + ConsistentRuleBased(CLIENT, { + (http.url == /foo) => ConsistentAlwaysOn + }) + ), + 1000 * 60 + ) +``` + +### Limitations of composite samplers in Approach Two + +Making sampling decisions with samplers from Approach Two is more efficient than in Approach One, especially if, platform permitting, `null` values can be used for the `GetAttributes` and `UpdateTraceState` functions to represent the prevailing trivial cases of _no-new-attributes_ and _no-special-trace-state-keys_. The only limitation of this approach is that it operates exclusively within the domain of Composable Samplers (a subset of Consistent Probability Samplers). + +Developers of Composable Samplers should consider that the sampling Decision they declare as their intent might be different from the final sampling Decision. + +### Prototyping + +A prototype implementation of ComposableSamplers for Java is available; see [ConsistentSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentSampler.java) and its subclasses. 
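
As a rough illustration of the threshold arithmetic behind `ConsistentAnyOf`, the following Python sketch may help. It is not taken from the Java prototype. The names `threshold_for`, `SamplingIntent`, `any_of`, and `is_sampled` are hypothetical, `UpdateTraceState` is left out for brevity, and the decision rule (sample when the 56-bit randomness value is greater than or equal to the threshold) is assumed from OTEP 235.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

MAX_THRESHOLD = 1 << 56  # 56 bits of randomness, as assumed from OTEP 235


def threshold_for(probability: float) -> Optional[str]:
    """Encode a sampling probability as a 14-character hex rejection threshold."""
    kept = round(min(probability, 1.0) * MAX_THRESHOLD)
    if kept <= 0:
        return None  # None stands in for the non-probabilistic DROP (`null`)
    return format(MAX_THRESHOLD - kept, "014x")


@dataclass(frozen=True)
class SamplingIntent:
    """Hypothetical, simplified stand-in for the proposed SamplingIntent."""
    threshold: Optional[str]
    get_attributes: Callable[[], Dict[str, str]] = dict


def any_of(intents: Sequence[SamplingIntent]) -> SamplingIntent:
    """Combine delegate intents the way ConsistentAnyOf is described."""
    thresholds = [i.threshold for i in intents if i.threshold is not None]

    def get_attributes() -> Dict[str, str]:
        merged: Dict[str, str] = {}
        for intent in intents:  # declared order; the last writer of a key wins
            merged.update(intent.get_attributes())
        return merged

    # Equal-length lowercase hex strings sort lexicographically like numbers,
    # so the smallest string is the most permissive threshold.
    return SamplingIntent(min(thresholds) if thresholds else None, get_attributes)


def is_sampled(intent: SamplingIntent, randomness: int) -> bool:
    """Assumed OTEP 235 rule: keep the span when randomness >= threshold."""
    return intent.threshold is not None and randomness >= int(intent.threshold, 16)


quarter = SamplingIntent(threshold_for(0.25), lambda: {"rule": "default"})
never = SamplingIntent(threshold_for(0.0))
combined = any_of([quarter, never])
print(combined.threshold, is_sampled(combined, 0xC0000000000000))
```

Because the combined threshold is the minimum over the delegates, adding a delegate can only make the composite more permissive, which matches the stated intent of sampling if any of the delegates would.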
+ +## Prior art + +A number of composite samplers are already available as independent contributions +([RuleBasedRoutingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java), +[Stratified Sampling](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/stratified-sampling-example), +LinksBasedSampler [for Java](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/LinksBasedSampler.java) +and [for DOTNET](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/links-based-sampler)). +Also, historically, some Span categorization was introduced by [JaegerRemoteSampler](https://www.jaegertracing.io/docs/1.54/sampling/#remote-sampling). + +This proposal aims at generalizing these ideas, and at providing a bit more formal specification for the behavior of the composite samplers. From 19d67831c4928cdd37f141ad222d9d7abb9ed1b0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Robert=20Paj=C4=85k?= Date: Fri, 6 Dec 2024 00:39:36 +0100 Subject: [PATCH 2/9] Remove the recommendation to not synchronize access to Config.disabled (#4310) --- CHANGELOG.md | 6 ++++++ specification/logs/sdk.md | 4 +--- specification/metrics/sdk.md | 4 +--- specification/trace/sdk.md | 4 +--- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1e848eb50af..a781750a2d2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,16 +16,22 @@ release. - Add in-development support for `otlp/stdout` exporter via `OTEL_TRACES_EXPORTER`. ([#4183](https://github.com/open-telemetry/opentelemetry-specification/pull/4183)) +- Remove the recommendation to not synchronize access to `TracerConfig.disabled`. 
+ ([#4310](https://github.com/open-telemetry/opentelemetry-specification/pull/4310)) ### Metrics - Add in-development support for `otlp/stdout` exporter via `OTEL_METRICS_EXPORTER`. ([#4183](https://github.com/open-telemetry/opentelemetry-specification/pull/4183)) +- Remove the recommendation to not synchronize access to `MeterConfig.disabled`. + ([#4310](https://github.com/open-telemetry/opentelemetry-specification/pull/4310)) ### Logs - Add in-development support for `otlp/stdout` exporter via `OTEL_LOGS_EXPORTER`. ([#4183](https://github.com/open-telemetry/opentelemetry-specification/pull/4183)) +- Remove the recommendation to not synchronize access to `LoggerConfig.disabled`. + ([#4310](https://github.com/open-telemetry/opentelemetry-specification/pull/4310)) ### Events diff --git a/specification/logs/sdk.md b/specification/logs/sdk.md index d382cde2447..190c723dda4 100644 --- a/specification/logs/sdk.md +++ b/specification/logs/sdk.md @@ -190,9 +190,7 @@ It consists of the following parameters: is [Enabled](./api.md#enabled). If `disabled` is `true`, `Enabled` returns `false`. If `disabled` is `false`, `Enabled` returns `true`. It is not necessary for implementations to ensure that changes to `disabled` are - immediately visible to callers of `Enabled`. I.e. atomic, volatile, - synchronized, or equivalent memory semantics to avoid stale reads are - discouraged to prioritize performance over immediate consistency. + immediately visible to callers of `Enabled`. ## Additional LogRecord interfaces diff --git a/specification/metrics/sdk.md b/specification/metrics/sdk.md index 325c252fd7e..003b1538e66 100644 --- a/specification/metrics/sdk.md +++ b/specification/metrics/sdk.md @@ -973,9 +973,7 @@ default `MeterConfig.disabled=false` and instruments use the default aggregation when no matching views match the instrument. It is not necessary for implementations to ensure that changes -to `MeterConfig.disabled` are immediately visible to callers of `Enabled`. I.e. 
-atomic, volatile, synchronized, or equivalent memory semantics to avoid stale -reads are discouraged to prioritize performance over immediate consistency. +to `MeterConfig.disabled` are immediately visible to callers of `Enabled`. ## Attribute limits diff --git a/specification/trace/sdk.md b/specification/trace/sdk.md index 765edc4b296..03f5b629a35 100644 --- a/specification/trace/sdk.md +++ b/specification/trace/sdk.md @@ -184,9 +184,7 @@ It consists of the following parameters: is [Enabled](./api.md#enabled). If `disabled` is `true`, `Enabled` returns `false`. If `disabled` is `false`, `Enabled` returns `true`. It is not necessary for implementations to ensure that changes to `disabled` are - immediately visible to callers of `Enabled`. I.e. atomic, volatile, - synchronized, or equivalent memory semantics to avoid stale reads are - discouraged to prioritize performance over immediate consistency. + immediately visible to callers of `Enabled`. ## Additional Span Interfaces From aa4bf33591afde4bd1f5ab103b00bbf9b1718553 Mon Sep 17 00:00:00 2001 From: Trask Stalnaker Date: Fri, 6 Dec 2024 09:27:04 -0800 Subject: [PATCH 3/9] Deprecate Event API and SDK in favor of Emit Event in the Log API (#4319) --- CHANGELOG.md | 3 +++ spec-compliance-matrix.md | 21 +-------------------- specification/README.md | 1 - specification/logs/README.md | 4 +--- specification/logs/event-api.md | 13 ++----------- specification/logs/event-sdk.md | 3 ++- specification/versioning-and-stability.md | 4 +--- 7 files changed, 10 insertions(+), 39 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a781750a2d2..dcc08f63fca 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -35,6 +35,9 @@ release. ### Events +- Deprecate Events API and SDK in favor of having Events support in the Logs API and SDK. 
+ ([#4319](https://github.com/open-telemetry/opentelemetry-specification/pull/4319)) + ### Baggage ### Resource diff --git a/spec-compliance-matrix.md b/spec-compliance-matrix.md index d2c47e48ea5..192e9aab745 100644 --- a/spec-compliance-matrix.md +++ b/spec-compliance-matrix.md @@ -193,6 +193,7 @@ Disclaimer: this list of features is still a work in progress, please refer to t | LoggerProvider.Shutdown | | | + | | + | | | + | | + | - | | | LoggerProvider.ForceFlush | | | + | | + | | | + | | + | - | | | Logger.Emit(LogRecord) | | | + | | + | | | + | | + | - | | +| Logger.EmitEvent(LogRecord) | | | | | | | | | | | | | | Logger.Enabled | X | + | | | | | | | + | + | | | | SimpleLogRecordProcessor | | | + | | + | | | + | | + | | | | BatchLogRecordProcessor | | | + | | + | | | + | | + | | | @@ -203,26 +204,6 @@ Disclaimer: this list of features is still a work in progress, please refer to t | Can plug custom LogRecordExporter | | | + | | + | | | + | | + | | | | Trace Context Injection | | | + | | + | | | + | | + | + | | -## Events - -Features for the [Events API](specification/logs/event-api.md) and the [Events SDK](specification/logs/event-sdk.md). -Disclaimer: Events are currently in Development status - work in progress. 
- -| Feature | Optional | Go | Java | JS | Python | Ruby | Erlang | PHP | Rust | C++ | .NET | Swift | -|----------------------------------------------------------------------------|----------|----|------|----|--------|------|--------|-----|------|-----|------|-------| -| [EventLoggerProvider](specification/logs/event-api.md#eventloggerprovider) | Optional | Go | Java | JS | Python | Ruby | Erlang | PHP | Rust | C++ | .NET | Swift | -| Get EventLogger | | | | | | | | | | | | | -| Get EventLogger accepts version | X | | | | | | | | | | | | -| Get EventLogger accepts schema_url | X | | | | | | | | | | | | -| Get EventLogger accepts attributes | X | | | | | | | | | | | | -| [EventLogger](specification/logs/event-api.md#eventlogger) | Optional | Go | Java | JS | Python | Ruby | Erlang | PHP | Rust | C++ | .NET | Swift | -| Emit event accepts name | | | | | | | | | | | | | -| Emit event accepts AnyValue body | X | | | | | | | | | | | | -| Emit event accepts severity | X | | | | | | | | | | | | -| Emit event accepts timestamp | X | | | | | | | | | | | | -| Emit event accepts attributes | X | | | | | | | | | | | | -| Emit event accepts context | X | | | | | | | | | | | | - ## Resource | Feature | Optional | Go | Java | JS | Python | Ruby | Erlang | PHP | Rust | C++ | .NET | Swift | diff --git a/specification/README.md b/specification/README.md index ecc8ee3a033..32a653e04d6 100644 --- a/specification/README.md +++ b/specification/README.md @@ -33,7 +33,6 @@ path_base_for_github_subdir: - [Metrics](metrics/api.md) - [Logs](logs/README.md) - [API](logs/api.md) - - [Event API](logs/event-api.md) - SDK Specification - [Tracing](trace/sdk.md) - [Metrics](metrics/sdk.md) diff --git a/specification/logs/README.md b/specification/logs/README.md index ecf0e84d0cd..38d611d779b 100644 --- a/specification/logs/README.md +++ b/specification/logs/README.md @@ -230,8 +230,7 @@ processor. 
### Infrastructure Logs These are logs generated by various infrastructure components, such as -Kubernetes events (if you are wondering why events are discussed in the context -of logs see [Event API Data model](./event-api.md#event-data-model)). Like system logs, the +Kubernetes events. Like system logs, the infrastructure logs lack a trace context and can be enriched by the resource context - information about the node, pod, container, etc. @@ -447,7 +446,6 @@ standard output. * [Logs API](./api.md) * [Logs SDK](./sdk.md) * [Logs Data Model](./data-model.md) -* [Event API](./event-api.md) * [Trace Context in non-OTLP Log Formats](../compatibility/logging_trace_context.md) ## References diff --git a/specification/logs/event-api.md b/specification/logs/event-api.md index 5a89336cde0..7d17f7f9e74 100644 --- a/specification/logs/event-api.md +++ b/specification/logs/event-api.md @@ -1,6 +1,7 @@ # Events API -**Status**: [Development](../document-status.md) +**Status**: [Deprecated](../document-status.md) (was never stabilized), +see [Emit Event](./api.md#emit-an-event) in the Logs API for replacement.
Table of Contents @@ -9,7 +10,6 @@ -- [Logs API Development](#logs-api-development) - [Event Data model](#event-data-model) - [Event API use cases](#event-api-use-cases) - [EventLoggerProvider](#eventloggerprovider) @@ -31,15 +31,6 @@ The Event API consists of these main components: provides access to `EventLogger`s. * [EventLogger](#eventlogger) is the component responsible for emitting events. -## Logs API Development - -> [!NOTE] -> We are currently in the process of defining a new [Logs API](./api.md). - -The intent is that this Logs API will incorporate the current functionality of this existing Events API and once it is defined and implemented, the Events API usage will be migrated, deprecated, renamed and eventually removed. - -No further work is scheduled for the current Events API definition at this time. - ## Event Data model Wikipedia’s [definition of log file](https://en.wikipedia.org/wiki/Log_file): diff --git a/specification/logs/event-sdk.md b/specification/logs/event-sdk.md index b0322f1f77b..25a66a1734b 100644 --- a/specification/logs/event-sdk.md +++ b/specification/logs/event-sdk.md @@ -1,6 +1,7 @@ # Events SDK -**Status**: [Development](../document-status.md) +**Status**: [Deprecated](../document-status.md) (was never stabilized), +see the [Logs SDK](./sdk.md) and [Emit Event](./api.md#emit-an-event) in the Logs API for replacement.
Table of Contents diff --git a/specification/versioning-and-stability.md b/specification/versioning-and-stability.md index 570aa11ba9e..22f9f8e96b0 100644 --- a/specification/versioning-and-stability.md +++ b/specification/versioning-and-stability.md @@ -247,9 +247,7 @@ Semantic Conventions defines the set of fields in the OTLP data model: - The attribute keys provided on the LogRecord - The attribute values provided on the LogRecord that are defined in a list of well-known values. - - For log records that are [Log Events](logs/event-api.md) - - The following data provided to [emit event](logs/event-api.md#emit-event): - - The event name (the value of the `event.name` attribute) + - The event name (the value of the `event.name` attribute) Things not listed in the above are not expected to remain stable via semantic convention and are allowed (or expected) to change. A few examples: From 51cb58cddbb3abe701f8ed27f063098e03b71fa2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Robert=20Paj=C4=85k?= Date: Fri, 6 Dec 2024 18:39:02 +0100 Subject: [PATCH 4/9] [Logs] Remove the in-development isolating log record processor (#4301) --- CHANGELOG.md | 2 ++ specification/logs/sdk.md | 23 +---------------------- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index dcc08f63fca..08ce1ef2a1b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -32,6 +32,8 @@ release. ([#4183](https://github.com/open-telemetry/opentelemetry-specification/pull/4183)) - Remove the recommendation to not synchronize access to `LoggerConfig.disabled`. ([#4310](https://github.com/open-telemetry/opentelemetry-specification/pull/4310)) +- Remove the in-development isolating log record processor. 
+ ([#4301](https://github.com/open-telemetry/opentelemetry-specification/pull/4301)) ### Events diff --git a/specification/logs/sdk.md b/specification/logs/sdk.md index 190c723dda4..1de732275fd 100644 --- a/specification/logs/sdk.md +++ b/specification/logs/sdk.md @@ -28,7 +28,6 @@ * [Built-in processors](#built-in-processors) + [Simple processor](#simple-processor) + [Batching processor](#batching-processor) - + [Isolating processor](#isolating-processor) - [LogRecordExporter](#logrecordexporter) * [LogRecordExporter operations](#logrecordexporter-operations) + [Export](#export) @@ -235,8 +234,7 @@ the following information added to the [LogRecord](data-model.md#log-and-event-r * [`TraceFlags`](./data-model.md#field-traceflags) The SDK MAY provide an operation that makes a deep clone of a `ReadWriteLogRecord`. -The operation can be used to implement the [isolating processor](#isolating-processor) -or by asynchronous processors (e.g. [Batching processor](#batching-processor)) +The operation can be used by asynchronous processors (e.g. [Batching processor](#batching-processor)) to avoid race conditions on the log record that is not required to be concurrent safe. @@ -383,10 +381,6 @@ make the flush timeout configurable. The standard OpenTelemetry SDK MUST implement both simple and batch processors, as described below. -**Status**: [Development](../document-status.md) - -The standard OpenTelemetry SDK SHOULD implement an isolating processor, -as described below. - Other common processing scenarios SHOULD be first considered for implementation out-of-process in [OpenTelemetry Collector](../overview.md#collector). @@ -426,21 +420,6 @@ to make sure that they are not invoked concurrently. * `maxExportBatchSize` - the maximum batch size of every export. It must be smaller or equal to `maxQueueSize`. The default value is `512`. 
-#### Isolating processor - -**Status**: [Development](../document-status.md) - -This is an implementation of `LogRecordProcessor` ensuring the log record -passed to `OnEmit` of the configured `processor` does not share mutable data -with subsequent registered processors. -For example, the `OnEmit` implementation of the isolating processor can be -a decorator that makes a deep copy of the log record before passing it to -the configured `processor`. - -**Configurable parameters:** - -* `processor` - processor to be isolated. - ## LogRecordExporter `LogRecordExporter` defines the interface that protocol-specific exporters must From 52e9be7b61da92afb27b4b2f2dc28c03450758c3 Mon Sep 17 00:00:00 2001 From: Trask Stalnaker Date: Fri, 6 Dec 2024 11:20:05 -0800 Subject: [PATCH 5/9] Remove auto-assignment of PRs (#4329) --- .github/auto_assign_issue.yml | 22 ---------------------- .github/auto_assign_pr.yml | 25 ------------------------- .github/workflows/auto-assign-pr.yml | 14 -------------- 3 files changed, 61 deletions(-) delete mode 100644 .github/auto_assign_issue.yml delete mode 100644 .github/auto_assign_pr.yml delete mode 100644 .github/workflows/auto-assign-pr.yml diff --git a/.github/auto_assign_issue.yml b/.github/auto_assign_issue.yml deleted file mode 100644 index 5fe775fc287..00000000000 --- a/.github/auto_assign_issue.yml +++ /dev/null @@ -1,22 +0,0 @@ -# Set to true to add reviewers to issues/PRs -addReviewers: false - -# Set to true to add assignees to issues/PRs -addAssignees: true - -# A list of assignees, overrides reviewers if set -assignees: - - arminru - - bogdandrutu - - carlosalberto - - jack-berg - - jmacd - - jsuereth - - reyang - - tigrannajaryan - - yurishkuro - -# A number of assignees to add to the issues/PRs -# Set to 0 to add all of the assignees. -# Uses numberOfReviewers if unset. 
-numberOfAssignees: 1 diff --git a/.github/auto_assign_pr.yml b/.github/auto_assign_pr.yml deleted file mode 100644 index 4c9716df204..00000000000 --- a/.github/auto_assign_pr.yml +++ /dev/null @@ -1,25 +0,0 @@ -# Set to true to add reviewers to pull requests -addReviewers: false - -# Set to true to add assignees to pull requests -addAssignees: true - -# Set to true to add assignees from different groups to pull requests -useAssigneeGroups: true - -# A list of assignees, split into different groups, to be added to pull requests (GitHub user name) -assigneeGroups: - tc: - - arminru - - bogdandrutu - - carlosalberto - - jack-berg - - jmacd - - jsuereth - - reyang - - tigrannajaryan - - yurishkuro - -# A number of assignees added to the pull request -# Set 0 to add all the assignees (default: 0) -numberOfAssignees: 1 diff --git a/.github/workflows/auto-assign-pr.yml b/.github/workflows/auto-assign-pr.yml deleted file mode 100644 index 694e4fea22b..00000000000 --- a/.github/workflows/auto-assign-pr.yml +++ /dev/null @@ -1,14 +0,0 @@ -name: 'Auto Assign PR' -on: - pull_request_target: - types: [opened, ready_for_review] - -jobs: - add-owner: - runs-on: ubuntu-latest - steps: - - name: run - uses: kentaro-m/auto-assign-action@v1.1.1 - with: - configuration-path: ".github/auto_assign_pr.yml" - repo-token: '${{ secrets.GITHUB_TOKEN }}' From a265ae0628177be25dc477ea8fe200f1c825b871 Mon Sep 17 00:00:00 2001 From: Trask Stalnaker Date: Fri, 6 Dec 2024 11:23:29 -0800 Subject: [PATCH 6/9] Change `event.name` attribute into top-level event name field (#4320) --- CHANGELOG.md | 2 ++ specification/logs/api.md | 4 ++-- specification/logs/data-model.md | 14 +++++++++++++- specification/logs/sdk.md | 1 + specification/versioning-and-stability.md | 2 +- 5 files changed, 19 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 08ce1ef2a1b..852e94c7700 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -34,6 +34,8 @@ release. 
([#4310](https://github.com/open-telemetry/opentelemetry-specification/pull/4310)) - Remove the in-development isolating log record processor. ([#4301](https://github.com/open-telemetry/opentelemetry-specification/pull/4301)) +- Change `event.name` attribute into top-level event name field. + ([#4320](https://github.com/open-telemetry/opentelemetry-specification/pull/4320)) ### Events diff --git a/specification/logs/api.md b/specification/logs/api.md index aa7638351a6..f04062c1348 100644 --- a/specification/logs/api.md +++ b/specification/logs/api.md @@ -137,6 +137,7 @@ The API MUST accept the following parameters: - [Severity Text](./data-model.md#field-severitytext) (optional) - [Body](./data-model.md#field-body) (optional) - [Attributes](./data-model.md#field-attributes) (optional) +- **Status**: [Development](../document-status.md) - [Event Name](./data-model.md#event-name) (optional) #### Enabled @@ -173,8 +174,7 @@ formatted as an [event](./data-model.md#events). **Parameters:** -* The [`Name`](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/events.md) - of the Event. +* [Event Name](./data-model.md#event-name) (required) * [Timestamp](./data-model.md#field-timestamp) (optional) * [Observed Timestamp](./data-model.md#field-observedtimestamp) (optional). If unspecified the implementation SHOULD set it equal to the current time. diff --git a/specification/logs/data-model.md b/specification/logs/data-model.md index c7e88dd745b..3fed63c0ba4 100644 --- a/specification/logs/data-model.md +++ b/specification/logs/data-model.md @@ -1,6 +1,6 @@ # Logs Data Model -**Status**: [Stable](../document-status.md) +**Status**: [Stable](../document-status.md), except where otherwise specified
Table of Contents @@ -34,6 +34,7 @@ * [Field: `InstrumentationScope`](#field-instrumentationscope) * [Field: `Attributes`](#field-attributes) + [Errors and Exceptions](#errors-and-exceptions) + * [Field: `EventName`](#field-eventname) - [Example Log Records](#example-log-records) - [Example Mappings](#example-mappings) - [References](#references) @@ -208,6 +209,7 @@ Body |The body of the log record. Resource |Describes the source of the log. InstrumentationScope|Describes the scope that emitted the log. Attributes |Additional information about the event. +**Status**: [Development](../document-status.md) - EventName | Name that identifies the class / type of event. Below is the detailed description of each field. @@ -477,6 +479,16 @@ of the record. If included, they MUST follow the OpenTelemetry [semantic conventions for exception-related attributes](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/exceptions/exceptions-logs.md). +### Field: `EventName` + +**Status**: [Development](../document-status.md) + +Type: string. + +Description: Name that identifies the class / type of the [Event](#events). +This name SHOULD uniquely identify the event structure (both attributes and body). +A log record with a non-empty event name is an [Event](#events). + ## Example Log Records For example log records see diff --git a/specification/logs/sdk.md b/specification/logs/sdk.md index 1de732275fd..930d900c3f4 100644 --- a/specification/logs/sdk.md +++ b/specification/logs/sdk.md @@ -232,6 +232,7 @@ the following information added to the [LogRecord](data-model.md#log-and-event-r * [`TraceId`](./data-model.md#field-traceid) * [`SpanId`](./data-model.md#field-spanid) * [`TraceFlags`](./data-model.md#field-traceflags) +* **Status**: [Development](../document-status.md) - [`EventName`](./data-model.md#event-name) The SDK MAY provide an operation that makes a deep clone of a `ReadWriteLogRecord`. The operation can be used by asynchronous processors (e.g. 
[Batching processor](#batching-processor)) diff --git a/specification/versioning-and-stability.md b/specification/versioning-and-stability.md index 22f9f8e96b0..a9a9ce9c9d4 100644 --- a/specification/versioning-and-stability.md +++ b/specification/versioning-and-stability.md @@ -247,7 +247,7 @@ Semantic Conventions defines the set of fields in the OTLP data model: - The attribute keys provided on the LogRecord - The attribute values provided on the LogRecord that are defined in a list of well-known values. - - The event name (the value of the `event.name` attribute) + - The event name Things not listed in the above are not expected to remain stable via semantic convention and are allowed (or expected) to change. A few examples: From 0d046c07e2b24bc2405a3e1d367f93ab102d889d Mon Sep 17 00:00:00 2001 From: Carlos Alberto Cortez Date: Thu, 12 Dec 2024 22:31:31 -0600 Subject: [PATCH 7/9] Release 1.40.0 (#4328) December 2024 Release. Mostly in-development additions, e.g. * https://github.com/open-telemetry/opentelemetry-specification/pull/4183 * https://github.com/open-telemetry/opentelemetry-specification/pull/4295 --- CHANGELOG.md | 52 ++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 14 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 852e94c7700..21208c00da5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,36 @@ release. ### Context +### Traces + +### Metrics + +### Logs + +### Events + +### Baggage + +### Resource + +### Profiles + +### OpenTelemetry Protocol + +### Compatibility + +### SDK Configuration + +### Common + +### Supplementary Guidelines + +### OTEPs + +## v1.40.0 (2024-12-12) + +### Context + - Adds optional `GetAll` method to `Getter` in Propagation API, allowing for the retrieval of multiple values for the same key. [#4295](https://github.com/open-telemetry/opentelemetry-specification/pull/4295) @@ -34,25 +64,13 @@ release. 
([#4310](https://github.com/open-telemetry/opentelemetry-specification/pull/4310)) - Remove the in-development isolating log record processor. ([#4301](https://github.com/open-telemetry/opentelemetry-specification/pull/4301)) -- Change `event.name` attribute into top-level event name field. - ([#4320](https://github.com/open-telemetry/opentelemetry-specification/pull/4320)) ### Events - Deprecate Events API and SDK in favor of having Events support in the Logs API and SDK. ([#4319](https://github.com/open-telemetry/opentelemetry-specification/pull/4319)) - -### Baggage - -### Resource - -### Profiles - -### OpenTelemetry Protocol - -### Compatibility - -### SDK Configuration +- Change `event.name` attribute into top-level event name field. + ([#4320](https://github.com/open-telemetry/opentelemetry-specification/pull/4320)) ### Common @@ -64,6 +82,12 @@ release. - Add core principles for evaluating specification changes. ([#4286](https://github.com/open-telemetry/opentelemetry-specification/pull/4286)) +## OTEPs + +- The [open-telemetry/oteps](https://github.com/open-telemetry/oteps) repository was + merged into the specification repository. 
+ ([#4288](https://github.com/open-telemetry/opentelemetry-specification/pull/4288)) + ## v1.39.0 (2024-11-06) ### Logs From 37ac8ca431213b27a5c3c7ce5b54477bb23996f4 Mon Sep 17 00:00:00 2001 From: Peter Findeisen Date: Wed, 4 Dec 2024 13:58:33 -0800 Subject: [PATCH 8/9] Migrating from the old repository at https://github.com/open-telemetry/oteps/pull/250 --- oteps/0250-Composite_Samplers.md | 297 +++++++++++++++++++++++++++++++ 1 file changed, 297 insertions(+) create mode 100644 oteps/0250-Composite_Samplers.md diff --git a/oteps/0250-Composite_Samplers.md b/oteps/0250-Composite_Samplers.md new file mode 100644 index 00000000000..49debadf3b0 --- /dev/null +++ b/oteps/0250-Composite_Samplers.md @@ -0,0 +1,297 @@ +# Composite Samplers Proposal + +This proposal addresses head-based sampling as described by the [Open Telemetry SDK](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling). +It introduces additional _composite samplers_. +Composite samplers use other samplers (_delegates_ or _children_) to make sampling decisions. +The composite samplers invoke the delegate samplers, but eventually make the final call. + +Some of the new samplers proposed here have been designed to work with Consistent Probability Samplers. For detailed description of this concept see [probability sampling (OTEP 235)](https://github.com/open-telemetry/oteps/blob/main/text/trace/0235-sampling-threshold-in-trace-state.md). +Also see Draft PR 3910 [Probability Samplers based on W3C Trace Context Level 2](https://github.com/open-telemetry/opentelemetry-specification/pull/3910). + +## Motivation + +The need for configuring head sampling has been explicitly or implicitly indicated in several discussions, both within the [Sampling SIG](https://docs.google.com/document/d/1gASMhmxNt9qCa8czEMheGlUW2xpORiYoD7dBD7aNtbQ) and in the wider community. 
+Some of the discussions go back a number of years; see for example + +- issue [173](https://github.com/open-telemetry/opentelemetry-specification/issues/173): Way to ignore healthcheck traces when using automatic tracer across all languages? +- issue [1060](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060): Exclude URLs from Tracing +- issue [1844](https://github.com/open-telemetry/opentelemetry-specification/issues/1844): Composite Sampler + +Unfortunately, some of the valuable ideas flowing at the sampling SIG meetings never got recorded at the time of their inception, but see [Sampling SIG Research Notes](https://github.com/open-telemetry/oteps/pull/213) or the comments under [OTEP 240: A Sampling Configuration proposal](https://github.com/open-telemetry/oteps/pull/240) for some examples. + +## The Goal + +The goal of this proposal is to help create advanced sampling configurations using pre-defined building blocks. Let's consider the following example of sampling requirements. It is believed that many users will have requirements following a similar pattern. The most notable elements here are trace classification based on target URL, some spans requiring special handling, and putting a sanity cap on the total volume of exported spans. + +### Example + +Head-based sampling requirements. + +- for root spans: + - drop all `/healthcheck` requests + - capture all `/checkout` requests + - capture 25% of all other requests +- for non-root spans + - follow the parent sampling decision + - however, capture all calls to service `/foo` (even if the trace will be incomplete) +- in any case, do not exceed 1000 spans/minute + +We present two quite different approaches to composite samplers. The first one uses only the current [sampling API](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling).
+It can be applied to a large variety of samplers, but may not work correctly for Consistent Probability Samplers. It is also not very efficient nor elegant. + +The second approach is applicable exclusively to Consistent Probability Samplers, and is more efficient and less prone to misconfiguration. It requires additional API to be provided by the delegate samplers. + +__Note__: both approaches call for calculating _unions_ of Attribute sets. +Whenever such union is constructed, in case of conflicting attribute keys, the attribute definition from the last set that uses that key takes effect. Similarly, whenever modifications of `Tracestate` are performed in sequence, in case of conflicting keys, the last modification erases the previous values. + +## Approach One + +The following new composite samplers are proposed. + +### AnyOf + +`AnyOf` is a composite sampler which takes a non-empty list of Samplers (delegates) as the argument. The intention is to make `RECORD_AND_SAMPLE` decision if __any of__ the delegates decides to `RECORD_AND_SAMPLE`. + +Upon invocation of its `shouldSample` method, it MUST go through the whole list and invoke `shouldSample` method on each delegate sampler, passing the same arguments as received, and collecting the delegates' sampling Decisions. + +`AnyOf` sampler MUST return a `SamplingResult` with the following elements. + +- If all of the delegate Decisions are `DROP`, the composite sampler MUST return `DROP` Decision as well. +If any of the delegate Decisions is `RECORD_AND_SAMPLE`, the composite sampler MUST return `RECORD_AND_SAMPLE` Decision. +Otherwise, if any of the delegate Decisions is `RECORD_ONLY`, the composite sampler MUST return `RECORD_ONLY` Decision. +- The set of span `Attributes` to be added to the `Span` is the union of the sets of `Attributes` as provided by those delegate samplers which produced a sampling Decision other than `DROP`. 
+- The `Tracestate` to be used with the new `Span` is obtained by cumulatively applying all the potential modifications of the parent `Tracestate` by the delegate samplers. + +Each delegate sampler MUST be given a chance to participate in the sampling decision as described above and MUST see the same _parent_ state. The resulting sampling Decision does not depend on the order of the delegate samplers. + +### Conjunction + +`Conjunction` is a composite sampler which takes two Samplers (delegates) as the arguments. These delegate samplers will be hereby referenced as First and Second. This kind of composition forms conditional chaining of both samplers. + +Upon invocation of its `shouldSample` method, the Conjunction sampler MUST invoke `shouldSample` method on the First sampler, passing the same arguments as received, and examine the received sampling Decision. +Upon receiving `DROP` or `RECORD_ONLY` decision it MUST return the `SamplingResult` (which includes a set of `Attributes` and `Tracestate` in addition to the sampling Decision) from the First sampler, and it MUST NOT proceed with querying the Second sampler. +If the sampling decision from the First sampler is `RECORD_AND_SAMPLE`, the Conjunction sampler MUST invoke `shouldSample` method on the Second sampler, effectively passing the `Tracestate` received from the First sampler as the parent trace state. + +If the sampling Decision from the Second sampler is `RECORD_AND_SAMPLE`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows: + +- The sampling Decision is `RECORD_AND_SAMPLE`. +- The set of span `Attributes` to be added to the `Span` is the union of the sets of `Attributes` as provided by both samplers. +- The `Tracestate` to be used with the new `Span` is as provided by the Second sampler. 
+ +If the sampling Decision from the Second sampler is `RECORD_ONLY`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows: + +- The sampling Decision is `RECORD_ONLY`. +- The set of span `Attributes` to be added to the `Span` is the set of `Attributes` returned by the First sampler. +- The `Tracestate` to be used with the new `Span` is the `Tracestate` provided by the Second sampler. + +If the sampling Decision from the Second sampler is `DROP`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows: + +- The sampling Decision is `DROP`. +- The set of span `Attributes` to be added to the `Span` is empty. +- The `Tracestate` to be used with the new `Span` is the `Tracestate` provided by the Second sampler. + +### RuleBased + +`RuleBased` is a composite sampler which performs `Span` categorization (e.g. when sampling decision depends on `Span` attributes) and sampling. +The Spans can be grouped into separate categories, and each category can use a different Sampler. +Categorization of Spans is aided by `Predicates`. + +#### Predicate + +The Predicates represent logical expressions which can access `Span` `Attributes` (or anything else available when the sampling decision is to be made), and perform tests on the accessible values. +For example, one can test if the target URL for a SERVER span matches a given pattern. +`Predicate` interface allows users to create custom categories based on information that is available at the time of making the sampling decision. + +##### SpanMatches + +This is a routine/function/method for `Predicate`, which returns `true` if a given `Span` matches, i.e. belongs to the category described by the Predicate. + +##### Required Arguments for Predicates + +The arguments represent the values that are made available for `ShouldSample`. + +- `Context` with parent `Span`. +- `TraceId` of the `Span` to be created. +- Name of the `Span` to be created. 
+- Initial set of `Attributes` of the `Span` to be created. +- Collection of links that will be associated with the `Span` to be created. + +#### Required Arguments for RuleBased + +- `SpanKind` +- list of pairs (`Predicate`, `Sampler`) + +For making the sampling decision, if the `Span` kind matches the specified kind, the sampler goes through the list in the provided order and calls `SpanMatches` on `Predicate`s passing the same arguments as received by `ShouldSample`. If a call returns `true`, the corresponding `Sampler` will be called to make the final sampling decision. If the `SpanKind` does not match, or none of the calls to `SpanMatches` yield `true`, the final decision is `DROP`. + +The order of `Predicate`s is essential. If more than one `Predicate` matches a `Span`, only the Sampler associated with the first matching `Predicate` will be used. + +## Summary - Approach One + +### Example - sampling configuration + +Going back to our example of sampling requirements, we can now configure the head sampler to support this particular case, using an informal notation of samplers and their arguments. +First, let's express the requirements for the ROOT spans as follows. + +``` +S1 = RuleBased(ROOT, { + (http.target == /healthcheck) => AlwaysOff, + (http.target == /checkout) => AlwaysOn, + true => TraceIdRatioBased(0.25) + }) +``` + +Note: technically, `ROOT` is not a `SpanKind`, but is a special token matching all Spans with invalid parent context (i.e. the ROOT spans, regardless of their kind). + +In the next step, we can build the sampler to handle non-root spans as well: + +``` +S2 = ParentBased(S1) +``` + +The special case of calling service `/foo` can now be supported by: + +``` +S3 = AnyOf(S2, RuleBased(CLIENT, { (http.url == /foo) => AlwaysOn })) +``` + +Finally, the last step is to put a limit on the stream of exported spans.
One of the available rate limiting samplers that we can use is the Jaeger [RateLimitingSampler](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/jaeger-remote-sampler/src/main/java/io/opentelemetry/sdk/extension/trace/jaeger/sampler/RateLimitingSampler.java): + +``` +S4 = Conjunction(S3, RateLimitingSampler(1000 * 60)) +``` + +### Limitations of composite samplers in Approach One + +Not all samplers can participate as components of composite samplers without undesired or unexpected effects. Some samplers require that they _see_ each `Span` being created, even if the span is going to be dropped. Some samplers update the trace state or maintain internal state, and for their correct behavior it is assumed that their sampling decisions will be honored by the tracer at face value in all cases. Good examples are rate limiting samplers, which have to keep track of the rate of created spans and/or the rate of positive sampling decisions. + +The need to encode and decode the `Tracestate` multiple times affects performance of the composite samplers. This drawback is eliminated in Approach Two. + +## Approach Two + +A principle of operation for Approach Two is that `ShouldSample` is invoked only once, on the root of the tree formed by composite samplers. All the logic provided by the composition of samplers is handled by calculating the threshold values, delegating the calculation downstream as necessary. + +### New API + +To make this approach possible, all Consistent Probability Samplers which participate in the samplers composition need to implement the following API, in addition to the standard Sampler API. We will use the term _Composable Sampler_ to denote Consistent Probability Samplers which provide the new API and conform to the rules described here. +The composite samplers in Approach Two are Composable Samplers as well. + +#### GetSamplingIntent + +This is a routine/function/method for all Composable Samplers.
Its purpose is to query the sampler about the activities it would perform had it been asked to make a sampling decision for a given span, however, without constructing the actual sampling Decision. + +#### Required Arguments for GetSamplingIntent + +The arguments are the same as for [`ShouldSample`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#shouldsample) except for the `TraceId`. + +- `Context` with parent `Span`. +- Name of the `Span` to be created. +- `SpanKind` of the `Span` to be created. +- Initial set of `Attributes` of the `Span` to be created. +- Collection of links that will be associated with the `Span` to be created. + +#### Return value + +The return value is a structure (`SamplingIntent`) with the following elements: + +- The THRESHOLD value represented as a 14-character hexadecimal string, with value of `null` representing non-probabilistic `DROP` decision (implementations MAY use different representation, if it appears more performant or convenient), +- A function (`GetAttributes`) that provides a set of `Attributes` to be added to the `Span` in case of positive final sampling decision, +- A function (`UpdateTraceState`) that given an input `Tracestate` and sampling Decision provides a `Tracestate` to be associated with the `Span`. The samplers SHOULD NOT add or modify the `th` value for the `ot` key within these functions. + +#### Constructing `SamplingResult` + +The process of constructing the final `SamplingResult` in response to a call to `ShouldSample` on the root sampler of the composite samplers tree consists of the following steps. 
+ +- The sampler gets its own `SamplingIntent`, it is a recursive process as described below (unless the sampler is a leaf), +- The sampler compares the received THRESHOLD value with the trace Randomness value to arrive at the final sampling `Decision`, +- In case of a positive sampling decision the sampler calls the received `GetAttributes` function to determine the set of `Attributes` to be added to the `Span`, in most cases it will be a recursive step, +- The sampler calls the received `UpdateTraceState` function passing the parent `Tracestate` and the final sampling `Decision` to get the new `Tracestate` to be associated with the `Span` - again, in most cases this is a recursive step, +- The sampler modifies (or removes) the `th` value for the `ot` key in the `Tracestate` according to the final sampling `Decision` and the THRESHOLD used in the second step above. + +### ConsistentRuleBased + +This composite sampler re-uses the concept of Predicates from Approach One. + +#### Required Arguments for ConsistentRuleBased + +- `SpanKind` +- list of pairs (`Predicate`, `ComposableSampler`) + +For calculating the `SamplingIntent`, if the `Span` kind matches the specified kind, the sampler goes through the list in the provided order and calls `SpanMatches` on `Predicate`s passing the same arguments as received. If a call returns `true`, the result is as returned by `GetSamplingIntent` called on the corresponding `ComposableSampler`. If the `SpanKind` does not match, or none of the calls to `SpanMatches` yield `true`, the result is obtained by calling `GetSamplingIntent` on `ConsistentAlwaysOffSampler`. + +### ConsistentAnyOf + +`ConsistentAnyOf` is a composite sampler which takes a non-empty list of ComposableSamplers (delegates) as the argument. The intention is to make a positive sampling decision if __any of__ the delegates would make a positive decision. 
+ +Upon invocation of its `GetSamplingIntent` function, it MUST go through the whole list and invoke `GetSamplingIntent` function on each delegate sampler, passing the same arguments as received. + +`ConsistentAnyOf` sampler MUST return a `SamplingIntent` which is constructed as follows: + +- If any of the delegates returned a non-`null` threshold value, the resulting threshold is the lexicographical minimum value from the set of those non-`null` values, otherwise `null`. +- The `GetAttributes` function calculates the union of `Attribute` sets as returned by the calls to `GetAttributes` function for each delegate, in the declared order. +- The `UpdateTraceState` function makes a chain of calls to the `UpdateTraceState` functions as returned by the delegates, passing the received `Tracestate` as argument to subsequent calls and returning the last value received. + +Each delegate sampler MUST be given a chance to participate in calculating the `SamplingIntent` as described above and MUST see the same argument values. The order of the delegate samplers does not affect the final sampling `Decision`. + +### ConsistentRateLimiting + +`ConsistentRateLimiting` is a composite sampler that helps control the average rate of sampled spans while allowing another sampler (the delegate) to provide sampling hints. + +#### Required Arguments for ConsistentRateLimiting + +- ComposableSampler (delegate) +- maximum sampling (throughput) target rate + +The sampler SHOULD measure and keep the average rate of incoming spans, and therefore also the desired ratio of the incoming span rate to the target span rate. +Upon invocation of its `GetSamplingIntent` function, the composite sampler MUST get the `SamplingIntent` from the delegate sampler, passing the same arguments as received. + +The returned `SamplingIntent` is constructed as follows.
+
+- If using the obtained threshold value as the final threshold would entail sampling more spans than the declared target rate, the sampler SHOULD increase the threshold to a value that would meet the target rate. Several algorithms can be used for threshold adjustment, no particular behavior is prescribed by the specification though.
+- The `GetAttributes` function returns the union of the set of `Attributes` returned by calling the delegate's `GetAttributes` and own `Attributes`.
+- The `UpdateTraceState` function returns the `Tracestate` as returned by calling `UpdateTraceState` from the delegate's `SamplingIntent`.
+
+TO DO: consider introducing a `ConsistentConjuntion` sampler (similar to `Conjunction` from Approach One) that would generalize the relationship between the delegate and the principal sampler, and remove the explicit delegate from `ConsistentRateLimiting`.
+
+## Summary - Approach Two
+
+### Example - sampling configuration with Approach Two
+
+With the samplers introduced by Approach Two, our example requirements can be coded in a very similar way as with Approach One. However, the work of the samplers configured this way forms a tree of `GetSamplingIntent` invocations rather than `ShouldSample` invocations as in Approach One.
+
+```
+S = ConsistentRateLimiting(
+  ConsistentAnyOf(
+    ConsistentParentBased(
+      ConsistentRuleBased(ROOT, {
+        (http.target == /healthcheck) => ConsistentAlwaysOff,
+        (http.target == /checkout) => ConsistentAlwaysOn,
+        true => ConsistentFixedThreshold(0.25)
+      })
+    ),
+    ConsistentRuleBased(CLIENT, {
+      (http.url == /foo) => ConsistentAlwaysOn
+    })
+  ),
+  1000 * 60
+)
+```
+
+### Limitations of composite samplers in Approach Two
+
+Making sampling decisions with samplers from Approach Two is more efficient than in Approach One, especially if, platform permitting, `null` values can be used for `GetAttributes` and `UpdateTraceState` functions to represent the prevailing trivial cases of _no-new-attributes_ and _no-special-trace-state-keys_. The only limitation of this approach is that it operates exclusively within the domain of Composable Samplers (a subset of Consistent Probability Samplers).
+
+Developers of Composable Samplers should consider that the sampling Decision they declare as their intent might be different from the final sampling Decision.
+
+### Prototyping
+
+A prototype implementation of ComposableSamplers for Java is available, see [ConsistentSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentSampler.java) and its subclasses.
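The combination rule of `ConsistentAnyOf` described earlier can be illustrated with a minimal Python sketch. It is not part of the proposal or of the Java prototype. The names `SamplingIntent` and `any_of_intent`, and the dict-based trace state, are hypothetical. The sketch covers only the threshold, `GetAttributes`, and `UpdateTraceState` elements. It assumes thresholds are 14-character lowercase hexadecimal strings, so the lexicographical minimum is also the numerical minimum, and it assumes that on attribute key collisions a later delegate wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, object]
TraceState = Dict[str, str]  # hypothetical stand-in for a real Tracestate


@dataclass(frozen=True)
class SamplingIntent:
    # 14 lowercase hex characters, or None for a non-probabilistic DROP.
    threshold: Optional[str]
    get_attributes: Callable[[], Attributes]
    # Takes the input trace state and the final sampling decision.
    update_trace_state: Callable[[TraceState, bool], TraceState]


def any_of_intent(intents: List[SamplingIntent]) -> SamplingIntent:
    """Combine delegate intents following the ConsistentAnyOf rules."""
    if not intents:
        raise ValueError("ConsistentAnyOf requires a non-empty delegate list")

    # Lexicographical minimum of the non-null thresholds, else None.
    thresholds = [i.threshold for i in intents if i.threshold is not None]
    threshold = min(thresholds) if thresholds else None

    def get_attributes() -> Attributes:
        merged: Attributes = {}
        for intent in intents:  # declared order; later wins (assumption)
            merged.update(intent.get_attributes())
        return merged

    def update_trace_state(trace_state: TraceState, sampled: bool) -> TraceState:
        for intent in intents:  # chain the calls, feeding each result forward
            trace_state = intent.update_trace_state(trace_state, sampled)
        return trace_state

    return SamplingIntent(threshold, get_attributes, update_trace_state)
```

Because the minimum threshold wins, a span is kept whenever at least one delegate would have kept it, which matches the stated intention of the sampler.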
+
+## Prior art
+
+A number of composite samplers are already available as independent contributions
+([RuleBasedRoutingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java),
+[Stratified Sampling](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/stratified-sampling-example),
+LinksBasedSampler [for Java](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/LinksBasedSampler.java)
+and [for DOTNET](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/links-based-sampler)).
+Also, historically, some Span categorization was introduced by [JaegerRemoteSampler](https://www.jaegertracing.io/docs/1.54/sampling/#remote-sampling).
+
+This proposal aims at generalizing these ideas, and at providing a bit more formal specification for the behavior of the composite samplers.

From 1b1f3bd6d39f5f1e34dded62e341c71693c784d1 Mon Sep 17 00:00:00 2001
From: Peter Findeisen
Date: Fri, 13 Dec 2024 17:04:19 -0800
Subject: [PATCH 9/9] Adding IsAdjustedCountReliable function.
---
 oteps/0250-Composite_Samplers.md | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/oteps/0250-Composite_Samplers.md b/oteps/0250-Composite_Samplers.md
index 49debadf3b0..8653fca9327 100644
--- a/oteps/0250-Composite_Samplers.md
+++ b/oteps/0250-Composite_Samplers.md
@@ -194,18 +194,38 @@ The arguments are the same as for [`ShouldSample`](https://github.com/open-telem
 The return value is a structure (`SamplingIntent`) with the following elements:

 - The THRESHOLD value represented as a 14-character hexadecimal string, with value of `null` representing non-probabilistic `DROP` decision (implementations MAY use different representation, if it appears more performant or convenient),
+- A function (`IsAdjustedCountReliable`) that provides a `boolean` value indicating that the adjusted count (calculated as reciprocal of the sampling probability) can be faithfully used to estimate span metrics,
 - A function (`GetAttributes`) that provides a set of `Attributes` to be added to the `Span` in case of positive final sampling decision,
 - A function (`UpdateTraceState`) that given an input `Tracestate` and sampling Decision provides a `Tracestate` to be associated with the `Span`. The samplers SHOULD NOT add or modify the `th` value for the `ot` key within these functions.

+#### Requirements for the basic samplers
+
+The `ConsistentAlwaysOff` sampler MUST provide a `SamplingIntent` with
+
+- The THRESHOLD value of `null` (or equivalent),
+- `IsAdjustedCountReliable` returning `false`,
+- `GetAttributes` returning an empty set,
+- `UpdateTraceState` returning its argument, without any modifications.
+
+The `ConsistentAlwaysOn` sampler MUST provide a `SamplingIntent` with
+
+- The THRESHOLD value of `00000000000000` (or equivalent),
+- `IsAdjustedCountReliable` returning `true`,
+- `GetAttributes` returning an empty set,
+- `UpdateTraceState` returning its argument, without any modifications.
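The requirements for the two basic samplers listed above can be written down as a short Python sketch. This is an illustration only, not the prototype's code. The type `SamplingIntent`, its field names, the dict-based trace state, and the functions `consistent_always_off` and `consistent_always_on` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Attributes = Dict[str, object]
TraceState = Dict[str, str]  # hypothetical stand-in for a real Tracestate


@dataclass(frozen=True)
class SamplingIntent:
    # 14 hex characters, or None for a non-probabilistic DROP.
    threshold: Optional[str]
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], Attributes]
    # Takes the input trace state and the final sampling decision.
    update_trace_state: Callable[[TraceState, bool], TraceState]


def _unchanged(trace_state: TraceState, sampled: bool) -> TraceState:
    """Return the trace state argument without any modifications."""
    return trace_state


def consistent_always_off() -> SamplingIntent:
    # null threshold, unreliable adjusted count, no attributes.
    return SamplingIntent(None, lambda: False, dict, _unchanged)


def consistent_always_on() -> SamplingIntent:
    # Threshold 00000000000000, reliable adjusted count, no attributes.
    return SamplingIntent("0" * 14, lambda: True, dict, _unchanged)
```

Representing the three functions as plain callables keeps the trivial cases cheap; an implementation may prefer `null` sentinels where the platform permits.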
+
+
 #### Constructing `SamplingResult`

 The process of constructing the final `SamplingResult` in response to a call to `ShouldSample` on the root sampler of the composite samplers tree consists of the following steps.

 - The sampler gets its own `SamplingIntent`, it is a recursive process as described below (unless the sampler is a leaf),
 - The sampler compares the received THRESHOLD value with the trace Randomness value to arrive at the final sampling `Decision`,
-- In case of a positive sampling decision the sampler calls the received `GetAttributes` function to determine the set of `Attributes` to be added to the `Span`, in most cases it will be a recursive step,
 - The sampler calls the received `UpdateTraceState` function passing the parent `Tracestate` and the final sampling `Decision` to get the new `Tracestate` to be associated with the `Span` - again, in most cases this is a recursive step,
-- The sampler modifies (or removes) the `th` value for the `ot` key in the `Tracestate` according to the final sampling `Decision` and the THRESHOLD used in the second step above.
+- In case of positive sampling decision:
+  - the sampler calls the received `GetAttributes` function to determine the set of `Attributes` to be added to the `Span`, in most cases it will be a recursive step,
+  - the sampler calls the received `IsAdjustedCountReliable` function, and in case of `true` it modifies the `th` value for the `ot` key in the `Tracestate` according to the received THRESHOLD; if the returned value is `false`, it removes the `th` value for the `ot` key from the `Tracestate`,
+- In case of negative sampling decision, it removes the `th` value for the `ot` key from the `Tracestate`.
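The revised steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the specified algorithm. The names `SamplingIntent` and `build_sampling_result` are hypothetical, and the trace state is simplified to a dict holding the sub-keys of the `ot` entry. Following OTEP 235, the sketch assumes a span is sampled when its randomness value is greater than or equal to the threshold, with both given as 14-character lowercase hexadecimal strings. Encoding details of the `th` value, such as shortening it, are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Attributes = Dict[str, object]
OtState = Dict[str, str]  # hypothetical: sub-keys of the `ot` tracestate entry


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[str]  # 14 hex characters, or None for DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], Attributes]
    update_trace_state: Callable[[OtState, bool], OtState]


def build_sampling_result(
    intent: SamplingIntent, randomness: str, parent_state: OtState
) -> Tuple[bool, Attributes, OtState]:
    """Turn a SamplingIntent into (decision, attributes, trace state)."""
    # Assumption from OTEP 235: sampled when randomness >= threshold.
    sampled = intent.threshold is not None and randomness >= intent.threshold

    # The trace state is updated for positive and negative decisions alike.
    state = dict(intent.update_trace_state(parent_state, sampled))

    attributes: Attributes = {}
    if sampled:
        attributes = intent.get_attributes()
        if intent.is_adjusted_count_reliable():
            state["th"] = intent.threshold  # record the threshold used
        else:
            state.pop("th", None)  # adjusted count cannot be trusted
    else:
        state.pop("th", None)  # negative decision: always remove `th`
    return sampled, attributes, state
```

Note that `th` survives only when the span is sampled and the intent reports a reliable adjusted count; in every other case it is removed.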
 ### ConsistentRuleBased

@@ -226,7 +246,8 @@ Upon invocation of its `GetSamplingIntent` function, it MUST go through the whol

 `ConsistentAnyOf` sampler MUST return a `SamplingIntent` which is constructed as follows:

-- If any of the delegates returned a non-`null` threshold value, the resulting threshold is the lexicographical minimum value from the set of those non-`null` values, otherwise `null`.
+- If any of the delegates returned a non-`null` threshold value, the resulting threshold is the lexicographical minimum value T from the set of those non-`null` values, otherwise `null`.
+- The `IsAdjustedCountReliable` returns `true`, if any of the delegates returning the threshold value equal to T returns `true` upon calling its `IsAdjustedCountReliable` function, otherwise it returns `false`.
 - The `GetAttributes` function calculates the union of `Attribute` sets as returned by the calls to `GetAttributes` function for each delegate, in the declared order.
 - The `UpdateTraceState` function makes a chain of calls to the `UpdateTraceState` functions as returned by the delegates, passing the received `Tracestate` as argument to subsequent calls and returning the last value received.

@@ -246,8 +267,9 @@ Upon invocation of its `GetSamplingIntent` function, the composite sampler MUST

 The returned `SamplingIntent` is constructed as follows.

-- If using the obtained threshold value as the final threshold would entail sampling more spans than the declared target rate, the sampler SHOULD increase the threshold to a value that would meet the target rate. Several algorithms can be used for threshold adjustment, no particular behavior is prescribed by the specification though.
-- The `GetAttributes` function returns the union of the set of `Attributes` returned by calling the delegate's `GetAttributes` and own `Attributes`.
+- If using the obtained threshold value as the final threshold would entail sampling more spans than the declared target rate, the sampler SHOULD set the threshold to a value that would meet the target rate. Several algorithms can be used for threshold adjustment, no particular behavior is prescribed by the specification though.
+- The `IsAdjustedCountReliable` returns the result of calling this function on the `SamplingIntent` provided by the delegate.
+- The `GetAttributes` function returns the result of calling this function on the `SamplingIntent` provided by the delegate.
 - The `UpdateTraceState` function returns the `Tracestate` as returned by calling `UpdateTraceState` from the delegate's `SamplingIntent`.

 TO DO: consider introducing a `ConsistentConjuntion` sampler (similar to `Conjunction` from Approach One) that would generalize the relationship between the delegate and the principal sampler, and remove the explicit delegate from `ConsistentRateLimiting`.
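Since no particular threshold adjustment algorithm is prescribed for `ConsistentRateLimiting`, the Python sketch below shows only one simple possibility. It assumes the sampler already tracks an average incoming span rate, expressed in the same unit as the target rate. It derives a limiting threshold from the ratio of the two rates and uses whichever of the delegate's threshold and the limiting threshold is larger. The names `SamplingIntent`, `probability_to_threshold`, and `rate_limited_intent` are hypothetical, as is the choice to take the larger threshold; the Java prototype may work differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MAX_THRESHOLD = 1 << 56  # thresholds are 56-bit values (14 hex digits)


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[str]  # 14 lowercase hex characters, or None for DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], Dict[str, object]]
    update_trace_state: Callable[[Dict[str, str], bool], Dict[str, str]]


def probability_to_threshold(probability: float) -> Optional[str]:
    """Map a sampling probability to a 14-character hex threshold."""
    if probability <= 0.0:
        return None  # nothing can be sampled
    rejected = 1.0 - min(probability, 1.0)
    value = min(int(rejected * MAX_THRESHOLD), MAX_THRESHOLD - 1)
    return format(value, "014x")


def rate_limited_intent(
    delegate: SamplingIntent, incoming_rate: float, target_rate: float
) -> SamplingIntent:
    """Cap the delegate's intent so the target rate is not exceeded."""
    if delegate.threshold is None or incoming_rate <= target_rate:
        return delegate  # already dropping, or comfortably under the limit
    limit = probability_to_threshold(target_rate / incoming_rate)
    # Equal-length lowercase hex strings compare like the numbers they encode.
    threshold = None if limit is None else max(delegate.threshold, limit)
    # The three functions are taken from the delegate's intent unchanged.
    return SamplingIntent(
        threshold,
        delegate.is_adjusted_count_reliable,
        delegate.get_attributes,
        delegate.update_trace_state,
    )
```

For example, with 4000 incoming spans per minute and a target of 1000, an always-on delegate ends up with threshold `c0000000000000`, which corresponds to a sampling probability of 25%.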