[WIP] Add MeasurementProcessor specification to Metrics SDK #4318

Blinkuu · 2024-12-03T10:32:27Z

Fixes #4298

Changes

TODO

For non-trivial changes, follow the change proposal process.

Related issues #
Related OTEP(s) #
Links to the prototypes (when adding or changing features)
CHANGELOG.md file updated for non-trivial changes
spec-compliance-matrix.md updated if necessary

TODO Closes open-telemetry#4298.

linux-foundation-easycla · 2024-12-03T10:32:31Z

The committers listed above are authorized under a signed CLA.

✅ login: Blinkuu / name: Lukasz Gut (1ce9e4d, a8f5de3, 60adbd3, 3f3186a, 9d60b12)
✅ login: pellared / name: Robert Pająk (4b0a58d, 449d2fb)

reyang · 2024-12-03T16:39:47Z

specification/metrics/sdk.md

+For a `MeasurementProcessor` registered directly on SDK `MeterProvider`, the `measurement` mutations MUST be visible in next registered processors.
+


Do we allow the processor to "drop" the measurement (e.g. the processor decided that it doesn't want the measurement) or other operations beyond modifications on the value and attributes?

Related question (thus decided to put it here).
Shouldn't the processor also be used when evaluating Enabled?
Shouldn't we also add an OnEnabled hook?

Related comment in other issue:

Specify how Logs SDK implements Enabled #4207 (comment)

To allow processors to "drop" measurements, they must be somehow connected to the MetricReader. I agree that it would be a cool feature to have, providing great flexibility.

The Lightstep Metrics SDK implements a MeasurementProcessor interface which was narrowly scoped to allow modifying the set of attributes for a measurement. In that use-case, we would take the incoming gRPC metadata from the context, look up specific headers, and apply header values as attribute values.
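To make the attribute-only use case concrete, here is a minimal sketch of a processor that copies selected request headers from the context into measurement attributes. It is an illustration only: `Measurement`, `HeaderAttributeProcessor`, `process`, and the header and attribute names are hypothetical and are not the Lightstep SDK's API or anything defined by this PR. The context is modeled as a plain mapping of header names to values.

```python
from dataclasses import dataclass, field
from typing import Mapping


# Hypothetical type for illustration; not the Lightstep SDK's or the spec's API.
@dataclass(frozen=True)
class Measurement:
    value: float
    attributes: Mapping[str, str] = field(default_factory=dict)


class HeaderAttributeProcessor:
    """Copies selected request headers from the context into attributes."""

    def __init__(self, header_to_attribute: Mapping[str, str]):
        # Maps a header name to the attribute key it should populate.
        self._mapping = dict(header_to_attribute)

    def process(self, context: Mapping[str, str], m: Measurement) -> Measurement:
        extra = {
            attr: context[header]
            for header, attr in self._mapping.items()
            if header in context
        }
        # The value passes through untouched; only the attribute set changes.
        return Measurement(m.value, {**m.attributes, **extra})


# Usage: only the configured header is turned into an attribute.
processor = HeaderAttributeProcessor({"x-tenant-id": "tenant.id"})
incoming_metadata = {"x-tenant-id": "acme", "user-agent": "grpc-go"}
enriched = processor.process(incoming_metadata, Measurement(1, {"rpc.method": "Get"}))
```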

I admit I am not sure what reasons a user would have to modify measured values. Are there well-known use-cases? I found @jack-berg mentioned "unit conversion" here, but I am not sure how that would work--the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

That SDK does not permit dropping measurements. Speaking also to @pellared's question about Enabled and whether measurement processors should intercept Enabled calls, I would recommend No. See my position on passing context to the metrics enabled method, #4256 (comment), which states the same. I am nervous about letting measurement processors change measurements and selectively enable/disable call sites because IMO it will make interpreting the resulting data very difficult.

As an example, suppose we have a measurement processor that is designed to redact sensitive attribute values. IMO it would be better to change attributes, not to drop events, because otherwise a user can be easily misled. Suppose we have a counter which counts requests with an attribute for success (boolean) and a client ID (string). We have a policy that says client IDs should not resemble e-mail addresses, otherwise they are invalid. The two options are to redact the client ID (e.g., give it a value like "redacted") or to drop the measurement. If we drop the measurement, all sorts of queries might be impacted. What's my success rate? I have no idea because an unknown number of redacted measurements were dropped.

Therefore, I would propose that measurement processors can only modify attributes, not values, and not drop events.
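The success-rate concern can be shown with a small, self-contained sketch. Everything here is assumed for illustration (the `client.id` and `success` attribute keys, the e-mail pattern, and modeling a "drop" as returning `None`); it is not an API proposed in this PR.

```python
import re
from collections import Counter

# Rough pattern for "resembles an e-mail address"; an assumption for this sketch.
EMAIL_LIKE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def redact(attributes):
    """Replace e-mail-like client IDs instead of dropping the measurement."""
    if EMAIL_LIKE.match(attributes.get("client.id", "")):
        return {**attributes, "client.id": "redacted"}
    return attributes


def drop(attributes):
    """Alternative policy: return None to discard the whole measurement."""
    if EMAIL_LIKE.match(attributes.get("client.id", "")):
        return None
    return attributes


def success_rate(events, policy):
    """Count one request per surviving event and compute successes / total."""
    totals = Counter()
    for attrs in events:
        out = policy(attrs)
        if out is not None:
            totals[out["success"]] += 1
    return totals[True] / (totals[True] + totals[False])


events = [
    {"success": True, "client.id": "svc-a"},
    {"success": True, "client.id": "svc-b"},
    {"success": False, "client.id": "bob@example.com"},
    {"success": False, "client.id": "eve@example.com"},
]

rate_with_redaction = success_rate(events, redact)
rate_with_dropping = success_rate(events, drop)
```

In this constructed data set the failing requests are exactly the ones with invalid client IDs, so dropping them hides every failure while redaction keeps the totals intact.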

> the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

Providing this feature without the ability to do unit conversion or drop measurements would be a miss. We can solve the lack of knowledge about the unit by giving the processor access to instrument metadata. I think it could make sense to allow measurement processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.
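As a rough sketch of what unit conversion could look like once a processor can see instrument metadata: the types and the `to_seconds` function below are hypothetical, and carrying the unit on an `Instrument` object attached to each measurement is an assumption of this example rather than something the PR specifies.

```python
from dataclasses import dataclass, replace


# Hypothetical types for illustration only.
@dataclass(frozen=True)
class Instrument:
    name: str
    unit: str


@dataclass(frozen=True)
class Measurement:
    value: float
    attributes: dict
    instrument: Instrument


# How many of each unit make up one second.
PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def to_seconds(m: Measurement) -> Measurement:
    """Convert a time measurement to seconds using the instrument's unit."""
    per_second = PER_SECOND.get(m.instrument.unit)
    if per_second is None:
        return m  # Unknown unit: pass the measurement through unchanged.
    return replace(
        m,
        value=m.value / per_second,
        # Stands in for "adjust the resulting stream's unit".
        instrument=replace(m.instrument, unit="s"),
    )


converted = to_seconds(Measurement(250, {}, Instrument("request.duration", "ms")))
untouched_input = Measurement(5, {}, Instrument("queue.size", "{item}"))
untouched = to_seconds(untouched_input)
```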

OK, I'll come around on this topic. I see how dropping metric events is a useful feature, despite the potential for difficult consequences. Dropping metric events is not very different from sampling traces at 0%. Just like 0% sampling (which we call "non probabilistic"), there is a loss of information, but that is intentional.

@jack-berg Given your statement, I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API. Tangentially, I see a potential for us to form new APIs (like OpenCensus) which accept a list of measurements atomically and apply a single timestamp (e.g., or process the dynamic context once for multiple events).

Let me pose a thought experiment. What does a MeasurementProcessor do better than you could achieve simply by wrapping a MeterProvider with a new instance containing the desired logic? I'm looking at the complexity trade-off here. I see how the desire to modify units comes about -- especially with the base-2 exponential histogram -- we see a desire to change seconds to/from milliseconds w/o loss of information as a compelling use-case. In the wrapped-MeterProvider scenario, the units-conversion wrapper would ("simply") register a new instrument with the delegate MeterProvider having different units and divide/multiply the value on its way through.
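For comparison, the wrapped-provider alternative might look roughly like the sketch below. To keep it short it wraps a meter rather than a full MeterProvider, and `RecordingMeter` and `RecordingHistogram` are stand-ins for the SDK, not real OpenTelemetry classes. All names are hypothetical.

```python
class RecordingHistogram:
    """Stand-in for an SDK histogram; it only remembers what was recorded."""

    def __init__(self, name, unit):
        self.name, self.unit, self.values = name, unit, []

    def record(self, value, attributes=None):
        self.values.append(value)


class RecordingMeter:
    """Stand-in for the delegate (SDK) meter."""

    def __init__(self):
        self.instruments = {}

    def create_histogram(self, name, unit=""):
        if name not in self.instruments:
            self.instruments[name] = RecordingHistogram(name, unit)
        return self.instruments[name]


class _ScaledHistogram:
    """Divides each value on its way through to the delegate instrument."""

    def __init__(self, delegate, divisor):
        self._delegate, self._divisor = delegate, divisor

    def record(self, value, attributes=None):
        self._delegate.record(value / self._divisor, attributes)


class MillisToSecondsMeter:
    """Wraps a meter; histograms declared in ms are registered downstream in s."""

    def __init__(self, delegate):
        self._delegate = delegate

    def create_histogram(self, name, unit=""):
        if unit == "ms":
            downstream = self._delegate.create_histogram(name, unit="s")
            return _ScaledHistogram(downstream, 1000)
        return self._delegate.create_histogram(name, unit=unit)


# Usage: the call site keeps recording milliseconds.
sdk_meter = RecordingMeter()
meter = MillisToSecondsMeter(sdk_meter)
meter.create_histogram("request.duration", unit="ms").record(1500)
```

The wrapper needs no new SDK concept, but every instrument kind has to be wrapped by hand, which is the complexity trade-off in question.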

I thought of another case that I'm aware of, which calls for modifying the instrument kind, i.e., more than just a change of unit. I'm aware of use-cases for synchronous UpDownCounter instruments where the user would like to separate positive from negative values as two Counters. In this case, the two absolute value instruments convey the rate of ups and downs as separate information. Still, the input-to-output mapping is 1:1.

I prefer to think of MeasurementProcessor as something like syntactic sugar for the example I described above, meaning that it can be defined abstractly as a wrapper of meter providers with a per-instrument event translation rule. There seems to be a potential -- do we know any use-cases? -- for one metric API event to translate into more than one metric API event on the wrapped meter provider. In this sense, we could define MeasurementProcessor as a per-instrument function that maps one input measurement into a list of zero or more output measurements, enabling both dropping and proliferation of events.
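A minimal sketch of that "one input measurement to zero or more output measurements" shape, using the UpDownCounter split as the translation rule. The `Measurement` tuple, the `.up` and `.down` name suffixes, and the choice to emit nothing for a zero delta are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    # Hypothetical shape; the instrument is identified by name only here.
    instrument: str
    value: float
    attributes: dict


def split_up_down(m: Measurement) -> List[Measurement]:
    """Translate one UpDownCounter event into events for two monotonic Counters."""
    if m.value > 0:
        return [m._replace(instrument=m.instrument + ".up")]
    if m.value < 0:
        # Counters only accept non-negative increments, so record the magnitude.
        return [m._replace(instrument=m.instrument + ".down", value=-m.value)]
    return []  # A zero delta carries no information for either counter.


ups = split_up_down(Measurement("queue.size", 3, {}))
downs = split_up_down(Measurement("queue.size", -2, {}))
nothing = split_up_down(Measurement("queue.size", 0, {}))
```

Returning a list covers the 1:1 case, dropping (an empty list), and proliferation (more than one element) with a single signature.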

> I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API.

@jmacd I think this makes sense. Having access to an Instrument inside the processor makes it very powerful.

> I think it could make sense to allow measurement processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.

@jack-berg I'm reading the View specification, which explicitly mentions that views work on the "metric" level. Therefore, configuring processors on the Views (instead of on MeterProvider) would require updating the View specification as well, unless I'm misunderstanding something.

Regarding dropping Measurements, changing instrument kinds, modifying the value, or even creating new Measurements on the fly (e.g., split UpDownCounter into two counters), we could make the proposed Measure() method return an array of Measurements instead of Void.

reyang · 2024-12-03T16:43:56Z

specification/metrics/sdk.md

+
+<!--  TODO: Should we mingle with the Exporter concept here? For metrics, the only thing we care is that Measuremenets are processed before aggregation happens   -->
+
+In particular, if any `MeasurementProcessor` has any associated exporter, it SHOULD try to call the exporter's `Export` with all `Measurements` for which this was not already done and then invoke `ForceFlush` on it. If a timeout is specified (see below), the `MeasurementProcessor` MUST prioritize honoring the timeout over finishing all calls. It MAY skip or abort some or all `Export` or `ForceFlush` calls it has made to achieve this goal.


This is going to be tricky. What about MetricReader and the pull-based exporter (e.g. Prometheus Exporter)?

Agreed, hence my comment. I referenced the Exporter here, but in the context of the Metrics SDK, it would only make sense to connect MeasurementProcessor and MetricReader.

To make it work, the MeasurementProcessor pipeline needs to sit after the calls that record a new measurement (i.e., the Add() and Record() APIs defined on the Instrument) but before the actual aggregation happens, because at that point we lose the reference to Attributes.

So something along the lines of:

Instrument ---Record()/Add()---> MeasurementProcessor(s) ---> Views -> Aggregations -> MetricReader
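A simplified sketch of that ordering, with Views and MetricReader left out: processors run synchronously inside `add()`, each one sees the mutations made by the previous one, and only then does the sum aggregation collapse measurements by attribute set. All names (`SyncCounter`, `SumAggregator`, `add_region`, `drop_debug`) are hypothetical.

```python
from collections import defaultdict


def add_region(value, attributes):
    """First processor: adds an attribute."""
    return value, {**attributes, "region": "eu"}


def drop_debug(value, attributes):
    """Second processor: receives the attributes already mutated by add_region."""
    return value, {k: v for k, v in attributes.items() if k != "debug"}


class SumAggregator:
    def __init__(self):
        self.sums = defaultdict(float)

    def aggregate(self, value, attributes):
        # From here on, the attribute set only survives as an aggregation key.
        self.sums[frozenset(attributes.items())] += value


class SyncCounter:
    def __init__(self, processors, aggregator):
        self._processors, self._aggregator = processors, aggregator

    def add(self, value, attributes=None):
        attrs = dict(attributes or {})
        # Processors run in registration order, before aggregation.
        for process in self._processors:
            value, attrs = process(value, attrs)
        self._aggregator.aggregate(value, attrs)


agg = SumAggregator()
counter = SyncCounter([add_region, drop_debug], agg)
counter.add(1, {"debug": "true"})
counter.add(2)
```

Both calls end up in the same stream because the processors normalized their attribute sets before aggregation.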

However, I'm still trying to figure out if Views and Aggregations could somehow become built-in processors and whether it would even make sense.

jmacd · 2024-12-04T22:57:53Z

specification/metrics/sdk.md

+For a `MeasurementProcessor` registered directly on SDK `MeterProvider`, the `measurement` mutations MUST be visible in next registered processors.
+


The Lightstep Metrics SDK implements a MeasurementProcessor interface which was narrowly scoped to allow modifying the set of attributes for a measurement. In that use-case, we would take the incoming gRPC metadata from the context, look up specific headers, and apply header values as attribute values.

I admit I am not sure what reasons a user would have to modify measured values. Are there well-known use-cases? I found @jack-berg mentioned "unit conversion" here, but I am not sure how that would work--the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

That SDK does not permit dropping measurements. Speaking also to @pellared's question about Enabled and whether measurement processors should intercept Enabled calls, I would recommend No. See my position on passing context to the metrics enabled method, #4256 (comment), which states the same. I am nervous about letting measurement processors change measurements and selectively enable/disable call sites because IMO it will make interpreting the resulting data very difficult.

As an example, suppose we have a measurement processor that is designed to redact sensitive attribute values. IMO it would be better to change attributes, not to drop events, because otherwise a user can be easily misled. Suppose we have a counter which counts requests with an attribute for success (boolean) and a client ID (string). We have a policy that says client IDs should not resemble e-mail addresses, otherwise they are invalid. The two options are to redact the client ID (e.g., give it a value like "redacted") or to drop the measurement. If we drop the measurement, all sorts of queries might be impacted. What's my success rate? I have no idea because an unknown number of redacted measurements were dropped.

Therefore, I would propose that measurement processors can only modify attributes, not values, and not drop events.

jmacd · 2024-12-04T23:01:44Z

specification/metrics/sdk.md

+
+`Shutdown` SHOULD complete or abort within some timeout. `Shutdown` can be implemented as a blocking API or an asynchronous API which notifies the caller via a callback or an event. OpenTelemetry SDK authors can decide if they want to make the shutdown timeout configurable.
+
+#### ForceFlush


I wonder if ForceFlush is required or even meaningful. I looked over our API and SDK specification, and I see no text explaining how an SDK should respond if there are metric events concurrent with flush and shutdown operations. Since I view MeasurementProcessor as a synchronous processor, all the same ambiguity applies. If a user is calling metric events while flushing or shutting down, the measurement processor may or may not have completed.

Since the processor is synchronous, it should have no buffered state to flush; it should just be synchronous.

+1, related to #4318 (comment)

Agree on the simplification, removed these methods from the spec (60adbd3).

github-actions · 2024-12-14T03:20:55Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Add MeasurementProcessor specification to Metrics SDK

3f3186a

TODO Closes open-telemetry#4298.

Update TODO

a8f5de3

Blinkuu commented Dec 3, 2024

pellared reviewed Dec 3, 2024

Update specification/metrics/sdk.md

1ce9e4d

Add status field Co-authored-by: Robert Pająk <pellared@hotmail.com>

reyang reviewed Dec 3, 2024

Update specification/metrics/sdk.md

4b0a58d

Co-authored-by: Reiley Yang <reyang@microsoft.com>

reyang reviewed Dec 3, 2024

reyang reviewed Dec 3, 2024

Update specification/metrics/sdk.md

449d2fb

Co-authored-by: Reiley Yang <reyang@microsoft.com>

jmacd reviewed Dec 4, 2024

Blinkuu added 2 commits December 6, 2024 12:25

Remove Shutdown and ForceFlush from MeasurementProcessor spec

60adbd3

Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec

9d60b12

github-actions bot added the Stale label Dec 14, 2024
