Clarify boundaries of numeric env vars #4331
@@ -108,6 +109,13 @@ gracefully ignore the setting and use the default value if it is defined.

For example, the value `12000` indicates 12000 milliseconds, i.e., 12 seconds.

### Timeout values

For variables that represent a timeout (e.g. exporter timeout), implementations
The "Duration" section right above the "Timeout values" section already says things such as "an integer representing a number of milliseconds" and "if a negative value is provided, the implementation MUST generate a warning". Are "Duration" and "Timeout values" mutually exclusive?
A timeout is a type of duration. So maybe I should move this content to the duration section, or as a subsection within duration.
> A timeout is a type of duration. So maybe I should move this content to the duration section, or as a subsection within duration.

Yep, that'll bring clarity, thanks!
> A timeout is a type of duration.

Am I correct that the `_DELAY` variables are the only configurations that are of type duration but not of type timeout? Do we want to have zero delay?
> Do we want to have zero delay?

I came to the conclusion that while it's odd, it's still valid-ish. Take the batch span processor's delay interval, which represents the gap in time from the conclusion of one export to the start of the next. Assume that exports over the network take some amount of time, like 2ms. Setting the batch span processor's delay to zero is just saying: "as soon as one export resolves, start the next".
+1 on supporting zero delay; folks with an operating systems background would find it common: https://learn.microsoft.com/windows/win32/api/synchapi/nf-synchapi-sleepex#parameters.
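The zero-delay semantics discussed in this thread can be illustrated with a minimal sketch. It assumes a simplified loop that ignores batching and queue-triggered exports; `run_export_loop`, `export`, and the injectable `sleep` parameter are hypothetical names for illustration, not part of any OpenTelemetry SDK.

```python
import time
from typing import Callable


def run_export_loop(
    export: Callable[[], None],
    schedule_delay_ms: int,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `export` repeatedly, waiting `schedule_delay_ms` after each one finishes.

    The delay is measured from the end of one export to the start of the next,
    so a delay of 0 means "as soon as one export resolves, start the next".
    """
    if schedule_delay_ms < 0:
        raise ValueError("schedule_delay_ms must be non-negative")
    for _ in range(iterations):
        export()  # blocks until the export resolves
        sleep(schedule_delay_ms / 1000.0)  # a 0-second sleep is a valid no-op wait
```

With `schedule_delay_ms=0` the loop is still well defined: the export duration itself (for example, network time) is what paces the loop.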
For variables that represent a timeout (e.g. exporter timeout), implementations
SHOULD validate that values are positive unless they have good reasons not to (
e.g. backwards compatibility with semantics where a negative or zero value means
I think this will lead to inconsistency and confusion. Say the Java SDK decided to treat `-1` as indefinite while the Python SDK decided to fall back to defaults (e.g. 30,000 milliseconds).
I think it is better to ask all SDKs to have a consistent behavior.
> I think it is better to ask all SDKs to have a consistent behavior.

Consistency is best, but it's difficult to make consistency recommendations post hoc. Consider our change in advice to make `http/protobuf` the default OTLP protocol, when it was previously `grpc`. We have portions of the ecosystem which made the switch, either because they went stable after the change in advice or decided the change wasn't impactful enough to be breaking for users, and other portions which stuck with `grpc` as the default (like opentelemetry-java).
If all maintainers agree that changing their semantics is a bugfix, let's get rid of this qualification.
I agree with the `grpc` and `http/protobuf` example. However, I think for timeouts there is a high chance that we can achieve consistency. For example, if 3 language implementation SIGs are saying "-1 means indefinite to us" while other SIGs treat negative values as invalid, it is possible to have all implementations treating `-1` as indefinite.
> For example, if 3 language implementation SIGs are saying "-1 means indefinite to us" while other SIGs treat negative values as invalid, it is possible to have all implementations treating -1 as indefinite.

It's true. It's always acceptable to loosen the restriction.
I think we should define it similarly to the other types, e.g.

> The value is positive - if a negative value is provided, the implementation MUST generate a warning, gracefully ignore the setting and use the default value if it is defined; if a zero value is provided, the implementation SHOULD (not "MUST" because of backwards compatibility when it is used to represent infinity or zero) generate a warning, gracefully ignore the setting and use the default value if it is defined.

> let's say if Java SDK decided to treat -1 as indefinite

This should be seen as a bug given the definition of the duration type:

> The value is non-negative - if a negative value is provided, the implementation MUST generate a warning, gracefully ignore the setting and use the default value if it is defined.

The only issue is when 0 is handled as infinite or zero. See:
To summarize the arguments:
- If negative, generate a warning and use the default. This is already the specified behavior for duration properties, which includes timeout.
- If zero, interpret to mean indefinite. There is prior art to support this interpretation. Also, it's less controversial for an implementation which currently rejects 0 as invalid to begin accepting it than it is for an implementation which currently accepts zero to begin rejecting it. I.e. loosening restrictiveness is always ok, while increasing restrictiveness may be interpreted as a breaking change.
Given these arguments, does anyone feel strongly that we should reject 0 as invalid? Tagging folks who originally supported this idea: @tigrannajaryan, @lmolkova, @cijothomas, @pellared
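The two rules summarized in this comment could be sketched as follows. This is only an illustration of the proposal under discussion, not the behavior of any existing SDK: `parse_timeout_ms` is a hypothetical helper, representing "indefinite" as `None` is an assumption, and the handling of non-integer input (warn and use the default) is likewise an assumption made to keep the sketch complete.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def parse_timeout_ms(env: Mapping[str, str], name: str, default_ms: int) -> Optional[int]:
    """Return the timeout in milliseconds, or None to mean "indefinite".

    Negative -> warn, ignore the setting, use the default.
    Zero     -> indefinite (no timeout).
    Positive -> used as-is.
    """
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default_ms  # unset or empty: use the default
    try:
        value = int(raw.strip())
    except ValueError:
        # Assumption: unparseable input is treated like an invalid value.
        logger.warning("%s=%r is not an integer; using default %d ms", name, raw, default_ms)
        return default_ms
    if value < 0:
        logger.warning("%s=%d is negative; using default %d ms", name, value, default_ms)
        return default_ms
    if value == 0:
        return None  # zero is interpreted as "no timeout"
    return value
```

A caller would pass `os.environ` as `env`, and would treat a `None` result as "wait without a deadline".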
Need a changelog entry.
| OTEL_BSP_MAX_EXPORT_BATCH_SIZE | Maximum batch size | 512 | Must be less than or equal to OTEL_BSP_MAX_QUEUE_SIZE |
| Name | Description | Default | Notes |
|--------------------------------|------------------------------------------------------------------|---------|-----------------------------------------------------------------------------------|
| OTEL_BSP_SCHEDULE_DELAY | Delay interval (in milliseconds) between two consecutive exports | 5000 | Valid values are non-negative. |
Maybe we should add a Type column instead? The type definitions contain more information which is relevant.
I agree. Opened #4336 to address separately.
Resolves #4283.
Adds new guidance indicating that for timeout variables, "implementations SHOULD validate that values are positive". Carves out an exception for implementations which have different semantics and feel changing could not be justified as a bugfix:
The original issue was about the semantics of assigning a timeout of 0 or negative values, but I figured it was useful to do a pass on all numeric env vars to clarify acceptable boundaries.
List of changes below. All properties currently have unspecified boundaries.
- `pollingIntervalMs`: values are positive, since polling with an interval of 0ms is nonsensical.
- `OTEL_BSP_SCHEDULE_DELAY` / `OTEL_BLRP_SCHEDULE_DELAY` / `OTEL_METRIC_EXPORT_INTERVAL`: values are non-negative, since it's valid to immediately start an export 0ms after the prior ended.
- `OTEL_BSP_EXPORT_TIMEOUT` / `OTEL_BLRP_EXPORT_TIMEOUT` / `OTEL_METRIC_EXPORT_TIMEOUT`: values are positive, following new guidance for timeouts.
- `OTEL_BSP_MAX_QUEUE_SIZE` / `OTEL_BLRP_MAX_QUEUE_SIZE`: values are positive, since setting the queue size to zero is nonsensical.
- `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` / `OTEL_BLRP_MAX_EXPORT_BATCH_SIZE`: values are positive, since setting the batch size to zero is nonsensical.
- `OTEL_ATTRIBUTE_*` / `OTEL_SPAN_ATTRIBUTE_*` / `OTEL_LOGRECORD_ATTRIBUTE_*`: values are non-negative, since it's valid to record 0 attributes.
- `OTEL_EXPORTER_ZIPKIN_TIMEOUT` / `OTEL_EXPORTER_OTLP_TIMEOUT` / `OTEL_EXPORTER_OTLP_{SIGNAL}_TIMEOUT`: values are positive, following new guidance for timeouts.

Related PR to `opentelemetry-configuration`: open-telemetry/opentelemetry-configuration#151
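
As a rough illustration of how an implementation might apply the boundaries listed in this description, here is a minimal sketch. It reflects the list as written in the description (timeouts positive), which the review discussion may still change. `LOWER_BOUNDS` and `read_bounded_int` are hypothetical names, only a subset of the variables is included, and falling back to the default on out-of-range or unparseable input is an assumption modeled on the existing duration wording.

```python
import logging
from typing import Mapping

logger = logging.getLogger(__name__)

# Lower bounds transcribed from the description: 1 means "positive",
# 0 means "non-negative". Only a few of the listed variables are shown.
LOWER_BOUNDS = {
    "OTEL_BSP_SCHEDULE_DELAY": 0,
    "OTEL_METRIC_EXPORT_INTERVAL": 0,
    "OTEL_BSP_EXPORT_TIMEOUT": 1,
    "OTEL_BSP_MAX_QUEUE_SIZE": 1,
    "OTEL_BSP_MAX_EXPORT_BATCH_SIZE": 1,
    "OTEL_EXPORTER_OTLP_TIMEOUT": 1,
}


def read_bounded_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer env var, falling back to `default` when it is out of bounds."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default  # unset or empty: use the default
    try:
        value = int(raw.strip())
    except ValueError:
        logger.warning("%s=%r is not an integer; using default %d", name, raw, default)
        return default
    # Assumption: variables missing from the table are treated as non-negative.
    lower = LOWER_BOUNDS.get(name, 0)
    if value < lower:
        logger.warning("%s=%d is below the minimum %d; using default %d", name, value, lower, default)
        return default
    return value
```

For example, `read_bounded_int(os.environ, "OTEL_BSP_SCHEDULE_DELAY", 5000)` accepts `0`, while the same call for `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` with a default of `512` would reject `0` and return the default.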