-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update enduser domain and add enduser.authentication.id
#1456
base: main
Are you sure you want to change the base?
Conversation
0a8b0e8
to
ee0970f
Compare
model/enduser/registry.yaml
Outdated
brief: > | ||
Describes information about the end user, which can be used as a subdomain of browser, client, or user domains. | ||
attributes: | ||
- id: enduser.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems confusing to set enduser.id = "QdH5CAWJgqVT4rOr0qtumf"
and enduser.authentication.id = "lmolkova"
, based on the
https://github.com/open-telemetry/semantic-conventions/pull/1146/files#r1712997369 and https://github.com/open-telemetry/semantic-conventions/pull/1146/files#r1710187141
It'd be more clear if we called this one enduser.tracking.id
or enduser.anonymous.id
so that people would not put PII there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per earlier discussion, it seemed that anonymous
was confusing to some.
i'm good with enduser.anonymous_id
or enduser.tracking_id
. neither tracking
nor anonymous
a namespace, nesting here doesn't seem to follow the naming convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean that enduser.tracking.id
would not follow the naming convention? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like enduser.tracking.id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some examples
semantic-conventions/schemas/1.19.0
Line 32 in d5d2b9d
messaging.consumer_id: messaging.consumer.id |
nesting is used for .
if not, use _
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lmolkova @trisch-me Please review and fill in anything that i may have missed.:
enduser.pseudo.id
- pros: it's clear to know this id is not authenticated
- cons: it might lead to misinterpretation of this id, like it's not a real id, a testing id, a temporary id?
enduser.tracking.id' - pros: it's clear that this id is used to track a particular user. - cons:
tracking` may raise privacy concerns, as it implies monitoring user behavior, which could lead to user distrust. it also lacks context of what exactly is being tracked (e.g. user actions, sessions, locations, etc)
enduser.unauth.id
- pros: unauth
is short
- cons: unauth
is ambiguous, as it can be unauthenticated
or unauthorized
. additionally, acronym is not a good naming practice and leads to more confusion.
enduser.temp.id
or enduser.transient.id
- pros: it suggests that this id is temporary and associated with user who has not been authenticated.
- cons: it lacks context about the id is temporary for what context (e.g. session, authentication)
enduser.unauthenticated.id
- pro: it's clear to indicate an authenticated user.
- cons: it collides with enduser.authentication.id
, which can be renamed to enduser.authenticated.id
, then it would have been fine?
enduser.anonymous.id
- pros: it's clear that this id is anonymous.
- cons: it can be confusing and lacks context. as long we have a clear documentation, this should be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like these three:
enduser.pseudo.id
enduser.transient.id
enduser.ephemeral.id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attended Client SIG this morning,
they also preferred enduser.pseudo.id
after going through this list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what were the cons for enduser.unauthenticated.id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have already discussed and agreed to have enduser.authentication.id
. authentication
as a sub-namespace under enduser. anything that is not under authentication
is either unauthenticated or unauthorized or anonymous or some random id.
enduser.authentication.id
…s/semantic-conventions into heya/add-enduser-namespace
model/enduser/registry.yaml
Outdated
Identifier of an anonymous end user who interacts with a system. | ||
This identifier may be unique only through best-effort means and does not imply that the user is authenticated to the system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meeting notes from semantic convention SIG meeting today:
- is this something that OpenTelemetry instrumentation is planning populate, or is this only something that vendor-specific instrumentation is planning to populate?
- if it's something that OpenTelemetry instrumentation is planning to populate, what would the implementation look like, e.g. would this be stored in a persistent cookie? would it be stamped onto a specific event?
- there is a desire not to add attributes into the semconv repo when only a single vendor has expressed interest in them
- there is also a desire not to add attributes into the semconv repo without having any span/event/metric definitions that use them
the recommendation for next steps was to discuss this in the Client SIG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heyams were those questions addressed during client sig meeting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we kind of covered it. cc @MSNev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you post the answers here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something that OpenTelemetry instrumentation is planning populate
- Yes, this will be added by instrumentations either directly or indirectly (via a psuedo user manager -- like the Session Manager)
if it's something that OpenTelemetry instrumentation is planning to populate, what would the implementation look like, e.g. would this be stored in a persistent cookie? would it be stamped onto a specific event?
- It will need something like the SessionManager implementation to "manager" the lifecycle of this value, some environments would be just to create a simple random value for App start (like Android) while in a browser which is stateless then the "user manager" would most likely use cookies (but it could also use session storage)
there is a desire not to add attributes into the semconv repo when only a single vendor has expressed interest in them
- This will not be single vender specific, but it is highly RUM specific
there is also a desire not to add attributes into the semconv repo without having any span/event/metric definitions that use them
- It should be available both Spans and Logs, while it could be available for Metrics it's cardinality (because its a random value) should not necessarily be used for metrics...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! a couple of followup questions:
- if/when entity (or mutable resource) attributes are available, do you see
enduser.pseudo.id
as one of those? - are there any specific span or event semantic conventions that we can add
enduser.pseudo.id
to? or isenduser.pseudo.id
hopefully just an entity (or mutable resource) attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we discussed these in the client SIG. I will wait for @MSNev's response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If/when mutable resources attributes (whether they are entities or shorter lived lifespan resources ) then this (and session) could possibly by represented there.
are there any specific span or event semantic conventions that we can add enduser.pseudo.id to?
No, not specifically as this really is just additional context details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe having a document covering enduser
convention as a whole was a critical feedback point.
We're trying to avoid adding attributes without also adding a signal that leverages them.
I'm surprised we don't have any signals to populate these on. Wouldn't we add them on session events?
https://github.com/open-telemetry/semantic-conventions/blob/f7362c7066856ff8591ac461d4e3b31ad7af3a4b/docs/general/session.md
If we're going to just stamp them on all telemetry items, let's document it as an attribute group - this would be the place to describe any additional guidance. This doc has to be updated anyway.
I believe it should cover:
- who populates those attributes - would an HTTP instrumentation do it? Distro? End user apps?
- when they are populated? Whenever they are known? Is it opt-in? etc
- how to populate them - where would I get this information from? There are some mappings in existing docs - are they still relevant?
- is user info propagated to other services (currently docs say it should be using baggage)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed here
it's ok, it's not a required check, we're working on resolving the flakiness |
- ref: enduser.authentication.id | ||
requirement_level: required | ||
note: > | ||
The `enduser.authentication.id` attribute is intended to provide an unique identifier of an authenticated enduser. | ||
The deprecated attributes `enduser.authentication.role` and `enduser.authentication.scope` are removed from the enduser registry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that enduser.id
would be the normal (authenticated) user id, do I remember that correctly? thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1456 (comment)
we decided to use enduser.authentication.id
for the authenticated user id since the beginning of this discussion.
renaming enduser.id
to something specific (like enduser.pseudo.id
) so that user don't put authenticated user id under this attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you summarize the reason for using enduser.authentication.id
instead of enduser.id
?
does this mean we will prohibit future "embedding" of user.id
into the enduser.*
namespace? or will we have 3 "id" attributes including enduser.id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we decided to have a sub-namespace called authentication so that we can add other attributes to do with authentication in the future. enduser.id
is renamed to enduser.pseudo.id
for unauthenticated/unauthorized/anonymous or any other kind of tracking id as long as it's not authenticated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we decided to have a sub-namespace called authentication so that we can add other attributes to do with authentication in the future.
I don't see any discussion around it or some motivation to introduce enduser.authentication.id
. Is there a link? Where did it come from?
If we need to add authentication
namespace with some other properties in the future, we can always do it.
The enduser.id
seems general and concise. I'd prefer to use it unless there are strong reasons not to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was the 1st thing i did on this long-discussed thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1146 was never approved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this PR is based on that PR.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by | ||
default and then provide a configuration parameter to turn on retention for use cases where the | ||
information is required and would not violate any policies or regulations. | ||
`enduser.pseudo.id` attribute can be set by a specific client component, e.g. through a cookie out of the Span's HTTP request headers. Client side application should be able to stamp this attribute on any telemetry item emitted by the application whenever this cookie is available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lmolkova please let me know your thoughts on this statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if my suggestion is factually correct - please adjust it, but I'd phrase it differently - something along the following lines:
Enduser attributes capture end user identity. They are likely to contain PII and should be populated, processed, and stored with caution.
Information about the end user is usually available on the client side (in a mobile or browser application). Enduser attributes are populated by the user application in coordination with OpenTelemetry SDK. Some OpenTelemetry distributions auto-collect this information from HTTP cookies.
When user information is available, it's RECOMMENDED to add it to all spans and events emitted in the scope of operation initiated by this user.
Application in coordination with OpenTelemetry SDK and Distro MAY propagate user information from the client application to the front end and across different backend services using custom HTTP cookies and/or Baggage.
Enduser information is collected and populated manually by user application or specialized components,
other instrumentations such as HTTP or RPC are not expected to populate these attributes by default.
<!-- TODO: add link to the implementation in otel-js-contirb -->
I would also put it above the table.
I think it should also replace the content of lines 413-415 above
It is expected this information would be propagated unchanged from node-to-node within the system
using the Baggage mechanism. These attributes should not be used to record system-to-system
authentication attributes.
requirement_level: required | ||
note: > | ||
The `enduser.authentication.id` attribute is intended to provide an unique identifier of an authenticated enduser. | ||
The deprecated attributes `enduser.authentication.role` and `enduser.authentication.scope` are removed from the enduser registry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecated attributes `enduser.authentication.role` and `enduser.authentication.scope` are removed from the enduser registry. |
I don't think that's accurate and also there is no need to mention changes in yaml that have no effect on markdown.
@@ -2,23 +2,27 @@ groups: | |||
- id: registry.enduser.deprecated | |||
type: attribute_group | |||
display_name: Deprecated End User Attributes | |||
brief: Describes deprecated enduser attributes. Complete enduser namespace has been deprecated | |||
brief: "Describes deprecated end user attributes." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brief: "Describes deprecated end user attributes." | |
brief: "Describes deprecated enduser attributes." |
attributes: | ||
- id: enduser.id | ||
type: string | ||
brief: 'Deprecated, use `enduser.pseudo.id` instead.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe enuser.id
should stay as it is. Even if we call it enduser.authentication.id
, the old enduser.id
matches authenticated one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main reason we came up with enduser.pseudo.id
or a list of other naming options because we don't want customers to put authenticated user id to enduser.id
. authenticated one is using enduser.authentication.id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ask is to keep enduser.id
instead of adding enduser.authentication.id
. This PR would add enduser.pseudo.id
and un-deprecate enduser.id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that wasn't what we discussed though. what about the sub-namespace authentication? i thought that is what we want to do so that any other attributes associated with authentication
can be added later.
if you want to change it now, that will be different. we would need to re-discuss it. i thought the consensus is to have authentication sub namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1104 (comment) it was from you @lmolkova
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's been under the assumption that we use user.id
for pseudo/tracking/anonymous id - as you can see we've reconsidered it after that.
if you want to change it now, that will be different. we would need to re-discuss it.
It seems at least me and @trask are on the same page to use enduser.id
instead of enduser.authentication.id
. I believe I saw some support from @trisch-me and @jsuereth on it too.
Please make the change and if anyone has objections they will be able to comment and provide reasoning.
- id: enduser.authentication.id | ||
type: string | ||
brief: "Unique identifier of an authenticated user in the system." | ||
examples: [ 'S-1-5-21-202424912787-2692429404-2351956786-1000' ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to be an example of windows OS user (SID). Given that this attribute is intended to represent human identity, we should use a realistic example (login? email? guid?)
Also, we should add a note that it's likely a PII. I have a proposal on how to capture it (until we have a formal way in #1707)
note: | | |
... | |
> [!WARNING] | |
> | |
> This attribute contains sensitive (PII) information. | |
``` |
attributes: | ||
- id: enduser.pseudo.id | ||
type: string | ||
stability: experimental |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use development now, let's change it here and in other non-deprecated attributes
stability: experimental | |
stability: development |
type: attribute_group | ||
display_name: End User Attributes | ||
brief: > | ||
Describes information about the end user, which can be used as a subdomain of browser, client, or user domains. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describes information about the end user, which can be used as a subdomain of browser, client, or user domains. | |
Describes the end user. |
I don't think the other part is correct:
which can be used as a subdomain of browser, client, or user domains.
brief: "Deprecated, use `user.id` instead." | ||
examples: 'username' | ||
deprecated: "Replaced by `enduser.pseudo.id`." | ||
examples: ['QdH5CAWJgqVT4rOr0qtumf'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please bring the old example back
requirement_level: recommended | ||
- ref: enduser.pseudo.id | ||
requirement_level: recommended | ||
- ref: enduser.role |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove it from this group - it's listed as deprecated attribute and is not necessary here
- ref: enduser.role | ||
deprecated: "Removed." | ||
requirement_level: recommended | ||
- ref: enduser.scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, please remove the reference
- id: identity | ||
type: attribute_group | ||
brief: > | ||
These attributes may be used for any operation with an authenticated and/or authorized enduser. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These attributes may be used for any operation with an authenticated and/or authorized enduser. | |
Describes end user identity. |
we support unauthenticated users too, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, enduse.pseudo.id
can be unauthenticated, unauthorized, anonymous, and etc.. as long as it's not authenticated, which is tracked under enduser.authentication.id
.
deprecated: "Removed." | ||
requirement_level: recommended | ||
- ref: enduser.authentication.id | ||
requirement_level: required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be required since it's likely to contain PII
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove it
@@ -410,34 +414,7 @@ system. It is expected this information would be propagated unchanged from node- | |||
using the Baggage mechanism. These attributes should not be used to record system-to-system | |||
authentication attributes. | |||
|
|||
Examples of where the `enduser.id` value is extracted from: | |||
|
|||
| Authentication protocol | Field or description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep this table - it actually explains what the enduser.id
is - we don't provide an explanation like this anymore. Is this information still accurate and can it be used to capture enduser.id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enduser.id
has been replaced with 'enduser.pseudo.id`. i don't think it makes sense to keep it.
For people who are flip-flopping on what we discussed about sub namespace #1104 (comment) or watch the SIG meeting recordings whenever i was present :) |
hi @heyams, I don't see any arguments in those links for specifically preferring in order to help this PR move forward, can you write out the reasons you have for specifically preferring |
it was decided in the SIG meeting, not me. recording is available though. they preferred to have |
@heyams I still don't see any reasons in your last comment for specifically preferring |
Fixes #1104