

Semantic convention for span link names #1057

Open
sirzooro opened this issue Nov 14, 2022 · 16 comments
Labels
enhancement New feature or request

Comments

@sirzooro

What are you trying to achieve?
Spans can be created with one or more links to other spans (in addition to the parent span link). When such links are present, Jaeger displays all of them in a small menu next to the trace. Each line in this menu shows an ID (probably the span ID) and the link type (parent/external). This is not very user-friendly. When only one external link is present (a 1-to-1 relation between traces), this is not a big deal. However, when many links are added, users have to open every one of them to find the one they are looking for.

What did you expect to see?
Please create a semantic convention for links. Initially it should contain only one attribute specifying a user-friendly link name, e.g. link.name. Tools like Jaeger could later use this attribute to present more user-friendly names in the link list.

@arminru arminru added the enhancement New feature or request label Nov 14, 2022
@Oberon00
Member

Shouldn't you ask Jaeger to display the name of the linked span instead? We also don't have that name for the "link" to the parent span.

@sirzooro
Author

The parent link is a special case; it does not need an explicit name.

Using the name of the linked span resolves this only partially. Span names should be generic, so a user may end up with 4 links named "Sending message X". An explicit name attribute would allow setting a link name like "Sending message X (retry 2)". Another useful piece of information is a unique ID stored in span attributes, which could be added to the link name to help users distinguish links.

One more problematic case is broken links, where the target span for some reason does not exist in the Jaeger database; in such a case Jaeger would have to revert to the current behavior and show the ID only.
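The fallback order discussed above (explicit name, then the linked span's name, then the bare ID for broken links) can be sketched as follows. This is a minimal sketch, not Jaeger's actual behavior: it assumes the proposed `link.name` attribute, and `SpanLink`, `display_name`, and the `spans_by_id` lookup are hypothetical stand-ins for whatever a backend really stores.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SpanLink:
    """Hypothetical backend-side view of one span link."""
    trace_id: str
    span_id: str
    attributes: Dict[str, str] = field(default_factory=dict)


def display_name(link: SpanLink, spans_by_id: Dict[str, str]) -> str:
    """Pick a label for one entry in the UI's link menu.

    spans_by_id maps span ID -> span name for spans present in the backend.
    """
    # 1. An explicit, user-friendly name set by the instrumentation wins.
    explicit: Optional[str] = link.attributes.get("link.name")
    if explicit:
        return explicit
    # 2. Otherwise fall back to the linked span's (generic) name, if it is stored.
    target_name = spans_by_id.get(link.span_id)
    if target_name is not None:
        return target_name
    # 3. Broken link: the target span is missing, so only its ID is left to show.
    return link.span_id


spans = {"a1": "Sending message X", "b2": "Sending message X"}
links = [
    SpanLink("t1", "a1"),
    SpanLink("t1", "b2", {"link.name": "Sending message X (retry 2)"}),
    SpanLink("t2", "c3"),  # target span was never stored
]
print([display_name(lk, spans) for lk in links])
# -> ['Sending message X', 'Sending message X (retry 2)', 'c3']
```

Because link attributes are recorded on the linking span rather than on the target, an explicit `link.name` in this sketch still works when the target span is missing, which is exactly the broken-link case.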

@pyohannes
Contributor

There is a PR up for refactoring the messaging semantic conventions which advocates adding message-specific attributes on links.

I think there's definitely value in having names (and, as we're at it, also a well-defined type) for links. The use case for broken links (probably due to sampling) is an important one, as there's currently no way to have consistent sampling for linked traces.

However, I think there are a few questions that would need to be answered:

  • Having an API enforced link name (like event names) versus having link name as a semantic convention. Having this enforced by the API and as a part of the data model would allow tools like Jaeger to depend on the presence of a link name, whereas having it prescribed by semantic conventions might deter adoption, as it's not guaranteed to be present.
  • Are there any other uses for link names, besides better visualization? Does it make sense to search and filter based on link names? If so, does it make sense to make recommendations regarding the cardinality of link names?
  • Cost. Is this duplicate information that could also be synthesized by tools like Jaeger from already existing span attributes? Is the additional cost worth the benefit?

@Oberon00
Member

I think it is fine to add attributes describing the semantic meaning of the link. And I understand your use case in the UI is a display name. But I'm not sure if it wouldn't be better to derive that name from semantically meaningful attributes. E.g. we could have a semantic convention for an attribute defining the relationship to the linked span.

@sirzooro
Author

> I think it is fine to add attributes describing the semantic meaning of the link. And I understand your use case in the UI is a display name. But I'm not sure if it wouldn't be better to derive that name from semantically meaningful attributes. E.g. we could have a semantic convention for an attribute defining the relationship to the linked span.

Relation type is not enough. It is also necessary to add some kind of "ID" attribute to distinguish multiple links of the same type.

@lmolkova
Contributor

lmolkova commented Nov 16, 2022

link.name could be useful to distinguish different kinds of links. For example, if I process a batch of messages, I would probably have a span that describes batch processing, and links on it would describe individual messages.
But I might also want to link some other context to the same span. For example, if I process messages like this, I might want to link the receive span to the process span:

messages = queue.receive(10)  # "receive" span
process(messages)

def process(messages):  # "process" span: could have links to all messages and to the receive span
    db.write(messages)

This is an imaginary scenario, but I assume having a link.name as a low-cardinality field would give us a way to distinguish different kinds of links.

It can also carry semantic meaning that helps backends visualize links properly. E.g. they'd know that messaging.message.link describes a message and db.bulk.operation describes an individual DB operation.

If we don't populate link.name, semantic conventions that make use of links would have to define required attributes on links and explain which of them can be used as a name heuristic. That wouldn't be generic.

If redundancy is a problem, link.name can be a recommended attribute: when it is missing, no semantics are applied. And instrumentations may offer a way to opt out.
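To illustrate the low-cardinality use of `link.name` in the batch-processing example, here is a sketch of how a backend might group the links on the process span. It assumes links are plain attribute dictionaries; `messaging.message.link` is the value proposed in this comment, `messaging.receive` is a made-up name for the receive link, and `group_links_by_name` is hypothetical. Links without `link.name` are kept under `None` and get no special treatment, matching the "recommended, not required" idea.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional

# A link is modeled as a plain attribute dict (hypothetical representation).
Link = Dict[str, str]


def group_links_by_name(links: List[Link]) -> Dict[Optional[str], List[Link]]:
    """Group links by their (hypothetical) low-cardinality link.name.

    Links without link.name end up under the None key: no semantics are applied.
    """
    groups: DefaultDict[Optional[str], List[Link]] = defaultdict(list)
    for link in links:
        groups[link.get("link.name")].append(link)
    return dict(groups)


# Links on the "process" span: one per message, plus one to the receive span.
process_links = [
    {"link.name": "messaging.message.link", "messaging.message.id": "m1"},
    {"link.name": "messaging.message.link", "messaging.message.id": "m2"},
    {"link.name": "messaging.receive"},  # made-up name for the receive link
    {},  # from an instrumentation that opted out of link.name
]
for name, members in group_links_by_name(process_links).items():
    print(name, len(members))
# prints: messaging.message.link 2 / messaging.receive 1 / None 1
```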

@Oberon00
Member

Oberon00 commented Nov 17, 2022

@sirzooro

> Relation type is not enough. It is also necessary to add some kind of "ID" attribute to distinguish multiple links of the same type.

I disagree. This is not necessary. The linked span ID is already an ID.

@Oberon00
Member

Oberon00 commented Nov 17, 2022

@lmolkova

> If we don't populate link.name, semantic conventions that make use of links, would have to define required attributes on links and explain which of them can be used as a name heuristic. It'd not be generic.

I think you are thinking from a wrong angle about this.

I absolutely agree that it would be good to be able to distinguish different kinds of links. But where I disagree is that this should be done through a link.name attribute that also has to be a useful display name. I think there should be a link.kind or similar attribute instead.

E.g. you propose "having a link.name as a low-cardinality field", but @sirzooro proposed to have some ID (max cardinality!) in the name.

The concept of "name" is just too complex, everybody wants it to be something different. Remember the discussion about span name? open-telemetry/opentelemetry-specification#557 "Span name: Both low-cardinality (grouping key) and human-readable (display name)"

@sirzooro
Author

> @sirzooro
>
> > Relation type is not enough. It is also necessary to add some kind of "ID" attribute to distinguish multiple links of the same type.
>
> I disagree. This is not necessary. The linked span ID is already an ID.

I mean something other than the span ID. When there are 3 links present, I would like to see "Foo 1", "Foo 18" and "Foo 32" in the menu instead of "Foo", "Foo", "Foo".

@lmolkova
Contributor

lmolkova commented Nov 17, 2022

@Oberon00 agreed, I'd be more interested in 'link.kind' or 'link.type'. Assuming link.kind is messaging, then message.id is a good display name, and duplicating it in link.name would be expensive.

I agree we weren't successful at defining a good approach to display names before, but I still see value in link.name: if you want to see a name on your link, put attribute X on the link, and backends can show it if no kind is defined. Otherwise, they can use the semantics of that kind.
I think the main blocker here is that very few backends support links and it's hard to reason about it.
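The kind-first heuristic suggested here could look roughly like this on the backend side. Everything in it is an assumption: `link.kind` is only proposed in this thread, the per-kind rule table and `link_label` are hypothetical, and the messaging rule reads the messaging conventions' `messaging.message.id` as a stand-in for the `message.id` mentioned above.

```python
from typing import Callable, Dict

Attributes = Dict[str, str]

# Hypothetical per-kind rules: how a backend could derive a display name from the
# attributes a given link kind is expected to carry. Not an existing convention.
NAME_RULES: Dict[str, Callable[[Attributes], str]] = {
    "messaging": lambda attrs: f"message {attrs['messaging.message.id']}",
}


def link_label(attrs: Attributes, linked_span_id: str) -> str:
    """Derive a display label for one link (hypothetical backend heuristic)."""
    rule = NAME_RULES.get(attrs.get("link.kind", ""))
    if rule is not None:
        try:
            # Known kind: derive the name from that kind's own attributes,
            # so nothing has to be duplicated into link.name.
            return rule(attrs)
        except KeyError:
            pass  # kind is set but its expected attribute is missing
    # No usable kind: show the user-supplied link.name, if any.
    if "link.name" in attrs:
        return attrs["link.name"]
    # Last resort: the linked span ID, as backends show today.
    return linked_span_id


print(link_label({"link.kind": "messaging", "messaging.message.id": "42"}, "ab12"))  # message 42
print(link_label({"link.name": "nightly export"}, "ab12"))  # nightly export
print(link_label({}, "ab12"))  # ab12
```

The point of the ordering is the cost argument above: when a kind's semantics already identify the link, `link.name` is never consulted, so it only needs to be set for links that have no such kind.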

@blumamir I wonder what's your view on this based on your experience visualizing links on the backend?

@bwoebi

bwoebi commented Aug 18, 2023

I'm definitely interested in span link names and kinds.

Regarding span link names, most naming can be inferred from the relationship described by the span link kind alone. But sometimes specific information is better visualized in a name than in an additional attribute (after all, span link names are displayed before any attributes). For me, the span link name should be a human-readable name. Low-cardinality grouping keys should generally be covered by the span link kind.

Span link names should also simply be optional, enhancing the visualization when they convey additional meaning.

For span link kinds, there are some concrete scenarios I have identified:

  • When a task is executed by a task runner, the trace of a task runner may spawn different sub-traces, which you link together with a span link.
  • When a task is scheduled by some request to be executed later, you will want to link their traces together.
  • When a request resumes another request, e.g. by an end user resuming a buying process, to understand context, you may want to link them together.
  • When a span follows from another span's execution, or a span consumes the result of another span, due to async fork/join (which can’t really be represented by parent-child), you want to link them together.

These scenarios may sometimes be subdivided, e.g. a span link of kind "scheduled by" may be scheduled by either a message or an actual delayed job (like "run after 5 min").
Similarly, an "executed by" task may have been executed by a web server, a task scheduler, an actual cron job, etc. But the mechanism of execution could rather be a separate tag, which is only available when the link is of kind "executed by".

Summarizing, for me there are four fundamental span link kinds to be distinguished, "executed by", "scheduled by", "resumes", "follows from".
The names may be argued and the list not complete, but that's the starting point I would propose.

Span link kinds should be strongly recommended to be set by users for proper displaying and categorization purposes in visualization.
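The four proposed kinds, plus the idea that the execution mechanism is a separate attribute that only applies to "executed by" links, could be encoded as below. This is only a sketch: the attribute keys `link.kind`, `link.mechanism`, and `link.name`, their string values, and `link_attributes` are all hypothetical, not an existing convention.

```python
from enum import Enum
from typing import Dict, Optional


class LinkKind(Enum):
    """The four kinds proposed above; the string values are illustrative."""
    EXECUTED_BY = "executed_by"
    SCHEDULED_BY = "scheduled_by"
    RESUMES = "resumes"
    FOLLOWS_FROM = "follows_from"


def link_attributes(kind: LinkKind, mechanism: Optional[str] = None,
                    name: Optional[str] = None) -> Dict[str, str]:
    """Build attributes for one link using hypothetical keys."""
    if mechanism is not None and kind is not LinkKind.EXECUTED_BY:
        # The execution mechanism is only defined for "executed by" links.
        raise ValueError("mechanism is only defined for EXECUTED_BY links")
    attrs = {"link.kind": kind.value}  # strongly recommended: drives presentation
    if mechanism is not None:
        attrs["link.mechanism"] = mechanism  # e.g. "web_server", "task_scheduler", "cron"
    if name is not None:
        attrs["link.name"] = name  # optional, human-readable, display only
    return attrs


print(link_attributes(LinkKind.EXECUTED_BY, mechanism="cron"))
print(link_attributes(LinkKind.RESUMES, name="resumed checkout"))
```

Keeping the mechanism out of the kind itself is what keeps the kind list short: a UI only needs one presentation per kind, and the sub-classification is extra detail shown on demand.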

@pyohannes
Contributor

Thanks for reviving this discussion.

> Summarizing, for me there are four fundamental span link kinds to be distinguished, "executed by", "scheduled by", "resumes", "follows from".
> The names may be argued and the list not complete, but that's the starting point I would propose.

With such an approach, I see a danger that this list might become quite long, and possibly ambiguous. Also, when I think about messaging in particular, where a producer and a consumer trace could be linked, it's not clear to me in what category this link falls. It might actually depend on what the producer intends the consumer to do with the message.

@bwoebi

bwoebi commented Aug 28, 2023

> this list might become [...] possibly ambiguous

It may be the case that multiple link kinds match a specific scenario. The goal, however, isn't a strictly disjoint set of link kinds, but something that can reasonably inform the desired presentation for a given link. In fact, as long as the number of distinct presentations is small, the number of link kinds can be small too.

This is also why I ended up with these 4 link kinds, and not more, as anything else I could think of ended up broadly fitting in one of these four categories.

So whether there will be many kinds depends on how strictly one wants to separate the individual kinds. I think we should aim for as few kinds as necessary for proper presentation.

> Also, when I think about messaging in particular, where a producer and a consumer trace could be linked

Practically, unless I'm missing something, producer traces are also produced before consumer traces, i.e. the link itself will be set on the consumer side. Now certainly, a trace ingestor may elect to automatically create a backreference, which will be appropriately marked internally.

If we consider this to be a scenario we support, it would be trivial though to also standardize a tag marking the direction of the link.

> It might actually depend on what the producer intends the consumer to do with the message.

Yes, in general it should be the consumer who specifies the link kind. The producer typically neither has nor needs that information.
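Since only the consumer records the link, a backend that wants to show it from the producer side as well would have to synthesize the reverse edge at ingestion time, marking it so it is not confused with a recorded link. A minimal sketch under that assumption; `StoredLink`, its `direction` field, and `with_backreferences` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredLink:
    source_span: str  # span that carries the link
    target_span: str  # span the link points to
    kind: str         # hypothetical link kind, chosen by the consumer
    direction: str    # hypothetical marker: "outgoing" if recorded, "incoming" if synthesized


def with_backreferences(recorded: List[StoredLink]) -> List[StoredLink]:
    """Return the recorded links plus one ingestor-synthesized reverse link for each.

    The consumer records the link because it knows the producer's context, not the
    other way round; the reverse entries let a UI show the link on the producer side too.
    """
    result = list(recorded)
    for link in recorded:
        result.append(StoredLink(source_span=link.target_span,
                                 target_span=link.source_span,
                                 kind=link.kind,
                                 direction="incoming"))
    return result


consumer_link = StoredLink("consume-span", "produce-span", "follows_from", "outgoing")
for lk in with_backreferences([consumer_link]):
    print(lk)
```

A standardized direction attribute, as suggested above, would play the role of the `direction` field here, so that backends could tell recorded links from synthesized ones.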

@sirzooro
Author

> > Also, when I think about messaging in particular, where a producer and a consumer trace could be linked
>
> Practically, unless I'm missing something, producer traces are also produced before consumer traces, i.e. the link itself will be set on the consumer side. Now certainly, a trace ingestor may elect to automatically create a backreference, which will be appropriately marked internally.

Producer and consumer can be linked using a regular parent/child relation: the producer injects its span context into the message, and the consumer extracts it and uses it as the parent of its span. There is also a dedicated semantic convention for producer/consumer spans. Usually this is enough, unless there is some special reason not to do it.

@bwoebi

bwoebi commented Aug 28, 2023

Only where this applies; there may be reasons not to do it. For example, not everyone wants everything in a single big trace; we've heard complaints about traces not being temporally bounded. In such cases a consumer may decide to start a new trace and link to the old one.
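The two options for the consumer, continuing the producer's trace versus starting a new trace and linking back, can be contrasted in a small sketch. It deliberately avoids the OpenTelemetry SDK; `SpanContext`, `Span`, and `start_consumer_span` are hypothetical stand-ins, and the IDs are random hex strings of W3C Trace Context length (16-byte trace ID, 8-byte span ID).

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpanContext:
    trace_id: str  # 32 hex chars
    span_id: str   # 16 hex chars


@dataclass
class Span:
    name: str
    context: SpanContext
    parent: Optional[SpanContext] = None
    links: List[SpanContext] = field(default_factory=list)


def start_consumer_span(extracted: SpanContext, new_trace: bool) -> Span:
    """Start the consumer span for a message carrying the producer's context.

    new_trace=False: continue the producer's trace (regular parent/child).
    new_trace=True: start a fresh trace and only link back to the producer,
    which keeps each trace temporally bounded.
    """
    span_id = secrets.token_hex(8)
    if not new_trace:
        return Span("process", SpanContext(extracted.trace_id, span_id), parent=extracted)
    return Span("process", SpanContext(secrets.token_hex(16), span_id), links=[extracted])


producer = SpanContext("a" * 32, "b" * 16)
print(start_consumer_span(producer, new_trace=False).parent)  # the producer context
print(start_consumer_span(producer, new_trace=True).links)    # [the producer context]
```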

@lmolkova
Contributor

lmolkova commented Jul 9, 2024

Projects
Status: V1 - Stable Semantics

7 participants