Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] AWS instrumentation not adding request specific info on requests *not* sampled #2345

Open
cfbao opened this issue Nov 26, 2024 · 3 comments
Labels
bug Something isn't working comp:instrumentation.aws Things related to OpenTelemetry.Instrumentation.AWS

Comments

@cfbao
Copy link

cfbao commented Nov 26, 2024

Component

OpenTelemetry.Instrumentation.AWS

Package Version

Package Name Version
OpenTelemetry.Extensions.Hosting 1.10.0
OpenTelemetry.Instrumentation.AWS 1.10.0-beta.1

Runtime Version

net9.0

Description

Context propagation should happen on all traces, even ones that are not sampled, so downstream services know to not sample their portion of the workflow too.

This isn't happening with OpenTelemetry.Instrumentation.AWS. E.g. SQS messages from an operation won't get the traceparent message attribute if that operation isn't sampled.

Steps to Reproduce

using Amazon.SQS;
using Amazon.SQS.Model;
using OpenTelemetry.Trace;

var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddOpenTelemetry().WithTracing(builder => builder
	.AddAWSInstrumentation(o => o.SuppressDownstreamInstrumentation = true)
	.SetSampler(new TraceIdRatioBasedSampler(0.5))
	.AddSource("SqsOtel"));
builder.Services
	.AddSingleton<IAmazonSQS, AmazonSQSClient>()
	.AddHostedService<Worker>()
var host = builder.Build();
host.Run();

class Worker(IAmazonSQS sqs) : BackgroundService
{
	private const string QueueUrl = "<my-queue-url>";

	protected override async Task ExecuteAsync(CancellationToken stoppingToken)
	{
		while (!stoppingToken.IsCancellationRequested)
		{
			var sendResponse = await sqs.SendMessageAsync(new()
			{
				QueueUrl = QueueUrl,
				MessageBody = "Hello from SQS!",
			}, stoppingToken);

			Console.WriteLine($"""
				sent:
					{sendResponse.MessageId}
				""");

			var receiveResponse = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
			{
				QueueUrl = QueueUrl,
				MaxNumberOfMessages = 10,
				WaitTimeSeconds = 20,
				MessageAttributeNames = ["All"],
			});

			foreach (var msg in receiveResponse.Messages)
			{
				Console.WriteLine($"""
					received:
						{msg.MessageId}
						{msg.MessageAttributes.Count}
						{msg.MessageAttributes.GetValueOrDefault("traceparent")?.StringValue}
					""");

				await sqs.DeleteMessageAsync(new()
				{
					QueueUrl = QueueUrl,
					ReceiptHandle = msg.ReceiptHandle,
				}, stoppingToken);
			}

			await Task.Delay(1000, stoppingToken);
		}
	}
}

Expected Result

We see a "traceparent" attribute in the message for all messages regardless of sampling decision, with console output like this:

sent:
        07252ddd-82c8-494a-8c62-3f139759e53f
received:
        07252ddd-82c8-494a-8c62-3f139759e53f
        1
        00-e4d6f996fd72ce140f1477be875ff351-a9a387e9cf8d2ef2-00
sent:
        5b0cf5bb-9224-46c3-b637-8586703838b9
received:
        5b0cf5bb-9224-46c3-b637-8586703838b9
        1
        00-cb7fefed64c41d386b7ea4d0062ef93a-0eed7d24a469dab6-01

Notice the "00" suffix in the first traceparent value, indicating that it's not sampled

Actual Result

We don't see message attributes on requests that are not sampled. Console output is like this:

sent:
        cfd3b57a-06d4-4bd8-8039-2ec4e8acab33
received:
        cfd3b57a-06d4-4bd8-8039-2ec4e8acab33
        1
        00-12a0dfe5dbe6bc8f7a24432c614dec1b-d8a549c0140fa810-01
sent:
        9af046d7-93a7-42aa-a6d2-aecb823240b0
received:
        9af046d7-93a7-42aa-a6d2-aecb823240b0
        0

sent:
        9cbf8764-3e57-466e-aa49-8afd78d33410
received:
        9cbf8764-3e57-466e-aa49-8afd78d33410
        0

sent:
        cc638f17-d8b8-4140-b506-e8d04f46bd9f
received:
        cc638f17-d8b8-4140-b506-e8d04f46bd9f
        1
        00-355f6b12c9cba9177a1dcccb7939a5bc-9746e7ca070cffb4-01

Additional Context

.NET doesn't really make it easy to propagate context on traces that are not sampled.
(HttpClient has its own complex logic to deal with this)

There is discussion on this general problem here:
open-telemetry/opentelemetry-dotnet#2887

I've added my thoughts on the best practice pattern here, with reference to a sample app in opentelemetry-dotnet:
https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/examples/MicroserviceExample

@cfbao cfbao added the bug Something isn't working label Nov 26, 2024
@github-actions github-actions bot added the comp:instrumentation.aws Things related to OpenTelemetry.Instrumentation.AWS label Nov 26, 2024
Copy link
Contributor

Tagging component owner(s).

@srprash @ppittle @muhammad-othman @rypdal @Oberon00

@ppittle
Copy link
Member

ppittle commented Dec 4, 2024

@cfbao - Looks like you've already done a deep dive into this issue.

Would you be interested in submitting a fix?

@cfbao
Copy link
Author

cfbao commented Dec 6, 2024

@ppittle Yes, I'm up for contributing a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working comp:instrumentation.aws Things related to OpenTelemetry.Instrumentation.AWS
Projects
None yet
Development

No branches or pull requests

2 participants