-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spike: Rewrite using AS notifications #671
Closed
arielvalentin
wants to merge
18
commits into
open-telemetry:main
from
arielvalentin:spike-as-active-job
Closed
spike: Rewrite using AS notifications #671
arielvalentin
wants to merge
18
commits into
open-telemetry:main
from
arielvalentin:spike-as-active-job
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a spike to see if we can rewrite the ActiveJob instrumentation using AS notifications. Some things that stand out to me: Each subscriber requires additional logic to handle context propagation, which makes code reuse a little when clunky creating spans: - Egress (kind: consumer) spans must extract incoming context to continue the trace context or use links - This means keeping track of multiple context tokens and detaching them in reverse order - Ingress (kind: producer) spans must use the implicit parent span from the current context and must inject the outbound context - Not all events measure a block of code e.g. https://github.com/rails/rails/blob/a5f4aadcee42f9b4fa99468776c56b5b988c87cf/activejob/lib/active_job/instrumentation.rb#L31 - Exceptions are not documented as being included in `perform` events, which means we may not have access to them to set an error on the Ingress spans - Exceptions seem to be caught and recorded as part of other events: https://github.com/rails/rails/blob/a5f4aadcee42f9b4fa99468776c56b5b988c87cf/activejob/lib/active_job/exceptions.rb#L152 - `retry_stopped`, `discard`, and `enqueue_retry` will provide errors but should these be mapped to internal spans? - Would others agree that if the job is stopped or discarded due to an error, we should record the error? - There are no semantics around retries. I assume that means we do not need to track those?
@chrisholmes @ahayworth @robbkidd @kaylareopelle There is a lot of structural duplication in this spike but I just wanted to see if I could get it working. This is what the example output looks like: @arielvalentin ➜ .../opentelemetry-ruby-contrib/instrumentation/active_job/example (spike-as-active-job) $ bundle exec ruby active_job.rb
[ActiveJob] Enqueued TestJob (Job ID: 4da9685e-48d4-4dcc-9c77-0c1c32c585b6) to Async(default)
[ActiveJob] [TestJob] [4da9685e-48d4-4dcc-9c77-0c1c32c585b6] Performing TestJob (Job ID: 4da9685e-48d4-4dcc-9c77-0c1c32c585b6) from Async(default) enqueued at 2023-09-23T15:08:14Z
#<struct OpenTelemetry::SDK::Trace::SpanData
name="default publish",
kind=:producer,
status=#<OpenTelemetry::Trace::Status:0x00007fcadc2a4ca8 @code=1, @description="">,
parent_span_id="\x00\x00\x00\x00\x00\x00\x00\x00",
total_recorded_attributes=6,
total_recorded_events=0,
total_recorded_links=0,
start_timestamp=1695481694531298591,
end_timestamp=1695481694532534302,
attributes=
{"code.namespace"=>"TestJob",
"messaging.destination_kind"=>"queue",
"messaging.system"=>"async",
"messaging.destination"=>"default",
"messaging.message_id"=>"4da9685e-48d4-4dcc-9c77-0c1c32c585b6",
"net.transport"=>"inproc"},
links=nil,
events=nil,
resource=
#<OpenTelemetry::SDK::Resources::Resource:0x00007fcadc5069b0
@attributes=
{"service.name"=>"unknown_service",
"process.pid"=>69961,
"process.command"=>"active_job.rb",
"process.runtime.name"=>"ruby",
"process.runtime.version"=>"3.2.2",
"process.runtime.description"=>"ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]",
"telemetry.sdk.name"=>"opentelemetry",
"telemetry.sdk.language"=>"ruby",
"telemetry.sdk.version"=>"1.3.0"}>,
instrumentation_scope=#<struct OpenTelemetry::SDK::InstrumentationScope name="otel-active_job", version="0.0.1">,
span_id="\xE3\x90\x04\xACw:\r\xA9",
trace_id="\x9D0\xB0_W\b\xCC\xD0\xF7\xA1s\x95\xF45\x1D|",
trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x00007fcadc150668 @flags=1>,
tracestate=#<OpenTelemetry::Trace::Tracestate:0x00007fcadc12dff0 @hash={}>>
--------------------------------------------------
The computer is doing some work, beep beep boop.
--------------------------------------------------
[ActiveJob] [TestJob] [4da9685e-48d4-4dcc-9c77-0c1c32c585b6] Performed TestJob (Job ID: 4da9685e-48d4-4dcc-9c77-0c1c32c585b6) from Async(default) in 26.26ms
#<struct OpenTelemetry::SDK::Trace::SpanData
name="default process",
kind=:consumer,
status=#<OpenTelemetry::Trace::Status:0x00007fcadc2a4ca8 @code=1, @description="">,
parent_span_id="\xE3\x90\x04\xACw:\r\xA9",
total_recorded_attributes=7,
total_recorded_events=0,
total_recorded_links=1,
start_timestamp=1695481694532428068,
end_timestamp=1695481694558682818,
attributes=
{"code.namespace"=>"TestJob",
"messaging.destination_kind"=>"queue",
"messaging.system"=>"async",
"messaging.destination"=>"default",
"messaging.message_id"=>"4da9685e-48d4-4dcc-9c77-0c1c32c585b6",
"messaging.active_job.provider_job_id"=>"4f662ecc-bea5-4386-b5ec-5346dc0ee46b",
"net.transport"=>"inproc"},
links=
[#<OpenTelemetry::Trace::Link:0x00007fcadc2a9370
@attributes={},
@span_context=
#<OpenTelemetry::Trace::SpanContext:0x00007fcae0bb87f0
@remote=true,
@span_id="\xE3\x90\x04\xACw:\r\xA9",
@trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x00007fcadc2a9c58 @flags=1>,
@trace_id="\x9D0\xB0_W\b\xCC\xD0\xF7\xA1s\x95\xF45\x1D|",
@tracestate=#<OpenTelemetry::Trace::Tracestate:0x00007fcadc12dff0 @hash={}>>>],
events=nil,
resource=
#<OpenTelemetry::SDK::Resources::Resource:0x00007fcadc5069b0
@attributes=
{"service.name"=>"unknown_service",
"process.pid"=>69961,
"process.command"=>"active_job.rb",
"process.runtime.name"=>"ruby",
"process.runtime.version"=>"3.2.2",
"process.runtime.description"=>"ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]",
"telemetry.sdk.name"=>"opentelemetry",
"telemetry.sdk.language"=>"ruby",
"telemetry.sdk.version"=>"1.3.0"}>,
instrumentation_scope=#<struct OpenTelemetry::SDK::InstrumentationScope name="otel-active_job", version="0.0.1">,
span_id="=v\xAC\x02 \xC1T\xEE",
trace_id="\x9D0\xB0_W\b\xCC\xD0\xF7\xA1s\x95\xF45\x1D|",
trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x00007fcadc150668 @flags=1>,
tracestate=#<OpenTelemetry::Trace::Tracestate:0x00007fcadc12dff0 @hash={}>> |
arielvalentin
force-pushed
the
spike-as-active-job
branch
from
September 24, 2023 14:20
dc41df5
to
25eed1b
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a spike to see if we can rewrite the ActiveJob instrumentation using AS notifications.
Some things that stand out to me:
Each subscriber requires additional logic to handle context propagation, which makes code reuse a little when clunky creating spans:
Egress (kind: consumer) spans must extract incoming context to continue the trace context or use links
Ingress (kind: producer) spans must use the implicit parent span from the current context and must inject the outbound context
Not all events measure a block of code e.g. https://github.com/rails/rails/blob/a5f4aadcee42f9b4fa99468776c56b5b988c87cf/activejob/lib/active_job/instrumentation.rb#L31
Exceptions are not documented as being included in
perform
events, which means we may not have access to them to set an error on the Ingress spansretry_stopped
,discard
, andenqueue_retry
will provide errors but should these be mapped to internal spans?There are no semantics around retries. I assume that means we do not need to track those?