Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spike: Rewrite using AS notifications #671

Closed

Conversation

arielvalentin
Copy link
Collaborator

This is a spike to see if we can rewrite the ActiveJob instrumentation using AS notifications.

Some things that stand out to me:

Each subscriber requires additional logic to handle context propagation, which makes code reuse a little when clunky creating spans:

This is a spike to see if we can rewrite the ActiveJob instrumentation using AS notifications.

Some things that stand out to me:

Each subscriber requires additional logic to handle context propagation, which makes code reuse a little when clunky creating spans:
- Egress (kind: consumer) spans must extract incoming context to continue the trace context or use links
  - This means keeping track of multiple context tokens and detaching them in reverse order
- Ingress (kind: producer) spans must use the implicit parent span from the current context and must inject the outbound context

- Not all events measure a block of code e.g. https://github.com/rails/rails/blob/a5f4aadcee42f9b4fa99468776c56b5b988c87cf/activejob/lib/active_job/instrumentation.rb#L31
- Exceptions are not documented as being included in `perform` events, which means we may not have access to them to set an error on the Ingress spans
  - Exceptions seem to be caught and recorded as part of other events: https://github.com/rails/rails/blob/a5f4aadcee42f9b4fa99468776c56b5b988c87cf/activejob/lib/active_job/exceptions.rb#L152
  - `retry_stopped`, `discard`, and `enqueue_retry` will provide errors but should these be mapped to internal spans?
  - Would others agree that if the job is stopped or discarded due to an error, we should record the error?
- There are no semantics around retries. I assume that means we do not need to track those?
@arielvalentin
Copy link
Collaborator Author

@chrisholmes @ahayworth @robbkidd @kaylareopelle There is a lot of structural duplication in this spike but I just wanted to see if I could get it working.

This is what the example output looks like:

@arielvalentin ➜ .../opentelemetry-ruby-contrib/instrumentation/active_job/example (spike-as-active-job) $ bundle exec ruby active_job.rb 
[ActiveJob] Enqueued TestJob (Job ID: 4da9685e-48d4-4dcc-9c77-0c1c32c585b6) to Async(default)
[ActiveJob] [TestJob] [4da9685e-48d4-4dcc-9c77-0c1c32c585b6] Performing TestJob (Job ID: 4da9685e-48d4-4dcc-9c77-0c1c32c585b6) from Async(default) enqueued at 2023-09-23T15:08:14Z
#<struct OpenTelemetry::SDK::Trace::SpanData
 name="default publish",
 kind=:producer,
 status=#<OpenTelemetry::Trace::Status:0x00007fcadc2a4ca8 @code=1, @description="">,
 parent_span_id="\x00\x00\x00\x00\x00\x00\x00\x00",
 total_recorded_attributes=6,
 total_recorded_events=0,
 total_recorded_links=0,
 start_timestamp=1695481694531298591,
 end_timestamp=1695481694532534302,
 attributes=
  {"code.namespace"=>"TestJob",
   "messaging.destination_kind"=>"queue",
   "messaging.system"=>"async",
   "messaging.destination"=>"default",
   "messaging.message_id"=>"4da9685e-48d4-4dcc-9c77-0c1c32c585b6",
   "net.transport"=>"inproc"},
 links=nil,
 events=nil,
 resource=
  #<OpenTelemetry::SDK::Resources::Resource:0x00007fcadc5069b0
   @attributes=
    {"service.name"=>"unknown_service",
     "process.pid"=>69961,
     "process.command"=>"active_job.rb",
     "process.runtime.name"=>"ruby",
     "process.runtime.version"=>"3.2.2",
     "process.runtime.description"=>"ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]",
     "telemetry.sdk.name"=>"opentelemetry",
     "telemetry.sdk.language"=>"ruby",
     "telemetry.sdk.version"=>"1.3.0"}>,
 instrumentation_scope=#<struct OpenTelemetry::SDK::InstrumentationScope name="otel-active_job", version="0.0.1">,
 span_id="\xE3\x90\x04\xACw:\r\xA9",
 trace_id="\x9D0\xB0_W\b\xCC\xD0\xF7\xA1s\x95\xF45\x1D|",
 trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x00007fcadc150668 @flags=1>,
 tracestate=#<OpenTelemetry::Trace::Tracestate:0x00007fcadc12dff0 @hash={}>>

--------------------------------------------------
 The computer is doing some work, beep beep boop.
--------------------------------------------------

[ActiveJob] [TestJob] [4da9685e-48d4-4dcc-9c77-0c1c32c585b6] Performed TestJob (Job ID: 4da9685e-48d4-4dcc-9c77-0c1c32c585b6) from Async(default) in 26.26ms
#<struct OpenTelemetry::SDK::Trace::SpanData
 name="default process",
 kind=:consumer,
 status=#<OpenTelemetry::Trace::Status:0x00007fcadc2a4ca8 @code=1, @description="">,
 parent_span_id="\xE3\x90\x04\xACw:\r\xA9",
 total_recorded_attributes=7,
 total_recorded_events=0,
 total_recorded_links=1,
 start_timestamp=1695481694532428068,
 end_timestamp=1695481694558682818,
 attributes=
  {"code.namespace"=>"TestJob",
   "messaging.destination_kind"=>"queue",
   "messaging.system"=>"async",
   "messaging.destination"=>"default",
   "messaging.message_id"=>"4da9685e-48d4-4dcc-9c77-0c1c32c585b6",
   "messaging.active_job.provider_job_id"=>"4f662ecc-bea5-4386-b5ec-5346dc0ee46b",
   "net.transport"=>"inproc"},
 links=
  [#<OpenTelemetry::Trace::Link:0x00007fcadc2a9370
    @attributes={},
    @span_context=
     #<OpenTelemetry::Trace::SpanContext:0x00007fcae0bb87f0
      @remote=true,
      @span_id="\xE3\x90\x04\xACw:\r\xA9",
      @trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x00007fcadc2a9c58 @flags=1>,
      @trace_id="\x9D0\xB0_W\b\xCC\xD0\xF7\xA1s\x95\xF45\x1D|",
      @tracestate=#<OpenTelemetry::Trace::Tracestate:0x00007fcadc12dff0 @hash={}>>>],
 events=nil,
 resource=
  #<OpenTelemetry::SDK::Resources::Resource:0x00007fcadc5069b0
   @attributes=
    {"service.name"=>"unknown_service",
     "process.pid"=>69961,
     "process.command"=>"active_job.rb",
     "process.runtime.name"=>"ruby",
     "process.runtime.version"=>"3.2.2",
     "process.runtime.description"=>"ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]",
     "telemetry.sdk.name"=>"opentelemetry",
     "telemetry.sdk.language"=>"ruby",
     "telemetry.sdk.version"=>"1.3.0"}>,
 instrumentation_scope=#<struct OpenTelemetry::SDK::InstrumentationScope name="otel-active_job", version="0.0.1">,
 span_id="=v\xAC\x02 \xC1T\xEE",
 trace_id="\x9D0\xB0_W\b\xCC\xD0\xF7\xA1s\x95\xF45\x1D|",
 trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x00007fcadc150668 @flags=1>,
 tracestate=#<OpenTelemetry::Trace::Tracestate:0x00007fcadc12dff0 @hash={}>>

@arielvalentin
Copy link
Collaborator Author

Jaeger view:

image

@arielvalentin
Copy link
Collaborator Author

hmm
image

@arielvalentin
Copy link
Collaborator Author

Ok I had made a mistake and did not use root spans now I have what I need.

image

@arielvalentin
Copy link
Collaborator Author

As I suspected the Ingress span does not get errors assigned to it:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant