-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for ActionController::Live #772
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution.
I don't quite understand why setting the last context in the stack as the "root context" fixes this problem. What happens when the parent thread/fiber closes the context?
Doesn't clearing out the fiber local span context stack cause issues for the current thread or fiber?
Would using context propagation be safer here?
Oh boy I did just look at the link you shared and Rails is copying fiber local variables to the child thread 😢 @open-telemetry/ruby-maintainers given that's Rails is potentially causing problems for the SDK here what options do we have? I am not keen on breaking encapsulation of the storage mechanism used by the SDK. My instincts here are to use context propagation anytime a thread is created but I think in this case the code is trying to enrich the rack span, which is bound to a different thread. Enriching a span bound to a different thread also makes me a bit uneasy because it may be finished at anytime and the attributes won't ever be amended to it if they were. I'm also worried about the context being closed and detached in the parent thread; but then again the fact that the Rails framework is copying the fiber local stack means this problem already exists for alive controller requests. |
Thanks for taking a look @arielvalentin!
Do you mean using something like a tracepoint to propagate the context whenever a thread is created, rather than trying to just fix for this one specific case? Is there a good example of how to instrument threaded code through context propagation elsewhere? I must admit I'm relatively new to OpenTelemetry and this codebase, but I'd love to work on a more suitable fix if this proposed patch has issues. |
Contexts are intended to be thread local and state should not be shared across threads. What I mean when I say "context propagation" is to use the OpenTelemetry context propagators to extract the trace context to the child thread and treat any child threads as if they were their own "sub service" akin to messaging, http, or rpc. This is the approach we use for Golang programs at GH that make heavy use of go-routines. I thought this model would have been a good fit for alive but now I don't know if it will work given the rack span is something we want to share with the process controller subscriber so I'm unsure what to do next. If we use context propagation it will result in a new "ingress" point and make it look like the code running in the thread is its own service so I'm not confident in what I think the best thing to do given Rails is doing something I don't expect in sharing the parent threads fiber local variables. |
Hi @thomasmarshall and @arielvalentin! I work with OTel approver @kaylareopelle who mentioned this PR. Very cool work y'all are doing 🎉 I want to share another potential avenue for Recently, instrumentation was added for |
@hannahramadan is that being backported to rails 6.1? |
@arielvalentin Ah no — |
Thanks for the explanation. I'm not familiar with streaming responses so I'm not sure what the differences are between Live in Rails 6.1 and 7. Assuming we're able to use the notifications for Rails 7 what would we provide users of 6.1? |
@arielvalentin do you mean the current instrumentation that adds attributes to the rack span in a My understanding is that the response is only sent when we start writing to the stream, so I would expect adding attributes to the rack span in
A few options I can think of:
@hannahramadan thanks for the suggestion! I could definitely look into adding Rails instrumentation for Live actions. At the moment, if payload[:controller].constantize < ::ActionController::Live Though with a notification handler we wouldn't be able to use block forms like |
@thomasmarshall I am referring to the Span injected by the Rack instrumentation. Depending on the version you are using it is either injected by a "classic" Rack Middleware or a Rack::Events Handler: The ActionPack instrumentation uses a Helper method If we go back in time the previous version did the same thing, i.e. it loads the Rack span instead of creating a new one: Line 15 in c4d33b3
Both of these cases rely on Fiber Local Variables. If the same Rack Span and Context local stack is being shared across multiple unrelated requests, then that is a problem. I do not know enough about SSE and Rack's implementation to understand this well enough to make an informed decision here. BUT If I understand this comment correctly... # This processes the action in a child thread. It lets us return the
# response code and headers back up the Rack stack, and still process
# the body in parallel with sending data to the client. The client is bound to a single TCP connection. In the case where the server is multi-threaded it will be a connection per thread. That implies there is a single Rack span and fiber local context is created for the TCP Connection. Rails then copies the fiber local variables to the child "handler" thread, which would end up including the Rack Span that was created in the thread handling the TCP connection: Since the Live::Response has a specialized buffered response, Rack is able to send over initial http headers and line by line responses: https://github.com/rails/rails/blob/main/actionpack/lib/action_dispatch/http/response.rb#L538 ... But wait... this is using buffered responses!!! Buffered Responses and the "Classic" MiddlewareThe Rack Classic Middleware implementation does not support buffered responses correctly! It closes the span too early, before the response is completed because it first exits the middleware stack running the Assuming the behavior/functionality is the same with In order to support buffered responses you will have to use the You can enable this feature by setting What about the Fiber local problem?I don't know what to do about this. I imagine this would cause a problem for any number of gems that relies on Fiber local storage to do things and I am quite frankly shocked this isn't causes a whole bunch of problems for other gems. I need the @open-telemetry/ruby-maintainers to chime in here but if my hunch is right, then using What do we do next?Would you be amenable to giving a go to using |
@arielvalentin using the latest |
Nope. Just wanted to see if the Rack Events version works without the patches here. Going to calls in @open-telemetry/ruby-maintainers once again. Any ideas? Thoughts? |
This commit adds a patch to ensure `ActionController::Live` actions, which are processed in a new thread, are instrumented as a child span in the current trace. This is a patch rather than an `ActiveSupport::Notifications` handler. Although the `action_controller.process_action` event is processed in the new thread, we don't have a reference to the controller or anything to indicate that it is an `ActionController::Live` controller. The patch needs to be included rather than prepended, as `ActionController::Live` is a module that is included in each controller as required by the host app, rather than a class to inherit from. As such, `include` is the only way to get our `#process_action` method into the right place. We need to manually unset the `:__opentelemetry_context_storage__` thread local, as this gets copied across from the parent thread and points to the same stack array. Calling `#clear` would modify that shared stack array, so we need to unset it instead.
8613d1f
to
d278eb4
Compare
# Unset thread local to avoid modifying stack array shared with parent thread | ||
Thread.current[:__opentelemetry_context_storage__] = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bad. We shouldn't have to reach into the API internals in this way. This happens because OpenTelemetry::Context.clear
is implemented as stack.clear
, which empties the backing array instead of replacing it with a new empty instance.
I think it'd be worthwhile to change the implementation of OpenTelemetry::Context.clear
to be Thread.current[STACK_KEY] = []
.
On the other hand, I think there is a race condition when fetching the current context. Specifically, since the thread-locals are copied, the backing array for the context stack is shared with the parent thread, and if that parent thread modifies its context before this thread retrieves it, the context used in this thread will be incorrect.
Side note: the tracing library we built at Shopify before OpenTelemetry Ruby patched ActionController::Live
in a very similar way, other than unsetting the thread local. It worked there because the thread-local was a Span
instead of an array, and we managed the "stack" as a linked list of spans.
I think it might be preferable to explicitly propagate context in some way, but I'm not familiar enough with ActionController::Live
to know how to accomplish that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could leverage/patch ActiveSupport::IsolatedExecutionState.share_with
in some way? 🤔
That would be a good place to "propagate" the context, probably (if we fix Context.clear
) by:
current_context = OpenTelemetry::Context.current
OpenTelemetry::Context.clear
OpenTelemetry::Context.attach(current_context)
It's a bit of a hack, in that we're not going to Context.detach
. The patch for Live
could call OpenTelemetry::Context.clear
after the in_span
call.
I.e. our patch for ActiveSupport::IsolatedExecutionState.share_with
would be:
def share_with(other)
super
current_context = OpenTelemetry::Context.current
OpenTelemetry::Context.clear
OpenTelemetry::Context.attach(current_context)
end
And our patch for Live
would be simply:
attributes = {
OpenTelemetry::SemanticConventions::Trace::CODE_NAMESPACE => self.class.name,
OpenTelemetry::SemanticConventions::Trace::CODE_FUNCTION => action_name
}
Instrumentation.instance.tracer.in_span("#{self.class.name}##{action_name} stream", attributes: attributes) do
super
end
OpenTelemetry::Context.clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fbogsany. I will open a PR in opentelemetry-ruby
to propose your suggested change to Context.clear
. I'll also probably close this PR and reopen a new one with a different approach, using explicit propagation like you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a linked list implementation of the context stack in open-telemetry/opentelemetry-ruby#1597 but the performance impact was significant (~9% with the interpreter, as much as 24% with yjit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open-telemetry/opentelemetry-ruby#1598 is merged. Once we cut a new release of the API, we can use that as a minimum version here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasmarshall - A new version of the API has been released! Could you incorporate it into this PR?
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
I tried the latest versions of As such I beleive this PR is no longer needed, so I'll close it. Thank everyone for your help with this! |
This commit adds a patch to ensure
ActionController::Live
actions, which are processed in a new thread, are instrumented as a child span in the current trace. Previously, we would lose the current context and instrumented code insideActionController::Live
actions would appear as separate traces.This is a patch rather than an
ActiveSupport::Notifications
handler. Although theaction_controller.process_action
event is processed in the new thread, we don't have a reference to the controller or anything to indicate that it is anActionController::Live
controller.The patch needs to be included rather than prepended, as
ActionController::Live
is a module that is included in each controller as required by the host app, rather than a class to inherit from. As such,include
is the only way to get our#process_action
method into the right place.We need to manually unset the
:__opentelemetry_context_storage__
thread local, as this gets copied across from the parent thread and points to the same stack array. Calling#clear
would modify that shared stack array, so we need to unset it instead.Example