
Unable to find subscription with identifier: {"channel":"GraphqlChannel","channelId":"18bccb1dbd6"} #4702

Closed
Amnesthesia opened this issue Nov 15, 2023 · 14 comments

@Amnesthesia

Describe the bug

This is possibly not a bug in graphql-ruby, and I apologize in advance if that's the case, but I've run out of ideas about what it could be. I'm opening this report in case it's something anybody else has run into before, or a common problem that I just can't find much information on.

Basically, we see a lot of these errors in our AppSignal monitoring, but I can't reproduce the error locally. We don't know whether it happens when we deploy new versions while somebody is using the site (over now-disconnected websockets?), or what has really been going on.

Versions

graphql version: 2.1.1
rails (or other framework): 7.0.8
other applicable versions (graphql-batch, etc):

```
graphql-pro (1.24.11)
    graphql (2.1.1)
    graphql-metrics (5.0.7)
    graphql-rails_logger (1.2.4)
    rubocop-graphql (1.4.0)
graphql (= 2.1.1)
```

Steps to reproduce

I'm sorry, but honestly, no clue. We can't reproduce this locally; it only happens in our production and staging environments (on Amazon Fargate using ElastiCache Redis).

Expected behavior

Not to throw this error, or to find the subscription.

Actual behavior

What specifically went wrong?

Place full backtrace here (if a Ruby exception is involved):

Click to view exception backtrace

```
lib/action_cable/connection/subscriptions.rb:75 find
lib/action_cable/connection/subscriptions.rb:47 remove
lib/action_cable/connection/subscriptions.rb:18 execute_command
lib/action_cable/connection/base.rb:89 dispatch_websocket_message
lib/action_cable/server/worker.rb:59 block in invoke
lib/active_support/callbacks.rb:118 block in run_callbacks
lib/semantic_logger/base.rb:190 block in tagged
lib/semantic_logger/semantic_logger.rb:346 tagged
lib/semantic_logger/base.rb:202 tagged
lib/rails_semantic_logger/extensions/action_cable/tagged_logger_proxy.rb:8 tag
lib/action_cable/server/worker/active_record_connection_management.rb:16 with_database_connections
lib/active_support/callbacks.rb:127 block in run_callbacks
lib/action_cable/engine.rb:71 block (4 levels) in <class:Engine>
lib/active_support/execution_wrapper.rb:92 wrap
lib/action_cable/engine.rb:66 block (3 levels) in <class:Engine>
lib/active_support/callbacks.rb:127 instance_exec
lib/active_support/callbacks.rb:127 block in run_callbacks
lib/active_support/callbacks.rb:138 run_callbacks
lib/action_cable/server/worker.rb:42 work
lib/action_cable/server/worker.rb:58 invoke
lib/action_cable/server/worker.rb:53 block in async_invoke
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:352 run_task
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:343 block (3 levels) in create_worker
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334 loop
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334 block (2 levels) in create_worker
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333 catch
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333 block in create_worker
```
@Amnesthesia (Author)

Amnesthesia commented Nov 15, 2023

Actually, since this is raised in subscriptions.rb when the identifier can't be found in the @subscriptions array, could it be because we run multiple containers, and the user may be subscribed on another container?

@rmosolgo (Owner)

Hi! Sorry for the trouble and thanks for the detailed report. What I expect on re-deploy is for clients to reconnect, re-sending GraphQL subscriptions as needed. A couple of quick thoughts:

  • What caused this error to start occurring? Did you just add GraphQL subscriptions to your app, or did something else change that caused these to appear?
  • Reviewing the backtrace above, I don't see anything GraphQL-related. Is there any indication that this is GraphQL-specific? (Are there other uses of ActionCable that work properly in your application?)
  • You mention that you can't replicate the error locally. If you're using the app in production during a deploy, do you see anything happening in your browser?
  • In AppSignal, can you see any trends in the occurrences of this bug? (Maybe browser? I can imagine browsers handling this event in different ways...)

@Amnesthesia (Author)

@rmosolgo Right, so to address your questions (and what I think is going on here as well):

  1. This error has been occurring for a very long time, but it doesn't seem to cause any issues. We've had a lot of ideas about what it could be, and none of them have turned out to be it. We haven't really prioritized it because, like I said, it seems pretty harmless.
  2. You're right, and I realized after posting this that it may not be graphql-related and could be actioncable-related.
  3 & 4. We're not seeing any browser-type pattern for this; it seems to be a mixed bag.

I think it's that we have this in our GraphQLChannel:

```ruby
def subscribed
  @subscription_ids = []
end
```

This starts each instance with a blank slate of connections, so if we're scaling out and running 3 or 4 instances, they're all pushing to the same Redis. One instance may pick up a subscriptions.trigger(...) from Puma on another instance, but it doesn't have the gid in its @subscription_ids, because that list isn't shared.

Perhaps @subscription_ids should point to Redis instead, to maintain a shared list of subscribed clients?
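To illustrate that hypothesis, here is a minimal plain-JavaScript sketch (no ActionCable or Redis involved; the class and IDs are made up) of how per-instance, in-memory lists diverge when a trigger fans out to every instance:

```javascript
// Each server instance keeps its own in-memory subscription list, so a
// trigger fanned out via a shared backend reaches instances that never
// registered the subscription ID.

class FakeChannelInstance {
  constructor() {
    this.subscriptionIds = [] // per-instance state, like the `subscribed` hook above
  }
  subscribe(id) { this.subscriptionIds.push(id) }
  handles(id) { return this.subscriptionIds.includes(id) }
}

const instanceA = new FakeChannelInstance()
const instanceB = new FakeChannelInstance()

// The client is connected to instance A only:
instanceA.subscribe("18bccb1dbd6")

// A trigger fanned out to every instance:
console.log(instanceA.handles("18bccb1dbd6")) // true
console.log(instanceB.handles("18bccb1dbd6")) // false -- "unable to find subscription"
```

This only models the hypothesis in the comment above; as discussed later in the thread, ActionCable's stream_from mechanism is meant to prevent exactly this situation.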

@rmosolgo (Owner)

My intention is for @subscription_ids to be a per-instance list of subscribers to that instance. It works like this:

  • subscriptions.trigger is called in one Ruby process; it calls .broadcast to notify other processes of the trigger:

    ```ruby
    # An event was triggered; push the data over ActionCable.
    # Subscribers will re-evaluate locally.
    def execute_all(event, object)
      stream = stream_event_name(event)
      message = @serializer.dump(object)
      @action_cable.server.broadcast(stream, message)
    end
    ```

  • Meanwhile, any ActionCable process that received a subscription should have set up a stream_from for that subscriber:

    ```ruby
    channel.stream_from(stream_event_name(initial_event), coder: @action_cable_coder) do |message|
      # ...
    end
    ```

  • So, when a process receives a notification from a trigger, it runs that stream_from block and generates a payload. Then it broadcasts again, using the specific subscription ID, to update the client:

    ```ruby
    @action_cable.server.broadcast(stream_subscription_name(subscription_id), payload)
    ```

I hope that helps!
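The three steps above can be sketched in plain JavaScript, with a made-up `Bus` class standing in for the ActionCable/Redis pub-sub backend (the stream names only mimic the real ones and are illustrative):

```javascript
// A toy pub/sub bus: streamFrom registers a handler for a stream name,
// broadcast delivers a message to every handler on that stream.
class Bus {
  constructor() { this.streams = {} }
  streamFrom(name, handler) {
    if (!this.streams[name]) this.streams[name] = []
    this.streams[name].push(handler)
  }
  broadcast(name, message) {
    (this.streams[name] || []).forEach((h) => h(message))
  }
}

const bus = new Bus()
const delivered = []
const subscriptionId = "sub-123"

// 2. A process that received the subscription streams from the event name:
bus.streamFrom("graphql-event:postAdded", (message) => {
  // 3. Re-evaluate locally, then broadcast to the specific subscription ID:
  bus.broadcast(`graphql-subscription:${subscriptionId}`, { result: `post: ${message}` })
})

// The client-facing stream for this one subscriber:
bus.streamFrom(`graphql-subscription:${subscriptionId}`, (payload) => delivered.push(payload))

// 1. `subscriptions.trigger` in some process broadcasts the event:
bus.broadcast("graphql-event:postAdded", "hello")

console.log(delivered) // [ { result: 'post: hello' } ]
```

The key point this models: only processes that called streamFrom for the event ever re-evaluate it, so a process without the subscription never handles it.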

@Amnesthesia (Author)

@rmosolgo Thank you for taking the time to explain this a bit; I appreciate it! :) We still haven't worked this issue out. We realized we weren't running ActionCable standalone in production, and thought the problem might be solved when we moved ActionCable to its own standalone service, but (I presume) because we also scale that out, this "issue" still persists.

We don't really know to what extent it's an issue, or whether it's an issue at all. For example: are we losing a certain percentage of websocket pushes, or are these errors completely harmless?

> So, when a process receives a notification from a trigger, it runs that stream_from block and generates a payload. Then it broadcasts again, using the specific subscription ID, to update the client

In this sense, should an ActionCable process that does not have the subscription_id in its list of subscribers simply not pick this up? Do you see a way, in a multi-ActionCable setup, that ActionCable processes could routinely try to handle messages meant for subscriber_ids that they don't have?

@Amnesthesia (Author)

Amnesthesia commented Apr 4, 2024

@rmosolgo We've moved away from scaling ActionCable horizontally and have instead scaled it up, increasing the spec of the machine significantly and running only one machine. We are still seeing these errors every day, and we're quite lost as to where they come from.

We've also reduced our identified_by attributes to a single one, identified_by :_strategy, where _strategy is a custom class managing the JWT token and current user.

@rmosolgo (Owner)

rmosolgo commented Aug 6, 2024

I'm sorry I don't have anything better to offer on debugging this... To answer those very old questions:

> In this sense, should an ActionCable process that does not have the subscription_id in its list of subscribers simply not pick this up?

Yes, that sounds right to me... I think a process would only call stream_from during write_subscription, which is to say, processes without subscription IDs wouldn't call stream_from.

> Do you see a way that it would happen in a multi-ActionCable setup that ActionCable processes routinely try to handle messages meant for subscriber_ids that they don't have?

How I expect that to work is:

I notice that it sends more: false to the client there in delete ... maybe that's the problem here?

I pushed a branch which adds a rescue to that code, in case you want to try it: #5053

@rmosolgo (Owner)

Hey, I spent some more time with this and I was able to replicate the error in the test app in this repository. I could replicate this error by clicking "unsubscribe" twice. The first time, it unsubscribed (using ActionCable.Subscription.unsubscribe), but the second time, my client code called that function again and Rails logged the same error described here.

Here's a screencap:

[screencap: unsubscribe-error]

So, I don't think #5053 is related: this error is between the client and ActionCable, without any GraphQL-Ruby code involved.

In the test app, I solved this problem by adding an _unsubscribed flag to my unsubscribe function:

```javascript
unsubscribe() {
  // Don't re-call `unsubscribe` to avoid Rails logging an error
  if (!this._unsubscribed) {
    this._unsubscribed = true
    this.subscription.unsubscribe()
  }
}
```

Would an approach like that work for you?

Also, reviewing this issue, I don't see which ActionCable client you're using. Are you using one of the ones from graphql-ruby-client? If so, I'd be happy to investigate an improvement there to include a fix like the one above.

I'm going to close this issue since it's gone cold, but if you get back to me about the client code, I'll be happy to re-open it until the client is improved.

@steobrien (Contributor)

Hey @rmosolgo, thanks for the writeup and action. We're seeing this issue (constantly) in production. We're using the createRelaySubscriptionHandler and ActionCable, and have a pretty straightforward Relay implementation:

Is there somewhere you could include the idempotent unsubscribe fix above for us, or should we add it somewhere on our side?

@rmosolgo (Owner)

rmosolgo commented Sep 3, 2024

@steobrien, in my case, the error was coming from calling unsubscribe() too many times. Is there a chance that Relay calls .dispose() for you? If so, you should remove the call to it yourself. Taking a quick look at the docs, it doesn't look like Relay expects you to call that yourself, so maybe it'll do that itself. Want to give it a try?

If that doesn't work for you, I can try adding a "once and only once" check here:

```javascript
dispose: function() {
  channel.unsubscribe()
}
```
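For illustration only, such a once-only guard might look like the sketch below (`makeDisposable` is a hypothetical helper, not part of graphql-ruby-client; it mirrors the `_unsubscribed` flag shown earlier in the thread):

```javascript
// Illustrative only: wrap a channel in a Disposable whose dispose()
// forwards to unsubscribe() at most once, so repeated dispose calls
// never send a second "unsubscribe" to Rails.
function makeDisposable(channel) {
  let disposed = false
  return {
    dispose() {
      if (!disposed) {
        disposed = true
        channel.unsubscribe()
      }
    }
  }
}

// Usage with a fake channel that counts unsubscribe calls:
let calls = 0
const disposable = makeDisposable({ unsubscribe() { calls += 1 } })
disposable.dispose()
disposable.dispose() // no-op; only the first call reaches the channel
console.log(calls) // 1
```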

@steobrien (Contributor)

I don't think Relay can do it automatically for us in this context: requestSubscription is an effect, so we need to clean it up manually. I think that's why it returns a Disposable. If we didn't do that, I believe we'd end up with a bunch of duplicate subscriptions that would build up over time.

But having said that, our useEffect implementation should guarantee "once and only once" already. So let me see if I can figure out why it could be attempting to dispose multiple times.

@steobrien (Contributor)

I'm seeing everything behave sensibly in our implementation. Additionally, it looks like Relay's subscription dispose is already idempotent:

  1. https://github.com/facebook/relay/blob/c8e8ee105fb38781c9f29f9e13b846da911e631b/packages/relay-runtime/subscription/requestSubscription.js#L117
  2. https://github.com/facebook/relay/blob/main/packages/relay-runtime/network/RelayObservable.js#L510-L526

I confirmed this by calling dispose() numerous times; only the first sent a message on the websocket, resulting in the expected server log line:

```
GraphqlChannel stopped streaming from graphql-subscription:9c56ecc8-a313-4c65-8e9c-e8eb0d0f10da
```

Subsequent calls appeared to no-op.

So, it seems like something more mysterious is happening here.

@rmosolgo (Owner)

rmosolgo commented Sep 3, 2024

Ok, thanks for giving it a try. Could you open a new issue so we can keep investigating it? It'd be great to have:

  • The full error message and backtrace from Rails
  • Setup code you're using to hook up GraphQL-Ruby's JS to your Relay app
  • A description of what you do in the app to make the error message happen

Then we can dig in further!

@steobrien (Contributor)

Sure thing. To your last bullet point, though, I'd love to figure out a way to reproduce this. Right now we see it frequently in production (1000 times today), but I'm struggling to work out the pattern of events. I'll let you know if I can isolate it.
