RabbitMQ: Queues not recreated in all circumstances #628

Open
ssilve1989 opened this issue Jul 27, 2023 · 9 comments
Labels
bug (Something isn't working), rabbitmq, stale

Comments

@ssilve1989

ssilve1989 commented Jul 27, 2023

As originally outlined here: #239 with an updated comment by me #239 (comment)

There is an issue with queue creation when connected to a Rabbit cluster.

I can consistently reproduce the following case:

  • Run a 3-node RabbitMQ cluster in k8s
  • Connect N replicas of a service, each with 1 unnamed queue:
    @RabbitSubscribe({
      exchange: DIRECT_EXCHANGE,
      queueOptions: {
        autoDelete: true,
        durable: false,
      },
      routingKey,
    })
  • The RabbitMQ Management GUI shows N queues. This is correct.
  • Restart one of the 3 nodes in the RabbitMQ cluster
  • The Management GUI shows that the queues for the replicas that were connected to that node have disappeared.
  • The total number of queues in the Management GUI is now N minus the number of replicas that were connected to that node.

After these replicas reconnect to RabbitMQ (as evidenced by a k8s health check and by checking managedConnection.isConnected()), they never re-create a queue binding for the service.

Interestingly, if you restart all nodes in the RabbitMQ cluster in a rolling fashion, all queues are recreated once the final node has restarted. Until then, no replica creates a new queue when it reconnects.

@ssilve1989
Author

Upon further investigation, it looks like what's happening is:

  • Service A is connected to Node 1, but its queue lives on another node. Let's say Node 2, for example.
  • Node 2 disappears (it restarts, errors, is removed from the cluster, etc.). The queue is removed.
  • Service A is now in limbo: it has not received any error event on the channel listener, yet it is no longer subscribed to anything. It will not receive any more messages, but its connection is fine.

I would have expected an error event to have been emitted on the Channel, but that doesn't look like it would really do anything anyway since the library only logs something on that event handler.

@underfisk
Contributor

@ssilve1989 Thanks for the repro and the thorough investigation. Right now I probably don't have enough free time to provide a fix for this, as it might also be an issue with the underlying library or with how we set up and configure the connection manager.
If you're willing to provide a fix, that would be very much appreciated 🙏

underfisk added the rabbitmq and bug labels Aug 4, 2023
@ssilve1989
Author

Implementing a fix would be possible if the error event were actually emitted on the channel. My naive understanding is that it should emit an error when this happens, but it doesn't. Maybe that's a problem with amqplib. I was also able to reproduce this on a single-node instance by just deleting the queue from the Management console: no error event was emitted.

My naive workaround at the moment is to patch the @golevelup/nestjs-rabbitmq library to track the queues it creates and, as part of a k8s health check, query those and call checkQueue to make sure they still exist. Maybe the library already has a way to get the auto-generated queue names? I didn't see one though.
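A rough sketch of that health-check idea, for illustration only. The helper name `findMissingQueues`, the `openChannel` factory, and the list of tracked queue names are all hypothetical and not part of @golevelup/nestjs-rabbitmq. The interface below only mirrors the two amqplib channel methods used (`checkQueue` and `close`), so the snippet stands alone.

```typescript
// Minimal structural type for the two channel methods used here.
// amqplib's Channel offers both; checkQueue rejects when the queue is missing.
interface QueueCheckingChannel {
  checkQueue(queue: string): Promise<unknown>;
  close(): Promise<void>;
}

// Hypothetical helper: returns the tracked queue names that no longer exist.
async function findMissingQueues(
  openChannel: () => Promise<QueueCheckingChannel>,
  trackedQueues: Iterable<string>,
): Promise<string[]> {
  const missing: string[] = [];
  for (const queue of trackedQueues) {
    // A failed check closes the channel on the broker side, so every check
    // gets a throwaway channel instead of the one used for consuming.
    // Assumption: the factory attaches an 'error' listener to the channel.
    const channel = await openChannel();
    try {
      await channel.checkQueue(queue);
      await channel.close(); // only close channels that are still open
    } catch {
      missing.push(queue);
    }
  }
  return missing;
}
```

A health endpoint could then report unhealthy whenever the returned list is non-empty, which would let k8s restart the affected replica.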

I'll see if I can figure something out with amqplib when I get some time

@underfisk
Contributor

@ssilve1989 Thank you!

@ssilve1989
Author

ssilve1989 commented Aug 7, 2023

So as pointed out here amqp-node/amqplib#736 it looks like we'd need to respond to a null message as the signal that consumption has stopped and to re-create the queue. The library could/should probably expose options on what to do in such a case, such as throw or implicitly re-create the queue.
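To make the null-message signal concrete, here is a small sketch. amqplib invokes the consume callback with `null` when the broker cancels the consumer (for example because the queue was deleted). The wrapper name `withCancellationDetection` and both handler types are made up for this example, and `IncomingMessage` is a minimal stand-in for amqplib's message type.

```typescript
// Minimal stand-in for amqplib's ConsumeMessage; only the body is modelled.
interface IncomingMessage {
  content: Uint8Array;
}

type MessageHandler = (msg: IncomingMessage) => void;
type CancellationHandler = (queue: string) => void;

// Hypothetical wrapper: builds a callback suitable for channel.consume().
// A null message means the broker cancelled this consumer.
function withCancellationDetection(
  queue: string,
  onMessage: MessageHandler,
  onCancelled: CancellationHandler,
): (msg: IncomingMessage | null) => void {
  return (msg) => {
    if (msg === null) {
      // Consumption has stopped; let the caller decide what to do next.
      onCancelled(queue);
      return;
    }
    onMessage(msg);
  };
}
```

It would be passed as the callback, e.g. `channel.consume(queue, withCancellationDetection(queue, handle, notify))`, where `handle` and `notify` are the caller's own functions.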

@WonderPanda
Collaborator

> So as pointed out here amqp-node/amqplib#736 it looks like we'd need to respond to a null message as the signal that consumption has stopped and to re-create the queue. The library could/should probably expose options on what to do in such a case, such as throw or implicitly re-create the queue.

Nice, this is a great discovery! Would you be interested in helping to add this functionality to the library?

@ssilve1989
Author

@WonderPanda Yeah, I'm looking into adding an option similar to the other error handlers, named consumerCancellationHandler, and letting the consumer decide what to do. At least for my use-case, I'll probably just throw an error, but I think passing along the channel + queueName might be ok for other use-cases.

@ssilve1989
Author

I have something started here but haven't thoroughly tested it yet. I don't think I'll have time to test it until maybe the end of the week.

I'm not sure this approach is suitable, though, for cases where the consumer wants the library to automatically re-create the queue and subscribe to it again, since providing the channel/queue like this doesn't enable them to bind back to the decorated methods, right?
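One way to picture the auto-recreate case is to keep everything needed to rebuild a subscription, including the handler, next to the consume call. This is only a sketch under assumptions: `Subscription` and `subscribe` are hypothetical names, the interface mirrors just the amqplib channel methods used (`assertQueue`, `bindQueue`, `consume`), and it assumes the existing channel is still usable after the cancellation. It is not how the library wires decorated methods today.

```typescript
// Minimal structural types standing in for amqplib's message and Channel.
interface Msg {
  content: Uint8Array;
}
interface QueueOptions {
  autoDelete?: boolean;
  durable?: boolean;
}
interface ConsumingChannel {
  assertQueue(queue: string, options?: QueueOptions): Promise<{ queue: string }>;
  bindQueue(queue: string, exchange: string, routingKey: string): Promise<unknown>;
  consume(queue: string, onMessage: (msg: Msg | null) => void): Promise<unknown>;
}

// Hypothetical record of everything needed to rebuild one subscription.
interface Subscription {
  exchange: string;
  routingKey: string;
  queueOptions: QueueOptions;
  handler: (msg: Msg) => void;
  onError: (err: unknown) => void;
}

async function subscribe(channel: ConsumingChannel, sub: Subscription): Promise<string> {
  // An empty name asks the broker to generate one (the "unnamed queue" case).
  const { queue } = await channel.assertQueue("", sub.queueOptions);
  await channel.bindQueue(queue, sub.exchange, sub.routingKey);
  await channel.consume(queue, (msg) => {
    if (msg === null) {
      // Consumer was cancelled: declare a fresh queue and attach the same handler.
      subscribe(channel, sub).catch(sub.onError);
      return;
    }
    sub.handler(msg);
  });
  return queue;
}
```

Because the handler travels with the subscription record, the re-created queue ends up bound to the same method without the consumer having to look it up again.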


github-actions bot commented Oct 2, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Oct 2, 2024