3.11.28 is an old version that is out of community support. Please try your test against the only supported community version: 3.13.x
-
Hi,

When a node in a cluster is restarting, running `rabbitmqctl list_queues` on another node can sometimes cause that second node to hang. All subsequent `rabbitmqctl` and `rabbitmq-diagnostics` commands fail because the target node is unreachable. The hung node also shows high, persistent CPU use by the `beam.smp` Erlang process. The node recovers only when we restart the container running RabbitMQ.

There are previous reports of the `list_queues` command hanging, but in this case it is the node the command is run against that hangs. The `list_queues` command itself exits immediately with the following error:

We have been able to reproduce the issue with both `list_queues` and `list_unresponsive_queues` on version 3.11.28.

Reproduction steps:
`rabbitmqctl stop_app` etc.:
`list_queues` command loop:
Corresponding output from node 2:
The `for` loop completes successfully on node 2, and node 2 remains responsive.
In some cases, node 1 does NOT become unresponsive and the `for` loop continues even after receiving the `badrpc` error:

Other information
`rabbit.log` on the node that is hung. Even enabling debug logs did not yield anything new.
`erl -remsh` also times out:
`rabbitmq-diagnostics` on the node before it hangs also eventually times out when the node becomes unresponsive:
`top` output showing 100% CPU core usage:
`cluster_status` on other nodes shows that the status of hung nodes is `unknown`.
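
For anyone trying to reproduce this, the `list_queues` command loop mentioned in the reproduction steps could be approximated with a sketch like the one below. This is not the original script from this report. `poll_loop` and `POLL_INTERVAL` are hypothetical names, and wrapping each call in coreutils `timeout` is an assumption, added so that a hung node shows up as a timed-out attempt instead of blocking the loop.

```shell
#!/bin/sh
# Sketch only: NOT the original script from this report.
# poll_loop is a hypothetical helper. It runs the given command a fixed
# number of times, each under a time limit, and prints one status line
# per attempt. The body runs in a subshell so its variables stay local.
#
# Usage: poll_loop ITERATIONS LIMIT_SECONDS COMMAND [ARGS...]
poll_loop() (
  iterations="$1"
  limit="$2"
  shift 2
  i=1
  while [ "$i" -le "$iterations" ]; do
    status=0
    # coreutils `timeout` exits with 124 when the time limit is reached
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case "$status" in
      0)   echo "attempt $i: ok" ;;
      124) echo "attempt $i: timed out after ${limit}s" ;;
      *)   echo "attempt $i: failed with exit code $status" ;;
    esac
    # Pause between attempts; POLL_INTERVAL is an assumed knob (default 1s)
    sleep "${POLL_INTERVAL:-1}"
    i=$((i + 1))
  done
)

# Example, run on a node that is NOT the one being restarted:
#   poll_loop 100 30 rabbitmqctl list_queues
```

A run of consecutive "timed out" lines would suggest the local node has stopped responding, while isolated "failed" lines would be more consistent with the case where the command returns an error but the node stays up.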