Client connection frame writer runs into a TCP socket write timeout: {writer,send_failed,timeout} #10113
-
Describe the bug

The following RabbitMQ error message {writer,send_failed,timeout} closed the AMQP connection, with the following stack trace on the RabbitMQ Server logs

RABBITMQ SERVER LOGS
```
2023-12-06 08:29:20.599227-08:00 [error] <0.20029.3> closing AMQP connection <0.20029.3> (10.135.45.50:44966 -> 10.61.253.65:5671):
```

Sharing the logs seen on the Client side (our product)

CLIENT LOGS
```
Dec 06 08:28:51 ppollpghn00092 node[2091259]: 2023-12-06T16:28:51.937Z - error: audit_id=undefined, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@tmobile/app-sync_queue/cog.js:976, message=[ 'ppollpghn00092 is not the working server: ppollpghn0008e.' ]
Dec 06 08:28:55 ppollpghn00092 node[2091318]: 2023-12-06T16:28:55.290Z - info: audit_id=98576c40-ba91-424f-b0c4-a5d34290357b, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@tmobile/app-trailblazer/service/business.js:618, message=[
Dec 06 08:28:55 ppollpghn00092 node[2091318]: 'Querying for shelves using this device name input: ARACHR32% - Response from server: Recieved 200 status, body length: 1 - Pronghorn user = ACervan36'
Dec 06 08:28:55 ppollpghn00092 node[2091318]: ]
Dec 06 08:29:25 ppollpghn00092 node[2090045]: 2023-12-06T16:29:25.294Z - error: audit_id=undefined, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@itential/database/dist/index.js:1, message=[ 'Connection to MongoDB has been lost' ]
Dec 06 08:29:26 ppollpghn00092 node[2090045]: 2023-12-06T16:29:26.579Z - error: audit_id=undefined, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@itential/network/dist/Network.js:1, message=[ 'Connection to RabbitMQ has been lost: Unexpected close' ]
Dec 06 08:29:26 ppollpghn00092 node[2090045]: [IAP RabbitMQ] IAP listener AMQPCM disconnected. Error: Unexpected close
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at succeed (/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/amqplib/lib/connection.js:280:13)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at onOpenOk (/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/amqplib/lib/connection.js:262:5)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at /opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/amqplib/lib/connection.js:165:32
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at /opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/amqplib/lib/connection.js:159:12
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at TLSSocket.recv (/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/amqplib/lib/connection.js:507:12)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at Object.onceWrapper (node:events:627:28)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at TLSSocket.emit (node:events:513:28)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at emitReadable_ (node:internal/streams/readable:578:12)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: at processTicksAndRejections (node:internal/process/task_queues:82:21)
Dec 06 08:29:26 ppollpghn00092 node[2090045]: 2023-12-06T16:29:26.637Z - error: audit_id=70fd0185-e5a6-4c61-8fdf-75762e26feb5, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@itential/database/dist/index.js:1, message=[ 'Connection to MongoDB has been restored' ]
Dec 06 08:29:29 ppollpghn00092 node[2091731]: 2023-12-06T16:29:29.742Z - trace: audit_id=undefined, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@itential/service/dist/Service.js:1, message=[ 'netbrain : has received a emit message of type: 10' ]
Dec 06 08:29:51 ppollpghn00092 node[2091259]: 2023-12-06T16:29:51.937Z - error: audit_id=undefined, origin=/opt/Itential/pronghorn/itential-premium-20221-patch2_2022.1.12/node_modules/@tmobile/app-sync_queue/cog.js:976, message=[ 'ppollpghn00092 is not the working server: ppollpghn0008e.' ],,,
```

RESULTS

Reproduction steps

1. Testbed Architecture

Expected behavior

RabbitMQ write failure due to timeout must not impact inflight jobs in the queue

Additional context

Is there any configuration to enable Retry Mechanism during write failure due to timeout?

GOOGLE RESEARCH
Replies: 4 comments 6 replies
-
@GitJadhav this is not a bug, and RabbitMQ cannot magically avoid TCP socket operation failures. I don't know what "huge data related workflows" means exactly but large messages in AMQP 0-9-1 are

That said, large messages should be put into a suitable blob store and their ID be sent in a message. I'd say anything larger than 32 MiB should be treated that way.
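To illustrate the blob store suggestion, here is a minimal sketch, not taken from the thread and not tied to any client library. `InMemoryBlobStore`, `build_message`, and `resolve_message` are hypothetical names; a real deployment would swap the in-memory dictionary for an actual blob store. The 32 MiB threshold is the one mentioned in the reply.

```python
import uuid
from typing import Dict

# Threshold from the reply above: payloads larger than 32 MiB go to a
# blob store instead of travelling through the broker.
MAX_INLINE_BYTES = 32 * 1024 * 1024


class InMemoryBlobStore:
    """Hypothetical stand-in for a real blob store (assumption, for illustration)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Store the payload and hand back an opaque ID for it.
        blob_id = str(uuid.uuid4())
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


def build_message(payload: bytes, store: InMemoryBlobStore,
                  max_inline: int = MAX_INLINE_BYTES) -> dict:
    """Publisher side: send the payload itself, or only a reference to it."""
    if len(payload) <= max_inline:
        return {"kind": "inline", "body": payload}
    return {"kind": "blob_ref", "blob_id": store.put(payload), "size": len(payload)}


def resolve_message(message: dict, store: InMemoryBlobStore) -> bytes:
    """Consumer side: fetch the payload from the store if only an ID was sent."""
    if message["kind"] == "inline":
        return message["body"]
    return store.get(message["blob_id"])
```

With this shape, only the small reference travels over the AMQP connection, so a large payload no longer has to fit through a single socket write path on the broker.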
-
@michaelklishin Thank you for sharing your insights

Path - etc/rabbitmq/rabbitmq-env.conf

heartbeat = 120
-
@michaelklishin
I would appreciate it if you could spare a few minutes and provide some assistance - TY

CURRENT RABBITMQ CONFIG

RabbitMQ Conf Settings
```
listeners.tcp = none
# configure tcp listener
tcp_listen_options.backlog = 1024
tcp_listen_options.nodelay = true
```

Current settings running "rabbitmqctl environment"

FD descriptors setting - rabbitmqctl status

Do I need to tune these parameters to extend the TCP timeout?
The message indicates a TCP socket write failure on the RabbitMQ end. It is a symptom (just like missed heartbeats), not the root cause.
Your applications must be ready to deal with failing TCP connections. Unacknowledged messages on such connections will be automatically requeued
and redelivered (when there is an online consumer to deliver to and the prefetch settings allow for more deliveries to it).
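As a rough illustration of "being ready to deal with failing TCP connections", the sketch below retries a connection attempt with exponential backoff. It is an assumption-laden example, not something from the thread: `connect_with_backoff` and its parameters are hypothetical, and `connect` stands for whatever function your application uses to open the connection, re-create channels, and re-register consumers. Redelivery of unacknowledged messages is done by the broker, as described above, so the client only needs to come back and consume again.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical application-supplied callable (assumption).
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError as exc:  # socket-level failures, including timeouts
            last_error = exc
            if attempt < max_attempts - 1:
                # Wait 1s, 2s, 4s, ... capped at max_delay before retrying.
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable only so the behaviour can be exercised without real waiting. Many client libraries offer their own recovery features, so check what yours provides before hand-rolling this.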