-
Sounds like a bug in the next/fetch consumer. What are your client/server versions?
-
First of all, thank you very much for the quick responses. On the client side I have version 2.0.1, and on the server side I currently have 2.10.7. When this problem started I was on an older server version, and I have been updating as new versions were released (sorry for not remembering the exact versions).
-
Out of curiosity, what are your stream and consumer configurations? Do you ack the messages you receive?
-
I have been running my test with constantly failing/recovering cluster nodes, but I can't reproduce the issue. By the way, I'm testing against the latest version, and one thing I noticed is that in 2.0.2 we introduced the no-responders feature. I also noticed that in your report above you did mention [...] So on the publish side it's expected to have:

    try
    {
        // Retry the publish up to 10 times, waiting 1 second between attempts
        var ack = await js.PublishAsync(subject, data, opts: new NatsJSPubOpts
        {
            RetryAttempts = 10,
            RetryWaitBetweenAttempts = TimeSpan.FromSeconds(1)
        });
        // ...
    }
    catch (NatsNoRespondersException)
    {
        // e.g. log warning
        await Task.Delay(3000); // back-off with increasing delay
    }

Now for consumers (i.e. [...] So in the meantime, my suggestion is to upgrade to the latest stable version of the client (which is 2.0.3 at the moment).

Edit: I've been running a test (over 24 hours now) against a cluster with nodes failing every 15 seconds. Unfortunately, I can't reproduce the issue of NextAsync() hanging. @ValMati let me know when you get a chance if you made any progress on this.
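The "back-off with increasing delay" comment in the snippet above could be sketched roughly as follows. This is only an illustration, not part of the NATS client: `Backoff`, `DelayFor` and `RetryAsync` are hypothetical names, the delegate stands in for whatever publish call is being retried, and doubling the delay up to a cap is an assumed policy.

```csharp
using System;
using System.Threading.Tasks;

public static class Backoff
{
    // Delay before retry number `attempt` (1-based):
    // initial * 2^(attempt - 1), capped at `max`.
    public static TimeSpan DelayFor(int attempt, TimeSpan initial, TimeSpan max)
    {
        var ms = initial.TotalMilliseconds * Math.Pow(2, attempt - 1);
        return TimeSpan.FromMilliseconds(Math.Min(ms, max.TotalMilliseconds));
    }

    // Runs `action` until it succeeds or `maxAttempts` is reached.
    // The exception from the final attempt is rethrown to the caller.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> action, int maxAttempts, TimeSpan initial, TimeSpan max)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            // Assumption: every exception is treated as retryable here.
            // Real code would likely narrow this to the specific exception
            // being handled (e.g. the no-responders case shown above).
            catch (Exception) when (attempt < maxAttempts)
            {
                await Task.Delay(DelayFor(attempt, initial, max));
            }
        }
    }
}
```

With an initial delay of 1 second and a cap of 30 seconds, the waits would be 1 s, 2 s, 4 s, 8 s, 16 s and then 30 s for every later attempt.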
-
First of all, I apologise: I wrote the wrong exception (I have already fixed it in the title and in my first post). NATSNoRespondersException is what I get from time to time in an old version that uses NATS.Net v1. In the current version, which is the reason I opened this thread, I use 2.0.1; there the consumer "freezes" in NextAsync and PublishAsync throws NatsJSPublishNoResponseException. I also don't think I made myself clear in my first post: the VMs that are restarted are the ones running the application, not NATS. Finally, following your advice, I'm testing with version 2.0.3 and there doesn't seem to be any problem. I'll continue testing this afternoon to see whether simply upgrading solves the problem.
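While diagnosing a call that appears to freeze, like the NextAsync behaviour described above, one option is to bound each attempt with a timeout so that a hang shows up in the application logs instead of blocking silently. The following is a minimal, self-contained sketch under stated assumptions: `TimeoutGuard` and `CompletesWithinAsync` are hypothetical names, the delegate stands in for the real pull call, and `Task.WaitAsync` requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutGuard
{
    // Returns true if `operation` finishes within `timeout`, false otherwise.
    // The token is passed to the operation so it can stop early, and
    // WaitAsync stops waiting even if the operation ignores the token.
    public static async Task<bool> CompletesWithinAsync(
        Func<CancellationToken, Task> operation, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await operation(cts.Token).WaitAsync(cts.Token);
            return true;
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // The timeout elapsed: the caller can log this and decide
            // whether to retry or recreate the consumer or connection.
            return false;
        }
    }
}
```

Note that when the wrapped operation ignores the token, it may keep running in the background after the guard returns false; the guard only stops the caller from waiting on it.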
-
Over the last few days I have been looking for a way to reliably reproduce the error and have not found anything conclusive. The only thing I have seen is that when the application neither consumes nor publishes, this log appears on the NATS server. The third line only appears when there are problems:
However, in the application logs everything seems correct until I try to publish a message...
-
I have finally fixed the problems I was experiencing. The problem was neither in the NATS NuGet package nor in my code. I have NATS deployed as a 3-node cluster, and for some reason one of the nodes was corrupted; whenever my application connected to that node, it could neither consume nor publish messages. In the end I solved it by completely deleting the cluster, including the permanent volumes, and recreating it. I haven't been able to identify when the node got corrupted, but I guess it happened during an update. In any case, thank you very much @mtmk for your kind and quick replies.
-
I have an application deployed on virtual machines, running on IIS, that uses NATS JetStream to manage the queueing and execution of tasks that take quite some time to complete.
On publish, we check that the message has been published successfully:
And the consumption of messages is done by pull:
The application works correctly except when for some reason the VM has to be restarted. Sometimes after restarting the VM I have two problems:
Checking the logs of our application, it seems that the connection is successful:
I have also found entries in the NATS logs showing that a publish attempt reached the server, which confirms that the connection exists.
On the other hand, I have tried to simulate this behaviour by spinning up several instances of the containerised application on our machines, all pointing at the same NATS cluster, and we have not been able to reproduce the problems we experienced on the VMs.
Any idea where the problem might be?
Regards and many thanks