Random NoQuorum Errors when using a redis cluster with multiple nodes. #91

Open
mikehank96 opened this issue Sep 30, 2021 · 9 comments

@mikehank96

We have a cluster with 2 nodes and have been regularly getting random NoQuorum errors when creating many locks against it.
I've experimented by creating 500 locks against it and usually get between 1 and 20 NoQuorum errors. When doing the same with a single-node Redis instance I don't get the errors. The readme states that using replicated instances is not the suggested way to use RedLock, but also says it is supported, so I'm not sure whether these errors are expected or not. Currently using RedLock.net 2.3.1.

@mikehank96
Author

Additional info:
Factory creation:

var lockFactory = RedLockFactory.Create(new List<RedLockEndPoint>
{
    new RedLockEndPoint
    {
        EndPoint = new DnsEndPoint(redisConfigurationEndpoint, port),
        Password = secret,
        Ssl = true,
        RedisKeyFormat = "MyKey_{0}",
    }
});

Lock creation:

await _redLockFactory.CreateLockAsync(resource, TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(1))
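
For completeness, the lock result is consumed roughly along the lines of the readme example (a sketch only; the critical-section body is a placeholder):

using (var redLock = await _redLockFactory.CreateLockAsync(
    resource,
    TimeSpan.FromSeconds(15),   // expiryTime
    TimeSpan.FromSeconds(15),   // waitTime
    TimeSpan.FromSeconds(1)))   // retryTime
{
    if (redLock.IsAcquired)
    {
        // critical section - only entered when enough instances granted the lock
    }
    // when IsAcquired is false, the lock's Status is where we see NoQuorum
}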

@samcook
Owner

samcook commented Sep 30, 2021

Are your two servers in a master/replica configuration, a Redis Cluster (with keys sharded across both), or two independent servers?

And in your RedLockFactory configuration, are you only connecting to one of them?

@mikehank96
Author

mikehank96 commented Sep 30, 2021

Redis Cluster. We're using Terraform and it's all created in one aws_elasticache_replication_group resource. As for the configuration, I'm using the configuration endpoint, which I figured handles all connections.

@mikehank96
Author

So I tried boosting the number of lock requests in my tests from 500 to 700, and now I get the NoQuorum errors from the single-node Redis instance as well, so it isn't a number-of-nodes issue.

@mikehank96
Author

After some more experimentation I realized that if I increase the wait time for the lock creation, the errors disappear. Is it possible for NoQuorum statuses to be returned for timeouts?

@samcook
Owner

samcook commented Oct 5, 2021

Yes, that is possible. If an attempt to acquire a lock in an instance doesn't complete within the timeout, it is treated as a failure, and if there aren't enough successfully acquired instances to meet the quorum then it will fail with NoQuorum.

The quorum required is floor(n/2 + 1), where n is the number of independent instances you have.

Instances  Quorum
1          1
2          2
3          2
4          3
5          3
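
In code terms it's just integer division (trivial sketch):

// Quorum required for n independent instances: floor(n / 2 + 1).
// Integer division already floors, so for positive n this is simply:
static int QuorumFor(int instanceCount) => instanceCount / 2 + 1;

// QuorumFor(1) == 1, QuorumFor(2) == 2, QuorumFor(3) == 2,
// QuorumFor(4) == 3, QuorumFor(5) == 3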

@ryangardner

Does RedLock do some kind of introspection when it is passed the endpoint? This is being used with an AWS ElastiCache "cluster mode enabled" cluster - it has 3 shards and 6 nodes. The AWS docs state that you should just do all your writes through their configuration endpoint:

For Redis (cluster mode enabled) clusters, use the cluster's Configuration Endpoint for all operations that support cluster mode enabled commands. You must use a client that supports Redis Cluster (Redis 3.2). You can still read from individual node endpoints (In the API/CLI these are referred to as Read Endpoints).
( https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Endpoints.html )

Does RedLock (or the underlying StackExchange.Redis driver) somehow look up the cluster information from that single endpoint, or do we need to configure things differently in order to use a cluster like this in AWS? Somehow it must know that it doesn't just have a single instance (otherwise why would it throw errors about NoQuorum)?

It seems that the recommended way to do it would be to use multiple (3) standalone Redis nodes and pass each of them in as endpoints to RedLock, and then RedLock will maintain a quorum with those standalone nodes on its own?

@samcook
Owner

samcook commented Oct 6, 2021

I haven't used Redis on AWS myself, so I'm not too sure whether they do things any differently to a standard Redis Cluster.

> Does RedLock (or the underlying StackExchange.Redis driver) somehow look up the cluster information from that single endpoint, or do we need to configure things differently in order to use a cluster like this in AWS? Somehow it must know that it doesn't just have a single instance (otherwise why would it throw errors about NoQuorum)?

RedLock.net doesn't do anything specific to look up cluster information - if anything happens there it would be within StackExchange.Redis.

If you are only providing one RedLockEndPoint (or one ConnectionMultiplexer, if you are using them directly) when you create your RedLockFactory then RedLock.net will treat your cluster as a single instance.

It is possible to get NoQuorum responses even with a single instance if that instance doesn't acquire a lock within the timeout period (in this situation the quorum is 1 and locks were acquired in 0 instances).

> It seems that the recommended way to do it would be to use multiple (3) standalone Redis nodes and pass each of them in as endpoints to RedLock, and then RedLock will maintain a quorum with those standalone nodes on its own?

Yes, that would be the suggested way to do it if you want more resilience than is offered by a single standalone instance.
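
Roughly along these lines (a minimal sketch - hostnames and ports are placeholders, and you'd add the same Password/Ssl options as in your earlier snippet):

// Three independent (non-clustered) Redis instances, each passed as its own endpoint,
// so RedLock.net can form a quorum of 2 out of 3 across them.
var lockFactory = RedLockFactory.Create(new List<RedLockEndPoint>
{
    new RedLockEndPoint { EndPoint = new DnsEndPoint("redis-1.example.com", 6379) },
    new RedLockEndPoint { EndPoint = new DnsEndPoint("redis-2.example.com", 6379) },
    new RedLockEndPoint { EndPoint = new DnsEndPoint("redis-3.example.com", 6379) }
});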

@mikehank96
Author

What's the difference between retries configured by RedLockRetryConfiguration and retries configured by CreateLockAsync? When setting RedLockRetryConfiguration instead of the CreateLockAsync parameters, the NoQuorum errors seemed to stop. Looking at the source code, it looks like they do almost the same thing, except one loops outside of AcquireAsync and one loops inside it.
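
To make sure I'm comparing the right knobs, this is what I'm toggling (sketch only; the RedLockRetryConfiguration constructor arguments below are the retry count and retry delay in milliseconds as I read them from the source, so treat the order as my assumption):

// Per-call retries: the waitTime / retryTime parameters of CreateLockAsync
// (the loop that runs outside AcquireAsync).
await _redLockFactory.CreateLockAsync(
    resource,
    TimeSpan.FromSeconds(15),   // expiryTime
    TimeSpan.FromSeconds(15),   // waitTime
    TimeSpan.FromSeconds(1));   // retryTime

// Factory-level retries: a RedLockRetryConfiguration passed when creating the factory
// (the loop that runs inside AcquireAsync).
var retryConfig = new RedLockRetryConfiguration(3, 400); // assumed order: retry count, delay in ms
var lockFactory = RedLockFactory.Create(endPoints, retryConfig);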
