Random NoQuorum Errors when using a redis cluster with multiple nodes. #91

Open
mikehank96 opened this issue Sep 30, 2021 · 9 comments

@mikehank96

We have a cluster with 2 nodes and have been regularly getting random NoQuorum errors when creating many locks against it.
I've experimented by creating 500 locks against it and usually get between 1 and 20 NoQuorum errors. When doing the same with a single-node Redis instance I don't get the errors. The readme states that using replicated instances is not the suggested way to use RedLock, but also says it is supported, so I'm not sure whether these errors are expected or not. Currently using RedLock.net 2.3.1.

@mikehank96
Author

Additional info:
Factory creation:

var lockFactory = RedLockFactory.Create(new List<RedLockEndPoint>
{
    new RedLockEndPoint
    {
        EndPoint = new DnsEndPoint(redisConfigurationEndpoint, port),
        Password = secret,
        Ssl = true,
        RedisKeyFormat = "MyKey_{0}",
    }
});

Lock creation:

await _redLockFactory.CreateLockAsync(resource, TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(1))
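
For completeness, the lock result is consumed roughly along the lines of the readme example (a sketch only; the critical-section body is a placeholder):

using (var redLock = await _redLockFactory.CreateLockAsync(
    resource,
    TimeSpan.FromSeconds(15),   // expiryTime
    TimeSpan.FromSeconds(15),   // waitTime
    TimeSpan.FromSeconds(1)))   // retryTime
{
    if (redLock.IsAcquired)
    {
        // critical section - only entered when enough instances granted the lock
    }
    // when IsAcquired is false, the lock's Status is where we see NoQuorum
}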

@samcook
Owner

samcook commented Sep 30, 2021

Are your two servers in a master/replica configuration, a Redis Cluster (with keys sharded across both), or two independent servers?

And in your RedLockFactory configuration, are you only connecting to one of them?

@mikehank96
Author

mikehank96 commented Sep 30, 2021

Redis Cluster. We're using Terraform and it's all created in one aws_elasticache_replication_group resource. As for the configuration, I'm using the configuration endpoint, which I figured handles all connections.

@mikehank96
Author

So I tried boosting the number of lock requests in my tests from 500 to 700, and now I get the NoQuorum errors from the single-node Redis instance as well, so it isn't a number-of-nodes issue.

@mikehank96
Author

After some more experimentation I realized that if I increase the wait time for the lock creation, the errors disappear. Is it possible for NoQuorum statuses to be returned for timeouts?

@samcook
Owner

samcook commented Oct 5, 2021

Yes, that is possible. If an attempt to acquire a lock in an instance doesn't complete within the timeout, it is treated as a failure, and if there aren't enough successfully acquired instances to meet the quorum then it will fail with NoQuorum.

The quorum required is floor(n/2 + 1), where n is the number of independent instances you have.

Instances  Quorum
1          1
2          2
3          2
4          3
5          3
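
In code terms it's just integer division (trivial sketch):

// Quorum required for n independent instances: floor(n / 2 + 1).
// Integer division already floors, so for positive n this is simply:
static int QuorumFor(int instanceCount) => instanceCount / 2 + 1;

// QuorumFor(1) == 1, QuorumFor(2) == 2, QuorumFor(3) == 2,
// QuorumFor(4) == 3, QuorumFor(5) == 3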

@ryangardner

Does RedLock do some kind of introspection when it is passed the endpoint? This is being used with an AWS ElastiCache "cluster mode enabled" cluster - it has 3 shards and 6 nodes. The AWS docs state that you should just do all your writes through their configuration endpoint:

For Redis (cluster mode enabled) clusters, use the cluster's Configuration Endpoint for all operations that support cluster mode enabled commands. You must use a client that supports Redis Cluster (Redis 3.2). You can still read from individual node endpoints (In the API/CLI these are referred to as Read Endpoints).
( https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Endpoints.html )

Does RedLock (or the underlying StackExchange.Redis driver) somehow look up the cluster information from that single endpoint, or do we need to configure things differently in order to use a cluster like this in AWS? Somehow it must know that it doesn't just have a single instance (otherwise why would it throw errors about NoQuorum)?

It seems that the recommended way to do it would be to use multiple (3) standalone Redis nodes and pass each of them in as endpoints to RedLock, and then RedLock will maintain a quorum with those standalone nodes on its own?

@samcook
Owner

samcook commented Oct 6, 2021

I haven't used Redis on AWS myself, so I'm not too sure whether they do things any differently to a standard Redis Cluster.

> Does RedLock (or the underlying StackExchange.Redis driver) somehow look up the cluster information from that single endpoint, or do we need to configure things differently in order to use a cluster like this in AWS? Somehow it must know that it doesn't just have a single instance (otherwise why would it throw errors about NoQuorum)?

RedLock.net doesn't do anything specific to look up cluster information - if anything happens there it would be within StackExchange.Redis.

If you are only providing one RedLockEndPoint (or one ConnectionMultiplexer, if you are using them directly) when you create your RedLockFactory then RedLock.net will treat your cluster as a single instance.

It is possible to get NoQuorum responses even with a single instance if that instance doesn't acquire a lock within the timeout period (in this situation the quorum is 1 and locks were acquired in 0 instances).

> It seems that the recommended way to do it would be to use multiple (3) standalone Redis nodes and pass each of them in as endpoints to RedLock, and then RedLock will maintain a quorum with those standalone nodes on its own?

Yes, that would be the suggested way to do it if you want more resilience than is offered by a single standalone instance.
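
Roughly along these lines (a minimal sketch - hostnames and ports are placeholders, and you'd add the same Password/Ssl options as in your earlier snippet):

// Three independent (non-clustered) Redis instances, each passed as its own endpoint,
// so RedLock.net can form a quorum of 2 out of 3 across them.
var lockFactory = RedLockFactory.Create(new List<RedLockEndPoint>
{
    new RedLockEndPoint { EndPoint = new DnsEndPoint("redis-1.example.com", 6379) },
    new RedLockEndPoint { EndPoint = new DnsEndPoint("redis-2.example.com", 6379) },
    new RedLockEndPoint { EndPoint = new DnsEndPoint("redis-3.example.com", 6379) }
});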

@mikehank96
Author

What's the difference between retries configured by RedLockRetryConfiguration and retries configured by CreateLockAsync? When setting RedLockRetryConfiguration instead of the CreateLockAsync parameters, the NoQuorum errors seemed to stop. Looking at the source code, it looks like they do almost the same thing, except one loops outside of AcquireAsync and one loops inside it.
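
To make sure I'm comparing the right knobs, this is what I'm toggling (sketch only; the RedLockRetryConfiguration constructor arguments below are the retry count and retry delay in milliseconds as I read them from the source, so treat the order as my assumption):

// Per-call retries: the waitTime / retryTime parameters of CreateLockAsync
// (the loop that runs outside AcquireAsync).
await _redLockFactory.CreateLockAsync(
    resource,
    TimeSpan.FromSeconds(15),   // expiryTime
    TimeSpan.FromSeconds(15),   // waitTime
    TimeSpan.FromSeconds(1));   // retryTime

// Factory-level retries: a RedLockRetryConfiguration passed when creating the factory
// (the loop that runs inside AcquireAsync).
var retryConfig = new RedLockRetryConfiguration(3, 400); // assumed order: retry count, delay in ms
var lockFactory = RedLockFactory.Create(endPoints, retryConfig);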
