ElastiCache will occasionally replace nodes during a configurable maintenance window ("These replacements are needed to apply mandatory software updates to your underlying host"). AWS declined to provide details on the maintenance failover process, but it appears similar to CLUSTER FAILOVER (source).
For ElastiCache Redis clusters that have no replica nodes to fail over to, this can trigger an edge case in which rediscluster tries to free a connection by referencing a local variable before it has been assigned (commit). This was patched in v2.1.3.
In our case, the rediscluster client entered a bad state and every request to the Redis node failed; the only way to recover was a manual service restart.
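For illustration, here is a minimal sketch of that failure pattern in plain Python. This is not the actual redis-py-cluster code and the helper names are hypothetical; it only shows how cleanup code that references a connection variable before it is bound raises UnboundLocalError instead of releasing the connection, together with a guarded form similar in spirit to the fix.

```python
# Minimal sketch of the failure pattern; not the actual redis-py-cluster code.
# `acquire` stands in for whatever call hands out a connection and may raise
# (e.g. while a node is being replaced during maintenance).

def execute_broken(acquire, release, command):
    try:
        connection = acquire()          # raises before `connection` is bound
        return connection.send(command)
    finally:
        # Bug pattern: if acquire() raised, `connection` was never assigned,
        # so this line raises UnboundLocalError and masks the real error.
        release(connection)


def execute_guarded(acquire, release, command):
    connection = None                   # bind up front so cleanup is always safe
    try:
        connection = acquire()
        return connection.send(command)
    finally:
        if connection is not None:      # only release what was actually acquired
            release(connection)
```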
Thanks for the report! That sounds like a nasty issue.
What would you like to see changed here? Baseplate.py only sets a minimum version (https://github.com/reddit/baseplate.py/blob/develop/setup.py#L15, 2.1.2), which we can happily bump if 2.1.2 is really bad. But the actual version used by your service is entirely up to your service, not Baseplate.
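As a concrete illustration of that last point, a service can pin the patched client release in its own dependency list regardless of the floor Baseplate.py declares. The snippet below is a hypothetical service setup.py with placeholder names; the relevant part is only the redis-py-cluster>=2.1.3 pin.

```python
# Hypothetical service setup.py; names and versions are placeholders.
# The service's own install_requires can pin redis-py-cluster to the patched
# release, independent of the minimum version Baseplate.py sets.
from setuptools import find_packages, setup

setup(
    name="my-service",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "baseplate",                 # whatever Baseplate.py constraint the service already uses
        "redis-py-cluster>=2.1.3",   # pick up the fix for the unassigned-connection edge case
    ],
)
```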