Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238

enjoy-binbin · 2024-10-29T16:19:38Z

Currently, SENTINEL FAILOVER selects a replica to send REPLICAOF NO ONE,
then waits for PROMOTION, and then sends REPLICAOF new_ip new_port to the
old primary and other replicas to complete the failover. The problem here
is that if the old primary has written to it before the role change during
the failover, these writes will be lost.

We can use the FAILOVER command to avoid this data loss. FAILOVER was added
in 0d18a1e, it can coordinates the failover
between the primary and the replica.

Before the original step REPLICAOF NO ONE, we try to send FAILOVER TO ip port
to the primary to go through the FAILOVER process. If the primary does not
support FAILOVER, for example, returns an error, it will fallback to the old
logic, that means fallback to REPLICAOF NO ONE.

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin · 2024-10-29T16:21:38Z

This is a PR that was written very quickly (I wrote it at night and i want to push it as soon as possible). There may be problems with the details or it can be optimized.

Also see the redis/redis#6062 for more data loss details. The test case also proves data loss locally.

src/sentinel.c

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast

Yes, this looks good.

I'm not enough familiar with Sentinel. @hwware you are a sentinel expert. Do you want to take a look?

src/networking.c

src/sentinel.c

tests/sentinel/tests/05-manual.tcl

Signed-off-by: Binbin <binloveplay1314@qq.com>

codecov · 2024-10-31T11:59:45Z

Codecov Report

Attention: Patch coverage is 0% with 40 lines in your changes missing coverage. Please review.

Project coverage is 70.64%. Comparing base (c21f1dc) to head (21a6f4c).
Report is 26 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/sentinel.c	0.00%	40 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1238      +/-   ##
============================================
- Coverage     70.72%   70.64%   -0.08%     
============================================
  Files           114      114              
  Lines         63150    63195      +45     
============================================
- Hits          44660    44642      -18     
- Misses        18490    18553      +63

Files with missing lines	Coverage Δ
src/sentinel.c	`0.00% <0.00%> (ø)`

... and 12 files with indirect coverage changes

Signed-off-by: Binbin <binloveplay1314@qq.com>

tests/sentinel/tests/05-manual.tcl

hwware · 2024-11-01T02:57:52Z

Overall, LTGM, just one concern about the test case.

Signed-off-by: Binbin <binloveplay1314@qq.com>

gmbnomis · 2024-11-06T22:35:15Z

Some time ago, I proposed to use FAILOVER with SENTINEL FAILOVER in redis, see redis/redis#13118. I implemented this proposal and also ported my first implementation to Valkey some time ago, see gmbnomis@146850a.

The main issues I learned about when doing that were: (I think that this PR does not address these)

We should follow the Sentinel protocol. Clients can expect to be disconnected when the role of the primary changes (see https://valkey.io/topics/sentinel-clients/). A simple FAILOVER command does not achieve that. Thus, clients may find themselves unable to write suddenly.
FAILOVER pauses all client writes during the failover (in Valkey, it even pauses some client reads during the FAILOVER_IN_PROGRESS phase, see To avoid bouncing -REDIRECT during FAILOVER #871 (comment)). This means that Sentinels (remember that SENTINEL FAILOVER is performed without an election) will regard the former primary as down during the failover procedure and may initiate a failover. Also, if I remember correctly, the sentinel instance in charge of the failover may block its own connection to the primary.
The FAILOVER command may hang indefinitely in the FAILOVER_IN_PROGRESS phase even if TIMEOUT is given. We should probably have a mechanism to detect such a condition and do a FAILOVER ABORT.

For reference: my proposal addresses 1. It addresses 2 somewhat (probably a solution seeking a leader election would be safer here) and does not address 3 (I remember that I started working on this but I did not finish it. The idea was to monitor the INFO field "master_failover_state" and send an FAILOVER ABORT if a node is in the state for too long. )

enjoy-binbin · 2024-11-07T02:45:02Z

thanks for the detail comments.

We should follow the Sentinel protocol. Clients can expect to be disconnected when the role of the primary changes (see https://valkey.io/topics/sentinel-clients/). A simple FAILOVER command does not achieve that. Thus, clients may find themselves unable to write suddenly.

Yes, we should. I remember this thing but nerver be able to found the documentation on it. thanks for the link.

FAILOVER pauses all client writes during the failover (in Valkey, it even pauses some client reads during the FAILOVER_IN_PROGRESS phase, see To avoid bouncing -REDIRECT during FAILOVER #871 (comment)). This means that Sentinels (remember that SENTINEL FAILOVER is performed without an election) will regard the former primary as down during the failover procedure and may initiate a failover. Also, if I remember correctly, the sentinel instance in charge of the failover may block its own connection to the primary.

I did not read the whole story, i will find some times to catch up. Yes it seems you are right, all my thinking is in happy path. Maybe we should use FORCE option?

The FAILOVER command may hang indefinitely in the FAILOVER_IN_PROGRESS phase even if TIMEOUT is given. We should probably have a mechanism to detect such a condition and do a FAILOVER ABORT.

I support in this, a FORCE option is needed, in this case, the worst case is the same as before.

For reference: my proposal addresses 1. It addresses 2 somewhat (probably a solution seeking a leader election would be safer here) and does not address 3 (I remember that I started working on this but I did not finish it. The idea was to monitor the INFO field "master_failover_state" and send an FAILOVER ABORT if a node is in the state for too long. )

@gmbnomis i did not read your proposal and implementation before, i will read it later, do you want to pick it up? You have done a lot of work here, it would be great that if you could pick it up, or we can slowly push it forward in the future and make all of this happen.

gmbnomis · 2024-11-07T20:20:22Z

I did not read the whole story, i will find some times to catch up. Yes it seems you are right, all my thinking is in happy path. Maybe we should use FORCE option?

Using FAILOVER with the FORCE option would be more similar to the current SENTINEL FAILOVER in that we don't care about the replication state and just fail over at some point in time.

The FAILOVER command may hang indefinitely in the FAILOVER_IN_PROGRESS phase even if TIMEOUT is given. We should probably have a mechanism to detect such a condition and do a FAILOVER ABORT.

I support in this, a FORCE option is needed, in this case, the worst case is the same as before.

The FORCE option of FAILOVER does not apply to the FAILOVER_IN_PROGRESS state, so we might get stuck as well and need some mechanism to get out of it. (My current thinking is that this does not need to be very efficient because the probability of this happening seems rather low. But the system should be able to recover from such a failure)

For reference: my proposal addresses 1. It addresses 2 somewhat (probably a solution seeking a leader election would be safer here) and does not address 3 (I remember that I started working on this but I did not finish it. The idea was to monitor the INFO field "master_failover_state" and send an FAILOVER ABORT if a node is in the state for too long. )

@gmbnomis i did not read your proposal and implementation before, i will read it later, do you want to pick it up? You have done a lot of work here, it would be great that if you could pick it up, or we can slowly push it forward in the future and make all of this happen.

@enjoy-binbin I am not completely sure, but you mean to pick up my old proposal and re-start working on it? As there seems to be interest in this feature I'd be happy to do that. (If so, how should I start? Create a new issue with an update of redis/redis#13118 to flesh out the implementation options further?)

I am also happy to support getting another implementation into shape.

gmbnomis · 2024-11-12T07:49:09Z

@enjoy-binbin I have posted an updated description #1291 and implementation #1292 now. I think it addresses all three topics I mentioned above.

enjoy-binbin · 2024-11-12T07:54:51Z

thanks for the work, i am closing this one, let move the discusstion to there

Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss

6fa3f78

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast reviewed Oct 29, 2024

View reviewed changes

src/sentinel.c Outdated Show resolved Hide resolved

code review: remove legacy type

ad3f9d0

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast reviewed Oct 31, 2024

View reviewed changes

src/networking.c Outdated Show resolved Hide resolved

src/sentinel.c Outdated Show resolved Hide resolved

tests/sentinel/tests/05-manual.tcl Outdated Show resolved Hide resolved

tests/sentinel/tests/05-manual.tcl Outdated Show resolved Hide resolved

code review

34fb9b5

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Nov 1, 2024

renmae the type name

9c372d6

Signed-off-by: Binbin <binloveplay1314@qq.com>

hwware reviewed Nov 1, 2024

View reviewed changes

tests/sentinel/tests/05-manual.tcl Show resolved Hide resolved

update comment

21a6f4c

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin marked this pull request as draft November 7, 2024 02:45

enjoy-binbin closed this Nov 12, 2024

enjoy-binbin deleted the sentinel_failover branch November 12, 2024 07:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238

Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238

enjoy-binbin commented Oct 29, 2024 •

edited

Loading

enjoy-binbin commented Oct 29, 2024

zuiderkwast left a comment

codecov bot commented Oct 31, 2024 •

edited

Loading

hwware commented Nov 1, 2024

gmbnomis commented Nov 6, 2024 •

edited

Loading

enjoy-binbin commented Nov 7, 2024

gmbnomis commented Nov 7, 2024

gmbnomis commented Nov 12, 2024

enjoy-binbin commented Nov 12, 2024

Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238

Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238

Conversation

enjoy-binbin commented Oct 29, 2024 • edited Loading

enjoy-binbin commented Oct 29, 2024

zuiderkwast left a comment

Choose a reason for hiding this comment

codecov bot commented Oct 31, 2024 • edited Loading

Codecov Report

hwware commented Nov 1, 2024

gmbnomis commented Nov 6, 2024 • edited Loading

enjoy-binbin commented Nov 7, 2024

gmbnomis commented Nov 7, 2024

gmbnomis commented Nov 12, 2024

enjoy-binbin commented Nov 12, 2024

enjoy-binbin commented Oct 29, 2024 •

edited

Loading

codecov bot commented Oct 31, 2024 •

edited

Loading

gmbnomis commented Nov 6, 2024 •

edited

Loading