Broadcast a PONG to all nodes in the cluster when a node's role changes #1295

enjoy-binbin · 2024-11-12T16:44:58Z

When a node's role changes, we should broadcast the change to notify the other nodes.
For example, with one primary and one replica, after a failover the replica
becomes the new primary and the primary becomes the new replica.

If we then trigger a second CLUSTER FAILOVER on the new replica, the
new replica sends an MFSTART to its primary, i.e., the new primary.

But the new primary may reject the MFSTART due to this logic:

    } else if (type == CLUSTERMSG_TYPE_MFSTART) {
        /* This message is acceptable only if I'm a primary and the sender
         * is one of my replicas. */
        if (!sender || sender->replicaof != myself) return 1;

From the new primary's point of view, the sender is still a primary and sender->replicaof
is NULL, so the message is ignored and the manual failover times out.
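To make the failure mode concrete, here is a minimal sketch that models only the MFSTART guard quoted above. The `model_node` type and `model_accept_mfstart` function are hypothetical simplifications for illustration, not Valkey's `clusterNode` or its packet handler; the point is that the check depends on the receiver's local view of `replicaof`, which stays NULL until a PING/PONG from the sender updates it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a peer as seen by one node.
 * This is NOT the real clusterNode struct. */
typedef struct model_node {
    const char *name;
    struct model_node *replicaof; /* NULL while we still believe it is a primary */
} model_node;

/* Same shape as the MFSTART guard quoted above: accept only if the sender
 * is known and, in OUR local view, replicates from us.
 * Returns true if the MFSTART would be processed, false if it is dropped. */
static bool model_accept_mfstart(const model_node *myself, const model_node *sender) {
    if (sender == NULL || sender->replicaof != myself) return false;
    return true;
}
```

With a stale view (`replicaof == NULL`) the MFSTART is dropped; once the receiver's view records the sender as its replica, the same message is accepted.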

Another possibility is that other primaries refuse to vote after receiving
the FAILOVER_AUTH_REQUEST: in their view the sender is still a primary,
so they deny the vote and the manual failover times out:

    void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
        ...
            if (clusterNodeIsPrimary(node)) {
                serverLog(LL_WARNING, "Failover auth denied to %.40s (%s) for epoch %llu: it is a primary node", node->name,
                          node->human_nodename, (unsigned long long)requestCurrentEpoch);
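A similarly simplified sketch of the vote side, assuming three voting primaries and therefore a quorum of two. Only the "it is a primary node" check is modelled; the real clusterSendFailoverAuthIfNeeded performs further checks (for example on epochs and on when the voter last voted) that are left out here. `model_peer_view`, `model_vote_allowed` and `model_count_votes` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of how one voting primary sees the requester.
 * This is NOT the real clusterNode. */
typedef struct {
    const char *name;
    bool is_primary; /* the voter's (possibly stale) belief about the requester */
} model_peer_view;

/* Models only the "requester must not be a primary" guard. */
static bool model_vote_allowed(const model_peer_view *requester) {
    if (requester->is_primary) return false; /* "it is a primary node" */
    return true;
}

/* Counts the voters that would pass this single check, one view per voter. */
static int model_count_votes(const model_peer_view *views, size_t n) {
    int votes = 0;
    for (size_t i = 0; i < n; i++)
        if (model_vote_allowed(&views[i])) votes++;
    return votes;
}
```

In this toy setup, if two of the three voters still see the requester as a primary, it collects a single vote, short of the two needed, so the failover cannot be authorized before it times out.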

The reason is that, currently, we only update node->replicaof when we
receive a PING/PONG from the sender (see clusterProcessPacket for details).
Therefore, in some scenarios, such as clusters with many nodes and a large
cluster-ping-interval (that is, cluster-node-timeout), other nodes may learn
about the role change only after a long delay.
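The idea in this PR, as the title describes, is to have the node whose role changed send a PONG to every node instead of waiting for the regular ping schedule. The sketch below models only that effect: each peer keeps a flag for whether it believes node X is a replica, and any PING/PONG from X refreshes it. The `model_cluster` type and all `model_*` functions are hypothetical; real messages, gossip sections, and failure detection are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PEERS 16

/* Hypothetical model: each peer records whether it currently believes
 * node X (whose role just changed) is a replica. */
typedef struct {
    bool believes_x_is_replica[MODEL_MAX_PEERS];
    size_t npeers;
} model_cluster;

/* A PING or PONG from X carries X's current role, so the receiving peer
 * refreshes its view. Only that single effect is modelled. */
static void model_deliver_ping_or_pong_from_x(model_cluster *c, size_t peer, bool x_is_replica) {
    if (peer < c->npeers) c->believes_x_is_replica[peer] = x_is_replica;
}

/* The broadcast idea: on a role change, X sends a PONG to every known peer. */
static void model_broadcast_pong_from_x(model_cluster *c, bool x_is_replica) {
    for (size_t i = 0; i < c->npeers; i++)
        model_deliver_ping_or_pong_from_x(c, i, x_is_replica);
}

/* Number of peers whose view of X's role is still out of date. */
static size_t model_count_stale(const model_cluster *c, bool x_is_replica) {
    size_t stale = 0;
    for (size_t i = 0; i < c->npeers; i++)
        if (c->believes_x_is_replica[i] != x_is_replica) stale++;
    return stale;
}
```

For example, with five peers where only one has happened to exchange a PING/PONG with X, four views stay stale until the broadcast, after which none are.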

Added a DEBUG DISABLE-CLUSTER-RANDOM-PING command, which disables sending a cluster
PING to a random node every second (see clusterCron).
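A minimal sketch of how such a debug switch can gate the periodic random ping, assuming a cron that runs about 10 times per second and sends one random ping on every 10th tick. The struct and field names are hypothetical, not the server's actual variables, and pings triggered by cluster-node-timeout are not modelled. With the random ping disabled, a test presumably can no longer rely on it to spread the role change by chance, which makes the effect of the PONG broadcast observable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state toggled by
 * DEBUG DISABLE-CLUSTER-RANDOM-PING; not the real variable names. */
typedef struct {
    bool disable_random_ping;
    long iteration;          /* cron ticks so far */
    long random_pings_sent;  /* random pings this node has sent */
} model_cron_state;

/* One cron tick. Assumes ~10 ticks per second, so a random ping on every
 * 10th tick is roughly one per second unless the debug flag is set. */
static void model_cron_tick(model_cron_state *s) {
    if (!s->disable_random_ping && (s->iteration % 10) == 0)
        s->random_pings_sent++;
    s->iteration++;
}
```

Over 100 ticks (about ten seconds) the enabled state sends 10 random pings and the disabled state sends none.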

Signed-off-by: Binbin <binloveplay1314@qq.com>

codecov · 2024-11-12T16:59:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.76%. Comparing base (979f4c1) to head (8d993ad).
Report is 2 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1295      +/-   ##
============================================
+ Coverage     70.74%   70.76%   +0.02%     
============================================
  Files           116      116              
  Lines         63280    63285       +5     
============================================
+ Hits          44767    44784      +17     
+ Misses        18513    18501      -12

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`86.74% <100.00%> (+0.22%)`	⬆️
src/debug.c	`53.17% <100.00%> (+0.12%)`	⬆️
src/server.c	`87.65% <100.00%> (+<0.01%)`	⬆️
src/server.h	`100.00% <ø> (ø)`

... and 11 files with indirect coverage changes

src/cluster_legacy.c

madolson

Overall I think sending out the broadcast of messages is fine and should be safe.

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Binbin <binloveplay1314@qq.com>

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: vudiep411 <vdiep@amazon.com>

enjoy-binbin requested review from zuiderkwast and PingXie November 12, 2024 16:45

hwware reviewed Nov 12, 2024

src/debug.c (outdated, resolved)

src/server.h (outdated, resolved)

update comment and var name

5f0ba6a

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast reviewed Nov 17, 2024

src/debug.c (outdated, resolved)

update the name

40838da

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast approved these changes Nov 18, 2024

enjoy-binbin requested a review from madolson November 18, 2024 14:35

madolson reviewed Nov 19, 2024

src/cluster_legacy.c (outdated, resolved)

madolson reviewed Nov 19, 2024

enjoy-binbin added 2 commits November 20, 2024 10:57

change the debug name to disable-cluster-random-ping

6d3a83e

Signed-off-by: Binbin <binloveplay1314@qq.com>

Merge remote-tracking branch 'upstream/unstable' into pong_all

67a5073

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast reviewed Nov 20, 2024

src/debug.c (outdated, resolved)

Update src/debug.c

17e2ab0

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Binbin <binloveplay1314@qq.com>

madolson approved these changes Nov 22, 2024

enjoy-binbin added 3 commits November 22, 2024 22:56

Merge remote-tracking branch 'upstream/unstable' into pong_all

fe0d417

Signed-off-by: Binbin <binloveplay1314@qq.com>

remove bad merge

b75cc40

Signed-off-by: Binbin <binloveplay1314@qq.com>

adjust the position

8d993ad

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin merged commit b9d2240 into valkey-io:unstable Nov 22, 2024
45 checks passed

enjoy-binbin deleted the pong_all branch November 22, 2024 16:22

enjoy-binbin added the release-notes label ("This issue should get a line item in the release notes") Nov 22, 2024
