Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Brocast a PONG to all node in cluster when role changed #1295

Merged
merged 9 commits into from
Nov 22, 2024

Conversation

enjoy-binbin
Copy link
Member

@enjoy-binbin enjoy-binbin commented Nov 12, 2024

When a node role changes, we should brocast the change to notify other nodes.
For example, one primary and one replica, after a failover, the replica
became a new primary, the primary became a new replica.

And then we trigger a second cluster failover for the new replica, the
new replica will send a MFSTART to its primary, ie, the new primary.

But the new primary may reject the MFSTART due to this logic:

    } else if (type == CLUSTERMSG_TYPE_MFSTART) {
        /* This message is acceptable only if I'm a primary and the sender
         * is one of my replicas. */
        if (!sender || sender->replicaof != myself) return 1;

In the new primary views, sender is still a primary, and sender->replicaof
is NULL, so we will return. Then the manual failover timedout.

Another possibility is that other primaries refuse to vote after receiving
the FAILOVER_AUTH_REQUEST, since in their's views, sender is still a primary,
so it refuse to vote, and then manual failover timedout.

void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
    ...
        if (clusterNodeIsPrimary(node)) {
            serverLog(LL_WARNING, "Failover auth denied to %.40s (%s) for epoch %llu: it is a primary node", node->name,
                      node->human_nodename, (unsigned long long)requestCurrentEpoch);

The reason is that, currently, we only update the node->replicaof information
when we receive a PING/PONG from the sender. For details, see clusterProcessPacket.
Therefore, in some scenarios, such as clusters with many nodes and a large
cluster-ping-interval (that is, cluster-node-timeout), the role change of the node
will be very delayed.

Added a DEBUG DISABLE-CLUSTER-RANDOM-PING command, send cluster ping to a random
node every second (see clusterCron).

When a node role changes, we should brocast the change to notify other nodes.
For example, one primary and one replica, after a failover, the replica
became a new primary, the primary became a new replica.

And then we trigger a second cluster failover for the new replica, the
new replica will send a MFSTART to its primary, ie, the new primary.

But the new primary may reject the MFSTART due to this logic:
```
    } else if (type == CLUSTERMSG_TYPE_MFSTART) {
        /* This message is acceptable only if I'm a primary and the sender
         * is one of my replicas. */
        if (!sender || sender->replicaof != myself) return 1;
```

In the new primary views, sender is still a primary, and sender->replicaof
is NULL, so we will return. Then the manual failover timedout.

Another possibility is that other primaries refuse to vote after receiving
the FAILOVER_AUTH_REQUEST, since in their's views, sender is still a primary,
so it refuse to vote, and then manual failover timedout.
```
void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
    ...
        if (clusterNodeIsPrimary(node)) {
            serverLog(LL_WARNING, "Failover auth denied to %.40s (%s) for epoch %llu: it is a primary node", node->name,
                      node->human_nodename, (unsigned long long)requestCurrentEpoch);
```

The reason is that, currently, we only update the node->replicaof information
when we receive a PING/PONG from the sender. For details, see clusterProcessPacket.
Therefore, in some scenarios, such as clusters with many nodes and a large
cluster-ping-interval (that is, cluster-node-timeout), the role change of the node
will be very delayed.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Copy link

codecov bot commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.76%. Comparing base (979f4c1) to head (8d993ad).
Report is 2 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1295      +/-   ##
============================================
+ Coverage     70.74%   70.76%   +0.02%     
============================================
  Files           116      116              
  Lines         63280    63285       +5     
============================================
+ Hits          44767    44784      +17     
+ Misses        18513    18501      -12     
Files with missing lines Coverage Δ
src/cluster_legacy.c 86.74% <100.00%> (+0.22%) ⬆️
src/debug.c 53.17% <100.00%> (+0.12%) ⬆️
src/server.c 87.65% <100.00%> (+<0.01%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 11 files with indirect coverage changes

---- 🚨 Try these New Features:

src/debug.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/debug.c Outdated Show resolved Hide resolved
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/cluster_legacy.c Outdated Show resolved Hide resolved
Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think sending out the broadcast of messages is fine and should be safe.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/debug.c Outdated Show resolved Hide resolved
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin merged commit b9d2240 into valkey-io:unstable Nov 22, 2024
45 checks passed
@enjoy-binbin enjoy-binbin deleted the pong_all branch November 22, 2024 16:22
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Nov 22, 2024
vudiep411 pushed a commit to vudiep411/valkey that referenced this pull request Nov 26, 2024
When a node role changes, we should brocast the change to notify other nodes.
For example, one primary and one replica, after a failover, the replica became
a new primary, the primary became a new replica.

And then we trigger a second cluster failover for the new replica, the
new replica will send a MFSTART to its primary, ie, the new primary.

But the new primary may reject the MFSTART due to this logic:
```
    } else if (type == CLUSTERMSG_TYPE_MFSTART) {
        if (!sender || sender->replicaof != myself) return 1;
```

In the new primary views, sender is still a primary, and sender->replicaof
is NULL, so we will return. Then the manual failover timedout.

Another possibility is that other primaries refuse to vote after receiving
the FAILOVER_AUTH_REQUEST, since in their's views, sender is still a primary,
so it refuse to vote, and then manual failover timedout.
```
void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
    ...
        if (clusterNodeIsPrimary(node)) {
            serverLog(LL_WARNING, "Failover auth denied to...
```

The reason is that, currently, we only update the node->replicaof information
when we receive a PING/PONG from the sender. For details, see clusterProcessPacket.
Therefore, in some scenarios, such as clusters with many nodes and a large
cluster-ping-interval (that is, cluster-node-timeout), the role change of the node
will be very delayed.

Added a DEBUG DISABLE-CLUSTER-RANDOM-PING command, send cluster ping
to a random node every second (see clusterCron).

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: vudiep411 <vdiep@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants