Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11243. SCM SafeModeRule Support EC. #7008

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

slfan1989
Copy link
Contributor

@slfan1989 slfan1989 commented Jul 31, 2024

What changes were proposed in this pull request?

We aim for SCM to immediately switch to leader once it exits safe mode. Currently, due to certain issues, we need to wait for at least one full container report from a DataNode before proceeding with the switch.

Currently, SCM SafeMode has the following issues:

Issue1

DataNodeSafeModeRule cannot effectively verify the registration status of DataNodes. In most cases, as long as there are more than one DataNode, this rule passes. Therefore, we need to strengthen this rule.

Issue2

ContainerSafeModeRule does not support verification of EC (Erasure Coding) Containers. EC Containers differ significantly from RATIS/THREE Containers because EC Containers require determining how many replicas are needed based on the EC type. For instance, for EC-6-3-1024K, we need to ensure that the Container reports having all 6 replicas before it can provide services.

This PR aims to enhance and improve the above two points.

For code Improve:

Enhance DataNodeSafeModeRule

For the registration of Datanodes, we need to obtain the complete list of Datanodes from SCM. This list can be retrieved from the Pipeline. I pass PipelineManager as a parameter into DataNodeSafeModeRule to calculate the number of Datanodes.

Enhance ContainerSafeModeRule

  • Enhance replica validation for EC containers. Obtain the required replicas based on ECReplicationConfig. Consider container reporting complete only when sufficient replicas have been reported.

  • Modify the message sending location of ContainerSafeModeRule.

// TODO : Return the list of Nodes that forms the SCM HA.
RegisteredCommand registeredCommand = scm.getScmNodeManager()
.register(datanodeDetails, nodeReport, pipelineReportsProto,
layoutInfo);
if (registeredCommand.getError()
== SCMRegisteredResponseProto.ErrorCode.success) {
eventPublisher.fireEvent(CONTAINER_REPORT,
new SCMDatanodeHeartbeatDispatcher.ContainerReportFromDatanode(
datanodeDetails, containerReportsProto));
eventPublisher.fireEvent(SCMEvents.NODE_REGISTRATION_CONT_REPORT,
new NodeRegistrationContainerReport(datanodeDetails,
containerReportsProto));
eventPublisher.fireEvent(PIPELINE_REPORT,
new PipelineReportFromDatanode(datanodeDetails,
pipelineReportsProto));
}

There are some issues in this part of the code. The handling of NODE_REGISTRATION_CONT_REPORT and CONTAINER_REPORT is asynchronous. There is a scenario where NODE_REGISTRATION_CONT_REPORT processing completes, but CONTAINER_REPORT processing does not. This still leads to insufficient EC replicas issue.

I adjusted the sending position of NODE_REGISTRATION_CONT_REPORT (requiring the message to be sent only after CONTAINER_REPORT processing completes) and introduced a new type, CONTAINER_REGISTRATION_REPORT, to distinguish it.

Page display:

image

What is the link to the Apache JIRA

JIRA: HDDS-11243: SCM SafeModeRule Support EC.

How was this patch tested?

Junit Test & Production environment validation

@slfan1989 slfan1989 marked this pull request as draft July 31, 2024 02:15
@slfan1989 slfan1989 changed the title HDDS-11243: SCM SafeModeRule Support EC. HDDS-11243. SCM SafeModeRule Support EC. Aug 1, 2024
@slfan1989 slfan1989 marked this pull request as ready for review August 4, 2024 13:04
@@ -199,6 +201,11 @@ public void onMessage(final ContainerReportFromDatanode reportFromDatanode,
// list
processMissingReplicas(datanodeDetails, expectedContainersInDatanode);
containerManager.notifyContainerReportProcessing(true, true);
if (reportFromDatanode.isRegister()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the CONTAINER_REPORT is completed, we send the message to CONTAINER_REGISTRATION_REPORT to ensure that the container count is accurate.

@slfan1989
Copy link
Contributor Author

@errose28 @siddhantsangwan Can you help review this pr? Thank you very much!

@siddhantsangwan
Copy link
Contributor

@slfan1989 Thanks for taking this up, I was earlier thinking of fixing this myself. I'll review the PR soon.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Aug 22, 2024

@siddhantsangwan Can you help review this pr? Thank you very much! The unit test errors are not caused by our changes.

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfan1989 I've reviewed this partly. Have some comments below.

if (replicationConfig != null && replicationConfig instanceof ECReplicationConfig) {
ECReplicationConfig ecReplicationConfig = (ECReplicationConfig) replicationConfig;
int data = ecReplicationConfig.getData();
if (uuids != null && uuids.size() > data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Ratis, just one replica per container is required. So for EC, data number of Datanodes should be sufficient. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. For EC, the amount of data we have is already sufficient. I will improve the code.

Comment on lines 175 to 187
if (ratisContainerMap.containsKey(containerID)) {
ratisContainerDNsMap.computeIfAbsent(containerID, key -> Sets.newHashSet());
ratisContainerDNsMap.get(containerID).add(datanodeUUID);
if (!reportedConatinerIDSet.contains(containerID)) {
Set<UUID> uuids = ratisContainerDNsMap.get(containerID);
if (uuids != null && uuids.size() >= 1) {
ratisContainerWithMinReplicas.getAndAdd(1);
reportedConatinerIDSet.add(containerID);
getSafeModeMetrics()
.incCurrentContainersWithOneReplicaReportedCount();
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really understand this change. It seems correct, but is there any reason this logic isn't the same as before? Why do we need to track Datanodes in a set for Ratis containers? Is it because ratisContainerDNsMap and reportedConatinerIDSet are going to be used somewhere else as well? Or is it done this way just so it's similar to the EC logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the question!

I didn't really understand this change. It seems correct, but is there any reason this logic isn't the same as before?

The previous logic was correct. I made this modification for two reasons:

  • To align with the EC's logic and improve code readability.
  • To facilitate the retrieval of additional data in future pr. For example, this will allow users not only to understand the progress but also to identify which containers have not reported and which DataNodes are included in the reported containers.

Why do we need to track Datanodes in a set for Ratis containers? Is it because ratisContainerDNsMap and reportedConatinerIDSet are going to be used somewhere else as well?

The type of ratisContainerDNsMap is Map<Long, Set<UUID>>, where the key is the ContainerId. The reason for using a Set as the value is to avoid retaining duplicate DN information, as we may encounter the same DN registering multiple times.

Or is it done this way just so it's similar to the EC logic?

Here's one reason; it has already been explained in the previous comment.

Can we modify it this way? The original code contains some insufficient information.

long ratisCutOff = (long) Math.ceil(ratisMaxContainer * safeModeCutoff);
long ecCutOff = (long) Math.ceil(ecMaxContainer * safeModeCutoff);

getSafeModeMetrics().setNumContainerWithOneReplicaReportedThreshold(ratisCutOff);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set EC metrics as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will improve this part of the code.

Comment on lines 276 to 277
private void reInitializeRule() {
containerMap.clear();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like most of the code inside this method is the same as before. If possible, let's refactor this to avoid repetition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will improve this part of the code.

@@ -75,10 +79,18 @@ public void setNumContainerWithOneReplicaReportedThreshold(long val) {
this.numContainerWithOneReplicaReportedThreshold.set(val);
}

public void setNumContainerWithECDataReplicaReportedThreshold(long val) {
this.numContainerWithECDataReplicaReportedThreshold.incr(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use set() instead of incr().

@slfan1989
Copy link
Contributor Author

@slfan1989 I've reviewed this partly. Have some comments below.

Thank you very much for reviewing this PR! I will respond to your questions as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants