Handle cluster status reconcile autoscaler
sp1999 committed Jun 11, 2024
1 parent 505b20b commit 0706a51
Showing 26 changed files with 383 additions and 44 deletions.
91 changes: 91 additions & 0 deletions designs/handle-autoscaler-cluster-status-reconciliation.md
@@ -0,0 +1,91 @@
# Handle cluster status reconciliation in EKS Anywhere controller for Autoscaling Configuration

## Problem Statement

When a customer configures [autoscaling](https://anywhere.eks.amazonaws.com/docs/getting-started/optional/autoscaling/) for any of their worker node groups, the number of worker nodes in the cluster is managed by the [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), which ensures that all pods have a place to run and that there are no unneeded nodes. The cluster autoscaler also updates the replicas in the corresponding machine deployment (md) object to match the actual number of machines provisioned. When the EKS-A controller reconciles the cluster status, it sees that the expected count of worker nodes does not match the observed count and marks the `WorkersReady` condition `False` with a message such as `Scaling down worker nodes, 1 expected (10 actual)`. This happens because the controller takes the expected count from the worker node group count in the cluster spec, which is set by the customer during cluster creation or upgrade, whereas the actual replicas are managed by the autoscaler. This doc discusses options to fix this issue in the EKS Anywhere controller cluster status reconciliation for worker node groups with autoscaling configured.
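
For illustration, a worker node group configured like the following (a minimal example; the group and machine config names are hypothetical) can produce exactly this mismatch:

```yaml
workerNodeGroupConfigurations:
  - name: md-0                 # hypothetical group name
    count: 1                   # expected count, set by the user at create/upgrade time
    autoscalingConfiguration:
      minCount: 1
      maxCount: 10             # the autoscaler may set replicas anywhere in [1, 10]
    machineGroupRef:
      kind: VSphereMachineConfig
      name: worker-machine-a   # hypothetical machine config name
```

If the autoscaler scales the group to 10 replicas, status reconciliation compares the observed 10 against the spec's `count: 1` and reports the message above.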

## Overview of Solution

#### Handling cluster status reconciliation

Update the [totalExpected](https://github.com/aws/eks-anywhere/blob/a2a19920f4b7b54f6bc21f608ee5ecd5c6f0c45b/pkg/controller/clusters/status.go#L202) count of worker nodes to equal the count of worker nodes specified in the cluster spec *only* for worker node groups without autoscaling configured. For worker node groups configured with autoscaling, we add another validation to the `workersReady` condition which checks that the number of replicas lies between the `minCount` and `maxCount` specified in the autoscaling configuration. This validation runs only after all the existing validations for worker node groups without autoscaling have passed.
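
A minimal sketch of this check, using a hypothetical helper name and the `AutoscalerConstraintNotMet` reason introduced by this commit (the actual validation in `pkg/controller/clusters/status.go` may be structured differently):

```go
package clusters

import (
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"

	anywherev1 "github.com/aws/eks-anywhere/pkg/api/v1alpha1"
)

// validateWorkerReplicasWithinAutoscalerBounds is a hypothetical helper: for a
// worker node group with autoscaling configured, workersReady only requires
// that the observed replicas fall inside [minCount, maxCount]; fixed-count
// groups keep the existing expected-vs-observed comparison.
func validateWorkerReplicasWithinAutoscalerBounds(wng anywherev1.WorkerNodeGroupConfiguration, md *clusterv1.MachineDeployment) error {
	if wng.AutoScalingConfiguration == nil {
		return nil // covered by the existing totalExpected count check
	}
	replicas := int(md.Status.Replicas)
	minCount := wng.AutoScalingConfiguration.MinCount
	maxCount := wng.AutoScalingConfiguration.MaxCount
	if replicas < minCount || replicas > maxCount {
		// the condition would be marked False with reason AutoscalerConstraintNotMet
		return fmt.Errorf("worker node group %s has %d replicas, outside the autoscaler range [%d, %d]",
			wng.Name, replicas, minCount, maxCount)
	}
	return nil
}
```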

#### Handling cluster spec updates

When the cluster spec is applied during cluster create/upgrade, we will not set the replicas in the md template for worker node groups that have autoscaling configured. For new md objects created during cluster creation, replicas will default to the `minCount` specified in the autoscaling configuration; for cluster upgrades, replicas will keep the old md object's value.
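
A sketch of the corresponding template change, assuming (consistent with CAPI's defaulting behavior for MachineDeployments that carry the autoscaler min/max-size annotations) that leaving replicas unset lets creation default to `minCount` while upgrades keep the current value; the function name is illustrative:

```go
package clusterapi

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"

	anywherev1 "github.com/aws/eks-anywhere/pkg/api/v1alpha1"
)

// setWorkerReplicas is illustrative: replicas are only pinned for fixed-count
// groups, so an autoscaler-managed value is never overwritten on re-apply.
func setWorkerReplicas(md *clusterv1.MachineDeployment, wng anywherev1.WorkerNodeGroupConfiguration) {
	if wng.AutoScalingConfiguration != nil {
		// Leave replicas unset: a new md defaults from the min-size annotation,
		// and an existing md keeps its current (autoscaler-managed) value.
		md.Spec.Replicas = nil
		return
	}
	if wng.Count != nil {
		replicas := int32(*wng.Count)
		md.Spec.Replicas = &replicas
	}
}
```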

**Pros:**

* Removes the dependency on the worker node group count for cluster creation as well
* The worker node count is ignored, which is the desired behavior because the autoscaler should manage it

**Cons:**

* The source of truth for the worker node count would be the md replicas field, which does not come from an object that we own

#### Testing

An E2E test will be added to cover cluster upgrades for worker node groups configured with the autoscaler.

#### Documentation

We need to explicitly document that the count will be ignored for all worker node group configurations that have autoscaling configured in the cluster spec, for both cluster creation and upgrade.

## Alternate Solutions Considered

In the option labels below, the number corresponds to the cluster status reconciliation approach (2 options) and the letter corresponds to the cluster spec update approach (3 options).

### **Option 1a**

#### Handling cluster status reconciliation

For each worker node group, if the [count](https://anywhere.eks.amazonaws.com/docs/getting-started/vsphere/vsphere-spec/#workernodegroupconfigurationscount-required) in the worker node group configuration of the cluster object does not equal the replicas field in the machine deployment object, update the count to match the md replicas. This will be implemented in the _MachineDeploymentReconciler_ in the EKS Anywhere controller.
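
A sketch of this back-propagation, assuming a hypothetical `machineDeploymentName` mapping from a worker node group to its md object (the real matching logic in the controller may differ):

```go
package controllers

import (
	"context"
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	anywherev1 "github.com/aws/eks-anywhere/pkg/api/v1alpha1"
)

// machineDeploymentName is a hypothetical helper; EKS-A derives md names from
// the cluster and worker node group names (actual naming may differ).
func machineDeploymentName(cluster *anywherev1.Cluster, wng anywherev1.WorkerNodeGroupConfiguration) string {
	return fmt.Sprintf("%s-%s", cluster.Name, wng.Name)
}

// syncWorkerCountFromMachineDeployment copies the autoscaler-managed replicas
// back into the cluster spec's count so status reconciliation sees no mismatch.
func syncWorkerCountFromMachineDeployment(ctx context.Context, c client.Client, cluster *anywherev1.Cluster, md *clusterv1.MachineDeployment) error {
	for i := range cluster.Spec.WorkerNodeGroupConfigurations {
		wng := &cluster.Spec.WorkerNodeGroupConfigurations[i]
		if machineDeploymentName(cluster, *wng) != md.Name {
			continue
		}
		replicas := int(md.Status.Replicas)
		if wng.Count == nil || *wng.Count != replicas {
			wng.Count = &replicas
			return c.Update(ctx, cluster) // persist the reconciled count
		}
	}
	return nil
}
```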

#### Handling cluster spec updates

When the cluster spec is applied during cluster create/upgrade, we will not set the replicas in the md template for worker node groups that have autoscaling configured. For new md objects created during cluster creation, replicas will default to the `minCount` specified in the autoscaling configuration; for cluster upgrades, replicas will keep the old md object's value.

### **Option 1b**

#### Handling cluster status reconciliation

For each worker node group, if the [count](https://anywhere.eks.amazonaws.com/docs/getting-started/vsphere/vsphere-spec/#workernodegroupconfigurationscount-required) in the worker node group configuration of the cluster object does not equal the replicas field in the machine deployment object, update the count to match the md replicas. This will be implemented in the _MachineDeploymentReconciler_ in the EKS Anywhere controller.

#### Handling cluster spec updates

We will deny any updates to the worker node count in the webhook if the autoscaling configuration is set. This ensures the controller does not re-apply the md object and overwrite the replicas field, which should be managed only by the autoscaler.
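
A minimal sketch of the webhook rule (hypothetical function; the actual EKS-A cluster validation webhook is organized differently):

```go
package v1alpha1

import "fmt"

// validateImmutableWorkerCount is a hypothetical illustration: once a group
// has autoscaling configured, edits to its count are rejected.
func validateImmutableWorkerCount(oldWNG, newWNG WorkerNodeGroupConfiguration) error {
	if newWNG.AutoScalingConfiguration == nil {
		return nil // count updates remain allowed for fixed-size groups
	}
	if oldWNG.Count != nil && newWNG.Count != nil && *oldWNG.Count != *newWNG.Count {
		return fmt.Errorf("worker node group %s: count cannot be updated when autoscalingConfiguration is set", newWNG.Name)
	}
	return nil
}
```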

### **Option 1c**

#### Handling cluster status reconciliation

For each worker node group, if the [count](https://anywhere.eks.amazonaws.com/docs/getting-started/vsphere/vsphere-spec/#workernodegroupconfigurationscount-required) in the worker node group configuration of the cluster object does not equal the replicas field in the machine deployment object, update the count to match the md replicas. This will be implemented in the _MachineDeploymentReconciler_ in the EKS Anywhere controller.

#### Handling cluster spec updates

For each worker node group, if the [count](https://anywhere.eks.amazonaws.com/docs/getting-started/vsphere/vsphere-spec/#workernodegroupconfigurationscount-required) in the worker node group configuration of the cluster object does not equal the replicas field in the machine deployment object, update the count to match the md replicas. This will be implemented in the _ClusterReconciler_ in the EKS Anywhere controller.

### **Option 2a**

#### Handling cluster status reconciliation

Update the [totalExpected](https://github.com/aws/eks-anywhere/blob/a2a19920f4b7b54f6bc21f608ee5ecd5c6f0c45b/pkg/controller/clusters/status.go#L202) count of worker nodes to equal the count of worker nodes specified in the cluster spec *only* for worker node groups without autoscaling configured. For worker node groups configured with autoscaling, we add another validation to the `workersReady` condition which checks that the number of replicas lies between the `minCount` and `maxCount` specified in the autoscaling configuration. This validation runs only after all the existing validations for worker node groups without autoscaling have passed.

#### Handling cluster spec updates

We will deny any updates to the worker node count in the webhook if the autoscaling configuration is set. This ensures the controller does not re-apply the md object and overwrite the replicas field, which should be managed only by the autoscaler.

The proposed solution is better than this option because it does not force users to remove their autoscaling configuration if they decide to update the worker node count and stop relying on the autoscaler.

### **Option 2b**

#### Handling cluster status reconciliation

Update the [totalExpected](https://github.com/aws/eks-anywhere/blob/a2a19920f4b7b54f6bc21f608ee5ecd5c6f0c45b/pkg/controller/clusters/status.go#L202) count of worker nodes to equal the count of worker nodes specified in the cluster spec *only* for worker node groups without autoscaling configured. For worker node groups configured with autoscaling, we add another validation to the `workersReady` condition which checks that the number of replicas lies between the `minCount` and `maxCount` specified in the autoscaling configuration. This validation runs only after all the existing validations for worker node groups without autoscaling have passed.

#### Handling cluster spec updates

For each worker node group, if the [count](https://anywhere.eks.amazonaws.com/docs/getting-started/vsphere/vsphere-spec/#workernodegroupconfigurationscount-required) in the worker node group configuration of the cluster object does not equal the replicas field in the machine deployment object, update the count to match the md replicas. This will be implemented in the _ClusterReconciler_ in the EKS Anywhere controller.

Option 1c is better than this option because it can reuse the same count-update function in both the _MachineDeploymentReconciler_ and the _ClusterReconciler_, and it does not require changing any cluster status reconciliation logic.
5 changes: 4 additions & 1 deletion docs/content/en/docs/clustermgmt/cluster-status.md
@@ -77,7 +77,10 @@ Conditions provide a high-level status report representing an assessment of clus
* `DefaultCNIConfigured` - reports the configuration state of the default CNI specified in the cluster specifications. It will be marked as `True` once the default CNI has been successfully configured on the cluster.
However, if the EKS Anywhere default cilium CNI has been [configured to skip upgrades]({{< relref "../getting-started/optional/cni/#use-a-custom-cni" >}}) in the cluster specification, then this condition will be marked as `False` with the reason `SkipUpgradesForDefaultCNIConfigured`.

* `WorkersReady` - reports that the condition of the current state of worker machines versus the desired state specified in the Cluster specification. This condition is marked `True` once the number of worker nodes in the cluster match the expected number of worker nodes as defined in the cluster specifications and all the worker nodes are up to date and ready.
* `WorkersReady` - reports the condition of the current state of worker machines versus the desired state specified in the Cluster specification. This condition is marked `True` once the following conditions are met:
  * For worker node groups with [autoscaling]({{< relref "../getting-started/optional/autoscaling" >}}) configured, the number of worker nodes in that group lies between the `minCount` and `maxCount` defined in the cluster specification.
  * For fixed worker node groups, the number of worker nodes in that group matches the expected number defined in the cluster specification.
  * All the worker nodes are up to date and ready.

* `Ready` - reports a summary of the following conditions: `ControlPlaneInitialized`, `ControlPlaneReady`, and `WorkersReady`. It indicates an overall operational state of the EKS Anywhere cluster. It will be marked `True` once the current state of the cluster has fully reached the desired state specified in the Cluster spec.

@@ -184,7 +184,7 @@ You can omit `workerNodeGroupConfigurations` when creating Bare Metal clusters.
>**_NOTE:_** Empty `workerNodeGroupConfigurations` is not supported when Kubernetes version <= 1.21.

### workerNodeGroupConfigurations[*].count (optional)
Number of worker nodes. (default: `1`) Optional if autoscalingConfiguration is used, in which case count will default to `autoscalingConfiguration.minCount`.
Number of worker nodes. (default: `1`) It will be ignored if the [cluster autoscaler curated package]({{< relref "../../packages/cluster-autoscaler/addclauto" >}}) is installed and `autoscalingConfiguration` is used to specify the desired range of replicas.

Refer to [troubleshooting machine health check remediation not allowed]({{< relref "../../troubleshooting/troubleshooting/#machine-health-check-shows-remediation-is-not-allowed" >}}) and choose a sufficient number to allow machine health check remediation.

@@ -235,7 +235,7 @@ This takes in a list of node groups that you can define for your workers.
You may define one or more worker node groups.

### workerNodeGroupConfigurations[*].count (optional)
Number of worker nodes. (default: `1`) Optional if autoscalingConfiguration is used, in which case count will default to `autoscalingConfiguration.minCount`.
Number of worker nodes. (default: `1`) It will be ignored if the [cluster autoscaler curated package]({{< relref "../../packages/cluster-autoscaler/addclauto" >}}) is installed and `autoscalingConfiguration` is used to specify the desired range of replicas.

Refer to [troubleshooting machine health check remediation not allowed]({{< relref "../../troubleshooting/troubleshooting/#machine-health-check-shows-remediation-is-not-allowed" >}}) and choose a sufficient number to allow machine health check remediation.

@@ -190,7 +190,7 @@ creation process are [here]({{< relref "./nutanix-prereq/#prepare-a-nutanix-envi
This takes in a list of node groups that you can define for your workers. You may define one or more worker node groups.

### workerNodeGroupConfigurations[*].count (optional)
Number of worker nodes. (default: `1`) Optional if `autoscalingConfiguration` is used, in which case count will default to `autoscalingConfiguration.minCount`.
Number of worker nodes. (default: `1`) It will be ignored if the [cluster autoscaler curated package]({{< relref "../../packages/cluster-autoscaler/addclauto" >}}) is installed and `autoscalingConfiguration` is used to specify the desired range of replicas.

Refer to [troubleshooting machine health check remediation not allowed]({{< relref "../../troubleshooting/troubleshooting/#machine-health-check-shows-remediation-is-not-allowed" >}}) and choose a sufficient number to allow machine health check remediation.

3 changes: 1 addition & 2 deletions docs/content/en/docs/getting-started/optional/autoscaling.md
@@ -35,10 +35,9 @@ Configure an EKS Anywhere worker node group to be picked up by a Cluster Autosca
machineGroupRef:
kind: VSphereMachineConfig
name: worker-machine-b
count: 1
```
Note that if no `count` is specified for the worker node group it will default to the `autoscalingConfiguration.minCount` value.
Note that if `count` is specified for the worker node group, its value will be ignored during cluster creation as well as cluster upgrade. If only one of `minCount` or `maxCount` is specified, then the other will have a default value of `0` and `count` will have a default value of `minCount`.

EKS Anywhere automatically applies the following annotations to your `MachineDeployment` objects for worker node groups with autoscaling enabled. The Cluster Autoscaler component uses these annotations to identify which node groups to autoscale. If a node group is not autoscaling as expected, check for these annotations on the `MachineDeployment` to troubleshoot.
```
cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "<minCount>"
cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "<maxCount>"
```
2 changes: 1 addition & 1 deletion docs/content/en/docs/getting-started/snow/snow-spec.md
@@ -147,7 +147,7 @@ This takes in a list of node groups that you can define for your workers.
You may define one or more worker node groups.

### workerNodeGroupConfigurations[*].count (optional)
Number of worker nodes. (default: `1`) Optional if autoscalingConfiguration is used, in which case count will default to `autoscalingConfiguration.minCount`.
Number of worker nodes. (default: `1`) It will be ignored if the [cluster autoscaler curated package]({{< relref "../../packages/cluster-autoscaler/addclauto" >}}) is installed and `autoscalingConfiguration` is used to specify the desired range of replicas.

Refer to [troubleshooting machine health check remediation not allowed]({{< relref "../../troubleshooting/troubleshooting/#machine-health-check-shows-remediation-is-not-allowed" >}}) and choose a sufficient number to allow machine health check remediation.

@@ -159,7 +159,7 @@ This takes in a list of node groups that you can define for your workers.
You may define one or more worker node groups.

### workerNodeGroupConfigurations[*].count (optional)
Number of worker nodes. (default: `1`) Optional if the [cluster autoscaler curated package]({{< relref "../../packages/cluster-autoscaler/addclauto" >}}) is installed and autoscalingConfiguration is used, in which case count will default to `autoscalingConfiguration.minCount`.
Number of worker nodes. (default: `1`) It will be ignored if the [cluster autoscaler curated package]({{< relref "../../packages/cluster-autoscaler/addclauto" >}}) is installed and `autoscalingConfiguration` is used to specify the desired range of replicas.

Refer to [troubleshooting machine health check remediation not allowed]({{< relref "../../troubleshooting/troubleshooting/#machine-health-check-shows-remediation-is-not-allowed" >}}) and choose a sufficient number to allow machine health check remediation.

3 changes: 3 additions & 0 deletions pkg/api/v1alpha1/condition_consts.go
@@ -55,6 +55,9 @@ const (

// ExternalEtcdNotAvailable reports the Cluster status is waiting for Etcd to be available.
ExternalEtcdNotAvailable = "ExternalEtcdNotAvailable"

// AutoscalerConstraintNotMetReason reports the Cluster status is waiting for autoscaler constraint to be met.
AutoscalerConstraintNotMetReason = "AutoscalerConstraintNotMet"
)

const (
9 changes: 5 additions & 4 deletions pkg/clusterapi/autoscaler.go
@@ -8,9 +8,10 @@ import (
anywherev1 "github.com/aws/eks-anywhere/pkg/api/v1alpha1"
)

// Autoscaler annotation constants.
const (
nodeGroupMinSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
nodeGroupMaxSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size"
NodeGroupMinSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
NodeGroupMaxSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size"
)

func ConfigureAutoscalingInMachineDeployment(md *clusterv1.MachineDeployment, autoscalingConfig *anywherev1.AutoScalingConfiguration) {
@@ -22,6 +23,6 @@ func ConfigureAutoscalingInMachineDeployment(md *clusterv1.MachineDeployment, au
md.ObjectMeta.Annotations = map[string]string{}
}

md.ObjectMeta.Annotations[nodeGroupMinSizeAnnotation] = strconv.Itoa(autoscalingConfig.MinCount)
md.ObjectMeta.Annotations[nodeGroupMaxSizeAnnotation] = strconv.Itoa(autoscalingConfig.MaxCount)
md.ObjectMeta.Annotations[NodeGroupMinSizeAnnotation] = strconv.Itoa(autoscalingConfig.MinCount)
md.ObjectMeta.Annotations[NodeGroupMaxSizeAnnotation] = strconv.Itoa(autoscalingConfig.MaxCount)
}