A cluster capacity formula for TF-controller at scale #281
chanwit started this conversation in Show and tell
Replies: 3 comments 10 replies
-
When you say modules are you referring to a …
6 replies
-
Will the size of the terraform module affect this? Let's say you have a terraform module that takes a few minutes to run versus a small module that only takes a few seconds?
3 replies
-
I would assume that node size (instance type) should also be accounted for.
1 reply
Overview
After reconciling 1,500 Terraform modules concurrently using TF-controller v0.10.1 on a 25-node cluster, I would like to share a formula derived from our successful experiment with anyone interested in setting up an EKS cluster and using TF-controller at a similar scale.
Formula
The following formula can be used to determine the number of nodes needed to provision an EKS cluster when using TF-controller at scale:

$$N = \left\lceil \frac{\alpha \cdot \sum{\max\{\tau_p, \tau_i\}}}{O(\bar{w}) \cdot \bar{C_n}} \right\rceil + k$$

In this formula:

- $N$ is the number of nodes to provision.
- $\tau_i$ is the `.spec.interval` of each TF-controller object (`infra.contrib.fluxcd.io/v1alpha1.Terraform`).
- $\tau_p$ is the processing time of each module.
- $\alpha$ is a correction factor observed from the experiment.
- $O(\bar{w})$ is the average wait time allowed for each reconciliation loop.
- $\bar{C_n}$ is the average pod capacity of each node.
- $k$ is the number of nodes reserved for other system workloads.
Example
For the given experiment with 1,500 Terraform modules, we can use the formula to calculate the number of nodes needed for provisioning. Given that each module has a `.spec.interval` ($\tau_i$) of 1m, and the maximum processing time of each module ($\tau_p$) is less than 30 seconds, we can set $\sum{\max\{\tau_p, \tau_i\}}$ to 1,500. From the experiment, we found that $\alpha = 1.2$. Assuming an average wait time of 3 minutes for each reconciliation loop of a Terraform module, we can set $O(\bar{w})$ to 3. In our setup, $\bar{C_n}$ is 29, which can be obtained from the information of each Kubernetes node. Finally, we have $k = 4$ for this setup.

By substituting these values into the formula, we can determine the optimal size of an EKS cluster for managing Terraform modules, similar to the setup used in the experiment.
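As a worked substitution of these values (the arithmetic is consistent with both the 25-node and the 5-node examples in this post):

$$N = \left\lceil \frac{1.2 \cdot 1500}{3 \cdot 29} \right\rceil + 4 = \left\lceil 20.69 \right\rceil + 4 = 21 + 4 = 25$$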
That's it. We got $N = 25$, the number of nodes we need to provision for reconciling 1,500 Terraform modules concurrently, with no more than a 3-minute wait in each reconciliation loop.
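The same calculation can be sketched in Go. This is a minimal illustration only; `nodesNeeded` and its parameter names are mine, not part of TF-controller:

```go
package main

import (
	"fmt"
	"math"
)

// nodesNeeded estimates the node count for the capacity formula:
// N = ceil(alpha * sumTau / (wait * podCapacity)) + reserved.
// sumTau and wait are in minutes; podCapacity is pods per node;
// reserved is the node count held back for system workloads.
func nodesNeeded(sumTau, alpha, wait, podCapacity float64, reserved int) int {
	return int(math.Ceil(alpha*sumTau/(wait*podCapacity))) + reserved
}

func main() {
	// Values from the 1,500-module experiment above.
	fmt.Println(nodesNeeded(1500, 1.2, 3, 29, 4)) // 25
}
```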
Maximum Concurrency

You'll find that a key setting to scale the cluster is in the values of the TF-controller's Helm chart. It's `.concurrency`, the maximum number of Go routines allowed in the controller. The value of this `concurrency` setting can be obtained from the term $\sum{\max\{\tau_p, \tau_i\}} \cdot O(\bar{w})^{-1}$.
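For the 1,500-module experiment above, this term works out to:

$$\sum{\max\{\tau_p, \tau_i\}} \cdot O(\bar{w})^{-1} = 1500 \cdot \frac{1}{3} = 500$$

so `concurrency` would be set to 500 for that cluster.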
Setting for a Small Cluster
You may ask: OK, I have a 5-node management cluster. How many Terraform modules can I manage at a time on the cluster?

Let's invert the equation and round down instead of up:

$$\sum{\max\{\tau_p, \tau_i\}} = \left\lfloor \frac{(N - k) \cdot O(\bar{w}) \cdot \bar{C_n}}{\alpha} \right\rfloor$$

So it's OK to use a 5-node EKS cluster (with the default `eksctl` config) to manage 72 Terraform modules at a 3-minute wait time. We should set `concurrency` to ${72 \cdot 1} / {3} = 24$ for the TF-controller used by this cluster.
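To close the loop, the inverse calculation can be sketched in Go as well (illustrative names, not TF-controller code; constants mirror the 5-node example):

```go
package main

import (
	"fmt"
	"math"
)

// maxModules inverts the capacity formula to estimate how many modules
// a cluster of n nodes can reconcile within the target wait time:
// sum = floor((n - reserved) * wait * podCapacity / alpha).
func maxModules(n, reserved int, wait, podCapacity, alpha float64) int {
	return int(math.Floor(float64(n-reserved) * wait * podCapacity / alpha))
}

func main() {
	modules := maxModules(5, 4, 3, 29, 1.2)
	fmt.Println(modules)     // 72 modules on a 5-node cluster
	fmt.Println(modules / 3) // concurrency: 24
}
```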