Add control plane failure domains feature for Nutanix provider #8192

Merged
6 commits merged into aws:main on Jun 6, 2024

Conversation

adiantum
Contributor

Description of changes:
Add control plane failure domains feature for Nutanix provider

  • change Nutanix Datacenter CRD
  • change templates
  • generate manifests
  • add unit test

Testing (if applicable):

$ eksctl anywhere create cluster -f ./multizone.cluster.yaml -v10 --bundles-override bin/local-bundle-release.yaml
2024-05-21T17:49:24.102Z        V6      Executing command       {"cmd": "/usr/bin/docker info --format '{{json .MemTotal}}'"}
2024-05-21T17:49:24.149Z        V4      Reading bundles manifest        {"url": "bin/local-bundle-release.yaml"}
2024-05-21T17:49:24.170Z        V4      Using CAPI provider versions    {"Core Cluster API": "v1.6.0+ae39aac", "Kubeadm Bootstrap": "v1.6.0+49ef750", "Kubeadm Control Plane": "v1.6.0+a7122b7", "External etcd Bootstrap": "v1.0.10+1ceb898", "External etcd Controller": "v1.0.17+5e33062", "Cluster API Provider Nutanix": "v1.3.3"}
2024-05-21T17:49:24.256Z        V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-05-21T17:49:24.256Z        V2      Pulling docker image    {"image": "public.ecr.aws/l0g8r8j6/eks-anywhere-cli-tools:v0.18.6-eks-a-v0.0.0-dev-build.8321"}
2024-05-21T17:49:24.256Z        V6      Executing command       {"cmd": "/usr/bin/docker pull public.ecr.aws/l0g8r8j6/eks-anywhere-cli-tools:v0.18.6-eks-a-v0.0.0-dev-build.8321"}
2024-05-21T17:49:24.842Z        V5      Retry execution successful      {"retries": 1, "duration": "585.928748ms"}
2024-05-21T17:49:24.842Z        V3      Initializing long running container     {"name": "eksa_1716313764256829428", "image": "public.ecr.aws/l0g8r8j6/eks-anywhere-cli-tools:v0.18.6-eks-a-v0.0.0-dev-build.8321"}
2024-05-21T17:49:24.842Z        V6      Executing command       {"cmd": "/usr/bin/docker run -d --name eksa_1716313764256829428 --network host -w /home/ubuntu/multizone -v /var/run/docker.sock:/var/run/docker.sock -v /home/ubuntu/multizone:/home/ubuntu/multizone -v /home/ubuntu/multizone:/home/ubuntu/multizone --entrypoint sleep public.ecr.aws/l0g8r8j6/eks-anywhere-cli-tools:v0.18.6-eks-a-v0.0.0-dev-build.8321 infinity"}
2024-05-21T17:49:25.004Z        V0      Using the new workflow using the controller for management cluster create
2024-05-21T17:49:25.005Z        V4      Task start      {"task_name": "setup-validate"}
2024-05-21T17:49:25.005Z        V0      Performing setup and validations
2024-05-21T17:49:25.005Z        V0      ValidateClusterSpec for Nutanix datacenter      {"NutanixDatacenter": "multizone"}
2024-05-21T17:49:30.699Z        V0      ✅ Nutanix Provider setup is valid
2024-05-21T17:49:30.699Z        V0      ✅ Validate OS is compatible with registry mirror configuration
2024-05-21T17:49:30.699Z        V0      ✅ Validate certificate for registry mirror
2024-05-21T17:49:30.699Z        V0      ✅ Validate authentication for git provider
2024-05-21T17:49:30.699Z        V0      ✅ Validate cluster's eksaVersion matches EKS-A version
2024-05-21T17:49:30.699Z        V4      Task finished   {"task_name": "setup-validate", "duration": "5.694920554s"}
2024-05-21T17:49:30.700Z        V4      ----------------------------------
...
2024-05-21T17:59:55.978Z        V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1716313764256829428 kind delete cluster --name multizone-eks-a-cluster"}
2024-05-21T17:59:57.129Z        V5      Retry execution successful      {"retries": 1, "duration": "1.150870568s"}
2024-05-21T17:59:57.129Z        V0      🎉 Cluster created!
2024-05-21T17:59:57.129Z        V4      Task finished   {"task_name": "delete-kind-cluster", "duration": "1.553823627s"}
2024-05-21T17:59:57.129Z        V4      ----------------------------------
2024-05-21T17:59:57.129Z        V4      Task start      {"task_name": "install-curated-packages"}
--------------------------------------------------------------------------------------
The Amazon EKS Anywhere Curated Packages are only available to customers with the
Amazon EKS Anywhere Enterprise Subscription
--------------------------------------------------------------------------------------
2024-05-21T17:59:57.129Z        V0      Enabling curated packages on the cluster
...
[Screenshot taken 2024-05-21 at 20:08:17]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot
Collaborator

Hi @adiantum. Thanks for your PR.

I'm waiting for an aws member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eks-distro-bot eks-distro-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 21, 2024
@abhinavmpandey08
Member

/ok-to-test

codecov bot commented May 21, 2024

Codecov Report

Attention: Patch coverage is 84.74576% with 9 lines in your changes missing coverage. Please review.

Project coverage is 73.42%. Comparing base (5940b0e) to head (2726504).
Report is 1 commits behind head on main.

Files                                              Patch %  Lines
pkg/api/v1alpha1/nutanixdatacenterconfig_types.go  63.15%   3 Missing and 4 partials ⚠️
pkg/providers/nutanix/validator.go                 88.23%   1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8192      +/-   ##
==========================================
+ Coverage   73.37%   73.42%   +0.04%     
==========================================
  Files         578      578              
  Lines       35995    36054      +59     
==========================================
+ Hits        26413    26473      +60     
+ Misses       7907     7904       -3     
- Partials     1675     1677       +2     


UUID: fd.Cluster.UUID,
},
Subnets: subnets,
ControlPlane: true,
Contributor

@deepakm-ntnx deepakm-ntnx May 24, 2024

Should this be fd.ControlPlane, or are we expecting the control plane to always use all FDs? Also, can we add documentation where we are hardcoding this?

Member

yeah what does ControlPlane: true signify?

Contributor Author

ControlPlane: true means the failure domain will be used for control plane VMs and etcd VMs. The requested feature is to spread control plane VMs across several Prism Elements (PEs), so we're setting the CAPX ControlPlane option to true.
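
For readers following this thread, the mapping under discussion might look roughly like the sketch below. It is not the code from this PR: the types `ResourceIdentifier`, `FailureDomainSpec`, and `CAPXFailureDomain` and the function `toCAPXFailureDomains` are hypothetical stand-ins, and only the field names `Cluster.UUID`, `Subnets`, and `ControlPlane` come from the snippet quoted in the review. It illustrates the point that every configured failure domain is marked as control-plane eligible.

```go
package main

import "fmt"

// ResourceIdentifier is a hypothetical stand-in for an identifier that
// refers to a Nutanix resource either by name or by UUID.
type ResourceIdentifier struct {
	Type string // assumed values: "name" or "uuid"
	Name string
	UUID string
}

// FailureDomainSpec is a hypothetical stand-in for one failure domain
// as a user might declare it in the datacenter config.
type FailureDomainSpec struct {
	Name    string
	Cluster ResourceIdentifier
	Subnets []ResourceIdentifier
}

// CAPXFailureDomain is a hypothetical stand-in for the provider-side
// failure domain; only the ControlPlane flag matters for this sketch.
type CAPXFailureDomain struct {
	Name         string
	Cluster      ResourceIdentifier
	Subnets      []ResourceIdentifier
	ControlPlane bool
}

// toCAPXFailureDomains converts every declared failure domain and always
// sets ControlPlane to true, mirroring the hardcoded value in the review.
func toCAPXFailureDomains(fds []FailureDomainSpec) []CAPXFailureDomain {
	out := make([]CAPXFailureDomain, 0, len(fds))
	for _, fd := range fds {
		// Copy the subnets so the result does not alias the input slice.
		subnets := make([]ResourceIdentifier, len(fd.Subnets))
		copy(subnets, fd.Subnets)
		out = append(out, CAPXFailureDomain{
			Name: fd.Name,
			Cluster: ResourceIdentifier{
				Type: fd.Cluster.Type,
				Name: fd.Cluster.Name,
				UUID: fd.Cluster.UUID,
			},
			Subnets:      subnets,
			ControlPlane: true,
		})
	}
	return out
}

func main() {
	// Example values are made up for illustration.
	fds := []FailureDomainSpec{
		{
			Name:    "pe1",
			Cluster: ResourceIdentifier{Type: "uuid", UUID: "example-cluster-uuid-1"},
			Subnets: []ResourceIdentifier{{Type: "name", Name: "example-subnet-1"}},
		},
		{
			Name:    "pe2",
			Cluster: ResourceIdentifier{Type: "uuid", UUID: "example-cluster-uuid-2"},
			Subnets: []ResourceIdentifier{{Type: "name", Name: "example-subnet-2"}},
		},
	}
	for _, fd := range toCAPXFailureDomains(fds) {
		fmt.Printf("%s controlPlane=%t subnets=%d\n", fd.Name, fd.ControlPlane, len(fd.Subnets))
	}
}
```

Hardcoding the flag keeps the user-facing spec small, at the cost of not letting users exclude a failure domain from control plane placement, which is the trade-off the reviewers are asking about.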

Member

@abhinavmpandey08 abhinavmpandey08 left a comment

Can you also try out a test case where you don't have a failure domain at cluster creation time and then add it during an upgrade?

pkg/providers/nutanix/template.go (outdated, resolved)
pkg/providers/nutanix/config/cp-template.yaml (outdated, resolved)

adiantum added 5 commits June 6, 2024 18:48
 - change Nutanix Datacenter CRD
 - change templates
 - generate manifests
 - add unittest
 - add validation
 - fix template
 - add unittest for validation
@eks-distro-bot eks-distro-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 6, 2024
@adiantum
Contributor Author

adiantum commented Jun 6, 2024

Can you also try out a test case where you don't have a failure domain at cluster creation time and then add it during an upgrade?

The current CAPX implementation doesn't support this case. For now, we only allow FDs to be used for new clusters. These limitations will be addressed in the future.

@adiantum
Contributor Author

adiantum commented Jun 6, 2024

After discussion with @abhinavmpandey08, we're allowing failure domains to be updated, but we should mention in the docs that the rollout must be triggered separately.
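
As an illustration of that caveat only, the sketch below compares the failure domains before and after an update and prints a reminder when they differ. Nothing here is taken from the PR: the `FailureDomain` type, its string fields, and `failureDomainsChanged` are hypothetical, and how a rollout is actually triggered is left out because the thread does not describe it.

```go
package main

import "fmt"

// FailureDomain is a hypothetical, simplified failure domain used only
// for this comparison sketch.
type FailureDomain struct {
	Name    string
	Cluster string
	Subnets []string
}

// sameFailureDomain reports whether two failure domains have the same
// name, cluster, and subnets (subnets compared in order).
func sameFailureDomain(a, b FailureDomain) bool {
	if a.Name != b.Name || a.Cluster != b.Cluster || len(a.Subnets) != len(b.Subnets) {
		return false
	}
	for i := range a.Subnets {
		if a.Subnets[i] != b.Subnets[i] {
			return false
		}
	}
	return true
}

// failureDomainsChanged reports whether the two lists differ. The
// comparison is order-sensitive, which is a simplifying assumption.
func failureDomainsChanged(oldFDs, newFDs []FailureDomain) bool {
	if len(oldFDs) != len(newFDs) {
		return true
	}
	for i := range oldFDs {
		if !sameFailureDomain(oldFDs[i], newFDs[i]) {
			return true
		}
	}
	return false
}

func main() {
	// Example values are made up for illustration.
	oldFDs := []FailureDomain{
		{Name: "pe1", Cluster: "cluster-a", Subnets: []string{"subnet-a"}},
	}
	newFDs := append([]FailureDomain{}, oldFDs...)
	newFDs = append(newFDs, FailureDomain{Name: "pe2", Cluster: "cluster-b", Subnets: []string{"subnet-b"}})

	if failureDomainsChanged(oldFDs, newFDs) {
		fmt.Println("failure domains changed: the rollout must be triggered separately")
	}
}
```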

Member

@abhinavmpandey08 abhinavmpandey08 left a comment

/lgtm
/approve

@eks-distro-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavmpandey08

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavmpandey08 abhinavmpandey08 merged commit e3d0f13 into aws:main Jun 6, 2024
11 of 13 checks passed
Labels
approved lgtm ok-to-test size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants