CAPX clusterclass support #344

Merged (22 commits), Jan 15, 2024


@deepakm-ntnx (Contributor) commented Dec 11, 2023

What this PR does / why we need it:
This PR adds ClusterClass support to CAPX. More in-depth details on why ClusterClass is needed can be found here: https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/

The go/v3 kubebuilder scaffolding has been kept as-is to reduce code churn.

Done

  • Added a ClusterClass template that reuses some of the existing base templates.
  • Reorganized templates/clusterclass to use base/ correctly.
  • Able to create a ClusterClass-based cluster using environment variables.
  • Merged the YAMLs with the upstream CCM default-related changes.
  • Added a few jsonPatches; more still need to be added.
  • Upgrading a cluster with topology from 1.27.8 to 1.28.4 succeeded. Only the image names in the control-plane (cp) and machine-deployment (md) NutanixMachineTemplates (nmt) and the Kubernetes version had to be changed.
  • Completed testing of the clusterclass_changes.go test with the Nutanix ClusterClass; the local test run passed.
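The upgrade above (1.27.8 to 1.28.4) advances exactly one Kubernetes minor version, which matches kubeadm's upgrade policy: skipping minor versions is not supported. A pre-flight check before bumping the topology version could be sketched in Go as follows. `withinOneMinor` and `majorMinor` are hypothetical helpers, not CAPX or Cluster API functions, and the check deliberately ignores patch-level ordering.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor parses "1.28.4" or "v1.28.4" into its major and minor numbers.
// The patch component (and any suffix on it) is ignored.
func majorMinor(v string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major in %q: %w", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor in %q: %w", v, err)
	}
	return major, minor, nil
}

// withinOneMinor reports whether moving from -> to stays within the same
// major version and advances the minor version by zero or one
// (no skipped minors, no minor downgrades).
func withinOneMinor(from, to string) (bool, error) {
	fMaj, fMin, err := majorMinor(from)
	if err != nil {
		return false, err
	}
	tMaj, tMin, err := majorMinor(to)
	if err != nil {
		return false, err
	}
	step := tMin - fMin
	return fMaj == tMaj && (step == 0 || step == 1), nil
}

func main() {
	// The upgrade exercised in this PR.
	ok, err := withinOneMinor("v1.27.8", "v1.28.4")
	fmt.Println(ok, err)
}
```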

Dev Steps:

make docker-build
make deploy
make test-cc-cluster-create
make list-cc-cluster-resources
make test-cc-cluster-install-cni
make test-cc-cluster-delete
make test-e2e LABEL_FILTERS=clusterclass
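`LABEL_FILTERS=clusterclass` narrows the e2e suite to specs carrying the `clusterclass` label; in the run shown later, the ClusterClass-changes spec is labeled `[clusterclass, slow, network]` and was the only one of 34 specs executed. Assuming the Makefile forwards `LABEL_FILTERS` to Ginkgo's `--label-filter` flag (the Makefile is not shown here), selection by a single bare label behaves like the simplified Go sketch below. Ginkgo's real filter language also supports boolean operators and regular expressions, which this sketch does not model; the second spec in `main` is hypothetical.

```go
package main

import "fmt"

type spec struct {
	name   string
	labels []string
}

// matchesLabel is a simplified stand-in for Ginkgo label filtering:
// it only handles one bare label (no &&, ||, !, or /regex/ syntax).
func matchesLabel(s spec, label string) bool {
	for _, l := range s.labels {
		if l == label {
			return true
		}
	}
	return false
}

// selectSpecs splits specs into those that run and those that are skipped.
func selectSpecs(specs []spec, label string) (run, skipped []spec) {
	for _, s := range specs {
		if matchesLabel(s, label) {
			run = append(run, s)
		} else {
			skipped = append(skipped, s)
		}
	}
	return run, skipped
}

func main() {
	specs := []spec{
		{"When testing ClusterClass changes", []string{"clusterclass", "slow", "network"}},
		{"hypothetical other spec", []string{"quickstart"}}, // hypothetical
	}
	run, skipped := selectSpecs(specs, "clusterclass")
	fmt.Printf("Ran %d of %d Specs, %d Skipped\n", len(run), len(specs), len(skipped))
}
```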

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

How Has This Been Tested?:
Currently it is tested manually, following the dev steps above.
The local test run passed:

make test-e2e-calico LABEL_FILTERS=clusterclass
...
When testing ClusterClass changes [ClusterClass] Should successfully rollout the managed topology upon changes to the ClusterClass [clusterclass, slow, network]
/Users/deepak.muley/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.3.5/e2e/clusterclass_changes.go:132
  STEP: Creating a namespace for hosting the "clusterclass-changes" test spec @ 01/05/24 12:04:31.426
  INFO: Creating namespace clusterclass-changes-qey1b9
  INFO: Creating event watcher for namespace "clusterclass-changes-qey1b9"
  STEP: Creating a workload cluster @ 01/05/24 12:04:31.451
  INFO: Creating the workload cluster with name "clusterclass-changes-u2v3sb" using the "topology" template (Kubernetes v1.28.4, 1 control-plane machines, 1 worker machines)
  INFO: Getting the cluster template yaml
  INFO: clusterctl config cluster clusterclass-changes-u2v3sb --infrastructure (default) --kubernetes-version v1.28.4 --control-plane-machine-count 1 --worker-machine-count 1 --flavor topology
  INFO: Applying the cluster template yaml to the cluster
configmap/clusterclass-changes-u2v3sb-pc-trusted-ca-bundle created
configmap/nutanix-ccm created
secret/clusterclass-changes-u2v3sb created
secret/nutanix-ccm-secret created
clusterresourceset.addons.cluster.x-k8s.io/nutanix-ccm-crs created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/clusterclass-changes-u2v3sb-kcfg-0 created
clusterclass.cluster.x-k8s.io/e2e created
kubeadmcontrolplanetemplate.controlplane.cluster.x-k8s.io/e2e-kcpt created
nutanixclustertemplate.infrastructure.cluster.x-k8s.io/e2e-nct created
nutanixmachinetemplate.infrastructure.cluster.x-k8s.io/e2e-cp-nmt created
nutanixmachinetemplate.infrastructure.cluster.x-k8s.io/e2e-md-nmt created
configmap/cni-clusterclass-changes-u2v3sb-crs-cni created
clusterresourceset.addons.cluster.x-k8s.io/clusterclass-changes-u2v3sb-crs-cni created
cluster.cluster.x-k8s.io/clusterclass-changes-u2v3sb created

  INFO: Waiting for the cluster infrastructure to be provisioned
  STEP: Waiting for cluster to enter the provisioned phase @ 01/05/24 12:04:34.228
  INFO: Waiting for control plane to be initialized
  INFO: Waiting for the first control plane machine managed by clusterclass-changes-qey1b9/clusterclass-changes-u2v3sb-rgb5c to be provisioned
  STEP: Waiting for one control plane node to exist @ 01/05/24 12:04:44.287
  INFO: Waiting for control plane to be ready
  INFO: Waiting for control plane clusterclass-changes-qey1b9/clusterclass-changes-u2v3sb-rgb5c to be ready (implies underlying nodes to be ready as well)
  STEP: Waiting for the control plane to be ready @ 01/05/24 12:05:54.374
  STEP: Checking all the control plane machines are in the expected failure domains @ 01/05/24 12:05:54.379
  INFO: Waiting for the machine deployments to be provisioned
  STEP: Waiting for the workload nodes to exist @ 01/05/24 12:05:54.401
  STEP: Checking all the machines controlled by clusterclass-changes-u2v3sb-md-0-k6z6w are in the "" failure domain @ 01/05/24 12:06:14.436
  INFO: Waiting for the machine pools to be provisioned
  STEP: Modifying the control plane configuration in ClusterClass and wait for changes to be applied to the control plane object @ 01/05/24 12:06:14.478
  INFO: Modifying the ControlPlaneTemplate of ClusterClass clusterclass-changes-qey1b9/e2e
  INFO: Waiting for ControlPlane rollout to complete.
  STEP: Modifying the MachineDeployment configuration in ClusterClass and wait for changes to be applied to the MachineDeployment objects @ 01/05/24 12:06:24.575
  INFO: Modifying the BootstrapConfigTemplate of MachineDeploymentClass "e2e-worker" of ClusterClass clusterclass-changes-qey1b9/e2e
  INFO: Waiting for MachineDeployment rollout for MachineDeploymentClass "e2e-worker" to complete.
  INFO: Waiting for MachineDeployment rollout for MachineDeploymentTopology "md-0" (class "e2e-worker") to complete.
  STEP: Rebasing the Cluster to a ClusterClass with a modified label for MachineDeployments and wait for changes to be applied to the MachineDeployment objects @ 01/05/24 12:06:34.667
  INFO: Waiting for MachineDeployment rollout to complete.
  INFO: Waiting for MachineDeployment rollout for MachineDeploymentTopology "md-0" (class "e2e-worker") to complete.
  STEP: Deleting a MachineDeploymentTopology in the Cluster Topology and wait for associated MachineDeployment to be deleted @ 01/05/24 12:06:44.977
  INFO: Removing MachineDeploymentTopology from the Cluster Topology.
  INFO: Waiting for MachineDeployment to be deleted.
  STEP: PASSED! @ 01/05/24 12:06:55.066
  STEP: Dumping logs from the "clusterclass-changes-u2v3sb" workload cluster @ 01/05/24 12:06:55.066
Failed to get logs for Machine clusterclass-changes-u2v3sb-rgb5c-5sb62, Cluster clusterclass-changes-qey1b9/clusterclass-changes-u2v3sb: error creating container exec: Error response from daemon: No such container: clusterclass-changes-u2v3sb-rgb5c-5sb62
  STEP: Dumping all the Cluster API resources in the "clusterclass-changes-qey1b9" namespace @ 01/05/24 12:06:55.142
  STEP: Deleting cluster clusterclass-changes-qey1b9/clusterclass-changes-u2v3sb @ 01/05/24 12:06:55.386
  STEP: Deleting cluster clusterclass-changes-u2v3sb @ 01/05/24 12:06:55.429
  INFO: Waiting for the Cluster clusterclass-changes-qey1b9/clusterclass-changes-u2v3sb to be deleted
  STEP: Waiting for cluster clusterclass-changes-u2v3sb to be deleted @ 01/05/24 12:06:55.479
  STEP: Deleting namespace used for hosting the "clusterclass-changes" test spec @ 01/05/24 12:07:15.635
  INFO: Deleting namespace clusterclass-changes-qey1b9
• [164.231 seconds]
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[SynchronizedAfterSuite] 
/Users/deepak.muley/go/src/github.com/deepakm-ntnx/cluster-api-provider-nutanix/test/e2e/e2e_suite_test.go:113
  STEP: Dumping logs from the bootstrap cluster @ 01/05/24 12:07:15.659
Failed to get logs for the bootstrap cluster node test-6vt5e4-control-plane: exit status 1
  STEP: Tearing down the management cluster @ 01/05/24 12:07:16.138
[SynchronizedAfterSuite] PASSED [1.757 seconds]
------------------------------
[ReportAfterSuite] Autogenerated ReportAfterSuite for --junit-report
autogenerated by Ginkgo
[ReportAfterSuite] PASSED [0.003 seconds]
------------------------------

Ran 1 of 34 Specs in 313.076 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 33 Skipped

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration and test output

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@deepakm-ntnx changed the title from "[WIP] capx clusterclass support" to "CAPX clusterclass support" Dec 11, 2023

codecov bot commented Dec 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ba2b194) 15.14% compared to head (31a01eb) 15.21%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
+ Coverage   15.14%   15.21%   +0.07%     
==========================================
  Files          17       18       +1     
  Lines        1208     1209       +1     
==========================================
+ Hits          183      184       +1     
  Misses       1025     1025              


@deepakm-ntnx (Contributor, Author) commented Dec 12, 2023

Please note the TODO list in the description. While adding e2e tests, we discovered that the YAMLs in the templates folder need to be reorganized around the bases; otherwise they cause issues when generating the e2e templates. Working on it. Proposed layout:

base
- Cluster without topology
- KubeadmControlPlane
- KubeadmConfigTemplate
- NutanixCluster
- NutanixMachineTemplate
- MachineDeployment
- ConfigMap
- Secret
Overlay: Cluster class
- Template patches
    - Add ClusterClass
    - Add KubeadmControlPlaneTemplate
    - Update Cluster with topology
    - Kustomize
        - Include everything from base
        - Drop NutanixCluster
        - Drop MachineDeployment
        - Drop KubeadmControlPlane
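The proposed overlay can be read as a transformation over the base resource list: keep every base resource except the three dropped kinds, give the Cluster a topology, and add the ClusterClass and KubeadmControlPlaneTemplate. The Go sketch below models only that set logic with a hypothetical `resource` type; the actual change is expressed as Kustomize patches over YAML, not Go code.

```go
package main

import "fmt"

// resource is a hypothetical, minimal model of a manifest in the bundle.
type resource struct {
	Kind     string
	Topology bool // only meaningful for the Cluster kind
}

// baseResources lists the kinds in the base, per the layout above.
var baseResources = []resource{
	{Kind: "Cluster"}, {Kind: "KubeadmControlPlane"}, {Kind: "KubeadmConfigTemplate"},
	{Kind: "NutanixCluster"}, {Kind: "NutanixMachineTemplate"}, {Kind: "MachineDeployment"},
	{Kind: "ConfigMap"}, {Kind: "Secret"},
}

// clusterClassOverlay includes everything from base except the dropped
// kinds, switches the Cluster to topology, and adds the new templates.
// The input slice is not modified.
func clusterClassOverlay(base []resource) []resource {
	dropped := map[string]bool{
		"NutanixCluster":      true,
		"MachineDeployment":   true,
		"KubeadmControlPlane": true,
	}
	var out []resource
	for _, r := range base { // r is a copy, so base stays untouched
		if dropped[r.Kind] {
			continue
		}
		if r.Kind == "Cluster" {
			r.Topology = true
		}
		out = append(out, r)
	}
	return append(out, resource{Kind: "ClusterClass"}, resource{Kind: "KubeadmControlPlaneTemplate"})
}

func main() {
	for _, r := range clusterClassOverlay(baseResources) {
		fmt.Println(r.Kind, r.Topology)
	}
}
```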

@jimmidyson (Member) left a comment

Thank you @deepakm-ntnx for this! Looking forward to getting this merged and CAPX supporting ClusterTopology!

The big thing that stands out is that the ClusterClass definition and associated templates should be using variables and patches, rather than clusterctl env var templating. This will allow configuration to be specified per-cluster rather than embedding in the CC at time of applying the template.

Review threads:
  • Makefile (4 threads, outdated)
  • api/v1beta1/nutanixclustertemplate_types.go (outdated)
  • templates/cluster-template-clusterclass.yaml (3 threads, 1 outdated)
  • templates/clusterclass/nutanix-clusterclass.yaml (outdated)
  • templates/clusterclass/cluster-template-topology.yaml (outdated)
  • main.go (outdated)
@jimmidyson (Member) commented

Oops sorry @deepakm-ntnx I didn't see the TODO about the clusterclass variables and patches and commented 🫣

@deepakm-ntnx force-pushed the clusterclassv3 branch 2 times, most recently from 50233dc to e188070, December 19, 2023 22:55
@dlipovetsky (Contributor) commented

For anyone exercising this in a shared Prism Central (PC): your cluster name must be unique. Here it is set by the TEST_CLUSTER_NAME make variable, e.g., make test-cc-cluster-create TEST_CLUSTER_NAME=<unique name>.
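One low-effort way to avoid name collisions in a shared Prism Central is to append a random suffix, much like the names the e2e framework generates (for example `clusterclass-changes-u2v3sb`). The Go sketch below is a hypothetical helper, not part of the Makefile; the `test-cc` prefix and 6-character suffix are arbitrary, and the result is conservatively checked against the Kubernetes DNS-1123 label format.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"regexp"
)

const suffixAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// dns1123Label matches a lowercase RFC 1123 label; length is checked separately.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// uniqueClusterName appends an n-character random lowercase alphanumeric
// suffix to prefix, using crypto/rand for the randomness.
func uniqueClusterName(prefix string, n int) (string, error) {
	b := make([]byte, n)
	for i := range b {
		idx, err := rand.Int(rand.Reader, big.NewInt(int64(len(suffixAlphabet))))
		if err != nil {
			return "", err
		}
		b[i] = suffixAlphabet[idx.Int64()]
	}
	return prefix + "-" + string(b), nil
}

// isValidName reports whether name is a DNS-1123 label of at most 63 characters.
func isValidName(name string) bool {
	return len(name) <= 63 && dns1123Label.MatchString(name)
}

func main() {
	name, err := uniqueClusterName("test-cc", 6)
	if err != nil {
		panic(err)
	}
	fmt.Printf("make test-cc-cluster-create TEST_CLUSTER_NAME=%s\n", name)
}
```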

Review threads:
  • Makefile (9 threads, outdated)
  • config/manager/manager.yaml
  • main.go (outdated)
  • docs/developer_workflow.md (outdated)
  • templates/clusterclass/nmt-cp.yaml
@thunderboltsid (Contributor) commented

/retest

@adiantum (Contributor) left a comment

/lgtm
/approve

@nutanix-cn-prow-bot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adiantum, deepakm-ntnx, dkoshkin, thunderboltsid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [adiantum,deepakm-ntnx,thunderboltsid]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thunderboltsid thunderboltsid merged commit d5cf036 into nutanix-cloud-native:main Jan 15, 2024
8 of 10 checks passed
@tuxtof tuxtof added the enhancement New feature or request label Jan 18, 2024

8 participants