Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first version of failure domain support for CAPX #335

Conversation

yannickstruyf3
Copy link
Contributor

@yannickstruyf3 yannickstruyf3 commented Nov 28, 2023

What this PR does / why we need it:

  • Allows control plane and worker nodes to be spread across failure domains using CAPI failure domain constructs. See cluster-api documentation.
    Each failure domain consists of the combination of a Prism Element cluster and a subnet. All Prism Element clusters need to be managed by the same Prism Central instance.

Example NutanixCluster spec:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: NutanixCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  failureDomains:
    - name: failuredomain-1
      controlPlane: true
      cluster:
        type: name
        name: <pe-1>
      subnets:
        - type: name
          name: <subnet-1>
  ....

Example MachineDeployment:

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
spec:
  template:
    spec:
      failureDomain: <failure-domain-1>
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: NutanixMachineTemplate
        name: <nutanixMachineTemplate-1>

Each Machine created by the MachineDeployment will have the failureDomain attribute in the spec.

  • Added a new e2e test for failure domains.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
N/A

How Has This Been Tested?:

make test-e2e-cilium

Ran 14 of 34 Specs in 3274.392 seconds
SUCCESS! -- 14 Passed | 0 Failed | 0 Pending | 20 Skipped

To run the failure-domain e2e tests, following env variables are required:

  NUTANIX_FAILURE_DOMAIN_1_PRISM_ELEMENT_NAME: "<pe-1>"
  NUTANIX_FAILURE_DOMAIN_1_SUBNET_NAME: "<subnet-1>"
  NUTANIX_FAILURE_DOMAIN_2_PRISM_ELEMENT_NAME: "<pe-2>"
  NUTANIX_FAILURE_DOMAIN_2_SUBNET_NAME: "<subnet-2>"
  NUTANIX_FAILURE_DOMAIN_3_PRISM_ELEMENT_NAME: "<pe-3>"
  NUTANIX_FAILURE_DOMAIN_3_SUBNET_NAME: "<subnet-3>"

LABEL_FILTERS="capx-feature-test&&failure-domains" RUN_VALIDATION_TESTS_ONLY=true make test-e2e-cilium

Ran 1 of 34 Specs in 616.406 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 33 Skipped

Special notes for your reviewer:
N/A

Release note:

- Failure domain support

@yannickstruyf3 yannickstruyf3 force-pushed the feat/failure-domain-functionality branch from d42cdb0 to eb5abd5 Compare November 28, 2023 15:20
Copy link

codecov bot commented Nov 28, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (4d78273) 2.61% compared to head (a525379) 8.89%.

Files Patch % Lines
controllers/nutanixmachine_controller.go 69.44% 11 Missing ⚠️
controllers/nutanixcluster_controller.go 86.66% 2 Missing ⚠️
controllers/helpers.go 90.90% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #335      +/-   ##
========================================
+ Coverage   2.61%   8.89%   +6.27%     
========================================
  Files          4       4              
  Lines        995    1046      +51     
========================================
+ Hits          26      93      +67     
+ Misses       969     953      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@deepakm-ntnx
Copy link
Contributor

deepakm-ntnx commented Nov 28, 2023

As discussed, lets add following

  • need to add unit tests to meet 75% of incremental code coverage for this PR
  • need to add design spec either in docs/proposals or docs/design folder in github to explain currently supported and future plan eg one PC/PE1-PE2 or two PC-1/PE-1, PC-2/PE-2 config etc
  • need to add more context in description with sample data and tests run

api/v1alpha4/nutanixmachine_types.go Show resolved Hide resolved
api/v1alpha4/nutanixmachine_types.go Show resolved Hide resolved
api/v1beta1/nutanixcluster_types.go Show resolved Hide resolved
api/v1beta1/nutanixmachine_types.go Show resolved Hide resolved
api/v1beta1/nutanixmachine_types.go Show resolved Hide resolved
controllers/nutanixcluster_controller.go Show resolved Hide resolved
test/e2e/failure_domains_test.go Show resolved Hide resolved
test/e2e/config/nutanix.yaml Show resolved Hide resolved
@yannickstruyf3 yannickstruyf3 added the enhancement New feature or request label Nov 29, 2023
@yannickstruyf3 yannickstruyf3 force-pushed the feat/failure-domain-functionality branch 6 times, most recently from eb577f9 to 50bc865 Compare November 30, 2023 11:01
@thunderboltsid
Copy link
Contributor

/retest

@thunderboltsid
Copy link
Contributor

thunderboltsid commented Dec 1, 2023

Changes otherwise lgtm. E2E failed due to bad config so that need changing. FYI @adiantum

�[38;5;9m�[1m[SynchronizedBeforeSuite] �[0m
�[38;5;243m/home/prow/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e/e2e_suite_test.go:61�[0m

  �[38;5;9m[FAILED] The e2e test config file is not valid
  Expected success, but got an error:
      <*errors.fundamental | 0xc000863290>: {
          msg: "invalid argument: Providers[3].Sources[7].Files[13].SourcePath=\"/home/prow/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e/data/infrastructure-nutanix/v1beta1/cluster-template-failure-domains.yaml\"",
          stack: [0x1c18517, 0x1c1a8c5, 0x1c1858f, 0x1c176ff, 0x1cfe71b, 0x1cfdeba, 0x4e3547, 0x4e2619, 0x15d8dfc, 0x15e792e, 0x15eb08d, 0x473b81],
      }
      invalid argument: Providers[3].Sources[7].Files[13].SourcePath="/home/prow/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e/data/infrastructure-nutanix/v1beta1/cluster-template-failure-domains.yaml"�[0m
  �[38;5;9mIn �[1m[SynchronizedBeforeSuite]�[0m�[38;5;9m at: �[1m/home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.3.5/framework/clusterctl/e2e_config.go:63�[0m �[38;5;243m@ 11/30/23 18:09:48.212�[0m

  �[38;5;9mFull Stack Trace�[0m
    sigs.k8s.io/cluster-api/test/framework/clusterctl.LoadE2EConfig({0x0?, 0x0?}, {{0x7ffe2c6cb535?, 0xc000827718?}})
    	/home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.3.5/framework/clusterctl/e2e_config.go:63 +0x28f
    github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e.loadE2EConfig({0x7ffe2c6cb535, 0x6b})
    	/home/prow/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e/e2e_suite_test.go:138 +0x3b
    github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e.glob..func12()
    	/home/prow/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/test/e2e/e2e_suite_test.go:71 +0x1fa
    reflect.Value.call({0x1e63a60?, 0x2440460?, 0x13?}, {0x231d175, 0x4}, {0xc0000b1f20, 0x0, 0x0?})
    	/usr/local/go/src/reflect/value.go:596 +0xce7
    reflect.Value.Call({0x1e63a60?, 0x2440460?, 0x0?}, {0xc000581720?, 0x0?, 0x0?})
    	/usr/local/go/src/reflect/value.go:380 +0xb9

@yannickstruyf3 yannickstruyf3 force-pushed the feat/failure-domain-functionality branch from a6398f3 to cd6eb7c Compare December 1, 2023 10:47
@yannickstruyf3
Copy link
Contributor Author

/retest

deepakm-ntnx
deepakm-ntnx previously approved these changes Dec 1, 2023
tuxtof
tuxtof previously approved these changes Dec 4, 2023
Copy link
Contributor

@tuxtof tuxtof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yannickstruyf3 yannickstruyf3 dismissed stale reviews from tuxtof and deepakm-ntnx via 2840653 December 5, 2023 13:38
@yannickstruyf3 yannickstruyf3 force-pushed the feat/failure-domain-functionality branch from 2840653 to a525379 Compare December 5, 2023 13:47
Copy link
Contributor

@tuxtof tuxtof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
/lgtm
/approve

@nutanix-cn-prow-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tuxtof, yannickstruyf3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [tuxtof,yannickstruyf3]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tuxtof tuxtof merged commit 7f5909c into nutanix-cloud-native:main Dec 6, 2023
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved enhancement New feature or request lgtm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants