Awscapi #342

Open
wants to merge 2 commits into
base: master

Conversation

huali9
Contributor

@huali9 huali9 commented Nov 21, 2024

Automates 3 CAPI cases:
OCP-51071 - [CAPI] Create machineset with CAPI on aws
OCP-75395 - [CAPI] AWS Placement group support
OCP-75396 - [CAPI] Creating machines using KMS keys from AWS

liuhuali@Lius-MacBook-Pro cluster-api-actuator-pkg % ./hack/ci-integration.sh -focus "Cluster API AWS MachineSet" -v
Running Suite: Machine Suite - /Users/liuhuali/project/cluster-api-actuator-pkg/pkg
===================================================================================
Random Seed: 1732174559

Will run 3 of 45 specs
------------------------------
[BeforeSuite] 
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/e2e_test.go:63
[BeforeSuite] PASSED [1.856 seconds]
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
Cluster API AWS MachineSet should be able to run a machine with a default provider spec [capi, disruptive]
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/capi/aws.go:103
  STEP: Creating core cluster @ 11/21/24 15:36:29.579
  STEP: Creating AWS machine template @ 11/21/24 15:36:30.591
  STEP: Creating MachineSet "aws-machineset-51071" @ 11/21/24 15:36:31.27
  STEP: Waiting for MachineSet machines "aws-machineset-51071" to enter Running phase @ 11/21/24 15:36:31.616
  STEP: Deleting MachineSet "aws-machineset-51071" @ 11/21/24 15:40:17.064
  STEP: Waiting for MachineSet "aws-machineset-51071" to be deleted @ 11/21/24 15:40:17.834
  STEP: Deleting /aws-machine-template @ 11/21/24 15:42:22.615
• [356.869 seconds]
------------------------------
Cluster API AWS MachineSet should be able to run a machine with cluster placement group [capi, disruptive]
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/capi/aws.go:115
  STEP: Creating AWS machine template @ 11/21/24 15:42:27.294
I1121 15:42:27.294175   86632 aws_client.go:72] The created placementGroupID is pg-0ff9c0b32fae36690

  STEP: Creating MachineSet "aws-machineset-75395" @ 11/21/24 15:42:27.685
  STEP: Waiting for MachineSet machines "aws-machineset-75395" to enter Running phase @ 11/21/24 15:42:28.063
  STEP: Deleting MachineSet "aws-machineset-75395" @ 11/21/24 15:45:36.624
  STEP: Waiting for MachineSet "aws-machineset-75395" to be deleted @ 11/21/24 15:45:37.133
  STEP: Deleting /aws-machine-template @ 11/21/24 15:47:42.003
• [321.381 seconds]
------------------------------
Cluster API AWS MachineSet should be able to run a machine using KMS keys [capi, disruptive]
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/capi/aws.go:142
  STEP: Creating AWS machine template @ 11/21/24 15:47:44.44
  STEP: Creating MachineSet "aws-machineset-75396" @ 11/21/24 15:47:45.128
  STEP: Waiting for MachineSet machines "aws-machineset-75396" to enter Running phase @ 11/21/24 15:47:45.488
  STEP: Deleting MachineSet "aws-machineset-75396" @ 11/21/24 15:50:10.189
  STEP: Waiting for MachineSet "aws-machineset-75396" to be deleted @ 11/21/24 15:50:10.532
  STEP: Deleting /aws-machine-template @ 11/21/24 15:51:15.803
• [212.339 seconds]
------------------------------
SSSSSSSSSS
------------------------------
[ReportAfterSuite] Autogenerated ReportAfterSuite for --junit-report
autogenerated by Ginkgo
[ReportAfterSuite] PASSED [0.002 seconds]
------------------------------

Ran 3 of 45 Specs in 892.446 seconds
SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 42 Skipped
PASS

Ginkgo ran 1 suite in 15m17.211460531s
Test Suite Passed

@sunzhaohua2 @miyadav @shellyyang1989 PTAL, thanks!

@huali9 huali9 mentioned this pull request Nov 21, 2024
@huali9 huali9 force-pushed the awscapi branch 2 times, most recently from b6329be to 202e9ef, November 21, 2024 09:49
@huali9
Contributor Author

huali9 commented Nov 21, 2024

/retest-required

@huali9
Contributor Author

huali9 commented Nov 22, 2024

@huali9: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-operator | 26311ca | link | true | /test e2e-azure-operator |
| ci/prow/e2e-azure-capi-techpreview | 26311ca | link | true | /test e2e-azure-capi-techpreview |
Full PR test history. Your PR dashboard.

I checked the Azure failure. It looks like a flake, and this PR does not affect Azure, so we can ignore it.
I think the PR is ready for review. @sunzhaohua2 @miyadav @shellyyang1989 PTAL, thanks!

@@ -45,3 +45,41 @@ func (a *AwsClient) CancelCapacityReservation(capacityReservationID string) (boo

return ptr.Deref(result.Return, false), err
}

// CreateCapacityReservation Create CapacityReservation.
Contributor

should be CreatePlacementGroup

Contributor Author

updated, thank you!

pkg/capi/aws.go Outdated

func getDefaultAWSMAPIProviderSpec(cl client.Client) (*mapiv1.MachineSet, *mapiv1.AWSMachineProviderConfig) {
machineSetList := &mapiv1.MachineSetList{}
Expect(cl.List(ctx, machineSetList, client.InNamespace(framework.MachineAPINamespace))).To(Succeed())
Contributor

Dam suggested using Eventually on all cl. (client) actions instead of Expect, so that we allow retries on brief API/network hiccups. See #337 (comment).

Contributor Author

updated, thank you!

pkg/capi/aws.go Outdated
It("should be able to run a machine with a default provider spec", func() {
awsMachineTemplate = newAWSMachineTemplate(mapiDefaultProviderSpec)
if err = cl.Create(ctx, awsMachineTemplate); err != nil {
Expect(err).ToNot(HaveOccurred())
Contributor

With all the .Should() and .To() calls (on both the Eventuallys and the Expects), could we add the optional description?
See #337 (comment).

Contributor Author

all updated, thank you!

case "us-east-1":
key = "arn:aws:kms:us-east-1:301721915996:key/c471ec83-cfaf-41a2-9241-d9e99c4da344"
case "us-east-2":
key = "arn:aws:kms:us-east-2:301721915996:key/c228ef83-df2c-4151-84c4-d9f39f39a972"
Contributor

Can this be run in the dev account?

Contributor Author

I checked the job result and you are right: this cannot be run in the dev account. I have commented this case out for now while I think about how to handle it, and will update it in a follow-up PR. Thank you @sunzhaohua2, PTAL again, thanks!

Contributor Author

Updated this case so that it is skipped in the dev account.

pkg/capi/aws.go Outdated
Comment on lines 210 to 223
awsMachineSpec := awsv1.AWSMachineSpec{
	UncompressedUserData: &uncompressedUserData,
	IAMInstanceProfile:   *mapiProviderSpec.IAMInstanceProfile.ID,
	InstanceType:         mapiProviderSpec.InstanceType,
	AMI: awsv1.AMIReference{
		ID: mapiProviderSpec.AMI.ID,
	},
	Ignition: &awsv1.Ignition{
		Version:     "3.4",
		StorageType: awsv1.IgnitionStorageTypeOptionUnencryptedUserData,
	},
	Subnet: &awsv1.AWSResourceReference{
		Filters: []awsv1.Filter{
			{
				Name:   "tag:Name",
				Values: mapiProviderSpec.Subnet.Filters[0].Values,
			},
		},
	},
	AdditionalSecurityGroups: []awsv1.AWSResourceReference{
		{
			Filters: []awsv1.Filter{
				{
					Name:   "tag:Name",
					Values: mapiProviderSpec.SecurityGroups[0].Filters[0].Values,
				},
			},
		},
	},
}

awsMachineTemplate := &awsv1.AWSMachineTemplate{
	ObjectMeta: metav1.ObjectMeta{
		Name:      awsMachineTemplateName,
		Namespace: framework.ClusterAPINamespace,
	},
	Spec: awsv1.AWSMachineTemplateSpec{
		Template: awsv1.AWSMachineTemplateResource{
			Spec: awsMachineSpec,
		},
	},
Contributor

Contributor Author

Updated, thank you @sunzhaohua2. PTAL again; it passed locally.

liuhuali@Lius-MacBook-Pro cluster-api-actuator-pkg % ./hack/ci-integration.sh -focus "Cluster API AWS MachineSet" -v
Running Suite: Machine Suite - /Users/liuhuali/project/cluster-api-actuator-pkg/pkg
===================================================================================
Random Seed: 1732506965

Will run 2 of 44 specs
------------------------------
[BeforeSuite] 
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/e2e_test.go:63
[BeforeSuite] PASSED [1.360 seconds]
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
Cluster API AWS MachineSet should be able to run a machine with a default provider spec [capi, disruptive]
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/capi/aws.go:100
  STEP: Creating core cluster @ 11/25/24 11:56:29.931
  STEP: Creating AWS machine template @ 11/25/24 11:56:30.766
  STEP: Creating MachineSet "aws-machineset-51071" @ 11/25/24 11:56:31.32
  STEP: Waiting for MachineSet machines "aws-machineset-51071" to enter Running phase @ 11/25/24 11:56:31.602
  STEP: Deleting MachineSet "aws-machineset-51071" @ 11/25/24 12:00:17.049
  STEP: Waiting for MachineSet "aws-machineset-51071" to be deleted @ 11/25/24 12:00:17.327
  STEP: Deleting /aws-machine-template @ 11/25/24 12:01:21.212
• [294.166 seconds]
------------------------------
Cluster API AWS MachineSet should be able to run a machine with cluster placement group [capi, disruptive]
/Users/liuhuali/project/cluster-api-actuator-pkg/pkg/capi/aws.go:112
I1125 12:01:24.951211   28912 aws_client.go:72] The created placementGroupID is pg-0b86fc1ccb6aa9742
  STEP: Creating AWS machine template @ 11/25/24 12:01:24.951
  STEP: Creating MachineSet "aws-machineset-75395" @ 11/25/24 12:01:25.238
  STEP: Waiting for MachineSet machines "aws-machineset-75395" to enter Running phase @ 11/25/24 12:01:25.521
  STEP: Deleting MachineSet "aws-machineset-75395" @ 11/25/24 12:05:51.76
  STEP: Waiting for MachineSet "aws-machineset-75395" to be deleted @ 11/25/24 12:05:52.055
  STEP: Deleting /aws-machine-template @ 11/25/24 12:06:55.93
• [336.224 seconds]
------------------------------
SSSSSSSSSSSSSSSS
------------------------------
[ReportAfterSuite] Autogenerated ReportAfterSuite for --junit-report
autogenerated by Ginkgo
[ReportAfterSuite] PASSED [0.003 seconds]
------------------------------

Ran 2 of 44 Specs in 631.752 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 42 Skipped
PASS

Ginkgo ran 1 suite in 10m52.059989621s
Test Suite Passed

@huali9
Contributor Author

huali9 commented Nov 25, 2024

@huali9: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-operator | 2b5eb5c | link | true | /test e2e-azure-operator |
Full PR test history. Your PR dashboard.

The Azure job failure looks like a flake, and this PR does not affect Azure, so we can ignore it.

@sunzhaohua2
Contributor

Thank you! @huali9
LGTM

@huali9
Contributor Author

huali9 commented Nov 26, 2024

/assign @JoelSpeed @damdo

Contributor

openshift-ci bot commented Nov 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from damdo. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shellyyang1989

cc @RadekManak for review. Thank you!

placementGroupID := ptr.Deref(result.PlacementGroup.GroupId, "")
klog.Infof("The created placementGroupID is %s", placementGroupID)

return placementGroupID, err
Contributor

err will always be nil here, better to return nil

Suggested change
- return placementGroupID, err
+ return placementGroupID, nil

Contributor Author

updated, thank you!

Comment on lines 124 to 126
result, err := a.Svc.DeletePlacementGroup(input)

return result.String(), err
Contributor

Prefer to check the error explicitly

Suggested change
- result, err := a.Svc.DeletePlacementGroup(input)
- return result.String(), err
+ result, err := a.Svc.DeletePlacementGroup(input)
+ if err != nil {
+ 	return "", fmt.Errorf("could not delete placement group: %w", err)
+ }
+ return result.String(), nil

Contributor Author

updated, thank you!

pkg/capi/aws.go Outdated
Comment on lines 30 to 36
var (
	cl          client.Client
	ctx         = context.Background()
	platform    configv1.PlatformType
	clusterName string
	oc          *gatherer.CLI
)
Contributor

Need these be exposed at this scope, or could they be within the Describe on L38?

Contributor Author

moved to within the Describe, thank you!

pkg/capi/aws.go Outdated
if platform != configv1.AWSPlatformType {
Skip("Skipping AWS E2E tests")
}
oc, _ = framework.NewCLI()
Contributor

What is the second return variable here and why are we ignoring it?

Contributor Author

The second return value is an error. Updated, thank you!

pkg/capi/aws.go Outdated
Comment on lines 88 to 92
if platform != configv1.AWSPlatformType {
// Because AfterEach always runs, even when tests are skipped, we have to
// explicitly skip it here for other platforms.
Skip("Skipping AWS E2E tests")
}
Contributor

Would they not have skipped in the BeforeAll?

Contributor Author

You are right, they should have skipped in the BeforeAll, deleted here, thank you!

pkg/capi/aws.go Outdated
Skip("Skipping AWS E2E tests")
}
if !deleted {
framework.DeleteCAPIMachineSets(ctx, cl, machineSet)
Contributor

If this framework tolerated the object not existing, you wouldn't need to track whether the test expected to delete the machineset or not.

Are there cases where you want to be sure it did delete?

Contributor Author

I think we should delete the objects we created in the test, right?

pkg/capi/aws.go Outdated
placementGroupID, err := awsClient.CreatePlacementGroup(placementGroupName, "cluster")
Expect(err).ToNot(HaveOccurred(), "Failed to create placementgroup")
Expect(placementGroupID).ToNot(Equal(""), "expected the placementGroupID to not be empty string")
defer func() {
Contributor

I think if you were to leverage DeferCleanup you might be able to avoid some complexity here

Contributor Author

yeah, great, updated, thank you!

pkg/capi/aws.go Outdated
Comment on lines 120 to 119
framework.DeleteCAPIMachineSets(ctx, cl, machineSet)
framework.WaitForCAPIMachineSetsDeleted(ctx, cl, machineSet)
framework.DeleteObjects(ctx, cl, awsMachineTemplate)
Contributor

Is it possible for these to cause the defer to exit prior to deleting the placement group?

Contributor Author

You are right, that may occur, updated to use DeferCleanup, thank you!
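
The concern in this thread can be reproduced without Ginkgo. A failed assertion aborts the current function by panicking, so when several cleanup steps share one deferred function, a failure in an early step skips the later ones. Registering each step separately, which is the idea behind DeferCleanup as I understand it, lets the remaining steps still run. The sketch below is a simplified model, not Ginkgo's implementation; `cleanupStack`, `deleteMachineSet`, and the log strings are hypothetical.

```go
package main

import "fmt"

// deleteMachineSet models a cleanup step whose assertion fails (panics).
func deleteMachineSet(log *[]string) {
	*log = append(*log, "delete machineset (failed)")
	panic("assertion failed")
}

// singleDefer models the original pattern: every cleanup step lives in one
// deferred function, so a panic in the first step skips the second.
func singleDefer(log *[]string) {
	defer func() { _ = recover() }() // keep the demo running after the panic
	defer func() {
		deleteMachineSet(log)
		*log = append(*log, "delete placement group") // never reached
	}()
	*log = append(*log, "test body")
}

// cleanupStack runs each registered cleanup independently, last in first out.
type cleanupStack struct {
	fns []func()
}

func (c *cleanupStack) add(fn func()) { c.fns = append(c.fns, fn) }

// run executes every cleanup even if an earlier one panics and returns how
// many of them panicked.
func (c *cleanupStack) run() (failed int) {
	for i := len(c.fns) - 1; i >= 0; i-- {
		func() {
			defer func() {
				if r := recover(); r != nil {
					failed++
				}
			}()
			c.fns[i]()
		}()
	}
	c.fns = nil
	return failed
}

// separateCleanups registers each step on its own, next to the resource it
// cleans up.
func separateCleanups(log *[]string) int {
	var c cleanupStack
	c.add(func() { *log = append(*log, "delete placement group") })
	c.add(func() { deleteMachineSet(log) })
	*log = append(*log, "test body")
	return c.run()
}

func main() {
	var a, b []string
	singleDefer(&a)
	failed := separateCleanups(&b)
	fmt.Println(a)
	fmt.Println(b, "failed cleanups:", failed)
}
```

In the second variant the placement group is still deleted even though the machineset cleanup failed, which avoids leaking AWS resources across runs.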

pkg/capi/aws.go Outdated
Comment on lines 155 to 135
if err != nil {
Skip(fmt.Sprintf("Skip because cannot get the key %v", err))
}
Contributor

Skipping because of errors is likely to mean this test gets skipped when we don't want it to. Is there a better way to skip tests when we are in the dev account, vs QE account? Perhaps using labels?

Contributor Author

Skipping on errors is a temporary solution. Using labels would be better, but we don't have such labels at present; that is what we are discussing on Slack (https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1732789329032099). We need to identify which cases run in DEV CI and which run in QE CI, add labels for them, and then update all the DEV jobs to run only the cases chosen for DEV CI. Currently all the cases in this repo run in DEV CI.

Contributor Author

Thank you @JoelSpeed. I added a new label, qe-account-only, and updated the Makefile to exclude cases carrying that label. All other comments are addressed as well. PTAL again, thanks!

Contributor

Does this new label align with the plan @sunzhaohua2 has proposed?

Contributor Author

Yes, the meaning is the same; only the name differs. Zhaohua named it qe-only and I named it qe-account-only. I can update the name once we make a decision.
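
As a rough model of the label approach discussed in this thread: each spec carries a set of labels, and the DEV job runs only the specs that do not carry the exclusion label. In Ginkgo v2 this is expressed with the `Label("qe-account-only")` decorator on the spec and a `--label-filter='!qe-account-only'` argument on the command line; the exact Makefile change in this PR is not shown here, so the sketch below only models the selection logic with hypothetical types.

```go
package main

import "fmt"

// spec is a hypothetical, simplified model of a labelled test case.
type spec struct {
	name   string
	labels []string
}

// hasLabel reports whether the spec carries the given label.
func hasLabel(s spec, label string) bool {
	for _, l := range s.labels {
		if l == label {
			return true
		}
	}
	return false
}

// excludeLabel returns the specs that do NOT carry the label, which is the
// effect of a negated label filter.
func excludeLabel(specs []spec, label string) []spec {
	var kept []spec
	for _, s := range specs {
		if !hasLabel(s, label) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	specs := []spec{
		{"default provider spec", []string{"capi", "disruptive"}},
		{"cluster placement group", []string{"capi", "disruptive"}},
		{"KMS keys", []string{"capi", "disruptive", "qe-account-only"}},
	}
	// The DEV job keeps everything except specs that need the QE account.
	for _, s := range excludeLabel(specs, "qe-account-only") {
		fmt.Println(s.name)
	}
}
```

Compared with skipping on a runtime error, selection by label is decided before the spec runs, so a real failure to fetch the key in the QE account is reported as a failure instead of a skip.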

pkg/capi/aws.go Outdated
Comment on lines 169 to 171
if err := cl.Create(ctx, awsMachineTemplate); err != nil {
Expect(err).ToNot(HaveOccurred(), "Failed to create awsmachinetemplate")
}
Contributor

What about

Suggested change
- if err := cl.Create(ctx, awsMachineTemplate); err != nil {
- 	Expect(err).ToNot(HaveOccurred(), "Failed to create awsmachinetemplate")
- }
+ Expect(cl.Create(ctx, awsMachineTemplate)).To(Succeed(), "Failed to create awsmachinetemplate")

Contributor Author

updated, thank you!

@huali9 huali9 force-pushed the awscapi branch 4 times, most recently from 4e20177 to d5e1c87, December 2, 2024 03:40
Contributor

openshift-ci bot commented Dec 2, 2024

@huali9: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-capi-techpreview | 49e57a2 | link | true | /test e2e-azure-capi-techpreview |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

5 participants