STOR-1819: Add OpenShift specific CSI certification test #28967

Merged

Conversation

Contributor

@jsafrane jsafrane commented Aug 1, 2024

Add a new OpenShift-specific test to the openshift/csi suite that tests a larger number of volumes used in a pod in sequence (or in manageable chunks). There was a 3rd party CSI driver that overflowed a kernel counter after the 256th attached volume.

This PR introduces an OpenShift-specific configuration manifest that configures OCP-specific tests. There is just one test today; more can come in the future.

Usage:

cat <<EOF >ocp-specific-manifest.yaml
Driver: disk.csi.azure.com
LUNStressTest:
  Timeout: "40m"
  PodsTotal: 260
EOF

export TEST_OCP_CSI_DRIVER_FILES=ocp-specific-manifest.yaml
export TEST_CSI_DRIVER_FILES=upstream-test-manifest.yaml
openshift-tests run openshift/csi 

The test itself selects a random schedulable node and creates the configured number of Pods in parallel (260 by default). The test expects that the CSI driver reports a correct attachment limit and that the Kubernetes scheduler respects it.
We know the scheduler can place more volumes on a node than the CSI driver can handle; however, we expect CSI drivers to handle it (e.g. by returning errors).

@jsafrane jsafrane changed the title Add OpenShift specific CSI certification test STOR-1819: Add OpenShift specific CSI certification test Aug 1, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 1, 2024
@openshift-ci-robot

openshift-ci-robot commented Aug 1, 2024

@jsafrane: This pull request references STOR-1819 which is a valid jira issue.


@gnufied
Member

gnufied commented Aug 1, 2024

I was running into:

error: failed to load OCP manifest from "./ocp-specific-manifest.yaml": ./ocp-specific-manifest.yaml: json: cannot unmarshal string into Go struct field LUNStressTestConfig.LUNStressTest.Timeout of type time.Duration

But it did start once I removed Timeout field.

@jsafrane
Contributor Author

jsafrane commented Aug 2, 2024

I fixed parsing of the timeout; apparently the field must be a string and not a Duration.

jsafrane added a commit to jsafrane/release that referenced this pull request Aug 5, 2024
With openshift/origin#28967, openshift-tests
accepts additional env. var. TEST_OCP_DRIVER_FILES that points to a file
with OCP specific CSI certification tests.

There is just one test for now, a test that creates 260 volumes + Pods on a
single node and waits for them to complete (with a very long timeout).
This test can be expensive, so don't run it unless explicitly enabled by
ENABLE_LONG_CSI_CERTIFICATION_TESTS=true

In the future, there may be more OCP-specific tests enabled by the
TEST_OCP_DRIVER_FILES manifest, and I'd like to keep the ability to skip
just the long ones somehow.

Only the periodic jobs have the long test enabled.

We use the "openshift/csi" test suite to run tests against 3rd party CSI drivers
as a way to certify them in OpenShift. So far we have used upstream tests there.

Add OCP-specific tests to the suite, based on a new OCP-specific YAML
manifest provided by the CSI driver vendor.

Usage:
TEST_OCP_CSI_DRIVER_FILES=azure-disk-ocp.yaml TEST_CSI_DRIVER_FILES=azure-disk-manifest.yaml ./openshift-tests run openshift/csi

Add an OpenShift-specific test that creates > 256 PVs + Pods on a single node in
a large batch.

This makes sure that a CSI driver can support a fairly large number of separate
volumes per node. There was a case where a CSI driver created LUN numbers that
were too high (256) and not supported by the Linux kernel.

All pods are created at the same time, expecting that the CSI driver reports a
correct attach limit and the Kubernetes scheduler respects it.
jsafrane added a commit to jsafrane/release that referenced this pull request Aug 7, 2024
With openshift/origin#28967, openshift-tests
accepts additional env. var. TEST_OCP_CSI_DRIVER_FILES that points to a file
with OCP specific CSI certification tests.

There is just one test for now, a test that creates 260 volumes + Pods on a
single node and waits for them to complete (with a very long timeout).
This test can be expensive, so don't run it unless explicitly enabled by
ENABLE_LONG_CSI_CERTIFICATION_TESTS=true

In the future, there may be more OCP-specific tests enabled by the
TEST_OCP_CSI_DRIVER_FILES manifest, and I'd like to keep the ability to skip
just the long ones somehow.

Only the periodic jobs have the long test enabled.
Member

@bertinatto bertinatto left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 7, 2024
Contributor

@soltysh soltysh left a comment

/approve

Contributor

openshift-ci bot commented Aug 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bertinatto, jsafrane, soltysh


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2024
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 49b0c04 and 2 for PR HEAD 58e496c in total

@jsafrane
Contributor Author

jsafrane commented Aug 7, 2024

/label px-approved
/label docs-approved
/label qe-approved
This will be tested in CI in openshift/release#55204.

@openshift-ci openshift-ci bot added px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR qe-approved Signifies that QE has signed off on this PR labels Aug 7, 2024

@jsafrane
Contributor Author

jsafrane commented Aug 7, 2024

/test e2e-aws-ovn-fips

@jsafrane
Contributor Author

jsafrane commented Aug 8, 2024

/test e2e-aws-ovn-fips

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 56c3413 and 1 for PR HEAD 58e496c in total

Contributor

openshift-ci bot commented Aug 8, 2024

@jsafrane: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                          Commit   Details  Required  Rerun command
ci/prow/e2e-gcp-ovn-rt-upgrade     58e496c  link     false     /test e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-openstack-ovn          58e496c  link     false     /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node    58e496c  link     false     /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-ipsec-serial   58e496c  link     false     /test e2e-aws-ovn-ipsec-serial


@jsafrane
Contributor Author

jsafrane commented Aug 8, 2024

/test e2e-aws-ovn-edge-zones

@openshift-merge-bot openshift-merge-bot bot merged commit 55f41c0 into openshift:master Aug 8, 2024
20 of 24 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.18.0-202408081846.p0.g55f41c0.assembly.stream.el9.
All builds following this will include this PR.

jsafrane added a commit to jsafrane/release that referenced this pull request Aug 9, 2024
With openshift/origin#28967, openshift-tests
accepts additional env. var. TEST_OCP_CSI_DRIVER_FILES that points to a file
with OCP specific CSI certification tests.

There is just one test for now, a test that creates 260 volumes + Pods on a
single node and waits for them to complete (with a very long timeout).
This test can be expensive, so don't run it unless explicitly enabled by
ENABLE_LONG_CSI_CERTIFICATION_TESTS=true

In the future, there may be more OCP-specific tests enabled by the
TEST_OCP_CSI_DRIVER_FILES manifest, and I'd like to keep the ability to skip
just the long ones somehow.

Only the 4.18 AWS EBS CSI periodic jobs have the long test enabled.
openshift-merge-bot bot pushed a commit to openshift/release that referenced this pull request Aug 12, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR