[SYCL][CUDA][HIP] Remove CUDA and HIP PI unit tests #12459
Conversation
These tests are not currently running and are covered in other test suites:

* `test_primary_context.cpp`
  * Deprecated feature, covered in `test-e2e/Basic/context.cpp`
* `test_commands.cpp`
  * Covered by UR CTS
* `test_sampler_properties.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/tree/main/test/conformance/sampler
* `PlatformTest.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/platform/urPlatformGetInfo.cpp
* `test_device.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/device/urDeviceGetInfo.cpp
* `EnqueueMemTest.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/enqueue/urEnqueueMemBufferFill.cpp
* `test_mem_obj.cpp`
  * Moved to UR CTS
* `test_contexts.cpp`
  * https://github.com/oneapi-src/unified-runtime/blob/main/test/adapters/cuda/context_tests.cpp
* `test_kernels.cpp`
  * https://github.com/oneapi-src/unified-runtime/blob/main/test/adapters/cuda/kernel_tests.cpp
* `test_base_objects.cpp`
  * Basic tests mostly covered in UR
* `test_interop_get_native.cpp`
  * Mostly covered in UR tests and E2E tests
ping @intel/llvm-reviewers-runtime
Sorry for the delay! Seems reasonable, given the UR replacements.
Post commit failures on Arc GPU:
I don't see how this patch could have caused these failures; could it be something on the machine, or from a previous patch? This PR just removed old tests from a separate test suite that were already disabled and not running.
That wasn't a call to action. I'm just posting the failures (and encouraging others in @intel/llvm-gatekeepers to do the same) so that failures are searchable through the GitHub interface (it can't look into the logs). That way, we can gather statistics on how flaky a test is, which configurations it fails on, etc. Ultimately, once a search for a failing test turns up several instances, we should create an issue/internal bug report and disable the test until that is resolved.
It sounds like this is something that had better be automated rather than requested from gatekeepers.
While I agree that some automation would be helpful, it's still the gatekeepers' responsibility to explain every failure in post-commit, or to request that the PR author do so. We cannot simply ignore all the flaky failures in our CI.
How does this make failures searchable? Are you using some consistent language in your comment?
Do we really want to divert the attention of everyone on the failure blame list, even when it's obvious which commits are not responsible? That seems like pointless busywork for the author and the gatekeeper. If we had a buildbot commenting on every PR in the blame list, then fine, since the machine can't make that distinction; but if a human is in the loop, then surely they should use their judgment. If the human is not allowed to use their judgment, they should be replaced by a machine so as not to waste human time.
Copy-paste the failing test name and then search for it using GitHub's repo search:
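The manual search step above could also be done programmatically. This is a hypothetical sketch that builds a query URL for GitHub's public issue/PR search API; the repo slug and the test name in the usage example are illustrative, not taken from any actual tooling in the project.

```python
# Hypothetical helper: build a GitHub search-API URL that finds PRs
# mentioning a failing test name, mirroring the manual repo search.
from urllib.parse import urlencode

API = "https://api.github.com/search/issues"

def flaky_search_url(test_name: str, repo: str = "intel/llvm") -> str:
    """Return a search URL for PRs/issues that mention `test_name`."""
    # Quoting the test name keeps the match exact, as in the web UI search box.
    query = f'"{test_name}" repo:{repo} is:pr'
    return f"{API}?{urlencode({'q': query})}"

# Illustrative usage with a made-up failing test name:
print(flaky_search_url("SYCL :: Basic/context.cpp"))
```

Fetching that URL (with an authenticated client, to avoid rate limits) would return the same hits the web search shows, which a script could then count per test.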
I usually tag people if I expect some action from them, so I don't think we divert attention that much. We do have lots of flaky tests, though, and we need to do something about that. The first step is to gather some statistics, and that's the best we can do for now. If somebody is willing to write scripts to parse the logs, update a spreadsheet/database with those flaky failures, and maintain that process, I'd be more than happy to switch to it, but nobody has volunteered so far. Yes, it's the gatekeepers' responsibility, because we can't place that burden on occasional contributors, but we have to have somebody looking into the issues.
An alternative would be to create an issue for each flaky test (or for a group of related flaky tests) and, whenever the test fails in post-commit on a PR unrelated to it, link that PR in the issue. That way, the information for a flaky test would be collected in a single location (the issue) and the PR author's attention would not be diverted.
That would only work if you already know the issue number. In my experience, we couldn't even get people to post a comment without searching; expecting them to find an issue first is unrealistic.
After this, both the CUDA and HIP directories could be removed. Two PI tests remain: one covering xpti handling of PI call arguments, and one covering OpenCL interop ownership.