
Fix urDeviceGetGlobalTimestampTest and urContextCreateWithNativeHandle #837

Closed

Conversation

fabiomestre
Contributor

No description provided.

@fabiomestre fabiomestre marked this pull request as ready for review August 31, 2023 12:29
    ur_result_t result = urContextCreateWithNativeHandle(native_context, 1,
                                                         &device, &props, &ctx);

    if (result == UR_RESULT_ERROR_INVALID_OPERATION) {
Contributor

That's not a valid error code for urContextCreateWithNativeHandle; if it's not supported, it should return UR_RESULT_ERROR_UNSUPPORTED_FEATURE.
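A minimal sketch of how the quoted check could be reworked along the lines of this suggestion. It reuses the variable names from the snippet above; skipping via GTEST_SKIP is an assumption about how the CTS would want to handle the unsupported case, not the actual fix in this PR.

```cpp
// Sketch only: skip the test when the adapter reports the entrypoint as
// unsupported, instead of treating it as an invalid operation.
ur_result_t result = urContextCreateWithNativeHandle(native_context, 1,
                                                     &device, &props, &ctx);
if (result == UR_RESULT_ERROR_UNSUPPORTED_FEATURE) {
    GTEST_SKIP() << "urContextCreateWithNativeHandle is not supported";
}
ASSERT_EQ(result, UR_RESULT_SUCCESS);
```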

@@ -102,7 +103,12 @@ TEST_F(urDeviceGetGlobalTimestampTest, SuccessSynchronizedTime) {
    const uint64_t allowedDiff = static_cast<uint64_t>(
        std::min(deviceTimeDiff, hostTimeDiff) * allowedTimerError);

    ASSERT_LE(observedDiff, allowedDiff);
Contributor

Why not change allowedDiff/allowedTimerError?

Contributor Author

The problem with changing allowedDiff is that there is no value that is guaranteed to work.

I ran the test locally many times and there are some pretty big outliers. I suspect they might be due to context switches in the OS, which can take an unpredictable amount of time.

So I could change allowedDiff to some huge number and the test would still not be guaranteed to pass 100% of the time.
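For context, here is a rough reconstruction of the measurement the test performs. The urDeviceGetGlobalTimestamps signature, the ASSERT_SUCCESS macro, the sleep, and any variable not in the quoted diff are assumptions based on CTS style (and the snippet assumes <thread>, <chrono>, and <algorithm> are available); the point is that a single OS scheduling stall between the two sample points inflates observedDiff by an arbitrary amount, which is why no fixed allowedTimerError bound can make the assertion deterministic.

```cpp
// Sketch of the synchronized-time check (reconstructed, not the exact test).
uint64_t deviceStartTime = 0, hostStartTime = 0;
uint64_t deviceEndTime = 0, hostEndTime = 0;
ASSERT_SUCCESS(
    urDeviceGetGlobalTimestamps(device, &deviceStartTime, &hostStartTime));
std::this_thread::sleep_for(std::chrono::milliseconds(16)); // let time pass
ASSERT_SUCCESS(
    urDeviceGetGlobalTimestamps(device, &deviceEndTime, &hostEndTime));

// Elapsed time as seen by each clock.
const uint64_t deviceTimeDiff = deviceEndTime - deviceStartTime;
const uint64_t hostTimeDiff = hostEndTime - hostStartTime;

// How far the two clocks drifted apart over the interval; a context switch
// between the two urDeviceGetGlobalTimestamps calls lands here.
const uint64_t observedDiff = deviceTimeDiff > hostTimeDiff
                                  ? deviceTimeDiff - hostTimeDiff
                                  : hostTimeDiff - deviceTimeDiff;

const uint64_t allowedDiff = static_cast<uint64_t>(
    std::min(deviceTimeDiff, hostTimeDiff) * allowedTimerError);
ASSERT_LE(observedDiff, allowedDiff);
```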

@@ -102,7 +103,12 @@ TEST_F(urDeviceGetGlobalTimestampTest, SuccessSynchronizedTime) {
    const uint64_t allowedDiff = static_cast<uint64_t>(
        std::min(deviceTimeDiff, hostTimeDiff) * allowedTimerError);

    ASSERT_LE(observedDiff, allowedDiff);
Contributor

We are seeing that this test is also flaky on L0 in the CI (hence #836)...

@fabiomestre fabiomestre marked this pull request as draft September 1, 2023 11:53
@@ -787,14 +787,15 @@ returns:
        - "If the adapter has no underlying equivalent handle."
 --- #--------------------------------------------------------------------------
 type: function
-desc: "Returns synchronized Host and Device global timestamps."
+desc: "Returns a reasonably synchronized Host and Device global timestamps."

what "reasonably" means here? are we adding that because L0 or other adapter is returning synchronized timestamps with some jitter?

Contributor Author
@fabiomestre fabiomestre Sep 26, 2023

This is a small wording change aimed at managing expectations about the accuracy of this entrypoint in UR. I took the wording from the OpenCL spec. OpenCL is the only adapter with native support for an equivalent entrypoint, and even OpenCL does not provide guarantees on the accuracy of the synchronization (hence the "reasonably" wording). I think the UR spec should be relaxed to at least be similar in scope to OpenCL. I don't think we will ever be able to enforce hard constraints on this entrypoint for CUDA / HIP (and probably L0 as well?).

This PR is marked as draft because I'm not sure what to do with the existing CTS testing. As it stands, OpenCL is probably the only adapter that can pass this reliably.
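For reference, the OpenCL entrypoint the wording is borrowed from is clGetDeviceAndHostTimer (OpenCL 2.1+), whose spec likewise only promises "reasonably synchronized" device and host timestamps. A minimal usage sketch follows; error handling is omitted and the device handle is assumed to come from the usual platform/device query.

```cpp
#include <CL/cl.h>

// Query a device timestamp and a host timestamp captured as close together
// as the implementation can manage, and return their absolute skew.
cl_ulong timer_skew(cl_device_id device) {
    cl_ulong device_ts = 0, host_ts = 0;
    // The OpenCL spec only guarantees these are "reasonably synchronized".
    clGetDeviceAndHostTimer(device, &device_ts, &host_ts);
    return device_ts > host_ts ? device_ts - host_ts : host_ts - device_ts;
}
```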

@fabiomestre fabiomestre closed this Dec 4, 2023