[CUDA] Update hint functions to only return warnings instead of errors #989

fabiomestre · 2023-10-23T17:42:45Z

The UR spec was recently changed to guarantee that hint entry points never return errors. This commit changes the CUDA adapter to conform to this change.
This commit also changes the type of PointerRangeSize, which was causing stack corruption.

JackAKirk · 2023-11-07T14:41:34Z

Actually, @fabiomestre, after speaking to @GeorgeWeb I realize that what he has done in #1027 is compliant with the spec change but can still give users a warning that the advise hint is ignored. I think he will explain in a comment. I think this means you can make this PR simpler, and the code more efficient, by keeping e.g.

    setErrorMessage("Prefetch hint ignored as prefetch only works with USM",
                    UR_RESULT_SUCCESS);
    return UR_RESULT_ERROR_ADAPTER_SPECIFIC;

instead of replacing it with the is_supported check.

GeorgeWeb · 2023-11-07T16:22:18Z

Correct. The point is that UR_RESULT_ERROR_ADAPTER_SPECIFIC was never going to lead to an error here, because the runtime goes into handleErrorOrWarning, which calls checkPiResult to do the error handling. There, UR_RESULT_ERROR_ADAPTER_SPECIFIC translates to PI_ERROR_PLUGIN_SPECIFIC_ERROR, which is checked specifically because this error code can be treated as a warning if the last error code (the one passed to setErrorMessage in our case) is UR_RESULT_SUCCESS (translated to PI_SUCCESS). Hence, all of the logic this PR changed around the attribute queries and the setErrorMessage calls for unsupported cases was already fine and conformant to the UR spec.
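To make the mechanism described above concrete, here is a minimal, self-contained sketch. It is not the real adapter or SYCL runtime code: ResultSketch, setErrorMessageSketch, prefetchSketch and classifySketch are hypothetical stand-ins, and the sketch only assumes the behaviour stated in this comment, namely that an adapter-specific result is treated as a warning when the last recorded error code is success.

```cpp
#include <cassert>
#include <string>

// Stand-in result codes; the real values are defined by the UR headers.
enum ResultSketch {
  RESULT_SUCCESS,
  RESULT_ERROR_ADAPTER_SPECIFIC,
  RESULT_ERROR_INVALID_ENUMERATION
};

// Hypothetical per-thread "last error" slot, loosely modelled on what
// setErrorMessage is described as recording in the adapter.
thread_local std::string LastMessage;
thread_local ResultSketch LastCode = RESULT_SUCCESS;

void setErrorMessageSketch(const char *Message, ResultSketch Code) {
  LastMessage = Message;
  LastCode = Code;
}

// Adapter side: the hint cannot be honoured, so record a message whose code
// is SUCCESS and return the adapter-specific result.
ResultSketch prefetchSketch(bool IsUSM) {
  if (!IsUSM) {
    setErrorMessageSketch(
        "Prefetch hint ignored as prefetch only works with USM",
        RESULT_SUCCESS);
    return RESULT_ERROR_ADAPTER_SPECIFIC;
  }
  return RESULT_SUCCESS;
}

enum class Outcome { Ok, Warning, Error };

// Runtime side (assumed behaviour): an adapter-specific result is only a
// warning when the last recorded code is SUCCESS; anything else is an error.
Outcome classifySketch(ResultSketch Result) {
  if (Result == RESULT_SUCCESS)
    return Outcome::Ok;
  if (Result == RESULT_ERROR_ADAPTER_SPECIFIC && LastCode == RESULT_SUCCESS)
    return Outcome::Warning;
  return Outcome::Error;
}

// Small self-check showing the three outcomes.
void selfCheck() {
  assert(classifySketch(prefetchSketch(true)) == Outcome::Ok);
  assert(classifySketch(prefetchSketch(false)) == Outcome::Warning);
  setErrorMessageSketch("real failure", RESULT_ERROR_INVALID_ENUMERATION);
  assert(classifySketch(RESULT_ERROR_ADAPTER_SPECIFIC) == Outcome::Error);
}
```

In this sketch the ignored prefetch surfaces as a warning rather than an error, which is the property the comment relies on.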

Now, the thing that does not conform to the UR spec's definition of advise hints is the throw UR_RESULT_ERROR_INVALID_ENUMERATION; line in the setCuMemAdvise helper function, which should instead return this result code and set a warning message via setErrorMessage with error code UR_RESULT_SUCCESS, similar to the cases referenced above. So I think this is the one that should be addressed instead.

GeorgeWeb · 2023-11-07T16:27:33Z

source/adapters/cuda/enqueue.cpp

+ CU_MEM_ADVISE_UNSET_ACCESSED_BY,
+ hQueue->getContext()->getDevice()->get()));
+ } else {
+ Result = setCuMemAdvise((CUdeviceptr)pMem, size, advice,


This function (setCuMemAdvise) can throw if an unsupported advice flag is passed, but according to the UR spec it should not. Hence, we should have it return the result code instead of throwing, and then handle it here by checking the value of Result and likely setting a warning with SUCCESS. That is, if we want to verify and warn on unsupported advice flags. Otherwise we can omit the check for supported flags altogether if that seems more sensible to you, but having a warning on the user end is always nice.
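As a rough illustration of this suggestion, the sketch below has the helper return a result code instead of throwing, and the caller turn an unsupported flag into a warning. All names here (HintResult, AdviceFlag, recordWarning, applyAdviceSketch, enqueueAdviseSketch) and the warning text are hypothetical; this is not the actual setCuMemAdvise code, and the real helper would call into the CUDA driver where the comment indicates.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the UR result codes and advice flags.
enum class HintResult { Success, AdapterSpecific, InvalidEnumeration };
enum class AdviceFlag { SetReadMostly, SetPreferredLocation, Unsupported };

// Hypothetical per-thread warning slot, standing in for setErrorMessage.
thread_local std::string WarningMessage;
thread_local HintResult WarningCode = HintResult::Success;

void recordWarning(const char *Message) {
  WarningMessage = Message;
  WarningCode = HintResult::Success; // SUCCESS marks the message as a warning
}

// Helper: reports an unknown flag through its return value, never by throwing.
HintResult applyAdviceSketch(AdviceFlag Advice) {
  switch (Advice) {
  case AdviceFlag::SetReadMostly:
  case AdviceFlag::SetPreferredLocation:
    return HintResult::Success; // a real adapter would call the driver here
  default:
    return HintResult::InvalidEnumeration;
  }
}

// Caller: converts the unsupported-flag result into a warning.
HintResult enqueueAdviseSketch(AdviceFlag Advice) {
  HintResult Result = applyAdviceSketch(Advice);
  if (Result == HintResult::InvalidEnumeration) {
    recordWarning("Mem advise ignored as the advice flag is not supported");
    return HintResult::AdapterSpecific;
  }
  return Result;
}

// Small self-check of both paths.
void selfCheckAdvise() {
  assert(enqueueAdviseSketch(AdviceFlag::SetReadMostly) == HintResult::Success);
  assert(enqueueAdviseSketch(AdviceFlag::Unsupported) ==
         HintResult::AdapterSpecific);
  assert(WarningCode == HintResult::Success);
}
```

Keeping the check in the helper and the warning in the caller means no exception has to cross the entry point for what is only a hint.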

I have updated the PR to return UR_RESULT_ERROR_ADAPTER_SPECIFIC in this case

fabiomestre · 2023-11-08T16:26:19Z

From the point of view of the current implementation, I think that makes sense. I wasn't aware that the SYCL RT treats UR_RESULT_ERROR_ADAPTER_SPECIFIC as a warning when the last error code set is UR_RESULT_SUCCESS. It is quite a misleading error code, and the UR spec is not clear about this.

I will just confirm with the team that this is the behaviour that we expect from UR going forward. If it is, I'm happy to change this PR as you suggested.

P.S. Sorry, I somehow edited your reply by mistake

- The UR spec was recently changed to make hint entry points always return UR_RESULT_SUCCESS. This commit changes the CUDA adapter to be conformant with this change.
- This commit also changes the type of PointerRangeSize which was causing a stack corruption.

GeorgeWeb · 2023-11-09T14:43:33Z

LGTM!

One last thing: I am not too well versed in the merging policies of this repo, but it may be worth rewording the PR name (and description) now that it addresses a different thing (replacing error throwing with setting a warning in the memadvise API).
It may also be useful to make the commits clearer by doing an interactive rebase to squash and reword them before merging the PR.

fabiomestre · 2023-11-09T14:57:48Z

Thanks Georgi. I updated the title. I expect that this will be merged in a separate PR, so I will write a better commit message when I create it.

fabiomestre marked this pull request as ready for review October 23, 2023 17:43

fabiomestre requested a review from a team as a code owner October 23, 2023 17:43

JackAKirk approved these changes Oct 24, 2023

fabiomestre added the conformance Conformance test suite issues. label Nov 2, 2023

JackAKirk mentioned this pull request Nov 7, 2023

[SYCL][HIP] Implement mem_advise for HIP #1027

Merged

GeorgeWeb requested changes Nov 7, 2023

fabiomestre added 2 commits November 8, 2023 17:42

Address review comments

e1902fc

fabiomestre force-pushed the fabio/cuda_update_hint_ep branch from 9c77328 to e1902fc Compare November 9, 2023 14:29

Remove extra spaces

5b2c2dd

fabiomestre changed the title ~~[CUDA] Change hint functions to return UR_SUCCESS~~ [CUDA] Update hint functions to only return warnings instead of errors Nov 9, 2023

This was referenced Nov 10, 2023

[CUDA] Combined CTX Fixes #1065

Closed

[CUDA][HIP] Combined CTS Fixes #1077

Merged

fabiomestre closed this Nov 14, 2023

fabiomestre mentioned this pull request Nov 15, 2023

[UR] Bump tag to 534071e52f84bad1dd7fb210a360414507f3b3ae intel/llvm#11880

Merged

fabiomestre mentioned this pull request Nov 27, 2023

[SPEC] Clarify the behaviour of UR warning messages #1055

Closed
