Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA] Update hint functions to only return warnings instead of errors #989

Closed

Conversation

fabiomestre
Copy link
Contributor

@fabiomestre fabiomestre commented Oct 23, 2023

  • The UR spec was recently changed to guarantee that hint entryponts never return errors. This commit changes the CUDA adapter to be conformant with this change.
  • This commit also changes the type of PointerRangeSize which was causing a stack corruption.

@fabiomestre fabiomestre marked this pull request as ready for review October 23, 2023 17:43
@fabiomestre fabiomestre requested a review from a team as a code owner October 23, 2023 17:43
@fabiomestre fabiomestre added the conformance Conformance test suite issues. label Nov 2, 2023
@JackAKirk
Copy link
Contributor

Actually @fabiomestre after speaking to @GeorgeWeb I realize that what he has done here #1027
is compliant with the spec change, but actually can still give users a warning that advise hint is ignored. I think he will explain in a comment. This I think means that you can make this PR simpler, and the code more efficient by keeping e.g.

setErrorMessage("Prefetch hint ignored as prefetch only works with USM",
                    UR_RESULT_SUCCESS);
    return UR_RESULT_ERROR_ADAPTER_SPECIFIC;

instead of replacing it with the is_supported check.

@GeorgeWeb
Copy link
Contributor

GeorgeWeb commented Nov 7, 2023

Actually @fabiomestre after speaking to @GeorgeWeb I realize that what he has done here #1027 is compliant with the spec change, but actually can still give users a warning that advise hint is ignored. I think he will explain in a comment. This I think means that you can make this PR simpler, and the code more efficient...

Correct, the point is that UR_RESULT_ERROR_ADAPTER_SPECIFIC was never going to lead to an error here because the runtime will get into handleErrorOrWarning which calls checkPiResult to do the error handling. There, UR_RESULT_ERROR_ADAPTER_SPECIFIC translates to PI_ERROR_PLUGIN_SPECIFIC_ERROR, which is checked exclusively because this error code allows to be treated as a warning if the last error code (the one passed to setErrorMessage in our case) is UR_RESULT_SUCCESS(translated to PI_SUCCESS). Hence, all of the changed logic around the attribute queries and setErrorMessage on unsupported was fine and conformant to the UR spec before.

Now, the thing that is not conforming with the definition in the UR spec wrt advise hints is the throw UR_RESULT_ERROR_INVALID_ENUMERATION; line in the setCuMemAdvise helper function, which should instead return this result code and set a warning message via setErrorMessage with errc UR_RESULT_SUCCESS similar to the above referenced cases. So, I think this one should be addressed instead.

CU_MEM_ADVISE_UNSET_ACCESSED_BY,
hQueue->getContext()->getDevice()->get()));
} else {
Result = setCuMemAdvise((CUdeviceptr)pMem, size, advice,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function (setCuMemAdvise) can throw if an unsupported advice flag is passed, but it should not as suggested by the UR spec. Hence, we should have it return the result code within the function instead of throwing an error, and then handle it here by checking the value of Result and likely setting a warning with SUCCESS - that is if we want to verify and warn on unsupported advice flags, otherwise we can omit checking if the flags are supported altogether if that's more sensible to you but having a warning on the user-end is always nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR to return UR_RESULT_ERROR_ADAPTER_SPECIFIC in this case

@fabiomestre
Copy link
Contributor Author

Actually @fabiomestre after speaking to @GeorgeWeb I realize that what he has done here #1027 is compliant with the spec change, but actually can still give users a warning that advise hint is ignored. I think he will explain in a comment. This I think means that you can make this PR simpler, and the code more efficient...

Correct, the point is that UR_RESULT_ERROR_ADAPTER_SPECIFIC was never going to lead to an error here because the runtime will get into handleErrorOrWarning which calls checkPiResult to do the error handling. There, UR_RESULT_ERROR_ADAPTER_SPECIFIC translates to PI_ERROR_PLUGIN_SPECIFIC_ERROR, which is checked exclusively because this error code allows to be treated as a warning if the last error code (the one passed to setErrorMessage in our case) is UR_RESULT_SUCCESS(translated to PI_SUCCESS). Hence, all of the changed logic around the attribute queries and setErrorMessage on unsupported was fine and conformant to the UR spec before.

Now, the thing that is not conforming with the definition in the UR spec wrt advise hints is the throw UR_RESULT_ERROR_INVALID_ENUMERATION; line in the setCuMemAdvise helper function, which should instead return this result code and set a warning message via setErrorMessage with errc UR_RESULT_SUCCESS similar to the above referenced cases. So, I think this one should be addressed instead.

From the point of view of the current implementation, I think that makes sense. I wasn't aware that SYCL RT is treating UR_RESULT_ERROR_ADAPTER_SPECIFIC as a warning if it returns UR_RESULT_SUCCESS. It is quite a misleading error code and the UR spec is not clear about this.

I will just confirm with the team that this is the behaviour that we expect from UR going forward. If it is, I'm happy to change this PR as you suggested.

P.S. Sorry, I somehow edited your reply by mistake

- The UR spec was recently changed to make hint entryponts
always return UR_RESULT_SUCCESS. This commit changes the CUDA
adapter to be conformant with this change.
- This commit also changes the type of PointerRangeSize which
 was causing a stack corruption.
@GeorgeWeb
Copy link
Contributor

GeorgeWeb commented Nov 9, 2023

LGTM!

One last thing - I am not too verse with the merging policies in this repo, but it may be worth rewording the PR name (and description) now that it is addressing a different thing (replacing error throwing with setting a warning in the memadvise API).
May also be useful to make the commits more clear by interactive rebasing to squash and reword, before merging the PR.

@fabiomestre fabiomestre changed the title [CUDA] Change hint functions to return UR_SUCCESS [CUDA] Update hint functions to only return warnings instead of errors Nov 9, 2023
@fabiomestre
Copy link
Contributor Author

LGTM!

One last thing - I am not too verse with the merging policies in this repo, but it may be worth rewording the PR name (and description) now that it is addressing a different thing (replacing error throwing with setting a warning in the memadvise API). May also be useful to make the commits more clear by interactive rebasing to squash and reword, before merging the PR.

Thanks Georgi. I updated the title. I would expect that this will be merged in separate PR. So, will write a better commit message when I create it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conformance Conformance test suite issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants