Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Exp][usm-p2p] Initial usm-p2p UR extension. #631

Merged
merged 16 commits into from
Jun 21, 2023

Conversation

JackAKirk
Copy link
Contributor

This PR adds new functions and enum to UR corresponding to the new PI functions introduced in this PR (which will be updated to take into account the UR migration): intel/llvm#8303

FYI note that currently "P2P" is being translated to "P2_P". I was thinking of getting around this by switching to "usm-ptop", but this is not ideal since I don't think I am allows the case variation: "PtoP"?

JackAKirk and others added 5 commits June 15, 2023 15:42
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
include/ur.py Outdated Show resolved Hide resolved
scripts/core/EXP-USM-P2P.rst Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
include/ur.py Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
JackAKirk and others added 6 commits June 19, 2023 10:53
Co-authored-by: Petr Vesely <22935437+veselypeta@users.noreply.github.com>
Co-authored-by: Petr Vesely <22935437+veselypeta@users.noreply.github.com>
Co-authored-by: Petr Vesely <22935437+veselypeta@users.noreply.github.com>
Co-authored-by: Petr Vesely <22935437+veselypeta@users.noreply.github.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

@pbalcer Thanks, it is looking good now.

@JackAKirk JackAKirk marked this pull request as ready for review June 20, 2023 08:29
scripts/core/EXP-USM-P2P.rst Outdated Show resolved Hide resolved
scripts/core/exp-usm-p2p.yml Outdated Show resolved Hide resolved
JackAKirk and others added 2 commits June 21, 2023 11:08
Correct subtitle format.

Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
@kbenzie
Copy link
Contributor

kbenzie commented Jun 21, 2023

@JackAKirk looks like you need to commit files after running the generate target.

Removed unrequired description of UR_RESULT_ERROR_ADAPTER_SPECIFIC.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

@JackAKirk looks like you need to commit files after running the generate target.

Just needed to add the indent changes but got distracted. Should be good now.

Copy link
Contributor

@veselypeta veselypeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@veselypeta
Copy link
Contributor

Let me know when you're happy to merge and I can hit the button (I'm assuming that you don't have the permissions) 👍

@JackAKirk
Copy link
Contributor Author

Let me know when you're happy to merge and I can hit the button (I'm assuming that you don't have the permissions) +1

Cheers, I just have a little query then I'm happy for this to be merged.

Currently "ur_api.cpp" has these generated functions returning UR_SUCCESS. By default we want e.g.

  std::ignore = command_device;
  std::ignore = peer_device;

  setErrorMessage("piextEnablePeerAccess not "
                  "implemented in hip backend",
                  PI_ERROR_PLUGIN_SPECIFIC_ERROR);
  return PI_ERROR_PLUGIN_SPECIFIC_ERROR;

So what is the procedure to do this. I guess in the intel/llvm PR I should define the function in each adapter. But I just wanted to check that it isn't the case that there will be a default definition that some adapters might revert to?
I think this isn't the case but I just want to check I shouldn't e.g. update a correct default definition. Thanks

@veselypeta
Copy link
Contributor

veselypeta commented Jun 21, 2023

If your question is about asking UR if the extension is supported. I believe we currently do this through the device query UR_DEVICE_INFO_EXTENSIONS which will return a char[] of implemented extensions. However, you could also just the entry-point and if it's not been implemented - UR will return UR_RESULT_ERROR_UNINITIALIZED.

This is something we should have documented so I will add this to the documentation.

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jun 21, 2023

If your question is about asking UR if the extension is supported. I believe we currently do this through the device query UR_DEVICE_INFO_EXTENSIONS which will return a char[] of implemented extensions. However, you could also just the entry-point and if it's not been implemented - UR will return UR_RESULT_ERROR_UNINITIALIZED.

This is something we should have documented so I will add this to the documentation.

The point is that previously we could have a portable implementation of e.g. this function:
https://github.com/intel/llvm/pull/8303/files#diff-e5a5ec910bab410f31617da9dfb56ee9e435769e7d6a830e85094c07b80c5454R216

even when a particular backend did not yet implement P2P in the PI/UR: In such case the Adapter should return an error stating it isn't yet implemented. Note that the query functions had a default where the P2P_infos simply return false if the Enable/Disable functions are not yet implemented for that adapter. This behaviour was requested by Greg in this PR intel/llvm#8303.

So is this not still the case because the SYCL function will have to instead query if the extension is supported for each adapter?

@veselypeta
Copy link
Contributor

In such case the Adapter should return an error stating it isn't yet implemented.

I think you can implement this in the pi2ur layer. If the adapter doesn't support it, then you can set the PI error as you say.

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jun 21, 2023

In such case the Adapter should return an error stating it isn't yet implemented.

I think you can implement this in the pi2ur layer. If the adapter doesn't support it, then you can set the PI error as you say.

OK cool, I'm just wondering what the long term solution is to such a situation when pi2ur is removed. But in any case I don't think anything is to be changed in this PR so happy for it to be merged now!
Thanks for the clarifications.

@veselypeta
Copy link
Contributor

I'm not sure long-term in terms of the SYCL-RT, but in UR I don't think we should be returning a UR_ADAPTER_SPECIFIC error if the adapter doesn't support the entry-point. I think you should have to query support before making the call.

@veselypeta veselypeta merged commit 65b8cb4 into oneapi-src:main Jun 21, 2023
@JackAKirk
Copy link
Contributor Author

I'm not sure long-term in terms of the SYCL-RT, but in UR I don't think we should be returning a UR_ADAPTER_SPECIFIC error if the adapter doesn't support the entry-point. I think you should have to query support before making the call.

So the advantage of returning UR_ADAPTER_SPECIFIC is that the sycl runtime looks for the PI equivalent of this error automatically via plugin::check: if it is returned then it passes the last message to sycl::exception. This allows the relevant information (that the function isn't yet implemented for that adapter) to be passed to the runtime automatically without any special backend specific checks.
I suppose equivalent behaviour could be implemented differently, but currently that is how it does it. Ideally any replacement could be done in such a way that it doesn't require the runtime to do anything other than plugin::check.

@veselypeta
Copy link
Contributor

Maybe this is a topic for discussion at the WG call @kbenzie ?

@JackAKirk
Copy link
Contributor Author

Maybe this is a topic for discussion at the WG call @kbenzie ?

The answer seems to be that atm people are using (see https://github.com/intel/llvm/pull/9992/files):

sycl::detail::ur::die("___ not implemented"

etc, which uses std::cerr. I think this is fine personally, although technically sycl spec/ extensions mandate functions should be throwing sycl::exception. I don't think it matters. I will just copy the above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants