Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for sycl_ext_oneapi_composite_device #1192

Merged
merged 15 commits into from
Feb 7, 2024

Conversation

maarquitos14
Copy link
Contributor

Initial implementation to support sycl_ext_oneapi_composite_device specified in intel/llvm#11846.

include/ur_api.h Outdated Show resolved Hide resolved
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
@codecov-commenter
Copy link

codecov-commenter commented Dec 18, 2023

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (c63ad9b) 15.80% compared to head (7564829) 15.79%.

Files Patch % Lines
include/ur_print.hpp 0.00% 26 Missing ⚠️
tools/urinfo/urinfo.hpp 0.00% 4 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1192      +/-   ##
==========================================
- Coverage   15.80%   15.79%   -0.02%     
==========================================
  Files         223      223              
  Lines       31483    31513      +30     
  Branches     3558     3561       +3     
==========================================
+ Hits         4975     4976       +1     
- Misses      26457    26486      +29     
  Partials       51       51              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@GeorgeWeb GeorgeWeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIP and CUDA adapter changes look good to me.

maarquitos14 and others added 3 commits December 19, 2023 01:10
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
@maarquitos14
Copy link
Contributor Author

@pbalcer @aarongreig I'm unable to reproduce the CI errors in my local machine. Any tips on how to reproduce?

if (!isCombinedMode) {
ze_device_handle_t RootDev = nullptr;
// Query Root Device
ZE2UR_CALL(zeDeviceGetRootDevice, (D->ZeDevice, &RootDev));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new entry point? I'm wondering if the GitHub Actions failures could be due to an older driver being installed or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbenzie It is a new entry point, but as far as I see, the CI uses level zero v1.15.1 which includes this entry. What I'm seeing is that zeDeviceGetRootDevice returns ZE_RESULT_ERROR_UNSUPPORTED_FEATURE under certain circumstances (https://github.com/oneapi-src/level-zero/blob/1685d01497428ca4d8b99200972b64685424d5c9/source/lib/ze_libapi.cpp#L463). The CI is running into this, and that makes ZE2UR_CALL fail in UR_CHECK. I'm not sure how to deal with this, honestly. As of now, I think only PVC should care about this, but I don't think we should have code specific to PVC in UR. Otherwise, we can use ZE_CALL_NOCHECK rather than ZE2UR_CALL and check manually that the error code is either success or unsupported feature. What do you think? Any other suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I just checked locally using ZE_CALL_NOCHECK and all the tests pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the result of ZE_CALL_NOCHECK manually sounds acceptable if there's a safe fallback, I think the adapter does that in other places where stuff might not be supported. @oneapi-src/unified-runtime-level-zero-write should be consulted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 15d2953

@kbenzie
Copy link
Contributor

kbenzie commented Jan 9, 2024

@pbalcer @aarongreig I'm unable to reproduce the CI errors in my local machine. Any tips on how to reproduce?

I believe we test L0 on a machine with an A750.

@maarquitos14
Copy link
Contributor Author

@pbalcer @aarongreig I'm unable to reproduce the CI errors in my local machine. Any tips on how to reproduce?

I believe we test L0 on a machine with an A750.

Thanks @kbenzie. @pbalcer gave me access to a machine with such hardware. I'm trying to reproduce the issue in that machine now.

@maarquitos14
Copy link
Contributor Author

@oneapi-src/unified-runtime-level-zero-write @oneapi-src/unified-runtime-native-cpu-write can I have a review please?

source/adapters/level_zero/device.cpp Outdated Show resolved Hide resolved
source/adapters/level_zero/device.cpp Outdated Show resolved Hide resolved
source/adapters/level_zero/device.cpp Outdated Show resolved Hide resolved
source/adapters/level_zero/device.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@PietroGhg PietroGhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native CPU lgtm, sorry for the late review

Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one minor nit.

source/adapters/level_zero/device.cpp Outdated Show resolved Hide resolved
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Copy link
Contributor

@nrspruit nrspruit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for level zero

@maarquitos14
Copy link
Contributor Author

I think this is ready to merge @oneapi-src/unified-runtime-maintain.

@aarongreig aarongreig added the ready to merge Added to PR's which are ready to merge label Jan 11, 2024
@aarongreig
Copy link
Contributor

could you update https://github.com/intel/llvm/pull/12178/files#diff-1bf2ed05c0065126d4e2150f0eda4dbd561e5c3625452e96356aadafa1312e6fR64 to point at https://github.com/maarquitos14/unified-runtime/tree/maronas/ext_composite_device so the changes here get tested @maarquitos14 (assuming that PR is going to need to pull the changes in this PR in)

@maarquitos14
Copy link
Contributor Author

could you update https://github.com/intel/llvm/pull/12178/files#diff-1bf2ed05c0065126d4e2150f0eda4dbd561e5c3625452e96356aadafa1312e6fR64 to point at https://github.com/maarquitos14/unified-runtime/tree/maronas/ext_composite_device so the changes here get tested @maarquitos14 (assuming that PR is going to need to pull the changes in this PR in)

Done in intel/llvm@c559036.

@aarongreig
Copy link
Contributor

thanks, you'll need to make sure the CI on intel/llvm side is green as well before we can merge this

@maarquitos14
Copy link
Contributor Author

thanks, you'll need to make sure the CI on intel/llvm side is green as well before we can merge this

CI on intel/llvm is finally all green. How should we proceed now?

@aarongreig
Copy link
Contributor

The ready to merge label represents a queue, when we get to this PR we'll just merge it (assuming no conflicts or whatever come up between now and then) and then ping you on the llvm PR to update the UR tag. You should try to get approvals on the LLVM PR so it doesn't hold up subsequent merges when this PR goes in

@aarongreig aarongreig merged commit b76d907 into oneapi-src:main Feb 7, 2024
51 checks passed
steffenlarsen pushed a commit to intel/llvm that referenced this pull request Feb 12, 2024
Initial implementation to support `sycl_ext_oneapi_composite_device`
specified in #11846.

Depends on oneapi-src/unified-runtime#1192.

---------

Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Signed-off-by: Marcos Maronas <marcos.maronas@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Added to PR's which are ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants