Implement ONEAPI_DEVICE_SELECTOR in UR #740
With the introduction of config in #681 we have several functions in the API that are loader-only and are excluded from the ddi. Should this be the same? If so, it could also be a good idea to create a separate loader-specific
@pbalcer what are the criteria for loader-only? ONEAPI_DEVICE_SELECTOR is general to all the devices exposed by UR and applies to all backends -- so it seems incorrect to limit it so that sometimes it takes effect (UR with loader) and other times it does not (UR without loader). Does that make sense?
Using adapters standalone is not a supported usage model. I think (@kbenzie ?). If it is, it's currently broken (e.g., the loader config symbols would be missing, adapter libraries should only export the My understanding is that we want to implement this functionality at the loader level precisely so that it applies uniformly to all downstream adapters. If we treat the
That is correct.
This matches my expectations.
Is that something I could/should do as part of this PR or does that add another dependency? For now, in this PR, should I just insert "loader" into the API names?
No, if everyone agrees that this makes sense, I can do that refactoring in a separate PR.
Done in #770
999fcbb to 8907ddb
093dca6 to 22578d2
///< pNumDevices will be updated with the total number of selected devices
///< available for the given platform.
) {
    ur_result_t result = UR_RESULT_SUCCESS;
What is going on here? Is this just a hangover from the previous non-loader version?
This file is just a sanity check that the header file compiles and is valid C. We should remove it.
source/loader/ur_lib.cpp
Outdated
std::regex validation_pattern("^("
    "\\*" // C++ escape for \, regex escape for literal '*'
    "|"
    "cpu" // ensure case-insensitive, when using
    "|"
    "gpu" // ensure case-insensitive, when using
    "|"
    "fpga" // ensure case-insensitive, when using
    "|"
    "[[:digit:]]+" // '<num>'
    "|"
    "[[:digit:]]+\\.[[:digit:]]+" // '<num>.<num>'
    "|"
    "[[:digit:]]+\\.\\*" // '<num>.*'
    "|"
    "\\*\\.\\*" // C++ and regex escapes, literal '*.*'
    "|"
    "[[:digit:]]+\\.[[:digit:]]+\\.[[:digit:]]+" // '<num>.<num>.<num>'
    "|"
    "[[:digit:]]+\\.[[:digit:]]+\\.\\*" // '<num>.<num>.*'
    "|"
    "[[:digit:]]+\\.\\*\\.\\*" // '<num>.*.*'
    "|"
    "\\*\\.\\*\\.\\*" // C++ and regex escapes, literal '*.*.*'
    ")$", std::regex_constants::icase);
That (don't use std::regex) is good to know, thanks.
Currently, that regex pattern is only referenced in this comment:
line 491 // TODO -- use regex validation_pattern to catch all other syntax errors in the ODS string
Without this catch-all, some syntax errors will make it into the code that assumes there are no syntax errors, which is not great.
I would not expect that this code would ever be on the critical path of a performance critical loop. I would not expect users to query the list of devices (directly or indirectly) in a performance critical situation. Do you anticipate user code that does this?
To mitigate such user code (if it is a design goal), the parsing of the env var (including creating and using the regex) could be done only once and the outcome could be cached in the loader. The current code does not implement that, but it could.
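The parse-once-and-cache idea could be sketched roughly as follows. This is a minimal sketch rather than the loader's code: `SelectorTerm`, `parseSelector`, and `cachedSelector` are hypothetical names, and the parser only handles plain `backend:device` terms separated by `;`.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one "backend:device" term.
struct SelectorTerm {
    std::string backend;
    std::string device;
};

// Hypothetical parser: splits "a:b;c:d" into terms and skips malformed ones.
inline std::vector<SelectorTerm> parseSelector(const std::string &text) {
    std::vector<SelectorTerm> terms;
    std::istringstream stream(text);
    std::string term;
    while (std::getline(stream, term, ';')) {
        const auto colon = term.find(':');
        if (colon == std::string::npos || colon == 0 ||
            colon + 1 == term.size()) {
            continue; // no ':' or an empty side: treat as malformed
        }
        terms.push_back({term.substr(0, colon), term.substr(colon + 1)});
    }
    return terms;
}

// Reads and parses the environment variable on the first call only.
// Initialization of a function-local static is thread-safe since C++11.
inline const std::vector<SelectorTerm> &cachedSelector() {
    static const std::vector<SelectorTerm> cached = [] {
        const char *value = std::getenv("ONEAPI_DEVICE_SELECTOR");
        return value ? parseSelector(value) : std::vector<SelectorTerm>{};
    }();
    return cached;
}
```

One consequence of this design is that changes to the environment variable after the first query are not observed, which may or may not be acceptable for the loader.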
Currently, that regex pattern is only referenced in this comment:
line 491 // TODO -- use regex validation_pattern to catch all other syntax errors in the ODS string
Is the regex only intended to be used during validation or as part of the parsing of the ODS string?
To mitigate such user code (if it is a design goal), the parsing of the env var (including creating and using the regex) could be done only once and the outcome could be cached in the loader. The current code does not implement that, but it could.
I think this could be acceptable.
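For the validation-only reading of that question, a sketch might look like the one below. It assumes `std::regex` is kept at all, which is the open point in this thread. `isValidDeviceTerm` is a hypothetical helper, and the compact pattern is intended to accept the same device forms as the alternatives listed in `validation_pattern`; it checks a single device term, not the whole ODS string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when `term` is one of the device forms
// listed in validation_pattern: '*', cpu/gpu/fpga, '<num>', '<num>.<num>',
// '<num>.*', '*.*', and the three-level variants.
inline bool isValidDeviceTerm(const std::string &term) {
    // Constructed once on first use (thread-safe since C++11), then reused,
    // so the regex compilation cost is not paid on every query.
    static const std::regex pattern(
        R"(^(\*|cpu|gpu|fpga|[0-9]+(\.([0-9]+(\.([0-9]+|\*))?|\*(\.\*)?))?|\*\.\*(\.\*)?)$)",
        std::regex_constants::icase);
    return std::regex_match(term, pattern);
}
```

With this sketch, `isValidDeviceTerm("1.*.*")` and `isValidDeviceTerm("GPU")` are accepted, while forms such as `"1.*.2"` or `"*.1"` are rejected, matching the listed alternatives.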
Codecov Report
@@            Coverage Diff             @@
##             main     #740      +/-   ##
==========================================
- Coverage   15.33%   15.12%   -0.22%
==========================================
  Files         241      242       +1
  Lines       34820    35306     +486
  Branches     3989     4044      +55
==========================================
- Hits         5340     5339       -1
- Misses      29429    29916     +487
  Partials       51       51
@@ -0,0 +1,249 @@
// Copyright (C) 2022-2023 Intel Corporation
- // Copyright (C) 2022-2023 Intel Corporation
+ // Copyright (C) 2024 Intel Corporation
Should we document the expected behaviour of the environment variable in the UR specification? I have concerns that someone trying to implement an adapter for UR wouldn't be able to find all the information needed to implement urDeviceGetSelected() from the UR spec alone. Also have some concerns about the naming. I think that UR should strive to be agnostic regarding the upstream runtimes. Adding support for a variable with "ONEAPI" in the name goes against that in my opinion. I think we could easily map ONEAPI_DEVICE_SELECTOR to a UR specific variable while keeping the same syntax for simplicity. However, I know there have been some discussions about the naming of ONEAPI_DEVICE_SELECTOR before, so there might be a good reason not to do this.
This has come up a few times. The naming is not going to change, since it's a product stack naming choice and applies to all of oneAPI, not just UR. Also note that this repo exists in the oneapi-src GitHub org, so it's not out of place here.
No adapter implements this API -- it is loader only. The implementation in loader uses urDeviceGet from the adapter(s), then collates and selects the information that it will pass on to the client. |
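The loader-only flow described here (query each adapter, collate, then select) can be sketched in a few lines. Everything in this sketch is a hypothetical stand-in: `Device` replaces `ur_device_handle_t`, each `AdapterDeviceGet` callable plays the role of an adapter's `urDeviceGet`, and the `accepted` predicate represents the decision derived from ONEAPI_DEVICE_SELECTOR.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a device handle plus the properties that a
// selector term would be matched against.
struct Device {
    std::string backend;
    std::string type;
    int index;
};

// Hypothetical stand-in for one adapter's device query.
using AdapterDeviceGet = std::function<std::vector<Device>()>;

// Collates the devices from every adapter and keeps only those accepted by
// the selector predicate. No adapter needs to know the selector exists.
inline std::vector<Device>
collateAndSelect(const std::vector<AdapterDeviceGet> &adapters,
                 const std::function<bool(const Device &)> &accepted) {
    std::vector<Device> selected;
    for (const auto &getDevices : adapters) {
        for (const auto &device : getDevices()) {
            if (accepted(device)) {
                selected.push_back(device);
            }
        }
    }
    return selected;
}
```

Keeping the filter in one place is what lets the selection apply uniformly to every adapter, at the cost of being unavailable when an adapter is used without the loader.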
A thought which occurred to me recently, @Wee-Free-Scot. When we merge this, how would it interact with the implementation of Would it not matter because the SYCL RT isn't using the UR loader yet?
There are at least two "wouldn't matter" arguments:
I would expect an incremental approach to implementation usage in SYCL RT:
Makes sense. Do we have regression tests for 2. intermediate/3. target at the moment?
Some of the UR tests that I created in this PR do a regression test of sorts: they use urDeviceGet and urDeviceGetSelected with ODS set such that I should/shouldn't get the same output. It's not complete coverage, but it hits some expected use-cases. In terms of regression testing for SYCL RT, we'd need to involve the appropriate people once we know who will do the future implementations (2) and (3).
Given my team will be responsible for replacing PI with UR in the SYCL RT, that could be us.
I'm testing this on my system, which has both an NVIDIA and an AMD GPU. The topology looks like this:
I've modified the
And when set to
And set to
What I think is happening here is these GPUs are provided by different adapters but the current implementation of
Only looking for devices matching both the input parameters and the ODS env var is the intent of the code.
Okay, I made this change:
benie@bench git diff source/loader/ur_lib.cpp
diff --git a/source/loader/ur_lib.cpp b/source/loader/ur_lib.cpp
index 17489efc..64a0f3bb 100644
--- a/source/loader/ur_lib.cpp
+++ b/source/loader/ur_lib.cpp
@@ -564,7 +564,7 @@ ur_result_t urDeviceGetSelected(ur_platform_handle_t hPlatform,
if (acceptDeviceList.size() == 0 && discardDeviceList.size() == 0) {
// nothing in env var was understood as a valid term
- return UR_RESULT_ERROR_INVALID_VALUE;
+ return UR_RESULT_SUCCESS;
} else if (acceptDeviceList.size() == 0) {
// no accept terms were understood, but at least one discard term was
// we are magnanimous to the user when there were bad/ignored accept terms
And am now getting these results:
Which is looking much better.
The ODS=cuda:gpu run looks good, but the ODS=hip:gpu run gives contradictory output "No devices found platform 0" followed immediately by details of the 1 device in that zero-length list! Without the debug messages, it looks fine, but how it gets the answer is probably significant.
Is there threading involved here? Are the output lines only partially-ordered? Does it check platform not-0 before it checks platform 0 and then mislabel one of the two final output lines?
They both print
There's no threading; it's this function, urinfo::app::enumerateDevices(), but using
4c07be8 to 60373fa
60373fa to 085f991
Now that oneapi-src#740 is merged, actually use `urDeviceGetSelected` in the `urinfo` tool to mirror the behaviour of the `sycl-ls` tool.
Add urDeviceGetSelected and implement it to return only those devices specified by ONEAPI_DEVICE_SELECTOR (for the platform given as input).
Add urPlatformGetSelected and implement it to return only those platforms specified by ONEAPI_DEVICE_SELECTOR.
Intended to fix issue: #220