Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ONEAPI_DEVICE_SELECTOR in UR #740

Merged
merged 13 commits into from
Feb 22, 2024

Conversation

Wee-Free-Scot
Copy link
Contributor

@Wee-Free-Scot Wee-Free-Scot commented Jul 21, 2023

Add urDeviceGetSelected and implement to return only those devices specified by the ONEAPI_DEVICE_SELECTOR (for the platform given as input).

Add urPlatformGetSelected and implement to return only those platforms specified by the ONEAPI_DEVICE_SELECTOR.

Intended to fix issue: #220

@Wee-Free-Scot Wee-Free-Scot marked this pull request as draft July 21, 2023 16:56
@Wee-Free-Scot Wee-Free-Scot added the enhancement New feature or request label Jul 21, 2023
@pbalcer
Copy link
Contributor

pbalcer commented Jul 24, 2023

With the introduction of config in #681 we have several functions in the API that are loader-only and are excluded from the ddi. Should this be the same? If so, it could also be a good idea to create a separate loader-specific .yml spec file and improve the way the entry point it would contain are excluded from the ddi (right now the scripts look whether the name contains the word 'loader'...).

@Wee-Free-Scot
Copy link
Contributor Author

@pbalcer what are the criteria for loader-only?

ONEAPI_DEVICE_SELECTOR is general to all the devices exposed by UR and applies to all backends -- so it seems incorrect to limit it so that sometimes it takes effect (UR with loader) and other times it does not (UR without loader). Does that make sense?

@pbalcer
Copy link
Contributor

pbalcer commented Jul 31, 2023

Using adapters standalone is not a supported usage model. I think (@kbenzie ?). If it is, it's currently broken (e.g., the loader config symbols would be missing, adapter libraries should only export the DDI symbols).

My understanding is that we want to implement this functionality at the loader level precisely so that it applies uniformly to all downstream adapters. If we treat the urPlatformGetSelected and urDeviceGetSelected just like all other APIs, each adapter would have to look at the environment variable individually, and then we'd have to stitch that together at the loader layer - that seems more complicated.

@kbenzie
Copy link
Contributor

kbenzie commented Jul 31, 2023

Using adapters standalone is not a supported usage model. I think (@kbenzie ?).

That is correct.

My understanding is that we want to implement this functionality at the loader level precisely so that it applies uniformly to all downstream adapters. If we treat the urPlatformGetSelected and urDeviceGetSelected just like all other APIs, each adapter would have to look at the environment variable individually, and then we'd have to stitch that together at the loader layer - that seems more complicated.

This matches my expectations.

@Wee-Free-Scot
Copy link
Contributor Author

@pbalcer

create a separate loader-specific .yml spec file and improve the way the entry point it would contain are excluded from the ddi

Is that something I could/should do as part of this PR or does that add another dependency?

For now, in this PR, should I just insert "loader" into the API names?

@pbalcer
Copy link
Contributor

pbalcer commented Aug 2, 2023

@pbalcer

create a separate loader-specific .yml spec file and improve the way the entry point it would contain are excluded from the ddi

Is that something I could/should do as part of this PR or does that add another dependency?

For now, in this PR, should I just insert "loader" into the API names?

No, if everyone agrees that this makes sense, I can do that refactoring in a separate PR.

@pbalcer
Copy link
Contributor

pbalcer commented Aug 3, 2023

Done in #770

@Wee-Free-Scot Wee-Free-Scot force-pushed the Issue220_Add-support-for-ODS branch 2 times, most recently from 999fcbb to 8907ddb Compare August 3, 2023 17:35
///< pNumDevices will be updated with the total number of selected devices
///< available for the given platform.
) {
ur_result_t result = UR_RESULT_SUCCESS;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is going on here? Is this just a hangover from the previous non-loader version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is just a sanity-check whether the header file compiles and is valid C. We should remove it.

CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines 336 to 360
std::regex validation_pattern("^("
"\\*" // C++ escape for \, regex escape for literal '*'
"|"
"cpu" // ensure case-insenitive, when using
"|"
"gpu" // ensure case-insenitive, when using
"|"
"fpga" // ensure case-insenitive, when using
"|"
"[[:digit:]]+" // '<num>'
"|"
"[[:digit:]]+\\.[[:digit:]]+" // '<num>.<num>'
"|"
"[[:digit:]]+\\.\\*" // '<num>.*.*'
"|"
"\\*\\.\\*" // C++ and regex escapes, literal '*.*'
"|"
"[[:digit:]]+\\.[[:digit:]]+\\.[[:digit:]]+" // '<num>.<num>.<num>'
"|"
"[[:digit:]]+\\.[[:digit:]]+\\.\\*" // '<num>.<num>.*'
"|"
"[[:digit:]]+\\.\\*\\.\\*" // '<num>.*.*'
"|"
"\\*\\.\\*\\.\\*" // C++ and regex escapes, literal '*.*.*'
")$", std::regex_constants::icase);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recently removed use of std::regex from the OpenCL adapter in #1039 due to profiling with PIN showed it to produce incredibly innefficient code. That was only for the OpenCL adapter, this would be on the critical path which makes me a bit uncomfortable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That (don't use std::regex) is good to know, thanks.

Currently, that regex pattern is only referenced in this comment:
line 491 // TODO -- use regex validation_pattern to catch all other syntax errors in the ODS string

Without this catch-all, some syntax errors will make it into the code that assumes there are no syntax errors, which is not great.

I would not expect that this code would ever be on the critical path of a performance critical loop. I would not expect users to query the list of devices (directly or indirectly) in a performance critical situation. Do you anticipate user code that does this?

To mitigate such user code (if it is a design goal), the parsing of the env var (including creating and using the regex) could be done only once and the outcome could be cached in the loader. The current code does not implement that, but it could.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, that regex pattern is only referenced in this comment:
line 491 // TODO -- use regex validation_pattern to catch all other syntax errors in the ODS string

Is the regex only intended to be used during validation or as part of the parsing of the ODS string?

To mitigate such user code (if it is a design goal), the parsing of the env var (including creating and using the regex) could be done only once and the outcome could be cached in the loader. The current code does not implement that, but it could.

I think this could be acceptable.

@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2023

Codecov Report

Attention: 486 lines in your changes are missing coverage. Please review.

Comparison is base (be849b2) 15.33% compared to head (60373fa) 15.12%.
Report is 4 commits behind head on main.

Files Patch % Lines
source/loader/ur_lib.cpp 0.00% 275 Missing ⚠️
test/conformance/device/urDeviceGetSelected.cpp 0.00% 175 Missing ⚠️
include/ur_print.hpp 0.00% 28 Missing ⚠️
source/loader/ur_libapi.cpp 0.00% 4 Missing ⚠️
source/loader/ur_print.cpp 0.00% 4 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #740      +/-   ##
==========================================
- Coverage   15.33%   15.12%   -0.22%     
==========================================
  Files         241      242       +1     
  Lines       34820    35306     +486     
  Branches     3989     4044      +55     
==========================================
- Hits         5340     5339       -1     
- Misses      29429    29916     +487     
  Partials       51       51              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Wee-Free-Scot Wee-Free-Scot marked this pull request as ready for review January 5, 2024 11:19
@Wee-Free-Scot Wee-Free-Scot requested a review from a team as a code owner January 5, 2024 11:19
@Wee-Free-Scot Wee-Free-Scot self-assigned this Jan 5, 2024
@Wee-Free-Scot Wee-Free-Scot requested review from kbenzie, jandres742 and pbalcer and removed request for jandres742 January 5, 2024 12:11
@@ -0,0 +1,249 @@
// Copyright (C) 2022-2023 Intel Corporation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (C) 2022-2023 Intel Corporation
// Copyright (C) 2024 Intel Corporation

@fabiomestre
Copy link
Contributor

Should we document the expected behaviour of the environment variable in the UR specification? I have concerns that someone trying to implement an adapter for UR wouldn't be able to find all the information needed to implement urDeviceGetSelected() from the UR spec alone.

Also have some concerns about the naming. I think that UR should strive to be agnostic regarding the upstream runtimes. Adding support for a variable with "ONEAPI" in the name goes against that in my opinion. I think we could easily map ONEAPI_DEVICE_SELECTOR to a UR specific variable while keeping the same syntax for simplicity. However, I know there have been some discussions about the naming of ONEAPI_DEVICE_SELECTOR before, so there might be a good reason not to do this.

@kbenzie
Copy link
Contributor

kbenzie commented Jan 8, 2024

Also have some concerns about the naming. I think that UR should strive to be agnostic regarding the upstream runtimes. Adding support for a variable with "ONEAPI" in the name goes against that in my opinion. I think we could easily map ONEAPI_DEVICE_SELECTOR to a UR specific variable while keeping the same syntax for simplicity. However, I know there have been some discussions about the naming of ONEAPI_DEVICE_SELECTOR before, so there might be a good reason not to do this.

This has come up a few times, the naming is not going to change since its a product stack naming choice and applies to all of oneAPI, not just UR. Also note that this repo exists in the oneapi-src GitHub org, so its not out of place here.

@Wee-Free-Scot
Copy link
Contributor Author

I have concerns that someone trying to implement an adapter for UR wouldn't be able to find all the information needed to implement urDeviceGetSelected() from the UR spec alone.

No adapter implements this API -- it is loader only. The implementation in loader uses urDeviceGet from the adapter(s), then collates and selects the information that it will pass on to the client.

@kbenzie
Copy link
Contributor

kbenzie commented Jan 17, 2024

I thought which occurred to me recently @Wee-Free-Scot. When we merge this, how would it interact with the implementation of ONEAPI_DEVICE_SELECTOR which already exists in the SYCL RT?

Would it not matter because the SYCL RT isn't using the UR loader yet?

@Wee-Free-Scot
Copy link
Contributor Author

There are at least two "wouldn't matter" arguments:

  1. SYCL RT does not use URT yet, so the new API will never get called.
  2. URT provides both urDeviceGet (unadulterated list of devices passed through from the underlying adapter) and urDeviceGetSelected (modified list of devices applying ONEAPI_DEVICE_SELECTOR), so future versions of SYCL RT can chose when to call the new API.

I would expect an incremental approach to implementation usage in SYCL RT:

  1. current -- gets list of devices without URT, runs SYCL RT code to apply ODS
  2. intermediate -- gets list of devices from urDeviceGet, runs SYCL RT code to apply ODS, regression test vs, (1)
  3. target -- gets list of devices from urDeviceGetSelected, bypasses SYCL RT code to apply ODS, regression test vs, (2)

@kbenzie
Copy link
Contributor

kbenzie commented Jan 17, 2024

Makes sense. Do we have regression tests for 2. intermediate/3. target at the moment?

@Wee-Free-Scot
Copy link
Contributor Author

Some of the UR tests that I created in this PR do a regression test of sorts: they use urDeviceGet and urDeviceGetSelected with ODS set such that I should/shouldn't get the same output. It's not complete coverage, but it hits some expected use-cases.

In terms of regression testing for SYCL RT, we'd need to involve the appropriate people when we know who will do the implementation of future implementations (2) and (3).

@kbenzie
Copy link
Contributor

kbenzie commented Jan 17, 2024

In terms of regression testing for SYCL RT, we'd need to involve the appropriate people when we know who will do the implementation of future implementations (2) and (3).

Given my team will be responsible for replacing PI with UR in the SYCL RT, that could be us.

@kbenzie
Copy link
Contributor

kbenzie commented Jan 17, 2024

I'm testing this in my system which has both an NVIDIA and an AMD GPU. The topology looks like this:

.
└── adapters
    ├── 0 hip
    │   └── platforms
    │       └── 0
    │           └── devices
    │               └── 0 AMD Radeon RX 6700 XT
    └── 1 cuda
        └── platforms
            └── 0
                └── devices
                    └── 0 NVIDIA GeForce RTX 3060 Ti

I've modified the urinfo tool to use urDeviceGetSelected instead of urDeviceGet for getting quick results. This is without ONEAPI_DEVICE_SELECTOR set, both GPU's are listed as expected:

$ build-r/bin/urinfo
DEBUG: getenv_to_map parsed env var and failed to produce a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and failed to produce a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and failed to produce a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and failed to produce a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
[adapter(0,hip):platform(0):device(0,gpu)] AMD HIP BACKEND, AMD Radeon RX 6700 XT gfx1031 [HIP 60032.83]
[adapter(1,cuda):platform(0):device(0,gpu)] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 Ti 8.6 [CUDA 12.2]

And when set to cuda:gpu:

$ ONEAPI_DEVICE_SELECTOR=cuda:gpu build-r/bin/urinfo
ninja: Entering directory `build-r'
ninja: no work to do.
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
WARNING: ignoring term with irrelevant backend
error: urDeviceGetSelected(platform, UR_DEVICE_TYPE_ALL, 0, nullptr, &numDevices) failed: UR_RESULT_ERROR_INVALID_VALUE

And set to hip:gpu:

$ ONEAPI_DEVICE_SELECTOR=hip:gpu build-r/bin/urinfo
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
WARNING: ignoring term with irrelevant backend
error: urDeviceGetSelected(platform, UR_DEVICE_TYPE_ALL, 0, nullptr, &numDevices) failed: UR_RESULT_ERROR_INVALID_VALUE

What I think is happening here is these GPU's are provided by different adapters but the currently implementation of urDeviceGetSelected is only considering the devices within the platform it is given and thus only the devices in a single adapter.

@Wee-Free-Scot
Copy link
Contributor Author

Only looking for devices matching both the input parameters and the ODS env var is the intent of the code.
I was expecting the new API to be a drop-in replacement for the existing UR function, which takes a platform handle.
I was anticipating that the caller/client would already have code for loading each adapter and calling urDeviceGet using each of the platform handles in turn.
If platform==level_zero and ODS==cuda:gpu then the list of selected devices is supposed to be empty.
The error looks wrong -- it should return an empty list, i.e. numDevices:=0 and return value UR_SUCCESS.

@kbenzie
Copy link
Contributor

kbenzie commented Jan 17, 2024

The error looks wrong -- it should return an empty list, i.e. numDevices:=0 and return value UR_SUCCESS.

Okay, I made this change:

benie@bench git diff source/loader/ur_lib.cpp
diff --git a/source/loader/ur_lib.cpp b/source/loader/ur_lib.cpp
index 17489efc..64a0f3bb 100644
--- a/source/loader/ur_lib.cpp
+++ b/source/loader/ur_lib.cpp
@@ -564,7 +564,7 @@ ur_result_t urDeviceGetSelected(ur_platform_handle_t hPlatform,

     if (acceptDeviceList.size() == 0 && discardDeviceList.size() == 0) {
         // nothing in env var was understood as a valid term
-        return UR_RESULT_ERROR_INVALID_VALUE;
+        return UR_RESULT_SUCCESS;
     } else if (acceptDeviceList.size() == 0) {
         // no accept terms were understood, but at least one discard term was
         // we are magnanimous to the user when there were bad/ignored accept terms

And am now getting these results:

$ ONEAPI_DEVICE_SELECTOR=cuda:gpu build-run urinfo
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
WARNING: ignoring term with irrelevant backend
No devices found platform 0.
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
[adapter(1,cuda):platform(0):device(0,gpu)] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 Ti 8.6 [CUDA 12.2]
$ ONEAPI_DEVICE_SELECTOR=hip:gpu buildr-/bin/urinfo
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
DEBUG: size of acceptDeviceList = 1
DEBUG: size of discardDeviceList = 0
DEBUG: In ApplyFilter, if block case 1, matches = 1
DEBUG: getenv_to_map parsed env var and produced a map
DEBUG: termType isAcceptFilter
WARNING: ignoring term with irrelevant backend
No devices found platform 0.
[adapter(0,hip):platform(0):device(0,gpu)] AMD HIP BACKEND, AMD Radeon RX 6700 XT gfx1031 [HIP 60032.83]

Which is looking much better.

@Wee-Free-Scot
Copy link
Contributor Author

The ODS=cuda:gpu run looks good, but the ODS=hip:gpu run gives contradictory output "No devices found platform 0" followed immediately by details of the 1 device in that zero-length list! Without the debug messages, it looks fine, but how it gets the answer is probably significant.

@Wee-Free-Scot
Copy link
Contributor Author

Is there threading involved here? Are the output lines only partially-ordered? Does it check platform not-0 before it checks platform 0 and then mislabel one of the two final output lines?

@kbenzie
Copy link
Contributor

kbenzie commented Jan 17, 2024

The ODS=cuda:gpu run looks good, but the ODS=hip:gpu run gives contradictory output "No devices found platform 0" followed immediately by details of the 1 device in that zero-length list! Without the debug messages, it looks fine, but how it gets the answer is probably significant.

They both print No devices found platform 0. but at different points, that's coming from urinfo so would need to update what it prints.

Is there threading involved here? Are the output lines only partially-ordered? Does it check platform not-0 before it checks platform 0 and then mislabel one of the two final output lines?

There's no threading, its this function urinfo::app::enumerateDevices() but using urDevicesGetSelected instead of urDevicesGet.

@kbenzie kbenzie merged commit 4e69cc6 into oneapi-src:main Feb 22, 2024
52 checks passed
kbenzie added a commit to kbenzie/unified-runtime that referenced this pull request Feb 23, 2024
Now that oneapi-src#740 is merged, actually use `urDeviceGetSelected` in the
`urinfo` tool to mirror the behaviour of the `sycl-ls` tool.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants