Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] [DOC] Group sorting algorithm design review #11974
base: sycl
Are you sure you want to change the base?
[SYCL] [DOC] Group sorting algorithm design review #11974
Changes from 2 commits
837ba01
92d59d9
25f022c
13ec21f
32a03cc
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not directly call into Level Zero from SYCL RT, but instead we go through unified runtime. Therefore, if we need a new API to do queries to low-level runtimes, then UR should also be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is expected that device compiler implements those functions as part of some "extension" so that SYCL RT can query if there is native support for that functionality and link fallback libraries if there is not. See extension spec, which is more of a design doc.
What should be the name of this "library"/"extension"? Should there be several of them so we can only link-in those libraries which are actually used (in case they would be huge)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bader, @AlexeySachkov, and I were just talking about the "__devicelib" functions recently. I think we want to stop using these as the "contract" between DPC++ and IGC. In fact, the IGC team has complained that there is no formal specification for these "__devicelib" functions.
If we need to rely on optimized support in IGC, we should instead define a SPIR-V extension, and we should write a formal specification as we do for other SPIR-V extensions. This provides a more precise contract between DPC++ and IGC, and it also provides a formal specification that other backend vendors could implement if a third party wanted to implement an OpenCL (or even Level Zero) backend to DPC++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmlueck,
Does it relate to sorting functions only or to all functions in device lib like cmath, complex,...?
+@jinge90
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To all functions. The conversation that @bader, @AlexeySachkov, and I had earlier was about the existing usage in cmath, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'm afraid it requires a lot of efforts to rewrite API for IGC, CPU backend, and other components. That's great that multiple teams have committed to make a lot of changes for their code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is IGC currently providing implementations for these "__devicelib" functions, or are we relying on the fallback implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was implemented in IGC and CPU backend both long time ago. They also have tests for such API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language extension spec does not mention any special handling for fixed-size arrays and therefore it is not clear to me where built-ins from this section are going to be used - can it be clarified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous comments: there is no mention of key-value sorting in the language spec and some implementation detail is clearly implied here, even though I don't understand which one - I think that it should be explicitly spelled out that high-level SYCL functions are built on top these low-level functions using the following mapping ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For functions which accept
T*
andU*
which combinations of different types should be implemented in the fallback library? All 11*11, or only some sub-set?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But not
U
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have
p1
/p3
in the "mangling" but actual operands are generic address space, is that expected?Also, just an idea to discuss, we use the same device compiler on Lin/Win so the C++ mangling is stable. Can we use a normal C++ template with "extern template" to avoid manual mangling?