[SYCL] [DOC] Group sorting algorithm design review #11974

Open
wants to merge 5 commits into
base: sycl

andreyfe1
Contributor

Replacing accidentally closed #3754.
All discussions are done within #3754. So, this PR just needs to be merged.

Signed-off-by: Fedorov, Andrey <andrey.fedorov@intel.com>
@andreyfe1 andreyfe1 requested a review from a team as a code owner November 22, 2023 10:10
@dm-vodopyanov
Contributor

@andreyfe1 can you please fix this error?

Warning, treated as error:
/home/runner/work/llvm/llvm/repo/sycl/doc/design/GroupSort.md:document isn't included in any toctree

@andreyfe1 andreyfe1 requested a review from a team as a code owner November 22, 2023 18:28
@andreyfe1
Contributor Author

Hi @intel/dpcpp-doc-reviewers,
Could you please approve this PR?

Contributor

@steffenlarsen steffenlarsen left a comment

Just a handful of nits, otherwise it looks good.

Contributor

@AlexeySachkov AlexeySachkov left a comment

I'm not ready to approve this, even though I'm only a code owner of `index.rst`.

> All discussions are done within #3754. So, this PR just needs to be merged.

I do not see any approvals in that old PR. Tagging @gmlueck, @aelovikov-intel, and @jinge90 to make one more review pass and confirm that they agree with the proposed design.


- Fallback implementation in case if backends don't have more optimized implementations yet.

- Level Zero extension for `memory_required` functions.

Contributor

We do not call into Level Zero directly from the SYCL RT; we go through the Unified Runtime (UR) instead. Therefore, if we need a new API to query low-level runtimes, UR should be updated as well.

```cpp
long valueTypeSizeInBytes) const;
```

### Fallback SPIR-V library

Contributor

The device compiler is expected to implement those functions as part of some "extension", so that the SYCL RT can query whether there is native support for that functionality and link the fallback libraries if there is not. See the extension spec, which is more of a design doc.

What should the name of this "library"/"extension" be? Should there be several of them, so that we link in only the libraries that are actually used (in case they turn out to be huge)?
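
The query-then-link decision described above can be sketched as a small host-side helper. This is only an illustration, not the SYCL RT's actual logic: the function name `fallback_libs_to_link`, the feature names, and the `libsycl-fallback-<feature>.spv` file-name pattern are all assumptions made for the example. It assumes one fallback library per feature, so only the libraries for features that are used but not natively supported get linked.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not an existing SYCL RT function).
// `used`   : features the device image actually calls into.
// `native` : features the backend reports native support for.
// Returns one fallback library name per feature that is used but not
// natively supported. The file-name pattern is an assumption.
std::vector<std::string> fallback_libs_to_link(
    const std::set<std::string>& used,
    const std::set<std::string>& native) {
  std::vector<std::string> libs;
  for (const auto& feature : used) {
    if (native.count(feature) == 0) {
      libs.push_back("libsycl-fallback-" + feature + ".spv");
    }
  }
  return libs;
}
```

Splitting the fallback code per feature keeps the link step proportional to what a kernel uses; a single monolithic library would be simpler to name and query but would always be linked in full.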


void __devicelib_default_work_group_joint_sort_descending_<encoded_param_types>(T* first, uint n, byte* scratch);

// for fixed-size arrays

Contributor

The language extension spec does not mention any special handling for fixed-size arrays, so it is not clear to me where the built-ins from this section are going to be used. Can this be clarified?


T __devicelib_default_sub_group_private_sort_descending_<encoded_scalar_param_type>(T value);

// for key value sorting using the default algorithm

Contributor

Same as the previous comments: the language spec does not mention key-value sorting, so some implementation detail is clearly implied here, although I don't understand which one. I think it should be explicitly spelled out that the high-level SYCL functions are built on top of these low-level functions using the following mapping ...

Notes:
- `T`, `U` are from the following list `i8`, `i16`,
`i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `f16`, `f32`, `f64`.
- `encoded_param_types` is `T` prepended with `p1` for global/private address

Contributor

But not U?


Notes:
- `T`, `U` are from the following list `i8`, `i16`,

Contributor

For functions that accept `T*` and `U*`, which combinations of types should be implemented in the fallback library? All 11*11, or only some subset?
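
To put a number on the question above, here is a rough sketch that enumerates every key/value type pair from the 11 type codes listed in the notes. The function name `key_value_suffixes` and the `p1<T>_p1<U>` suffix layout are assumptions for illustration only; the design doc excerpt does not spell out how key-value names are encoded.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Type codes listed in the design notes under review.
const std::array<std::string, 11> kTypeCodes = {
    "i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64",
    "f16", "f32", "f64"};

// Builds one name suffix per (key type, value type) pair.
// The "p1<T>_p1<U>" layout is hypothetical, chosen only to show
// how quickly the number of entry points grows.
std::vector<std::string> key_value_suffixes() {
  std::vector<std::string> out;
  out.reserve(kTypeCodes.size() * kTypeCodes.size());
  for (const auto& key : kTypeCodes) {
    for (const auto& value : kTypeCodes) {
      out.push_back("p1" + key + "_p1" + value);
    }
  }
  // Sanity check: 11 key types times 11 value types.
  assert(out.size() == 121);
  return out;
}
```

If every pair were required, that is 121 entry points per function family before any further variants are counted, which is why restricting the fallback library to a subset may be worth discussing.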


Examples:
```cpp
void __devicelib_default_work_group_joint_sort_ascending_p1i32_u32_p3i8(int* first, uint n, byte* scratch);
```

Contributor

We have p1/p3 in the "mangling", but the actual operands are in the generic address space. Is that expected?

Also, just an idea to discuss: we use the same device compiler on Linux and Windows, so the C++ mangling is stable. Can we use a normal C++ template with "extern template" to avoid manual mangling?
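
The "extern template" idea could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed design: `fallback_joint_sort_ascending` and the `byte` alias are hypothetical names, the body uses `std::sort` as a host-side stand-in for a real device implementation, and the scratch buffer is ignored. The point is that the compiler's own mangling tells the instantiations apart, so no hand-written `p1i32`-style suffix is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using byte = unsigned char;  // hypothetical alias for the scratch type

// Header side: declare the template (hypothetical name) ...
template <typename T>
void fallback_joint_sort_ascending(T* first, std::uint32_t n, byte* scratch);

// ... and promise that these instantiations are provided elsewhere,
// so callers do not instantiate them implicitly.
extern template void fallback_joint_sort_ascending<std::int32_t>(
    std::int32_t*, std::uint32_t, byte*);
extern template void fallback_joint_sort_ascending<float>(
    float*, std::uint32_t, byte*);

// Library side: the definition. std::sort is a stand-in only.
template <typename T>
void fallback_joint_sort_ascending(T* first, std::uint32_t n, byte* scratch) {
  (void)scratch;  // unused in this sketch
  assert(first != nullptr || n == 0);
  std::sort(first, first + n);
}

// Explicit instantiation definitions: one symbol per supported type,
// named by the compiler's regular C++ mangling.
template void fallback_joint_sort_ascending<std::int32_t>(
    std::int32_t*, std::uint32_t, byte*);
template void fallback_joint_sort_ascending<float>(
    float*, std::uint32_t, byte*);
```

In a real split, the declarations would live in a header and the definition plus the explicit instantiations in the fallback library's source file. Address spaces are not modelled here; how they would appear in the template parameters is an open question for this approach.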

### Fallback SPIR-V library

If backend compilers can generate optimized implementations based on low-level instructions,
we need a function that they can take and optimize.

Contributor

@bader, @AlexeySachkov, and I were just talking about the "__devicelib" functions recently. I think we want to stop using these as the "contract" between DPC++ and IGC. In fact, the IGC team has complained that there is no formal specification for these "__devicelib" functions.

If we need to rely on optimized support in IGC, we should instead define a SPIR-V extension, and we should write a formal specification as we do for other SPIR-V extensions. This provides a more precise contract between DPC++ and IGC, and it also provides a formal specification that other backend vendors could implement if a third party wanted to implement an OpenCL (or even Level Zero) backend to DPC++.

Contributor Author

@gmlueck,
Does this apply only to the sorting functions, or to all functions in the device library, such as cmath, complex, ...?

+@jinge90

Contributor

To all functions. The conversation that @bader, @AlexeySachkov, and I had earlier was about the existing usage in cmath, etc.

Contributor Author

I see. I'm afraid it will take a lot of effort to rewrite the API for IGC, the CPU backend, and other components. It's great that multiple teams have committed to making a lot of changes to their code.

Contributor

Is IGC currently providing implementations for these "__devicelib" functions, or are we relying on the fallback implementations?

Contributor Author

It was implemented in both IGC and the CPU backend a long time ago. They also have tests for this API.
