[SYCL][Fusion][Doc] Document reductions support #12641

Merged
merged 2 commits into from
Feb 7, 2024
34 changes: 33 additions & 1 deletion sycl/doc/design/KernelFusionJIT.md
@@ -313,12 +313,44 @@ computing the private memory size. As range rounding only applies to basic
kernels (parametrized by a `sycl::range`), local internalization is not affected
by the range rounding transformation.

### Reductions

Kernel fusion of reductions is partially supported. To preserve the legality of
the fused kernel, i.e., the requirement that the fused kernel perform the same
work as the graph of kernels being fused, only the following reduction
strategies are supported for fusion at the time of writing:

- `group_reduce_and_last_wg_detection`
- `local_atomic_and_atomic_cross_wg`
- `range_basic`
- `group_reduce_and_atomic_cross_wg`
- `local_mem_tree_and_atomic_cross_wg`
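
The list above can be encoded as a small compile-time allowlist. The following
is a minimal sketch in standard C++, not part of the SYCL runtime: the
`Strategy` enumeration and the `isFusable` helper are hypothetical names used
for illustration only, and the `other` enumerator is a placeholder for any
strategy that is not listed above.

```c++
// Hypothetical stand-in for the runtime's strategy enumeration; the real
// enumerators are defined in sycl/reduction.hpp.
enum class Strategy {
  group_reduce_and_last_wg_detection,
  local_atomic_and_atomic_cross_wg,
  range_basic,
  group_reduce_and_atomic_cross_wg,
  local_mem_tree_and_atomic_cross_wg,
  other // placeholder for any strategy not in the list above
};

// Returns true only for the strategies listed as supported for fusion.
constexpr bool isFusable(Strategy S) {
  switch (S) {
  case Strategy::group_reduce_and_last_wg_detection:
  case Strategy::local_atomic_and_atomic_cross_wg:
  case Strategy::range_basic:
  case Strategy::group_reduce_and_atomic_cross_wg:
  case Strategy::local_mem_tree_and_atomic_cross_wg:
    return true;
  case Strategy::other:
    return false;
  }
  return false;
}

static_assert(isFusable(Strategy::range_basic), "listed strategy");
static_assert(!isFusable(Strategy::other), "unlisted strategy");
```

A check of this kind could guard a code path that submits reductions to a
queue with fusion enabled, rejecting unlisted strategies at compile time.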

The other strategies require implicit inter-work-group synchronization, which
is not supported in kernel fusion.
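
To illustrate why a strategy that combines per-work-group results atomically
needs no inter-work-group synchronization, the pattern can be modeled on the
host. This is a plain C++ sketch, not SYCL code and not the runtime's
implementation: work-groups are modeled as contiguous chunks of the input,
`simulateGroupReduceAtomicCrossWG` is a hypothetical name, and its
correspondence to `group_reduce_and_atomic_cross_wg` is an assumption based on
the strategy's name.

```c++
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side model: each "work-group" reduces its own chunk, then publishes
// its partial result with one atomic add. No group waits on another group.
int simulateGroupReduceAtomicCrossWG(const std::vector<int> &Data,
                                     std::size_t GroupSize) {
  // Clamp to avoid a zero-sized group, which would never advance the loop.
  GroupSize = std::max<std::size_t>(GroupSize, 1);
  std::atomic<int> Result{0};
  for (std::size_t Begin = 0; Begin < Data.size(); Begin += GroupSize) {
    const std::size_t End = std::min(Begin + GroupSize, Data.size());
    // Step 1: reduce within the group (stands in for a group-wide reduce).
    const int Partial =
        std::accumulate(Data.begin() + static_cast<std::ptrdiff_t>(Begin),
                        Data.begin() + static_cast<std::ptrdiff_t>(End), 0);
    // Step 2: combine across groups with an atomic update.
    Result.fetch_add(Partial, std::memory_order_relaxed);
  }
  return Result.load();
}
```

The groups are processed sequentially here, so the order of the atomic updates
is fixed. On a device that order is unspecified, which is harmless for an
associative and commutative operation such as integer addition.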

As a result, users should not use `sycl::reduction` directly in code that
performs kernel fusion, as an unsupported strategy might be chosen. They should
instead call `sycl::detail::reduction_parallel_for`, which forces a specific,
supported strategy. The reductions implementation in
[`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) might give users
insight into which kind of reduction suits their purposes:

```c++
q.submit([&](sycl::handler &cgh) {
  sycl::accessor in(dataBuf, cgh, sycl::read_only);
  // Create the reduction object that accumulates into sumBuf
  auto sum = sycl::reduction(sumBuf, cgh, sycl::plus<>{});
  // Force supported 'group_reduce_and_last_wg_detection' strategy
  sycl::detail::reduction_parallel_for<
      sycl::detail::auto_name,
      sycl::detail::strategy::group_reduce_and_last_wg_detection>(...);
});
```

### Unsupported SYCL constructs

The following SYCL API constructs are currently not officially supported for
kernel fusion and should be considered untested/unsupported:

- Reductions
- `sycl::stream`
- Specialization constants and `sycl::kernel_handler`
- Images (`sycl::unsampled_image` and `sycl::sampled_image`)