SYCL port
maddyscientist edited this page Dec 15, 2021 · 2 revisions
Development is in the feature/sycl branch: https://github.com/lattice/quda/tree/feature/sycl
Changes relative to develop can be seen in the PR: https://github.com/lattice/quda/pull/1168
Outstanding changes/issues
- BlockReduce calls simplified to make it easier to implement in SYCL
- reducer_t types added for reductions (reducer.h, transform_reduce.cuh)
- multi blas Args may not fit in max_kernel_arg_size (using max_constant_size instead)
- quda_target.h needs to be included from quda_internal.h
- block size in dslash_coarse kernel must evenly divide threads
- FAST_COMPILE_REDUCE version of block_orthogonalize.cu and restrictor.cu can't go larger than max_block_size
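
The two block-size constraints in the list above (the block size must evenly divide the number of threads, and it cannot exceed max_block_size) can be illustrated with a minimal sketch. The helper name `largest_dividing_block_size` and its parameters are hypothetical and are not taken from the QUDA source; this only shows one way to pick a block size that satisfies both conditions, not how the sycl branch actually does it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of QUDA): return the largest block size that
// is no larger than max_block_size and evenly divides the thread count.
inline std::size_t largest_dividing_block_size(std::size_t threads, std::size_t max_block_size)
{
  assert(threads > 0 && max_block_size > 0);

  // Start from the largest candidate allowed by both limits.
  std::size_t block = threads < max_block_size ? threads : max_block_size;

  // Walk down until the block size divides threads exactly.
  // This always terminates, since a block size of 1 divides any thread count.
  while (threads % block != 0) --block;

  return block;
}
```

For example, with 96 threads and a maximum block size of 64, this sketch returns 48, since 64 does not divide 96. A linear search is used for clarity; a real tuner would likely restrict candidates to a fixed set of block sizes.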