[SYCL][Graph] Update spec supported features #338

EwanC · 2023-10-30T08:14:14Z

The following features are defined in the specification as unsupported, but have working implementations upstream.

mfrancepillois · 2023-10-30T09:44:24Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

+the Explicit API. Empty nodes can be used instead of barriers when a user is
+building a graph with the explicit API.


Currently, users of the Explicit API who want a barrier that waits for all previous nodes to complete have to create an empty node that explicitly depends on all previous nodes. Does it make sense to change the definition of empty nodes so that an empty node added with no dependencies automatically takes all previous nodes as dependencies (a general barrier)? Or, if we want to keep empty nodes without dependencies (I'm not sure why that would be necessary), should we propose a wildcard (shortcut) to add all previous nodes as dependencies?

That idea makes sense to me 👍 Keeping the existing semantics of an empty node with no dependencies doesn't really add any value for the user, but making it depend on the previous leaf nodes corresponds to barrier semantics, so it is useful and better aligns with the message that empty nodes are an alternative to the barrier extension.

I feel like having that automatic creation of dependencies on the explicit API is a little bit counter to the existing behaviour where the explicit API doesn't do things like this in the background. This perhaps makes it a bit unintuitive.

I tend to agree with @Bensuo here. However, we might want to add a barrier shortcut if this is frequently used in applications.
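To make the two options in this thread concrete, here is a minimal sketch using a toy graph model. It is not the `sycl_ext_oneapi_graph` API: `ToyGraph`, `add` and `add_barrier` are hypothetical names invented for illustration. `add` mirrors the current explicit behaviour (only the listed dependencies are recorded), while `add_barrier` mirrors the proposed shortcut (an empty node that depends on all current leaf nodes).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for an explicitly built graph. Hypothetical; NOT the SYCL API.
class ToyGraph {
public:
  using Node = std::size_t;

  // Explicit behaviour: the new node depends on exactly the nodes in `deps`.
  Node add(const std::vector<Node> &deps = {}) {
    const Node id = preds_.size();
    for (Node d : deps) {
      assert(d < id && "dependency must be an existing node");
      leaves_.erase(d); // a node with a successor is no longer a leaf
    }
    preds_.emplace_back(deps.begin(), deps.end());
    leaves_.insert(id);
    return id;
  }

  // Proposed shortcut (assumption): an empty node that depends on every
  // current leaf, and therefore transitively on all previous nodes.
  Node add_barrier() {
    const std::vector<Node> deps(leaves_.begin(), leaves_.end());
    return add(deps);
  }

  const std::set<Node> &preds(Node n) const { return preds_[n]; }
  const std::set<Node> &leaves() const { return leaves_; }

private:
  std::vector<std::set<Node>> preds_; // direct predecessors of each node
  std::set<Node> leaves_;             // nodes with no successors yet
};
```

With nodes `a`, `b` and `c` (where `c` depends on `a`), calling `add_barrier()` yields a node whose direct predecessors are `b` and `c`. Keeping the shortcut as a separate call leaves `add()` free of implicit dependencies, which is the concern raised about the explicit API above.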

EwanC · 2023-10-31T09:07:29Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

 . Using reductions in a graph node.
 . Using sycl streams in a graph node.
-. Using a kernel bundle in a graph node.
 . Profiling an event returned from graph submission with
 `event::get_profiling_info()`.


I should also add a bullet point here about passing the no-immediate-command-list property on queue creation to work around the current issue with immediate command-lists.

This commit removes typed pointer support from the LaunchFunc's lowering to the Vulkan dialect. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502

Recent versions of GNU binutils, starting from 2.39, support symbol+offset lookup in addition to the usual numeric address lookup. This change adds symbol lookup to llvm-symbolize and llvm-addr2line. llvm-symbolize now behaves closer to GNU addr2line: if the value specified as an address on the command line or in the input stream is not a number, it is treated as a symbol name. For example:

    llvm-symbolize --obj=abc.so func_22
    llvm-symbolize --obj=abc.so "CODE func_22"

This lookup is currently supported only for functions. Specification with an offset is not supported yet.

This is a recommit of 2b27948, reverted in 39fec54 because the test llvm/test/Support/interrupts.test started failing on Windows. The test was changed in 18f036d and is also updated in this commit.

Differential Revision: https://reviews.llvm.org/D149759


This commit removes the support for lowering GPU to ROCDL dialect with typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502


This adds a writable attribute, which in conjunction with dereferenceable(N) states that a spurious store of N bytes is introduced on function entry. This implies that this many bytes are writable without trapping or introducing data races. See https://llvm.org/docs/Atomics.html#optimization-outside-atomic for why the second point is important. This attribute can be added to sret arguments. I believe Rust will also be able to use it for by-value (moved) arguments. Rust likely won't be able to use it for &mut arguments (tree borrows does not appear to allow spurious stores). In this patch the new attribute is only used by LICM scalar promotion. However, the actual motivation for this is to fix a correctness issue in call slot optimization, which needs this attribute to avoid optimization regressions. Followup to the discussion on D157499. Differential Revision: https://reviews.llvm.org/D158081

…or is not declared in the base class Fixes #70464 When ctor is not declared in the base class, initializing the base class with the initializer list will not trigger a proper assignment of the base region, as a CXXConstructExpr doing that is not available in the AST. This patch checks whether the init expr is an InitListExpr under a base initializer, and adds a binding if so.


This patch moves `ObjCMethodDecl::ImplementationControl` to DeclBase.h so that it's complete at the point where the corresponding bit-field is declared. This patch also converts it to a scoped enum `clang::ObjCImplementationControl`.

SME2 is documented as part of the main SME supplement: https://developer.arm.com/documentation/ddi0616/latest/

The one change for debug is this new ZT0 register. This register contains data to be used with new table lookup instructions. Its size is always 512 bits (not scalable) and it can be interpreted in many different ways depending on the instructions that use it.

The kernel has implemented this as a new register set containing this single register. It always returns register data (with no header, unlike ZA which does have a header). https://docs.kernel.org/arch/arm64/sme.html

ZT0 is only active when ZA is active (when SVCR.ZA is 1). In the inactive state the kernel returns 0s for its contents. Therefore lldb doesn't need to create 0s like it does for ZA.

However, we will skip restoring the value of ZT0 if we know that ZA is inactive, as writing to an inactive ZT0 sets SVCR.ZA to 1, which is not desirable because it would also activate ZA. Whether SVCR.ZA is set will be determined only by the ZA data we restore.

Due to this, I've added a new save/restore kind SME2. This is easier than accounting for the variable length ZA in the SME data. We'll only save an SME2 data block if ZA is active. If it's not, we can get fresh 0s back from the kernel for ZT0 anyway, so there's nothing for us to restore.

This new register will only show up if the system has SME2, therefore the SME set presented to the user may change, and I've had to account for that in a few places.

I've referred to it internally as simply "ZT" as the kernel does in NT_ARM_ZT, but the architecture refers to the specific register as "ZT0" so that's what you'll see in lldb.

```
(lldb) register read -s 6
Scalable Matrix Extension Registers:
     svcr = 0x0000000000000000
      svg = 0x0000000000000004
       za = {0x00 <...> 0x00}
      zt0 = {0x00 <...> 0x00}
```


Looks like there's code out there that, instead of using '__attribute__((constructor(x)))' to add constructor functions, just declares a global function pointer and uses '__attribute__((section('.ctors')))' instead. The problem is that, with memtag-globals, we pad the global function pointer to be 16 bytes large. This means we have an 8-byte real function pointer followed by 8 bytes of zero padding, and this trips up the loader when it processes this section. Fixes #69939


…d joint_matrix_mad (intel#11738) This patch adds two new properties `joint_matrix` and `joint_matrix_mad` to device requirements in sycl-post-link. SYCL RT reads these properties and throws exception if objects of `joint_matrix` type or `joint_matrix_mad` functions are not supported by the current device. "Unsupported" means matrix type and sizes provided by user are not compatible with the list of all supported matrix types and sizes from the runtime query in `get_info<...matrix_combinations>`.


Since bool vectors have a backing storage of chars, the unary minus operator was in fact producing `(char)-1`, and thus not adhering to the ABI where bools should be either 0 or 1. This could manifest itself in bugs, for instance where the "bool" elements wouldn't compare equal to other bools, such as those in arrays. This only manifested itself in device code (possibly because the array of bools is also an array of bytes under the hood), hence the test case that differs in style somewhat from the rest.


…#11809) Previously, we committed intel#9642 aiming to break dependency on metadata order for check_has.cpp but the commit was broken unexpectedly. This PR re-lands it. --------- Signed-off-by: jinge90 <ge.jin@intel.com>

- Enables specialization constants handling in the SYCL-Graph extension.
- Adds E2E tests that verify this behavior.
- Removes unit tests that checked that an unsupported-feature exception was thrown.

--------- Co-authored-by: Maxime France-Pillois <maxime.francepillois@codeplay.com>


intel#11832) CUDA specifies the launch bounds as:
```
__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor, maxBlocksPerCluster)
```
making it impossible to specify `maxBlocksPerCluster` without the preceding two attributes (and similarly for `minBlocksPerMultiprocessor`). Issue warnings and ignore the attributes if this condition is not met.


… op (intel#11847) Even though std::max is completely legal/correct to use in SYCL/ESIMD especially in 'constexpr' context, it may cause problems on Windows in some non-trivial configurations with some odd order of includes of system and SYCL header files. Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>

intel#11850) Element-size address alignment is valid from a correctness point of view, but implicitly using 1-byte and 2-byte alignment causes a performance regression for block_load(const int8_t *, ...) and block_load(const int16_t *, ...) because the GPU BE has to generate the slower GATHER instead of the more efficient BLOCK-LOAD. Without this fix, block-load causes up to a 44% slow-down on some apps that used block_load() with the alignment assumptions in place before block_load(usm, ..., compile_time_props) was implemented. The reasoning for changing the expected/assumed alignment from element-size to 4 bytes for byte- and word-vectors is as follows: the idea of a block_load() call (as opposed to a gather() call) is to have an efficient block-load, so the assumed alignment is one that allows a block-load to be generated. This is a bit trickier for the user, but it is how the block_load/store API always worked before: block-load had restrictions that needed to be honored. To be on the safer side, the user can always pass the guaranteed alignment. --------- Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>


EwanC added the Graph Specification label Oct 30, 2023

EwanC requested review from reble, Bensuo, julianmi and mfrancepillois October 30, 2023 08:14

mfrancepillois reviewed Oct 30, 2023


EwanC commented Oct 31, 2023


Dinistro and others added 23 commits November 1, 2023 08:40

[Driver] Silence stdlib warning when linking C on *BSD / Solaris / Ha…

760658c

…iku (#70434) Same as 12b87f6 and the addition to Gnu.

[clang][NFC] Refactor ArgPassingKind

b120fe8

This patch moves `RecordDecl::ArgPassingKind` to DeclBase.h to namespace scope, so that it's complete at the time bit-field is declared.

[LLVM-C] Fix linking failure introduced by 3351097.

dfcb890

This patch adds missing dependencies required by the new unittest introduced by #68406.

[NFC][Transform] Cleanup magic constant usage

18669b1

[clang][NFC] Refactor OMPDeclareReductionDecl::InitKind

50dec54

This patch moves `OMPDeclareReductionDecl::InitKind` to DeclBase.h, so that it's complete at the point where corresponding bit-field is declared. This patch also converts it to scoped enum named `OMPDeclareReductionInitKind`

Revert "VectorUtils: mark lrint, llrint as trivially vectorizable (#6…

ac7c816

…9945)" This reverts commit 5bfd89b. It was causing build failures on ffmpeg on i686.

[mlir][complex] Support Fastmath flag for complex log ops (#69798)

8aaa2cb

Progressive support of fastmath flag in the conversion of log type ops. See more detail https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981

[X86] Add fpclamptosat to vXi8 test coverage

dc5e6e4

Adds additional test coverage for Issue #68466

[X86] fpclamptosat_vec.ll - add AVX2/AVX512 test coverage

432e114

[ConstantFolding] Add ConstantFoldIntegerCast helper

d9f36c4

This is intended as the replacement for ConstantExpr::getIntegerCast(), which does not require availability of the corresponding constant expressions. It just forwards to ConstantFoldCastOperand with the correct opcode.

[InstSimplify] Avoid ConstantExpr::getIntegerCast() (NFCI)

e918127

This always works on a constant integer or integer splat, so the constant fold here should always succeed.

[ValueTracking] Avoid ConstantExpr::getIntegerCast()

d47e2ff

Use ConstantFoldIntegerCast() instead, to remove the reliance on constant expressions.

[clang][ASTImporter] Fix crash when template class static member impo…

39dfaf0

…rted to other translation unit. (#68774) Fixes: #68769 Co-authored-by: miaozhiyuan <miaozhiyuan@feysh.com>

[X86] combineTruncateWithSat - relax minimum truncation size for PACK…

f471f6f

…SS/PACKUS truncateVectorWithPACK handling of sub-128-bit result types was improved some time ago, so remove the old 64-bit limit Fixes #68466

[InstCombine] Avoid some uses of ConstantExpr::getIntegerCast() (NFC)

0b5e0fb

Use IRBuilder or ConstantFolding instead.

konradkusiak97 and others added 27 commits November 7, 2023 07:53

[SYCL][CUDA][HIP] Fix device architecture extension query (intel#11630)

3436a5a

Small fix to strip down additional CPU information from AMD query, like: `gfx90a:sramecc+:xnack-` to just `gfx90a`
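As a rough illustration of the stripping described in this commit message, the sketch below keeps only the text before the first `:`. The helper name `base_arch` is hypothetical, and this is not the code from intel#11630; it only assumes that the extra feature flags are always separated from the architecture name by a colon, as in the example string above.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: "gfx90a:sramecc+:xnack-" -> "gfx90a".
// If there is no ':', find() returns npos and substr() keeps the whole string.
std::string base_arch(std::string_view full) {
  return std::string(full.substr(0, full.find(':')));
}
```

A name that carries no feature suffix passes through unchanged, so the helper is safe to apply to every queried device name.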

[SYCL][ESIMD][E2E] Remove emulator support from E2E tests (intel#11780)

3698759

[SYCL][ESIMD][Doc] Add an example of using BFN API (intel#11781)

0e25d40

[UR] Update UR to 612a263613b235b0257fcaf2f128fc61b06e0c24 (intel#11694)

8cb3343

[SYCL][NFC] Fix test check for code location file name (intel#11803)

75cf8d8

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

[SYCL] Fix multi_ptr conversion operator (intel#11807)

31a0aee

[SYCL][ESIMD][E2E] Add ze_debug as unsupported for slm_init_no_inline (…

8ea8566

…intel#11793) QA asked us to do this. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[UR] Bump to ec7982bac6cb3a6b9ed610cd6b7cb41fcbc780dc (intel#11811)

5cdc096

Combines the following L0 changes: * oneapi-src/unified-runtime#1033 * oneapi-src/unified-runtime#1028 * oneapi-src/unified-runtime#1022

[SPIR-V][DOC] Update SPV_INTEL_joint_matrix (intel#11764)

c7f3736

The old version is temporarily kept in SPV_INTEL_joint_matrix_legacy.asciidoc --------- Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>

[SYCL] Fix post-commit after intel#11738 (intel#11825)

84df5b9

[SYCL] Fix convert to sycl::vec<bool, N> (intel#11822)

9dfaf27

Since vectors of bools are backed by byte-sized storage, we must ensure that conversion results from other vector types are correctly brought into the expected range of bools, e.g., `(char)0` or `(char)1`.
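The range problem described here can be sketched outside of SYCL. The example below uses `std::array<char, N>` as a stand-in for the byte-backed storage of a bool vector; `BoolBytes` and `to_bool_bytes` are hypothetical names, and this is not the implementation from intel#11822. It shows one way to bring every element into the `(char)0` / `(char)1` range: compare against zero rather than truncating the source value.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a bool vector backed by byte-sized storage.
template <std::size_t N> using BoolBytes = std::array<char, N>;

// Convert element-wise so that every result is exactly 0 or 1.
// A plain static_cast<char>(src[i]) would keep values such as -1, or
// truncate 256 to 0, which breaks the "bool is 0 or 1" expectation.
template <typename T, std::size_t N>
BoolBytes<N> to_bool_bytes(const std::array<T, N> &src) {
  BoolBytes<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<char>(src[i] != T{0});
  return out;
}
```

For an input of `{0, -1, 256, 7}` the result is `{0, 1, 1, 1}`, so each element compares equal to an ordinary `bool` converted to `char`.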

[SYCL][E2E] Fix itt stubs linking for E2E test (intel#11814)

022b634

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

[ESIMD] Implement unified memory API - part 3 - slm_block_load() (int…

4b48702

…el#11665) This patch also implements support for 8- and 16-bit data types in slm_block_load(). --------- Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>

[SYCL][Test E2E] Add 2 extra characters to escaping function (intel#1…

07b15fe

…1815) Fixes generating filters like this one: `DeviceName:{{gfx90a:sramecc\+:xnack\-}},DriverVersion:{{HIP 50422.80}}`

[Driver][SYCL] Disable parallel for range rounding at -O0 (intel#11799)

2c117d7

When using -O0 to disable optimizations, also set -fsycl-disable-range-rounding to further disable optimizations and improve debuggability.

[NFC] Warning cleanup (intel#11831)

516d2a2

Unused variable - clean it up.

[SYCL][Matrix tests][E2E] remove xfail as SPR test passes (intel#11839)

edd58d9

[SYCL] Add E2E for prefetch & improve interface (intel#11834)

b5d69df

[SYCL][ESIMD][E2E] Require O2 for InlineAsm global test (intel#11841)

73d4d25

This started causing a hang on a newer GPU driver with O0; we have an internal tracker for this. Force O2 for now. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[SYCL][Graph] Update spec supported features

b402905

The following features defined in the specification as unsupported, have working implementations upstream. * intel#11418 * intel#11505 * intel#11556

EwanC force-pushed the ewan/update_supported_features branch from 0278898 to b402905 Compare November 10, 2023 08:40

EwanC closed this Nov 10, 2023

EwanC mentioned this pull request Nov 13, 2023

[SYCL][Graph] Update spec supported features #341

Closed
