[SYCL][Graph] Node Profiling #353

mfrancepillois · 2024-01-25T16:02:35Z

This PR:

adds a new Sycl API to query the profiling information of a node ext_oneapi_get_profiling_info(ext::node Node).
Implements node profiling support.
Throws an exception if the profiling is queried for partitioned graphs (i.e. graph with an host-task)
Adds new pi entry points for node profiling.
Updates specs.
Adds tests.
Updates symbols.

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

sycl/include/sycl/event.hpp

sycl/source/detail/event_impl.hpp

sycl/source/detail/event_impl.cpp

sycl/source/detail/event_impl.hpp

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

sycl/source/detail/event_impl.hpp

sycl/source/detail/event_impl.cpp

sycl/test-e2e/Graph/graph_common.hpp

sycl/test-e2e/Graph/invalid_profiling.cpp

sycl/test-e2e/Graph/nodes_profiling_info.cpp

sycl/test-e2e/Graph/invalid_profiling.cpp

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

sycl/include/sycl/detail/pi.h

sycl/test-e2e/Graph/invalid_profiling.cpp

sycl/test-e2e/Graph/nodes_profiling_info.cpp

Bensuo

LGTM, just some minor comments

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

sycl/include/sycl/event.hpp

sycl/plugins/hip/pi_hip.cpp

…77231) This patch tries to simplify `X | Y` by replacing occurrences of `Y` in `X` with 0. Similarly, it tries to simplify `X & Y` by replacing occurrences of `Y` in `X` with -1. Alive2: https://alive2.llvm.org/ce/z/cNjDTR Note: As the current implementation is too conservative in the one-use checks, I cannot remove other existing hard-coded simplifications if they involves more than two instructions (e.g, `A & ~(A ^ B) --> A & B`). Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=a085402ef54379758e6c996dbaedfcb92ad222b5&to=9d655c6685865ffce0ad336fed81228f3071bd03&stat=instructions%3Au |stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang| |--|--|--|--|--|--|--| |+0.01%|-0.00%|+0.00%|-0.02%|+0.01%|+0.02%|-0.01%| Fixes #76554.

Some time ago, I did a similar patch for local variables. Initializing global variables can fail as well: ```c++ constexpr int a = 1/0; static_assert(a == 0); ``` ... would succeed in the new interpreter, because we never saved the fact that `a` has not been successfully initialized.

The three commits from "[RFC] compiler-rt builtins cleanup and refactoring" rewrote lots of code in compiler-rt builtins. - 082b89b: [builtins] Reformat builtins with clang-format - 0ba22f5: [builtins] Use single line C++/C99 comment style - 84da0e1: [builtins] Use aliases for function redirects

After this change, all current compiler-rt:* labels on GitHub are covered.

* Print `ReturnLoc`, `ReturnVal`, and `ThisPointeeLoc` if applicable. * For entries in `LocToVal` that correspond to declarations, print the names of the declarations next to them. I've removed the FIXME because all relevant fields are now being dumped. I'm not sure we actually need the capability for the caller to specify which fields to dump, so I've simply deleted this part of the comment. Some examples of the output: ![image](https://github.com/llvm/llvm-project/assets/29098113/17d0978f-b86d-4555-8a61-d1f2021f8d59) ![image](https://github.com/llvm/llvm-project/assets/29098113/021dbb24-5fe2-4720-8a08-f48dcf4b88f8)

Make the wmma intrinsic type signatures to be canonical. We need a type signature as long as the type is not fixed. However, when an argument's type matches a previous argument's type, we do not need the signature for this argument. This patch fixes three general cases: 1. add missing signatures 2. remove signatures for matching arguments 3. reorer the signatures -- return type signature should always appear first

…0108) This re-applies 30155fc with a fix for clangd. ### Description clang don't evaluate the object argument of `static operator()` and `static operator[]` currently, for example: ```cpp #include <iostream> struct Foo { static int operator()(int x, int y) { std::cout << "Foo::operator()" << std::endl; return x + y; } static int operator[](int x, int y) { std::cout << "Foo::operator[]" << std::endl; return x + y; } }; Foo getFoo() { std::cout << "getFoo()" << std::endl; return {}; } int main() { std::cout << getFoo()(1, 2) << std::endl; std::cout << getFoo()[1, 2] << std::endl; } ``` `getFoo()` is expected to be called, but clang don't call it currently (17.0.6). This PR fixes this issue. Fixes #67976, reland #68485. ### Walkthrough - **clang/lib/Sema/SemaOverload.cpp** - **`Sema::CreateOverloadedArraySubscriptExpr` & `Sema::BuildCallToObjectOfClassType`** Previously clang generate `CallExpr` for static operators, ignoring the object argument. In this PR `CXXOperatorCallExpr` is generated for static operators instead, with the object argument as the first argument. - **`TryObjectArgumentInitialization`** `const` / `volatile` objects are allowed for static methods, so that we can call static operators on them. - **clang/lib/CodeGen/CGExpr.cpp** - **`CodeGenFunction::EmitCall`** CodeGen changes for `CXXOperatorCallExpr` with static operators: emit and ignore the object argument first, then emit the operator call. - **clang/lib/AST/ExprConstant.cpp** - **`‎ExprEvaluatorBase::handleCallExpr‎`** Evaluation of static operators in constexpr also need some small changes to work, so that the arguments won't be out of position. - **clang/lib/Sema/SemaChecking.cpp** - **`Sema::CheckFunctionCall`** Code for argument checking also need to be modify, or it will fail the test `clang/test/SemaCXX/overloaded-operator-decl.cpp`. - **clang-tools-extra/clangd/InlayHints.cpp** - **`InlayHintVisitor::VisitCallExpr`** Now that the `CXXOperatorCallExpr` for static operators also have object argument, we should also take care of this situation in clangd. ### Tests - **Added:** - **clang/test/AST/ast-dump-static-operators.cpp** Verify the AST generated for static operators. - **clang/test/SemaCXX/cxx2b-static-operator.cpp** Static operators should be able to be called on const / volatile objects. - **Modified:** - **clang/test/CodeGenCXX/cxx2b-static-call-operator.cpp** - **clang/test/CodeGenCXX/cxx2b-static-subscript-operator.cpp** Matching the new CodeGen. ### Documentation - **clang/docs/ReleaseNotes.rst** Update release notes. --------- Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com> Co-authored-by: cor3ntin <corentinjabot@gmail.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

…79999) Here's an example of the output: ![image](https://github.com/llvm/llvm-project/assets/29098113/63cd509e-c2a7-4794-b758-ea73812ff09f)

This patch replaces the template trick with a constexpr function that is more readable. Once C++20 is available in our code base, we can remove the constexpr function in favor of std::bit_ceil.

Enable, test, and document the support for fusing rounded range kernels. This mostly worked already – we just have to query the original kernel's global size, and use that to compute the private memory size used for internalization. --------- Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>

This patch folds: ``` ((bitcast X to int) <s 0 ? -X : X) -> fabs(X) ((bitcast X to int) >s -1 ? X : -X) -> fabs(X) ((bitcast X to int) <s 0 ? X : -X) -> -fabs(X) ((bitcast X to int) >s -1 ? -X : X) -> -fabs(X) ``` Alive2: https://alive2.llvm.org/ce/z/rGepow

Updates the commit tag for the OCK.

…ts (intel#12529) This will make the two tests run in the presence of either CPU OR GPU and not requiring both to be present to run.

See llvm/llvm-project#79261 for details. It shows that clang-repl uses a different target triple with clang so that it may be problematic if the calng-repl reads the generated BMI from clang in a different target triple. While the underlying issue is not easy to fix, this patch tries to make this test green to not bother developers.

…l#12524) Show detailed error messages when users try to fuse kernels with incompatible ND-ranges, showing different errors for each different scenario. Also combine the validation and fusion logic to reduce the number of ND-ranges list traversals. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>

…80079) Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 is recommited. ### Code ``` define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) { %mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer) %a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer %sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64> %ret = add <vscale x 8 x i64> %sa, %y ret <vscale x 8 x i64> %ret } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/xsoa5xPrd) ``` vwadd_wv_mask_v8i32: li a0, 42 vsetvli a1, zero, e32, m4, ta, ma vmslt.vx v0, v8, a0 vmv.v.i v12, 0 vmerge.vvm v24, v12, v8, v0 vwadd.wv v8, v16, v24 ret ``` ### After this patch ``` vwadd_wv_mask_v8i32: li a0, 42 vsetvli a1, zero, e32, m4, ta, ma vmslt.vx v0, v8, a0 vsetvli zero, zero, e32, m4, tu, mu vwadd.wv v16, v16, v8, v0.t vmv8r.v v8, v16 ret ```

The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases. Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit. Also refactor the implementation a bit: * If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`. * `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments. * `produceSubViewErrorMsg` does not have to be templatized.

…ocOrder (#80015) Previously we called ignoreCSRForAllocationOrder on every alias of every CSR which was expensive on targets like AMDGPU which define a very large number of overlapping register tuples. On such targets it is simpler and faster to call ignoreCSRForAllocationOrder once for every physical register. Differential Revision: https://reviews.llvm.org/D146735

…2521) oneapi-src/unified-runtime#1070 and intel#11952 introduced a new variant of the `enableCUDATracing` function that takes a context pointer parameter, replacing the parameterless variant of that function. The older variant will be removed from UR once this PR is merged.

Improved joint_matrix layout test coverage. The test framework that the cuda backend tests use has been updated to support all possible `joint_matrix` gemm API combinations, including all matrix layouts. the gemm header is backend agnostic; hence all backends could use this test framework in the future. This test framework can also act as an example to show how to deal with different layout combinations when computing a general GEMM. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Split out from #78417. Reviewers: topperc, asb, kito-cheng Reviewed By: asb Pull Request: llvm/llvm-project#79248

…v_pulldown

Original commit: KhronosGroup/SPIRV-LLVM-Translator@fd22f8e

Add support for load/store operations for a cooperative matrix such that original matrix shape is known and implementations are able to reason about how to deal with the out of bounds. CapabilityCooperativeMatrixCheckedInstructionsINTEL = 6192 CooperativeMatrixLoadCheckedINTEL = 6193 CooperativeMatrixStoreCheckedINTEL = 6194 Original commit: KhronosGroup/SPIRV-LLVM-Translator@b62cb55

The goal of the PR is to add API to SPIR-V LLVM Translator to query error message by an error code as discussed in intel#2298 A need and possible application is a way to generate human-readable error info by error codes returned by other SPIRV Translator API calls, including getSpirvReport(). Original commit: KhronosGroup/SPIRV-LLVM-Translator@afe1971

Map @llvm.frexp intrinsic to OpenCL Extended Instruction frexp builtin. The difference in signatures and return values is covered by extracting/combining values from and into composite type. LLVM IR: { float %fract, i32 %exp } @llvm.frexp.f32.i32(float %val) SPIR-V: { float %fract } ExtInst frexp (float %val, i32 %exp) Original commit: KhronosGroup/SPIRV-LLVM-Translator@e8b2018

…ntel#2288) The translator failed assertion with V->user_empty() during regularize function when shl i1 or lshr i1 result is used. E.g. %2 = shl i1 %0 %1 store %2, ptr addrspace(1) @G.1, align 1 Instruction shl i1 is converted to lshr i32 which arithmetic have the same behavior. Original commit: KhronosGroup/SPIRV-LLVM-Translator@239fbd4

…#2331) Add support for checked matrix construct instruction. Specification draft: https://github.com/intel/llvm/blob/2fa153ee852ea3d7d64df097f1f494cddacee90e/sycl/doc/design/spirv-extensions/SPV_INTEL_joint_matrix.asciidoc Original commit: KhronosGroup/SPIRV-LLVM-Translator@a1b1f49

Replace some deprecated 'startswith' and 'endswith' with 'starts_with' and 'ends_with' to clear some warnings when building SYCL compiler. --------- Signed-off-by: jinge90 <ge.jin@intel.com>

This reverts commit 3d4c6c7. Due to | * 6e6aa44 2024-01-31 Revert "[Clang][Sema] fix outline member function template with defau… (#80144) ekeane@nvidia.com

We currently support -O3 for Linux compilations, expand this to also be available on Windows. This also better aligns with our existing product offerings.

The compiler was crashing when the user requested fp-accuracy for the functions in a call of the form f1(f2(f3 ...), where f1, f2 and f3 were fpbuiltin but the innermost function didn't have an fpbuiltin. The current builtinID was used instead of getting the builtinID from the current function. that created a crash in the compiler. This patch fixes the issue and renames the function EmitFPBuiltinIndirectCall to MaybeEmitFPBuiltinofFD .

intel#12297) We want to change the signature of `piMemGetNativeHandle` for reasons explained here oneapi-src/unified-runtime#1199 Corresponding UR PR: oneapi-src/unified-runtime#1226 A previous PR added a new entry point intel#12199 but it was decided that it is better to modify the existing entry point

…:intel::math (intel#12571) Signed-off-by: jinge90 <ge.jin@intel.com>

LLVM: llvm/llvm-project@178719e SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@a1b1f49

Problems found by Gregory (thanks!): 1) There were some duplicated tests, remove those 2) We didn't test non-LSC mask on Gen12 3) We get an ambiguous call because we had an old function that didn't have VS, but the new functions have default VS=1, so we don't need the old one. 4) When we pass a simd_view for the vals, we got a template match failure. This is the same issue we hit in the compile-time tests where even if we have a simd_view overload the compiler can't infer N, so we need to provide T,N anyway, so add that in the tests. I tested this on Gen12. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

…12579) Adding `EXCLUDE_FROM_ALL` to the `add_subdirectory` for the OneAPI Construction Kit, in order to to avoid building its components unless they are required by the SYCL toolchain.

The FPGA emulator is currently affected by the same issue as the CPU runtime. Signed-off-by: John Pennycook <john.pennycook@intel.com>

…ntel#12548) We have flakyness in nightly testing results. Having more variety would helpfully provide some insights on conditions when it happens. The task is only executed once a day, so extra resources needed shouldn't affect the load on the runners much.

oneapi-src/unified-runtime#1302

mfrancepillois · 2024-02-02T10:57:04Z

Upstream PR intel#12592

mfrancepillois added the Graph Implementation Related to DPC++ implementation and testing label Jan 25, 2024

mfrancepillois requested review from EwanC, reble, Bensuo and julianmi January 25, 2024 16:02

mfrancepillois mentioned this pull request Jan 25, 2024

[EXP][CMDBUF] Support for Node Profiling Bensuo/unified-runtime#5

Closed

EwanC reviewed Jan 25, 2024

View reviewed changes

EwanC reviewed Jan 26, 2024

View reviewed changes

EwanC approved these changes Jan 29, 2024

View reviewed changes

Bensuo reviewed Jan 29, 2024

View reviewed changes

kazutakahirata and others added 20 commits January 30, 2024 22:12

[AMDGPU] Use StringRef::consume_front (NFC)

292b508

[RISCV] Use disjoint flag in or_is_add.

8a98091

[NFC] Add compiler-rt:* to .github/new-prs-labeler.yml (#79872)

9594746

After this change, all current compiler-rt:* labels on GitHub are covered.

[clang][dataflow] In the CFG visualization, mark converged blocks. (#…

82324bc

…79999) Here's an example of the output: ![image](https://github.com/llvm/llvm-project/assets/29098113/63cd509e-c2a7-4794-b758-ea73812ff09f)

[ADT] Use a constexpr version of llvm::bit_ceil (NFC) (#79709)

b49b3dd

This patch replaces the template trick with a constexpr function that is more readable. Once C++20 is available in our code base, we can remove the constexpr function in favor of std::bit_ceil.

[SYCL] [NATIVECPU] Update OneAPI Construction Kit tag (intel#12543)

565490d

Updates the commit tag for the OCK.

[SYCL][Joint matrix tests] Fix test execution env setting for two tes…

6ec040e

…ts (intel#12529) This will make the two tests run in the presence of either CPU OR GPU and not requiring both to be present to run.

nikic and others added 28 commits February 1, 2024 12:58

[IndVars] Add tests for #79861 (NFC)

c105848

[RISCV][NFC] Simplify calls.ll and autogenerate checks for tail-calls.ll

178719e

Split out from #78417. Reviewers: topperc, asb, kito-cheng Reviewed By: asb Pull Request: llvm/llvm-project#79248

Merge commit '178719e86043c1e830a7e24de027a84f6f8ea28f' into llvmspir…

4bdd647

…v_pulldown

Update LLVM version from 18 to 19 (intel#2315)

b0c60b0

Original commit: KhronosGroup/SPIRV-LLVM-Translator@fd22f8e

Updates UR branch

96812b9

Merge branch 'sycl-upstream' into maxime/profiling_v2

f589d9b

[SYCL][NFC] Fix some 'startswith/endswith' related to SYCL (intel#12573)

f7a360d

Replace some deprecated 'startswith' and 'endswith' with 'starts_with' and 'ends_with' to clear some warnings when building SYCL compiler. --------- Signed-off-by: jinge90 <ge.jin@intel.com>

Revert "Update add-ir-annotations tests after 5518a9d"

21e703a

This reverts commit 3d4c6c7. Due to | * 6e6aa44 2024-01-31 Revert "[Clang][Sema] fix outline member function template with defau… (#80144) ekeane@nvidia.com

[Driver] Allow for -O3 on Windows using clang-cl (intel#12504)

0af4ac7

We currently support -O3 for Linux compilations, expand this to also be available on Windows. This also better aligns with our existing product offerings.

[SYCL][libdevice] Add sqrt with rounding mode supported in sycl::ext:…

6c1dde4

…:intel::math (intel#12571) Signed-off-by: jinge90 <ge.jin@intel.com>

LLVM and SPIRV-LLVM-Translator pulldown (WW05)

0dc97ec

LLVM: llvm/llvm-project@178719e SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@a1b1f49

[SYCL] [NATIVECPU] Add OCK subdirectory with EXCLUDE_FROM_ALL (intel#…

71eee2c

…12579) Adding `EXCLUDE_FROM_ALL` to the `add_subdirectory` for the OneAPI Construction Kit, in order to to avoid building its components unless they are required by the SYCL toolchain.

Merge branch 'sycl-upstream' into maxime/profiling_v2

262b44a

[SYCL] Disable dynamic_address_cast test on FPGA (intel#12561)

9b2e77a

The FPGA emulator is currently affected by the same issue as the CPU runtime. Signed-off-by: John Pennycook <john.pennycook@intel.com>

Updates function name

46bce9c

[UR][CL] Handle INTERMEDIATE binary type (intel#12562)

faad41d

oneapi-src/unified-runtime#1302

Merge branch 'sycl-upstream' into maxime/profiling_v2

30ab2fe

mfrancepillois closed this Feb 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL][Graph] Node Profiling #353

[SYCL][Graph] Node Profiling #353

mfrancepillois commented Jan 25, 2024 •

edited

Loading

Bensuo left a comment

mfrancepillois commented Feb 2, 2024

[SYCL][Graph] Node Profiling #353

[SYCL][Graph] Node Profiling #353

Conversation

mfrancepillois commented Jan 25, 2024 • edited Loading

Bensuo left a comment

Choose a reason for hiding this comment

mfrancepillois commented Feb 2, 2024

mfrancepillois commented Jan 25, 2024 •

edited

Loading