-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicit Update with Indices implementation #356
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be checking somewhere if the backend device supports update, reported by UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_SUPPORT_EXP
, and error if that's not available and a user tries to update
814c91f
to
e665f05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR!
I think you should rebase it on the latest version of the sycl branch (to avoid confusion with regard to nodes and their copies in exec_graph (Commit intel@5337a8a#diff-660b1c0efae10a9e21cba6738e34672b1a922f53eb2d466d8e0908d3512f042a)).
But then we will need to use %{l0_leak_check}
instead of UR_L0_LEAKS_DEBUG=1
in your tests.
9233010
to
96751d0
Compare
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The patch stack adds a new base class -> 1. Add DbgRecord base class for DPValue and the not-yet-added DPLabel class. 2. Add the DPLabel class. 3. Enable dbg.label conversion and add support to passes. Patches 1 and 2 are NFC. In the near future we also will rename DPValue to DbgVariableRecord and DPLabel to DbgLabelRecord, at which point we'll overhaul the function names too. The name DPLabel keeps things consistent for now.
…#82373) It turns out there's a pathway for Functions to be inserted into modules without having the "New" debug-info flag set correctly, which this patch fixes. Sadly there isn't a Module::insert method to instrument out there, everyone touches the list directly. This fix exposes a path where such functions are produced in the outliner in the wrong mode; requiring a fix there to correctly drop RemoveDIs-mode debug-info. This is covered by test/DebugInfo/AArch64/ir-outliner.ll
See docs at - https://gustedt.gitlabpages.inria.fr/c23-library/#stdckdint - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf (Ch7.10) Compiler header: - https://github.com/llvm/llvm-project/blob/450462cbaceddf57812ce15b5135b17f65a77654/clang/lib/Headers/stdckdint.h - New version of GCC (https://github.com/gcc-mirror/gcc/blob/cd503b0616462445381a8232fb753239d319af76/gcc/ginclude/stdckdint.h) also provides this.
fix the issue that `__has_builtin` is undefined on some non-clang targets.
Current we only support `C` as the remainder, but we can also limit with a constant numerator. Proofs: https://alive2.llvm.org/ce/z/QB95gU Closes #82303
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.
C23 added the wb and uwb suffixes to generate a bit-precise integer value. These values can be larger than what is representable in intmax_t or uintmax_t. We were asserting that an enumerator constant could not have a value larger than unsigned long long but that's now a possibility. This patch turns the assertion into a "value too large" diagnostic. Note, we do not yet implement WG14 N3029 and so the behavior of this patch will cause the enumerator to be cast to unsigned long long, but this behavior may change in the future. GCC selects __uint128_t as the underlying type for such an enumeration and we may want to match that behavior in the future. This patch has several FIXME comments related to this and the release notes call out the possibility of a change in behavior in the future. Fixes llvm/llvm-project#69352
…nnamed common block definitions (#81770) This patch adds assembly file `z_AIX_asm.S` that contains the 32- and 64-bit XCOFF version of microtasking routines and unnamed common block definitions. This code has been run through the libomp LIT tests and a user package successfully.
When PSHUFB is used as a LUT (for CTPOP, BITREVERSE etc.), its the source operand that is constant and the index operand the variable. As long as the indices don't set the MSB (which zeros the output element), then the common known bits from the source operand can be used directly, even though the shuffle mask isn't constant. Further helps to improve CTPOP reduction codegen
…ual (#80562) (#79887) When the offset of a PT_INTERP segment equals the offset of a PT_LOAD segment, we consider that the parent of the PT_LOAD segment is the PT_INTERP segment. In `layoutSegments`, we place both segments to be after the current `Offset`, ignoring the PT_LOAD alignment. This scenario is possible with fixed section addresses, but doesn't happen with default linker layouts (.interp precedes other sections and is part of a PT_LOAD segment containing the ELF header and program headers). ``` % cat a.s .globl _start; _start: ret .rodata; .byte 0 .tdata; .balign 4096; .byte 0 % clang -fuse-ld=lld a.s -o a -nostdlib -no-pie -z separate-loadable-segments -Wl,-Ttext=0x201000,--section-start=.interp=0x202000,--section-start=.rodata=0x202020,-z,nognustack % llvm-objcopy a a2 % llvm-readelf -l a2 # incorrect offset(PT_LOAD) Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x0001c0 0x0001c0 R 0x8 INTERP 0x001001 0x0000000000202000 0x0000000000202000 0x00001c 0x00001c R 0x1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x000200 0x000200 R 0x1000 LOAD 0x001000 0x0000000000201000 0x0000000000201000 0x000001 0x000001 R E 0x1000 //// incorrect offset LOAD 0x001001 0x0000000000202000 0x0000000000202000 0x000021 0x000021 R 0x1000 LOAD 0x002000 0x0000000000203000 0x0000000000203000 0x000001 0x001000 RW 0x1000 TLS 0x002000 0x0000000000203000 0x0000000000203000 0x000001 0x000001 R 0x1000 GNU_RELRO 0x002000 0x0000000000203000 0x0000000000203000 0x000001 0x001000 R 0x1000 ``` The same issue occurs for PT_TLS/PT_GNU_RELRO if we PT_TLS's alignment is smaller and we place the PT_LOAD after PT_TLS/PT_GNU_RELRO segments (not linker default, but possible with a `PHDRS` linker script command). Fix #79887: when two segments have the same offset, order the one with a larger alignment first. In the previous case, the PT_LOAD segment will go before the PT_INTERP segment. In case of equal alignments, it doesn't matter which segment is treated as the parent segment.
…l during debugger launch. (#82051) This fixes an issue where the error is lost if a command while executing `launchCommands` when launching the debugger. This should fix #82048
This patch adds support for expected InstallAPI inputs. InstallAPI accepts a well defined filelist of headers and how those headers represent a single library. InstallAPI captures header files to determine linkable symbols to then compare against what was compiled in a binary dylib and generate TBD files.
When MergeFunctions creates new thunk functions, it needs to copy over the debug info format kind from the original function, otherwise we'll mix debug info formats and run into assertions. This was exposed by a downstream change that runs MergeFunctions before inlining, which caused assertions when inlining attempted to inline thunks created by merging, and the added test covers both scenarios where merging creates thunks.
This updates the signal handle thread coordinating to use a user signal bit on the SignalHandlerEvent to coordinate shutdown instead of closing the event handle. Closing the event handle is racy as the handle may be closed before the signal handler thread resolves the handle value in _zx_object_wait_many() and we would like to make this an explicit error. Using the user signal bit 1 instead and then closing the event object after the signal handler thread is joined cannot race as the wait will terminate whether the signal is raised before or after the wait begins.
…tWidth/getScalarSizeInBits. NFC. Noticed on #82241 - we don't need to use the IntegerType just for the scalar width, and we were calling it 3 times in different forms - we can just call Type::getScalarSizeInBits once and reuse.
…on and ref compute (intel#12800)
…ove warning messages (intel#12718) This PR adds a '--ignore-device-selectors' CLI option to sycl-ls that prints all platforms available in the user's system, irrespective of the DPCPP filter environment variables like ONEAPI_DEVICE_SELECTOR.
Started to fail in post commit after intel#12719. This looks like a pre-existing bug in the test so I'm going to temporarily disable the test instead of reverting. Will work on the fix with highest priority after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work
sycl/test-e2e/Graph/Explicit/update_with_indices_multiple_exec_graphs.cpp
Outdated
Show resolved
Hide resolved
Introduce logging ability to `SpecConstantsPass` through `LLVM_DEBUG` macro so that we can enhance testing. --------- Signed-off-by: Marcos Maronas <marcos.maronas@intel.com>
This PR adds the property interface functions get_property and has_property to host_sampled_image_accessor, host_unsampled_image_accessor, sampled_image_accessor and unsampled_image_accessor runtime classes.
Add a new workflow to run SYCL-CTS when new nightly build is available. This should help us to handle new failures in advance, monitor new suites and so on. To start with running only reduced scope of tests (cts_exclude_filter is big enough) on unused "debug" runner (linux+cpu).
This commit generalizes two helper functions in group_algorithm.hpp to make it so they can also handle non-uniform groups. --------- Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
…2464) - Creation / destruction of unsampled image arrays - Fetching / writing of unsampled image arrays - `sycl::ext::oneapi::experimental::image_type::array` enum value added - `sycl::ext::oneapi::experimental::image_descriptor::array_size` member added - `sycl::ext::oneapi::experimental::image_descriptor::verify()` member function added Correlated UR PR: [[Bindless][Exp] Add Support For Image Arrays intel#1274](oneapi-src/unified-runtime#1274)
LLVM: llvm/llvm-project@815644b SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@747ef0f
…#12817) **Problem:** USM/usm_leak_check.cpp fails on multi-card PVC because the number of zeMemAlloc* and zeMemFree calls doesn't match what the test expects. This is mainly because of the recent changes to USM memory allocation, like a different memory pool logic. **Proposed Fix** Since the objective of this test is to just check for USM leaks, I don't think that we even have to check the output of UR_L0_LEAKS_DEBUG=1. It is my understanding that UR_L0_LEAKS_DEBUG=1 will abort upon finding a mismatch in the number of alloc/free calls. So, this PR removes explicit checks for zeMemAlloc* and zeMemFree calls.
We decided that simple "range" kernels are hard to implement with the free function kernel syntax. This commit removes them from the extension specification. See the "Resolved issues" section for the rationale.
…ntel#12719)" (intel#12823) This commit reverts the functional change because device filtering accesses plugins directly and not through the get_platforms() interface resulting in inconsistent device numbering. Test bug-fixes from the PR are not being reverted though.
…common.hpp (intel#12796) Because VERBOSE_PRINT is also used inside file user_types_common.hpp
…l#12822) A declaration of get_mip_level_mem_handle incorrectly took sycl::device as a parameter, which should have been a sycl::queue. This is now fixed.
- make relaxed fence a no op to satisfy the SYCL spec. - make acquire/release/acq_rel use the lighter acq_rel fence for sm_70 instead of the seq_cst fence. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
…ntel#12780) If the OpenCL platform doesn't support USM, don't set PI_USM_INDIRECT_ACCESS exec info. This will avoid SYCL program to fail when they don't use USM. If the program do need USM support, the runtime will fail on other API calls (like memory allocation). --------- Signed-off-by: Victor Lomuller <victor@codeplay.com>
This PR adds mechanisms to change the default queue of the current device both via the device extension and a free function. --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
36db5bc
to
d7dee11
Compare
- Experimental implementation of explicit update with indices - New scheduler command for updating a command buffer command - PI equivalents for new UR APIs - E2E and Unit Tests
d7dee11
to
b195409
Compare
Closing in favour of upstream PR: intel#12840 |
Implements spec PR - intel#12486