Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit Update with Indices implementation #356

Closed
wants to merge 755 commits into from

Conversation

Bensuo
Copy link
Collaborator

@Bensuo Bensuo commented Feb 14, 2024

Implements spec PR - intel#12486

  • Implementation of explicit update using indices
  • E2E tests for each update type: accessor/pointer/scalar/nd_range
  • Tests for errors and other edge cases

@Bensuo Bensuo added the Graph Implementation Related to DPC++ implementation and testing label Feb 14, 2024
Copy link
Collaborator

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be checking somewhere if the backend device supports update, reported by UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_SUPPORT_EXP, and error if that's not available and a user tries to update

sycl/include/sycl/detail/pi.h Outdated Show resolved Hide resolved
sycl/plugins/unified_runtime/CMakeLists.txt Outdated Show resolved Hide resolved
sycl/source/detail/handler_impl.hpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@mfrancepillois mfrancepillois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR!
I think you should rebase it on the latest version of the sycl branch (to avoid confusion with regard to nodes and their copies in exec_graph (Commit intel@5337a8a#diff-660b1c0efae10a9e21cba6738e34672b1a922f53eb2d466d8e0908d3512f042a)).
But then we will need to use %{l0_leak_check} instead of UR_L0_LEAKS_DEBUG=1 in your tests.

OCHyams and others added 17 commits February 20, 2024 16:00
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class

    -> 1. Add DbgRecord base class for DPValue and the not-yet-added
          DPLabel class.
       2. Add the DPLabel class.
       3. Enable dbg.label conversion and add support to passes.

Patches 1 and 2 are NFC.

In the near future we also will rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
…#82373)

It turns out there's a pathway for Functions to be inserted into modules
without having the "New" debug-info flag set correctly, which this patch
fixes. Sadly there isn't a Module::insert method to instrument out
there, everyone touches the list directly.

This fix exposes a path where such functions are produced in the
outliner in the wrong mode; requiring a fix there to correctly drop
RemoveDIs-mode debug-info. This is covered by
test/DebugInfo/AArch64/ir-outliner.ll
fix the issue that `__has_builtin` is undefined on some non-clang
targets.
Current we only support `C` as the remainder, but we can also limit
with a constant numerator.

Proofs: https://alive2.llvm.org/ce/z/QB95gU

Closes #82303
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for
various properties on blocks to the verifier used for VPlans generated
by the inner loop vectorizer. It also adds def-use checks for the
verifier used in the VPlan native path.

This drops the separate flag to enable HCFG verification. Instead, all
VPlans are verified once they have been created, if assertions are
enabled.

This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to
model any phi node in the native path.
C23 added the wb and uwb suffixes to generate a bit-precise integer
value. These values can be larger than what is representable in intmax_t
or uintmax_t.

We were asserting that an enumerator constant could not have a value
larger than unsigned long long but that's now a possibility. This patch
turns the assertion into a "value too large" diagnostic.

Note, we do not yet implement WG14 N3029 and so the behavior of this
patch will cause the enumerator to be cast to unsigned long long, but
this behavior may change in the future. GCC selects __uint128_t as the
underlying type for such an enumeration and we may want to match that
behavior in the future. This patch has several FIXME comments related to
this and the release notes call out the possibility of a change in
behavior in the future.

Fixes llvm/llvm-project#69352
…nnamed common block definitions (#81770)

This patch adds assembly file `z_AIX_asm.S` that contains the 32- and
64-bit XCOFF version of microtasking routines and unnamed common block
definitions. This code has been run through the libomp LIT tests and a
user package successfully.
When PSHUFB is used as a LUT (for CTPOP, BITREVERSE etc.), its the source operand that is constant and the index operand the variable. As long as the indices don't set the MSB (which zeros the output element), then the common known bits from the source operand can be used directly, even though the shuffle mask isn't constant.

Further helps to improve CTPOP reduction codegen
…ual (#80562)

(#79887) When the offset of a PT_INTERP segment equals the offset of a
PT_LOAD segment, we consider that the parent of the PT_LOAD segment is
the PT_INTERP segment. In `layoutSegments`, we place both segments to be
after the current `Offset`, ignoring the PT_LOAD alignment.

This scenario is possible with fixed section addresses, but doesn't
happen with default linker layouts (.interp precedes other sections and
is part of a PT_LOAD segment containing the ELF header and program
headers).

```
% cat a.s
.globl _start; _start: ret
.rodata; .byte 0
.tdata; .balign 4096; .byte 0
% clang -fuse-ld=lld a.s -o a -nostdlib -no-pie -z separate-loadable-segments -Wl,-Ttext=0x201000,--section-start=.interp=0x202000,--section-start=.rodata=0x202020,-z,nognustack
% llvm-objcopy a a2
% llvm-readelf -l a2   # incorrect offset(PT_LOAD)
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001c0 0x0001c0 R   0x8
  INTERP         0x001001 0x0000000000202000 0x0000000000202000 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000200 0x000200 R   0x1000
  LOAD           0x001000 0x0000000000201000 0x0000000000201000 0x000001 0x000001 R E 0x1000
//// incorrect offset
  LOAD           0x001001 0x0000000000202000 0x0000000000202000 0x000021 0x000021 R   0x1000
  LOAD           0x002000 0x0000000000203000 0x0000000000203000 0x000001 0x001000 RW  0x1000
  TLS            0x002000 0x0000000000203000 0x0000000000203000 0x000001 0x000001 R   0x1000
  GNU_RELRO      0x002000 0x0000000000203000 0x0000000000203000 0x000001 0x001000 R   0x1000
```

The same issue occurs for PT_TLS/PT_GNU_RELRO if we PT_TLS's alignment
is smaller and we place the PT_LOAD after PT_TLS/PT_GNU_RELRO segments
(not linker default, but possible with a `PHDRS` linker script command).

Fix #79887: when two segments have the same offset, order the one with a
larger alignment first. In the previous case, the PT_LOAD segment will
go before the PT_INTERP segment. In case of equal alignments, it doesn't
matter which segment is treated as the parent segment.
…l during debugger launch. (#82051)

This fixes an issue where the error is lost if a command while executing
`launchCommands` when launching the debugger.

This should fix #82048
This patch adds support for expected InstallAPI inputs. InstallAPI
accepts a well defined filelist of headers and how those headers
represent a single library.

InstallAPI captures header files to determine linkable symbols to then
compare against what was compiled in a binary dylib and generate TBD
files.
When MergeFunctions creates new thunk functions, it needs to copy over
the debug info format kind from the original function, otherwise we'll
mix debug info formats and run into assertions. This was exposed by a
downstream change that runs MergeFunctions before inlining, which caused
assertions when inlining attempted to inline thunks created by merging,
and the added test covers both scenarios where merging creates thunks.
This updates the signal handle thread coordinating to use a user signal
bit on the SignalHandlerEvent to coordinate shutdown instead of closing
the event handle. Closing the event handle is racy as the handle may be
closed before the signal handler thread resolves the handle value in
_zx_object_wait_many() and we would like to make this an explicit error.
Using the user signal bit 1 instead and then closing the event object
after the signal handler thread is joined cannot race as the wait will
terminate whether the signal is raised before or after the wait begins.
…tWidth/getScalarSizeInBits. NFC.

Noticed on #82241 - we don't need to use the IntegerType just for the scalar width, and we were calling it 3 times in different forms - we can just call Type::getScalarSizeInBits once and reuse.
fineg74 and others added 5 commits February 22, 2024 14:18
…ove warning messages (intel#12718)

This PR adds a '--ignore-device-selectors' CLI option to sycl-ls that
prints all platforms available in the user's system, irrespective of the
DPCPP filter environment variables like ONEAPI_DEVICE_SELECTOR.
Started to fail in post commit after
intel#12719. This looks like a pre-existing
bug in the test so I'm going to temporarily disable the test instead of
reverting. Will work on the fix with highest priority after that.
Copy link
Collaborator

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work

sycl/test-e2e/Graph/Explicit/update_nd_range.cpp Outdated Show resolved Hide resolved
sycl/test-e2e/Graph/Explicit/update_nd_range.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
maarquitos14 and others added 18 commits February 23, 2024 08:24
Introduce logging ability to `SpecConstantsPass` through `LLVM_DEBUG`
macro so that we can enhance testing.

---------

Signed-off-by: Marcos Maronas <marcos.maronas@intel.com>
This PR adds the property interface functions get_property and
has_property to host_sampled_image_accessor,
host_unsampled_image_accessor, sampled_image_accessor and
unsampled_image_accessor runtime classes.
Add a new workflow to run SYCL-CTS when new nightly build is available.
This should help us to handle new failures in advance, monitor new
suites and so on.

To start with running only reduced scope of tests (cts_exclude_filter is
big enough) on unused "debug" runner (linux+cpu).
This commit generalizes two helper functions in group_algorithm.hpp to
make it so they can also handle non-uniform groups.

---------

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
…2464)

- Creation / destruction of unsampled image arrays
- Fetching / writing of unsampled image arrays
- `sycl::ext::oneapi::experimental::image_type::array` enum value added
- `sycl::ext::oneapi::experimental::image_descriptor::array_size` member
added
- `sycl::ext::oneapi::experimental::image_descriptor::verify()` member
function added

Correlated UR PR: [[Bindless][Exp] Add Support For Image Arrays
intel#1274](oneapi-src/unified-runtime#1274)
)

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…#12817)

**Problem:**
USM/usm_leak_check.cpp fails on multi-card PVC because the number of
zeMemAlloc* and zeMemFree calls doesn't match what the test expects.
This is mainly because of the recent changes to USM memory allocation,
like a different memory pool logic.

**Proposed Fix**
Since the objective of this test is to just check for USM leaks, I don't
think that we even have to check the output of UR_L0_LEAKS_DEBUG=1. It
is my understanding that UR_L0_LEAKS_DEBUG=1 will abort upon finding a
mismatch in the number of alloc/free calls. So, this PR removes explicit
checks for zeMemAlloc* and zeMemFree calls.
We decided that simple "range" kernels are hard to implement with the
free function kernel syntax. This commit removes them from the extension
specification. See the "Resolved issues" section for the rationale.
…ntel#12719)" (intel#12823)

This commit reverts the functional change because device filtering
accesses plugins directly and not through the get_platforms() interface
resulting in inconsistent device numbering. Test bug-fixes from the PR
are not being reverted though.
…common.hpp (intel#12796)

Because VERBOSE_PRINT is also used inside file user_types_common.hpp
…l#12822)

A declaration of get_mip_level_mem_handle incorrectly took sycl::device
as a parameter, which should have been a sycl::queue. This is now fixed.
- make relaxed fence a no op to satisfy the SYCL spec.
- make acquire/release/acq_rel use the lighter acq_rel fence for sm_70
instead of the seq_cst fence.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
…ntel#12780)

If the OpenCL platform doesn't support USM, don't set
PI_USM_INDIRECT_ACCESS exec info. This will avoid SYCL program to fail
when they don't use USM. If the program do need USM support, the runtime
will fail on other API calls (like memory allocation).

---------

Signed-off-by: Victor Lomuller <victor@codeplay.com>
This PR adds mechanisms to change the default queue of the current
device both via the device extension and a free function.

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
- Experimental implementation of explicit update with indices
- New scheduler command for updating a command buffer command
- PI equivalents for new UR APIs
- E2E and Unit Tests
@Bensuo
Copy link
Collaborator Author

Bensuo commented Feb 27, 2024

Closing in favour of upstream PR: intel#12840

@Bensuo Bensuo closed this Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Graph Implementation Related to DPC++ implementation and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.