Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Specify API for whole graph updates #363

Closed
wants to merge 861 commits into from

Conversation

fabiomestre
Copy link
Collaborator

No description provided.

Copy link
Collaborator

@Bensuo Bensuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just a minor comment

sdesmalen-arm and others added 22 commits March 26, 2024 11:40
…aming functions. (#85388)

Similar to how we protected FP/fixed-vector arguments and results from
calls, we should do the same for arguments/results from locally-streaming
functions such that those are not spilled/filled as ZPR registers.

This may cause a small regression (additional spills/fills), which is
addressed by #85386.
Testing with MSVC link.exe showed that it respects such options, while
LLD currently discards them.
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGClass.cpp
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExpr.cpp
…es (#86595)

Summary:
We have a plugin singleton that implements the Plugin interface. This
then spawns separate device and kernels. Previously when these needed to
reach into the global singleton they would use the `PluginTy::get`
routine to get access to it. In the future we will move away from this
as the lifetime of the plugin will be handled by `libomptarget`
directly. This patch removes uses of this inside of the plugin
implementaion themselves by simply keeping a reference to the plugin
inside of the device.

The external `__tgt_rtl` functions still use the global method, but will
be removed later.
…ed .def files. (#86564)

It's similar to #86535, but for export specified in .def files.
…r (#86486)

This attribute tells the compiler that the variable must have its exit-time
destructor run, so it makes sense that it would silence the warning telling
users that an exit-time destructor is required.

Fixes llvm/llvm-project#68686
llvm-project/clang/lib/Sema/SemaDecl.cpp:11653:20:
error: unused variable 'OldMVKind' [-Werror,-Wunused-variable]
  MultiVersionKind OldMVKind = OldFD->getMultiVersionKind();
                   ^
1 error generated.
This patch removes APIs that creating NUW neg. It is a trivial case
because `sub nuw 0, X` always gets simplified into zero.
I believe there is no optimization opportunities in the real-world
applications that we can take advantage of the nuw flag.

Motivated by
llvm/llvm-project#84792 (comment).

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=da7b7478b7cbb32c09d760f6b8d0e67901e0d533&stat=instructions:u
Fixed the printing of templated argument list and added test case.
…NFC. (#86259)

This patch passes APInt by const reference in m_SpecificInt instead of
by value. Specifically, it refactors `m_SpecificInt(uint64_t V)` to
avoid APInt construction and dangling reference.

I believe it is safe to pass the APInt by const reference into
`m_SpecificInt` even if it is a temporary.
See also https://en.cppreference.com/w/cpp/language/lifetime
> All temporary objects are destroyed as the last step in evaluating the
[full-expression](https://en.cppreference.com/w/cpp/language/expressions#Full-expressions)
that (lexically) contains the point where they were created

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=7edf459b95ab2be33b70ec67faf87b3b8cc84f09&stat=instructions:u
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
Need to include initial sext/zext/trunc nodes to the list of the demoted
root values to correctly calculate the cost and handle the
vectorization.
…tadata (#86431)

Similar to #80829 for GlobalISel.
Added a new variant of the CHECK() function that takes a custom message
as a parameter. This is useful for more meaninful error messages when
the compiler is expected to crash.

Fixes #78931
…(#84864)

This change:
- Updates the existing Clang User's Manual section on SPGO so that it
describes how to use llvm-profgen to perform SPGO on Windows. This is
new functionality implemented in #83972.
- Fixes a minor typo in the existing llvm-profgen invocation example.
- Adds an LLVM release note on this new functionality in llvm-profgen.
This matches the CMake targets and reduces the number of headers that
need to be included in multiple targets.
Using remove() on DeclContext::lookup_result list invalidates iterators.

This assertion failure was one (fortunate) symptom:
```
clang/include/clang/AST/DeclBase.h:1337: reference clang::DeclListNode::iterator::operator*() const: Assertion `Ptr && "dereferencing end() iterator"' failed.
```
…ich are needed to authenticate signed pointers (#67454)" (#86674)

This reverts commit 8bd1f91.

It appears that the commit broke msan bots.
These tests show invalid tbaa.struct metadata that is currently accepted
in preparation for a change to the IR Verifier that will then reject it.

PR: llvm/llvm-project#86167
- Revamped lowering conversion pattern for `tosa.reshape` to handle previously unsupported combinations of dynamic dimensions in input and output tensors. The lowering strategy continues to rely on pairs `tensor.collapse_shape` + `tensor.expand_shape`, which allow for downstream fusion with surrounding `linalg.generic` ops.

- Fixed bug in canonicalization pattern `ReshapeOp::fold()` in `TosaCanonicalizations.cpp`. The input and result types being equal is not a sufficient condition for folding. If there is more than 1 dynamic dimension in the input and result types, a productive reshape could still occur.

- This work exposed the fact that bufferization does not properly handle a `tensor.collapse_shape` op producing a 0D tensor from a dynamically shaped one due to a limitation in `memref.collapse_shape`. While the proper way to address this would involve releasing the `memref.collapse_shape` restriction and verifying correct bufferization, this is left as possible future work. For now, this scenario is avoided by casting the `tosa.reshape` input tensor to a static shape if necessary (see `inferReshapeInputType()`.

- An extended set of tests are intended to cover relevant conversion paths. Tests are named using pattern `test_reshape_<rank>_{up|down|same}_{s2s|s2d|d2s|d2d}_{explicit|auto}[_empty][_identity]`, where:
	
  - `<rank>` is the input rank (e.g., 3d, 6d)
  - `{up|down|same}` indicates whether the reshape increases, decreases, or retains the input rank.
  - `{s2s|s2d|d2s|d2d}` indicates whether reshape converts a statically shaped input to a statically shaped result (`s2s`), a statically shaped input to a dynamically shaped result (`s2d`), etc.
  - `{explicit|auto}` is used to indicate that all values in the `new_shape` attribute are >=0 (`explicit`) or that a -1 placeholder value is used (`auto`).
  - `empty` is used to indicate that `new_shape` includes a component set to 0.
  - `identity` is used when the input and result shapes are the same.
aelovikov-intel and others added 25 commits April 1, 2024 08:13
problem discovered when using -fsanitize=address on the KernelCompiler
tests
…compiler (intel#13216)

Author: Marcos Maronas <marcos.maronas@intel.com>

Co-authored-by: Marcos Maronas <marcos.maronas@intel.com>
Has been deprecated for some time with a planned removal during the ABI
breaking window.
…el#13191)

It's been unused since intel#12532 awaiting
ABI breaking window for the final removal.
…mbinations(intel#13218)


Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
Any user code relying on those included implicitly is wrong and we don't
consider it a breaking change, just a bug fix.

Reverts intel#11326.

We agreed that this isn't an ABI break as the code relying on this
wasn't standard conforming. However, ABI break is next week, so I'll be
merging it then. Meanwhile, adding `abi-break` label to ease
documentation update later.
…ault (intel#13183)

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
…nabled (intel#13241)

Try to reland PR: intel#5698

Author: Sergei Kanaev, s-kanaev
…intel#13229)

* fix missed scope for `equal_vec` in Vulkan interop `mipmaps.cpp` test.
…#13224)

- Fix uninitialized pointer values leading to trying to call
piExtCommandBufferReleaseCommand on backends which don't support update,
which may return as an unsupported feature. Initializing pointers to 0
prevents this call from happening.
Not used, was faiting for an ABI breaking window to be removed.
Also reverts `isnan` check to non-preview implementation to avoid
unnecessary dependecy of a data type on builtins.
After intel#13173 , we are not able to push
container images.
See
https://github.com/intel/llvm/actions/runs/8485593107/job/23250649681

```
------
 > pushing ghcr.io/intel/llvm/ubuntu2204_base:2f03ef85fee5e867c8250d535f561f2e52e5260c with docker:
------
ERROR: denied: installation not allowed to Write organization package
Error: buildx failed with: ERROR: denied: installation not allowed to Write organization package
```

We need to update the docker images, so need to write packages.

Push permission tested through non PR workflow run here:
https://github.com/intel/llvm/actions/runs/8516878870
…w tests (intel#13231)

Clean this up and add some tests.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
To try and alleviate improper dependency information generated when
-fsycl is enabled, an additionl dependency compilation step was created.
This additional step, while creating the needed information causes
issues when we use -save-temps. These issues being assosiated with bad
dependencies for the integration header.

Clean up these pathways to be more in line with a typical 'device' and
'host' only compilation, relying on the dependency information from the
device side.

Additional changes can be made when we can clean up the dependency
generation from the host compilation step when we are building the
temporary file. Removing the append file step would clear this up as
well.
:handler-copy-functions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.handler.copy

* Throws synchronously with error code `invalid` if `graph` contains any node
which is not a kernel command or host task, e.g.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove host-task mention from this exception

|Updates the executable graph node inputs & outputs from a topologically
identical modifiable graph. A topologically identical graph is one with the
same structure of nodes and edges, and the nodes added in the same order to
both graphs. Equivalent nodes in topologically identical graphs each have the
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes do not target devices. Only graphs do

@fabiomestre fabiomestre changed the base branch from pablo/explicit-update to sycl April 2, 2024 18:13
@fabiomestre fabiomestre closed this Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.