[SYCL][Graph] Update doc for UR PR moving reset commands to a dedicated cmd-list #357

Currently, the `phaseParity` argument of `nvgpu.mbarrier.try_wait.parity` is of type `index`. This can cause problems if any value other than 0 or 1 is passed, because the PTX instruction only accepts an even or odd phase. This PR makes the `phaseParity` argument an `i1` to avoid misuse. Here is the relevant text from the PTX documentation:

```
The .parity variant of the instructions test for the completion of the phase indicated by the operand phaseParity, which is the integer parity of either the current phase or the immediately preceding phase of the mbarrier object. An even phase has integer parity 0 and an odd phase has integer parity of 1. So the valid values of phaseParity operand are 0 and 1.
```

For more information, see: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-mbarrier-try-wait
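The PTX rule above reduces to taking the low bit of a phase index. The following is a host-side model only, and the helper names are hypothetical rather than part of the NVGPU dialect. A consumer that waits on successive phases computes parity as `phase & 1`, so it can only ever produce 0 or 1. That is exactly the range an `i1` operand enforces:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: the parity of an mbarrier phase is the low bit of its
// phase index. Even phases map to 0 (false), odd phases to 1 (true).
constexpr bool phaseParity(std::uint64_t phase) { return (phase & 1u) != 0; }

// Hypothetical helper: the parity values a consumer would pass to
// try_wait.parity while waiting on `numPhases` consecutive phases.
std::vector<bool> paritiesForPhases(std::uint64_t numPhases) {
  std::vector<bool> out;
  bool parity = false;  // phase 0 is even
  for (std::uint64_t i = 0; i < numPhases; ++i) {
    out.push_back(parity);
    parity = !parity;  // each completed phase flips the parity
  }
  return out;
}
```

Tracking a one-bit flag instead of a full phase counter is the usual pattern, and it makes any out-of-range value such as 2 unrepresentable.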

…81239) This function will be useful when we change the behavior of record-type prvalues so that they directly initialize the associated result object. See also the comment here for more details: https://github.com/llvm/llvm-project/blob/9e73656af524a2c592978aec91de67316c5ce69f/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h#L354 As part of this patch, we document and assert that synthetic fields may not have reference type. There is no practical use case for this: A `StorageLocation` may not have reference type, and a synthetic field of the corresponding non-reference type can serve the same purpose.

llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545. This is #78606 rebased and with the addition of DPValue handling. Note the addition of --try-experimental-debuginfo-iterators in the tests and some shuffling of code in MemoryTaggingSupport.cpp.

The strictfp attribute has the requirement that "LLVM will not introduce any new floating-point instructions that may trap". The llvm.is.fpclass intrinsic is documented as "The function never raises floating-point exceptions", and the fcmp instruction may raise one, so we can't transform the former into the latter in functions with the strictfp attribute.

…#81585) This reverts commit a034e65. Some protobuf users reported that this patch caused a significant compile-time regression because `TailDuplicator` works poorly with a specific pattern. We will reland it once the codegen issue is fixed.

…ugprone-unused-local-non-trivial-variable (#81563)

…` (NFC)

…(#81482) Use templates instead. Part of <llvm/llvm-project#62629>.

This patch adds full support for linking SystemZ (ELF s390x) object files. Support should be generally complete:
- All relocation types are supported.
- Full shared library support (DYNAMIC, GOT, PLT, ifunc).
- Relaxation of TLS and GOT relocations where appropriate.
- Platform-specific test cases.

In addition to new platform code and the obvious changes, there were a few additional changes to common code:
- Add three new RelExpr members (R_GOTPLT_OFF, R_GOTPLT_PC, and R_PLT_GOTREL) needed to support certain s390x relocations. I chose not to use a platform-specific name since nothing in the definition of these relocs is actually platform-specific; it is quite possible that other platforms will need the same.
- A couple of tweaks to TLS relocation handling, as the particular semantics of the s390x versions differ slightly. See comments in the code.

This was tested by building and testing >1500 Fedora packages, with only a handful of failures; as these also have issues when building with LLD on other architectures, they seem unrelated.

Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>

The motivation here was a suggestion over in Compiler Explorer. You can already use `-mllvm` to do this, but since gfortran supports `-masm`, I figured I'd try to add it.

This is done by flang expanding `-masm` into `-mllvm x86-asm-syntax=` and passing that to fc1, which then collects all the `-mllvm` options and forwards them on. The code to expand it comes from clang's `Clang::AddX86TargetArgs` (there are some other places doing the same thing too). However, I've removed the `-inline-asm` that clang adds, as Fortran doesn't have inline assembly. So `-masm` for flang purely changes the style of assembly output.

```
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu
<...>
pushq %rbp
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu -masm=att
<...>
pushq %rbp
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu -masm=intel
<...>
push rbp
```

The test is adapted from `clang/test/Driver/masm.c` by removing the clang-cl related lines and changing the 32-bit triples to 64-bit triples, since flang doesn't support 32-bit targets.
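The expansion step can be sketched as a small function that maps the `-masm=` value onto the `-mllvm` pair forwarded to fc1. This is a hypothetical illustration, not flang's driver code. The function name is made up, and the leading dash on `-x86-asm-syntax=` follows the usual LLVM `cl::opt` command-line spelling, which is an assumption here:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of the driver-side expansion: translate the value of
// `-masm=` into the `-mllvm` pair that fc1 forwards to the LLVM backend.
// Unlike clang, no `-inline-asm=` option is emitted, since Fortran has no
// inline assembly. Returns std::nullopt for an unsupported value.
std::optional<std::vector<std::string>> expandMasm(const std::string& value) {
  if (value != "att" && value != "intel")
    return std::nullopt;  // a real driver would emit a diagnostic here
  return std::vector<std::string>{"-mllvm", "-x86-asm-syntax=" + value};
}
```

Rejecting unknown values up front keeps a typo such as `-masm=itel` from silently falling through to the default syntax.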

…(#80991) Although in a normal implementation the assumption is reasonable, it seems that some esoteric implementations do not return a T&. This should be handled correctly, and the values should be propagated. --------- Co-authored-by: martinboehme <mboehme@google.com>
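Here are two hypothetical classes of the "esoteric" kind described above. Both are legal C++. One copy-assignment operator returns `void` and the other returns by value, so an analysis cannot treat `a = b` as yielding a reference to `a`. The assignment still copies the field, and that copied value is what must be propagated:

```cpp
#include <cassert>

// Hypothetical class whose copy assignment returns void instead of T&.
// The assignment still copies the value, so an analysis must propagate it
// even though there is no returned reference to model.
struct VoidAssign {
  int value = 0;
  void operator=(const VoidAssign& other) { value = other.value; }
};

// Hypothetical class whose copy assignment returns by value.
struct ValueAssign {
  int value = 0;
  ValueAssign() = default;
  ValueAssign(const ValueAssign&) = default;
  ValueAssign operator=(const ValueAssign& other) {
    value = other.value;
    return *this;  // returns a copy, not a reference to *this
  }
};
```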

… (#80966) The 1-D case directly maps to LLVM intrinsics. The n-D case will be handled by unrolling to 1-D first (in a later patch). Depends on: #80965

Without this I would hit errors with libstdc++-12 like: /usr/include/c++/12/bits/stl_iterator_base_funcs.h:230:5: note: candidate template ignored: substitution failure [with _InputIterator = llvm::const_set_bits_iterator_impl<llvm::BitVector>]: argument may not have 'void' type next(_InputIterator __x, typename ^

…oading directives (#81081) This patch adds support for the depend clause in a number of OpenMP directives/constructs related to offloading. Specifically, it adds handling of the depend clause when it is used with the following constructs:
- target
- target enter data
- target update
- target exit data
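The clause is spelled the same way across OpenMP base languages. As a rough C++ sketch, not taken from the patch or its tests, each of the four constructs can carry a `depend` clause on the same storage. That lets deferred (`nowait`) device tasks run in order: map, compute, copy back, unmap. Without OpenMP enabled, the pragmas are ignored and the loop runs on the host. With OpenMP but no device, the target regions fall back to the host:

```cpp
#include <cassert>
#include <vector>

// Doubles every element of `data` inside a target region. The depend clauses
// on the same storage (a[0]) order the nowait tasks: map to the device, run
// the kernel, copy the result back, then unmap.
void doubleOnDevice(std::vector<int>& data) {
  int* a = data.data();
  int n = static_cast<int>(data.size());
#pragma omp target enter data map(to: a[0:n]) depend(out: a[0]) nowait
#pragma omp target map(alloc: a[0:n]) depend(inout: a[0]) nowait
  for (int i = 0; i < n; ++i) a[i] *= 2;
#pragma omp target update from(a[0:n]) depend(inout: a[0]) nowait
#pragma omp target exit data map(release: a[0:n]) depend(inout: a[0]) nowait
#pragma omp taskwait  // wait for all four deferred tasks before returning
}
```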

Fix crash raised in comments for 5c9f768

…1500) Adds a test to help document Linalg Ops that are currently not supported by the vectoriser (i.e. the logic to vectorise these is missing). The list is not exhaustive.

@krzysz00

Common backends (LLVM, SPIR-V) only support 1D vectors. The LLVM conversion handles ND vectors (N >= 2) as `array<array<... vector>>`, and the SPIR-V conversion doesn't handle them at all at the moment. Sometimes it's preferable to treat multi-dimensional vectors as linearized 1D vectors. Add a pass to do this. Only constants and simple elementwise ops are supported for now.

@krzysz00 I've extracted your result type conversion code from LegalizeToF32 and moved it to a common place. Also, add a ConversionPattern class that operates on traits.
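Linearizing a multi-dimensional vector amounts to a row-major flattening. For example, `vector<2x3xf32>` becomes `vector<6xf32>`, and element `[i][j]` moves to position `i * 3 + j`. Elementwise ops apply per element, so they are unaffected by this renumbering, which makes them a natural first case. The following is a minimal sketch of the index mapping, assuming row-major order. The function names are hypothetical, not the pass's API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements in the linearized 1-D vector: the product of all dims.
std::int64_t numElements(const std::vector<std::int64_t>& shape) {
  std::int64_t n = 1;
  for (std::int64_t d : shape) n *= d;
  return n;
}

// Row-major position of a multi-dimensional index within the 1-D vector.
// The last dimension varies fastest, so each step folds in one dimension.
std::int64_t linearIndex(const std::vector<std::int64_t>& shape,
                         const std::vector<std::int64_t>& index) {
  assert(shape.size() == index.size());
  std::int64_t linear = 0;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    assert(index[d] >= 0 && index[d] < shape[d]);
    linear = linear * shape[d] + index[d];
  }
  return linear;
}
```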

This fixes a crash when lowering an extract_subvector like `t0:v1i64 = extract_subvector t1:v2i64, 1`. Whilst we never need a vslidedown with M1 on scalable vector types, we might need to do it for v1i64/v1f64, since the smallest container type for them is nxv1i64/nxv1f64. The lowering code is still correct for this case, but the assertion was too strict. The actual invariant we're relying on is that ContainerSubVecVT's LMUL <= M1, not < M1. This is why we handled v2i32 fine: its container type was nxv1i32, which is MF2.
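The LMUL arithmetic behind this can be worked through with a small helper. The sketch assumes LLVM's RISC-V convention that one vscale block of a scalable type holds 64 bits (`RVVBitsPerBlock`), so the LMUL of `<vscale x N x iB>` is `N * B / 64`. LMUL is kept in eighths so that fractional values stay integral. The helper names are hypothetical:

```cpp
#include <cassert>

// Assumed value of RVVBitsPerBlock in LLVM's RISC-V backend: the number of
// bits in one vscale-multiplied block of a scalable vector type.
constexpr unsigned kRVVBitsPerBlock = 64;

// LMUL of a scalable type <vscale x minElts x iEltBits>, expressed in
// eighths so that fractional LMULs stay integral: 8 == M1, 4 == MF2.
constexpr unsigned lmulInEighths(unsigned minElts, unsigned eltBits) {
  return minElts * eltBits * 8 / kRVVBitsPerBlock;
}

// The invariant the relaxed assertion checks: the container of the
// extracted subvector fits in a single vector register (LMUL <= M1).
constexpr bool fitsInOneRegister(unsigned minElts, unsigned eltBits) {
  return lmulInEighths(minElts, eltBits) <= 8;
}

static_assert(lmulInEighths(1, 64) == 8, "nxv1i64 is exactly M1");
static_assert(lmulInEighths(1, 32) == 4, "nxv1i32 is MF2");
```

nxv1i64 lands exactly on M1, so a strict `LMUL < M1` check rejects it even though the lowering is correct. nxv1i32 (MF2) passed either way.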

Allocate storage and initialize it with the given APValue contents.

…#80735) z/OS doesn't support aligned allocation, so mark these test cases as unsupported. Continuation of https://reviews.llvm.org/D102798

Introduce `mcdc::DecisionParameters` and `mcdc::BranchParameters` and make sure they are not zero-initialized. FIXME: Could we make `CoverageMappingRegion` a smart tagged union?

… (#81602) In a few places we test whether sets (i.e. sorted ranges) intersect by computing the set_intersection and then testing whether it is empty. For this purpose, it should be more efficient to use a std::vector instead of a std::set to hold the result of the set_intersection, since insertion is simpler.
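The pattern described above looks roughly like this. The helper name is hypothetical. `std::set_intersection` writes into a `std::vector` through `std::back_inserter`, and appending is cheaper than inserting into a `std::set`, which allocates a tree node per element:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Tests whether two sorted ranges share an element by materializing their
// intersection into a std::vector: appending via back_inserter is cheaper
// than inserting into a std::set, which allocates a tree node per element.
template <typename T>
bool intersects(const std::set<T>& a, const std::set<T>& b) {
  std::vector<T> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(common));
  return !common.empty();
}
```

If only the yes/no answer is needed, a hand-written merge loop that stops at the first match would avoid the allocation entirely. The sketch keeps the compute-then-test shape the message describes.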

Just emit their satisfaction state, which is what the current interpreter does as well.

'serial', 'parallel', and 'kernels' constructs are all considered 'Compute' constructs. This patch creates the AST type, plus the required infrastructure for such a type, plus some base types that will be useful in the future for breaking this up. The only difference between the three is the 'kind' (plus some minor clause legalization rules, but those can be differentiated easily enough), so rather than representing them as separate AST nodes, it seems to make sense to represent all three with the same node. Additionally, no clause AST functionality is being implemented yet, as that fits better in a separate patch, and this is enough to get the 'naked' constructs implemented. This is otherwise an 'NFC' patch, as it doesn't alter execution at all, so there aren't any tests. I did this to break up the review workload and to get feedback on the layout.

…tional. (#81600) It's not interesting for the majority of downstream users.

…default" This reapplies commit bdde5f9 by undoing the revert bc66e0c. The previous reapplication 5c9f768 was reverted due to a crash (reproducer in comments for 5c9f768) which was fixed in #81595. As noted in the original commit, this commit may break downstream tests. If this commit is breaking your downstream tests, please see comment 12 in [0], which documents the kind of variation in tests we'd expect to see from this change and what to do about it. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Historically, TableGen has used `A.swap(B)` to move containers without the expense of copying them. Perhaps this predated rvalue references. In any case, `A = std::move(B)` seems like a more direct way to implement this when only A is required after the operation.
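The two idioms look like this side by side. The function names are illustrative only. Both avoid copying elements, but `swap` also hands the old contents of `A` back to `B`, which is wasted work when `B` is discarded afterwards:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Old idiom: swap the contents in; `b` ends up holding a's old contents.
void takeBySwap(std::vector<int>& a, std::vector<int>& b) { a.swap(b); }

// Direct idiom when only `a` is needed afterwards: move-assign. `b` is left
// in a valid but unspecified state and should not be relied upon.
void takeByMove(std::vector<int>& a, std::vector<int>& b) { a = std::move(b); }
```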

Apparently, some compilers [correctly] warn that the variable created prior to this change is unused. This removes the variable.

…1142) This uses https://pygithub.readthedocs.io/en/stable/github_objects/Repository.html?highlight=get_collaborator_permission#github.Repository.Repository.get_collaborator_permission, which calls https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user and returns the top-level "permission" key. This is less detailed than the user/permissions key but should be fine for this use case.

When a review is submitted we check:
* If it's an approval.
* Whether we have already left a merge-on-behalf comment (by looking for a hidden HTML comment).
* Whether the author has permissions to merge their own PR.
* Whether the reviewer has permissions to merge.

If needed, we leave a comment tagging the reviewer. If the reviewer also doesn't have merge permission, then it asks them to find someone else who does.
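The bot itself is Python built on PyGithub. The C++ sketch below only models the decision order listed above, and all names in it are hypothetical. It assumes the top-level "permission" value is one of `admin`, `write`, `read` or `none`, and that merging requires `admin` or `write`:

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the bot's decision when a review is submitted.
enum class Action { None, TagReviewer, AskReviewerToFindMerger };

// Assumption: the top-level "permission" key is one of "admin", "write",
// "read" or "none", and merging requires "admin" or "write".
bool canMerge(const std::string& permission) {
  return permission == "admin" || permission == "write";
}

Action onReviewSubmitted(bool isApproval, bool alreadyCommented,
                         const std::string& authorPermission,
                         const std::string& reviewerPermission) {
  if (!isApproval || alreadyCommented) return Action::None;
  if (canMerge(authorPermission)) return Action::None;  // author can merge
  if (canMerge(reviewerPermission)) return Action::TagReviewer;
  return Action::AskReviewerToFindMerger;
}
```

Checking the hidden-comment marker before the permission lookups also means repeated approvals never trigger extra API calls for permissions.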

This patch changes how the macro __ARM_ARCH is defined to match its definition in the ACLE. In ACLE 5.4.1, __ARM_ARCH is defined as equal to the major architecture version for ISAs up to and including v8. From v8.1 onwards, its definition is changed to include minor versions, such that for an architecture vX.Y, __ARM_ARCH = X*100 + Y. Before this patch, LLVM defined __ARM_ARCH using only the major architecture version for all architecture versions. This patch adds functionality to define __ARM_ARCH correctly for architectures greater than or equal to v8.1.
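Working through the rule as stated gives v8.0 → 8, v8.1 → 801, and v9.2 → 902. The following is a minimal sketch of that mapping. The function name is hypothetical, and clang's actual implementation lives in its ARM/AArch64 target code:

```cpp
#include <cassert>

// __ARM_ARCH as described in ACLE 5.4.1: the major version alone up to and
// including v8.0, and X * 100 + Y for every vX.Y from v8.1 onwards.
constexpr int armArchMacro(int major, int minor) {
  const bool atLeastV81 = major > 8 || (major == 8 && minor >= 1);
  return atLeastV81 ? major * 100 + minor : major;
}

static_assert(armArchMacro(7, 0) == 7, "ARMv7: major version only");
static_assert(armArchMacro(8, 0) == 8, "ARMv8.0: major version only");
static_assert(armArchMacro(8, 1) == 801, "ARMv8.1 onwards: X*100+Y");
static_assert(armArchMacro(9, 2) == 902, "ARMv9.2");
```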

The load combine replaces a number of original loads with one new load and also replaces the output chains of the original loads with the output chain of the new load. This is incorrect if an original load is retained (due to multiple uses), as it may then be incorrectly reordered. Fix this by using makeEquivalentMemoryOrdering() instead, which creates a TokenFactor with both chains. Fixes llvm/llvm-project#80911.

The failure rate is too high. See https://discourse.llvm.org/t/rfc-future-of-windows-pre-commit-ci/76840

Extra X86 tests for llvm/llvm-project#77790.

…teConstantInternal. This makes it easier to extend the code to support vector types.

…han c_int128_t as PowerPC only supports up to c_int64_t. (#81222) PowerPC only supports up to `c_int64_t`. Add the macro `__powerpc__` and test it during preprocessing to set `c_intmax_t` in the `iso_c_binding` intrinsic module.

Some extra `<>` and a missing full stop.

…altime clocks (#81331) Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have, which runs at a fixed frequency and can therefore be used to determine elapsed time. By contrast, the cycle counter often has a variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than as an argument to `readcyclecounter`, to avoid needing to change existing code and to make the separation more explicit.
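A fixed frequency is what makes the counter usable for timing: elapsed time is the tick delta divided by the frequency. The message does not spell out the new builtin's name, so this hypothetical helper takes raw tick values and a frequency in Hz as inputs:

```cpp
#include <cassert>
#include <cstdint>

// Converts a tick delta from a fixed-frequency counter into nanoseconds.
// Splitting into whole seconds and a remainder avoids overflowing
// delta * 1e9 for large deltas; the remainder term stays below
// freqHz * 1e9, which fits in 64 bits for frequencies up to ~18 GHz.
std::uint64_t ticksToNanoseconds(std::uint64_t startTicks,
                                 std::uint64_t endTicks,
                                 std::uint64_t freqHz) {
  constexpr std::uint64_t kNsPerSecond = 1'000'000'000;
  // Unsigned subtraction tolerates a single 64-bit wraparound.
  const std::uint64_t delta = endTicks - startTicks;
  const std::uint64_t seconds = delta / freqHz;
  const std::uint64_t remainder = delta % freqHz;
  return seconds * kNsPerSecond + remainder * kNsPerSecond / freqHz;
}
```

The same conversion is meaningless for a variable-frequency cycle counter, which is the reason the message gives for a separate builtin.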

This seems to be a trick to avoid copying a RegUnitSet, but it can be done more simply using std::move.

…n DXIL.td (#81184)
- Specify overload types of DXIL Operation as a list of types instead of a string.
- Add supported DXIL type record definitions to `DXIL.td`, leveraging `LLVMType` to avoid duplicate definitions.
- Spell out DXIL Operation Attribute specification string.
- Make corresponding changes to process the records in DXILEmitter.cpp.

This adds a layer between `SourceBreakpoint`/`FunctionBreakpoint` and `BreakpointBase` to have better separation and encapsulation, so we are not directly operating on `SBBreakpoint`. I basically moved the `SBBreakpoint` and the methods that require it from `BreakpointBase` to `Breakpoint`. This makes it easier to add support for data watchpoints by sharing the logic inside `BreakpointBase`.

build-llvm/tools/clang/docs/LanguageExtensions.rst:2768: WARNING: Title underline too short.

linalg.mmt4d was added a while back (https://reviews.llvm.org/D105244), but there are virtually no tests in-tree. In the spirit of documenting through test, this PR adds a few basic examples.

Summary: The RPC interface needs to handle an entire warp or wavefront at once. This is currently done by using a compile-time constant indicating the size of the buffer, which right now defaults to some value on the client (GPU) side. However, there are currently attempts to move the `libc` library to a single IR build. This is problematic as the wavefront size changes between ISAs on AMDGPU. The builtin `__builtin_amdgcn_wavefrontsize()` will return the appropriate value, but it is only known at runtime now. In order to support this, this patch restructures the packet. Now, instead of having an array of arrays, we simply have a large array of buffers and slice it according to the runtime value if we don't know it ahead of time. This also somewhat has the advantage of making the buffer contiguous within a page, now that the header has been moved out of it.
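The "one large array, sliced at runtime" layout can be sketched as follows. The `Buffer` payload, the 64-lane upper bound and the packet count are assumptions for illustration, not the real `libc` RPC types. Packet `i` simply starts at buffer `i * laneSize`, where the lane count is known only at runtime:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-lane payload; the real RPC buffer layout may differ.
struct Buffer {
  std::uint64_t data[8];
};

// Assumed upper bound on lanes per warp/wavefront (AMDGPU wave64).
constexpr std::size_t kMaxLanes = 64;

// Hypothetical flat storage: buffers for several packets laid out back to
// back, instead of an array of fixed-size per-packet arrays.
struct PacketStorage {
  static constexpr std::size_t kNumPackets = 4;
  Buffer buffers[kNumPackets * kMaxLanes];

  // Returns the first buffer of packet `index` for a runtime lane count
  // `laneSize` (e.g. 32 or 64 on AMDGPU, 32 on NVPTX). A smaller wavefront
  // packs packets more tightly into the same storage.
  Buffer* slice(std::size_t index, std::size_t laneSize) {
    assert(laneSize > 0 && laneSize <= kMaxLanes);
    assert(index < kNumPackets * kMaxLanes / laneSize);
    return &buffers[index * laneSize];
  }
};
```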

This PR adds a new attribute to carry over the information from `launch_bounds`. The new attribute `CUDALaunchBoundsAttr` holds 2 to 3 integer attributes and is added to the `func.func` operation.

Summary: The GPU `nanosleep` tests would occasionally fail. This was because we used integer division to determine how many ticks we had to sleep for. The division truncates, leaving us with a value just slightly below the requested value, which would occasionally leave us with a return value of `-1`. This patch just changes the code to round up by 1 so we always sleep for at least the requested value.
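With an assumed tick period of 3 ns, a 1000 ns request truncates to 333 ticks, which is only 999 ns. Adding one tick guarantees the sleep is at least as long as requested. The helper names and the nanoseconds-per-tick formulation are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Truncating conversion: can undershoot the requested duration, which is
// what made the nanosleep test occasionally fail.
constexpr std::uint64_t ticksTruncated(std::uint64_t ns, std::uint64_t nsPerTick) {
  return ns / nsPerTick;
}

// Rounding up by one tick guarantees ticks * nsPerTick >= ns. It adds an
// extra tick even when the division is exact, which only lengthens the sleep.
constexpr std::uint64_t ticksRoundedUp(std::uint64_t ns, std::uint64_t nsPerTick) {
  return ns / nsPerTick + 1;
}
```

A true ceiling, `(ns + nsPerTick - 1) / nsPerTick`, would avoid the extra tick on exact divisions. For a minimum-duration sleep, the simpler `+ 1` is sufficient.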

…r and i8 fixed vector. (#76548) Instead of only handling vscale x 16 x i1 predicate vectors, handle any scalable i1 vector where the known minimum is divisible by 8. This is used on RISC-V where we have multiple sizes of predicate types.

Added in 26670dc to work around intel#4885. Windows CI and a local Windows build are happy with this change, so it seems like this has been properly fixed at some point. If this does break somebody, this can be easily reverted. (Also, Linux does the same `#define alloca` in system headers, so I'm not sure why it'd be different on Windows.) This is tech debt that caused breakages, see comments on #71709.

This pass looks for unsigned icmps that have illegal types and tries to widen the use/def graph to improve the placement of the zero extends that type legalization would need to insert. I've explicitly disabled it for i32 by adding a check for isSExtCheaperThanZExt to the pass. The generated code isn't perfect, but my data shows a net dynamic instruction count improvement on spec2017 for both base and Zba+Zbb+Zbs.

This PR adds a new attribute to carry over the information from `cluster_dims`. The new attribute `CUDAClusterDimsAttr` holds 3 integer attributes and is added to the `func.func` operation.

Summary: Recent patches have added solutions to the remaining sources of divergence. This patch simply removes the last occurrences of things like `has_builtin`, `ifdef` or builtins with feature requirements. The one exception here is `nanosleep`, but I made changes in the `__nvvm_reflect` pass to make usage like this actually work at O0. Depends on llvm/llvm-project#81331

CONFLICT (content): Merge conflict in clang/include/clang/Serialization/ASTBitCodes.h

This PR adds two LLVM intrinsics to MLIR:
- llvm.amdgcn.s.setprio, which sets the priority of a wave for the GPU scheduler
- llvm.amdgcn.sched.barrier, which sets a software barrier so that the scheduler cannot move instructions across it

Summary: I forgot to remove these because I thought I did it already. This caused the build to fail when actually linked.

…name given a pgo name. (#81547) - Also update the `InstrProf::addFuncWithName` to call the newly added `getCanonicalName`.

…1, x)` for multi-use We previously did this iff the inner `(shl/lshr -1, x)` was one-use. No instructions are added even if the inner `(shl/lshr -1, x)` is multi-use and this canonicalization both makes the resulting instruction easier to analyze and shrinks its dependency chain. Closes #81576

Reverts llvm/llvm-project#81534.

llvm/llvm-project#81534 breaks building the (Fuchsia) Clang toolchain on Windows.

Log: https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8756186536543250705/+/u/clang/install/stdout
Builder: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8756186536543250705/overview

```
FAILED: tools/clang/tools/extra/clang-include-fixer/tool/CMakeFiles/clang-include-fixer.dir/ClangIncludeFixer.cpp.obj
C:\b\s\w\ir\x\w\cipd\bin\clang-cl.exe /nologo -TP -DCLANG_REPOSITORY_STRING=\"https://llvm.googlesource.com/llvm-project\" -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\b\s\w\ir\x\w\llvm_build\tools\clang\tools\extra\clang-include-fixer\tool -IC:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool -IC:\b\s\w\ir\x\w\llvm-llvm-project\clang\include -IC:\b\s\w\ir\x\w\llvm_build\tools\clang\include -IC:\b\s\w\ir\x\w\recipe_cleanup\tensorflow-venv\store\python_venv-q9i5kpsp0iun0ktmqgab125ti8\contents\Lib\site-packages\tensorflow\include -IC:\b\s\w\ir\x\w\llvm_build\include -IC:\b\s\w\ir\x\w\llvm-llvm-project\llvm\include -IC:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\.. -imsvcC:\b\s\w\ir\x\w\zlib_install_target\include -imsvcC:\b\s\w\ir\x\w\zstd_install\include /DWIN32 /D_WINDOWS /Zc:inline /Zc:__cplusplus /Oi /Brepro /bigobj /permissive- /W4 -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported /Gw -no-canonical-prefixes /O2 /Ob2 -std:c++17 -MT /EHs-c- /GR- -UNDEBUG /showIncludes /Fotools\clang\tools\extra\clang-include-fixer\tool\CMakeFiles\clang-include-fixer.dir\ClangIncludeFixer.cpp.obj /Fdtools\clang\tools\extra\clang-include-fixer\tool\CMakeFiles\clang-include-fixer.dir\ -c -- C:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\ClangIncludeFixer.cpp
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\ClangIncludeFixer.cpp:11:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\..\IncludeFixer.h:15:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Sema/ExternalSemaSource.h:15:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/AST/ExternalASTSource.h:18:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/AST/DeclBase.h:18:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/AST/DeclarationName.h:18:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/IdentifierTable.h:18:
In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/Builtins.h:63:
C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(151,1): error: redefinition of enumerator 'BI_alloca'
  151 | LANGBUILTIN(_alloca, "v*z", "n", ALL_MS_LANGUAGES)
      | ^
C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(15,54): note: expanded from macro 'LANGBUILTIN'
   15 | #  define LANGBUILTIN(ID, TYPE, ATTRS, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS)
      |                                                      ^
C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/Builtins.h(62,34): note: expanded from macro 'BUILTIN'
   62 | #define BUILTIN(ID, TYPE, ATTRS) BI##ID,
      |                                  ^
<scratch space>(72,1): note: expanded from here
   72 | BI_alloca
      | ^
C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(150,1): note: previous definition is here
  150 | LIBBUILTIN(alloca, "v*z", "fn", STDLIB_H, ALL_GNU_LANGUAGES)
      | ^
C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(11,61): note: expanded from macro 'LIBBUILTIN'
   11 | #  define LIBBUILTIN(ID, TYPE, ATTRS, HEADER, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS)
      |                                                             ^
C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/Builtins.h(62,34): note: expanded from macro 'BUILTIN'
   62 | #define BUILTIN(ID, TYPE, ATTRS) BI##ID,
      |                                  ^
<scratch space>(71,1): note: expanded from here
   71 | BI_alloca
      | ^
```

… value (#81635) Prevents isel errors when trying to lower a gc relocate of an undef value (which turns into a CopyToReg of a TargetConstant). Such relocates may occur after DCE (e.g. after GVN removes some dead blocks) if no passes like instcombine are scheduled afterwards to clean them up. Fixes #80294 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

This completes the unrevert of ef38833.

This CMakeLists.txt is used to build modules without build system support. It was removed in d06ae33. It is used in the documentation on how to use modules. I made some minor changes to make it work with the std.compat module using the std module. Note that the CMakeLists.txt in the build dir should be removed once build system support is generally available.

…0918) The algorithm to find the DW_OP_entry_value requires you to find the nearest non-inlined frame. It did that by counting the number of stack frames so that it could use that as a loop stopper. That is unnecessary and inefficient. Unnecessary because GetFrameAtIndex will return a null frame when you step past the oldest frame, so you already have the "got to the end" signal without counting all the stack frames. And counting all the stack frames can be expensive.
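The cheaper loop shape can be sketched with stand-in types. `Frame`, `Thread` and the search function below are hypothetical models, not LLDB's classes. As with `GetFrameAtIndex`, asking for a frame past the oldest one returns null, so the walk needs no frame count up front:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for LLDB's frame objects.
struct Frame {
  bool isInlined;
};

struct Thread {
  std::vector<std::shared_ptr<Frame>> frames;
  // Like GetFrameAtIndex, returns a null frame past the oldest frame, so the
  // caller gets an end-of-stack signal without counting frames first.
  std::shared_ptr<Frame> getFrameAtIndex(std::size_t i) const {
    if (i < frames.size()) return frames[i];
    return nullptr;
  }
};

// Finds the nearest non-inlined frame older than `start`, stopping at the
// first null frame instead of precomputing the stack depth.
std::shared_ptr<Frame> findNonInlinedParent(const Thread& thread,
                                            std::size_t start) {
  for (std::size_t i = start + 1;; ++i) {
    std::shared_ptr<Frame> frame = thread.getFrameAtIndex(i);
    if (!frame) return nullptr;  // walked past the oldest frame
    if (!frame->isInlined) return frame;
  }
}
```

In the common case the loop touches only a few frames near `start`, whereas counting first forces the whole stack to be unwound.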

…ectParsed (#70734) This allows you to specify options and arguments and their definitions and then have lldb handle the completions, help, etc. in the same way that lldb does for its parsed commands internally. This feature has some design considerations as well as the code, so I've also set up an RFC, but I did this one first and will put the RFC address in here once I've pushed it... Note, the lldb "ParsedCommand interface" doesn't actually do all the work that it should. For instance, saying the type of an option that has a completer doesn't automatically hook up the completer, and ditto for argument values. We also do almost no work to verify that the arguments match their definition, or do auto-completion for them. This patch allows you to make a command that's bug-for-bug compatible with built-in ones, but I didn't want to stall it on getting the auto-command checking to work all the way correctly. As an overall design note, my primary goal here was to make an interface that worked well in the script language. For that I needed, for instance, to have a property-based way to get all the option values that were specified. It was much more convenient to do that by making a fairly bare-bones C interface to define the options and arguments of a command, and set their values, and then wrap that in a Python class (installed along with the other bits of the lldb python module) which you can then derive from to make your new command. This approach will also make it easier to experiment. See the file test_commands.py in the test case for examples of how this works.

This patch reworks the way that wsloop reduction operations function to better match the expected semantics from the OpenMP specification, following the rework of parallel reductions. The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. These block arguments follow the loop control block arguments. --------- Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>

…dards (#81587)

…onstants (#73056) In this case, a trivial GEP chain has the form:

```
%ptr = getelementptr sameType, %base, constant
%val = getelementptr sameType, %ptr, %variable
```

That is, a one-index GEP consumes another (of the same basis and result type) one-index GEP, where the inner GEP uses a constant index and the outer GEP uses a variable index. For chains of this type, it is trivial to reorder them (by simply swapping the indexes). The result of doing so is better AddrMode matching for users of the ultimate ptr produced by GEP chain. Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).

…specifier in all language modes (#80171) According to [dcl.type.elab] p4:

> If an _elaborated-type-specifier_ appears with the `friend` specifier as an entire _member-declaration_, the _member-declaration_ shall have one of the following forms:
> `friend` _class-key_ _nested-name-specifier_(opt) _identifier_ `;`
> `friend` _class-key_ _simple-template-id_ `;`
> `friend` _class-key_ _nested-name-specifier_ `template`(opt) _simple-template-id_ `;`

Notably absent from this list is the `enum` form of an _elaborated-type-specifier_ "`enum` _nested-name-specifier_(opt) _identifier_", which appears to be intentional per the resolution of CWG2363. Most major implementations accept these declarations, so the diagnostic is a pedantic warning across all C++ versions.

In addition to the trivial cases previously diagnosed in C++98, we now diagnose cases where the _elaborated-type-specifier_ has a dependent _nested-name-specifier_:

```
template<typename T>
struct A {
  enum class E;
};

struct B {
  template<typename T>
  friend enum A<T>::E; // pedantic warning: elaborated enumeration type cannot be a friend
};

template<typename T>
struct C {
  friend enum T::E; // pedantic warning: elaborated enumeration type cannot be a friend
};
```

This patch fixes: mlir/lib/Target/LLVMIR/AttrKindDetail.h:65:1: error: unused function 'getAttrNameToKindMapping' [-Werror,-Wunused-function]

Change-Id: I7ced7774c80997d21969ab7886fc30c0c1e1cc81

…mbers swapped for big-endian (#79188) The direct lock data structure has bit `0` (the least significant bit) of the first 32-bit word set to `1` to indicate it is a direct lock. On the other hand, the first word (in 32-bit mode) or first two words (in 64-bit mode) of an indirect lock are the address of the entry allocated from the indirect lock table. The runtime checks bit `0` of the first 32-bit word to tell if this is a direct or an indirect lock. This works fine for 32-bit and 64-bit little-endian because the memory layout of a 64-bit address there is (`low word`, `high word`). However, this causes problems for big-endian where the memory layout of a 64-bit address is (`high word`, `low word`). If an address of the indirect lock table entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it is treated as a direct lock. This patch defines `struct kmp_base_tas_lock` with the ordering of the two 32-bit members flipped for big-endian PPC64 so that when checking/setting tags in member `poll`, the second word (the low word) is used. This patch also changes places where `poll` is not already explicitly specified for checking/setting tags.
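The misclassification can be reproduced without a big-endian machine by modelling the two word orders explicitly. The helper names are hypothetical. With the address from the message, the first in-memory word is `0x1` on big-endian, which is odd, so the tag check wrongly sees a direct lock:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Models how a 64-bit value is split into two 32-bit words in memory:
// little-endian stores (low word, high word), big-endian (high word, low word).
std::array<std::uint32_t, 2> inMemoryWords(std::uint64_t value, bool bigEndian) {
  const auto low = static_cast<std::uint32_t>(value);
  const auto high = static_cast<std::uint32_t>(value >> 32);
  return bigEndian ? std::array<std::uint32_t, 2>{high, low}
                   : std::array<std::uint32_t, 2>{low, high};
}

// The runtime's check: bit 0 of the *first* 32-bit word marks a direct lock.
bool looksLikeDirectLock(std::uint64_t indirectEntryAddress, bool bigEndian) {
  return (inMemoryWords(indirectEntryAddress, bigEndian)[0] & 1u) != 0;
}
```

Swapping the struct members on big-endian makes `poll` overlay the low word again, which is the word whose bit 0 is always clear for an aligned table-entry address.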

…81655) When in 32-bit mode, the backend doesn't currently implement 64-bit atomics, even though the hardware is capable if you have specified a V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO. This fixes a regression triggered by PR #73176.

The if and the else above this both return so this is unreachable. Delete it and remove the else after return.

… (#81634) This will allow DyadicFloat class to replace NormalFloat class.

why it's crashing on the x86_64 Debian Linux worker.

These are formats supported by PyTorch sparse, so good to make sure that our assemble instructions work on these.

The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all". This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF:

1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue llvm/llvm-project#81558.
2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the _middle_ of the string section:

error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character

If we look at the closest offset to 0x455D in debug_str:

```
0x0000454e: "__SLONG32_TYPE int"
```

0x455D points to "int".
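The rule behind that error fits in a one-line check. The sketch below assumes the section is available as a byte string, and the function name is hypothetical, not the DWARF verifier's API. The test replays the example above: 0x455D = 0x454E + 15, which lands on the "int" inside "__SLONG32_TYPE int":

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// The rule the verifier reports: a .debug_str_offsets entry must point at the
// start of a string in .debug_str, i.e. be zero or immediately follow a NUL.
bool isValidStrOffset(std::string_view debugStr, std::size_t offset) {
  if (offset >= debugStr.size()) return false;  // out of bounds
  return offset == 0 || debugStr[offset - 1] == '\0';
}
```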

…query (#79932) This commit changes DebugNamesDWARFIndex so that it now overrides `GetFullyQualifiedType` and attempts to use DW_IDX_parent, when available, to speed up such queries. When this type of information is not available, the base-class implementation is used. With this commit, we now achieve the 4x speedups reported in [1]. [1]: https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151/38

As explained on discourse [0] (comment 12), to get the non-intrinsic form of debug-info records enabled and testing, we're only using it inside of the pass manager in LLVM right now. Things like the textual IR writer and bitcode writing _passes_ are instrumented to convert back to intrinsic-form when writing a module out, but it turns out we missed the ThinLTO bitcode writing pass. That causes uh, all variable location debug-info to be dropped in ThinLTO mode (oops). This patch adds that conversion; it should be low risk as it's identical to what happens in all the other passes. However, should this commit turn out to cause trouble, please instead revert d759618 or whichever is the most recent commit to set UseNewDbgInfoFormat to default to true. That'll revert LLVM back to the definitely-correct behaviour. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

…parate constants (#73056)" and follow ups "ninja check-llvm" is failing on tip of tree. This reverts commit ec0aa16. This reverts commit 1b65742.

This implements functionality to handle the `DataBreakpointInfo` and `SetDataBreakpoints` requests. If variablesReference is 0 or not provided, interpret name as ${number of bytes}@${expression} to set a data breakpoint at the given expression, because the spec https://microsoft.github.io/debug-adapter-protocol/specification#Requests_DataBreakpointInfo doesn't say how the client could specify the number of bytes to watch. This is based on top of llvm/llvm-project#80753.
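
A minimal sketch of splitting such a name, assuming a decimal byte count and that everything after the first `@` is the expression. `parseDataBreakpointName` is a hypothetical helper, not lldb-dap's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical parser for "<num bytes>@<expression>". Overflow of the byte
// count is not checked in this sketch.
std::optional<std::pair<uint64_t, std::string>>
parseDataBreakpointName(const std::string &Name) {
  std::size_t At = Name.find('@');
  if (At == std::string::npos || At == 0 || At + 1 == Name.size())
    return std::nullopt; // need both a size and an expression
  uint64_t Size = 0;
  for (std::size_t I = 0; I < At; ++I) {
    char C = Name[I];
    if (C < '0' || C > '9')
      return std::nullopt; // size must be a decimal number
    Size = Size * 10 + static_cast<uint64_t>(C - '0');
  }
  if (Size == 0)
    return std::nullopt; // watching zero bytes makes no sense
  return std::make_pair(Size, Name.substr(At + 1));
}
```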

CONFLICT (content): Merge conflict in llvm/test/CodeGen/RISCV/O3-pipeline.ll Also revert 5c9f768. See: KhronosGroup/SPIRV-LLVM-Translator#2357 intel#12698

The `DemoteCatchSwitchPHIOnly` option in the `WinEHPrepare` pass was added in llvm/llvm-project@99d60e0, because Wasm EH uses `WinEHPrepare`, but it doesn't need to demote all PHIs. PHIs in `catchswitch` BBs have to be removed (= demoted) because `catchswitch`s are removed in ISel and `catchswitch` BBs are removed as well, so they can't have other instructions. But because Wasm EH doesn't use funclets, PHIs in `catchpad` or `cleanuppad` BBs don't need to be demoted. That was the reason the `DemoteCatchSwitchPHIOnly` option was added: to avoid demoting more instructions unnecessarily. The problem is that it should have been set to `true` for Wasm EH (its default value is `false` for WinEH), and I mistakenly set it to `false` and wasn't aware of this for more than 5 years. This was not the end of the world; it just means we've been demoting more instructions than we should, possibly hurting code size. In practice I think it would've had hardly any effect on real performance, given that PHIs in `catchpad` or `cleanuppad` BBs are not very frequent and many people run other optimizers like Binaryen anyway.

…info by default"" This reverts commit d759618. Causes crashes, see comments in llvm/llvm-project@d759618.

…cros according to ISO/IEC TR 18037:2008 standard, and add fixed point type support detection. (#81255) Fixed point extension standard: https://standards.iso.org/ittf/PubliclyAvailableStandards/c051126_ISO_IEC_TR_18037_2008.zip

Fixes: 7b08b43

…t (#79945) This removes the dependency LLDB API tests have on lldb/third_party/Python/module/unittest2, and instead uses the standard one provided by Python. This does not actually remove the vendored dep yet, nor update the docs. I'll do both those once this sticks.

Non-trivial changes to call out:
- expected failures (i.e. "bugnumber") don't have a reason anymore, so those params were removed
- `assertItemsEqual` is now called `assertCountEqual`
- When a test is marked xfail, our copy of unittest2 considers failures during teardown to be OK, but modern unittest does not. See TestThreadLocal.py. (Very likely could be a real bug/leak.)
- Our copy of unittest2 was patched to print all test results, even ones that don't happen, e.g. `(5 passes, 0 failures, 1 errors, 0 skipped, ...)`, but standard unittest prints a terser message that omits test result types that didn't happen, e.g. `OK (skipped=1)`. Our lit integration parses this stderr and needs to be updated w/ that expectation.

I tested this w/ `ninja check-lldb-api` on Linux. There's a good chance non-Linux tests have similar quirks, but I'm not able to uncover those.

Register the LLVM IR translation interface for FIR to avoid warnings about "Unhandled parameter attribute" after #78228.

…347) Covers cases where a DeclRefExpr referring to a const-size array decays to a pointer and is used "as a pointer" (e.g. passed to a pointer type parameter). Since std::array<T, N> doesn't implicitly convert to a pointer to its element type T*, the cast needs to be done explicitly as part of the fixit when we retrofit std::array to code that previously worked with a constant size array. The std::array::data() method is used for the explicit cast. In terms of the fixit machine this covers the UPC(DRE) case for the Array fixit strategy. The emitted fixit inserts a call to the std::array::data() method, similarly to the analogous fixit for the Span strategy.
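
The before/after shape of such a fixit, sketched with a hypothetical pointer-taking function `sumFirst`:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A function taking a pointer, i.e. the "used as a pointer" (UPC) case.
int sumFirst(const int *P, std::size_t N) {
  int S = 0;
  for (std::size_t I = 0; I < N; ++I)
    S += P[I];
  return S;
}

// Before the fixit: the C array decays to int* implicitly.
int beforeFixit() {
  int Buf[4] = {1, 2, 3, 4};
  return sumFirst(Buf, 4);
}

// After retrofitting std::array: no implicit decay, so data() is spelled out.
int afterFixit() {
  std::array<int, 4> Buf = {1, 2, 3, 4};
  return sumFirst(Buf.data(), 4);
}
```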

…. (#80371) The attribute is now allowed on an assortment of declarations, to suppress warnings related to declarations themselves, or all warnings in the lexical scope of the declaration. I don't necessarily see a reason to have a list at all, but it does look as if some of those more niche items aren't properly supported by the compiler itself so let's maintain a short safe list for now. The initial implementation raised a question whether the attribute should apply to lexical declaration context vs. "actual" declaration context. I'm using "lexical" here because it results in fewer warnings suppressed, which is the conservative behavior: we can always expand it later if we think this is wrong, without breaking any existing code. I also think that this is the correct behavior that we will probably never want to change, given that the user typically desires to keep the suppressions as localized as possible.

…ave-restore/Zcmp (#81392) PEI previously used fake frame indices for these callee saved registers. These fake frame indices are not registered with MachineFrameInfo. This required them to be deleted from CalleeSavedInfo after PEI to avoid breaking later passes. See #79535. Unfortunately, removing the registers from CalleeSavedInfo pessimizes Interprocedural Register Allocation. The RegUsageInfoCollector pass runs after PEI and uses CalleeSavedInfo. This patch replaces #79535 by properly creating fixed stack objects through MachineFrameInfo. This changes the stack size and offsets returned by MachineFrameInfo, which requires changes to how RISCVFrameLowering uses that information. In addition to the individual object for each register, I've also created a single large fixed object that covers the entire stack area covered by cm.push or the libcalls. cm.push must always push a multiple of 16 bytes and the save restore libcall pushes a multiple of stack align. I think this leaves holes in the stack where we could spill other registers, but it matches what we did previously. Maybe we can optimize this in the future. The only test changes are due to stack alignment handling after the callee save registers. Since the fixed objects are now on the stack, the offset is non-zero when an aligned object is processed, so the offset gets rounded up, increasing the stack size. I suspect we might need some more updates for RVV related code. There is very little or maybe even no testing of RVV mixed with Zcmp and save-restore.
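
A rough model of why the stack size grows: once the callee-saved area is laid out as fixed objects, an aligned object is placed after a non-zero offset and gets rounded up. The functions and sizes below are illustrative assumptions, not RISCVFrameLowering code.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to a multiple of Align (a power of two); same idea as llvm::alignTo.
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical model: CSRBytes of callee-saved area already allocated as fixed
// objects, followed by one object of ObjSize bytes needing ObjAlign alignment.
constexpr uint64_t frameSizeWithAlignedObject(uint64_t CSRBytes, uint64_t ObjSize,
                                              uint64_t ObjAlign) {
  return alignUp(CSRBytes, ObjAlign) + ObjSize;
}
```

With a 16-byte push area and a 32-byte object aligned to 32, laying the object out after the fixed objects gives 64 bytes, whereas placing it at offset 0 and adding the 16 bytes separately gives 48.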

…(#81639) Fold gc.relocate of undef and null to undef and null respectively. Similar transform is currently done by instcombine, but there is no reason to not include it here as well.

The missing trailing comma confuses the sync script.

Removes audit TODO

The Linux std has more asserts enabled by default, so it complained, even though this worked on Darwin...

In an LTO build, we don't set the ELF attributes to indicate what extensions were compiled with. The target CPU/Attrs in RISCVTargetMachine do not get set for an LTO build. Each function gets a target-cpu/feature attribute, but this isn't usable to set ELF attributes since we wouldn't know what function to use. We can't just pick one, since it might have been compiled with an attribute like target_version. This patch adds the ISA as Module metadata so we can retrieve it in the backend. Individual translation units can still be compiled with different strings so we need to collect the unique set when Modules are merged. The backend will need to combine the unique ISA strings to produce a single value for the ELF attributes. This will be done in a separate patch.
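
Collecting the unique set on merge can be sketched with a `std::set`. The function and the ISA strings are hypothetical examples, not the patch's actual metadata handling.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of per-module ISA metadata: each module carries the ISA
// strings it was compiled with, and merging keeps only the unique ones.
using ISASet = std::set<std::string>;

ISASet mergeISAMetadata(const std::vector<ISASet> &Modules) {
  ISASet Merged;
  for (const ISASet &M : Modules)
    Merged.insert(M.begin(), M.end()); // duplicates collapse automatically
  return Merged;
}
```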

Otherwise it breaks some environments, like X64 Android, that don't have f128 functions available in their libc. Followup to #79611.

…… (#81682) …ct LevelType from LevelFormat and properties instead. **Rationale** We used to explicitly declare every possible combination between `LevelFormat` and `LevelProperties`, and it now becomes difficult to scale as more properties/level formats are going to be introduced.
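
Composing a level type from a format plus independent property flags, rather than enumerating every combination, might look like the sketch below. The bit layout and names are assumptions for illustration, not the real sparse tensor encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: the format sits in the low byte, each property is
// an independent flag bit, so new combinations need no new enumerators.
enum class LevelFormat : uint64_t { Dense = 1, Compressed = 2, Singleton = 4 };
enum class LevelProperty : uint64_t { NonUnique = 1u << 8, NonOrdered = 1u << 9 };

constexpr uint64_t buildLevelType(LevelFormat F, bool NonUnique, bool NonOrdered) {
  uint64_t LT = static_cast<uint64_t>(F);
  if (NonUnique)
    LT |= static_cast<uint64_t>(LevelProperty::NonUnique);
  if (NonOrdered)
    LT |= static_cast<uint64_t>(LevelProperty::NonOrdered);
  return LT;
}

// Query the format while ignoring any property bits.
constexpr bool isCompressed(uint64_t LT) {
  return (LT & 0xff) == static_cast<uint64_t>(LevelFormat::Compressed);
}
```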

When the parsed command python code is run on 3.9, I get:

```
File ".../lib/python3.9/site-packages/lldb/plugins/parsed_cmd.py", line 124, in translate_value
  return cls.translators[value_type](value)
TypeError: 'staticmethod' object is not callable
```

But this works correctly in Python 3.10 on macOS and Linux. I'm guessing something changed between those versions, and I'll have to do something to work around the difference. But I'm going to skip the test on 3.9 while I figure that out.

…parate constants (#81671) Actually update tests w.r.t llvm/llvm-project@9e5a77f and reland llvm/llvm-project#73056

… (#81550) This PR adds functionality to use shared memory optimization as an op using transform dialect.

CONFLICT (content): Merge conflict in llvm/lib/IR/BasicBlock.cpp

Python3.9 does not allow you to put a reference to a class staticmethod in a table and call it from there. Python3.10 and following do allow this, but we still support 3.9. staticmethod was slightly cleaner, but this will do.

This provides a simple way to implement exponential backoff using a do while loop. Usage example (also see the change to LockFileManager.cpp):

```
ExponentialBackoff Backoff(10s);
do {
  if (tryToDoSomething())
    return ItWorked;
} while (Backoff.waitForNextAttempt());
return Timeout;
```

Abstracting this out of `LockFileManager` as the module build daemon will need it.
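
One possible shape for such a class, written as a hypothetical stand-alone sketch rather than LLVM's implementation. The default minimum and maximum waits are assumptions, and a production version might also add random jitter.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical sketch: sleep for a doubling interval between attempts and
// report failure once the deadline has passed.
class ExponentialBackoff {
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::milliseconds;

  Clock::time_point EndTime;
  Millis Delay;    // wait before the next attempt
  Millis MaxDelay; // cap on a single wait

public:
  explicit ExponentialBackoff(Millis Timeout, Millis MinWait = Millis(1),
                              Millis MaxWait = Millis(500))
      : EndTime(Clock::now() + Timeout), Delay(MinWait), MaxDelay(MaxWait) {}

  // Returns false (without sleeping) once the timeout has expired; otherwise
  // sleeps, doubles the next delay (up to MaxDelay) and returns true.
  bool waitForNextAttempt() {
    Clock::time_point Now = Clock::now();
    if (Now >= EndTime)
      return false;
    Millis Remaining = std::chrono::duration_cast<Millis>(EndTime - Now);
    std::this_thread::sleep_for(std::min<Millis>(Delay, Remaining));
    Delay = std::min<Millis>(Delay * 2, MaxDelay);
    return true;
  }
};
```

Because `std::chrono::seconds` converts implicitly to milliseconds, `ExponentialBackoff Backoff(10s);` from the usage example compiles unchanged against this sketch.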

…81571) This introduces a basic outline of installapi as a clang driver option. It captures relevant information as cc1 args, which are common arguments already passed to the linker to encode into TBD file outputs. This is effectively an upstream for what already exists as `tapi installapi` in Xcode toolchains, but directly in Clang. This patch does not handle any AST traversing on input yet. InstallAPI is broadly an operation that takes a series of header files that represent a single dynamic library and generates a TBD file out of it which represents all the linkable symbols and necessary attributes for statically linking in clients. It is the linkable object in all Apple SDKs and when building dylibs in Xcode. `clang -installapi` also will support verification where it compares all the information recorded for the TBD files against the already built binary, to catch possible mismatches like when a declaration is missing a definition for an exported symbol.

…ml and tests. (#80924) Adds support to obj2yaml for PGO Analysis Map. Adds a test to both obj2yaml and yaml2obj.

Fixes CI.

Recently we enabled building the shim for arm64_32 arch. On this arch, sizeof(uptr) == sizeof(unsigned long) == 4, so this assert will fail at runtime. We just need to remove this assert. rdar://122927166 Co-authored-by: Mariusz Borsa <m_borsa@apple.com>

…#80848) Fixes iree-org/iree#16317

…CI (#81694) Use a slightly more idiomatic way of getting vscale. getVScale performs additional constant folding, but I presume computeKnownBits catches these cases too.

…rOps.cpp (NFC)

…pp (NFC)

…nsformOps.cpp (NFC)

This reverts commit 32e65b0. It seems to break some PowerPC bots. See llvm/llvm-project#81390 (comment).

Fixes #81569.

… (#81705) The getAlign function for a load returns the commonAlignment of the "base align" and the offset stored in the MachinePointerInfo. We're splitting a load here, so we should take the base alignment from the original load without any offset that may already exist in the original load. The new load can then maintain its own alignment using just the base alignment and its own offset. Noticed by inspection.
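
The arithmetic behind this is the largest power of two dividing both the base alignment and the offset. In the illustrative values below, a load with base alignment 16 at offset 8 reports align 8. For the split half at offset 16, reusing that 8 as the new base yields align 8, while the true base alignment gives 16.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both A (a power-of-two alignment) and Offset.
// Same idea as llvm::commonAlignment / MinAlign, written out for illustration.
constexpr uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  uint64_t Bits = A | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}
```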

…#81707) We already have the PtrOff factored into MachinePointerInfo. Any calls to getAlign on the new load will do commonAlignment with the MachinePointerInfo offset and the base alignment.

This PR introduces the `nvvm.barrier` op to the NVVM dialect. Currently, NVVM only supports `nvvm.barrier0`, which synchronizes all threads using barrier resource 0. The new `nvvm.barrier` has two essential arguments: the barrier resource and the number of threads. This added flexibility allows for selective synchronization of threads within a CTA, aligning with the capabilities provided by LLVM intrinsics or the PTX model. I think we can deprecate `nvvm.barrier0` in favor of the more generic `nvvm.barrier`.

```
// Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA)
nvvm.barrier
// Synchronize all threads using barrier resource 3.
nvvm.barrier id = 3
// Synchronize %numberOfThreads threads using barrier resource 3.
nvvm.barrier id = 3 number_of_threads = %numberOfThreads
```

…NFC. (#81704) This patch moves the `isSignBitCheck` helper into ValueTracking to reuse the logic in ValueTracking/InstSimplify. Addresses the comment llvm/llvm-project#80740 (comment).

This is only a code reformatting and rename of variables to the newer format.

…(#78299) Some fixed vector tests in test/CodeGen/RISCV/rvv have multiple run lines that check various configurations of -riscv-v-fixed-length-vector-lmul-max. From what I understand this flag was introduced in the early days of fixed length vector support, but now that fixed vector codegen has matured I'm not sure if it's as relevant today. This patch proposes to remove the various lmul-max run lines from the tests to make them more readable, and any changes to fixed vector codegen easier to review. We have removed them before for the same reason, so this would take care of the remaining test cases: https://reviews.llvm.org/D157973#4593268 (I don't have any strong motivation to remove the actual flag itself, my own personal motivation is just to clean up the tests)

This packs:
* `BitmapBytes`
* `BitmapMap`
* `CondIDMap`

into `MCDC::State`.

…712) This patch removes unnecessary `m_c_*` matchers since we always canonicalize `commutative_op Cst, X` into `commutative_op X, Cst`. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=bfc0b7c6891896ee8e9818f22800472510093864&to=d27b058bb9acaa43d3cadbf3cd889e8f79e5c634&stat=instructions:u
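
A toy illustration of why canonicalization makes commuted matchers redundant. `Operand`, `BinOp` and the matcher are hypothetical and far simpler than LLVM's PatternMatch.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Operand {
  bool IsConst;
  std::string Name; // meaningful when !IsConst
  long Value;       // meaningful when IsConst
};

struct BinOp {
  char Opcode; // '+', '*', '-', ...
  Operand LHS, RHS;
};

bool isCommutative(char Op) { return Op == '+' || Op == '*'; }

// Canonicalize "C op X" into "X op C" for commutative ops.
void canonicalize(BinOp &I) {
  if (isCommutative(I.Opcode) && I.LHS.IsConst && !I.RHS.IsConst)
    std::swap(I.LHS, I.RHS);
}

// After canonicalization, matching "X + C" only needs one operand order,
// so a commuted matcher adds nothing for this shape.
bool matchAddOfConst(const BinOp &I, long &C) {
  if (I.Opcode != '+' || I.LHS.IsConst || !I.RHS.IsConst)
    return false;
  C = I.RHS.Value;
  return true;
}
```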

…(#81217)

There is nothing in these instruction definitions that depends on wave size so testing both seems like overkill. The corresponding assembler tests do not do it.

If a store is dominated by a condition that ensures that the value being stored in a memory location is already present at that memory location, consider the store a noop. Fixes #63419
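
The kind of source pattern this targets, as an illustrative C++ sketch (not taken from the issue). Inside the branch the condition already proves `*P == V`, so for an integer type the store writes back the value that is already there.

```cpp
#include <cassert>

// The store below is a no-op: on the only path reaching it, *P == V,
// so an optimizer may delete it without changing behaviour.
void storeIfEqual(int *P, int V) {
  if (*P == V)
    *P = V; // redundant store
}
```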

This was accidentally split with a comment

…nnot (#81142)" This reverts commit 38c706e. This workflow always fails in cases where it needs to create a comment, due to a permissions issue, see the discussion at: https://discourse.llvm.org/t/rfc-fyi-pull-request-greetings-for-new-contributors/75458/20

… across op_values(). NFC. Fixes static analysis warning.

Add test coverage for types wider than legal

This makes sure the correct flags are used for the clone (i.e. the ones present on the recipe), instead of the ones on the original IR instruction. For now this should not change anything, as the flags of replicate recipes are not dropped before they are cloned. But that will change in a follow-up patch.

In preparation for ARM64EC support.

… (#81720) This allows to * check if a given ir.Type is a floating point type via isinstance() or issubclass() * get the bitwidth of a floating point type See motivation and discussion in https://discourse.llvm.org/t/add-floattype-to-mlir-python-bindings/76959.

As commented on the PR #81293, the Ampere1-family does not have test cases for the common fusion cases it implements. This adds the Ampere1 targets to the relevant misched-fusion testcases:
* addadrp
* addr
* aes

Move collectPoisonGeneratingFlags from InnerLoopVectorizer to VPlanTransforms and also update its name. collectPoisonGeneratingFlags already directly drops poison-generating flags, not only collecting them. This means it is more appropriate to integrate it directly into the VPlan transform pipeline. The current implementation still calls back to legal to check if a block needs predication, which should be improved in the future.

…erInfo` (#81542) This patch expands the notion of "interesting" in `IdentifierInfo` to also cover ObjC keywords and builtins, which matches the notion of "interesting" in the serialization layer. What was previously "interesting" in `IdentifierInfo` is now called "notable". Beyond clearing up confusion between serialization and the rest of the compiler, it also resolves a naming problem: ObjC keywords, notable identifiers, and builtin IDs are all stored in the same bit-field. Now we can use "interesting" to name it and its corresponding type, instead of the `ObjCKeywordOrInterestingOrBuiltin` abomination.

Which is causing CI checks to fail.

```
clang/docs/LanguageExtensions.rst:2794:takes no arguments and produces an unsigned long long result. The builtin does
clang/docs/LanguageExtensions.rst:2795:not guarantee any particular frequency, only that it is stable. Knowledge of the
+ echo '*** Trailing whitespace has been found in Clang source files as described above ***'
```

@test

This patch improves `computeKnownFPClass` by using context-sensitive information from `DomConditionCache`. The motivation of this patch is to optimize the following case found in [fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566):

```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```

We can prove the sign bit of %value is always zero. Then the fabs can be eliminated. This pattern also exists in cpython/duckdb/oiio/openexr. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f82e0809ba12170e2f648f8a1ac01e78ef06c958&to=041218bf5491996edd828cc15b3aec5a59ddc636&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.00%|-0.03%|+0.00%|+0.00%|+0.02%|
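
The same reasoning in C++, as an illustrative sketch with hypothetical function names: on every path the value reaching the `phi` has a clear sign bit, so the `fabs` is a no-op.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// C++ rendering of the IR above. The two Cond arms are identical, mirroring
// the %if.then2 / %if.else edges that both carry %x.
float beforeFold(float X, bool Cond) {
  int32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits)); // bitcast float -> i32
  float Value;
  if (Bits < 0)  // sign bit set
    Value = -X;  // fneg clears it
  else if (Cond)
    Value = X;   // sign bit already clear
  else
    Value = X;
  return std::fabs(Value);
}

// Result once the fabs is proven redundant.
float afterFold(float X, bool Cond) {
  (void)Cond;
  int32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits < 0 ? -X : X;
}
```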

…8_t` constants (#81267) Adds user defined literal to construct unsigned integer constants. This is useful when constructing constants for non native C++ types like `__uint128_t` or our custom `BigInt` type.
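
A minimal sketch of such a literal operator, assuming the raw `const char *` form so that literals wider than 64 bits are accepted. The `_u128` name and the supported forms (decimal and `0x` hex, with digit separators; no octal or binary) are assumptions, and `__uint128_t` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Value of a hex digit, or -1 if C is not one.
constexpr int hexDigit(char C) {
  return C >= '0' && C <= '9'   ? C - '0'
         : C >= 'a' && C <= 'f' ? C - 'a' + 10
         : C >= 'A' && C <= 'F' ? C - 'A' + 10
                                : -1;
}

// Raw literal operator: receives the literal's spelling, so values above
// 64 bits never have to fit in a built-in integer literal type.
constexpr __uint128_t operator""_u128(const char *S) {
  __uint128_t V = 0;
  if (S[0] == '0' && (S[1] == 'x' || S[1] == 'X')) {
    for (const char *P = S + 2; *P; ++P)
      if (*P != '\'') // skip digit separators
        V = V * 16 + static_cast<unsigned>(hexDigit(*P));
  } else {
    for (const char *P = S; *P; ++P)
      if (*P != '\'')
        V = V * 10 + static_cast<unsigned>(*P - '0');
  }
  return V;
}
```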

These are unnecessary since C++17.

This is something that is already done as a special case for copysign; this patch extends it to be more generally applied. If we are trying to materialize a negative constant (notably -0.0, 0x80000000), then there may be no movi encoding that creates the immediate, but a fneg(movi) might. Some of the existing patterns for RADDHN needed to be adjusted to keep them in line with the new immediates.
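
The bit-level fact this relies on: `fneg` flips only the sign bit, so -0.0 (0x80000000) is `fneg` of +0.0, which is all-zero bits. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a float (memcpy-based bitcast).
uint32_t bitsOf(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof(B));
  return B;
}

// fneg on the bit level: flip the sign bit, leave everything else alone.
uint32_t fnegBits(uint32_t B) { return B ^ 0x80000000u; }
```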

Expose the API for constructing and inspecting StructTypes from the LLVM dialect. Separate constructor methods are used instead of overloads for better readability, similarly to IntegerType.

We previously would diagnose them as a GNU extension in C mode, but they are now a feature of C23. The -Wgnu-binary-literal warning group no longer controls any diagnostics as this is no longer a GNU extension. The warning group is retained as a noop to help avoid "unknown warning" diagnostics. This also adds the companion compatibility warning which existed for C++ but not for C. Fixes llvm/llvm-project#72017

Its 0th element corresponds to `FalseID` and 1st to `TrueID`. CoverageMappingGen.cpp: `DecisionIDPair` is replaced with `ConditionIDs`
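
Indexing by the branch outcome can be sketched as follows; `CondID` and `nextCondition` are hypothetical stand-ins.

```cpp
#include <array>
#include <cassert>

using CondID = int; // hypothetical condition ID type

// [0] = FalseID, [1] = TrueID.
using ConditionIDs = std::array<CondID, 2>;

// The outcome selects the next condition directly; false -> 0, true -> 1.
CondID nextCondition(const ConditionIDs &IDs, bool Outcome) {
  return IDs[Outcome];
}
```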

The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.

The Ampere1B core is enabled with a new scheduling/pipeline model, as it provides significant updates over the Ampere1 core; it reduces latencies on many instructions, has some micro-ops reassigned between the XY and X units, and provides modelling for the instructions added since Ampere1 and Ampere1A. As this is the first model implementing the CSSC instructions, we update the UnsupportedFeatures on all other models (that have CompleteModel set). Testcases are added under llvm-mca: these showed the FullFP16 feature missing, so we are adding it in as part of this commit. This *adds tests and additional fixes* compared to the reverted #81338.

…e (#81737) Fix crash mentioned in comments on d759618. The assertion being hit was complaining that we had dangling DPValues; the DPValues attached to the terminator of StartBlock become dangling after the terminator is erased, and they're never "flushed" back onto the new terminator once it's added. Doing that makes the crash go away, but doesn't replicate existing dbg.* behaviour. See the comment in the patch. This change both fixes the crash (because there are now no DPValues left on the terminator to dangle) and replicates existing behaviour (moves those DPValues down to the new block).

We have this special case in getSource() and getRange(), but we were missing it in getExpr() and getLocation().

Using multiclasses for the Real instruction definitions has a couple of benefits:
- It avoids repeating information that was already specified when defining the corresponding pseudo, like the row and done bits.
- It allows commoning up the Real definitions for architectures which are mostly the same, like GFX11 and GFX12.

This change adds SM 6.2 availability annotation to 16-bit APIs (16-bit types require SM 6.2), and adds Doxygen API documentation.

…on (#81236) Context: Conversion patterns provide a `ConversionPatternRewriter` to modify the IR. `ConversionPatternRewriter` provides the public API. Most function calls are forwarded/handled by `ConversionPatternRewriterImpl`. The dialect conversion uses the listener infrastructure to get notified about op/block insertions. In the current design, `ConversionPatternRewriter` inherits from both `PatternRewriter` and `Listener`. The conversion rewriter registers itself as a listener. This is problematic because listener functions such as `notifyOperationInserted` are now part of the public API and can be called from conversion patterns; that would bring the dialect conversion into an inconsistent state. With this commit, `ConversionPatternRewriter` no longer inherits from `Listener`. Instead `ConversionPatternRewriterImpl` inherits from `Listener`. This removes the problematic public API and also simplifies the code a bit: block/op insertion notifications were previously forwarded to the `ConversionPatternRewriterImpl`. This is no longer needed.
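
A simplified model of the new layering, with hypothetical class names: only the implementation object is a `Listener`, so the notification hook is no longer reachable through the public rewriter.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

struct Listener {
  virtual ~Listener() = default;
  virtual void notifyOperationInserted(const std::string &Op) = 0;
};

// The impl is the listener and keeps the bookkeeping needed for rollback.
class ConversionRewriterImpl : public Listener {
public:
  std::vector<std::string> Inserted;
  void notifyOperationInserted(const std::string &Op) override {
    Inserted.push_back(Op);
  }
};

// The public rewriter is not a Listener; patterns can only use its public API.
class ConversionRewriter {
  ConversionRewriterImpl Impl;
  Listener *L = &Impl; // the impl is registered as the listener

public:
  ConversionRewriter() = default;
  ConversionRewriter(const ConversionRewriter &) = delete;
  ConversionRewriter &operator=(const ConversionRewriter &) = delete;

  void insertOp(const std::string &Op) { L->notifyOperationInserted(Op); }
  const std::vector<std::string> &inserted() const { return Impl.Inserted; }
};
```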

When attempting to use the estimated trip count to refine the costs of the runtime memory checks we should also check for sane trip counts to prevent divide-by-zero faults on some platforms. Fixes #80836
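
The shape of the guard, sketched with a hypothetical cost helper. The exact formula LoopVectorize uses is not reproduced here, and the minimum cost of 1 is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Amortize the one-off cost of runtime checks over the estimated iterations.
// A missing or zero trip count must never be used as a divisor.
uint64_t amortizedCheckCost(uint64_t CheckCost, std::optional<uint64_t> EstTripCount) {
  if (!EstTripCount || *EstTripCount == 0)
    return CheckCost; // no sane trip count: don't divide
  uint64_t Cost = CheckCost / *EstTripCount;
  return Cost > 0 ? Cost : 1; // keep a minimal non-zero cost
}
```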

Throughout the rewrite process, the dialect conversion maintains a list of "block actions" that can be rolled back upon failure. This commit encapsulates the existing block actions into separate classes, making it easier to add additional actions in the future. This commit also renames "block actions" to "IR rewrites". In a subsequent commit, an "operation rewrite" class that allows rolling back movements of single operations is added. This is to support `moveOpBefore` in the dialect conversion. Rewrites have two methods:
- `commit()` commits an action. It can no longer be rolled back afterwards.
- `rollback()` undoes a rewrite. It can no longer be committed afterwards.
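
A minimal sketch of the commit/rollback protocol with hypothetical types: each rewrite is applied eagerly and remembers how to undo itself, and the driver rolls back in reverse order.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class IRRewrite {
public:
  virtual ~IRRewrite() = default;
  virtual void commit() = 0;   // make the change permanent
  virtual void rollback() = 0; // undo the change
};

// Example rewrite: renaming something, with enough state to undo it.
class RenameRewrite : public IRRewrite {
  std::string &Target;
  std::string OldName;

public:
  RenameRewrite(std::string &T, std::string NewName) : Target(T), OldName(T) {
    Target = std::move(NewName); // apply eagerly
  }
  void commit() override {} // nothing left to do
  void rollback() override { Target = OldName; }
};

// Driver: commit everything, or roll back in reverse order.
class RewriteLog {
  std::vector<std::unique_ptr<IRRewrite>> Rewrites;

public:
  void push(std::unique_ptr<IRRewrite> R) { Rewrites.push_back(std::move(R)); }
  void commitAll() {
    for (auto &R : Rewrites)
      R->commit();
    Rewrites.clear();
  }
  void rollbackAll() {
    for (auto It = Rewrites.rbegin(); It != Rewrites.rend(); ++It)
      (*It)->rollback();
    Rewrites.clear();
  }
};
```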

…default" This reapplies commit bdde5f9 by undoing the revert fd3a0c1. The previous reapplication d759618 was reverted due to a crash (reproducer in comments for d759618) which was fixed in #81737. As noted in the original commit, this commit may break downstream tests. If this commit is breaking your downstream tests, please see comment 12 in [0], which documents the kind of variation in tests we'd expect to see from this change and what to do about it. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

…on (#81240) Add a new rewrite class for "operation movements". This rewrite class can roll back `moveOpBefore` and `moveOpAfter`. `RewriterBase::moveOpBefore` and `RewriterBase::moveOpAfter` are no longer virtual. (The dialect conversion can gather all required information for rollbacks from listener notifications.)

We only support building llvmlibc with modern compilers. https://libc.llvm.org/compiler_support.html#minimum-supported-versions All versions of these compilers support these builtins; GCC does not support the short variants.

`countl_zero(~x)` *is* `countl_one(x)`
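
A quick check of the identity using portable loop versions (C++20's `std::countl_zero`/`std::countl_one` in `<bit>` behave the same way). The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits of a 32-bit value.
int countlZero(uint32_t X) {
  int N = 0;
  for (uint32_t Mask = 0x80000000u; Mask && !(X & Mask); Mask >>= 1)
    ++N;
  return N;
}

// Number of leading one bits of a 32-bit value.
int countlOne(uint32_t X) {
  int N = 0;
  for (uint32_t Mask = 0x80000000u; Mask && (X & Mask); Mask >>= 1)
    ++N;
  return N;
}
```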

With the new SystemZ port we noticed that -pie executables generated from files containing R_390_TLS_IEENT relocations will have unnecessary relocations in their GOT:

```
9e8d8: R_390_TLS_TPOFF *ABS*+0x18
```

This is caused by the config->isPic condition in addTpOffsetGotEntry:

```
static void addTpOffsetGotEntry(Symbol &sym) {
  in.got->addEntry(sym);
  uint64_t off = sym.getGotOffset();
  if (!sym.isPreemptible && !config->isPic) {
    in.got->addConstant({R_TPREL, target->symbolicRel, off, 0, &sym});
    return;
  }
```

It is correct that we need to retain a TPOFF relocation if the target symbol is preemptible or if we're building a shared library. But when building a -pie executable, those values are fixed at link time and there's no need for any remaining dynamic relocation. Note that the equivalent MIPS-specific code in MipsGotSection::build checks for config->shared instead of config->isPic; we should use the same check here. (Note also that on many other platforms we're not even using addTpOffsetGotEntry in this case as an IE->LE relaxation is applied before; we don't have this type of relaxation on SystemZ.)

…nts. (#81746)

…f explicit restoration (#79303) To fix long compile time issue of Schedule optimizer, patch #77280 sets the upper cap on max ISL operations. In case of bailing out when ISL quota is hit, error handling behavior was restored manually. This commit replaces the restoration code with IslMaxOperationsGuard helper and also removes redundant early return.
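
The RAII idea, modelled on a fake context. `FakeIslCtx` and `MaxOpsGuard` are hypothetical, and the real helper operates on an isl context rather than this struct:

```cpp
#include <cassert>

// Hypothetical stand-in for the isl context's error-handling state.
struct FakeIslCtx {
  bool OnErrorContinue = false; // false: abort on error (default)
  long MaxOperations = 0;       // 0: unlimited
};

// Set a quota and "continue on error" on entry; restore the previous settings
// on every exit path, including early returns.
class MaxOpsGuard {
  FakeIslCtx &Ctx;
  bool OldOnError;
  long OldMax;

public:
  MaxOpsGuard(FakeIslCtx &C, long Max)
      : Ctx(C), OldOnError(C.OnErrorContinue), OldMax(C.MaxOperations) {
    Ctx.OnErrorContinue = true;
    Ctx.MaxOperations = Max;
  }
  ~MaxOpsGuard() {
    Ctx.OnErrorContinue = OldOnError;
    Ctx.MaxOperations = OldMax;
  }
  MaxOpsGuard(const MaxOpsGuard &) = delete;
  MaxOpsGuard &operator=(const MaxOpsGuard &) = delete;
};
```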

…t constants." (#81771) Reverts llvm/llvm-project#81746

This patch loosens the cast check (`canLosslesslyBitCastTo`) and leaves it to the one inside `CreateBitCast`, since it seems too conservative for the use case here.

The concurrent tests all do a pthread_join at the end, and concurrent_base.py stops after that pthread_join and sanity checks that only 1 thread is running. On macOS, after pthread_join() has completed, there can be an extra thread still running which is completing the details of that task asynchronously; this causes testsuite failures. When this happens, we see the second thread is in

```
frame #0: 0x0000000180ce7700 libsystem_kernel.dylib`__ulock_wake + 8
frame #1: 0x0000000180d25ad4 libsystem_pthread.dylib`_pthread_joiner_wake + 52
frame #2: 0x0000000180d23c18 libsystem_pthread.dylib`_pthread_terminate + 384
frame #3: 0x0000000180d23a98 libsystem_pthread.dylib`_pthread_terminate_invoke + 92
frame #4: 0x0000000180d26740 libsystem_pthread.dylib`_pthread_exit + 112
frame #5: 0x0000000180d26040 libsystem_pthread.dylib`_pthread_start + 148
```

None of the functions from the test file are present on this thread. In this patch, instead of counting the number of threads, I iterate over the threads looking for functions from our test file (by name) and only count threads that have at least one of them. It's a lower frequency failure than the darwin kernel bug causing an extra step instruction mach exception when hardware breakpoint/watchpoints are used, but once I fixed that, this came up as the next most common failure for these tests. rdar://110555062
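
The counting logic, sketched in C++ for illustration. The actual test is Python, and the function names used in the check are hypothetical:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Each thread is modelled as its list of frame function names.
using Backtrace = std::vector<std::string>;

// Count only threads that have at least one frame from the test file.
int countTestThreads(const std::vector<Backtrace> &Threads,
                     const std::set<std::string> &TestFunctions) {
  int Count = 0;
  for (const Backtrace &BT : Threads) {
    for (const std::string &Fn : BT) {
      if (TestFunctions.count(Fn)) {
        ++Count;
        break; // one matching frame is enough
      }
    }
  }
  return Count;
}
```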

…formOps.cpp (NFC)

…p (NFC)

…sorRuntime.cpp (NFC)

…ce.cpp (NFC)

…1312) If we have a long chain of vslide1down instructions to build e.g. a <16 x i8> from scalars, we end up with a critical path going through the entire chain. We can instead build two halves, and then combine them with a vselect. This costs one additional temporary register, but reduces the critical path by roughly half. To avoid needing to change VL, we fill each half with undefs for the elements which will come from the other half. The vselect will at worst become a vmerge, but is often folded back into the final instruction of the sequence building the lower half. A couple of notes on the heuristic here:

* This is restricted to LMUL1 to avoid quadratic costing reasoning.
* This only splits once. In future work, we can explore recursive splitting here, but I'm a bit worried about register pressure and thus decided to be conservative. It also happens to be "enough" at the default zvl of 128.
* "8" is picked somewhat arbitrarily as being "long". In practice, our build_vector codegen for 2 defined elements in a VL=4 vector appears to need some work. 4 defined elements in a VL=8 vector seems to generally produce reasonable results.
* Halves may not be an optimal split point. I went down the rabbit hole of trying to find the optimal one, and decided it wasn't worth the effort to start with.

--------- Co-authored-by: Luke Lau <luke_lau@icloud.com>
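
A scalar model showing that the split construction produces the same vector as the single chain. It assumes the lower half's trailing undefs are absorbed by one slide-down, and all names are hypothetical. The chain has 16 dependent steps; each half has 8, plus one slide on the lower half and the final merge, and the halves do not depend on each other.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 16; // <16 x i8>
using Vec = std::array<uint8_t, N>;

// vslide1down: every lane moves down by one, Scalar enters the top lane.
Vec slide1down(const Vec &V, uint8_t Scalar) {
  Vec R{};
  for (std::size_t I = 0; I + 1 < N; ++I)
    R[I] = V[I + 1];
  R[N - 1] = Scalar;
  return R;
}

// vslidedown by K: lanes move down by K; the top K lanes become "undef" (0 here).
Vec slidedown(const Vec &V, std::size_t K) {
  Vec R{};
  for (std::size_t I = 0; I + K < N; ++I)
    R[I] = V[I + K];
  return R;
}

// Original lowering: one dependent chain of N slides.
Vec buildChain(const Vec &E) {
  Vec V{};
  for (std::size_t I = 0; I < N; ++I)
    V = slide1down(V, E[I]);
  return V;
}

// Split lowering: two independent chains of N/2 slides, then a select.
Vec buildSplit(const Vec &E) {
  Vec Lo{}, Hi{};
  for (std::size_t I = 0; I < N / 2; ++I)
    Lo = slide1down(Lo, E[I]); // lower elements land in the top half...
  Lo = slidedown(Lo, N / 2);   // ...and one slide moves them into place
  for (std::size_t I = N / 2; I < N; ++I)
    Hi = slide1down(Hi, E[I]); // leading lanes stay undef
  Vec R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = I < N / 2 ? Lo[I] : Hi[I]; // vselect / vmerge with a lane mask
  return R;
}
```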

…lue (#79901) I was refactoring something else but ran into this function. It was somewhat confusing to read through and understand, but it boils down to two steps:

- First we try `OptionArgParser::ToBoolean`. If that works, then we're good to go.
- Second, we try `llvm::to_integer` to see if it's an integer. If it parses to 0 or 1, we're good.
- Failing both of the steps above means we cannot parse it into a bool.

Instead of having an integer out param and a bool return value, the interface is better served with an optional<bool> -- Either it parses into true or false, or you get back nothing (nullopt).
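
A sketch of the resulting interface. The set of accepted words in step 1 is an assumption, not `OptionArgParser::ToBoolean`'s actual list, and step 2 only handles non-negative decimal integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

std::optional<bool> parseBool(std::string S) {
  // Case-insensitive comparison for the textual forms.
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });

  // Step 1: textual booleans (this word list is an assumption).
  if (S == "true" || S == "yes" || S == "on")
    return true;
  if (S == "false" || S == "no" || S == "off")
    return false;

  // Step 2: an integer, accepted only if its value is 0 or 1.
  bool AllDigits = !S.empty() && std::all_of(S.begin(), S.end(), [](unsigned char C) {
                     return std::isdigit(C) != 0;
                   });
  if (AllDigits) {
    std::size_t FirstNonZero = S.find_first_not_of('0');
    if (FirstNonZero == std::string::npos)
      return false; // "0", "00", ...
    if (S.compare(FirstNonZero, std::string::npos, "1") == 0)
      return true;  // "1", "01", ...
  }

  // Neither step succeeded.
  return std::nullopt;
}
```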

…cloning (#81693) We recently implemented a new option allowing relinking of bitcode modules via the "-mllvm -relink-builtin-bitcode-postop" option. This implementation relied on llvm::CloneModule() in order to pass copies to modules and preserve the original modules for later relinking. However, cloning modules has been found to be prohibitively expensive, significantly increasing compilation time for large bitcode libraries. In this patch, we shift the relink option implementation to instead link the original modules initially, and reload modules from the file system if relinking is requested. This approach results in significantly reduced overhead. We accomplish this by creating a new ReloadModules() routine that can be called from a BackendConsumer class, to mimic the behavior of ASTConsumer's loadLinkModules(), but without access to the CompilerInstance. Because loading the bitcodes from the filesystem requires access to the FileManager class, we also forward a reference to the CompilerInstance class to the BackendConsumer. This mirrors what is already done for several CompilerInstance members, such as TargetOptions and CodeGenOptions. Finally, we needed to add a const specifier to the FileManager::getBufferForFile() routine to allow it to be called using the const reference returned from CompilerInstance::getFileManager()

This PR adds queries to both nodes and modifiable graphs which enable better mixed usage of both the explicit and record & replay APIs in a single program. It also reworks how subgraphs are handled: previously nodes were merged into the modifiable graph, but this would pose a problem for users querying the graph since they would not see a single subgraph node, and this merging behaviour was an implementation detail. This has been changed so that now subgraph nodes are only merged in the executable graph, and are stored as a single node of type `subgraph` in the modifiable graph. As a consequence of this change all nodes are now also copied when making the executable graph, where previously they were not.

- Reworked how subgraphs are handled
- Add graph and node queries to the SYCL-Graph spec
- Implement graph and node queries
- New node_type enum
- Explicit nodes now also have associated events (fixes mixed usage issue)
- New tests for queries
- Update ABI symbols

…ntel#12712) Set `IsNewDbgInfoFormat` to the default value for functions created in the SYCL Kernel Fusion pipeline. This prepares `sycl-fusion` for migration to the new debug info format. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>

oneapi-src/unified-runtime#1343 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

…l#12075) This PR adds a joint matrix query for the CUDA and HIP backends, as described in [sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc#query-interface) --------- Co-authored-by: Konrad Kusiak <konradk@login01.chn>

…the test-e2e build for CUDA and HIP backends. (intel#12606) Include the necessary environment paths during the test-e2e build for the `CUDA` and `HIP` backends. Without the added paths, the compiler cannot locate libdevice for specific architectures, and the build fails. Below is the error reported when the expected `CUDA_PATH` is missing: `clang++: error: cannot find libdevice for sm_50; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice`

The `-shared` flag is a clang/Linux option. On Windows, we need to account for the possibility that an MSVC-compatible driver (e.g. icx) is in use. Such a driver needs the `/clang` passthrough when using non-MSVC options.

…intel#12682) This commit does a partial revert of intel#12396. This is to avoid an issue where the new friend operators wouldn't accept the arguments as l-value references. --------- Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

…intel#12722) atomic_update() for USM and ACC with N=16 and N=32 was lowered to SVM/DWORD atomic intrinsics, even though the HW instructions on Gen12 support N only up to 8 for USM and up to 16 for ACC. The GPU compiler had a legalization pass that split longer vectors into smaller ones available in HW. That GPU optimization/legalization works incorrectly for USM: it splits longer vectors on the assumption that the instruction is available for N=16, which is not true for USM. This patch splits the N=16 and N=32 cases of atomic_update(usm, ...) into N=8 vectors until the GPU compiler fixes the legalization for USM atomic_update. Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
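
The splitting can be illustrated with a host-side C++ sketch. An N-lane atomic add is issued as N/8 separate 8-lane operations, which matches the USM limit stated above. `hw_atomic_add` is a hypothetical stand-in for one HW atomic message, and `atomic_add_usm` is a hypothetical name. This is not the ESIMD `atomic_update` implementation, only a model of the chunking it performs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed HW limit for USM atomics on Gen12, per the description above.
constexpr std::size_t kMaxUsmLanes = 8;

// Hypothetical stand-in for one HW atomic-add message of `Lanes` lanes.
template <std::size_t Lanes>
void hw_atomic_add(std::atomic<int> *const *Ptrs, const int *Vals, int &Calls) {
  static_assert(Lanes <= kMaxUsmLanes, "USM atomics support at most 8 lanes");
  ++Calls;
  for (std::size_t I = 0; I < Lanes; ++I)
    Ptrs[I]->fetch_add(Vals[I]);
}

// Emulates an N-lane USM atomic add (N = 8, 16 or 32) as N/8 8-lane messages.
template <std::size_t N>
void atomic_add_usm(std::atomic<int> *(&Ptrs)[N], const int (&Vals)[N],
                    int &Calls) {
  static_assert(N % kMaxUsmLanes == 0, "N must be a multiple of 8");
  for (std::size_t Off = 0; Off < N; Off += kMaxUsmLanes)
    hw_atomic_add<kMaxUsmLanes>(Ptrs + Off, Vals + Off, Calls);
}

// Usage helper: runs a 32-lane update and reports how many messages were used.
inline int demo_hw_calls_for_32_lanes(int &LastCell) {
  std::atomic<int> Cells[32];
  std::atomic<int> *Ptrs[32];
  int Vals[32];
  for (int I = 0; I < 32; ++I) {
    Cells[I].store(0);
    Ptrs[I] = &Cells[I];
    Vals[I] = I;
  }
  int Calls = 0;
  atomic_add_usm(Ptrs, Vals, Calls);
  LastCell = Cells[31].load();
  return Calls;
}
```

Each lane is still updated atomically. What changes is that the operation is no longer a single message covering all 32 lanes, and this sketch does not model the ordering between messages.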

1> Add code in CodeGenAction.cpp:
   - Basic change: add new field "const FileManager &FileMgr"
   - Add new function ReloadModules
   - Code change in function LinkInModules
2> Revert "[DebugInfo][RemoveDIs] Turn on non-instrinsic debug-info by default."
   CONFLICT (modify/delete): clang/lib/CodeGen/BackendConsumer.h deleted in HEAD and modified in 6d4ffbd. Version 6d4ffbd of clang/lib/CodeGen/BackendConsumer.h left in tree.
   CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenAction.cpp
   CONFLICT (modify/delete): clang/lib/CodeGen/LinkInModulesPass.cpp deleted in HEAD and modified in 6d4ffbd. Version 6d4ffbd of clang/lib/CodeGen/LinkInModulesPass.cpp left in tree.

Spec: KhronosGroup/SPIRV-Registry#192 Original commit: KhronosGroup/SPIRV-LLVM-Translator@fc9896b1fff0057

The Headers for this extension were published so we should use them instead: KhronosGroup/SPIRV-Headers@a8af2ce Original commit: KhronosGroup/SPIRV-LLVM-Translator@95d70a9ab4077ed

The OpenCL and NonSemantic DebugInfo specifications are flexible in that they allow any debug information to be replaced with DebugInfoNone. Various SPIR-V producers take advantage of this and generate it for the base types of several debug instructions, leaving SPIR-V consumers to handle it. By default, the translator replaces missing debug info with tag: null, which is correct in most cases. However, there are situations where this is not allowed by LLVM or DWARF. For example, for DW_TAG_array_type the DWARF spec states that the DW_AT_type attribute is mandatory. For such cases, a new transNonNullDebugType wrapper function was added to the translator. It generates "DIBasicType(tag: DW_TAG_unspecified_type, name: "SPIRV unknown type")" where DebugInfoNone was used as the type. This function doesn't replace all calls to transDebugInst<DIType>, because there are cases where we can generate a null type. For example, DWARF doesn't require it for DW_TAG_typedef, so I'm not changing the translation flow in that case. Additionally, while DWARF requires the type attribute for DW_TAG_pointer_type, LLVM does not, so I'm not changing the translation flow in that case either. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@ec023805a0ce26f
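
A small C++ model of the wrapper idea follows. The regular translation may return null for DebugInfoNone, while the non-null variant substitutes an unspecified type wherever DWARF requires one. `DIType` and `DebugTranslator` here are toy types, not `llvm::DIType` or the real translator class. The tag and name strings follow the description above, and the base-type tag used for the non-None case is purely illustrative.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for LLVM debug type metadata; not the real llvm::DIType API.
struct DIType {
  std::string Tag;
  std::string Name;
};

// Hypothetical translator state that owns every type it creates.
struct DebugTranslator {
  std::vector<std::unique_ptr<DIType>> Owned;

  DIType *make(const std::string &Tag, const std::string &Name) {
    Owned.push_back(std::make_unique<DIType>(DIType{Tag, Name}));
    return Owned.back().get();
  }

  // Models transDebugInst<DIType>: DebugInfoNone translates to null.
  DIType *transDebugType(bool IsDebugInfoNone, const std::string &Name) {
    if (IsDebugInfoNone)
      return nullptr;
    return make("DW_TAG_base_type", Name);
  }

  // Models transNonNullDebugType: for uses where DWARF requires a type
  // (e.g. the element type of DW_TAG_array_type), never returns null.
  DIType *transNonNullDebugType(bool IsDebugInfoNone, const std::string &Name) {
    if (DIType *T = transDebugType(IsDebugInfoNone, Name))
      return T;
    return make("DW_TAG_unspecified_type", "SPIRV unknown type");
  }
};
```

Call sites such as typedefs, where a null type is acceptable, would keep calling the plain translation. Only the mandatory-type sites would switch to the non-null wrapper.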

It should have tested a DebugInfoNone base type. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@e0aef72fee42e0a

A small fix, but it yields around a 30% speedup when translating SPIR-V to IR. Original commit: KhronosGroup/SPIRV-LLVM-Translator@513b9578d310282

There was an assumption that a ptr.annotation encoding buffer_location must be used by load or store instructions, but the specification imposes no such restriction. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@7a37ea920f730e0

For now, just convert each BB with convertFromNewDbgValues; we will figure out something smarter later. I've updated several tests that use the dbg.declare intrinsic, adding --experimental-debuginfo-iterators=1 to check that it works. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@0e87aefecf7c500

The SPIR-V Specification allows `OpConstantNull` types to be scalar or vector booleans, integers, or floats. Update an assert for this and add a SPIR-V -> LLVM IR test. Original commit: KhronosGroup/SPIRV-LLVM-Translator@9ec969c1c379bde

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@262395da9234fe4

@AlexeySachkov

…supported (intel#12700) Final PR in the series of intel#12636. Refer to it for a description. After a discussion with @AlexeySachkov, we've decided it's best not to rewrite the USM and syclcompat tests with buffers/accessors. For USM the reason is obvious; for syclcompat, you can reach out to Alexey. Therefore, these tests are handled using if statements or by requiring the aspect to be supported. Once this PR is merged, malloc_shared will throw if the usm_shared_allocations aspect is not supported, which conforms to the spec.

…#12730) This is a generalization of the existing workarounds: https://github.com/intel/llvm/blob/sycl/sycl/plugins/unified_runtime/CMakeLists.txt#L40-L54 etc.

…s.txt (intel#12714) Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0 to resolve an identified security vulnerability in a 3rd party dependency. Refer to [cryptography's changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst).

LLVM: llvm/llvm-project@16e7d68 SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@262395da9234fe4

Despite having a unit test for the default context, we realized there is none to confirm the new default configuration.

Some clean-up for SYCL-Graph E2E tests:
* Remove redundant `Event` variables that are initialized over loop iterations but never used.
* Remove all instances of the no immediate command-list property, and use an environment variable instead to test both paths.
* Always use FileCheck leak checking rather than `CHECK-NOT: Leak`.
* Remove unnecessary threading code from `Inputs/basic_usm.cpp`.

Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>

… time properties (intel#12675)

…2680) Improves management of inter-partition dependencies so that only the required dependencies are added. Because removing these dependencies can result in multiple execution paths, we have added a map to track all events returned from submitted partitions. All these events are linked to the main event returned to the user. Adds tests.
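
The event bookkeeping described above can be modelled in C++. Each submitted partition records its own event in a map, and the event returned to the user only counts as complete once all of them are. `Event` and `ExecutionTracker` are hypothetical toy types, not the SYCL or Unified Runtime event API. The dependency-pruning part of the change is not modelled here.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy event that is signalled by setting Done; not the SYCL event API.
struct Event {
  bool Done = false;
};

// Hypothetical per-execution bookkeeping: one event per submitted partition.
class ExecutionTracker {
public:
  std::shared_ptr<Event> submitPartition(int PartitionId) {
    auto E = std::make_shared<Event>();
    PartitionEvents[PartitionId] = E;
    return E;
  }

  // The user-facing event completes only when every partition event has
  // completed, regardless of which execution path finished last.
  bool userEventComplete() const {
    for (const auto &Entry : PartitionEvents)
      if (!Entry.second->Done)
        return false;
    return !PartitionEvents.empty();
  }

private:
  std::map<int, std::shared_ptr<Event>> PartitionEvents;
};
```

Tracking every partition event, rather than only the last submitted one, is what keeps the user event correct once independent partitions can finish in any order.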

Grad flag was set to 0x3 (meaning Lod + Bias) instead of 0x4. See https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Image_Operands Signed-off-by: Victor Lomuller <victor@codeplay.com>
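
The bug is easy to see from the bit values of the SPIR-V Image Operands mask: Bias is 0x1, Lod is 0x2 and Grad is 0x4, so 0x3 requests Bias and Lod and never sets the Grad bit. The enum below defines these values locally, using spirv.hpp-style names. It is a self-contained illustration, not the translator's code, and `requestsGrad` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the SPIR-V Image Operands bit values, defined locally.
enum ImageOperandsMask : std::uint32_t {
  ImageOperandsBiasMask = 0x1,
  ImageOperandsLodMask = 0x2,
  ImageOperandsGradMask = 0x4,
};

// The old (wrong) value is exactly Bias | Lod.
static_assert((ImageOperandsBiasMask | ImageOperandsLodMask) == 0x3u,
              "0x3 means Bias + Lod");

// Hypothetical helper: does an operand word request explicit gradients?
constexpr bool requestsGrad(std::uint32_t Operands) {
  return (Operands & ImageOperandsGradMask) != 0;
}
```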

Bring the fix for MaxRegsPerBlock check from oneapi-src/unified-runtime#1299 to `intel/llvm`. No changes needed other than updating the UR repo hash. --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

`LoaderConfig` is created and stored in a local pointer but never released after use, which causes a leak. This patch releases the `LoaderConfig` once it is no longer needed.
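
The patch releases the config explicitly. One common alternative for this kind of create/release handle, which is not necessarily what the patch does, is to wrap it in a `std::unique_ptr` with a custom deleter so that every exit path releases it. In the C++ sketch below, `mock::loaderConfigCreate`, `mock::loaderConfigRelease` and `initializeAdapters` are hypothetical stand-ins, not the real runtime functions.

```cpp
#include <cassert>
#include <memory>

struct LoaderConfig {};

// Hypothetical mock of a create/release handle API with leak counters.
namespace mock {
int Created = 0;
int LiveConfigs = 0;

int loaderConfigCreate(LoaderConfig **Out) {
  *Out = new LoaderConfig();
  ++Created;
  ++LiveConfigs;
  return 0; // success
}

int loaderConfigRelease(LoaderConfig *C) {
  delete C;
  --LiveConfigs;
  return 0; // success
}
} // namespace mock

// Deleter that calls the release function instead of plain delete.
struct LoaderConfigDeleter {
  void operator()(LoaderConfig *C) const { mock::loaderConfigRelease(C); }
};
using LoaderConfigPtr = std::unique_ptr<LoaderConfig, LoaderConfigDeleter>;

// Hypothetical caller: the config is released when Config goes out of scope.
inline void initializeAdapters() {
  LoaderConfig *Raw = nullptr;
  mock::loaderConfigCreate(&Raw);
  LoaderConfigPtr Config(Raw);
  // ... configure the loader and use Config.get() here ...
}
```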

Old builtins implementation is going to be removed in the next ABI breaking window and that helper is only used there.

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.

Changelog (sourced from [cryptography's changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)):

42.0.2 - 2024-01-30
* Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.2.1.
* Fixed an issue that prevented the use of Python buffer protocol objects in `sign` and `verify` methods on asymmetric keys.
* Fixed an issue with incorrect keyword-argument naming with `EllipticCurvePrivateKey.exchange`, `X25519PrivateKey.exchange`, `X448PrivateKey.exchange`, and `DHPrivateKey.exchange`.

42.0.1 - 2024-01-24
* Fixed an issue with incorrect keyword-argument naming with `EllipticCurvePrivateKey.sign`.
* Resolved compatibility issue with loading certain RSA public keys in `load_pem_public_key`.

Commits:
* 2202123 changelog and version bump 42.0.2 (#10268)
* f7032bd bump openssl in CI (#10298) (#10299)
* 002e886 Fixes #10294 -- correct accidental change to exchange kwarg (#10295) (#10296)
* 92fa9f2 support bytes-like consistently across our asym sign/verify APIs (#10260) (#1...
* 6478f7e explicitly support bytes-like for signature/data in RSA sign/verify (#10259) ...
* 4bb8596 fix the release script (#10233) (#10254)
* 337437d 42.0.1 bump (#10252)
* 56255de allow SPKI RSA keys to be parsed even if they have an incorrect delimiter (#1...
* 12f038b fixes #10237 -- correct EC sign parameter name (#10239) (#10240)
* See full diff in [compare view](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)
--------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alexey Bader <alexey.bader@intel.com>

…tel#12748) Warnings fixed:
- deprecated scatter_rgba
- deprecated get_cl_code
- deprecated lsc_fence
- deprecated uchar type usage
- deprecated get_access on HOST
- deprecated get_pointer
- usage of isfinite with -ffast-math
- deprecated dpas_argument_type::s1
- deprecated gpu_selector()

Also, the memory alloc/free in the histogram*.cpp tests was updated to make it simpler to avoid potential memory leaks. Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>

Scheduled drivers uplift Co-authored-by: GitHub Actions <actions@github.com>

…ed cmd-list Update the design doc. Update the UR tag.


[SYCL][Graph] Update doc for UR PR moving reset commands to a dedicated cmd-list #357

Commits on Feb 13, 2024

Commits on Feb 14, 2024

Commits on Feb 15, 2024

Commits on Feb 16, 2024

Commits on Feb 19, 2024

Commits on Feb 20, 2024