Merge from main with loop wrappers + composite support + liboffload #71

I made a small typo when writing a test for MathExtras.h, sorry!

… tests (llvm#87094) This also adds a few tests that were missing.

…89625) This reverts commit 8b2ba6a. The build errors (see below) were likely due to the same issue PR llvm#88074 fixed. Addressed by following that PR. https://lab.llvm.org/buildbot/#/builders/165/builds/52789 https://lab.llvm.org/buildbot/#/builders/91/builds/25273

When responding to review comments, `return {}` was accidentally replaced by `std::nullptr` instead of `return std::nullptr`.

…lvm#89263) This implements a RISCV specific version of the SHL_ADD node proposed in llvm#88791. If that lands, the infrastructure from this patch should seamlessly switch over to the generic DAG node. I'm posting this separately because I've run out of useful multiply strength reduction work to do without having a way to represent MUL X, 3/5/9 as a single instruction. The majority of this change is moving two sets of patterns out of tablegen and into the post-legalize combine. The major reason for this is that I have an upcoming change which needs to reuse the expansion logic, but it also helps common up some code between zba and the THeadBa variants. On the test changes, there are a few major categories:
* We chose a different lowering for mul x, 25. The new lowering involves one fewer register and the same critical path, so this seems like a win.
* The order of the two multiplies changes in (3,5,9)*(3,5,9) in some cases. I don't believe this matters.
* I'm removing the one use restriction on the multiply. This restriction doesn't really make sense to me, and the test changes appear positive.

This PR massively reorganizes the Test dialect's source files. It moves manually-written op hooks into `TestOpDefs.cpp`, moves format custom directive parsers and printers into `TestFormatUtils`, adds missing comment blocks, and moves around where generated source files are included for types, attributes, enums, etc. into their own source file. This will hopefully help navigate the test dialect source code, but also speeds up compile time of the test dialect by putting generated source files into separate compilation units. This also sets up the test dialect to shard its op definitions, done in the next PR.

Fixes -Werror build after 40137ff.

…lvm#88156) A recent patch added an error message for whole optional dummy argument usage as optional arguments (third or later) to MAX and MIN when those names required type conversion, since that conversion only works when the optional arguments are present. This check shouldn't care about character lengths. Make it so.

…lvm#88184) When a symbol is known to be a procedure due to its being referenced as a function or subroutine, improve the error messages that appear if the symbol is also used as an object by attaching the source location of its procedural use. Also, for errors spotted in name resolution due to how a given symbol has been used, don't unconditionally set the symbol's error flag (which is otherwise generally a good idea, to prevent cascades of errors), so that more unrelated errors related to usage will appear.

llvm#88188) …er powers The code that folds exponentiation by an integer power can report a spurious overflow warning because it calculates one last unnecessary square of the base value. 10.**(+/-32) exposes the problem -- the value of 10.**64 is calculated but not needed. Rearrange the implementation to only calculate squares that are necessary. Fixes llvm#88151.
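The idea of skipping the final, unneeded square can be sketched as follows. This is a minimal illustration only, not flang's folding code: `PowResult` and `powBySquaring` are hypothetical names, and the sketch handles non-negative powers of a `double` rather than the compiler's own real types.

```cpp
#include <cassert>

// Hypothetical result type: the computed power plus a count of how many
// times the running square was squared again.
struct PowResult {
  double value;
  int squarings;
};

// Hypothetical helper: binary exponentiation that squares the running
// value only while higher bits of the exponent remain to be processed.
PowResult powBySquaring(double base, int power) {
  assert(power >= 0 && "this sketch only handles non-negative powers");
  PowResult result{1.0, 0};
  double square = base;
  unsigned remaining = static_cast<unsigned>(power);
  while (remaining != 0) {
    if (remaining & 1u)
      result.value *= square;
    remaining >>= 1;
    // Squaring unconditionally here would compute base**64 for power 32,
    // which is never used and may overflow on its own.
    if (remaining != 0) {
      square *= square;
      ++result.squarings;
    }
  }
  return result;
}
```

For a power of 32 this performs five squarings (up to base**32) and stops there, so the unused base**64 is never formed.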

Addresses llvm#85984 Signed-off-by: Troy-Butler <squintik@outlook.com> Co-authored-by: Troy-Butler <squintik@outlook.com>

…ilt (llvm#89164) Currently, `check-gwp_asan` is added regardless of whether its dependencies are built. This is wrong and causes a cmake error when scudo is not built. This patch includes the target in the dependencies check.

Constant folding had a CHECK on array subscript rank that should more gracefully handle a bad program with a subscript that is a matrix or higher rank. Fixes llvm#88112.

When a statement function in a nested scope has a name that clashes with a name that exists in the host scope, the compiler can handle it correctly (with a portability warning)... unless the host scope acquired the name via USE association. Fix. Fixes llvm#88678.

This replaces the old macros LIBC_COPT_TEST_USE_FUCHSIA and LIBC_COPT_TEST_USE_PIGWEED with LIBC_COPT_TEST_ZXTEST and LIBC_COPT_TEST_GTEST, respectively. These are really not about whether the code is in the Fuchsia build or in the Pigweed build, but just about what test framework is being used. The gtest framework can be used in many contexts, and the zxtest framework is not always what's used in the Fuchsia build. The test/UnitTest/Test.h wrapper header now provides the macro LIBC_TEST_HAS_MATCHERS() for use in `#if` conditionals on use of gmock-style matchers, replacing `#if` conditionals that test the framework selection macros directly.

…vm#89429) When the characteristics of a procedure depend on a procedure that hasn't yet been defined, the compiler currently emits an unconditional error message. This includes the case of a procedure whose characteristics depend, perhaps indirectly, on itself. However, in the case where the characteristics of a procedure are needed to resolve a generic, we should not emit an error for a hitherto undefined procedure -- either the call will resolve to another specific procedure, in which case the error is spurious, or it won't, and then an error will issue anyway. Fixes llvm#88677.

Rewrite `LLVM_PARALLEL_{}_JOBS` and `LLVM_RAM_PER_{}_JOB` documentation.

The standard defines C_LOC as being PURE (actually SIMPLE now in F'2023); characterize it appropriately. Fixes llvm#88747.

The intrinsic function OUT_OF_RANGE() lacks support in lowering and the runtime. This patch obviates a need for any such support by implementing OUT_OF_RANGE() via rewriting in semantics. This rewriting of OUT_OF_RANGE() calls replaces the existing code that folds OUT_OF_RANGE() calls with constant arguments. Some changes and fixes were necessary outside of OUT_OF_RANGE()'s folding code (now rewriting code), whose testing exposed some other issues worth fixing.
- The common::RealDetails<> template class was recoded in terms of a new base class with a constexpr constructor, so that the characteristics of the various REAL kinds could be queried dynamically as well. This affected some client usage.
- There were bugs in the code that folds TRANSFER() when the type of X or MOLD was REAL(10) -- this is a type that occupies 16 bytes per element in execution memory but only 10 bytes (was 12) in the data of std::vector<Scalar<>> in a Constant<>.
- Folds of REAL->REAL conversions weren't preserving infinities.

Trying to address the build failure on the `clang-ve-ninja` bot, which appears hard to repro locally. The target isn't needed currently (there are unit tests exercising the new functionality). Removing it for now to green-ify the build bot.

It has been using isZeroValue(), which is for floats, not integers.

…Control. Updates ExecutionSession to use the ExecutorProcessControl object's TaskDispatcher rather than having a separate dispatch function. This gives the TaskDispatcher a global view of all tasks to be executed, and provides a single point to wait on for tasks to complete when shutting down the JIT.

A recent patch had three declared but unused variables in it, triggering a warning in some build bots. Remove them.

…rProcessControl." This reverts commit 6094b3b. Multiple bots are broken.

This patch introduces HWASan memaccess intrinsics that assume a fixed shadow (with the offset provided by --hwasan-mapping-offset=...), with and without short granule support. The behavior of HWASan is not meaningfully changed by this patch; future work ("Optimize outlined memaccess for fixed shadow on Aarch64": llvm#88544) will make HWASan use these intrinsics. We currently only support lowering the LLVM IR intrinsic to AArch64. The test case is adapted from hwasan-check-memaccess.ll.

…e tests to fail on multiple bots. (llvm#89689) Update the check lines added in llvm#87247 after 14e6f63 updated the output causing the tests to fail. This should hopefully unbreak the bots failing due to these two tests failing.

…ties (llvm#89119) Made createReadOrMaskedRead and isValidMaskedInputVector utility functions so that they are accessible outside of the CU. Needed by the new IREE TopK implementation.

…mbine (llvm#89263)" This reverts commit 5a7c80c. Noticed failures with the following command: $ llc -mtriple=riscv64 -mattr=+m,+xtheadba -verify-machineinstrs < test/CodeGen/RISCV/rv64zba.ll I think I know the cause and will likely reland with a fix tomorrow.

…ith fix. This re-applies 6094b3b, which was reverted in a28557a due to broken bots. As far as I can tell all failures were due to a missing #include <deque>, which has been added in this commit.

…m#88829) This patch adds the clang portion of an AIX-specific option to inform the compiler that it can use a faster access sequence for the local-dynamic TLS model (formally named aix-small-local-dynamic-tls). This patch mainly references Amy's work on small local-exec TLS support.

llvm#73393 introduced a mandatory column field. Update test for that.

…d..." This reverts commit 1effa19 while I investigate the test failure at https://lab.llvm.org/buildbot/#/builders/285/builds/888.

Addresses issue llvm#87243. The current code incorrectly checks the validity of `obj` twice when it should be checking the new `str_obj` pointer. Signed-off-by: Troy-Butler <squintik@outlook.com> Co-authored-by: Troy-Butler <squintik@outlook.com>

Move the one method that uses it out of line. This is primarily to reduce the number of files to rebuild when changing PatternMatch.h.

This patch fixes: third-party/unittest/googletest/include/gtest/gtest.h:1379:11: error: comparison of integers of different signs: 'const int' and 'const unsigned long' [-Werror,-Wsign-compare]

These functions have been deprecated since: commit 5ac1295 Author: Kazu Hirata <kazu@google.com> Date: Sun Dec 17 15:52:50 2023 -0800

…VExtension tblgen information. (llvm#89335) Instead of using RISCVISAInfo's extension information, use the extension found in tblgen after llvm#89326. We still need to use RISCVISAInfo code to get the sorting rules for the ISA string. The ISA string we generate now is not quite the same extension we had before. No implied extensions are included in the generate string unless they are explicitly listed in RISCVProcessors.td. This primarily affects Zicsr being implied by F, V implying Zve*, and Zvl*b implying a smaller Zvl*b. All of these implication should be picked up when the string is used by the frontend. The benefit is that we get a more manageable ISA string for humans to deal with. This is a step towards generating RISCVISAInfo's extension list from tblgen.

When speculating a store based on a preceding load/store, we need to ensure that the speculated store does not have a higher alignment (which might only be guaranteed by the branch condition). There are various ways in which this could be strengthened (we could get or enforce the alignment), but for now just do the simple check against the preceding load/store. Fixes llvm#89672.

…89041) No reason for this to not be one. This gets rid of a few const_casts.

Both calls to parseVTypeToken were preceded by a check for an Identifier token and a call to getIdentifier. Sink those into parseVTypeToken to reduce repetition.

…'uint32_t' This patch tries to use DeclID in the code base to avoid using the raw type 'uint32_t'. Using the raw type 'uint32_t' is problematic if we want to change the type of DeclID some day.

…vm#89348) Currently, when inferring noundef, we only check that the return value is not undef/poison. However, we fail to account for the possibility that a poison-generating return attribute will convert the value to poison, and then violate the noundef attribute, resulting in immediate UB. For the relevant return attributes (align, nonnull and range), check whether we can trivially re-prove the relevant property, otherwise do not infer noundef. This fixes the FunctionAttrs side of llvm#88026.

As the title suggests.

The record table has a constant key length, so we don't need to serialize or deserialize it for every key-data pair. Omitting the key length saves 0.06% of the indexed MemProf file size. Note that it's OK to change the format because Version2 is still under development.

This commit changes `OpBuilder::tryFold` to behave more similarly to `Operation::fold`. Concretely, this ensures that even an in-place fold returns `success`. This is necessary to fix a bug in the dialect conversion that occurred when an in-place folding made an operation legal. The dialect conversion infrastructure did not check if the result of an in-place folding legalized the operation and just went ahead and tried to apply pattern anyways. The added test contains a simplified version of a breakage we observed downstream.

…h fixes (llvm#89596) I reverted llvm#89213 because it was causing buildbots to fail with assertion failures. Embarrassingly, it turns out I had been running tests locally in `Release` mode, i.e. with `assert()` compiled away. This PR re-lands llvm#89213 with fixes for the failing assertions.

…lvm#89075) This adds patterns to convert from the Linalg matmul and batch_matmul ops to the transposed variants. By default the LHS matrix is transposed. Our work enabling a lowering path from linalg.matmul to ArmSME has revealed the current lowering results in non-contiguous memory accesses for the A matrix and very poor performance. These patterns provide a simple option to fix this.

…m ASTRecordReader.h As the title suggests.

The attribute name "HLSLSemantics" is confusing, because semantics aren't always the annotations that are applied to specific variables. The name for this attribute needs to be less specific. This PR changes the attribute name from HLSLSemantic to HLSLAnnotation, and changes the associated function and variable names to support this conceptual change. The HLSLAnnotation attribute will never be output in ast-dump because it is parsed as the attribute that it represents. There is no functional change, so there are no accompanying tests.

…5605) Attribute `optnone` must turn off all optimizations, including fast-math ones. However, AST nodes in an `optnone` function still carried fast-math flags. This change fixes up the FP options before the function body is parsed.

User functions may be declared with an interface that is a specific intrinsic. In such case, there is no result type available from the procedure symbol (at least without using evaluate::Probe), and FunctionRef::GetType() returned nullopt. This caused lowering to crash. The result type of specific intrinsic procedures is always a lengthless intrinsic type, so it is fully defined in the template argument of FunctionRef. Use it.

Add some more details about how calls are lowered and what APIs are available.

This adds
- `emitc.global` and `emitc.get_global` ops to model global variables similar to how `memref.global` and `memref.get_global` work.
- translation of those ops to C++
- lowering of `memref.global` and `memref.get_global` into those ops
---------
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>

… in symbol graphs (llvm#89277) rdar://125622225

Previously the function
```
std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
                                                    const MemRegion *Reg,
                                                    TaintTagType K,
                                                    bool returnFirstOnly)
```
(one of the 4 overloaded variants under this name) was handling element regions in a highly inefficient manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would produce 2^N calls. This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. Fixes llvm#89045
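To see why visiting the super-region twice is exponential, here is a small self-contained sketch. It does not use the analyzer's `MemRegion` classes: `Region`, `makeChain`, `visitTwice` and `visitOnce` are hypothetical stand-ins that only count calls.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a region with an enclosing super-region.
struct Region {
  const Region *super = nullptr; // nullptr at the root
  bool isElement = false;
};

// Builds root <- e1 <- ... <- eN, where every e_i is an element region.
// The innermost region is the last entry of the returned vector.
std::vector<Region> makeChain(int depth) {
  assert(depth >= 0);
  std::vector<Region> chain(depth + 1);
  for (int i = 1; i <= depth; ++i) {
    chain[i].super = &chain[i - 1];
    chain[i].isElement = true;
  }
  return chain;
}

// Recurses into the super-region once in the element branch and once more
// in the general branch. Returns the total number of calls made.
std::uint64_t visitTwice(const Region *r) {
  if (!r)
    return 0;
  std::uint64_t calls = 1;
  if (r->isElement)
    calls += visitTwice(r->super); // element-region branch
  calls += visitTwice(r->super);   // general sub-region branch
  return calls;
}

// Examines each region on the chain exactly once.
std::uint64_t visitOnce(const Region *r) {
  std::uint64_t calls = 0;
  for (; r; r = r->super)
    ++calls;
  return calls;
}
```

With a chain of 10 nested element regions, `visitTwice` makes 2^11 - 1 = 2047 calls while `visitOnce` makes 11.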

Add test coverage for additional cases not covered by current tests with multiple inductions and truncates.

…fs (llvm#89640) If -mllvm -add-linkage-names-to-external-call-origins is true then add DW_AT_linkage_name attributes to DW_TAG_subprogram DIEs referenced by DW_AT_call_origin attributes that would otherwise be omitted. A debugger may use DW_AT_call_origin attributes to determine whether any frames in a callstack are missing due to optimisations (e.g. tail calls). For example, say a() calls b() tail-calls c(), and you stop in your debugger in c(): The callstack looks like this: c() a() Looking "up" from c(), call site information can be found in a(). This includes a DW_AT_call_origin referencing b()'s subprogram DIE, which means the call at this call site was to b(), not c() where we are currently stopped. This indicates b()'s frame has been lost due to optimisation (or is misleading due to ICF). This patch makes it easier for a debugger to check whether the referenced DIE describes the target function or not, for example by comparing the referenced function name to the current frame. There's already an option to apply DW_AT_linkage_name in a targeted manner: -dwarf-linkage-names=Abstract, which limits adding DW_AT_linkage_names to abstract subprogram DIEs (this is default for SCE tuning). The new flag shouldn't affect non-SCE-tuned behaviour whether it is enabled or not because the non-SCE-tuned behaviour is to always add linkage names to subprogram DIEs.

…lvm#89563) The `LP64 eqv:` should say that the equivalent is `AUTH_ABS64` rather than `ABS64` when trying to emit an AUTH absolute reloc with ILP32.

…lvm#88492) Multivalue feature of WebAssembly has been standardized for several years now. I think it makes sense to be able to enable it in the feature section by default for our clang/llvm-produced binaries so that the multivalue feature can be used as necessary when necessary within our toolchain and also when running other optimizers (e.g. wasm-opt) after the LLVM code generation. But some WebAssembly toolchains, such as Emscripten, do not provide both multivalue-returning and not-multivalue-returning versions of libraries. Also allowing the uses of multivalue in the features section does not necessarily mean we generate them whenever we can to the fullest, which is a different code generation / optimization option. So this makes the lowering of multivalue returns conditional on the use of 'experimental-mv' target ABI. This ABI is turned off by default and turned on by passing `-Xclang -target-abi -Xclang experimental-mv` to `clang`, or `-target-abi experimental-mv` to `clang -cc1` or `llc`. But the purpose of this PR is not tying the multivalue lowering to this specific 'experimental-mv'. 'experimental-mv' is just one multivalue ABI we currently have, and it is still experimental, meaning it is not very well optimized or tuned for performance. (e.g. it does not have the limitation of the max number of multivalue-lowered values, which can be detrimental to performance.) We may change the name of this ABI, or improve it, or add a new multivalue ABI in the future. Also I heard that WASI is planning to add their multivalue ABI soon. So the plan is, whenever any one of multivalue ABIs is enabled, we enable the lowering of multivalue returns in the backend. We currently have only 'experimental-mv' in the repo so we only check for that in this PR. Related past discussions: llvm#82714 WebAssembly/tool-conventions#223 (comment)

Previously, LocalDeclID and GlobalDeclID were defined as:
```
using LocalDeclID = DeclID;
using GlobalDeclID = DeclID;
```
This is concerning because we may mix up LocalDeclID and GlobalDeclID without noticing. There is also a FIXME saying this. This patch turns LocalDeclID into a class to improve type safety here.
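The difference between an alias and a wrapper class can be sketched as below. This is an illustrative sketch under assumptions, not clang's actual class: the member names and the choice of `std::uint32_t` for `DeclID` are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

using DeclID = std::uint32_t; // assumed underlying type for this sketch

// Hypothetical wrapper: the explicit constructor and the lack of a
// conversion operator stop silent mixing with raw DeclID values.
class LocalDeclID {
public:
  explicit LocalDeclID(DeclID id) : id_(id) {}
  DeclID get() const { return id_; }

private:
  DeclID id_;
};

// Left as an alias here, so it stays interchangeable with DeclID.
using GlobalDeclID = DeclID;

static_assert(std::is_same_v<GlobalDeclID, DeclID>,
              "an alias introduces no new type");
static_assert(!std::is_convertible_v<DeclID, LocalDeclID>,
              "a raw ID does not silently become a LocalDeclID");
static_assert(!std::is_convertible_v<LocalDeclID, GlobalDeclID>,
              "a LocalDeclID does not silently become a GlobalDeclID");
```

With the alias, passing a local ID where a global one is expected compiles without complaint; with the class, the caller has to write `get()` or an explicit construction, which makes the conversion visible in review.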

…8916) Currently we check `Subtarget->hasReferenceTypes()` to decide whether to run `RefTypeMem2Local` pass: https://github.com/llvm/llvm-project/blob/6133878227efc30355c02c2f089e06ce58231a3d/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L491-L495 This works fine when `-mattr=+reference-types` is given in the command line (of `llc` or of `wasm-ld` in case of LTO). This also works fine if the backend is called by Clang, because Clang's feature set will be passed to the backend when creating a `TargetMachine`: https://github.com/llvm/llvm-project/blob/ac791888bbbe58651e597cf7a4b2276424b77a92/clang/lib/CodeGen/BackendUtil.cpp#L549-L550 https://github.com/llvm/llvm-project/blob/ac791888bbbe58651e597cf7a4b2276424b77a92/clang/lib/CodeGen/BackendUtil.cpp#L561-L562 But if the backend compilation is called by `llc`, a `TargetMachine` is created here: https://github.com/llvm/llvm-project/blob/bf1ad1d267b1f911cb9846403d2c3d3250a40870/llvm/tools/llc/llc.cpp#L554-L555 And if the backend is called by `wasm-ld`'s LTO, a `TargetMachine` is created here: https://github.com/llvm/llvm-project/blob/ac791888bbbe58651e597cf7a4b2276424b77a92/llvm/lib/LTO/LTOBackend.cpp#L513 At this point, in both places, the created `TargetMachine` only has access to target features given by the command line with `-mattr=` and doesn't have access to bitcode functions' `target-features` attribute. We later gather the target features used by functions and store that info in the `TargetMachine` in `CoalesceFeaturesAndStripAtomics`, https://github.com/llvm/llvm-project/blob/ac791888bbbe58651e597cf7a4b2276424b77a92/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L202-L206 but this runs in the pass pipeline driven by the pass manager, so this has not run by the time we check `Subtarget->hasReferenceTypes()` in `WebAssemblyPassConfig::addISelPrepare`.
So currently `RefTypeMem2Local` would not run on those functions with `"target-features"="+reference-types"` attributes if the backend is called by `llc` or `wasm-ld`. So this makes `RefTypeMem2Local` pass run unconditionally, and checks `target-features` function attribute to decide whether to run the pass on each function. This allows the pass to run with `wasm-ld` + LTO and `llc`, even if `-mattr=+reference-types` is not explicitly given in the command line again, as long as `+reference-types` is in the function's `target-features` attribute. This also covers the case we give the target features by the command line like `llc -mattr=+reference-types` and not in the bitcode function's attribute, because attributes given in the command line will be stored in the function's attributes anyway: https://github.com/llvm/llvm-project/blob/bd28889732e14ac6baca686c3ec99a82fc9cd89d/llvm/lib/CodeGen/CommandFlags.cpp#L673-L674 https://github.com/llvm/llvm-project/blob/bd28889732e14ac6baca686c3ec99a82fc9cd89d/llvm/lib/CodeGen/CommandFlags.cpp#L732-L733 With this PR,
- `lto0.test_externref_emjs`
- `thinlto0.test_externref_emjs`
- `lto0.test_externref_emjs_dynlink`
- `thinlto0.test_externref_emjs_dynlink`
pass. These currently fail but don't get checked in the CI. I think they used to pass but started to fail after llvm#83196, because we used to run mem2reg even with `-O0` before that. (`ltoN` (N > 0) tests are not affected because they run mem2reg anyway so they don't need `RefTypeMem2Local`)

llvm#75960 added a bazel rule for generating enums for the async dialects, but there are no enums defined, and no cmake rule for that. Delete this rule.

Reverts llvm#85528. This was committed without tests, despite reviewers requesting tests to be added. The post-commit discussion leans towards revert, which would be consistent with the policy.

…lvm#89527)" Breaks on EXPENSIVE_CHECKS builds which still use the static ReadKeyDataLength implementation in several locations

I misunderstood what the function is looking up.

Both arrays and trivial scalars are supported. Both cases must use by-ref reductions because both are boxed. My understanding of the standards is that OpenMP says that this should follow the rules of the intrinsic reduction operators in Fortran, and Fortran says that unallocated allocatable variables can only be referenced to allocate them or test if they are already allocated. Therefore we do not need a null pointer check in the combiner region.

…lvm#88856) This makes it possible to specify `--@llvm-project//mlir:enable_cuda=true` on the bazel command line and get a build that includes NVIDIA GPU support in MLIR.

…9717) More targeted than a blanket "apply everywhere" pattern. Follow up to llvm#89075 to address @ftynse's feedback.

Successor of b8e3b2a. This patch also converts the type alias GlobalDeclID to a class to improve the readability and type safety.

This change will only affect MLIR integration tests to be run on AArch64. When originally introduced, these tests would run with `lli`. Those tests have since been updated to use `mlir-cpu-runner` instead, see e.g.: * https://reviews.llvm.org/D155405 * https://reviews.llvm.org/D146917 This patch removes all the leftover `lli` configuration in LIT that's currently not needed (and is unlikely to be needed any time soon).

…huffles that would be better separate On AVX+ targets a broadcast load can be treated as free.

Broadcast shuffles can be free if fed from a one-use load.

…e-use load AVX1+ can handle 32/64-bit broadcast loads, AVX2+ can handle all broadcast loads (we should be able to improve isLegalBroadcastLoad to handle more of this type matching).

…vm#89072) The original code has an invalid use of UZP1 because the result vector type does not match its input vector types. Rather than insert extra nop casts I figure it would be better to use CONCAT_VECTORS because that's the operation we're performing. NOTE: This is a step to enable more asserts in verifyTargetSDNode.

…llvm#89050) Existing sub-ranges are correctly updated because new IMPLICIT_DEF is added, but there is missing sub-range for IMPLICIT_DEF itself. Because of missing sub-range in live-intervals for IMPLICIT_DEF, register allocator does not know that IMPLICIT_DEF rewrites its virtual sub-registers and can end up assigning overlapping physical registers to them. This results in deleting instructions that were defined by sub-registers overwritten by IMPLICIT_DEF as they are now dead.

Add similar isel patterns for lt, gt and hi comparison types.

Results of icmp don't need extending after truncating their operands, as the result will always be i1. Skip them during extending. Fixes llvm#79742 Fixes llvm#85185

…huffleCost calls. Ensure the getShuffleCost arguments/instruction args are populated - minor extension to llvm#88743 to help improve shuffle costs for certain corner cases (e.g. shuffles of loads)

…lvm#88282)

…m#85060) Some options take the maximum unsigned integer value as default, but they are being dumped to a string as integers. This makes -dump-config write invalid '-1' values for these options. This change fixes this issue by using utostr if the option is unsigned. Fixes llvm#60217
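The `-1` artifact is easy to reproduce outside the tool. The sketch below uses `std::to_string` as a stand-in for the string conversion (the patch itself uses `utostr`); `dumpAsSigned` and `dumpAsUnsigned` are hypothetical helpers, not functions from the code base.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical: formats an unsigned option value as if it were a signed
// integer, which is how the maximum value ends up printed as "-1".
std::string dumpAsSigned(std::uint32_t value) {
  return std::to_string(static_cast<std::int32_t>(value));
}

// Hypothetical: formats the value with its real, unsigned type.
std::string dumpAsUnsigned(std::uint32_t value) {
  return std::to_string(value);
}
```

For `std::numeric_limits<std::uint32_t>::max()`, the first helper yields `"-1"` and the second yields `"4294967295"`, which is the value a config reader can parse back.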

Coverity (a static analysis tool) reported that the emitted 'Features' variable inside emitComputeAvailableFeatures in TableGen might be uninitialized. Silence this warning by adding brackets for the default initialization. Adapt test cases to take additional brackets into account.

This patch simplifies `fcmp (select Cond, C1, C2), C3` patterns in ceres: Alive2: https://alive2.llvm.org/ce/z/fWh_sD
```
define i1 @src(double %x) {
  %cmp1 = fcmp ord double %x, 0.000000e+00
  %sel = select i1 %cmp1, double 0xFFFFFFFFFFFFFFFF, double 0.000000e+00
  %cmp2 = fcmp oeq double %sel, 0.000000e+00
  ret i1 %cmp2
}
define i1 @tgt(double %x) {
  %cmp1 = fcmp uno double %x, 0.000000e+00
  ret i1 %cmp1
}
```
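The equivalence of the two functions can also be checked informally in C++. This is a hand translation of the IR above under the assumption of default IEEE floating-point semantics (no fast-math); `src` and `tgt` here are plain C++ functions named after the IR ones.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Mirrors @src: pick a NaN constant when x is ordered, 0.0 otherwise, then
// do an ordered-equal comparison of the selected constant against 0.0.
bool src(double x) {
  const std::uint64_t bits = 0xFFFFFFFFFFFFFFFFULL; // a NaN bit pattern
  double nanConstant;
  std::memcpy(&nanConstant, &bits, sizeof nanConstant);
  bool ordered = !std::isnan(x);            // fcmp ord x, 0.0
  double sel = ordered ? nanConstant : 0.0; // select
  return sel == 0.0;                        // fcmp oeq sel, 0.0
}

// Mirrors @tgt: true exactly when x is NaN (fcmp uno x, 0.0).
bool tgt(double x) { return std::isnan(x); }
```

Since a NaN never compares equal to 0.0, `src` is true only on the path where `x` itself was NaN, which is what `tgt` tests directly.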

Adding test case related to llvm#89060. It shows that after argument copy elision the scheduler may reorder a load of the input argument and a store to the same fixed stack entry (the fixed stack entry that is reused for the local variable).

…lvm#89712) This is a fix for miscompiles reported in llvm#89060. After argument copy elision the IR value for the eliminated alloca is aliasing with the fixed stack object. This patch makes sure that we mark the fixed stack object as being aliased with IR values, to prevent, for example, schedulers from reordering accesses to the fixed stack object. This could otherwise happen when there is a mix of MemOperands referring to the shared fixed stack slot via both the IR value for the elided alloca, and via a fixed stack pseudo source value (as would be the case when lowering the arguments).

llvm#89608) …lastprivate OpenMP 5.2 standard (Section 5.3) defines privatization for list items. Section 3.2.1 in the standard defines list items to exclude variables that are part of other variables. This patch adds the restriction to firstprivate and lastprivates, it was previously added for privates. Fixes llvm#67227 Note: The specific checks that are added here are explicitly called out in OpenMP 4.0 (https://www.openmp.org/wp-content/uploads/OpenMP4.0.0.pdf) Sections 2.14.3.4 and 2.14.3.5 but in later standards have become implicit through other definitions.

…OfShiftedLogic DAGCombiner is trying to fold shl over binops, and in the process combining it with another shl. However it needs to be more careful to ensure that the sum of the shift counts fits in the type used for the shift amount. For example, X86 is using i8 as shift amount type. So we need to make sure that the sum of the shift amounts isn't greater than 255. Fix will be applied in a later commit. This only pre-commits the test case to show that we currently get the wrong result. Bug was found when testing the C23 BitInt feature.

…89616) Ensure that the sum of the shift amounts does not overflow the shift amount type when combining shifts in combineShiftOfShiftedLogic. Solves a miscompile bug found when testing the C23 BitInt feature. Targets like X86 that only use an i8 for shift amounts after legalization seem to be especially susceptible to bugs like this, as it isn't legal to shift more than 255 steps.
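The overflow in question is plain 8-bit wraparound. The following sketch shows the check in isolation; it is not the DAGCombiner code, and `combineShiftAmounts` and `wrappedSum` are hypothetical helpers that model an i8 shift-amount type with `std::uint8_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical: returns the combined shift amount only when it is
// representable in the 8-bit shift-amount type.
std::optional<std::uint8_t> combineShiftAmounts(std::uint8_t a,
                                                std::uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  if (sum > std::numeric_limits<std::uint8_t>::max())
    return std::nullopt; // the fold must be skipped
  return static_cast<std::uint8_t>(sum);
}

// What an unchecked fold would use: the sum taken modulo 256.
std::uint8_t wrappedSum(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}
```

Shifting by 200 and then by 100 should shift by 300 in total, but the wrapped 8-bit sum is 44, so an unchecked fold would silently produce a different result; the checked version declines to combine.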

…Node. NFC. Also just get the value type from the SDValue instead of passing it separately.

…of a declaration (llvm#89494) Since [6163aa9](llvm@6163aa9#diff-3a7ef0bff7d2b73b4100de636f09ea68b72eda191b39c8091a6a1765d917c1a2), we have introduced an optimization that almost always destroys TemplateIdAnnotations at the end of a function declaration. This doesn't always work properly: a lambda within a default template argument could also result in such deallocation and hence a use-after-free bug while building a type constraint on the template parameter. This patch adds another flag to the parser to tell apart cases when we shouldn't do such cleanups eagerly. A bit complicated as it is, this retains the optimization on a highly templated function with lots of generic lambdas. Note the test doesn't always trigger a conspicuous bug/crash even with a debug build. But a sanitizer build can detect them, I believe. Fixes llvm#67235 Fixes llvm#89127

Ignore incoming values with constant false masks when trying to simplify VPBlendRecipes. As a follow-on optimization, we should also be able to drop all incoming values with false masks by creating a new VPBlendRecipe with those operands dropped. PR: llvm#89384

This will save later code from commuting it.

) Implement helper functions to identify leaf, composite, and combined constructs.

llvm#78295 dropped private headers in top level directory from libcxx.imp. This PR re-adds them to libcxx.imp.

…or (llvm#89735) We are almost ready to enable the use of debug records everywhere in LLVM by default; part of the prep-work for this means ensuring that every tool supports them. Every tool in the `llvm/` project supports them, front-ends that use the `DIBuilder` will support them, and as far as I can tell, the only other tool in the LLVM repo that needs to support them but doesn't is `mlir-translate`. This patch trivially unblocks them by converting from debug records to debug intrinsics before translating a module.

As well as flipping the sense of the bit, GFX12 moved it from bit 0 to bit 1 in the encoded simm16 operand.

No need to try to vectorize single gather/buildvector with alternate opcode graph, it is not profitable. In other cases, need to use last instruction for inserting the vectorized code.

…orrect size (llvm#83124)" (llvm#89036) When in-place new-ing a local variable of an array of trivial type, the generated code now calls 'memset' with the correct size of the array. Earlier it was generating the wrong size (the square of the typedef array size plus the size). The cause: given typedef TYPE TArray[8]; TArray x; the type of the declarator is TArray[8]. In SemaExprCXX.cpp::BuildCXXNew we check whether it is a typedef of constant size and, if so, get the original type, which works fine for non-dependent cases. But in the template case we go through TreeTransform.h:TransformCXXNewExpr, where we again check the allocated type, which is TArray[8] and stays that way, so ArraySize=(TArray[8] type, alloc TArray[8*type]), hence the squared size allocation. ArraySize gets calculated earlier in TreeTransform.h, so the if(!ArraySize) condition was failing. Fix: I changed that condition to if(ArraySize). Fixes llvm#41441 --------- Co-authored-by: erichkeane <ekeane@nvidia.com>

This change adds the z/OS personality function to the list of known EH personality functions. It enables removing of the EH data/labels if the personality function is not invoked.

Fixes llvm#87394. PR: llvm#89160

llvm#89148) This patch finalizes the std::ranges::range_adaptor_closure class template from https://wg21.link/P2387R3. // [range.adaptor.object], range adaptor objects template<class D> requires is_class_v<D> && same_as<D, remove_cv_t<D>> class range_adaptor_closure { }; The current implementation of __range_adaptor_closure was introduced in ee44dd8 and has served as the foundation for the range adaptors in libc++ for a while. This patch keeps its implementation, with the exception of the following changes: - __range_adaptor_closure now includes the missing constraints `is_class_v<D> && same_as<D, remove_cv_t<D>>` to restrict the type of class that can inherit from it. (https://eel.is/c++draft/ranges.syn) - The operator| of __range_adaptor_closure no longer requires its first argument to model viewable_range. (https://eel.is/c++draft/range.adaptor.object#1) - The _RangeAdaptorClosure concept is refined to exclude cases where T models range or where T has base classes of type range_adaptor_closure<U> for another type U. (https://eel.is/c++draft/range.adaptor.object#2)

This commit implements runtime verification for LinalgStructuredOps using the existing `RuntimeVerifiableOpInterface`. The verification checks that the runtime sizes of the operands match the runtime sizes inferred by composing the loop ranges with the op's indexing maps.

…ries (llvm#89642) For ASan, users already manually have to pass in the path to the lib, and for other libraries they have to pass in the path to the libpath. With LLVM's unreliable name of the lib (due to LLVM_ENABLE_PER_TARGET_RUNTIME_DIR confusion and whatnot), it's useful to be able to opt in to just explicitly passing the paths to the libs everywhere. Follow-up of sorts to https://reviews.llvm.org/D65543, and to llvm#87866.

…ombine (llvm#89263)" Changes since original commit: * Rebase over improved test coverage for theadba * Revert change to use TargetConstant as it appears to prevent the uimm2 clause from matching in the XTheadBa patterns. * Fix an order of operands bug in the THeadBa pattern visible in the new test coverage. Original commit message follows: This implements a RISCV specific version of the SHL_ADD node proposed in llvm#88791. If that lands, the infrastructure from this patch should seamlessly switch over to the generic DAG node. I'm posting this separately because I've run out of useful multiply strength reduction work to do without having a way to represent MUL X, 3/5/9 as a single instruction. The majority of this change is moving two sets of patterns out of tablegen and into the post-legalize combine. The major reason for this is that I have an upcoming change which needs to reuse the expansion logic, but it also helps common up some code between zba and the THeadBa variants. On the test changes, there are a couple of major categories: * We chose a different lowering for mul x, 25. The new lowering involves one fewer register and the same critical path, so this seems like a win. * The order of the two multiplies changes in (3,5,9)*(3,5,9) in some cases. I don't believe this matters. * I'm removing the one use restriction on the multiply. This restriction doesn't really make sense to me, and the test changes appear positive.
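As a rough illustration of why MUL X, 3/5/9 maps onto a single shift-and-add, the sketch below models the arithmetic only. The function names are hypothetical and this is not the DAG combine code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models a shift-left-by-n-then-add operation: (x << n) + y.
uint64_t shlAdd(uint64_t x, uint64_t y, unsigned n) { return (x << n) + y; }

// x * 3, x * 5 and x * 9 are each one shift-and-add of x with itself.
uint64_t mul3(uint64_t x) { return shlAdd(x, x, 1); } // 2x + x
uint64_t mul5(uint64_t x) { return shlAdd(x, x, 2); } // 4x + x
uint64_t mul9(uint64_t x) { return shlAdd(x, x, 3); } // 8x + x

// A product of two such factors chains two operations, e.g. 25 = 5 * 5.
// This is one possible expansion, not necessarily the one the patch picks.
uint64_t mul25(uint64_t x) { return mul5(mul5(x)); }
```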

The comment is misleading because `propertiesAttr` is not actually ignored when the operation isn't unregistered.

…#89780) Reverts llvm#89342 due to build failure

Complete support for rsqrt.approx with rsqrt.approx.f64 ([PTX ISA 9.7.3.17. Floating Point Instructions: rsqrt.approx.ftz.f64](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-rsqrt-approx-ftz-f64)). Additionally, add support for folding `sqrt` into `rsqrt`, with an optional flag to disable.

Reduce diffs in llvm#88899

… Legalizer (llvm#88469) It does not make sense to scalarize G_FREEZE as it leads to the generation of pairs of G_UNMERGE_VALUES and G_BUILD_VECTORs which are difficult to optimize especially when operations like G_TRUNC operate before G_FREEZE but after G_UNMERGE_VALUES. Instead, it is better to legalize G_FREEZE like any other vector type would be, as it gets lowered to a COPY during instruction selection anyways. This is an issue that was encountered when looking at the TSVC benchmark, where the legalization of G_FREEZE would cause generation of unnecessary MOVs that adversely affected the performance.

…d test to use FDIV Use of FDIV allows us to show a definite cost improvement with llvm#88899

In case the first element of a zip/uzp mask is undef, the isZIPMask and isUZPMask functions have a 50% chance of picking the wrong "WhichResult", meaning they don't match a zip/uzp where they could. This patch alters the matching code to first check for the first non-undef element, to try and get WhichResult correct.
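A minimal sketch of the idea, assuming a mask is a list of lane indices with -1 marking an undef lane. `matchZipMask` is a hypothetical stand-in, not the actual isZIPMask implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns 0 for a zip1-style mask, 1 for a zip2-style mask, and
// std::nullopt if the mask matches neither. Negative entries are undef.
std::optional<unsigned> matchZipMask(const std::vector<int> &mask) {
  const std::size_t n = mask.size();
  if (n == 0 || n % 2 != 0)
    return std::nullopt;

  // Lane index a zip mask must hold at position i for a given result.
  auto expected = [n](std::size_t i, unsigned which) {
    int base = static_cast<int>(which * (n / 2) + i / 2);
    return (i % 2 == 0) ? base : base + static_cast<int>(n);
  };

  // Decide the result from the first defined element, not from mask[0].
  std::size_t first = 0;
  while (first < n && mask[first] < 0)
    ++first;
  if (first == n)
    return 0u; // all lanes undef: either result fits, pick 0 arbitrarily

  unsigned which;
  if (mask[first] == expected(first, 0))
    which = 0;
  else if (mask[first] == expected(first, 1))
    which = 1;
  else
    return std::nullopt;

  // Every remaining defined lane must agree with that choice.
  for (std::size_t i = first; i < n; ++i)
    if (mask[i] >= 0 && mask[i] != expected(i, which))
      return std::nullopt;
  return which;
}
```

With four lanes, `{-1, 4, 1, 5}` is still recognized as zip1 and `{-1, 6, 3, 7}` as zip2, even though the first lane carries no information.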

…vm#89660) We increment `NumOfCSPGOFunc` and `NumOfPGOFunc` in `PGOUseFunc::readCounters()` already. We should do the same in `PGOUseFunc::populateCoverage`. https://github.com/llvm/llvm-project/blob/83bc7b57714dc2f6b33c188f2b95a0025468ba51/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1331

Newer versions allow `pure`, `elemental` and `recursive` on device subprograms.

) This patch exports the `std::ranges::range_adaptor_closure` class template implemented in llvm#89148 from the C++ Modules file.

When trying to express a time before the epoch (e.g. "one nanosecond before 00:01:40 on 1900-01-01") the date would be shown as: 1900-01-01 00:01:39.-00000001 After this patch, that time would be correctly shown as: 1900-01-01 00:01:39.999999999
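One common way to avoid a negative fractional part is to floor the seconds and keep the remainder non-negative. The sketch below shows that normalization on a signed nanosecond count. The names are hypothetical and this is not the code touched by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct SplitTime {
  int64_t seconds; // floored whole seconds (may be negative)
  int64_t nanos;   // always in [0, 999999999]
};

// Splits a signed nanosecond offset into floored seconds plus a
// non-negative nanosecond remainder.
SplitTime splitNanoseconds(int64_t total) {
  constexpr int64_t kNanosPerSecond = 1'000'000'000;
  int64_t seconds = total / kNanosPerSecond; // truncates toward zero
  int64_t nanos = total % kNanosPerSecond;   // takes the sign of total
  if (nanos < 0) {                           // borrow one second
    nanos += kNanosPerSecond;
    --seconds;
  }
  return {seconds, nanos};
}

// Formats as "<floored seconds>.<9-digit fraction>".
std::string formatSeconds(int64_t total) {
  SplitTime t = splitNanoseconds(total);
  char buffer[48];
  std::snprintf(buffer, sizeof buffer, "%lld.%09lld",
                static_cast<long long>(t.seconds),
                static_cast<long long>(t.nanos));
  return buffer;
}
```

For an offset of -1 ns the naive truncating split gives 0 seconds and -1 ns, which is where a `.-00000001` style fraction comes from. The normalized split gives -1 seconds and 999999999 ns instead.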

This patch adds the PFM counter definitions for Intel alderlake CPUs.

The clang-tidy selection has been made automatic recently so this is no longer needed. Thanks to Louis for spotting this.

…llvm#89156) Motivation: LLDB is able to report errors about these scenarios whereas LLVM's DWARF parser only gives a boolean success/fail. I want to migrate LLDB to using LLVM's DWARFUnitHeader class, but I don't want to lose some of the error reporting, so I'm adding it to the LLVM class first.

CMake has landed experimental support for using the Standard modules. This will be part of the CMake 3.30 release. This updates the build instructions to use modules with CMake. The changes have been tested locally. --------- Co-authored-by: Will Hawkins <whh8b@obs.cr>

) This opens up a door for reusing reassociation optimizations on target-specific binary operations with non-standard operand list. This is effectively a NFC.

This PR adds following options to the AddDebugInfo pass. 1. IsOptimized flag. 2. Level of debug info to generate. 3. Name of the source file This enables us to remove the hard coded values from the code. It also allows us to test the pass with different options. The tests have been modified to take advantage of that. The calling convention flag and producer name have also been improved.

Test that ld.lld --debug-names (llvm#86508) built per-module index can be consumed by lldb. This has uncovered a bug during the development of the lld feature.

Resolves llvm#88065 Added macros and functions.

…nerateAwaitSuspendWrapper (llvm#89731) Fixes llvm#89723

…89789) The interesting bit is the zext folding. This is the first case where we end up with a profitable fold of shNadd (zext x), y to shNadd.uw x, y. See zext_mul68 from rv64zba.ll. The test differences are cases where we can legally fold (only because there's no one use check). These are not profitable or harmful, but we can't add a one-use check without breaking the zext_mul68 case. Note that XTHeadBa doesn't appear to have the equivalent patterns so this only shows up in Zba.
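The fold relies on the `.uw` forms zero-extending the low 32 bits of their first operand before shifting. A small model of that equivalence, with hypothetical function names and plain 64-bit arithmetic standing in for the instructions:

```cpp
#include <cassert>
#include <cstdint>

// Models shNadd: (rs1 << n) + rs2 on 64-bit registers.
uint64_t shNAdd(uint64_t rs1, uint64_t rs2, unsigned n) {
  return (rs1 << n) + rs2;
}

// Models shNadd.uw: only the low 32 bits of rs1 are used, zero-extended.
uint64_t shNAddUW(uint64_t rs1, uint64_t rs2, unsigned n) {
  return (static_cast<uint64_t>(static_cast<uint32_t>(rs1)) << n) + rs2;
}

// Explicit zero-extension of the low 32 bits.
uint64_t zext32(uint64_t x) { return x & 0xFFFFFFFFull; }
```

Under this model, `shNAdd(zext32(x), y, n)` and `shNAddUW(x, y, n)` agree for every `x`, so the separate zero-extension can be dropped.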

This test records the current behavior of HWASan, which doesn't utilize the fixed shadow intrinsics of llvm@365bddf It is intended to be updated in future work ("Optimize outlined memaccess for fixed shadow on Aarch64"; llvm#88544)

This adds a new test fixture class FEnvSafeTest (usable as a base class for other fixtures) that ensures each test doesn't perturb the `fenv_t` state that the next test will start with. It also provides types and methods tests can use to explicitly wrap code under test either to check that it doesn't perturb the state or to save and restore the state around particular test code. All the fenv and math tests are updated to use this so that none can affect another. Expectations that code under test and/or tests themselves don't perturb state can be added later.

Testing with the get_info() returning a local_info revealed some issues in the reverse lookup. This needed an additional quirk. Also the skipping when not in the current continuation optimization was wrong. It prevented merging two sys_info objects.

…dSize (llvm#89824) PortableMemInfoBlock::{serialize,deserialize} take Schema into account, allowing us to serialize/deserialize a subset of the fields. However, PortableMemInfoBlock::serializedSize does not. That is, it assumes that all fields are always serialized and deserialized. In other words, if we choose to serialize/deserialize a subset of the fields, serializedSize would claim more storage than we actually need. This patch fixes the problem by teaching serializedSize to take Schema into account. For now, this patch has no effect on the actual indexed MemProf profile because we serialize/deserialize all fields, but that might change in the future. Aside from check-llvm, I tested this patch by verifying that llvm-profdata generates bit-wise identical files for each version for a large raw MemProf file I have.
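A simplified sketch of a schema-aware size computation. It assumes every field is stored as a `uint64_t`, and the `Field` enum and `Schema` alias are hypothetical stand-ins rather than the real MemProf definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field identifiers; the real schema has its own list.
enum class Field { AllocCount, TotalSize, TotalLifetime };
using Schema = std::vector<Field>;

constexpr std::size_t kNumAllFields = 3;

// Old behavior in this model: always assume every field is present.
std::size_t serializedSizeAllFields() {
  return kNumAllFields * sizeof(uint64_t);
}

// Schema-aware behavior: count only the fields that will be written.
std::size_t serializedSize(const Schema &schema) {
  return schema.size() * sizeof(uint64_t);
}
```

With the full schema both functions agree, which matches the note that the patch has no effect while all fields are serialized.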

) Sergey Malsov has left Intel. I would like to nominate Will Huhn to replace him as an Intel representative in the LLVM security group. Will is a security champion for the Intel compiler team. I believe he will be a valuable addition to the LLVM security group as a second representative from Intel. He has more security-specific expertise than me. I regularly consult with Will about topics the LLVM security group is considering, and it will be useful to have him more directly involved.

llvm#89530) Using `compare` is the next most common roundabout way to express `starts_with` before it was added to the standard. In this case, using `starts_with` is a readability improvement. Extend existing `modernize-use-starts-ends-with` to cover this case.
```
// The following will now be replaced by starts_with().
string.compare(0, strlen("prefix"), "prefix") == 0;
string.compare(0, 6, "prefix") == 0;
string.compare(0, prefix.length(), prefix) == 0;
string.compare(0, prefix.size(), prefix) == 0;
```

Implement base Calling Convention functionality. Implement stack load/store register operations. Implement call lowering.

…y with correct size (llvm#83124)" (llvm#89036)" This reverts commit 74cab54.

This introduces a new file, RISCVISAUtils.cpp and moves the rest of RISCVISAInfo to the TargetParser library. This will allow us to generate part of RISCVISAInfo.cpp using tablegen.

Almost NFC, instrumentation is as correct as it was before. We need InstrumentationList grouped by origin instruction, so we used stable_sort. However, these objects are already grouped because we never interleave sequences of `insertShadowCheck` of different instructions. The pointer sort had the artifact that it was dependent on allocator behavior, so we could have inserted checks in a different order. There is no test, as I failed to reproduce this with `opt`. My guess is that for a reproducer we need to increase fragmentation in the allocator.

…9497)

It follows the interface defined here: riscv-non-isa/rvv-intrinsic-doc#293

Make SymbolFileCTF::ParseFunctions resilient against not being able to resolve the argument or return type of a function. ResolveTypeUID can fail for a variety of reasons so we should always check its result. The type that caused the crash was `_Bool` which we didn't recognize as a basic type. This commit also fixes the underlying issue and adds a test. rdar://126943722

In ELF, relocatable files generated for x86-32 and some code models of x86-64 (medium, large) may reference the special symbol `_GLOBAL_OFFSET_TABLE_` that is not used in the IR. In an LTO link, if there is no regular relocatable file referencing the special symbol, the linker may not define the symbol and lead to a spurious "undefined symbol" error. Fix llvm#61101: record that `_GLOBAL_OFFSET_TABLE_` is used in the IR symbol table. Note: The `PreservedSymbols` mechanism (https://reviews.llvm.org/D112595) that just sets `FB_used` is not applicable. The `getRuntimeLibcallSymbols` for extracting lazy runtime library symbols is for symbols that are "always" potentially used, but linkers don't have the code model information to make a precise decision. Pull Request: llvm#89463

…vm#89818) After llvm#89563, we do not use else after return in code corresponding to `R_AARCH64_AUTH_ABS64` reloc in `getRelocType`. This patch removes use of else after return in other places in `getRelocType`.

So that it is aligned with other targets.

Swapping the operands of a select is not valid if one hand is more poisonous than the other, because the negation zero contains poison elements. Fix this by adding an extra parameter to isKnownNegation() to forbid poison elements. I've implemented this using manual checks to avoid needing four variants for the NeedsNSW/AllowPoison combinations. Maybe there is a better way to do this... Fixes llvm#89669.

…89701) We're replacing the select with the false value here, but it may be more poisonous if m_Not contains poison elements. Fix this by introducing a m_NotForbidPoison matcher and using it here. Fixes llvm#89500.

…VFeatures.td. There is no implies rule in RISCVISAInfo.cpp so this makes them consistent. Soon RISCVFeatures.td will be used to generate RISCVISAInfo.cpp so it won't be possible to mismatch.

…vm#89706) Fixes llvm#71939.

Zabha and Zacas are both documented as depending on Zaamo. I'm hesitant to make them imply Zaamo instead. So remove the implication and replace with a check that either A or Zaamo is enabled.

These tests break when regenerated due to symbol conflicts.

This patch adds known ptxas versions up to 12.4, to have tests targeting them. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

…vm#89777) We are currently using `PREFIX-DAG` and `PREFIX-NOT` within a single `PREFIX` test in a mixed way, but `-DAG` and `-NOT` do not work that way. For example:
Result:
```
1
2
3
```
Test file:
```c
// CHECK-DAG: 3
// CHECK-DAG: 1
// CHECK-NOT: 2
```
This does not work. The last line `CHECK-NOT: 2` does not trigger any error, because we've already covered all three lines (1~3) while matching `CHECK-DAG: 3` and `CHECK-DAG: 1`, and FileCheck tries to check the line `CHECK-NOT: 2` _after_ the line `3`. Actually, we have
```c
// BLEEDING-EDGE-NOT:#define __wasm_reference_types__ 1{{$}}
```
even though reference-types is enabled in 'bleeding-edge' config, and this has not triggered any error. This section (https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive) explains the interactions between `CHECK-DAG` and `CHECK-NOT`s:

> As a result, the surrounding `CHECK-DAG:` directives cannot be reordered, i.e. all occurrences matching `CHECK-DAG:` before `CHECK-NOT:` must not fall behind occurrences matching `CHECK-DAG:` after `CHECK-NOT:`.

So in order to test the 'include' lists and 'not-include' lists, we have to run the tests twice with different prefixes. This splits `GENERIC` and `BLEEDING-EDGE` tests in two configs (`***-INCLUDE` and `***`) to test them correctly. This also adds some spaces after colons, sorts the feature lists, and adds `1{{$}}` to the `MVP` tests to make them consistent with `GENERIC` and `BLEEDING-EDGE` tests.

This tidies up `wasm-target-features.c` cosmetically: - Sorts the feature tests alphabetically - Adds a space after colons

…v.masked.strided.store (llvm#89874) According to `RISCVTargetLowering::getTgtMemIntrinsic`, the MemoryVT is the scalar element VT for strided store and the MemoryVT is the same as the store value's VT for unit-stride store. After combining `riscv.masked.strided.store` to `masked.store`, we just use the scalar element VT to construct `masked.store`, which is wrong. With wrong MemoryVT, the DAGCombiner will combine `trunc+masked.store` to truncated `masked.store` because `TLI.canCombineTruncStore` returns true. So, we should use the store value's VT as the MemoryVT. This fixes llvm#89833.

@cor3ntin

Fixes llvm#89374 Solution suggested by @cor3ntin

Implements the core/target-agnostic components of Memory Model Relaxation Annotations. RFC: https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5

llvm-project/llvm/lib/IR/Verifier.cpp:4854:14: error: unused variable 'IsLeaf' [-Werror,-Wunused-variable] const auto IsLeaf = [](const Metadata *CurMD) { ^ 1 error generated.

Split vector and scalar regalloc has been enabled by default for 5 months now since d0a39e6, and shipped with 18.1.0. I haven't heard of any issues with it so far, so this proposes to remove the flag to reduce the number of configurations we have to support.

…more fixes. This re-applies 6094b3b, which was reverted in e7efd37 (and before that in 1effa19) due to bot failures. The test failures were fixed by having SelfExecutorProcessControl use an InPlaceTaskDispatcher by default, rather than a DynamicThreadPoolTaskDispatcher. This shouldn't be necessary (and indicates a concurrency issue elsewhere), but InPlaceTaskDispatcher is a less surprising default, and better matches the existing behavior (compilation on current thread by default), so the change seems reasonable. I've filed llvm#89870 to investigate the concurrency issue as a follow-up. Coding my way home: 6.25133S 127.94177W

The vast majority of the following (very common) opcodes were always called with identical arguments: - `GIM_CheckType` for the root - `GIM_CheckRegBankForClass` for the root - `GIR_Copy` between the old and new root - `GIR_ConstrainSelectedInstOperands` on the new root - `GIR_BuildMI` to create the new root I added an overloaded version of each opcode specialized for the root instructions. It always saves between 1 and 2 bytes per instance depending on the number of arguments specialized into the opcode. Some of these opcodes had between 5 and 15k occurrences in the AArch64 GlobalISel Match Table. Additionally, the following opcodes are almost always used in the same sequence: - `GIR_EraseFromParent 0` + `GIR_Done` - `GIR_EraseRootFromParent_Done` has been created to do both. Saves 2 bytes per occurrence. - `GIR_IsSafeToFold` was *always* called for each InsnID except 0. - Changed the opcode to take the number of instructions to check after `MI[0]` The savings from these are pretty neat. For `AArch64GenGlobalISel.inc`: - `AArch64InstructionSelector.cpp.o` goes down from 772kb to 704kb (-10% code size) - Self-reported MatchTable size goes from 420380 bytes to 352426 bytes (~ -17%) A smaller match table means a faster match table because we spend less time iterating and decoding. I don't have a solid measurement methodology for GlobalISel performance so I don't have precise numbers, but I saw a few % of improvement in a simple testcase.

llvm#89619)

llvm-project/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp:684:8: error: unused variable 'ConcurrentCompilationSettingDefaulted' [-Werror,-Wunused-variable] bool ConcurrentCompilationSettingDefaulted = !SupportConcurrentCompilation; ^ 1 error generated.

With `nsw`/`nuw`, the `trunc` is non-zero if its operand is non-zero. Proofs: https://alive2.llvm.org/ce/z/iujmk6 Closes llvm#89643
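The claim can be checked by brute force on a small width. The sketch below models a 16-bit to 8-bit `trunc` with plain integers. It is an illustration of the property, not the ValueTracking code, and the function names are hypothetical.

```cpp
#include <cassert>

// trunc nuw i16 -> i8 is only defined when no set bits are discarded.
bool truncIsNUW(unsigned bits) { return (bits >> 8) == 0; }

// trunc nsw i16 -> i8 is only defined when the signed value is preserved.
bool truncIsNSW(int value) { return value >= -128 && value <= 127; }

// Checks every non-zero 16-bit pattern: whenever the nuw or nsw
// condition holds, the truncated 8-bit value is non-zero as well.
bool nonZeroSurvivesTrunc() {
  for (unsigned bits = 1; bits <= 0xFFFFu; ++bits) {
    unsigned truncated = bits & 0xFFu;
    // Reinterpret the 16-bit pattern as a two's complement signed value.
    int asSigned = bits >= 0x8000u ? static_cast<int>(bits) - 0x10000
                                   : static_cast<int>(bits);
    if (truncIsNUW(bits) && truncated == 0)
      return false;
    if (truncIsNSW(asSigned) && truncated == 0)
      return false;
  }
  return true;
}

// Without either flag the property fails: 0x0100 truncates to 0.
bool plainTruncCanBecomeZero() { return (0x0100u & 0xFFu) == 0; }
```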

RST is powerful but usually too powerful for 90% of what we need it for. Markdown is easier to edit and can be previewed easily without building the entire website. This copies what llvm does already, making myst_parser optional if you only want man pages. Previously we had Markdown enabled in 8b95bd3 but that got reverted. That did this in a different way but I've gone with the standard llvm set this time. I intend the first Markdown pages to be the remote protocol extension docs, as they are not in any set format right now.

A follow-up to llvm#71709, addressing the static analysis finding reported in https://github.com/llvm/llvm-project/pull/71709/files#r1576846306

…tablegen files (llvm#88378) Introduce a mechanism to share data between the ARM and AArch64 backends and TargetParser, to reduce duplication of code. This is similar to the current RISC-V implementation. The target tablegen file (in this case `ARM.td` or `AArch64.td`) is processed during building of `TargetParser` to generate the following files in the build tree: - `build/include/llvm/TargetParser/ARMTargetParserDef.inc` - `build/include/llvm/TargetParser/AArch64TargetParserDef.inc` For now, the use of these generated files is limited to files _outside_ of `TargetParser`. The main reason for this is that the modifications to `TargetParser` will require additional data added to the tablegen files, which I want to split into separate PRs.

Fixes the failure at https://lab.llvm.org/buildbot/#/builders/131/builds/62928, and add comments about unused variable and update debugging output. Coding my way home: 6.44615S, 128.16704W

…to website (llvm#89718) This document has never been on the website, unlike GDB's protocol docs. It will be useful to have both available online to compare. Markdown is easier to edit and preview in many editors (including Github itself), so I've chosen that over RST. Plus, building the website takes minutes and I lose the will to make nice edits when I have to deal with that. The standard dialect lacks some things, notably multi-line table cells, so I've converted large tables into bullet point lists so that we still get text wrapping. This is a downside but I think the simplicity of Markdown outweighs this. I have applied the plain text markers where I've noticed it and escaped some HTML characters. There may be more changes needed but, it's Markdown, so it's in theory a lot easier for someone to fix it!

…9632) This patch adds new tests mostly checking SPIR-V validation of pointer and primitive types.

…r regalloc (llvm#88295) This patch splits off part of the work to move vsetvli insertion to post regalloc in llvm#70549. The doLocalPostpass operates outside of RISCVInsertVSETVLI's dataflow, so we can move it to its own pass. We can then move it to post vector regalloc which should be a smaller change. A couple of things that are different from llvm#70549: - This manually fixes up the LiveIntervals rather than recomputing it via createAndComputeVirtRegInterval. I'm not sure if there's much of a difference with either. - For the postpass it's sufficient to just check isUndef() in hasUndefinedMergeOp, i.e. we don't need to look up the def in VNInfo. Running on llvm-test-suite and SPEC CPU 2017 there aren't any changes in the number of vsetvlis removed. There are some minor scheduling diffs as well as extra spills and fewer spills in some cases (caused by transient vsetvlis existing between RISCVInsertVSETVLI and RISCVCoalesceVSETVLI when vec regalloc happens), but they are minor and should go away once we finish moving the rest of RISCVInsertVSETVLI. We could also potentially turn off this pass for unoptimised builds.

The previous state was leading to inconsistencies. Some targets would get the options and some wouldn't. As an example, the `MEMORY_COPTS` definitions would only apply to the `:string_memory_utils` target but not to the `:memcpy` target. This patch makes sure definitions are applied throughout the LLVM libc targets as `local_defines`. This ensures that the preprocessor definitions don't propagate to depending targets outside of LLVM libc, and that all libc targets have consistent preprocessor definitions.

…9773) With GFX12 architected SGPRs the workgroup ids are trivially available in any function called from a compute entrypoint.

…#89827) This is missing e.g. on Windows. With this change, it's possible to make the libcxx std module work on mingw-w64 (although that requires a few fixes to those headers). In the regular cstdlib header, we have _LIBCPP_USING_IF_EXISTS flagged on every single reexported function (since a9c9183), but the modules seem to only have _LIBCPP_USING_IF_EXISTS set on a few individual functions, so far.

This patch adds test coverage for commutable RVV instructions added in llvm#88379. For each kind of instruction, I add two tests (one for unmasked and one for masked). These tests don't cover all the SEWs/LMULs as I think it's not worthwhile because there is no difference when handling instructions with different SEWs/LMULs. As the tests show, we can't eliminate two equal instructions if there is a use of `V0`. This may be fixed in the future. Reviewers: asb, jacquesguan, topperc, lukel97, preames Reviewed By: lukel97 Pull Request: llvm#89889

… (C1 - C2 * C0) + X * C2` (llvm#76285) Since `DivRemPairPass` runs after `ReassociatePass` in the optimization pipeline, I decided to do this simplification in `InstCombine`. Alive2: https://alive2.llvm.org/ce/z/Jgsiqf Fixes llvm#76128.

…ion). Fixes the bot failure at https://lab.llvm.org/buildbot/#/builders/272/builds/14788. Coding my way home: 6.48551S, 128.21109W

We've recently seen the libclc llvm-link invocations become so long that they exceed the character limits on certain platforms. Using a 'response file' should solve this by offloading the list of inputs into a separate file, and using special syntax to pass it to llvm-link. Note that neither the response file nor the syntax is specific to Windows, but we restrict it to that platform regardless. We have the option of expanding it to other platforms in the future.

… shuffles (llvm#88899) Refactor to be closer to foldShuffleOfCastops - sibling patch to llvm#88743 that can be used to address some of the issues identified in llvm#88693

…the same (llvm#89737) This PR fixes the issue llvm#88908 Attached test case is updated to check that OpSConvert/OpUConvert is not generated when input and result types are identical.

…R opcode inserted by IRTranslator (llvm#89890) When translating global values, the IRTranslator pass can sometimes generate code patterns that require additional effort during pre-legalization. This PR addresses this problem to support the G_PTRTOINT instruction used in the initialization of GV.

…llvm#89611) It turned out that `hlfir::genVariableBox` didn't add lower bounds to the boxes it created. Using a shapeshift instead of only a shape adds the lower bounds information to the thread-local copy of the box. Fixes llvm#89259

See RFC at https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations I previously did the same for the AbstractResult pass llvm#88867

Missed from llvm#88378, only showed up in the sanitizer builds.

Noticed in llvm#89897

…y `ArrayRef<const Value *> Args` argument. NFC.

I have a tutorial at EuroLLVM 2024 ([Zero to Hero: Programming Nvidia Hopper Tensor Core with MLIR's NVGPU Dialect](https://llvm.swoogo.com/2024eurollvm/session/2086997/zero-to-hero-programming-nvidia-hopper-tensor-core-with-mlir's-nvgpu-dialect)). For that, I implemented tutorial codes in Python. The focus is the nvgpu dialect and how to use its advanced features. I thought it might be useful to upstream this. The tutorial codes are as follows: - **Ch0.py:** Hello World - **Ch1.py:** 2D Saxpy - **Ch2.py:** 2D Saxpy using TMA - **Ch3.py:** GEMM 128x128x64 using Tensor Core and TMA - **Ch4.py:** Multistage performant GEMM using Tensor Core and TMA - **Ch5.py:** Warp Specialized GEMM using Tensor Core and TMA I might implement one more chapter: - **Ch6.py:** Warp Specialized Persistent ping-pong GEMM This PR also introduces the nvdsl class, making IR building in the tutorial easier.

Also cleanup to avoid the memory noise by using return values in the trivial cases.

Precommit tests for llvm#83860.

…#89607) On AArch64, rdvl can accept a negative value, while cntd/cntw/cnth can't. As we do support VScale with a negative multiply value, we did not limit the negative value and instead took the hit of having the extra patterns, according to PR88108. Also add NoUseScalarIncVL to avoid affecting patterns that work for -mattr=+use-scalar-inc-vl. Fix llvm#84620

Summary: The AMDGPU toolchain simply took the short name to get the link job instead of using the common utilities that respect options like `-fuse-ld`. Any linker that isn't `ld.lld` will fail, however we should be able to override it.

Simplify callers which don't have their own DemandedElts mask. Noticed while reviewing llvm#88801

This commit enhances the LLVM dialect's Mem2Reg interfaces to support partial stores to memory slots. To achieve this support, the `getStored` interface method has to be extended with a parameter of the reaching definition, which is now necessary to produce the resulting value after this store.

Add bindings for LLVM pointer type.

This changes the handling of anonymous TagDecls to the following rules: - If the TagDecl is embedded in the declaration for some VarDecl (this is the only possibility for RecordDecls), then pretend the child decls belong to the VarDecl - If it's an EnumDecl proceed as we did previously, i.e., embed it in the enclosing DeclContext. Additionally this fixes a few issues with declaration fragments not consistently including "{ ... }" for anonymous TagDecls. To make testing these additions easier this patch fixes some text declaration fragments merging issues and updates tests accordingly. rdar://121436298

Reverts d3f6c2c, since ARMTargetDefEmitter.cpp has to be in llvm-min-tblgen too.

This function will break up a construct into constituent leaf and composite constructs, e.g. if OMPD_c_d_e and OMPD_d_e are composite constructs, then OMPD_a_b_c_d_e will be broken up into the list {OMPD_a, OMPD_b, OMPD_c_d_e}.
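One way to picture the decomposition is a greedy left-to-right scan that takes the longest known composite starting at each position and otherwise emits a single leaf. The sketch below uses plain strings instead of the `OMPD_*` enumerators. `splitConstruct` is a hypothetical helper and may differ from how the actual function is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Leaves = std::vector<std::string>;

// Splits a compound construct, given as its sequence of leaf names, into
// leaf and composite parts. `composites` lists the leaf sequences that
// count as composite constructs.
std::vector<std::string> splitConstruct(const Leaves &leaves,
                                        const std::set<Leaves> &composites) {
  std::vector<std::string> parts;
  std::size_t i = 0;
  while (i < leaves.size()) {
    // Default to one leaf; prefer the longest composite starting at i.
    std::size_t bestLen = 1;
    for (const Leaves &c : composites) {
      if (c.size() > bestLen && c.size() <= leaves.size() - i &&
          std::equal(c.begin(), c.end(),
                     leaves.begin() + static_cast<std::ptrdiff_t>(i)))
        bestLen = c.size();
    }
    // Join the chosen leaves with '_' to name the resulting part.
    std::string name = leaves[i];
    for (std::size_t k = 1; k < bestLen; ++k)
      name += "_" + leaves[i + k];
    parts.push_back(name);
    i += bestLen;
  }
  return parts;
}
```

For leaves `a, b, c, d, e` with composites `c_d_e` and `d_e`, this yields `a`, `b`, `c_d_e`, matching the example in the description.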

This can occur if the virtual address space is (almost) entirely mapped or heavily fragmented.

…vector size difference (llvm#88380) Add separate messages about passing arguments or returning parameters with scalable types. --------- Co-authored-by: Sander de Smalen <sander.desmalen@arm.com>

This patch updates the definition of `omp.wsloop` to enforce the restrictions of a loop wrapper operation. Related tests are updated but this PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and passes tests.

@RKSimon

) Fixes llvm#82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue llvm#82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.

to reflect that there are three variants.

…m#89211) This patch updates verifiers for `omp.ordered`, `omp.ordered.region`, `omp.cancel` and `omp.cancellation_point`, which check for a parent `omp.wsloop`. After transitioning to a loop wrapper-based approach, the expected direct parent will become `omp.loop_nest` instead, so verifiers need to take this into account. This PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and pass tests.

This patch makes changes to the `scf.parallel` to `omp.parallel` + `omp.wsloop` lowering pass in order to introduce a nested `omp.loop_nest` as well, and to follow the new loop wrapper role for `omp.wsloop`. This PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and pass tests.

…9214) This patch introduces minimal changes to the MLIR to LLVM IR translation of `omp.wsloop` to support the loop wrapper approach. There is `omp.loop_nest` related translation code that should be extracted and shared among all loop operations (e.g. `omp.simd`). This would possibly also help in the addition of support for compound constructs later on. This first approach is only intended to keep things running after the transition to loop wrappers and not to add support for other use cases enabled by that transition. This PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and pass tests.

This patch updates lowering from PFT to MLIR of workshare loops to follow the loop wrapper approach. Unit tests impacted by this change are also updated. As the last patch of the stack, this should compile and pass unit tests.

Semantics usually folds SHAPE into an array constructor, but sometimes it cannot (for example, when the source is a function result that cannot be duplicated in expression analysis). Add lowering handling for SHAPE.

…lane.mask (llvm#89068) When SVE is available we can lower calls to get.active.lane.mask using the SVE whilelo instruction, however in practice since vXi1 types are not legal for NEON we often end up expanding the predicate into a vector of integers, e.g. v4i1 -> v4i32. This usually happens when we have to keep the predicate live out of the block, for example when the predicate is the incoming value to a PHI node in a tail-folded vector loop. Currently in such cases the intrinsic call has a cost of 1, which is far too low when considering the extra instructions required to expand the predicate. This patch fixes that by basing the cost on the number of lane moves required for expansion. This is required for a follow-on patch that adds the cost of the intrinsic call to the vectorisation cost model, so that we can teach the vectoriser to make better choices.

…nctionDefinition (llvm#89801) In the lambda function within clang::Sema::InstantiateFunctionDefinition, the return value of a function that may return null is now checked before dereferencing to avoid potential null pointer dereference issues which can lead to crashes or undefined behavior in the program.
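The general pattern of checking a possibly-null result before dereferencing can be sketched as follows. `ToyDecl`, `findPattern`, and `describePattern` are hypothetical names for illustration only and do not correspond to the Sema code touched by the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration node.
struct ToyDecl {
  std::string name;
};

// Hypothetical lookup that returns nullptr when nothing matches.
const ToyDecl *findPattern(const std::vector<ToyDecl> &decls,
                           const std::string &name) {
  for (const ToyDecl &decl : decls)
    if (decl.name == name)
      return &decl;
  return nullptr;
}

// The result is checked before it is dereferenced, so a failed lookup
// produces a fallback value instead of undefined behavior.
std::string describePattern(const std::vector<ToyDecl> &decls,
                            const std::string &name) {
  const ToyDecl *pattern = findPattern(decls, name);
  if (!pattern)
    return "<no pattern>";
  return pattern->name;
}
```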

On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. Fixes: SWDEV-438421

…lvm#89485)

…th r2 and pre-R2 (llvm#89881) For unsigned max/min, ANDi is available on all ISA revisions and can perform the extension before the slt instruction, which saves one instruction.

…ructions (llvm#89867) Since the requirement is EEW=32, it's impossible that EGW=128 needs LMUL=8.
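One way to read the arithmetic behind this claim, under two stated assumptions: an element group has to fit in a register group (LMUL * VLEN >= EGW), and supporting EEW=32 implies VLEN is at least 32 bits. The helper name `minLmulForElementGroup` is hypothetical, and fractional LMUL is ignored in this sketch.

```cpp
#include <cassert>

// Hypothetical helper: smallest integer LMUL with LMUL * VLEN >= EGW.
// Assumes the constraint above; fractional LMUL values are not modeled.
constexpr unsigned minLmulForElementGroup(unsigned egwBits, unsigned vlenBits) {
  // Ceiling division; always at least 1 for a non-zero EGW.
  return (egwBits + vlenBits - 1) / vlenBits;
}

// With EGW=128 and the smallest assumed VLEN of 32, LMUL=4 is enough,
// so LMUL=8 is never needed; larger VLEN only lowers the requirement.
static_assert(minLmulForElementGroup(128, 32) == 4, "worst case is LMUL=4");
static_assert(minLmulForElementGroup(128, 64) == 2, "VLEN=64 needs LMUL=2");
static_assert(minLmulForElementGroup(128, 128) == 1, "VLEN=128 needs LMUL=1");
```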

Commits on Apr 29, 2024

Merge branch 'main' into amd-trunk-dev

skatrak committed Apr 29, 2024

3c30af4
