[SYCL][Graph] Node Profiling #353
Commits on Jan 31, 2024
Commit 292b508
Commit 8a98091
[InstCombine] Simplify and/or by replacing operands with constants (#77231)
This patch tries to simplify `X | Y` by replacing occurrences of `Y` in `X` with 0. Similarly, it tries to simplify `X & Y` by replacing occurrences of `Y` in `X` with -1.

Alive2: https://alive2.llvm.org/ce/z/cNjDTR

Note: As the current implementation is too conservative in the one-use checks, I cannot remove other existing hard-coded simplifications if they involve more than two instructions (e.g., `A & ~(A ^ B) --> A & B`).

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=a085402ef54379758e6c996dbaedfcb92ad222b5&to=9d655c6685865ffce0ad336fed81228f3071bd03&stat=instructions%3Au

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|-0.00%|+0.00%|-0.02%|+0.01%|+0.02%|-0.01%|

Fixes #76554.
Commit f2816ff
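To make the InstCombine rewrite above concrete, here is a small sketch that brute-forces two instances of it over all 8-bit values. The function names are hypothetical and this is not code from the patch; it only checks the arithmetic identities, assuming `X` is built from bitwise operations. Which instruction shapes the patch actually accepts is not stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// X | Y with X = (A ^ B), Y = A: replacing A by 0 inside X leaves B,
// so (A ^ B) | A should equal B | A.
bool orFoldHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      std::uint8_t Before = static_cast<std::uint8_t>((A ^ B) | A);
      std::uint8_t After = static_cast<std::uint8_t>(B | A);
      if (Before != After)
        return false;
    }
  return true;
}

// X & Y with X = ~(A ^ B), Y = A: replacing A by all-ones inside X gives
// ~(0xFF ^ B) = B, so A & ~(A ^ B) should equal A & B.
bool andFoldHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      std::uint8_t Before = static_cast<std::uint8_t>(A & ~(A ^ B));
      std::uint8_t After = static_cast<std::uint8_t>(A & B);
      if (Before != After)
        return false;
    }
  return true;
}

// The same substitution is not valid for arbitrary X: with X = Y + 1 and
// Y = 1, (Y + 1) | Y is 3 while (0 + 1) | Y is 1.
bool naiveReplacementFailsForAdd() {
  unsigned Y = 1;
  return ((Y + 1) | Y) != ((0u + 1) | Y);
}
```

Both checks pass for every pair of 8-bit inputs, which matches the `A & ~(A ^ B) --> A & B` example quoted in the message.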
[clang][Interp] Add inline descriptor to global variables (#72892)
Some time ago, I did a similar patch for local variables. Initializing global variables can fail as well:

```c++
constexpr int a = 1/0;
static_assert(a == 0);
```

... would succeed in the new interpreter, because we never saved the fact that `a` has not been successfully initialized.
Commit 5bb99ed
[NFC] Update .git-blame-ignore-revs for compiler-rt builtins (#79803)
Commit 6f35f1d
[NFC] Add compiler-rt:* to .github/new-prs-labeler.yml (#79872)
After this change, all current compiler-rt:* labels on GitHub are covered.
Commit 9594746
[clang][dataflow] Extend debug output for `Environment`. (#79982)
* Print `ReturnLoc`, `ReturnVal`, and `ThisPointeeLoc` if applicable.
* For entries in `LocToVal` that correspond to declarations, print the names of the declarations next to them.

I've removed the FIXME because all relevant fields are now being dumped. I'm not sure we actually need the capability for the caller to specify which fields to dump, so I've simply deleted this part of the comment.

Some examples of the output:

![image](https://github.com/llvm/llvm-project/assets/29098113/17d0978f-b86d-4555-8a61-d1f2021f8d59)
![image](https://github.com/llvm/llvm-project/assets/29098113/021dbb24-5fe2-4720-8a08-f48dcf4b88f8)
Commit c83ec84
[AMDGPU]: Fix type signatures for wmma intrinsics, NFC (#80087)
Make the wmma intrinsic type signatures canonical. We need a type signature as long as the type is not fixed. However, when an argument's type matches a previous argument's type, we do not need the signature for this argument. This patch fixes three general cases:
1. add missing signatures
2. remove signatures for matching arguments
3. reorder the signatures: the return type signature should always appear first
Commit 3564666
[clang] static operators should evaluate object argument (reland) (#80108)
This re-applies 30155fc with a fix for clangd.

### Description

clang currently doesn't evaluate the object argument of `static operator()` and `static operator[]`. For example:

```cpp
#include <iostream>

struct Foo {
    static int operator()(int x, int y) {
        std::cout << "Foo::operator()" << std::endl;
        return x + y;
    }
    static int operator[](int x, int y) {
        std::cout << "Foo::operator[]" << std::endl;
        return x + y;
    }
};

Foo getFoo() {
    std::cout << "getFoo()" << std::endl;
    return {};
}

int main() {
    std::cout << getFoo()(1, 2) << std::endl;
    std::cout << getFoo()[1, 2] << std::endl;
}
```

`getFoo()` is expected to be called, but clang (17.0.6) currently doesn't call it. This PR fixes this issue.

Fixes #67976, reland #68485.

### Walkthrough

- **clang/lib/Sema/SemaOverload.cpp**
  - **`Sema::CreateOverloadedArraySubscriptExpr` & `Sema::BuildCallToObjectOfClassType`** Previously clang generated `CallExpr` for static operators, ignoring the object argument. In this PR `CXXOperatorCallExpr` is generated for static operators instead, with the object argument as the first argument.
  - **`TryObjectArgumentInitialization`** `const` / `volatile` objects are allowed for static methods, so that we can call static operators on them.
- **clang/lib/CodeGen/CGExpr.cpp**
  - **`CodeGenFunction::EmitCall`** CodeGen changes for `CXXOperatorCallExpr` with static operators: emit and ignore the object argument first, then emit the operator call.
- **clang/lib/AST/ExprConstant.cpp**
  - **`ExprEvaluatorBase::handleCallExpr`** Evaluation of static operators in constexpr also needs some small changes to work, so that the arguments won't be out of position.
- **clang/lib/Sema/SemaChecking.cpp**
  - **`Sema::CheckFunctionCall`** Code for argument checking also needs to be modified, or it will fail the test `clang/test/SemaCXX/overloaded-operator-decl.cpp`.
- **clang-tools-extra/clangd/InlayHints.cpp**
  - **`InlayHintVisitor::VisitCallExpr`** Now that the `CXXOperatorCallExpr` for static operators also has an object argument, we should also take care of this situation in clangd.

### Tests

- **Added:**
  - **clang/test/AST/ast-dump-static-operators.cpp** Verify the AST generated for static operators.
  - **clang/test/SemaCXX/cxx2b-static-operator.cpp** Static operators should be able to be called on const / volatile objects.
- **Modified:**
  - **clang/test/CodeGenCXX/cxx2b-static-call-operator.cpp**
  - **clang/test/CodeGenCXX/cxx2b-static-subscript-operator.cpp** Matching the new CodeGen.

### Documentation

- **clang/docs/ReleaseNotes.rst** Update release notes.

---------

Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Commit ee01a2c
[clang][dataflow] In the CFG visualization, mark converged blocks. (#79999)
Here's an example of the output: ![image](https://github.com/llvm/llvm-project/assets/29098113/63cd509e-c2a7-4794-b758-ea73812ff09f)
Commit 82324bc
[ADT] Use a constexpr version of llvm::bit_ceil (NFC) (#79709)
This patch replaces the template trick with a constexpr function that is more readable. Once C++20 is available in our code base, we can remove the constexpr function in favor of std::bit_ceil.
Commit b49b3dd
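As an illustration of what a constexpr replacement for a template trick can look like, here is a minimal sketch of a power-of-two round-up for 64-bit values. The name `bitCeilSketch` is hypothetical and this is not the code from the patch; it only mirrors the documented behaviour of `std::bit_ceil` for inputs up to 2^63.

```cpp
#include <cassert>
#include <cstdint>

// Returns the smallest power of two that is >= Value (1 for Value < 2).
// Assumption: Value <= 2^63; larger inputs wrap around to 0 in this sketch.
constexpr std::uint64_t bitCeilSketch(std::uint64_t Value) {
  if (Value < 2)
    return 1;
  std::uint64_t V = Value - 1;
  // Smear the highest set bit into every lower position.
  V |= V >> 1;
  V |= V >> 2;
  V |= V >> 4;
  V |= V >> 8;
  V |= V >> 16;
  V |= V >> 32;
  return V + 1;
}

// Being constexpr, the function can be checked at compile time.
static_assert(bitCeilSketch(5) == 8, "5 rounds up to 8");
static_assert(bitCeilSketch(8) == 8, "powers of two are unchanged");
```

A plain function like this can later be swapped for `std::bit_ceil` call by call, which is the migration path the commit message describes.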
[SYCL][Fusion] Enable fusion of rounded-range kernels (intel#12492)
Enable, test, and document the support for fusing rounded range kernels. This mostly worked already – we just have to query the original kernel's global size, and use that to compute the private memory size used for internalization.

---------

Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
Commit a3e2315
[InstCombine] Fold select with signbit idiom into fabs (#76342)
This patch folds:

```
((bitcast X to int) <s 0 ? -X : X) -> fabs(X)
((bitcast X to int) >s -1 ? X : -X) -> fabs(X)
((bitcast X to int) <s 0 ? X : -X) -> -fabs(X)
((bitcast X to int) >s -1 ? -X : X) -> -fabs(X)
```

Alive2: https://alive2.llvm.org/ce/z/rGepow
Commit f292f90
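The first fold above can be sanity-checked from C++ on a handful of values. The sketch below assumes `float` is a 32-bit IEEE 754 type and uses hypothetical helper names; it compares bit patterns rather than values so that `-0.0f` and `+0.0f` are distinguished. It also shows why the idiom tests the sign bit instead of using `X < 0`: the latter leaves `-0.0f` unchanged.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754");

// Reinterpret the float's bits as a signed integer (the "bitcast X to int").
std::int32_t bitsOf(float X) {
  std::int32_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  return Bits;
}

// (bitcast X to int) <s 0 ? -X : X
float signBitSelect(float X) { return bitsOf(X) < 0 ? -X : X; }

// Floating-point compare instead of a sign-bit test; not the same idiom.
float valueCompareSelect(float X) { return X < 0.0f ? -X : X; }

// True when the sign-bit idiom yields exactly the bits of fabs(X).
bool matchesFabs(float X) {
  return bitsOf(signBitSelect(X)) == bitsOf(std::fabs(X));
}
```

For `-0.0f` the sign-bit version returns `+0.0f`, as `fabs` does, while the value-compare version returns `-0.0f`.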
[SYCL] [NATIVECPU] Update OneAPI Construction Kit tag (intel#12543)
Updates the commit tag for the OCK.
Commit 565490d
[SYCL][Joint matrix tests] Fix test execution env setting for two tests (intel#12529)
This makes the two tests run when either a CPU or a GPU is present, instead of requiring both to be present.
Commit 6ec040e
[NFC] [clang-repl] Fix test failures due to incosistent target settings
See llvm/llvm-project#79261 for details. It shows that clang-repl uses a different target triple from clang, which may be problematic if clang-repl reads a BMI that clang generated for a different target triple. While the underlying issue is not easy to fix, this patch tries to make this test green so that it does not bother developers.
Commit d71831a
[SYCL][Fusion] Improve error messages on incompatible ND-ranges (intel#12524)
Show detailed error messages when users try to fuse kernels with incompatible ND-ranges, showing different errors for each different scenario. Also combine the validation and fusion logic to reduce the number of ND-ranges list traversals.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Commit 7d492f8
[RISCV][Isel] Remove redundant vmerge for the scalable vwadd(u).wv (#80079)
Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 is recommitted.

### Code

```
define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) {
    %mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
    %a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer
    %sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64>
    %ret = add <vscale x 8 x i64> %sa, %y
    ret <vscale x 8 x i64> %ret
}
```

### Before this patch

[Compiler Explorer](https://godbolt.org/z/xsoa5xPrd)

```
vwadd_wv_mask_v8i32:
    li a0, 42
    vsetvli a1, zero, e32, m4, ta, ma
    vmslt.vx v0, v8, a0
    vmv.v.i v12, 0
    vmerge.vvm v24, v12, v8, v0
    vwadd.wv v8, v16, v24
    ret
```

### After this patch

```
vwadd_wv_mask_v8i32:
    li a0, 42
    vsetvli a1, zero, e32, m4, ta, ma
    vmslt.vx v0, v8, a0
    vsetvli zero, zero, e32, m4, tu, mu
    vwadd.wv v16, v16, v8, v0.t
    vmv8r.v v8, v16
    ret
```
Commit dc5dca1
[mlir][memref] `memref.subview`: Verify result strides (#79865)
The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.

Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.

Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
Commit db49319
[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAllocOrder (#80015)
Previously we called ignoreCSRForAllocationOrder on every alias of every CSR which was expensive on targets like AMDGPU which define a very large number of overlapping register tuples. On such targets it is simpler and faster to call ignoreCSRForAllocationOrder once for every physical register.

Differential Revision: https://reviews.llvm.org/D146735
Commit f852503
Revert "[mlir][memref] `memref.subview`: Verify result strides" (#80116)
Reverts llvm/llvm-project#79865

I think there is a bug in the stride computation in `SubViewOp::inferResultType`. (Was already there before this change.)

Reverting this commit for now and updating the original pull request with a fix and more test cases.
Commit 96c907d
Turn on LLVM_USE_SPLIT_DWARF by default for Linux Debug build (intel#12527)
The split-dwarf feature can help reduce compile time and build footprint. See examples from: https://www.productive-cpp.com/improving-cpp-builds-with-split-dwarf/

Locally measured size reduction using a debug build shows around 20% reduction for a statically linked build.

Footprint reduction:
- after compile.py: 48G -> 37G (23%)
- after check-all: 170G -> 140G (18%)

Debuggability should not be affected. This should also help with compile time, especially for incremental builds.

-gsplit-dwarf is not yet supported on Windows, so it is not turned on there for now.
Commit 2f20e37
[SYCL] Ensure that RTDeviceBinaryImage instances have a unique image ID (intel#12526)
**Problem:**
Currently, the image id of an RTDeviceBinaryImage instance is simply the pointer value of the underlying pi_device_binary (in [getImageID(](https://github.com/intel/llvm/blob/sycl/sycl/source/detail/device_binary_image.hpp#L221))). However, consider the following scenario:
1) We create a device image
2) Put it into the cache
3) Destroy the image (when it goes out of scope)
4) Create another image that _happens to be created at the same memory address_ (thus having the same image ID)

This causes two instances of RTDeviceBinaryImage to share the same image id, which ends up causing a collision in the KernelProgramCache.

**Solution (Proposed in this PR)**
Have a counter in RTDeviceBinaryImage that increments upon instantiation of this class. The counter value is added to the image id to ensure that no two instances have the same ID.

**Alternative Solutions**
1. Remove the entry from the KernelProgramCache when the image is destroyed. This solution would require more work as the KernelProgramCache currently [does not support arbitrary element-wise eviction](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/KernelProgramCache.md#in-memory-cache-eviction) (eviction follows an LRU strategy when cache size exceeds the threshold). Moreover, I expect this to have the additional performance overhead of locking the cache and evicting. The proposed solution is much simpler.
Commit 04ff5b8
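The counter-based approach described in the RTDeviceBinaryImage commit can be sketched as follows. `ImageSketch` and its members are hypothetical stand-ins, not the real SYCL runtime classes, and the sketch uses the counter value alone as the ID, whereas the commit message says the counter is added to the existing pointer-based ID.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an image class whose ID must stay unique even
// when a later instance is constructed at a previously used address.
class ImageSketch {
public:
  ImageSketch() : Id(nextId()) {}
  std::uint64_t getImageID() const { return Id; }

private:
  // A process-wide counter; fetch_add makes concurrent construction safe.
  static std::uint64_t nextId() {
    static std::atomic<std::uint64_t> Counter{0};
    return Counter.fetch_add(1, std::memory_order_relaxed);
  }
  std::uint64_t Id;
};

// Replays the scenario from the commit message: one image is destroyed and
// another is created afterwards, possibly in the same storage.
bool idsDifferAcrossLifetimes() {
  std::uint64_t FirstId;
  {
    ImageSketch First;
    FirstId = First.getImageID();
  } // First is destroyed here.
  ImageSketch Second; // May reuse First's stack slot.
  return Second.getImageID() != FirstId;
}
```

Because the ID no longer depends on the address, a cache keyed on it cannot confuse the two instances, at the cost of one atomic increment per construction.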
[clang][Interp] Support arbitrary precision constants (#79747)
Add (de)serialization support for them, like we do for Floating values.
Commit 64a849a
Add support of param type for transform.structured.tile_using_forall (#72097)
Make transform.structured.tile_using_forall be able to take param type tile sizes.

Examples:

```
%tile_sizes = transform.param.constant 16 : i64 -> !transform.param<i64>
transform.structured.tile_using_forall %matmul tile_sizes [%tile_sizes : !transform.param<i64>, 32] ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
```

```
%c10 = transform.param.constant 10 : i64 -> !transform.any_param
%c20 = transform.param.constant 20 : i64 -> !transform.any_param
%tile_sizes = transform.merge_handles %c10, %c20 : !transform.any_param
transform.structured.tile_using_forall %matmul tile_sizes *(%tile_sizes : !transform.any_param) ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
```
Commit d439f36
[SME] Stop RA from coalescing COPY instructions that transcend beyond smstart/smstop. (#78294)
This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that expands to a 'nop', but which stops the register allocator from coalescing a COPY node when its use/def crosses a SMSTART or SMSTOP instruction.

For example:

    %0:fpr64 = COPY killed $d0
    undef %2.dsub:zpr = COPY %0 // <- Do not coalesce this COPY
    ADJCALLSTACKDOWN 0, 0
    MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
    $d0 = COPY killed %0
    BL @use_f64, csr_aarch64_aapcs

If the COPY were coalesced, that would lead to:

    $d0 = COPY killed %0

being replaced by:

    $d0 = COPY killed %2.dsub

which means the whole ZPR reg would be live up to the call, causing the MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:

    str q0, [sp] // 16-byte Folded Spill
    smstop sm
    ldr z0, [sp] // 16-byte Folded Reload
    bl use_f64

which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use the 'mul vl' addressing modes to access the spill location.

By disabling the coalescing, we get the desired results:

    str d0, [sp, #8] // 8-byte Folded Spill
    smstop sm
    ldr d0, [sp, #8] // 8-byte Folded Reload
    bl use_f64
Commit dd73666
[RISCV][MC] Add MC layer support for the experimental zabha extension (#80005)
This patch implements the zabha (Byte and Halfword Atomic Memory Operations) v1.0-rc1 extension. See also https://github.com/riscv/riscv-zabha/blob/v1.0-rc1/zabha.adoc.
Commit 89f87c3
[mlir][transform] Add elementwise criteria to `match.structured.body` (#79626)
As far as I am aware, there is no simple way to match on elementwise ops. I propose to add an `elementwise` criterion to the `match.structured.body` op. My only hesitation is that elementwise is determined not only by the body but also by the indexing maps. So if others find this too awkward, I can implement a separate match op instead.
Commit 488f88b
[mlir][ArmSME] Support 2-way widening outer products (#78975)
This patch introduces support for 2-way widening outer products. This enables the fusion of 2 'arm_sme.outerproduct' operations that are chained via the accumulator into a 2-way widening outer product operation.

Changes:
- Add 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for 2-way variants. These map to instruction variants added in SME2 and use different intrinsics. Intrinsics are already implemented for widening variants from SME1.
- Adds the following operations:
  - fmopa_2way, fmops_2way
  - smopa_2way, smops_2way
  - umopa_2way, umops_2way
- Implements conversions for the above ops to intrinsics in ArmSMEToLLVM.
- Adds a pass 'arm-sme-outer-product-fusion' that fuses 'arm_sme.outerproduct' operations.

For a detailed description of these operations see the 'arm_sme.fmopa_2way' description.

The reason for introducing many operations rather than one is the signed/unsigned variants can't be distinguished with types (e.g., ui16, si16) since 'arith.extui' and 'arith.extsi' only support signless integers. A single operation would require this information and an attribute (for example) for the sign doesn't feel right if floating-point types are also supported where this wouldn't apply. Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32) introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but no subtract variant. Whilst these are not supported in this patch, it felt simpler to have separate ops for add/subtract given this.
Commit 95ef8e3
[mlir][vector] Disable transpose -> shuffle lowering for scalable vectors (#79979)
vector.shuffle is not supported for scalable vectors (outside of splats)
Commit 88610b7
[mlir][memref] `memref.subview`: Verify result strides
The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.

Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.

Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
* Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering.

Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
Commit ce7cc72
[SYCL][COMPAT] Force device function to be inlined (intel#12550)
Due to the way the inliner works, the launched function may become very large and exceed the inline threshold. This results in a short kernel that only calls one function. The patch adds an always_inline attribute at the call site to force the user function to be inlined into the SYCL kernel, reducing overhead.

Signed-off-by: Victor Lomuller <victor@codeplay.com>
Commit e121c88
Merge from 'main' to 'sycl-web' (46 commits)
CONFLICT (content): Merge conflict in clang/lib/Basic/Targets/NVPTX.cpp
CONFLICT (content): Merge conflict in clang/test/Driver/cuda-cross-compiling.c
Commit 9bf5d5c
Commit db1fbd6
[GitHub][workflows] Add buildbot information comment to first merged PR from a new contributor (#78292)
This change adds a comment to the first PR from a new contributor that is merged, which tells them what to expect post merge from the build bots. How they will be notified, where to ask questions, that you're more likely to be reverted than in other projects, etc. The information overlaps with, and links to https://llvm.org/docs/MyFirstTypoFix.html#myfirsttypofix-issues-after-landing-your-pr. So that users who simply read the email are still aware, and know where to follow up if they do get reports.

To do this, I have added a hidden HTML comment to the new contributor greeting comment. This workflow will look for that to tell if the author of the PR was a new contributor at the time they opened the merge. It has to be done this way because as soon as the PR is merged, they are by GitHub's definition no longer a new contributor and I suspect that their author association will be "contributor" instead. I cannot 100% confirm that without a whole lot of effort and probably breaking GitHub's terms of service, but it's fairly cheap to work around anyway.

It seems rare / almost impossible to reopen a PR in llvm at least, but in case it does happen the buildbot info comment has its own hidden HTML comment. If we find this we will not post another copy of the same information.
Commit 44ba4c7
Commit 24a8041
[BDCE] Fix clearing of poison-generating flags
If the demanded bits of an instruction are full, we don't have to recurse to its users, but we may still have to clear flags on the instruction itself. Fixes llvm/llvm-project#80113.
Commit b210cbb
[mlir][IR] Add `RewriterBase::moveBlockBefore` and fix bug in `moveOpBefore` (#79579)
This commit adds a new method to the rewriter API: `moveBlockBefore`. This op is utilized by `inlineRegionBefore` and covered by dialect conversion test cases.

Also fixes a bug in `moveOpBefore`, where the previous op location was not passed correctly. Adds a test case to `test-strict-pattern-driver.mlir`.
Commit da784a2
Revert "[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAllocOrder (#80015)"
This reverts commit f852503. It was supposed to speed things up but llvm-compile-time-tracker.com showed a slight slow down.
Commit 942cc9a
[ValueTracking] Merge `cannotBeOrderedLessThanZeroImpl` into `computeKnownFPClass` (#76360)
This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into `computeKnownFPClass` to improve the signbit inference.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Commit 50e80e0
[AMDGPU] Stop combining arbitrary offsets into PAL relocs (#80034)
PAL uses ELF REL (not RELA) relocations which can only store a 32-bit addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI which require the upper 32 bits of a 64-bit address calculation to be correct. This means that it is not safe to fold an arbitrary offset into a GlobalAddressSDNode, so stop doing that. In practice this is mostly a problem for small negative offsets which do not work as expected because PAL treats the 32-bit addend as unsigned.
Commit c2c650f
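The failure mode for small negative offsets described in the PAL relocation commit can be illustrated with plain integer arithmetic. The sketch below is a simplified model with a hypothetical function name: it assumes a "high half" relocation resolves to the upper 32 bits of symbol address plus addend, and compares the result when the stored 32-bit addend is interpreted as signed versus unsigned. It is not a description of the actual PAL loader.

```cpp
#include <cassert>
#include <cstdint>

// Upper 32 bits of (SymbolAddr + addend), where the addend is only stored
// as 32 bits and is then either sign-extended or zero-extended.
std::uint32_t highHalf(std::uint64_t SymbolAddr, std::uint32_t StoredAddend,
                       bool TreatAddendAsSigned) {
  std::uint64_t Offset;
  if (TreatAddendAsSigned) {
    // Sign-extend: 0xFFFFFFFC becomes -4.
    Offset = static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(StoredAddend)));
  } else {
    // Zero-extend: 0xFFFFFFFC becomes 4294967292.
    Offset = static_cast<std::uint64_t>(StoredAddend);
  }
  return static_cast<std::uint32_t>((SymbolAddr + Offset) >> 32);
}
```

With a symbol at `0x100000000` and an intended offset of -4 (stored as `0xFFFFFFFC`), the signed reading gives a high half of 0 while the unsigned reading gives 1, so the computed address is off by 4 GiB.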
[clang][AMDGPU] Remove trialing whitespace in doc
Added by f2a78e6. Wouldn't normally bother but it's showing up in some CI checks, just want to reduce the noise.
Commit 0217d2e
[SYCL][Bindless] Unique sampler addressing modes per dimension (intel#12109)
Add the ability to specify unique addressing modes per dimension to the bindless_image_sampler.

Corresponding CUDA adapter UR PR: oneapi-src/unified-runtime#1168

---------

Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
Commit b897152
Commit fbbc822
Commit 7ff2327
[mlir] Fix debug output for passes that modify top-level operation. (#80022)
Make it so that when the top-level (root) operation itself is being modified, it is also used as the root for debug output in PatternApplicator.

Fix #80021
Commit 78e0cca
[mlir][EmitC] Add `verbatim` op (#79584)
The `verbatim` operation produces no results and the value is emitted as is followed by a line break ('\n' character) during translation.

Note: Use with caution. This operation can have arbitrary effects on the semantics of the emitted code. Use semantically more meaningful operations whenever possible. Additionally this op is *NOT* intended to be used to inject large snippets of code.

This operation can be used in situations where a more suitable operation is not yet implemented in the dialect or where preprocessor directives interfere with the structure of the code.

Co-authored-by: Marius Brehler <marius.brehler@iml.fraunhofer.de>
Commit e624648
[SPIR-V] Improve how lowering of formal arguments in SPIR-V Backend interprets a value of 'kernel_arg_type' (#78730)
The goal of this PR is to tolerate differences between description of formal arguments by function metadata (represented by "kernel_arg_type") and LLVM actual parameter types. A compiler may use "kernel_arg_type" of function metadata fields to encode detailed type information, whereas LLVM IR may utilize for an actual parameter a more general type, in particular, opaque pointer type. This PR proposes to resolve this by a fallback to LLVM actual parameter types during the lowering of formal function arguments in cases when the type can't be created by string content of "kernel_arg_type", i.e., when "kernel_arg_type" contains a type unknown for the SPIR-V Backend.

An example of the issue manifestation is https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/transcoding/KernelArgTypeInOpString.ll, where a compiler generates for the following kernel function detailed `kernel_arg_type` info in a form of `!{!"image_kernel_data*", !"myInt", !"struct struct_name*"}`, and in LLVM IR same arguments are referred to as `@foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData)`. Both definitions are correct, and the resulting LLVM IR is correct, but lowering stage of SPIR-V Backend fails to generate SPIR-V type.

```
typedef int myInt;

typedef struct {
  int width;
  int height;
} image_kernel_data;

struct struct_name {
  int i;
  int y;
};

void kernel foo(__global image_kernel_data* in, __global struct struct_name *outData, myInt out) {}
```

```
define spir_kernel void @foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData) ... !kernel_arg_type !7 ... {
entry:
  ret void
}
...
!7 = !{!"image_kernel_data*", !"myInt", !"struct struct_name*"}
```

The PR changes a contract of `SPIRVType *getArgSPIRVType(...)` in a way that it may return `nullptr` to signal that the metadata string content is not recognized, so corresponding comments are added and a couple of checks for `nullptr` are inserted where appropriate.
Commit 5a07774
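The fallback contract described for 5a07774 can be pictured with a small Python sketch. This is not the SPIR-V Backend's code: `KNOWN_TYPES`, `type_from_metadata`, and `lower_argument` are hypothetical names and the type strings are placeholders. It only illustrates a lookup that may return nothing (the `nullptr` case) and a caller that then falls back to the LLVM IR parameter type.

```python
from typing import Optional

# Hypothetical table of metadata type names the lowering can build directly.
KNOWN_TYPES = {"int": "spirv-int32", "float": "spirv-float32"}


def type_from_metadata(name: str) -> Optional[str]:
    """Return a type for a kernel_arg_type string, or None if unrecognized."""
    return KNOWN_TYPES.get(name)


def lower_argument(metadata_name: str, ir_type: str) -> str:
    """Prefer the metadata type; fall back to the IR parameter type."""
    ty = type_from_metadata(metadata_name)
    if ty is None:
        # Unknown metadata string such as "myInt": use the IR type instead.
        ty = f"from-ir:{ir_type}"
    return ty


print(lower_argument("int", "i32"))
print(lower_argument("myInt", "i32"))
```

The point of the design is that an unrecognized metadata string is treated as "no information" rather than as an error.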
[X86] i256-add - replace i386 triple X32 check prefixes with X86 and add gnux32 triple tests
Commit 53b9d47
[X86] mmx-arith.ll - replace X32 check prefixes with X86 + strip cfi noise
We try to only use X32 for gnux32 triple tests.
Commit 8d450b4
[X86] v4f32-immediate.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
Commit 00a6817
[X86] v2f32.ll - replace X32 check prefixes with X86 (and add common CHECK prefix)
We try to only use X32 for gnux32 triple tests.
Commit 929503e
Commit 3f5fcb5
[OpenMPIRBuilder] Do not call host runtime for GPU teams codegen (#79984)
This patch ensures that host runtime functions are not called for handling the OpenMP teams clause on the device. GPU code for pragma `omp target teams distribute parallel do` will require only one call to the OpenMP loop-worksharing GPU runtime. Support for it will be added later. This patch does not include changes required for handling `omp target teams` for the host side.
Commit b437014
[BDCE] Also drop poison-generating metadata
The comment was incorrect: !range also applies to calls, and we do need to drop it in some cases.
Commit cb6240d
[AsmParser] Add missing globals declarations in incomplete IR mode (#79855)
If `-allow-incomplete-ir` is enabled, automatically insert declarations for missing globals. If a global is only used in calls with the same function type, insert a function declaration with that type. Otherwise, insert a dummy i8 global. The fallback case could be extended with various heuristics (e.g. we could look at load/store types), but I've chosen to keep it simple for now, because I'm unsure to what degree this would really be useful without more experience. I expect that in most cases the declaration type doesn't really matter (note that the type of an external global specifies a *minimum* size only, not a precise size). This is a followup to llvm/llvm-project#78421.
Commit 5cc87b4
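The selection rule stated for 5cc87b4 (a function declaration when every use is a call with one common function type, otherwise a dummy `i8` global) can be sketched as follows. This is an illustration only, not the AsmParser implementation: `pick_declaration` and the `(kind, fn_type)` encoding of a use are hypothetical.

```python
def pick_declaration(uses):
    """Choose what to declare for a missing global.

    `uses` is a list of (kind, fn_type) pairs, where kind is "call" for a
    use as a callee and anything else (e.g. "load") for other uses.
    """
    if uses and all(kind == "call" for kind, _ in uses):
        fn_types = {fn_type for _, fn_type in uses}
        if len(fn_types) == 1:
            # Every use is a call with the same function type.
            return ("function", fn_types.pop())
    # Fallback described in the commit message: a dummy i8 global.
    return ("global", "i8")


print(pick_declaration([("call", "void (i32)"), ("call", "void (i32)")]))
print(pick_declaration([("call", "void (i32)"), ("load", None)]))
```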
[OpenMP] atomic compare weak : Parser & AST support (#79475)
This adds support for `#pragma omp atomic compare weak`. For now it covers Parser and AST support only.
---------
Authored-by: Sunil Kuravinakop <kuravina@pe28vega.us.cray.com>
Commit a74e9ce
[AArch64][SME] Fix inlining bug introduced in #78703 (#79994)
Calling a `__arm_locally_streaming` function from a function that is not a streaming-SVE function would lead to incorrect inlining. The issue didn't surface because the tests were not testing what they were supposed to test.
Commit 3abf55a
[llvm][InstCombine] bitcast bfloat half castpair bug (#79832)
Miscompilation arises due to instruction combining of cast pairs of the type `bitcast bfloat to half` + `<FPOp> bfloat to half` or `bitcast half to bfloat` + `<FPOp> half to bfloat`. For example `bitcast bfloat to half`+`fpext half to double` or `bitcast bfloat to half`+`fpext bfloat to double` respectively reduce to `fpext bfloat to double` and `fpext half to double`. This is an incorrect conversion as it assumes the representations of `bfloat` and `half` are equivalent because they have the same width. As a consequence, miscompilation arises. Fixes #61984
Commit d309261
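To see why the cast-pair folding fixed in d309261 is unsound, it helps to decode one 16-bit pattern under both formats. The sketch below uses only the Python standard library; the helper names are made up for illustration. `half` is IEEE binary16 (5 exponent bits, 10 mantissa bits), while `bfloat` is the upper 16 bits of an IEEE binary32 value (8 exponent bits, 7 mantissa bits).

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as IEEE binary16 (half)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bfloat_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16 (top half of a binary32)."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


bits = 0x3F80
print(half_bits_to_float(bits))    # 1.875
print(bfloat_bits_to_float(bits))  # 1.0
```

The same bits denote different values, so a `bitcast` between the two types cannot be merged into a following floating-point extension as if the source type did not matter.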
[llvm-rc] Support ARM64EC resource generation (#78908)
This is already supported in llvm-cvtres, so only a small change is needed.
Commit d55d72e
Commit d74619a
[mlir][ArmSME] Add initial SME vector legalization pass (#79152)
This adds a new pass (`-arm-sme-vector-legalization`) which legalizes vector operations so that they can be lowered to ArmSME. This initial patch adds decomposition for `vector.outerproduct`, `vector.transfer_read`, and `vector.transfer_write` when they operate on vector types larger than a single SME tile. For example, a [8]x[8]xf32 outer product would be decomposed into four [4]x[4]xf32 outer products, which could then be lowered to ArmSME. These three ops have been picked as supporting them alone allows lowering matmuls that use all ZA accumulators to ArmSME. For it to be possible to legalize a vector type it has to be a multiple of an SME tile size, but other than that any shape can be used. E.g. `vector<[8]x[8]xf32>`, `vector<[4]x[16]xf32>`, `vector<[16]x[4]xf32>` can all be lowered to four `vector<[4]x[4]xf32>` operations. In future, this pass will be extended with more SME-specific rewrites to legalize unrolling the reduction dimension of matmuls (which is not type-decomposition), which is why the pass has quite a general name.
Commit 042800a
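The type decomposition described for 042800a amounts to splitting a 2-D shape into tile-sized pieces. The sketch below is a hypothetical illustration rather than the MLIR pass: `decompose` is an invented helper, the sizes are the scalable multipliers written in brackets (for example `[8]x[8]`), and the `[4]x[4]` tile for `f32` is taken from the examples in the commit message.

```python
def decompose(rows: int, cols: int, tile: int = 4):
    """Return the (row, col) offsets of each tile covering a rows x cols shape.

    Sizes are scalable multipliers, e.g. 8 stands for [8]. The shape must be
    a multiple of the tile size in both dimensions.
    """
    if rows % tile or cols % tile:
        raise ValueError("shape is not a multiple of the tile size")
    return [(r, c) for r in range(0, rows, tile) for c in range(0, cols, tile)]


for shape in [(8, 8), (4, 16), (16, 4)]:
    print(shape, decompose(*shape))
```

Each of the three example shapes yields four offsets, matching the "four `vector<[4]x[4]xf32>` operations" mentioned in the message.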
[DAG] AddNodeIDCustom - call ShuffleVectorSDNode::getMask once instead of repeated getMaskElt calls.
Use a simpler for-range loop to append all shuffle mask elements
Commit 912cdd2
[X86] insertps-from-constantpool.ll - replace X32 check prefixes with X86 and expose address math
We try to only use X32 for gnux32 triple tests. Use no_x86_scrub_mem_shuffle so the test shows the updated shuffle intermediate and the +4 offset into the constant pool vector entry
Commit a82ca1c
[X86] divrem.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
Commit e4af212
[X86] divide-by-constant.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
Commit ed11f25
[X86] fold-vector-sext - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
Commit 824d073
[X86] cfguard - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
Commit 1d8c8f1
[X86] divrem8_ext.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
Commit 648eb7c
[SYCL][Fusion] Silence warning (intel#12555)
Silence unused variable warning which tripped post-commit checks for intel#12492. Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
Commit b8f9c8b
[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d)) (#79464)
We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16 uhadd(concat(a,c), concat(b,d)), which can lead to further simplifications.
Commit cf828ae
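The rewrite in cf828ae relies on `uhadd` being a lane-wise operation, so concatenating two results equals operating on the concatenated inputs. A minimal Python model follows, assuming `uhadd` is the unsigned halving add `(x + y) >> 1` computed without intermediate overflow; vectors are modelled as plain lists and the helper names are illustrative.

```python
def uhadd(a, b):
    """Lane-wise unsigned halving add: (x + y) >> 1 with no overflow."""
    return [(x + y) >> 1 for x, y in zip(a, b)]


def concat(lo, hi):
    """Concatenate two vectors, low half first."""
    return lo + hi


# Four 16-bit lanes per input, as in the v4i16 example.
a, b = [0, 1, 65535, 40000], [2, 1, 65535, 30001]
c, d = [7, 8, 9, 10], [1, 1, 1, 65535]

lhs = concat(uhadd(a, b), uhadd(c, d))
rhs = uhadd(concat(a, c), concat(b, d))
print(lhs == rhs)
```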
[X86][CodeGen] Set isReMaterializable = 1 for AVX broadcast load
Broadcast of a single float should not be any slower than loading 32B using vmovaps, so rematerializing it can help reduce register spills when register pressure is high.
Commit e3c9327
[AMDGPU][GFX12] Add tests for unsupported builtins (#78729)
__builtin_amdgcn_mfma* and __builtin_amdgcn_smfmac*
Commit f96e85b
[X86][MC] Support encoding/decoding for APX variant LZCNT/TZCNT/POPCNT instructions (#79954)
Two variants: promoted legacy, NF (no flags update). The syntax of NF instructions is aligned with GNU binutils. https://sourceware.org/pipermail/binutils/2023-September/129545.html
Commit d9e875d
Commit 817d0cb
[VPlan] Preserve original induction order when creating scalar steps.
Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.
Commit 9536a62
Commit ab87426
[mlir][IR] Send missing notifications when inlining a block (#79593)
When a block is inlined into another block, the nested operations are moved into another block and the `notifyOperationInserted` callback should be triggered. This commit adds the missing notifications for:
* `RewriterBase::inlineBlockBefore`
* `RewriterBase::mergeBlocks`
Commit c672b34
Commit 7e45cfd
[mlir][EmitC] Remove unused attribute from verbatim op (#80142)
The uses of the attribute were removed in code review of #79584, but its definition was inadvertently kept.
Commit 121a0ef
Commit cec24f0
[mlir][IR] Send missing notification when splitting a block (#79597)
When a block is split with `RewriterBase::splitBlock`, a `notifyBlockInserted` notification, followed by `notifyOperationInserted` notifications (for moving over the operations into the new block) should be sent. This commit adds those notifications.
Commit c2675ba
[ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79287)
This ensures the odd/even pseudo instructions are allocated to the same register range. This fixes #71763
Commit de75e50
[Driver] Fix erroneous warning for -fcx-limited-range and -fcx-fortran-rules. (#79821)
The options `-fcx-limited-range` and `-fcx-fortran-rules` were added in https://github.com/llvm/llvm-project/pull/70244. The code adding the options introduced an erroneous warning:

`$ clang -c -fcx-limited-range t1.c`
`clang: warning: overriding '' option with '-fcx-limited-range' [-Woverriding-option]`

and

`$ clang -c -fcx-fortran-rules t1.c`
`clang: warning: overriding '' option with '-fcx-fortran-rules' [-Woverriding-option]`

The warning doesn't make sense. This patch removes it.
Commit e538486
[AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294)
JumpThreading may perform AA queries while the dominator tree is not up to date, which may result in miscompilations. Fix this by adding a new AAQI option to disable the use of the dominator tree in BasicAA. Fixes llvm/llvm-project#79175.
Commit 4f32f5d
[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)
This PR moves the lowering of the math dialect later in the pipeline, because the math dialect is lowered correctly by `createConvertGpuOpsToNVVMOps` for the GPU target, and that conversion needs to run first.
Commit 74bf0b1
[clang] Represent array refs as `TemplateArgument::Declaration` (#80050)
This returns (probably temporarily) array-referring NTTP behavior to what it was prior to #78041 because ~~I'm fed up~~ I have no time to fix regressions.
Commit 9bf4e54
[MIRPrinter] Don't print space when there is no successor (#80143)
Extra space causes the checks generated by update_mir_test_checks to be unavailable.

```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
  ; CHECK-LABEL: name: foo
  ; CHECK: bb.0:
  ; CHECK-NEXT: successors:
  ; CHECK-NEXT: {{ $}}
  ; CHECK-NEXT: {{ $}}
  ; CHECK-NEXT: bb.1:
  ; CHECK-NEXT: RET 0, $eax
  bb.0:
    successors:
  bb.1:
    RET 0, $eax
...
```

The failure log is as follows:

```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
; CHECK-NEXT: {{ $}}
^
<stdin>:21:13: note: 'next' match was here
successors:
^
<stdin>:21:13: note: previous match ended here
successors:
```
Commit b7738e2
Revert "[mlir][complex] Prevent underflow in complex.abs (#79786)"
This reverts commit 4effff2. It makes `complex.abs(-1)` return `-1`.
Commit 70fb96a
[SYCL][Fusion] Handle GEPs that were canonicalized to byte offsets (intel#12557)
Upstream now canonicalizes constant GEPs to represent byte offsets, i.e. using `i8` as source element type. This PR adapts the internalization pass to this change by also remapping GEPs with a constant offset, if that offset is a multiple of the internalized accessor's element size.
Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
Commit 470e378
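The condition stated for 470e378 (remap a constant byte offset only when it is a multiple of the accessor's element size) reduces to a divisibility check. The following is a hedged sketch with an invented helper name, not the internalization pass itself.

```python
def remap_byte_offset(byte_offset: int, element_size: int):
    """Convert a byte offset to an element index, or None if not remappable."""
    if element_size <= 0:
        raise ValueError("element size must be positive")
    if byte_offset % element_size != 0:
        # The offset points into the middle of an element: leave it alone.
        return None
    return byte_offset // element_size


print(remap_byte_offset(8, 4))  # byte 8 of 4-byte elements is index 2
print(remap_byte_offset(6, 4))  # None
```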
[flang] Lower ASYNCHRONOUS variables and IO statements (#80008)
Finish plugging in ASYNCHRONOUS IO in lowering (GetAsynchronousId was not used yet). Add a runtime implementation for GetAsynchronousId (only the signature was defined). It always returns zero since the flang runtime "fakes" asynchronous IO (data transfers are always complete, see flang/docs/IORuntimeInternals.md). Update all runtime integer arguments and results for IDs to use the AsynchronousId int alias for consistency. In lowering, the asynchronous attribute is added on the hlfir.declare of an ASYNCHRONOUS variable, but nothing else is done. This is OK given the synchronous aspects of flang IO, but it would be safer to treat these variables as volatile (preventing code motion of related stores/loads) since the asynchronous data change can also be done by a C-defined user procedure (see 18.10.4 Asynchronous communication). Flang lowering does not give enough info for LLVM to do such code motion anyway (the variables that are passed in a call are not given the noescape attribute, so LLVM will assume any later opaque call may modify the related data and will not move loads/stores of such variables before/after calls, even if it could from a pure Fortran point of view without ASYNCHRONOUS).
Commit 4679132
Commit 47df391
Revert "[Clang][Sema] fix outline member function template with default align crash (#78400)" (#80144)
This reverts commit 7b33899. A regression was discovered here: llvm/llvm-project#78400 and the author requested a revert to give time to review.
Commit 6e6aa44
[mlir][mesh] Refactoring code organization, tests and docs (#79606)
* Split out `MeshDialect.h` from `MeshOps.h`; the new header defines the dialect class. This reduces include clutter if you care only about the dialect and not the ops.
* Expose functions `getMesh` and `collectiveProcessGroupSize`. These functions are useful for outside users of the dialect.
* Remove unused code.
* Remove examples and tests of the mesh.shard attribute in tensor encoding, per the decision that Spmdization would be performed on sharding annotations and there will be no tensors with sharding specified in the type. For more info see this RFC comment: https://discourse.llvm.org/t/rfc-sharding-framework-design-for-device-mesh/73533/81
Commit 31fc0a1
Move the PowerPC/PPCMergeStringPool work to initializer (#77352)
Currently, `PPCMergeStringPool` merges the global variables after the `AsmPrinter` initializer adds the global variables to its symbol list. This patch moves the merging work of `PPCMergeStringPool` to its initializer, just like what GlobalMerge does, to avoid adding merged global variables to the `AsmPrinter` symbol list.
Commit 1bab570
Fix: CMake Error at cmake/modules/LLVMExternalProjectUtils.cmake:86 (is_msvc_triple) (#80071)
Adding quotes around the `${target_triple}`
Fix: #78530
Commit c651b2b
[AST] Add dump() method to TypeLoc (#65484)
The ability to dump AST nodes is important to ad-hoc debugging, and the fact this doesn't work with TypeLoc nodes is an obvious missing feature in e.g. clang-query (`set output dump` simply does nothing). Having TypeLoc::dump(), and enabling DynTypedNode::dump() for such nodes seems like a clear win. It looks like this:

```
int main(int argc, char **argv);
FunctionProtoTypeLoc <test.cc:3:1, col:31> 'int (int, char **)' cdecl
|-ParmVarDecl 0x30071a8 <col:10, col:14> col:14 argc 'int'
| `-BuiltinTypeLoc <col:10> 'int'
|-ParmVarDecl 0x3007250 <col:20, col:27> col:27 argv 'char **'
| `-PointerTypeLoc <col:20, col:26> 'char **'
|   `-PointerTypeLoc <col:20, col:25> 'char *'
|     `-BuiltinTypeLoc <col:20> 'char'
`-BuiltinTypeLoc <col:1> 'int'
```

It dumps the lexically nested tree of type locs. This often looks similar to how types are dumped, but unlike types we don't look at desugaring e.g. typedefs, as their underlying types are not lexically spelled here.

---

Less clear is exactly when to include these nodes in existing text AST dumps rooted at (TranslationUnit)Decls. These already omit supported nodes sometimes, e.g. NestedNameSpecifiers are often mentioned but not recursively dumped. TypeLocs are a more extreme case: they're ~always more verbose than the current AST dump. So this patch punts on that, TypeLocs are only ever printed recursively as part of a TypeLoc::dump() call. It would also be nice to be able to invoke `clang` to dump a typeloc somehow, like `clang -cc1 -ast-dump`. But I don't know exactly what the best version of that is, so this patch doesn't do it.

---

There are similar (less critical!) nodes: TemplateArgumentLoc etc, these also don't have dump() functions today and are obvious extensions. I suspect that we should add these, and Loc nodes should dump each other (e.g. the ElaboratedTypeLoc `vector<int>::iterator` should dump the NestedNameSpecifierLoc `vector<int>::`, which dumps the TemplateSpecializationTypeLoc `vector<int>::` etc). Maybe this generalizes further to a "full syntactic dump" mode, where even Decls and Stmts would print the TypeLocs they lexically contain. But this may be more complex than useful.

---

While here, ConceptReference JSON dumping must be implemented. It's not totally clear to me why this implementation wasn't required before but is now...
Commit 8d1b1c9
[AArch64] MI Scheduler LDP combine follow up (#79003)
This is a follow up of 75d820d, adding more opcodes to the combine target hook enabling more LDP creation. Patch co-authored by Cameron McInally.
Commit 8841846
Add a release note for TypeLoc::dump() support; NFC
This amends 8d1b1c9 which added the functionality the release note refers to.
Commit e33dc6b
[AArch64] Use add_and_or_is_add for CSINC (#79552)
Both adds of 1 and add-like ors of 1 can be turned into csinc, which can help fold more instructions into a csinc.
Commit 5d7d89d
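The "add-like or" idea behind 5d7d89d rests on a simple identity: when two operands share no set bits, `a | b` equals `a + b`, so an `or` with 1 on a value whose low bit is known to be zero is an increment. The sketch below models AArch64 `csinc` as `cond ? a : b + 1`; the Python helper names are illustrative only.

```python
def csinc(cond: bool, a: int, b: int) -> int:
    """Model of csinc: select a when cond holds, otherwise b + 1."""
    return a if cond else b + 1


def is_add_like_or(a: int, b: int) -> bool:
    """An or behaves like an add when the operands have no common set bits."""
    return (a & b) == 0


for y in range(16):
    x = y << 1                    # low bit of x is known to be zero
    assert is_add_like_or(x, 1)
    assert (x | 1) == x + 1       # so the or can be treated as an add of 1
    assert csinc(False, 0, x) == (x | 1)

print(is_add_like_or(3, 1))       # False: 3 | 1 == 3, but 3 + 1 == 4
```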
[clang][Interp] Handle casts between complex types (#79269)
Just handle this like two primitive casts.
Commit 32c0048
[clang][Interp] Remove wrong * operator
classifyComplexElementType used to return a std::optional, seems like this was left in a PR and not re-tested. This broke build bots, e.g. https://lab.llvm.org/buildbot/#/builders/68/builds/67930
Commit dfd5a64
[AsmParser] Support non-consecutive global value numbers (#80013)
llvm/llvm-project#78171 added support for non-consecutive local value numbers. This extends the support for global value numbers (for globals and functions). This means that it is now possible to delete an unnamed global definition/declaration without breaking the IR. This is a lot less common than unnamed local values, but it seems like something we should support for consistency. (Unnamed globals are used a lot in Rust though.)
Commit f2df4bf
Commit 0cd8348
[clang][dataflow] fix assert in `Environment::getResultObjectLocation` (#79608)
When calling `Environment::getResultObjectLocation` with a CXXOperatorCallExpr that is a prvalue, we just hit an assert because no record was ever created.
---------
Co-authored-by: martinboehme <mboehme@google.com>
Commit 5c2da28
[Flang] Support NULL(procptr): null intrinsic that has procedure pointer argument. (#80072)
This PR adds support for the NULL intrinsic to have a procedure pointer argument.
Commit bd8bec2
Commit e34fd2e
Commit baf1b19
Revert "[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)"
This reverts commit 74bf0b1. The test always fails.
| mlir/test/Dialect/GPU/test-nvvm-pipeline.mlir:23:16: error: CHECK-PTX: expected string not found in input
| // CHECK-PTX: __nv_expf
https://lab.llvm.org/buildbot/#/builders/61/builds/53789
Commit 98dbc68
Revert "[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d))" (#80157)
Reverts llvm/llvm-project#79464 while figuring out why the tests are failing.
Commit 2907c63
Commit 6720e3a
[AArch64] Use DAG->isAddLike in add_and_or_is_add (#79563)
This allows it to work with disjoint or's as well as computing the known bits.
Commit d04ae1b
[Clang][test] Add fPIC when building shared library (#80065)
Fix linking error: "ld: error: can't create dynamic relocation R_X86_64_64 against local symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output"
Commit b929be2
Commit 16c4843
[Exegesis] Print epsilon value in the sched model inconsistency report (#80080)
Since I've formatted the epsilon value, I don't think it's necessary to escape it.
Commit 8241106
[lldb][DataFormatter][NFC] Use GetFirstValueOfLibCXXCompressedPair throughout formatters (#80133)
This avoids duplicating the logic to get the first element of a libc++ `__compressed_pair`. This will be useful in supporting upcoming changes to the layout of `__compressed_pair`.
Drive-by changes:
* Renamed `m_item` to `size_node` for readability; `m_item` suggests it's a member variable, which it is not.
Commit 08c0eb1
[lldb] Add support for large watchpoints in lldb (#79962)
This patch is the next piece of work in my Large Watchpoint proposal, https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more WatchpointResources which reflect what the hardware registers can cover. This means we can watch objects larger than 8 bytes, and we can watch unaligned address ranges. On a typical 64-bit target with 4 watchpoint registers you can watch 32 bytes of memory if the start address is doubleword aligned. Additionally, if the remote stub implements AArch64 MASK style watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2 size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to create a CompilerType of Array<UInt8> when the size of the watched region is greater than pointer-size and we don't have a variable type to use. For pointer-size and smaller, we can display the watched granule as an integer value; for larger-than-pointer-size we will display as an array of bytes. I have `watchpoint list` now print the WatchpointResources used to implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method that takes an enum flag mask WatchpointHardwareFeature and a user address and size, and returns a vector of WatchpointResources covering the request. It does not take into account the number of watchpoint registers the target has, or the number still available for use. Right now there is only one algorithm, which monitors power-of-2 regions of memory. For up to pointer-size, this is what Intel hardware supports. AArch64 Byte Address Select watchpoints can watch any number of contiguous bytes in a pointer-size memory granule; that is not currently supported, so if you ask to watch bytes 3-5, the algorithm will watch the entire doubleword (8 bytes). The newly default "modify" style means we will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was only run on Darwin when using the in-tree debugserver, which was a proxy for "debugserver supports MASK watchpoints". I'll be adding the aforementioned feature flag from the stub and enabling full mask watchpoints when a debugserver with that feature is enabled, and re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test but it's a great one, watching a 22-byte range that is unaligned and requires four 8-byte watchpoints to cover. I also added a unit test, WatchpointAlgorithmsTests, which has a number of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I think there are interesting alternative approaches to how we cover these; I note in the unit test that a user requesting a watch on address 0x12e0 of 120 bytes will be covered by two watchpoints today, 128-byte watchpoints at 0x1280 and at 0x1300. But it could be done with a 16-byte watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer false positives/private stops. As we try refining this one, it's helpful to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64 Ubuntu. I have not modified the Windows process plugins yet. I might try that as a standalone patch; I'd be making the change blind, but the necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty small so it might be obvious enough that I can change it and see what the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb remote serial protocol stub to communicate its watchpoint capabilities to lldb. I'll be doing that in a patch right after this is landed, having debugserver advertise its capability of AArch64 MASK watchpoints, and have ProcessGDBRemote add eWatchpointHardwareArmMASK to WatchpointAlgorithms so we can watch larger than 32-byte requests on Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple Watchpoints. This is all part of the goal, especially when we may be watching a larger memory range than the user requested, if they then add another watchpoint next to their first request, it may be covered by the same WatchpointResource (hardware watchpoint register). Also one "read" watchpoint and one "write" watchpoint on the same memory granule need to be handled, making the WatchpointResource cover all requests. As WatchpointResources aren't shared among multiple Watchpoints yet, there's no handling of running the conditions/commands/etc on multiple Watchpoints when their shared WatchpointResource is hit. The goal beyond "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint behavior and commands. I have a feeling I may be slowly chipping away at this for a while. rdar://108234227
Commit 57c66b3
Commit 35a0089
[Libomptarget] Remove handling of old ctor / dtor entries (#80153)
Summary: A previous patch removed creating these entries in clang in favor of the backend emitting a callable kernel and having the runtime call that if present. The support for the old style was kept around in LLVM 18.0, but now that we have forked to 19.0 we should remove the support. The effect is that old-style constructors in an application that links against a newer libomptarget will no longer be called. In that case, users can either recompile or use the `libomptarget.so.18` that comes with the previous release.
Commit 2542876
[libc++abi] Add temporary workaround to unblock Chrome
Chrome rolls libc++ and libc++abi as separate projects. As a result, they may not always be updated in lockstep, and this can lead to build failures when mixing libc++ that doesn't have <__thread/support.h> with libc++abi that requires it. This patch adds a workaround to make libc++abi work with both versions. While Chrome's setup is not supported, this workaround will allow them to go back to green and do the required work needed to roll libc++ and libc++abi in lockstep. This workaround will be short-lived -- I have a reminder to go back and remove it by EOW.
Commit 372f7dd
Add extra printing to TestWatchpointCount.py to debug CI fail
The way the locals are laid out on the stack on x86-64 Debian is resulting in a test failure with the new large watchpoint support. Collecting more logging before I revert/debug it.
Commit dad50fe
[DirectX][docs] Architecture and design philosophy of DXIL support
This documents some of the architectural direction for DXIL and tries to provide a bit of a map for where to implement different aspects of DXIL support. Pull Request: llvm/llvm-project#78221
Commit 151559c
[SYCL][ESIMD] Implement unified memory API for scatter(usm, ...) (intel#12510)
This implements the unified memory API for scatter with USM pointers.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Commit 0bf2e66
[lld] enable fixup chains by default (#79894)
Enable chained fixups in lld when all platform and version criteria are met. This is an attempt at simplifying the logic used in ld 907: https://github.com/apple-oss-distributions/ld64/blob/93d74eafc37c0558b4ffb88a8bc15c17bed44a20/src/ld/Options.cpp#L5458-L5549 Some changes were made to simplify the logic:
- only enable chained fixups for macOS from 13.0 to avoid the arch check
- only enable chained fixups for iphonesimulator from 16.0 to avoid the arch check
- don't enable chained fixups for not specifically listed platforms
- don't enable chained fixups for arm64_32
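The simplified rules listed above can be read as a small decision function. The sketch below is only an illustration in Python: the function name, the platform strings, and the table are hypothetical, and the table contains only the two platforms this commit message names, so the real platform list in lld may well be longer.

```python
# Hypothetical sketch of the rules stated in the commit message; this is not
# lld's code, and the table holds only the platforms the message mentions.
MIN_VERSION_FOR_CHAINED_FIXUPS = {
    "macos": (13, 0),
    "iphonesimulator": (16, 0),
}


def chained_fixups_by_default(platform, min_version, arch):
    """Return True if the stated rules would enable chained fixups."""
    if arch == "arm64_32":
        return False  # never enabled for arm64_32
    threshold = MIN_VERSION_FOR_CHAINED_FIXUPS.get(platform)
    if threshold is None:
        return False  # platform is not specifically listed
    # Tuples compare element by element, so (13, 1) >= (13, 0) holds.
    return tuple(min_version) >= threshold
```

Keying the decision on the deployment target alone is what lets the arch check disappear for the listed platforms, with arm64_32 kept as the one explicit exception.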
Commit 775c285
Collecting more logging to debug CI bots
Watchpoint test fails on arm-ubuntu and x86-64-debian
Commit cf2533e
Commit 09fc333
Add logging to WatchpointAlgorithm
When verbose lldb watch channel is enabled, print the user requested watchpoint and the resources we've broken it up into.
Commit d6e1ae2
[CI][NFC] Unify naming scheme for SYCL workflows. (intel#12525)
All GitHub Actions workflows added by intel/llvm project follow similar naming notation:
1. Name starts with `sycl` prefix.
2. Use dash `-` instead of underscore `_` to separate words.
Commit 16a368c
Commit fa42589
Revert "[CI][NFC] Unify naming scheme for SYCL workflows." (intel#12567)
Reverts intel#12525 In addition to file renaming, we need to update file names referenced inside the workflow files.
Commit 1b5daa8
[SYCL][ESIMD][E2E] Disable two LSC tests on DG2 (intel#12565)
They started failing in the recent driver update. I can't reproduce it locally with the same driver version but the hardware we have is a little different, maybe that's why. I made an internal tracker for this. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Commit 7348207
[clang-tidy] Remove cert-dcl21-cpp check (#80181)
Deprecated since clang-tidy 17. The rule DCL21-CPP has been removed from the CERT guidelines, so it does not make sense to keep the check. Fixes #42788 Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
Commit 4cb13f2
[lldb][progress][NFC] Add unit test for progress reports (#79533)
This test is being added as a way to check the behaviour of how progress events are broadcasted when reports are started and ended with the current implementation of progress reports. Here we're mainly checking and ensuring that the current behaviour is that progress events are broadcasted individually and placed in the event queue in order of their creation and deletion.
Commit 51e0d1b
Commit c84f2ba
[flang] DEALLOCATE(pointer) should use PointerDeallocate() (#79702)
A DEALLOCATE statement on a pointer should always use PointerDeallocate() in the runtime, even if there's no STAT= or polymorphism or derived types, so that it can be checked to ensure that it is indeed a whole allocation of a pointer.
Commit dc15524
[flang][runtime] Add limit check to MOD/MODULO (#80026)
When testing the arguments to see whether they are integers, check first that they are within the maximum range of a 64-bit integer; otherwise, a value of larger magnitude will set an invalid operand exception flag.
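The ordering described here (range check first, integer test second) can be illustrated with a small Python analogue. This is only a sketch: the function name is hypothetical, and Python raises an exception on an unrepresentable conversion where the Fortran runtime would instead set a floating-point exception flag.

```python
INT64_LIMIT = 2.0 ** 63  # magnitude bound of a signed 64-bit integer


def fits_int64_and_is_integral(x):
    """Return True if x is within 64-bit integer range and has no fraction.

    The range test comes first; `and` short-circuits, so the conversion on
    the right is never attempted for huge values, infinities, or NaN.
    """
    return abs(x) < INT64_LIMIT and x == float(int(x))
```

Swapping the two operands would attempt `int(x)` on infinities and NaN, which fails in Python and loosely mirrors the invalid-operand problem the commit avoids.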
Commit dbf547f
[flang][preprocessor] Replace macros in some #include directives (#80039)
Ensure that #include FOO undergoes macro replacement. But, as is the case with C/C++, continue to not perform macro replacement in a #include directive with <angled brackets>.
Commit 6086007
[flang] Downgrade a too-strong error message to a warning (#80095)
When a compilation unit has an interface to an external subroutine or function, and there is a global object (like a module) with the same name, we're emitting an error. This is too strong, since the program will still build. This comes up in real applications, too. Downgrade the error to a warning.
Commit 2ba94bf
Revert "[lldb][progress][NFC] Add unit test for progress reports (#79533)"
This reverts commit 51e0d1b. That commit breaks a unit test:
```
Failed Tests (1):
  lldb-unit :: Core/./LLDBCoreTests/4/8
```
Commit 209fe1f
[SYCL] Fix resource leak related to SYCL_FALLBACK_ASSERT (intel#12532)
intel#6837 enabled asynchronous buffer destruction for buffers constructed without host data. However, initial fallback assert implementation in intel#3767 predates it and as such had to place the buffer inside `queue_impl` to avoid unintended synchronization point. I don't know if there was the same crash observed on the end-to-end test added as part of this PR prior to intel#3767, but it doesn't even matter because the "new" implementation is both simpler and doesn't result in a crash. I suspect that without it (with the buffer for fallback assert implementation being a data member of `sycl::queue_impl`) we had a cyclic dependency somewhere leading to resource leak and ultimately to the assert in `DeviceGlobalUSMMem::~DeviceGlobalUSMMem()`.
Commit b478d2f
Fix conflict resolution fa36da7
The conflict resolution removed SYCL-related changes; this commit brings them back.
Commit 99852c0
Revert "Add one more verbose watchpoint logging for arm-ubuntu"
This reverts commit c84f2ba.
Commit 9d41fba
Revert "Enable verbose watch log channel to debug x86-64-debian bot"
This reverts commit fa42589.
Commit 19f429a
Revert "Add logging to WatchpointAlgorithm"
This reverts commit d6e1ae2.
Commit e95250c
Revert "Collecting more logging to debug CI bots"
This reverts commit cf2533e.
Commit 46643e0
Revert "Add extra printing to TestWatchpointCount.py to debug CI fail"
This reverts commit dad50fe.
Commit cc4af03
Revert "[lldb] Add support for large watchpoints in lldb (#79962)"
This reverts commit 57c66b3.
Commit d347c56
[SYCL][E2E] Disable USM/usm_pooling.cpp on gpu-intel-dg2 (intel#12564)
See intel#12397, the test is flaky in post-commit.
Commit 85e461e
Commit 742f88e
[clang][DependencyScanner] Remove unused -fmodule-map-file arguments (#80090)
Since we already add a `-fmodule-map-file=` argument for every used modulemap, we can remove all `ModuleMapFiles` entries before adding them. This reduces the number of module variants when `-fmodule-map-file=` appears on the original command line.
Commit c003d85
[LSR] Add a test case mentioned in review
As mentioned in llvm/llvm-project#74747, this case is triggering a particularly high cost trip count expansion.
Commit 5282202
[Github] Build PGO optimized toolchain in container (#80096)
This patch adjusts the Docker container intended for CI use to contain a PGO+ThinLTO+BOLT optimized clang. The toolchain is built within a Github action and takes ~3.5 hours. No caching is utilized. The current PGO optimization is fairly minimal, only running clang over hello world. This can be adjusted as needed.
Commit 9107904
[ORC] Merge MaterializationResponsibility notifyEmitted and addDependencies
Removes the MaterializationResponsibility::addDependencies and addDependenciesForAll methods, and transfers dependency registration to the notifyEmitted operation. The new dependency registration allows dependencies to be specified for arbitrary subsets of the MaterializationResponsibility's symbols (rather than just single symbols or all symbols) via an array of SymbolDependenceGroups (pairs of symbol sets and corresponding dependencies for that set). This patch aims to both improve emission performance and simplify dependence tracking. By eliminating some states (e.g. symbols having registered dependencies but not yet being resolved or emitted) we make some errors impossible by construction, and reduce the number of error cases that we need to check. NonOwningSymbolStringPtrs are used for dependence tracking under the session lock, which should reduce ref-counting operations, and intra-emit dependencies are resolved outside the session lock, which should provide better performance when JITing concurrently (since some dependence tracking can happen in parallel). The Orc C API is updated to account for this change, with the LLVMOrcMaterializationResponsibilityNotifyEmitted API being modified and the LLVMOrcMaterializationResponsibilityAddDependencies and LLVMOrcMaterializationResponsibilityAddDependenciesForAll operations being removed.
Commit ebe8733
[libc] Fix condition ordering in scanf (#80083)
The inf and nan string index bounds checks were after the index was being used. This patch moves the index usage to the end of the condition. Fixes #79988
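The ordering issue is the usual short-circuit pattern: the bounds check has to be evaluated before the index is used. Below is a minimal Python analogue with hypothetical function names. In C the wrong order reads out of bounds silently, whereas Python raises `IndexError`, which makes the difference easy to observe.

```python
def matches_at(text, index, expected):
    """Bounds check first: `and` short-circuits, so text[index] is only
    evaluated when index is in range."""
    return index < len(text) and text[index] == expected


def matches_at_wrong_order(text, index, expected):
    """Index used before the bounds check: fails for out-of-range indices."""
    return text[index] == expected and index < len(text)
```

For in-range indices both versions agree, so the bug only shows up at the end of the input, for example when matching the tail of "inf" or "nan".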
Commit 22773e5
[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530)
This patch adds support for common and local common symbols in the TOC for AIX. Note that we need to update isVirtualSection: a common symbol in the TOC will have the symbol type XTY_CM and will be initialized when placed in the TOC, so sections with this type are no longer virtual.
Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
Commit a03a6e9
[analyzer] Unbreak [[clang::suppress]] on checkers without decl-with-issue. (#79398)
There are currently a few checkers that don't fill in the bug report's "decl-with-issue" field (typically a function in which the bug is found). The new attribute `[[clang::suppress]]` uses decl-with-issue to reduce the size of the suppression source range map so that it doesn't need to be built for the entire translation unit. I'm already seeing a few problems with this approach, so I'll probably redesign it at some point, as it looks like a premature optimization. Not only should checkers not be required to pass decl-with-issue (consider clang-tidy checkers that never had such a notion), but it's also not necessarily uniquely determined (consider leak suppressions at allocation site). For now I'm adding a simple stop-gap solution that falls back to building the suppression map for the entire TU whenever decl-with-issue isn't specified. This won't happen in the default setup because, luckily, all default checkers do provide decl-with-issue.
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Commit 56e241a
[AArch64][SVE2] Generate urshr rounding shift rights (#78374)
Add a new node `AArch64ISD::URSHR_I_PRED`. `srl(add(X, 1 << (ShiftValue - 1)), ShiftValue)` is transformed to `urshr`, or to `rshrnb` (as before) if the result is truncated. `uzp1(rshrnb(uunpklo(X),C), rshrnb(uunpkhi(X), C))` is converted to `urshr(X, C)` (tested by the wide_trunc tests). Pattern matching code in `canLowerSRLToRoundingShiftForVT` is taken from prior code in rshrnb. It returns true if the add has NUW or if the number of bits used in the return value allows us to not care about the overflow (tested by rshrnb test cases).
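The NUW requirement can be seen with a small model. The Python sketch below compares a rounding shift right computed without intermediate overflow against the `srl(add(...))` form evaluated with wrap-around in a fixed element width. Both function names and the 8-bit width in the usage note are illustrative assumptions, not part of the patch.

```python
def rounding_shift_right(x, shift, bits):
    """Unsigned rounding shift right with no intermediate overflow:
    the truncated quotient plus the highest bit that was shifted out."""
    assert 1 <= shift <= bits
    x &= (1 << bits) - 1
    return (x >> shift) + ((x >> (shift - 1)) & 1)


def srl_of_wrapping_add(x, shift, bits):
    """srl(add(x, 1 << (shift - 1)), shift) where the add wraps at `bits`."""
    mask = (1 << bits) - 1
    return (((x & mask) + (1 << (shift - 1))) & mask) >> shift
```

With 8-bit elements, `x = 255` and `shift = 1` give 128 for the rounding shift but 0 for the wrapping form, because `255 + 1` overflows. Whenever the add does not overflow, the two agree, which is the case the no-unsigned-wrap condition covers.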
Commit 1d14323
Commit 4eee045
Commit 0f728a0
[NVPTX] improve Boolean ISel (#80166)
Add TableGen patterns to convert more instructions to boolean expressions:
- **mul -> and/or**: i1 multiply instructions currently cannot be selected causing the compiler to crash. See llvm/llvm-project#57404
- **select -> and/or**: Converting selects to and/or can enable more optimizations. `InstCombine` cannot do this as aggressively due to poison semantics.
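The rewrites rest on ordinary identities over 1-bit values, which a truth-table check makes concrete. The Python sketch below is only an illustration: the helper names are hypothetical, the specific `select` forms shown are common textbook cases rather than a list of the patterns the patch adds, and poison semantics are not modeled at all.

```python
from itertools import product


def select(cond, if_true, if_false):
    """Model of a select on 1-bit operands (0 or 1)."""
    return if_true if cond else if_false


def i1_mul(a, b):
    """Multiply truncated to a single bit."""
    return (a * b) & 1


def boolean_identities_hold():
    """Check mul == and, select(a, b, 0) == and, select(a, 1, b) == or."""
    for a, b in product((0, 1), repeat=2):
        if i1_mul(a, b) != (a & b):
            return False
        if select(a, b, 0) != (a & b):
            return False
        if select(a, 1, b) != (a | b):
            return False
    return True
```

Because the domain is only four input pairs, exhaustive checking is enough to confirm the identities for plain 0/1 values.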
Commit 5e3ae4c
[RISCV] Improve legalization of e8 m8 VL>256 shuffles (#79330)
If we can't produce a large enough index vector in i8, we may need to legalize the shuffle (via scalarization - which in turn gets lowered into stack usage). This change makes two related changes:
* Deferring legalization until we actually need to generate the vrgather instruction. With the new recursive structure, this only happens when doing the fallback for one of the arms.
* Check the actual mask values for something outside of the representable range.
Both are covered by recently added tests.
Commit ff53d50
[lldb][NFCI] Remove m_being_created from Breakpoint classes (#79716)
The purpose of m_being_created in these classes was to prevent broadcasting an event related to these Breakpoints during the creation of the breakpoint (i.e. in the constructor). In Breakpoint and Watchpoint, m_being_created had no effect. That is to say, removing it does not change behavior. However, BreakpointLocation does still use m_being_created. In the constructor, SetThreadID is called which does broadcast an event only if `m_being_created` is false. Instead of having this logic be roundabout, the constructor instead calls `SetThreadIDInternal`, which actually changes the thread ID. `SetThreadID` also will call `SetThreadIDInternal` in addition to broadcasting a changed event.
Commit db68e92
[lsr][term-fold] Restrict transform to low cost expansions (#74747)
This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. It's also now come up as a possibility in a regression twice in two days, so I'd like to get this in to close out the possibility if nothing else. The original review dropped the threshold for short trip count loops. I will return to that in a separate review if this lands.
Commit f264da4
Partial revert "[HIP] Fix -mllvm option for device lld linker" (#80202)
This partially reverts commit aa964f1 because it caused perf regressions in rccl due to drop of -mllvm -amgpu-kernarg-preload-count=16 from the linker step. Potentially it could cause similar regressions for other HIP apps using -mllvm options with -fgpu-rdc. Fixes: SWDEV-443345
Commit 7c2e32d
[CI][NFC] Unify naming scheme for SYCL workflows. (intel#12568)
All GitHub Actions workflows added by intel/llvm project are expected to use the following naming notation:
1. Name starts with `sycl` prefix.
2. Use dash `-` to separate words (instead of underscore `_`).
This patch fixes naming of workflows which do not follow this notation.
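The two rules are simple enough to check mechanically. The following Python sketch shows one way to do that; the function name and the example file names in the usage note are hypothetical, and it assumes the convention applies to the workflow file name.

```python
def follows_sycl_naming(filename):
    """Return True if the file name starts with `sycl` and contains no
    underscores, so words can only be separated by dashes."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    return stem.startswith("sycl") and "_" not in stem
```

For example, a made-up name such as `sycl-linux-build.yml` passes, while `sycl_linux_build.yml` and `linux-build.yml` do not.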
Commit 435845b
Reland "[lldb][progress][NFC] Add unit test for progress reports (#79533)"
This reverts commit 209fe1f. The original commit failed due to an assertion failure in the unit test `ProgressReportTest` that the commit added. The Debugger::Initialize() function was called more than once, which triggered the assertion, so this commit calls that function under a `std::call_once`.
Commit a5a8cbb
Commit 5561bea
Revert "Reland "[lldb][progress][NFC] Add unit test for progress reports (#79533)""
This reverts commit a5a8cbb. The test being added by that commit still fails on the assertion that Debugger::Initialize has been called more than once.
Commit 40ebe52
[RISCV] Use Zacas for AtomicRMWInst::Nand i32 and XLen. (#80119)
We don't have an AMO instruction for Nand, so with the A extension we use an LR/SC loop. If we have Zacas we can use a CAS loop instead. According to the Zacas spec, a CAS loop scales to highly parallel systems better than LR/SC.
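The CAS-loop shape is easy to show in isolation. The Python sketch below is a single-threaded simulation: the `Cell` class and its `compare_and_swap` method are stand-ins for a hardware compare-and-swap, nothing here is actually atomic, and the names are hypothetical rather than taken from the backend.

```python
class Cell:
    """Stand-in for a memory word with a compare-and-swap primitive."""

    def __init__(self, value, bits=32):
        self.bits = bits
        self.value = value & ((1 << bits) - 1)

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell holds `expected`; return what was seen."""
        observed = self.value
        if observed == expected:
            self.value = new
        return observed


def atomic_nand(cell, operand):
    """Fetch-and-nand built from a CAS loop; returns the previous value."""
    mask = (1 << cell.bits) - 1
    old = cell.value
    while True:
        new = ~(old & operand) & mask
        observed = cell.compare_and_swap(old, new)
        if observed == old:
            return old  # swap succeeded
        old = observed  # another writer got in first; retry with its value
```

The loop recomputes the nand from whatever value the failed swap observed, so each retry starts from fresh data without a separate reload.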
Commit cf401f7
[libc][docs] fix stdbit.h docs (#80070)
Fix rst comment, add checks for recently implemented functions+macro.
Commit 0e0d155
Commits on Feb 1, 2024
[libc] Fix read under msan (#80203)
The read function wasn't properly unpoisoning its result under msan, causing test failures downstream when I tried to roll it out. This patch adds the msan unpoison call that fixes the issue.
Commit 0e8eb44
[mlir][Vector] Add support for sub-byte transpose emulation (#80110)
This PR adds patterns to convert a sub-byte vector transpose into a sequence of instructions that perform the transpose on i8 vector elements. Whereas this rewrite may not lead to the absolute peak performance, it should ensure correctness when dealing with sub-byte transposes.
Commit 8ba018d
[mlir][arith] Improve `truncf` folding (#80206)
* Use APFloat conversion function instead of going through double to check if fold results in information loss.
* Support folding vector constants.
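The information-loss check can be sketched as a narrow-and-compare round trip. The Python below is only an analogue: it narrows a 64-bit float to 32 bits with the standard `struct` module rather than APFloat, the function names are hypothetical, and it assumes the fold is applied only when the narrowing is exact, which the commit message implies but does not spell out.

```python
import struct


def truncf_is_lossless(value):
    """Return True if a Python float (64-bit) survives narrowing to 32 bits."""
    try:
        narrowed = struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        return False  # finite value too large for a 32-bit float
    # NaN never compares equal, so this sketch conservatively reports it as lossy.
    return narrowed == value


def vector_truncf_is_lossless(values):
    """Element-wise version, mirroring the idea of folding vector constants."""
    return all(truncf_is_lossless(v) for v in values)
```

Values such as 1.5 narrow exactly, while 0.1 does not, since its 64-bit representation has more significant bits than a 32-bit float can hold.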
Commit 730f498
[llvm-objcopy][test] Use llvm-readelf instead for clearer visualization (NFC) (#79874)
Commit f8be7f2
[clang][NFC] Move isSimpleTypeSpecifier() from Sema to Token (#80101)
So that it can be used by clang-format.
Commit a8279a8
[clang-format] Simplify the AfterPlacementOperator option (#79796)
Change AfterPlacementOperator to a boolean and deprecate SBPO_Never, which meant never inserting a space except when after new/delete. Fixes #78892.
Commit 908fd09
Commit 994493c
[clang][dataflow] Display line numbers in the HTML logger timeline. (#80130)
This makes it easier to count how many iterations an analysis takes to complete. It also makes it easier to compare how a change to the analysis code affects the timeline. Here's a sample screenshot: ![image](https://github.com/llvm/llvm-project/assets/29098113/b3f44b4d-7037-4f28-9532-5418663250e1)
Commit 0c36127
[lldb] Add support for large watchpoints in lldb (#79962)
This patch is the next piece of work in my Large Watchpoint proposal, https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116 This patch breaks a user's watchpoint into one or more WatchpointResources which reflect what the hardware registers can cover. This means we can watch objects larger than 8 bytes, and we can watch unaligned address ranges. On a typical 64-bit target with 4 watchpoint registers you can watch 32 bytes of memory if the start address is doubleword aligned. Additionally, if the remote stub implements AArch64 MASK style watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2 size region of memory up to 2GB, aligned to that same size. I updated the Watchpoint constructor and CommandObjectWatchpoint to create a CompilerType of Array<UInt8> when the size of the watched region is greater than pointer-size and we don't have a variable type to use. For pointer-size and smaller, we can display the watched granule as an integer value; for larger-than-pointer-size we will display as an array of bytes. I have `watchpoint list` now print the WatchpointResources used to implement the watchpoint. I added a WatchpointAlgorithm class which has a top-level static method that takes an enum flag mask WatchpointHardwareFeature and a user address and size, and returns a vector of WatchpointResources covering the request. It does not take into account the number of watchpoint registers the target has, or the number still available for use. Right now there is only one algorithm, which monitors power-of-2 regions of memory. For up to pointer-size, this is what Intel hardware supports. AArch64 Byte Address Select watchpoints can watch any number of contiguous bytes in a pointer-size memory granule; that is not currently supported, so if you ask to watch bytes 3-5, the algorithm will watch the entire doubleword (8 bytes). The newly default "modify" style means we will silently ignore modifications to bytes outside the watched range.
I've temporarily skipped TestLargeWatchpoint.py for all targets. It was only run on Darwin when using the in-tree debugserver, which was a proxy for "debugserver supports MASK watchpoints". I'll be adding the aforementioned feature flag from the stub, enabling full mask watchpoints when a debugserver with that feature is enabled, and re-enabling this test. I added a new TestUnalignedLargeWatchpoint.py which only has one test but it's a great one, watching a 22-byte range that is unaligned and requires four 8-byte watchpoints to cover. I also added a unit test, WatchpointAlgorithmsTests, which has a number of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I think there are interesting alternative approaches to how we cover these; I note in the unit test that a user requesting a watch on address 0x12e0 of 120 bytes will be covered by two watchpoints today, a 128-byte watchpoint at 0x1280 and another at 0x1300. But it could be done with a 16-byte watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer false positives/private stops. As we try refining this one, it's helpful to have a collection of tests to make sure things don't regress. I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64 Ubuntu. I have not modified the Windows process plugins yet. I might try that as a standalone patch; I'd be making the change blind, but the necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty small so it might be obvious enough that I can change it and see what the Windows CI thinks. There isn't yet a packet (or a qSupported feature query) for the gdb remote serial protocol stub to communicate its watchpoint capabilities to lldb. I'll be doing that in a patch right after this is landed, having debugserver advertise its capability of AArch64 MASK watchpoints, and have ProcessGDBRemote add eWatchpointHardwareArmMASK to WatchpointAlgorithms so we can watch larger than 32-byte requests on Darwin.
I haven't yet tackled WatchpointResource *sharing* by multiple Watchpoints. This is all part of the goal, especially when we may be watching a larger memory range than the user requested, if they then add another watchpoint next to their first request, it may be covered by the same WatchpointResource (hardware watchpoint register). Also one "read" watchpoint and one "write" watchpoint on the same memory granule need to be handled, making the WatchpointResource cover all requests. As WatchpointResources aren't shared among multiple Watchpoints yet, there's no handling of running the conditions/commands/etc on multiple Watchpoints when their shared WatchpointResource is hit. The goal beyond "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint behavior and commands. I have a feeling I may be slowly chipping away at this for a while. Re-landing this patch after fixing two undefined behaviors in WatchpointAlgorithms found by UBSan and by failures on different CI bots. rdar://108234227
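To make the power-of-2 covering idea concrete, here is a simplified Python sketch. It is not the lldb implementation: the function name and parameters are hypothetical, `max_size` is assumed to be a power of 2, and it always splits into equal-sized aligned regions without first trying a single larger region.

```python
def power_of_2_regions(addr, size, min_size, max_size):
    """Cover [addr, addr + size) with regions whose size is a power of 2
    and whose start is aligned to that size. Returns (start, size) pairs."""
    if size == 0:
        return []  # nothing to watch
    # Round the request up to the next power of 2, at least min_size,
    # then clamp to the largest region the hardware is assumed to support.
    region_size = 1 << (max(size, min_size) - 1).bit_length()
    region_size = min(region_size, max_size)
    start = addr & ~(region_size - 1)  # align the first region downwards
    end = addr + size
    regions = []
    while start < end:
        regions.append((start, region_size))
        start += region_size
    return regions
```

With a large `max_size`, a 120-byte request at 0x12e0 yields two 128-byte regions at 0x1280 and 0x1300, matching the example in the message. With `min_size` and `max_size` both 8, a 22-byte request at the made-up address 0x1005 needs four 8-byte regions, and a request for bytes 3-5 of a doubleword collapses to the whole 8-byte granule.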
Configuration menu - View commit details
-
Copy full SHA for 147d7a6 - Browse repository at this point
Copy the full SHA 147d7a6View commit details -
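The power-of-2 coverage described in this commit message can be illustrated with a small sketch. This is not the code in WatchpointAlgorithms; `Region`, `CoverWithPowerOf2`, and the `max_size` parameter are hypothetical names, and the sketch assumes `max_size` is a power of 2 and that `addr + size` does not wrap around.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the lldb implementation.
struct Region {
  uint64_t addr; // aligned start address
  uint64_t size; // power-of-2 size in bytes
};

// Cover [addr, addr + size) with aligned power-of-2 regions that are no
// larger than max_size (assumed to be a power of 2).
std::vector<Region> CoverWithPowerOf2(uint64_t addr, uint64_t size,
                                      uint64_t max_size) {
  std::vector<Region> regions;
  if (size == 0)
    return regions;
  const uint64_t end = addr + size; // one past the last watched byte

  // First try a single aligned region, doubling its size until it either
  // covers the whole request or reaches the hardware limit.
  for (uint64_t s = 1;; s <<= 1) {
    const uint64_t start = addr & ~(s - 1);
    if (end - start <= s) {
      regions.push_back({start, s});
      return regions;
    }
    if (s >= max_size)
      break;
  }

  // Otherwise tile the request with max_size-aligned regions.
  for (uint64_t start = addr & ~(max_size - 1); start < end; start += max_size)
    regions.push_back({start, max_size});
  return regions;
}
```

With `max_size` set to 128, a 120-byte request at 0x12e0 yields the two 128-byte regions at 0x1280 and 0x1300 mentioned above. With `max_size` set to 8, a 22-byte request at the (assumed) address 0x1003 needs four 8-byte regions.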
Commit 19a10c1
Commit 995d21b
[llvm-gsymutil] Print one-time DWO file missing warning under --quiet flag (#79882)
FileCheck test added:
```
./bin/llvm-lit -sv llvm/test/tools/llvm-gsymutil/X86/elf-dwo.yaml
```
Manual test steps:
- Create a binary with split DWARF:
```
clang++ -g -gdwarf-4 -gsplit-dwarf main.cpp -o main_split
```
- Remove the dwo file or rename it so llvm-gsymutil can't find it:
```
mv main_split-main.dwo main_split-main__.dwo
```
- Now run the llvm-gsymutil conversion; it should print a warning both with and without the `--quiet` flag:
```
$ ./bin/llvm-gsymutil --convert=./main_split
Input file: ./main_split
Output file (x86_64): ./main_split.gsym
warning: Unable to retrieve DWO .debug_info section for main_split-main.dwo
Loaded 0 functions from DWARF.
Loaded 12 functions from symbol table.
Pruned 0 functions, ended with 12 total
```
```
$ ./bin/llvm-gsymutil --convert=./main_split --quiet
Input file: ./main_split
Output file (x86_64): ./main_split.gsym
warning: Unable to retrieve DWO .debug_info section for some object files. (Remove the --quiet flag for full output)
Pruned 0 functions, ended with 12 total
```
Commit 5a8f290
Commit 3b76b86
Commit c82a645
[C++20] [Modules] Introduce -fskip-odr-check-in-gmf (#79959)
Close llvm/llvm-project#79240

Cite the comment from @mizvekov in https://github.com/llvm/llvm-project/issues/79240:

> There are two kinds of bugs / issues relevant here:
>
> Clang bugs that this change hides
> Here we can add a Frontend flag that disables the GMF ODR check, just so
> we can keep tracking, testing and fixing these issues.
> The Driver would just always pass that flag.
> We could add that flag in this current issue.
> Bugs in user code:
> I don't think it's worth adding a corresponding Driver flag for
> controlling the above Frontend flag, since we intend it's behavior to
> become default as we fix the problems, and users interested in testing
> the more strict behavior can just use the Frontend flag directly.

This patch follows the suggestion:
- Introduce the CC1 flag `-fskip-odr-check-in-gmf`, which is off by default, so that every existing test is still run with ODR-violation checking.
- Pass `-fskip-odr-check-in-gmf` from the driver to keep the intended behavior.
- Edit the documentation to tell users who are still interested in stricter checks that they can use `-Xclang -fno-skip-odr-check-in-gmf` to get the existing behavior.
Commit 8eea582
[clang-tidy] Add AllowStringArrays option to modernize-avoid-c-arrays (#71701)
Add the AllowStringArrays option, which excludes array types with deduced sizes constructed from string literals. This covers only variable declarations of character arrays constructed directly from C strings. Closes #59475
Commit b777bb7
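As a rough illustration of the declarations this option is described as covering, the sketch below contrasts a deduced-size character array built from a string literal with the `std::array` form the check otherwise steers toward. The variable names are hypothetical, and how the check treats any particular declaration should be confirmed against the clang-tidy documentation.

```cpp
#include <array>
#include <cstddef>

// Per the commit message, AllowStringArrays exempts this form: a character
// array whose size is deduced from a string literal initializer.
const char kGreeting[] = "hello";

// The std::array form that modernize-avoid-c-arrays generally recommends
// in place of C-style arrays.
const std::array<int, 3> kValues = {1, 2, 3};

// sizeof counts the terminating NUL, so the deduced array size is 6.
constexpr std::size_t kGreetingSize = sizeof(kGreeting);
```

The deduced size is one reason this form is convenient: rewriting it as a `std::array` would require spelling out the length by hand.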
[clang-format] Allow decltype in requires clause (#78847)
If clang-format is not sure whether a `requires` keyword starts a requires clause or a requires expression, it looks ahead to see if any token disqualifies it from being a requires clause. Among these tokens was `decltype`, since it fell through the switch. This patch allows `decltype` to appear in a requires clause. I'm not 100% sure this change won't have repercussions, but that just means we need more test coverage! Fixes llvm/llvm-project#78645
Commit 9b68c09
Skip two of the three test sets to narrow down the arm-ubuntu
CI bot crash when running this unittest. The printfs aren't printing into the CI log output.
Commit fdd98e5
[clang][Interp] complex binary operators aren't always initializing
The added test case would trigger the removed assertion.
Commit a8f317a
[Github] Build stage2-clang-bolt target for CI container
Only the stage2-distribution target is built by default for the stage2 distribution installation target. This means that we don't get a BOLT optimized binary. This patch explicitly builds the stage2-clang-bolt target before the distribution installation target so that the clang binary is optimized before it gets installed.
Commit 5d9ffcd
[clang][Interp] Handle imaginary literals (#79130)
Initialize the first element to 0 and the second element to the value of the subexpression.
Commit 6ff431b
[X86][CodeGen] Set mayLoad = 1 for LZCNT/POPCNT/TZCNTrm_(EVEX|NF)
Promoted and NF LZCNT/POPCNT/TZCNT were supported in #79954. Because null_frag is used in the patterns for these variants, TableGen cannot infer mayLoad = 1 for them. This can be tested by MCA tests, which will be added after -mcpu=<cpu_with_apx> is supported.
Commit 1395e58
Commit 021a2b4
[clang][Interp] Protect Inc/Dec ops against dummy pointers
We create them more often in C, so it's more likely to happen there.
Commit a9e8309
Commit fa98e28
[clang][Interp] Support GenericSelectionExprs
Just delegate to the resulting expression.
Commit 48f8b74
Commit 54f324f
[mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129)
This aims to clean up confusing uses of builder.createOrFold<ConstantOp>, since folding of constants fails.
Commit 65066c0
Commit 7ec996d
Commit e851278
Commit 39fa304
Commit b67ce7e
Done iterating with the arm-ubuntu bot; I see the problem test.
Go back to the original form of this file before I add a temporary workaround.
Commit eaa3d5e
Skip two WatchpointAlgorithm tests for 32-bit lldb's
After iterating with the arm-ubuntu CI bot, I found the crash (a std::bad_alloc exception being thrown) was caused by these two entries when built on a 32-bit machine. I probably have an assumption about size_t being 64 bits in WatchpointAlgorithms, and we have a problem when it's actually 32 bits and we're dealing with a real 64-bit address. All of the cases where the address can be represented in the low 32 bits of the addr_t work correctly, so for now I'm skipping these two unit tests when building lldb on a 32-bit host until I can review that method and possibly switch to explicit uint64_t's.
Commit 90e6808
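The kind of truncation suspected here can be shown in isolation. The sketch below is a hypothetical illustration, not code from WatchpointAlgorithms: `HostSize32` stands in for `size_t` on a 32-bit host, and `Truncate` and `FitsIn32Bits` are made-up helper names.

```cpp
#include <cstdint>

// Assumption for illustration: what size_t looks like on a 32-bit host.
using HostSize32 = uint32_t;

// Narrowing a 64-bit address silently drops the upper 32 bits.
constexpr HostSize32 Truncate(uint64_t addr) {
  return static_cast<HostSize32>(addr);
}

// True when the address survives a round trip through the 32-bit type,
// i.e. it is representable in the low 32 bits.
constexpr bool FitsIn32Bits(uint64_t addr) {
  return static_cast<uint64_t>(Truncate(addr)) == addr;
}
```

An address such as 0x100001000 becomes 0x1000 after narrowing, which is consistent with the observation that only addresses needing more than 32 bits misbehave.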
[mlir][Transforms] `GreedyPatternRewriteDriver`: Hash ops separately (#78312)
The greedy pattern rewrite driver has multiple "expensive checks" to detect invalid rewrite pattern API usage. As part of these checks, it computes fingerprints for every op that is in scope, and compares the fingerprints before and after an attempted pattern application. Until now, each computed fingerprint took into account all nested operations. That is quite expensive because it walks the entire IR subtree. It is also redundant in the expensive checks because we already compute a fingerprint for every op. This commit significantly improves the running time of the "expensive checks" in the greedy pattern rewrite driver.
Commit 5fdf8c6
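The cost difference between hashing each op on its own and hashing each op together with its nested ops can be sketched with a toy tree. Nothing here is MLIR API: `Op`, `ShallowFingerprint`, `DeepFingerprint`, and the visit counter are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy IR node; not an mlir::Operation.
struct Op {
  std::string name;
  std::vector<Op> nested;
};

// Simple hash mixing step (illustrative only).
static std::size_t Combine(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Shallow fingerprint: hashes only the op itself, independent of subtree size.
std::size_t ShallowFingerprint(const Op &op) {
  return Combine(std::hash<std::string>{}(op.name), op.nested.size());
}

// Deep fingerprint: also walks every nested op, so each call costs time
// proportional to the size of the subtree.
std::size_t DeepFingerprint(const Op &op, std::size_t *visited) {
  ++*visited;
  std::size_t h = ShallowFingerprint(op);
  for (const Op &child : op.nested)
    h = Combine(h, DeepFingerprint(child, visited));
  return h;
}

// Counts how many ops are visited in total when a deep fingerprint is
// computed for every op in the tree (the redundant work described above).
std::size_t VisitsForDeepFingerprintOfEveryOp(const Op &root) {
  std::size_t visited = 0;
  DeepFingerprint(root, &visited);
  for (const Op &child : root.nested)
    visited += VisitsForDeepFingerprintOfEveryOp(child);
  return visited;
}
```

For a chain of three nested ops, fingerprinting every op deeply visits six ops in total, whereas one shallow fingerprint per op visits three. The gap grows quadratically with nesting depth in this toy model.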
[flang][NFC] Cache derived type translation in lowering (#80179)
Derived type translation is proving expensive in modern Fortran apps with many big derived types that have dozens of components and parents. Extending the cache that prevents recursion is proving to have little cost on apps with small derived types and a significant gain (it can divide compile time by 2) on modern Fortran apps. It is legal since the cache lifetime is less than the lifetime of the MLIRContext that owns the cached mlir::Type. Doing so also exposed that the current caching was incorrect: the type symbol is the same for kind-parametrized derived types regardless of the kind parameters. Instances with different kinds should lower to different MLIR types. See the added test. Using the type scopes fixes the problem.
Commit 84564e1
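The caching issue described here can be modeled with a toy cache. This is only a sketch under stated assumptions: `TypeCache`, `GetOrTranslate`, the integer `scope_id`, and the string-based `LoweredType` are hypothetical stand-ins for the real lowering code, where the cache stores `mlir::Type` and the key comes from the type scope.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a lowered type; the real cache holds mlir::Type.
using LoweredType = std::string;

class TypeCache {
public:
  // Keying on (symbol, scope_id) rather than on the symbol alone keeps
  // instantiations with different kind parameters apart. scope_id is a
  // hypothetical stand-in for the per-instantiation type scope.
  const LoweredType &GetOrTranslate(const std::string &symbol, int scope_id,
                                    int kind) {
    const auto key = std::make_pair(symbol, scope_id);
    auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second; // cache hit: skip the expensive translation
    ++translations_;
    LoweredType lowered = symbol + "<kind=" + std::to_string(kind) + ">";
    return cache_.emplace(key, std::move(lowered)).first->second;
  }

  // Number of times a translation actually ran.
  int translations() const { return translations_; }

private:
  std::map<std::pair<std::string, int>, LoweredType> cache_;
  int translations_ = 0;
};
```

Had the key been the symbol name alone, the second instantiation of the same derived type with a different kind would have returned the first one's lowered type, which is the incorrect behavior the commit message reports.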
[Clang][test] Limit library search when linking shared lib (#80253)
Don't search for unnecessary libs when linking the shared lib. This allows the test to run in a chroot environment.
Commit ae931b4
[mlir][EmitC] Add func, call and return operations and conversions (#79612)
This adds a `func`, `call` and `return` operation to the EmitC dialect, closely related to the corresponding operations of the Func dialect. In contrast to the operations of the Func dialect, the EmitC operations do not support multiple results. The `emitc.func` op features a `specifiers` argument that for example allows, with corresponding support in the emitter, to emit `inline static` functions. Furthermore, this adds patterns and a pass to convert the Func dialect to EmitC. A `func.func` op that is `private` is converted to `emitc.func` with a `"static"` specifier.
Commit e7d40a8
Commit d0dbd50
[bazel] Merge TableGenGlobalISel into the tablegen target
These two are intertwined enough so it doesn't really make sense to have it standalone and hack around it by putting headers into both.
Commit 468b239
[bazel] Put back the pieces of TableGenGlobalISel that unittests depend on
This is a mess and needs to be cleaned up some day.
Commit 395c817
[llvm-exegesis] Replace --num-repetitions with --min-instructions (#77153)
This patch replaces --num-repetitions with --min-instructions to make it clearer that the value refers to the minimum number of instructions in the final assembled snippet rather than the number of repetitions of the snippet. This patch also refactors some llvm-exegesis internal variable names to reflect the name change. Fixes #76890.
Commit 415bf20
Commit ca7fd25
[flang][HLFIR] Relax verifiers of intrinsic operations (#80132)
The verifiers are currently very strict: requiring intrinsic operations to be used only in cases where the Fortran standard permits the intrinsic to be used. There have now been a lot of cases where these verifiers have caused bugs in corner cases. In a recent ticket, @jeanPerier pointed out that it could be useful for future optimizations if somewhat invalid uses of these operations could be allowed in dead code. See this comment: llvm/llvm-project#79995 (comment) In response to all of this, I have decided to relax the intrinsic operation verifiers. The intention is now to only disallow operation uses that are likely to crash the compiler. Other checks are still available under `-strict-intrinsic-verifier`. The disadvantage of this approach is that IR can now represent intrinsic invocations which are incorrect. The lowering and implementation of these intrinsic functions is unlikely to do the right thing in all of these cases, and as they should mostly be impossible to generate using normal Fortran code, these edge cases will see very little testing, before some new optimization causes them to become more common. Fixes #79995
Commit e9e0167
[Clang][AArch64] Add ACLE macros for FEAT_PAuth_LR (#80163)
This updates clang's target defines to include the ACLE changes covering the FEAT_PAuth_LR architecture extension. The changes include:
* The new `__ARM_FEATURE_PAUTH_LR` feature macro, which is set to 1 when FEAT_PAuth_LR is available in the target.
* A new bit field for the existing `__ARM_FEATURE_PAC_DEFAULT` macro, indicating the use of PC as a diversifier for Pointer Authentication (from -mbranch-protection=pac-ret+pc).

The approved changes to the ACLE spec can be found here: ARM-software/acle#292
Commit 1bbb797
[HWASAN] Remove DW_OP_LLVM_tag_offset from DIExpression::isImplicit (#79816)
According to its doc-comment, `isImplicit` is meant to return true if the expression is an implicit location description (one that describes an object or part of an object which has no location by computing the value from available program state). There's a brief entry for `DW_OP_LLVM_tag_offset` in the LangRef and there's some info in the original commit fb9ce10. From what I can tell it doesn't look like `DW_OP_LLVM_tag_offset` affects whether or not the location is implicit; the opcode doesn't get included in the final location description but instead is added as an attribute to the variable. This was tripping an assertion in the latest application of the fix to #76545, #78606, where an expression containing a `DW_OP_LLVM_tag_offset` is split into a fragment (i.e., describes a part of the whole variable).
Commit f34418c
[GitHub][workflows] Reflow some text in buildbot info PR comment
When the markdown link renders the line gets a lot shorter.
Commit 96a3d05
Commit b5c0b67
[SCEVExp] Keep NUW/NSW if both original inc and isomorphic inc agree. (#79512)
We are replacing with a wider increment. If both OrigInc and IsomorphicInc are NUW/NSW, then we can preserve them on the wider increment; the narrower IsomorphicInc would wrap before the wider OrigInc, so the replacement won't make IsomorphicInc's uses more poisonous. PR: llvm/llvm-project#79512
Commit da43733
[libc++][memory] P2652R2: Disallow Specialization of `allocator_traits` (#79978)
Implements P2652R2 <https://wg21.link/P2652R2>:
- https://eel.is/c++draft/allocator.requirements.general
- https://eel.is/c++draft/memory.syn
- https://eel.is/c++draft/allocator.traits.general
- https://eel.is/c++draft/allocator.traits.members
- https://eel.is/c++draft/diff.cpp20.concepts
- https://eel.is/c++draft/diff.cpp20.utilities

Co-authored-by: Zingam <zingam@outlook.com>
Commit 7d78ccf
Commit ea29842
[SYCL][Fusion] Handle fusion leading to synchronization issues (intel#12538)
Do not allow fusion when one of the kernels has an explicit local size and it requires ID remapping, i.e., it has a different number of dimensions w.r.t. the fused ND-range or different global size in dimensions [2, N). In this case, two work-items belonging to the same work-group may not belong to the same work-group in the fused ND-range.
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Commit af448b0
Commit c105848
[UR][CUDA] Use new variant of the enableCUDATracing function (intel#12521)
oneapi-src/unified-runtime#1070 and intel#11952 introduced a new variant of the `enableCUDATracing` function that takes a context pointer parameter, replacing the parameterless variant of that function. The older variant will be removed from UR once this PR is merged.
Commit e402523
[SYCL][CUDA] Improved joint_matrix layout test coverage. (intel#12483)
The test framework that the CUDA backend tests use has been updated to support all possible `joint_matrix` GEMM API combinations, including all matrix layouts. The GEMM header is backend agnostic; hence all backends could use this test framework in the future. This test framework can also act as an example of how to deal with different layout combinations when computing a general GEMM.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Commit f9e4f10
[RISCV][NFC] Simplify calls.ll and autogenerate checks for tail-calls.ll
Split out from #78417. Reviewers: topperc, asb, kito-cheng Reviewed By: asb Pull Request: llvm/llvm-project#79248
Commit 178719e
Commit 4bdd647
Update LLVM version from 18 to 19 (intel#2315)
Original commit: KhronosGroup/SPIRV-LLVM-Translator@fd22f8e
Commit b0c60b0
add support for out of bounds load/store (intel#2277)
Add support for load/store operations for a cooperative matrix such that the original matrix shape is known and implementations are able to reason about how to deal with out-of-bounds accesses.
CapabilityCooperativeMatrixCheckedInstructionsINTEL = 6192
CooperativeMatrixLoadCheckedINTEL = 6193
CooperativeMatrixStoreCheckedINTEL = 6194
Original commit: KhronosGroup/SPIRV-LLVM-Translator@b62cb55
Commit 6f35f7c
add API to query error message by an error code (intel#2304)
The goal of the PR is to add an API to the SPIR-V LLVM Translator to query the error message for an error code, as discussed in intel#2298. One need and possible application is generating human-readable error info from error codes returned by other SPIRV Translator API calls, including getSpirvReport().
Original commit: KhronosGroup/SPIRV-LLVM-Translator@afe1971
Commit f0ac661
Support llvm.frexp intrinsic translation (intel#2252)
Map the @llvm.frexp intrinsic to the OpenCL Extended Instruction frexp builtin. The difference in signatures and return values is covered by extracting/combining values from and into a composite type.
LLVM IR: { float %fract, i32 %exp } @llvm.frexp.f32.i32(float %val)
SPIR-V: { float %fract } ExtInst frexp (float %val, i32 %exp)
Original commit: KhronosGroup/SPIRV-LLVM-Translator@e8b2018
Commit cccbd9e
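The signature mismatch between the two forms can be illustrated in plain C++, where `std::frexp` returns the fraction and writes the exponent through a pointer, much like the out-parameter form shown for SPIR-V. The `FrexpResult` struct and `FrexpAsPair` wrapper are hypothetical names used only to mimic the aggregate returned by `@llvm.frexp`; this is not translator code.

```cpp
#include <cmath>

// Mirrors the LLVM-style result: both values returned as one aggregate.
struct FrexpResult {
  float fract;
  int exp;
};

// Wraps the out-parameter form (fraction returned, exponent stored through
// a pointer) and combines both values into a single composite result.
FrexpResult FrexpAsPair(float val) {
  int exp = 0;
  const float fract = std::frexp(val, &exp);
  return {fract, exp};
}
```

For example, `FrexpAsPair(8.0f)` gives a fraction of 0.5 and an exponent of 4, since 8 = 0.5 * 2^4. Going the other way, from the aggregate to the out-parameter form, is a matter of extracting the two members.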
Fix SPIRVRegularizeLLVMBase::regularize fix for shl i1 and lshr i1 (intel#2288)
The translator failed an assertion on V->user_empty() in the regularize function when the result of shl i1 or lshr i1 was used. E.g.:
%2 = shl i1 %0 %1
store %2, ptr addrspace(1) @G.1, align 1
The shl i1 instruction is converted to lshr i32, whose arithmetic has the same behavior.
Original commit: KhronosGroup/SPIRV-LLVM-Translator@239fbd4
Commit 6732fee
add initial support for CooperativeMatrixConstructCheckedINTEL (intel#2331)
Add support for checked matrix construct instruction.
Specification draft: https://github.com/intel/llvm/blob/2fa153ee852ea3d7d64df097f1f494cddacee90e/sycl/doc/design/spirv-extensions/SPV_INTEL_joint_matrix.asciidoc
Original commit: KhronosGroup/SPIRV-LLVM-Translator@a1b1f49
Commit 805f842
Commit 96812b9
Commit f589d9b
[SYCL][NFC] Fix some 'startswith/endswith' related to SYCL (intel#12573)
Replace some deprecated 'startswith' and 'endswith' with 'starts_with' and 'ends_with' to clear some warnings when building SYCL compiler.
Signed-off-by: jinge90 <ge.jin@intel.com>
Commit f7a360d
Commit 21e703a
[Driver] Allow for -O3 on Windows using clang-cl (intel#12504)
We currently support -O3 for Linux compilations; expand this to also be available on Windows. This also better aligns with our existing product offerings.
Commit 0af4ac7
[SYCL] Fix compiler crash. (intel#12324)
The compiler was crashing when the user requested fp-accuracy for the functions in a call of the form f1(f2(f3(...))), where f1, f2 and f3 were fpbuiltin but the innermost function didn't have an fpbuiltin. The current builtinID was used instead of getting the builtinID from the current function, which caused the crash. This patch fixes the issue and renames the function EmitFPBuiltinIndirectCall to MaybeEmitFPBuiltinofFD.
Commit 4fdcb58
[SYCL][HIP][CUDA] Use new version of piMemGetNativeHandle and add test (intel#12297)
We want to change the signature of `piMemGetNativeHandle` for reasons explained in oneapi-src/unified-runtime#1199.
Corresponding UR PR: oneapi-src/unified-runtime#1226
A previous PR, intel#12199, added a new entry point, but it was decided that it is better to modify the existing entry point.
Commit 8427bd2
[SYCL][libdevice] Add sqrt with rounding mode supported in sycl::ext::intel::math (intel#12571)
Signed-off-by: jinge90 <ge.jin@intel.com>
Commit 6c1dde4
LLVM and SPIRV-LLVM-Translator pulldown (WW05)
LLVM: llvm/llvm-project@178719e
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@a1b1f49
Commit 0dc97ec
[SYCL][ESIMD] Fix a few issues with scatter(usm, ...) (intel#12585)
Problems found by Gregory (thanks!):
1) There were some duplicated tests; remove those.
2) We didn't test non-LSC mask on Gen12.
3) We got an ambiguous call because we had an old function that didn't have VS, but the new functions have default VS=1, so we don't need the old one.
4) When we pass a simd_view for the vals, we get a template match failure. This is the same issue we hit in the compile-time tests: even if we have a simd_view overload, the compiler can't infer N, so we need to provide T,N anyway; add that in the tests.

I tested this on Gen12.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Commit 8bfc56f
Commits on Feb 2, 2024
[SYCL] [NATIVECPU] Add OCK subdirectory with EXCLUDE_FROM_ALL (intel#12579)
Adding `EXCLUDE_FROM_ALL` to the `add_subdirectory` for the OneAPI Construction Kit, in order to avoid building its components unless they are required by the SYCL toolchain.
Commit 71eee2c
Commit 262b44a
[SYCL] Disable dynamic_address_cast test on FPGA (intel#12561)
The FPGA emulator is currently affected by the same issue as the CPU runtime.
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Commit 9b2e77a
Commit 46bce9c
[CI] Modify Nightly task to run opencl:cpu testing on different CPUs (intel#12548)
We have flakiness in nightly testing results. Having more variety would hopefully provide some insights into the conditions under which it happens. The task is only executed once a day, so the extra resources needed shouldn't affect the load on the runners much.
Commit 35f9696
Commit faad41d
Commit 30ab2fe