[SYCL][Graph] Node Profiling #353

Closed
wants to merge 1,803 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jan 31, 2024

  1. 292b508
  2. 8a98091
  3. [InstCombine] Simplify and/or by replacing operands with constants (#…

    …77231)
    
    This patch tries to simplify `X | Y` by replacing occurrences of `Y` in
    `X` with 0. Similarly, it tries to simplify `X & Y` by replacing
    occurrences of `Y` in `X` with -1.
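
    A minimal sketch of why the replacement is sound, assuming `X` is a purely bitwise expression. The expressions `A & ~Y` and `A | ~Y` and all function names below are hypothetical illustrations, not taken from the patch:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical case for `X | Y`: X = (A & ~Y), so Y occurs inside X.
    constexpr std::uint8_t orOriginal(std::uint8_t A, std::uint8_t Y) {
      return static_cast<std::uint8_t>((A & ~Y) | Y);
    }

    // Replacing Y with 0 inside X gives (A & ~0) == A, so `X | Y` becomes `A | Y`.
    constexpr std::uint8_t orSimplified(std::uint8_t A, std::uint8_t Y) {
      return static_cast<std::uint8_t>(A | Y);
    }

    // Hypothetical case for `X & Y`: X = (A | ~Y), so Y occurs inside X.
    constexpr std::uint8_t andOriginal(std::uint8_t A, std::uint8_t Y) {
      return static_cast<std::uint8_t>((A | ~Y) & Y);
    }

    // Replacing Y with -1 (all ones) inside X gives (A | 0) == A,
    // so `X & Y` becomes `A & Y`.
    constexpr std::uint8_t andSimplified(std::uint8_t A, std::uint8_t Y) {
      return static_cast<std::uint8_t>(A & Y);
    }

    // Compare both forms over every pair of 8-bit inputs.
    // Returns the number of pairs checked.
    inline unsigned checkAllInputs() {
      unsigned Checked = 0;
      for (unsigned A = 0; A < 256; ++A)
        for (unsigned Y = 0; Y < 256; ++Y) {
          const auto A8 = static_cast<std::uint8_t>(A);
          const auto Y8 = static_cast<std::uint8_t>(Y);
          assert(orOriginal(A8, Y8) == orSimplified(A8, Y8));
          assert(andOriginal(A8, Y8) == andSimplified(A8, Y8));
          ++Checked;
        }
      return Checked;
    }
    ```

    The intuition is per bit: wherever `Y` has a 1, `X | Y` is 1 regardless of `X`, so only the bits where `Y` is 0 matter, and on those bits `Y` can be treated as 0 inside `X`. The `&` case is the mirror image.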
    
    Alive2: https://alive2.llvm.org/ce/z/cNjDTR
    Note: As the current implementation is too conservative in the one-use
    checks, I cannot remove other existing hard-coded simplifications if
    they involve more than two instructions
    (e.g., `A & ~(A ^ B) --> A & B`).
    
    Compile-time impact:
    http://llvm-compile-time-tracker.com/compare.php?from=a085402ef54379758e6c996dbaedfcb92ad222b5&to=9d655c6685865ffce0ad336fed81228f3071bd03&stat=instructions%3Au
    
    |stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
    |--|--|--|--|--|--|--|
    |+0.01%|-0.00%|+0.00%|-0.02%|+0.01%|+0.02%|-0.01%|
    
    Fixes #76554.
    dtcxzyw committed Jan 31, 2024
    f2816ff
  4. [clang][Interp] Add inline descriptor to global variables (#72892)

    Some time ago, I did a similar patch for local variables.
    
    Initializing global variables can fail as well:
    ```c++
    constexpr int a = 1/0;
    static_assert(a == 0);
    ```
    ... would succeed in the new interpreter, because we never saved the
    fact that `a` has not been successfully initialized.
    tbaederr committed Jan 31, 2024
    5bb99ed
  5. [NFC] Update .git-blame-ignore-revs for compiler-rt builtins (#79803)

    The three commits from "[RFC] compiler-rt builtins cleanup and
    refactoring" rewrote lots of code in compiler-rt builtins.
    
    - 082b89b: [builtins] Reformat builtins with clang-format
    - 0ba22f5: [builtins] Use single line C++/C99 comment style
    - 84da0e1: [builtins] Use aliases for function redirects
    piggynl committed Jan 31, 2024
    6f35f1d
  6. [NFC] Add compiler-rt:* to .github/new-prs-labeler.yml (#79872)

    After this change, all current compiler-rt:* labels on GitHub are
    covered.
    piggynl committed Jan 31, 2024
    9594746
  7. [clang][dataflow] Extend debug output for Environment. (#79982)

    * Print `ReturnLoc`, `ReturnVal`, and `ThisPointeeLoc` if applicable.
    
    * For entries in `LocToVal` that correspond to declarations, print the
      names of the declarations next to them.
    
    I've removed the FIXME because all relevant fields are now being dumped.
    I'm not sure we actually need the capability for the caller to specify
    which fields to dump, so I've simply deleted this part of the comment.
    
    Some examples of the output:
    
    
    ![image](https://github.com/llvm/llvm-project/assets/29098113/17d0978f-b86d-4555-8a61-d1f2021f8d59)
    
    
    ![image](https://github.com/llvm/llvm-project/assets/29098113/021dbb24-5fe2-4720-8a08-f48dcf4b88f8)
    martinboehme committed Jan 31, 2024
    c83ec84
  8. [AMDGPU]: Fix type signatures for wmma intrinsics, NFC (#80087)

    Make the wmma intrinsic type signatures canonical. We need
    a type signature as long as the type is not fixed. However, when an
    argument's type matches a previous argument's type, we do not need the
    signature for this argument.
    
    This patch fixes three general cases:
      1. add missing signatures
      2. remove signatures for matching arguments
      3. reorder the signatures -- return type signature should always appear
         first
    changpeng committed Jan 31, 2024
    3564666
  9. [clang] static operators should evaluate object argument (reland) (#8…

    …0108)
    
    This re-applies 30155fc with a fix for clangd.
    
    ### Description
    
    clang doesn't currently evaluate the object argument of `static operator()`
    and `static operator[]`, for example:
    
    ```cpp
    #include <iostream>
    
    struct Foo {
        static int operator()(int x, int y) {
            std::cout << "Foo::operator()" << std::endl;
            return x + y;
        }
        static int operator[](int x, int y) {
            std::cout << "Foo::operator[]" << std::endl;
            return x + y;
        }
    };
    Foo getFoo() {
        std::cout << "getFoo()" << std::endl;
        return {};
    }
    int main() {
        std::cout << getFoo()(1, 2) << std::endl;
        std::cout << getFoo()[1, 2] << std::endl;
    }
    ```
    
    `getFoo()` is expected to be called, but clang (17.0.6) doesn't currently
    call it. This PR fixes this issue.
    
    Fixes #67976, reland #68485.
    
    ### Walkthrough
    
    - **clang/lib/Sema/SemaOverload.cpp**
      - **`Sema::CreateOverloadedArraySubscriptExpr` &
        `Sema::BuildCallToObjectOfClassType`**
        Previously clang generated `CallExpr` for static operators, ignoring the
        object argument. In this PR `CXXOperatorCallExpr` is generated for
        static operators instead, with the object argument as the first
        argument.
      - **`TryObjectArgumentInitialization`**
        `const` / `volatile` objects are allowed for static methods, so that we
        can call static operators on them.
    - **clang/lib/CodeGen/CGExpr.cpp**
      - **`CodeGenFunction::EmitCall`**
        CodeGen changes for `CXXOperatorCallExpr` with static operators: emit
        and ignore the object argument first, then emit the operator call.
    - **clang/lib/AST/ExprConstant.cpp**
      - **`ExprEvaluatorBase::handleCallExpr`**
        Evaluation of static operators in constexpr also needs some small
        changes to work, so that the arguments won't be out of position.
    - **clang/lib/Sema/SemaChecking.cpp**
      - **`Sema::CheckFunctionCall`**
        Code for argument checking also needs to be modified, or it will fail
        the test `clang/test/SemaCXX/overloaded-operator-decl.cpp`.
    - **clang-tools-extra/clangd/InlayHints.cpp**
      - **`InlayHintVisitor::VisitCallExpr`**
        Now that the `CXXOperatorCallExpr` for static operators also has an
        object argument, we should also take care of this situation in clangd.
    
    ### Tests
    
    - **Added:**
        - **clang/test/AST/ast-dump-static-operators.cpp**
          Verify the AST generated for static operators.
        - **clang/test/SemaCXX/cxx2b-static-operator.cpp**
          Static operators should be able to be called on const / volatile
          objects.
    - **Modified:**
        - **clang/test/CodeGenCXX/cxx2b-static-call-operator.cpp**
        - **clang/test/CodeGenCXX/cxx2b-static-subscript-operator.cpp**
          Matching the new CodeGen.
    
    ### Documentation
    
    - **clang/docs/ReleaseNotes.rst**
      Update release notes.
    
    ---------
    
    Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>
    Co-authored-by: cor3ntin <corentinjabot@gmail.com>
    Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
    4 people committed Jan 31, 2024
    ee01a2c
  10. 82324bc
  11. [ADT] Use a constexpr version of llvm::bit_ceil (NFC) (#79709)

    This patch replaces the template trick with a constexpr function that
    is more readable.  Once C++20 is available in our code base, we can
    remove the constexpr function in favor of std::bit_ceil.
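
    For illustration, a constexpr function of this kind could look like the sketch below. It assumes C++14 (loops in constexpr functions); the name `bitCeilSketch` and the loop-based approach are hypothetical and are not taken from the patch:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-in for a constexpr bit_ceil; not the code from the patch.
    // Returns the smallest power of two that is >= Value. Returns 0 when that
    // power of two does not fit in 32 bits (std::bit_ceil leaves that case
    // undefined).
    constexpr std::uint32_t bitCeilSketch(std::uint32_t Value) {
      std::uint32_t Result = 1;
      while (Result != 0 && Result < Value)
        Result <<= 1; // Wraps to 0 once 2^31 is shifted out.
      return Result;
    }

    // Usable in constant expressions, which is the point of making it constexpr.
    static_assert(bitCeilSketch(0) == 1, "0 rounds up to 1");
    static_assert(bitCeilSketch(5) == 8, "5 rounds up to 8");
    static_assert(bitCeilSketch(64) == 64, "powers of two are unchanged");
    ```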
    kazutakahirata committed Jan 31, 2024
    b49b3dd
  12. [SYCL][Fusion] Enable fusion of rounded-range kernels (intel#12492)

    Enable, test, and document the support for fusing rounded range kernels.
    This mostly worked already – we just have to query the original kernel's
    global size, and use that to compute the private memory size used for
    internalization.
    
    ---------
    
    Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
    jopperm committed Jan 31, 2024
    a3e2315
  13. [InstCombine] Fold select with signbit idiom into fabs (#76342)

    This patch folds:
    ```
    ((bitcast X to int) <s 0 ? -X : X) -> fabs(X)
    ((bitcast X to int) >s -1 ? X : -X) -> fabs(X)
    ((bitcast X to int) <s 0 ? X : -X) -> -fabs(X)
    ((bitcast X to int) >s -1 ? -X : X) -> -fabs(X)
    ```
    Alive2: https://alive2.llvm.org/ce/z/rGepow
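
    The first two folds can be checked at the source level with a small sketch. It assumes `float` is a 32-bit IEEE-754 type; all helper names are hypothetical:

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <cstring>

    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes a 32-bit float");

    // Reinterpret the bits of X as a signed integer (the `bitcast X to int`).
    inline std::int32_t bitsOf(float X) {
      std::int32_t Bits;
      std::memcpy(&Bits, &X, sizeof(Bits));
      return Bits;
    }

    // First pattern: (bitcast X to int) <s 0 ? -X : X
    inline float selectOnSignSet(float X) { return bitsOf(X) < 0 ? -X : X; }

    // Second pattern: (bitcast X to int) >s -1 ? X : -X
    inline float selectOnSignClear(float X) { return bitsOf(X) > -1 ? X : -X; }

    // Compare bit patterns rather than values so that -0.0 and +0.0 differ.
    inline bool bothMatchFabs(float X) {
      const std::int32_t Expected = bitsOf(std::fabs(X));
      return bitsOf(selectOnSignSet(X)) == Expected &&
             bitsOf(selectOnSignClear(X)) == Expected;
    }
    ```

    The signed comparison against 0 (or -1) only inspects the sign bit of the float, which is why the select behaves like `fabs` even for `-0.0`.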
    dtcxzyw committed Jan 31, 2024
    f292f90
  14. [SYCL] [NATIVECPU] Update OneAPI Construction Kit tag (intel#12543)

    Updates the commit tag for the OCK.
    PietroGhg committed Jan 31, 2024
    565490d
  15. [SYCL][Joint matrix tests] Fix test execution env setting for two tes…

    …ts (intel#12529)
    
    This makes the two tests run when either a CPU or a GPU is present,
    rather than requiring both to be present.
    dkhaldi committed Jan 31, 2024
    6ec040e
  16. [NFC] [clang-repl] Fix test failures due to inconsistent target settings

    See llvm/llvm-project#79261 for details.
    
    It shows that clang-repl uses a different target triple from clang, so
    it may be problematic if clang-repl reads a BMI that clang generated
    for a different target triple.
    
    While the underlying issue is not easy to fix, this patch tries to make
    this test green to not bother developers.
    ChuanqiXu9 committed Jan 31, 2024
    d71831a
  17. [SYCL][Fusion] Improve error messages on incompatible ND-ranges (inte…

    …l#12524)
    
    Show detailed error messages when users try to fuse kernels with
    incompatible ND-ranges, showing different errors for each different
    scenario. Also combine the validation and fusion logic to reduce the
    number of ND-ranges list traversals.
    
    ---------
    
    Signed-off-by: Victor Perez <victor.perez@codeplay.com>
    victor-eds committed Jan 31, 2024
    7d492f8
  18. [RISCV][Isel] Remove redundant vmerge for the scalable vwadd(u).wv (#…

    …80079)
    
    Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 is recommitted.
    
    ### Code
    ```
    define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) {
        %mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
        %a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer
        %sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64>
        %ret = add <vscale x 8 x i64> %sa, %y
        ret <vscale x 8 x i64> %ret
    }
    ```
    
    ### Before this patch
    [Compiler Explorer](https://godbolt.org/z/xsoa5xPrd)
    ```
    vwadd_wv_mask_v8i32:
            li      a0, 42
            vsetvli a1, zero, e32, m4, ta, ma
            vmslt.vx        v0, v8, a0
            vmv.v.i v12, 0
            vmerge.vvm      v24, v12, v8, v0
            vwadd.wv        v8, v16, v24
            ret
    ```
    
    ### After this patch
    ```
    vwadd_wv_mask_v8i32:
            li a0, 42
            vsetvli a1, zero, e32, m4, ta, ma
            vmslt.vx v0, v8, a0
            vsetvli zero, zero, e32, m4, tu, mu
            vwadd.wv v16, v16, v8, v0.t
            vmv8r.v v8, v16
            ret
    ```
    sun-jacobi committed Jan 31, 2024
    dc5dca1
  19. [mlir][memref] memref.subview: Verify result strides (#79865)

    The `memref.subview` verifier currently checks result shape, element
    type, memory space and offset of the result type. However, the strides
    of the result type are currently not verified. This commit adds
    verification of result strides for non-rank reducing ops and fixes
    invalid IR in test cases.
    
    Verification of result strides for ops with rank reductions is more
    complex (and there could be multiple possible result types). That is
    left for a separate commit.
    
    Also refactor the implementation a bit:
    * If `computeMemRefRankReductionMask` could not compute the dropped
    dimensions, there must be something wrong with the op. Return
    `FailureOr` instead of `std::optional`.
    * `isRankReducedMemRefType` did much more than just checking whether the
    op has rank reductions or not. Inline the implementation into the
    verifier and add better comments.
    * `produceSubViewErrorMsg` does not have to be templatized.
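
    As a rough illustration of what a result-stride check compares against, the sketch below computes the expected layout of a non-rank-reducing subview with fully static values. The struct and function names are hypothetical, and dynamic offsets or strides are not modelled:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical model of a strided memref layout: a base offset plus one
    // stride per dimension, all static.
    struct StridedLayout {
      std::int64_t Offset;
      std::vector<std::int64_t> Strides;
    };

    // Expected layout of a non-rank-reducing subview:
    //   result offset    = source offset + sum(Offsets[i] * source stride[i])
    //   result stride[i] = source stride[i] * Steps[i]
    inline StridedLayout inferSubviewLayout(const StridedLayout &Source,
                                            const std::vector<std::int64_t> &Offsets,
                                            const std::vector<std::int64_t> &Steps) {
      assert(Offsets.size() == Source.Strides.size());
      assert(Steps.size() == Source.Strides.size());
      StridedLayout Result{Source.Offset, {}};
      for (std::size_t I = 0; I < Source.Strides.size(); ++I) {
        Result.Offset += Offsets[I] * Source.Strides[I];
        Result.Strides.push_back(Source.Strides[I] * Steps[I]);
      }
      return Result;
    }
    ```

    For a row-major 8x16 source (strides `[16, 1]`, offset 0), a subview with offsets `[2, 4]` and strides `[2, 1]` would be expected to have offset 36 and strides `[32, 1]`; a result type that declares anything else is what such a verifier would reject.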
    matthias-springer committed Jan 31, 2024
    db49319
  20. [CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAll…

    …ocOrder (#80015)
    
    Previously we called ignoreCSRForAllocationOrder on every alias of every
    CSR which was expensive on targets like AMDGPU which define a very large
    number of overlapping register tuples.
    
    On such targets it is simpler and faster to call
    ignoreCSRForAllocationOrder once for every physical register.
    
    Differential Revision: https://reviews.llvm.org/D146735
    jayfoad committed Jan 31, 2024
    f852503
  21. Revert "[mlir][memref] memref.subview: Verify result strides" (#80116)

    Reverts llvm/llvm-project#79865
    
    I think there is a bug in the stride computation in
    `SubViewOp::inferResultType`. (Was already there before this change.)
    
    Reverting this commit for now and updating the original pull request
    with a fix and more test cases.
    matthias-springer committed Jan 31, 2024
    96c907d
  22. Turn on LLVM_USE_SPLIT_DWARF by default for Linux Debug build (intel#…

    …12527)
    
    The split-dwarf feature can help reduce compile time and build footprint.
    See examples from:
    https://www.productive-cpp.com/improving-cpp-builds-with-split-dwarf/
    
    Locally measured size reduction using a debug build shows around a 20%
    reduction for a statically linked build.
    
    Footprint reduction after compile.py: 48G -> 37G (23%)
    after check-all: 170G -> 140G (18%)
    
    Debuggability should not be affected.
    It should also help with compile time, especially
    for incremental builds.
    
    -gsplit-dwarf is not yet supported on Windows, so it is not turned on
    there for now.
    jsji committed Jan 31, 2024
    2f20e37
  23. [SYCL] Ensure that RTDeviceBinaryImage instances have a unique image …

    …ID (intel#12526)
    
    **Problem:**
    
    Currently, the image id of an RTDeviceBinaryImage instance is simply the
    pointer value of the underlying pi_device_binary (in
    [getImageID()](https://github.com/intel/llvm/blob/sycl/sycl/source/detail/device_binary_image.hpp#L221)).
    However, consider the following scenario:
    1) We create a device image
    2) Put into cache
    3) Destroy the image (when it goes out of scope)
    4) Create another image that _happens to be created at the same memory
    address_ (thus having same image ID)
    
    This causes two instances of RTDeviceBinaryImage to share the same image
    id, which ends up causing a collision in the KernelProgramCache.
    
    **Solution (Proposed in this PR)**
    
    Have a counter in RTDeviceBinaryImage that increments upon instantiation
    of this class. The counter value is added to the image id to ensure that
    no two instances have the same ID.
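
    A minimal sketch of the counter idea, not the actual RTDeviceBinaryImage code. The class `ImageSketch` is hypothetical, and for simplicity it uses the counter value alone as the ID rather than adding it to a pointer-derived value:

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <cstdint>
    #include <new>

    // Hypothetical stand-in for a device image whose ID must be unique per
    // instance, even when two instances live at the same address over time.
    class ImageSketch {
    public:
      ImageSketch() : Id(nextId()) {}
      std::uint64_t getImageID() const { return Id; }

    private:
      // Process-wide counter, incremented once per constructed instance.
      static std::uint64_t nextId() {
        static std::atomic<std::uint64_t> Counter{0};
        return Counter.fetch_add(1, std::memory_order_relaxed);
      }
      std::uint64_t Id;
    };

    // Reproduce the scenario from the list above: destroy one image, then
    // construct another at exactly the same address, and compare their IDs.
    inline bool idsDifferAtSameAddress() {
      alignas(ImageSketch) unsigned char Storage[sizeof(ImageSketch)];
      ImageSketch *First = new (Storage) ImageSketch();
      const std::uint64_t FirstId = First->getImageID();
      First->~ImageSketch();
      ImageSketch *Second = new (Storage) ImageSketch();
      const std::uint64_t SecondId = Second->getImageID();
      Second->~ImageSketch();
      return FirstId != SecondId;
    }
    ```

    With a pointer-based ID both instances would report the same value here, which is the cache-key collision the problem statement describes.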
    
    **Alternative Solutions**
    
    1. Remove the entry from the KernelProgramCache when the image is
    destroyed. This solution would require more work as the
    KernelProgramCache, currently, [does not support arbitrary element-wise
    eviction](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/KernelProgramCache.md#in-memory-cache-eviction)
    (eviction follows a LRU strategy when cache size exceeds the threshold).
    Moreover, I expect this to have the additional performance overhead of
    having to lock the cache and evict the entry. The proposed solution is
    much simpler.
    uditagarwal97 committed Jan 31, 2024
    04ff5b8
  24. [clang][Interp] Support arbitrary precision constants (#79747)

    Add (de)serialization support for them, like we do for Floating values.
    tbaederr committed Jan 31, 2024
    64a849a
  25. Add support of param type for transform.structured.tile_using_forall …

    …(#72097)
    
    Allow transform.structured.tile_using_forall to take tile sizes of
    param type.
    
    Examples:
    ```
    %tile_sizes = transform.param.constant 16 : i64 -> !transform.param<i64>
    transform.structured.tile_using_forall %matmul tile_sizes [%tile_sizes : !transform.param<i64>, 32] ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    ```
    ```
    %c10 = transform.param.constant 10 : i64 -> !transform.any_param
    %c20 = transform.param.constant 20 : i64 -> !transform.any_param
    %tile_sizes = transform.merge_handles %c10, %c20 : !transform.any_param
    transform.structured.tile_using_forall %matmul tile_sizes *(%tile_sizes : !transform.any_param) ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    ```
    jinchen62 committed Jan 31, 2024
    d439f36
  26. [SME] Stop RA from coalescing COPY instructions that transcend beyond…

    … smstart/smstop. (#78294)
    
    This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that
    expands to a 'nop', but which stops the register allocator from
    coalescing a COPY node when its use/def crosses a SMSTART or SMSTOP
    instruction.
    
    For example:
    
        %0:fpr64 = COPY killed $d0
        undef %2.dsub:zpr = COPY %0       // <- Do not coalesce this COPY
        ADJCALLSTACKDOWN 0, 0
        MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
        $d0 = COPY killed %0
        BL @use_f64, csr_aarch64_aapcs
    
    If the COPY would be coalesced, that would lead to:
    
        $d0 = COPY killed %0
    
    being replaced by:
    
        $d0 = COPY killed %2.dsub
    
    which means the whole ZPR reg would be live up to the call, causing the
    MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:
    
        str     q0, [sp]   // 16-byte Folded Spill
        smstop  sm
        ldr     z0, [sp]   // 16-byte Folded Reload
        bl      use_f64
    
    which would be incorrect for two reasons:
    1. The program may load more data than it has allocated.
    2. If there are other SVE objects on the stack, the compiler might use
       the 'mul vl' addressing modes to access the spill location.
    
    By disabling the coalescing, we get the desired results:
    
        str     d0, [sp, #8]  // 8-byte Folded Spill
        smstop  sm
        ldr     d0, [sp, #8]  // 8-byte Folded Reload
        bl      use_f64
    sdesmalen-arm committed Jan 31, 2024
    dd73666
  27. [RISCV][MC] Add MC layer support for the experimental zabha extension…

    … (#80005)
    
    This patch implements the zabha (Byte and Halfword Atomic Memory
    Operations) v1.0-rc1 extension.
    See also https://github.com/riscv/riscv-zabha/blob/v1.0-rc1/zabha.adoc.
    dtcxzyw committed Jan 31, 2024
    89f87c3
  28. [mlir][transform] Add elementwise criteria to match.structured.body

    … (#79626)
    
    As far as I am aware, there is no simple way to match on elementwise
    ops. I propose to add an `elementwise` criterion to the
    `match.structured.body` op. My only hesitation is that whether an op is
    elementwise is determined not only by the body, but also by the indexing
    maps. So if others find this too awkward, I can implement a separate
    match op instead.
    srcarroll committed Jan 31, 2024
    488f88b
  29. [mlir][ArmSME] Support 2-way widening outer products (#78975)

    This patch introduces support for 2-way widening outer products. This
    enables the fusion of 2 'arm_sme.outerproduct' operations that are
    chained via the accumulator into a 2-way widening outer product
    operation.
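
    The arithmetic behind this fusion can be sketched on plain integer vectors. This is an illustration only: it assumes the 2-way operation consumes operands as interleaved pairs and sums the two products into each accumulator element, and every name below is hypothetical:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using Vec16 = std::vector<std::int16_t>;
    using Tile32 = std::vector<std::vector<std::int32_t>>;

    // One widening outer product with accumulation: Acc[i][j] += A[i] * B[j].
    inline void outerProduct(Tile32 &Acc, const Vec16 &A, const Vec16 &B) {
      for (std::size_t I = 0; I < A.size(); ++I)
        for (std::size_t J = 0; J < B.size(); ++J)
          Acc[I][J] += std::int32_t(A[I]) * std::int32_t(B[J]);
    }

    // Interleave two equally sized vectors: {X[0], Y[0], X[1], Y[1], ...}.
    inline Vec16 interleave(const Vec16 &X, const Vec16 &Y) {
      assert(X.size() == Y.size());
      Vec16 Out;
      for (std::size_t I = 0; I < X.size(); ++I) {
        Out.push_back(X[I]);
        Out.push_back(Y[I]);
      }
      return Out;
    }

    // Assumed 2-way form: each accumulator element receives the sum of two
    // products taken from adjacent operand pairs.
    inline void outerProduct2Way(Tile32 &Acc, const Vec16 &A2, const Vec16 &B2) {
      for (std::size_t I = 0; I < A2.size() / 2; ++I)
        for (std::size_t J = 0; J < B2.size() / 2; ++J)
          Acc[I][J] += std::int32_t(A2[2 * I]) * std::int32_t(B2[2 * J]) +
                       std::int32_t(A2[2 * I + 1]) * std::int32_t(B2[2 * J + 1]);
    }

    // Two outer products chained through the accumulator versus one 2-way op.
    inline bool fusedMatchesChained() {
      const Vec16 A0{1, 2}, B0{3, 4}, A1{5, 6}, B1{7, 8};
      Tile32 Chained(2, std::vector<std::int32_t>(2, 10));
      Tile32 Fused = Chained;
      outerProduct(Chained, A0, B0);
      outerProduct(Chained, A1, B1);
      outerProduct2Way(Fused, interleave(A0, A1), interleave(B0, B1));
      return Chained == Fused;
    }
    ```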
    
    Changes:
    
    - Add 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for 2-way variants.
      These map to instruction variants added in SME2 and use different
      intrinsics. Intrinsics are already implemented for widening variants
      from SME1.
    - Adds the following operations:
      - fmopa_2way, fmops_2way
      - smopa_2way, smops_2way
      - umopa_2way, umops_2way
    - Implements conversions for the above ops to intrinsics in
    ArmSMEToLLVM.
    - Adds a pass 'arm-sme-outer-product-fusion'  that fuses
      'arm_sme.outerproduct' operations.
    
    For a detailed description of these operations see the
    'arm_sme.fmopa_2way' description.
    
    The reason for introducing many operations rather than one is the
    signed/unsigned variants can't be distinguished with types (e.g., ui16,
    si16) since 'arith.extui' and 'arith.extsi' only support signless
    integers. A single operation would require this information and an
    attribute (for example) for the sign doesn't feel right if
    floating-point types are also supported where this wouldn't apply.
    Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32)
    introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but
    no subtract variant. Whilst these are not supported in this patch, it
    felt simpler to have separate ops for add/subtract given this.
    c-rhodes committed Jan 31, 2024
    95ef8e3
  30. [mlir][vector] Disable transpose -> shuffle lowering for scalable vec…

    …tors (#79979)
    
    vector.shuffle is not supported for scalable vectors (outside of splats)
    MacDue committed Jan 31, 2024
    88610b7
  31. [mlir][memref] memref.subview: Verify result strides

    The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.
    
    Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.
    
    Also refactor the implementation a bit:
    * If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
    * `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
    * `produceSubViewErrorMsg` does not have to be templatized.
    * Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering.
    
    Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
    matthias-springer committed Jan 31, 2024
    ce7cc72
  32. [SYCL][COMPAT] Force device function to be inlined (intel#12550)

    Due to the way the inliner works, the launched function may become very
    large and go above the inline threshold. This results in a short
    kernel which only calls one function.
    
    The patch adds an always_inline on the call site to force the user
    function to be inlined into the SYCL kernel to reduce overhead.
    
    Signed-off-by: Victor Lomuller <victor@codeplay.com>
    Naghasan committed Jan 31, 2024
    e121c88
  33. Merge from 'main' to 'sycl-web' (46 commits)

      CONFLICT (content): Merge conflict in clang/lib/Basic/Targets/NVPTX.cpp
      CONFLICT (content): Merge conflict in clang/test/Driver/cuda-cross-compiling.c
    KseniyaTikhomirova committed Jan 31, 2024
    9bf5d5c
  34. db1fbd6
  35. [GitHub][workflows] Add buildbot information comment to first merged …

    …PR from a new contributor (#78292)
    
    This change adds a comment to the first PR from a new contributor that
    is merged, which tells them what to expect post merge from the build
    bots.
    
    It covers how they will be notified, where to ask questions, that they
    are more likely to be reverted than in other projects, etc. The
    information overlaps with, and links to,
    https://llvm.org/docs/MyFirstTypoFix.html#myfirsttypofix-issues-after-landing-your-pr,
    so that users who simply read the email are still aware, and know where
    to follow up if they do get reports.
    
    To do this, I have added a hidden HTML comment to the new contributor
    greeting comment. This workflow will look for that to tell if the author
    of the PR was a new contributor at the time they opened the merge. It
    has to be done this way because as soon as the PR is merged, they are by
    GitHub's definition no longer a new contributor and I suspect that their
    author association will be "contributor" instead.
    
    I cannot 100% confirm that without a whole lot of effort and probably
    breaking GitHub's terms of service, but it's fairly cheap to work around
    anyway. It seems rare / almost impossible to reopen a PR in llvm at
    least, but in case it does happen the buildbot info comment has its own
    hidden HTML comment. If we find this we will not post another copy of
    the same information.
    DavidSpickett committed Jan 31, 2024
    44ba4c7
  36. 24a8041
  37. [BDCE] Fix clearing of poison-generating flags

    If the demanded bits of an instruction are full, we don't have to
    recurse to its users, but we may still have to clear flags on the
    instruction itself.
    
    Fixes llvm/llvm-project#80113.
    nikic committed Jan 31, 2024
    b210cbb
  38. [mlir][IR] Add RewriterBase::moveBlockBefore and fix bug in `moveOp…

    …Before` (#79579)
    
    This commit adds a new method to the rewriter API: `moveBlockBefore`.
    This method is used by `inlineRegionBefore` and covered by dialect
    conversion test cases.
    
    Also fixes a bug in `moveOpBefore`, where the previous op location was
    not passed correctly. Adds a test case to
    `test-strict-pattern-driver.mlir`.
    matthias-springer committed Jan 31, 2024
    da784a2
  39. Revert "[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreC…

    …SRForAllocOrder (#80015)"
    
    This reverts commit f852503.
    
    It was supposed to speed things up but llvm-compile-time-tracker.com
    showed a slight slowdown.
    jayfoad committed Jan 31, 2024
    942cc9a
  40. [ValueTracking] Merge cannotBeOrderedLessThanZeroImpl into `compute…

    …KnownFPClass` (#76360)
    
    This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into
    `computeKnownFPClass` to improve the signbit inference.
    
    ---------
    
    Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
    dtcxzyw and arsenm committed Jan 31, 2024
    50e80e0
  41. [AMDGPU] Stop combining arbitrary offsets into PAL relocs (#80034)

    PAL uses ELF REL (not RELA) relocations which can only store a 32-bit
    addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI
    which require the upper 32 bits of a 64-bit address calculation to be
    correct. This means that it is not safe to fold an arbitrary offset into
    a GlobalAddressSDNode, so stop doing that.
    
    In practice this is mostly a problem for small negative offsets which do
    not work as expected because PAL treats the 32-bit addend as unsigned.
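
A minimal sketch of the failure mode described above, assuming the high-half relocation value is computed as `(S + A) >> 32` and that the stored addend is read back as an unsigned 32-bit value. The helper name `abs32_hi` and the example address are hypothetical and only serve the illustration.

```python
MASK32 = 0xFFFFFFFF

def abs32_hi(symbol_addr: int, addend: int) -> int:
    """Upper 32 bits of (symbol + addend), as an ABS32_HI-style reloc would need."""
    return ((symbol_addr + addend) >> 32) & MASK32

symbol = 0x1_0000_0004   # hypothetical 64-bit symbol address
offset = -8              # small negative offset folded into the global address

# What the compiler intended: the 64-bit sum is computed first.
expected_hi = abs32_hi(symbol, offset)

# What a REL reloc can express: only a 32-bit addend stored in the instruction,
# which is then interpreted as unsigned.
stored_addend = offset & MASK32
actual_hi = abs32_hi(symbol, stored_addend)

print(hex(expected_hi), hex(actual_hi))
```

With these inputs the two high halves differ by one, because the borrow from the negative offset is lost once the addend is truncated to 32 bits and treated as unsigned.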
    jayfoad committed Jan 31, 2024
    c2c650f
  42. [clang][AMDGPU] Remove trailing whitespace in doc

    Added by f2a78e6.
    
    Wouldn't normally bother but it's showing up in some CI checks,
    just want to reduce the noise.
    DavidSpickett committed Jan 31, 2024
    0217d2e
  43. [SYCL][Bindless] Unique sampler addressing modes per dimension (intel…

    …#12109)
    
    Add the ability to specify unique addressing modes per dimension to the
    bindless_image_sampler
    
    Corresponding CUDA adapter UR PR:
    oneapi-src/unified-runtime#1168
    
    ---------
    
    Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>
    Seanst98 and kbenzie committed Jan 31, 2024
    b897152
  44. fbbc822
  45. 7ff2327
  46. [mlir] Fix debug output for passes that modify top-level operation. (…

    …#80022)
    
    Make it so that when the top-level (root) operation itself is being
    modified, it is also used as the root for debug output in
    PatternApplicator.
    
    Fix #80021
    Jezurko committed Jan 31, 2024
    78e0cca
  47. [mlir][EmitC] Add verbatim op (#79584)

    The `verbatim` operation produces no results and the value is emitted as
    is followed by a line break ('\n' character) during translation.
    
    Note: Use with caution. This operation can have arbitrary effects on the
    semantics of the emitted code. Use semantically more meaningful
    operations whenever possible. Additionally this op is *NOT* intended to
    be used to inject large snippets of code.
    
    This operation can be used in situations where a more suitable operation
    is not yet implemented in the dialect or where preprocessor directives
    interfere with the structure of the code.
    
    Co-authored-by: Marius Brehler <marius.brehler@iml.fraunhofer.de>
    simon-camp and marbre committed Jan 31, 2024
    e624648
  48. [SPIR-V] Improve how lowering of formal arguments in SPIR-V Backend i…

    …nterprets a value of 'kernel_arg_type' (#78730)
    
    The goal of this PR is to tolerate differences between description of
    formal arguments by function metadata (represented by "kernel_arg_type")
    and LLVM actual parameter types. A compiler may use "kernel_arg_type" of
    function metadata fields to encode detailed type information, whereas
    LLVM IR may utilize for an actual parameter a more general type, in
    particular, opaque pointer type. This PR proposes to resolve this by a
    fallback to LLVM actual parameter types during the lowering of formal
    function arguments in cases when the type can't be created by string
    content of "kernel_arg_type", i.e., when "kernel_arg_type" contains a
    type unknown for the SPIR-V Backend.
    
    An example of the issue manifestation is
    https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/transcoding/KernelArgTypeInOpString.ll,
    where a compiler generates for the following kernel function detailed
    `kernel_arg_type` info in a form of `!{!"image_kernel_data*", !"myInt",
    !"struct struct_name*"}`, and in LLVM IR same arguments are referred to
    as `@foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData)`.
    Both definitions are correct, and the resulting LLVM IR is correct, but
    lowering stage of SPIR-V Backend fails to generate SPIR-V type.
    
    ```
    typedef int myInt;
    
     typedef struct {
       int width;
       int height;
     } image_kernel_data;
    
     struct struct_name {
       int i;
       int y;
     };
     void kernel foo(__global image_kernel_data* in,
                     __global struct struct_name *outData,
                     myInt out) {}
    ```
    
    ```
    define spir_kernel void @foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData) ... !kernel_arg_type !7 ... {
    entry:
      ret void
    }
    ...
    !7 = !{!"image_kernel_data*", !"myInt", !"struct struct_name*"}
    ```
    
    The PR changes a contract of `SPIRVType *getArgSPIRVType(...)` in a way
    that it may return `nullptr` to signal that the metadata string content
    is not recognized, so corresponding comments are added and a couple of
    checks for `nullptr` are inserted where appropriate.
    VyacheslavLevytskyy committed Jan 31, 2024
    5a07774
  49. [X86] i256-add - replace i386 triple X32 check prefixes with X86 and …

    …add gnux32 triple tests
    RKSimon committed Jan 31, 2024
    53b9d47
  50. [X86] mmx-arith.ll - replace X32 check prefixes with X86 + strip cfi …

    …noise
    
    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    8d450b4
  51. [X86] v4f32-immediate.ll - replace X32 check prefixes with X86

    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    00a6817
  52. [X86] v2f32.ll - replace X32 check prefixes with X86 (and add common …

    …CHECK prefix)
    
    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    929503e
  53. 3f5fcb5
  54. [OpenMPIRBuilder] Do not call host runtime for GPU teams codegen (#79…

    …984)
    
    Patch ensures that host runtime functions are not called for handling
    OpenMP teams clause on the device.
    
    GPU code for pragma `omp target teams distribute parallel do` will
    require only one call to OpenMP loop-worksharing GPU runtime. Support
    for it will be added later.
    
    This patch does not include changes required for handling `omp target
    teams` for the host side.
    DominikAdamski committed Jan 31, 2024
    b437014
  55. [BDCE] Also drop poison-generating metadata

    The comment was incorrect: !range also applies to calls, and we
    do need to drop it in some cases.
    nikic committed Jan 31, 2024
    cb6240d
  56. [AsmParser] Add missing globals declarations in incomplete IR mode (#…

    …79855)
    
    If `-allow-incomplete-ir` is enabled, automatically insert declarations
    for missing globals.
    
    If a global is only used in calls with the same function type, insert a
    function declaration with that type.
    
    Otherwise, insert a dummy i8 global. The fallback case could be extended
    with various heuristics (e.g. we could look at load/store types), but
    I've chosen to keep it simple for now, because I'm unsure to what degree
    this would really be useful without more experience. I expect that in most
    cases the declaration type doesn't really matter (note that the type of
    an external global specifies a *minimum* size only, not a precise size).
    
    This is a followup to llvm/llvm-project#78421.
    nikic committed Jan 31, 2024
    5cc87b4
  57. [OpenMP] atomic compare weak : Parser & AST support (#79475)

    This adds support for `#pragma omp atomic compare weak`. It has Parser
    & AST support for now.
    
    ---------
    
    Authored-by: Sunil Kuravinakop <kuravina@pe28vega.us.cray.com>
    SunilKuravinakop committed Jan 31, 2024
    a74e9ce
  58. [AArch64][SME] Fix inlining bug introduced in #78703 (#79994)

    Calling a `__arm_locally_streaming` function from a function that
    is not a streaming-SVE function would lead to incorrect inlining.
    
    The issue didn't surface because the tests were not testing what
    they were supposed to test.
    sdesmalen-arm committed Jan 31, 2024
    3abf55a
  59. [llvm][InstCombine] bitcast bfloat half castpair bug (#79832)

    Miscompilation arises due to instruction combining of cast pairs of the
    type `bitcast bfloat to half` + `<FPOp> bfloat to half` or `bitcast half
    to bfloat` + `<FPOp> half to bfloat`. For example, `bitcast bfloat to
    half` + `fpext half to double` and `bitcast half to bfloat` + `fpext bfloat
    to double` reduce to `fpext bfloat to double` and `fpext half to double`
    respectively. This conversion is incorrect because it assumes the
    representations of `bfloat` and `half` are equivalent simply because they
    have the same width, which results in a miscompilation.
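
To illustrate why equal width does not imply equal representation, the sketch below decodes one 16-bit pattern both ways. It assumes the standard layouts (IEEE-754 binary16 for `half`, the upper 16 bits of a binary32 for `bfloat`); the helper names are hypothetical and this is not code from the patch.

```python
import struct

def half_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as IEEE-754 binary16 (half)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def bfloat_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16, i.e. the top half of a binary32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

bits = 0x3F80
as_bfloat = bfloat_bits_to_float(bits)
as_half = half_bits_to_float(bits)

# The same bits denote different values, so a bitcast followed by an fpext
# cannot be folded into a single fpext from the original type.
print(as_bfloat, as_half)
```

The two types split their 16 bits differently between exponent and mantissa (8 and 7 bits for `bfloat`, 5 and 10 bits for `half`), which is why the decoded values disagree.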
    
    Fixes #61984
    nasherm committed Jan 31, 2024
    d309261
  60. [llvm-rc] Support ARM64EC resource generation (#78908)

    This is already supported in llvm-cvtres, so only a small change is
    needed.
    bylaws committed Jan 31, 2024
    d55d72e
  61. d74619a
  62. [mlir][ArmSME] Add initial SME vector legalization pass (#79152)

    This adds a new pass (`-arm-sme-vector-legalization`) which legalizes
    vector operations so that they can be lowered to ArmSME. This initial
    patch adds decomposition for `vector.outerproduct`,
    `vector.transfer_read`, and `vector.transfer_write` when they operate on
    vector types larger than a single SME tile. For example, a [8]x[8]xf32
    outer product would be decomposed into four [4]x[4]xf32 outer products,
    which could then be lowered to ArmSME. These three ops have been picked
    as supporting them alone allows lowering matmuls that use all ZA
    accumulators to ArmSME.
    
    For it to be possible to legalize a vector type it has to be a multiple
    of an SME tile size, but other than that any shape can be used. E.g.
    `vector<[8]x[8]xf32>`, `vector<[4]x[16]xf32>`, `vector<[16]x[4]xf32>`
    can all be lowered to four `vector<[4]x[4]xf32>` operations.
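
The shape rule above can be sketched as a small tiling helper. This is only an illustration of the arithmetic, not the pass itself: the function name is hypothetical, sizes are the bracketed scalable multiples (8 for `[8]`), and a `[4]x[4]` tile for `f32` is assumed from the examples in the text.

```python
def decompose_into_tiles(rows, cols, tile_rows=4, tile_cols=4):
    """Return the (row, col) offset of each tile-sized piece of a 2-D shape."""
    if rows % tile_rows != 0 or cols % tile_cols != 0:
        # Shapes that are not a multiple of the tile size cannot be legalized.
        raise ValueError("shape is not a multiple of the tile size")
    return [(r, c)
            for r in range(0, rows, tile_rows)
            for c in range(0, cols, tile_cols)]

for shape in [(8, 8), (4, 16), (16, 4)]:
    print(shape, decompose_into_tiles(*shape))
```

Each of the three shapes yields four offsets, matching the four `vector<[4]x[4]xf32>` operations mentioned above.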
    
    In future, this pass will be extended with more SME-specific rewrites to
    legalize unrolling the reduction dimension of matmuls (which is not
    type-decomposition), which is why the pass has quite a general name.
    MacDue committed Jan 31, 2024
    042800a
  63. [DAG] AddNodeIDCustom - call ShuffleVectorSDNode::getMask once instea…

    …d of repeated getMaskElt calls.
    
    Use a simpler for-range loop to append all shuffle mask elements
    RKSimon committed Jan 31, 2024
    912cdd2
  64. [X86] insertps-from-constantpool.ll - replace X32 check prefixes with…

    … X86 and expose address math
    
    We try to only use X32 for gnux32 triple tests.
    
    Use no_x86_scrub_mem_shuffle so the test shows updated shuffle intermediate and the +4 offset into the constant pool vector entry
    RKSimon committed Jan 31, 2024
    a82ca1c
  65. [X86] divrem.ll - replace X32 check prefixes with X86

    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    e4af212
  66. [X86] divide-by-constant.ll - replace X32 check prefixes with X86

    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    ed11f25
  67. [X86] fold-vector-sext - replace X32 check prefixes with X86

    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    824d073
  68. [X86] cfguard - replace X32 check prefixes with X86

    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    1d8c8f1
  69. [X86] divrem8_ext.ll - replace X32 check prefixes with X86

    We try to only use X32 for gnux32 triple tests.
    RKSimon committed Jan 31, 2024
    648eb7c
  70. [SYCL][Fusion] Silence warning (intel#12555)

    Silence unused variable warning which tripped post-commit checks for
    intel#12492.
    
    Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
    jopperm committed Jan 31, 2024
    b8f9c8b
  71. [AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c)…

    …, concat(b,d)) (#79464)
    
    We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16
    uhadd(concat(a,c), concat(b,d)), which can lead to further
    simplifications.
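
The equivalence can be checked lane by lane with a small model. This sketch assumes `uhadd` is the unsigned halving add, `(a + b) >> 1` per lane computed without intermediate overflow, and models vectors as Python lists; the lane values are made up for illustration.

```python
def uhadd(a, b, bits=16):
    """Lane-wise unsigned halving add: (x + y) >> 1 without intermediate overflow."""
    mask = (1 << bits) - 1
    # Python integers are unbounded, so x + y never wraps before the shift.
    return [((x + y) >> 1) & mask for x, y in zip(a, b)]

# Four v4i16-style operands (hypothetical values, including the 16-bit maximum).
a = [1, 2, 65535, 40000]
b = [3, 5, 65535, 30000]
c = [0, 7, 9, 65534]
d = [10, 8, 1, 65535]

lhs = uhadd(a, b) + uhadd(c, d)   # concat(uhadd(a,b), uhadd(c,d))
rhs = uhadd(a + c, b + d)         # uhadd(concat(a,c), concat(b,d))
print(lhs == rhs)
```

Because the operation is purely lane-wise, concatenating before or after it produces the same lanes; the rewritten form simply uses one wider operation instead of two narrower ones.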
    Rin18 committed Jan 31, 2024
    cf828ae
  72. [X86][CodeGen] Set isReMaterializable = 1 for AVX broadcast load

    Broadcast of a single float should not be any slower than
    loading 32B using vmovaps, so rematerializing it can help reduce
    register spills when register pressure is high.
    KanRobert committed Jan 31, 2024
    e3c9327
  73. [AMDGPU][GFX12] Add tests for unsupported builtins (#78729)

    __builtin_amdgcn_mfma* and __builtin_amdgcn_smfmac*
    mariusz-sikora-at-amd committed Jan 31, 2024
    f96e85b
  74. [X86][MC] Support encoding/decoding for APX variant LZCNT/TZCNT/POPCN…

    …T instructions (#79954)
    
    Two variants: promoted legacy, NF (no flags update).
    
    The syntax of NF instructions is aligned with GNU binutils.
    https://sourceware.org/pipermail/binutils/2023-September/129545.html
    XinWang10 committed Jan 31, 2024
    d9e875d
  75. 817d0cb
  76. [VPlan] Preserve original induction order when creating scalar steps.

    Update createScalarIVSteps to take an insert point as parameter. This
    ensures that the inserted scalar steps are in the same order as the
    recipes they replace (vs in reverse order as currently). This helps to
    reduce the diff for follow-up changes.
    fhahn committed Jan 31, 2024
    9536a62
  77. Fix after #79152

    JoelWee committed Jan 31, 2024
    ab87426
  78. [mlir][IR] Send missing notifications when inlining a block (#79593)

    When a block is inlined into another block, the nested operations are
    moved into another block and the `notifyOperationInserted` callback
    should be triggered. This commit adds the missing notifications for:
    * `RewriterBase::inlineBlockBefore`
    * `RewriterBase::mergeBlocks`
    matthias-springer committed Jan 31, 2024
    c672b34
  79. [mlir] Fix ab87426

    JoelWee committed Jan 31, 2024
    7e45cfd
  80. [mlir][EmitC] Remove unused attribute from verbatim op (#80142)

    The uses of the attribute were removed in code review of #79584, but
    its definition was inadvertently kept.
    simon-camp committed Jan 31, 2024
    121a0ef
  81. cec24f0
  82. [mlir][IR] Send missing notification when splitting a block (#79597)

    When a block is split with `RewriterBase::splitBlock`, a
    `notifyBlockInserted` notification, followed by
    `notifyOperationInserted` notifications (for moving over the operations
    into the new block) should be sent. This commit adds those
    notifications.
    matthias-springer committed Jan 31, 2024
    c2675ba
  83. [ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79…

    …287)
    
    This ensures the odd/even pseudo instructions are allocated to the same
    register range.
    
    This fixes #71763
    AlfieRichardsArm committed Jan 31, 2024
    de75e50
  84. [Driver] Fix erroneous warning for -fcx-limited-range and -fcx-fortra…

    …n-rules. (#79821)
    
    The options `-fcx-limited-range` and `-fcx-fortran-rules` were added in
    _https://github.com/llvm/llvm-project/pull/70244_
    
    The code adding the options introduced an erroneous warning.
    `$ clang -c -fcx-limited-range t1.c` 
    `clang: warning: overriding '' option with '-fcx-limited-range'
    [-Woverriding-option]`
    and
    `$ clang -c -fcx-fortran-rules t1.c`
    `clang: warning: overriding '' option with '-fcx-fortran-rules'
    [-Woverriding-option]`
    
    The warning doesn't make sense. This patch removes it.
    zahiraam committed Jan 31, 2024
    e538486
  85. [AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294)

    JumpThreading may perform AA queries while the dominator tree is not up
    to date, which may result in miscompilations.
    
    Fix this by adding a new AAQI option to disable the use of the dominator
    tree in BasicAA.
    
    Fixes llvm/llvm-project#79175.
    nikic committed Jan 31, 2024
    4f32f5d
  86. [mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)

    This PR moves the lowering of the math dialect later in the pipeline,
    because the math dialect is lowered correctly by
    `createConvertGpuOpsToNVVMOps` for GPU targets, and that pass needs to
    run first.
    grypp committed Jan 31, 2024
    74bf0b1
  87. [clang] Represent array refs as TemplateArgument::Declaration (#80050)

    This (probably temporarily) restores array-referring NTTP behavior to
    what it was prior to #78041 because ~~I'm fed up~~ I have no time to fix
    regressions.
    bolshakov-a committed Jan 31, 2024
    9bf4e54
  88. [MIRPrinter] Don't print space when there is no successor (#80143)

    An extra space causes the checks generated by update_mir_test_checks to be
    unusable.
    
    ```
    # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
    # RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
    ---
    name: foo
    body: |
      ; CHECK-LABEL: name: foo
      ; CHECK: bb.0:
      ; CHECK-NEXT:   successors:
      ; CHECK-NEXT: {{  $}}
      ; CHECK-NEXT: {{  $}}
      ; CHECK-NEXT: bb.1:
      ; CHECK-NEXT:   RET 0, $eax
      bb.0:
        successors:
    
      bb.1:
        RET 0, $eax
    ...
    ```
    
    The failure log is as follows:
    
    ```
    llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
     ; CHECK-NEXT: {{ $}}
                   ^
    <stdin>:21:13: note: 'next' match was here
     successors:
                ^
    <stdin>:21:13: note: previous match ended here
     successors:
    ```
    DianQK committed Jan 31, 2024
    b7738e2
  89. Revert "[mlir][complex] Prevent underflow in complex.abs (#79786)"

    This reverts commit 4effff2. It makes
    `complex.abs(-1)` return `-1`.
    d0k committed Jan 31, 2024
    70fb96a
  90. [SYCL][Fusion] Handle GEPs that were canonicalized to byte offsets (i…

    …ntel#12557)
    
    Upstream now canonicalizes constant GEPs to represent byte offsets, i.e.
    using `i8` as source element type. This PR adapts the internalization
    pass to this change by also remapping GEPs with a constant offset, if
    that offset is a multiple of the internalized accessor's element size.
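
The remapping condition described above can be sketched as follows. This is a simplified model rather than the pass's actual code: the helper name is hypothetical, and it only covers the arithmetic of turning a constant byte offset into an element index.

```python
def remap_byte_offset(byte_offset: int, element_size: int):
    """Map a constant i8-based GEP offset to an element index, if possible.

    Returns None when the offset is not a multiple of the element size,
    in which case the GEP cannot be remapped this way.
    """
    if byte_offset % element_size != 0:
        return None
    return byte_offset // element_size

# A 16-byte offset into an accessor of 4-byte elements addresses element 4.
print(remap_byte_offset(16, 4))
# A 6-byte offset does not land on an element boundary.
print(remap_byte_offset(6, 4))
```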
    
    Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
    jopperm committed Jan 31, 2024
    470e378
  91. [flang] Lower ASYNCHRONOUS variables and IO statements (#80008)

    Finish plugging-in ASYNCHRONOUS IO in lowering (GetAsynchronousId was
    not used yet).
    
    Add a runtime implementation for GetAsynchronousId (only the signature
    was defined). Always return zero since the flang runtime "fakes"
    asynchronous IO (data transfers are always complete, see
    flang/docs/IORuntimeInternals.md).
    
    Update all runtime integer arguments and results for IDs to use the
    AsynchronousId int alias for consistency.
    
    In lowering, the asynchronous attribute is added on the hlfir.declare of
    an ASYNCHRONOUS variable, but nothing else is done. This is OK given the
    synchronous aspects of flang IO, but it would be safer to treat these
    variables as volatile (prevent code motion of related stores/loads) since
    the asynchronous data change can also be done by a C-defined user
    procedure (see 18.10.4 Asynchronous communication). Flang lowering
    anyway does not give enough info for LLVM to do such code motions (the
    variables that are passed in a call are not given the noescape
    attribute, so LLVM will assume any later opaque call may modify the
    related data and would not move loads/stores of such variables
    before/after calls even if it could from a pure Fortran point of view
    without ASYNCHRONOUS).
    jeanPerier committed Jan 31, 2024
    4679132
  92. 47df391
  93. Revert "[Clang][Sema] fix outline member function template with defau…

    …… (#80144)
    
    …lt align crash (#78400)"
    
    This reverts commit 7b33899.
    
    A regression was discovered here:
    llvm/llvm-project#78400
    
    and the author requested a revert to give time to review.
    erichkeane committed Jan 31, 2024
    6e6aa44
  94. [mlir][mesh] Refactoring code organization, tests and docs (#79606)

    * Split out `MeshDialect.h` from `MeshOps.h` that defines the dialect
    class. Reduces include clutter if you care only about the dialect and
    not the ops.
    
    * Expose functions `getMesh` and `collectiveProcessGroupSize`. These
    functions are useful for outside users of the dialect.
    
    * Remove unused code.
    
    * Remove examples and tests of mesh.shard attribute in tensor encoding.
    Per the decision that Spmdization would be performed on sharding
    annotations and there will be no tensors with sharding specified in the
    type.
    For more info see this RFC comment:
    https://discourse.llvm.org/t/rfc-sharding-framework-design-for-device-mesh/73533/81
    sogartar committed Jan 31, 2024
    31fc0a1
  95. Move the PowerPC/PPCMergeStringPool work to initializer (#77352)

    Currently, the `PPCMergeStringPool` merges the global variables after the
    `AsmPrinter` initializer adds the global variables to its symbol list.
    This patch moves the merging work of `PPCMergeStringPool` to its
    initializer, just like what GlobalMerge does, to avoid adding merged
    global variables to the `AsmPrinter` symbol list.
    scui-ibm committed Jan 31, 2024
    1bab570
  96. Fix: CMake Error at cmake/modules/LLVMExternalProjectUtils.cmake:86 (…

    …is_msvc_triple) (#80071)
    
    Adding quotes around the `${target_triple}`
    
    Fix: #78530
    hiraditya committed Jan 31, 2024
    c651b2b
  97. [AST] Add dump() method to TypeLoc (#65484)

    The ability to dump AST nodes is important to ad-hoc debugging, and
    the fact this doesn't work with TypeLoc nodes is an obvious missing
    feature in e.g. clang-query (`set output dump` simply does nothing).
    
    Having TypeLoc::dump(), and enabling DynTypedNode::dump() for such nodes
    seems like a clear win.
    
    It looks like this:
    ```
    int main(int argc, char **argv);
    
    FunctionProtoTypeLoc <test.cc:3:1, col:31> 'int (int, char **)' cdecl
    |-ParmVarDecl 0x30071a8 <col:10, col:14> col:14 argc 'int'
    | `-BuiltinTypeLoc <col:10> 'int'
    |-ParmVarDecl 0x3007250 <col:20, col:27> col:27 argv 'char **'
    | `-PointerTypeLoc <col:20, col:26> 'char **'
    |   `-PointerTypeLoc <col:20, col:25> 'char *'
    |     `-BuiltinTypeLoc <col:20> 'char'
    `-BuiltinTypeLoc <col:1> 'int'
    ```
    
    It dumps the lexically nested tree of type locs.
    This often looks similar to how types are dumped, but unlike types
    we don't look at desugaring e.g. typedefs, as their underlying types
    are not lexically spelled here.
    
    ---
    
    Less clear is exactly when to include these nodes in existing text AST
    dumps rooted at (TranslationUnit)Decls.
    These already omit supported nodes sometimes, e.g. NestedNameSpecifiers
    are often mentioned but not recursively dumped.
    
    TypeLocs are a more extreme case: they're ~always more verbose
    than the current AST dump.
    So this patch punts on that, TypeLocs are only ever printed recursively
    as part of a TypeLoc::dump() call.
    
    It would also be nice to be able to invoke `clang` to dump a typeloc
    somehow, like `clang -cc1 -ast-dump`. But I don't know exactly what the
    best version of that is, so this patch doesn't do it.
    
    ---
    
    There are similar (less critical!) nodes: TemplateArgumentLoc etc,
    these also don't have dump() functions today and are obvious extensions.
    
    I suspect that we should add these, and Loc nodes should dump each other
    (e.g. the ElaboratedTypeLoc `vector<int>::iterator` should dump
    the NestedNameSpecifierLoc `vector<int>::`, which dumps the
    TemplateSpecializationTypeLoc `vector<int>` etc).
    
    Maybe this generalizes further to a "full syntactic dump" mode, where
    even Decls and Stmts would print the TypeLocs they lexically contain.
    But this may be more complex than useful.
    
    ---
    
    While here, ConceptReference JSON dumping must be implemented. It's not
    totally clear to me why this implementation wasn't required before but
    is now...
    sam-mccall committed Jan 31, 2024
    8d1b1c9
  98. [AArch64] MI Scheduler LDP combine follow up (#79003)

    This is a follow up of 75d820d, adding more opcodes to the combine
    target hook enabling more LDP creation.
    
    Patch co-authored by Cameron McInally.
    sjoerdmeijer committed Jan 31, 2024
    8841846
  99. Add a release note for TypeLoc::dump() support; NFC

    This amends 8d1b1c9 which added the
    functionality the release note refers to.
    AaronBallman committed Jan 31, 2024
    e33dc6b
  100. [AArch64] Use add_and_or_is_add for CSINC (#79552)

    Adds or add-like-or's of 1 can both be turned into csinc, which can help
    fold more instructions into a csinc.
    davemgreen committed Jan 31, 2024
    5d7d89d
  101. [clang][Interp] Handle casts between complex types (#79269)

    Just handle this like two primitive casts.
    tbaederr committed Jan 31, 2024
    32c0048
  102. [clang][Interp] Remove wrong * operator

    classifyComplexElementType used to return a std::optional, seems like
    this was left in a PR and not re-tested.
    
    This broke build bots, e.g.
    https://lab.llvm.org/buildbot/#/builders/68/builds/67930
    tbaederr committed Jan 31, 2024
    dfd5a64
  103. [AsmParser] Support non-consecutive global value numbers (#80013)

    llvm/llvm-project#78171 added support for
    non-consecutive local value numbers. This extends the support to global
    value numbers (for globals and functions).
    
    This means that it is now possible to delete an unnamed global
    definition/declaration without breaking the IR.
    
    This is a lot less common than unnamed local values, but it seems like
    something we should support for consistency. (Unnamed globals are used a
    lot in Rust though.)
    nikic committed Jan 31, 2024
    f2df4bf
  104. [gn build] Port 8d1b1c9

    llvmgnsyncbot committed Jan 31, 2024
    0cd8348
  105. [clang][dataflow] fix assert in `Environment::getResultObjectLocation…

    …` (#79608)
    
    When calling `Environment::getResultObjectLocation` with a
    CXXOperatorCallExpr that is a prvalue, we just hit an assert because no
    record was ever created.
    
    ---------
    
    Co-authored-by: martinboehme <mboehme@google.com>
    paulsemel and martinboehme committed Jan 31, 2024
    5c2da28
  106. [Flang] Support NULL(procptr): null intrinsic that has procedure poin…

    …ter argument. (#80072)
    
    This PR adds support for the NULL intrinsic to have a procedure pointer
    argument.
    DanielCChen committed Jan 31, 2024
    bd8bec2
  107. e34fd2e
  108. baf1b19
  109. Revert "[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline…

    … (#78556)"
    
    This reverts commit 74bf0b1. The test
    always fails.
    
     | mlir/test/Dialect/GPU/test-nvvm-pipeline.mlir:23:16: error: CHECK-PTX: expected string not found in input
     |  // CHECK-PTX: __nv_expf
    
    https://lab.llvm.org/buildbot/#/builders/61/builds/53789
    d0k committed Jan 31, 2024
    98dbc68
  110. Revert "[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(con…

    …cat(a,c), concat(b,d))" (#80157)
    
    Reverts llvm/llvm-project#79464 while figuring out why the tests are
    failing.
    Rin18 committed Jan 31, 2024
    2907c63
  111. [bazel] Port 31fc0a1

    d0k committed Jan 31, 2024
    6720e3a
  112. [AArch64] Use DAG->isAddLike in add_and_or_is_add (#79563)

    This allows it to work with disjoint or's as well as computing the known
    bits.
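
As a rough illustration of why a disjoint `or` can be treated like an `add`: when the two operands share no set bits, no carries occur, so `a | b` and `a + b` give the same result. The sketch below checks this on 8-bit values; the helper name `or_is_add_like` is hypothetical and is not the name of any LLVM function.

```python
def or_is_add_like(a: int, b: int) -> bool:
    """Hypothetical helper: True when a and b have no set bits in common,
    so that (a | b) equals (a + b)."""
    return (a & b) == 0


# Exhaustively confirm the identity for all pairs of 8-bit values.
for a in range(256):
    for b in range(256):
        if or_is_add_like(a, b):
            assert (a | b) == (a + b)

# The "or of 1" case: it only behaves like "+ 1" when the low bit is clear.
even, odd = 0b1010, 0b1011
print(even | 1, even + 1)  # same value: the or is disjoint
print(odd | 1, odd + 1)    # different values: bit 0 is shared
```

When the operands do overlap, as in the second print, the `or` cannot be rewritten as an increment.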
    davemgreen committed Jan 31, 2024
    d04ae1b
  113. [Clang][test] Add fPIC when building shared library (#80065)

    Fix linking error: "ld: error: can't create dynamic relocation
    R_X86_64_64 against local symbol in readonly segment; recompile object
    files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in
    the output"
    jsji committed Jan 31, 2024
    b929be2
  114. 16c4843
  115. [Exegesis] Print epsilon value in the sched model inconsistency repor…

    …t (#80080)
    
    Since I've formatted the epsilon value, I don't think it's necessary to
    escape it.
    mshockwave committed Jan 31, 2024
    8241106
  116. [lldb][DataFormatter][NFC] Use GetFirstValueOfLibCXXCompressedPair th…

    …roughout formatters (#80133)
    
    This avoids duplicating the logic to get the first
    element of a libc++ `__compressed_pair`. This will
    be useful in supporting upcoming changes to the layout
    of `__compressed_pair`.
    
    Drive-by changes:
    * Renamed `m_item` to `size_node` for readability;
      `m_item` suggests it's a member variable, which it
      is not.
    Michael137 committed Jan 31, 2024
    08c0eb1
  117. [lldb] Add support for large watchpoints in lldb (#79962)

    This patch is the next piece of work in my Large Watchpoint proposal,
    https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116
    
    This patch breaks a user's watchpoint into one or more
    WatchpointResources which reflect what the hardware registers can cover.
    This means we can watch objects larger than 8 bytes, and we can watch
    unaligned address ranges. On a typical 64-bit target with 4 watchpoint
    registers you can watch 32 bytes of memory if the start address is
    doubleword aligned.
    
    Additionally, if the remote stub implements AArch64 MASK style
    watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
    size region of memory up to 2GB, aligned to that same size.
    
    I updated the Watchpoint constructor and CommandObjectWatchpoint to
    create a CompilerType of Array<UInt8> when the size of the watched
    region is greater than pointer-size and we don't have a variable type to
    use. For pointer-size and smaller, we can display the watched granule as
    an integer value; for larger-than-pointer-size we will display as an
    array of bytes.
    
    I have `watchpoint list` now print the WatchpointResources used to
    implement the watchpoint.
    
    I added a WatchpointAlgorithm class which has a top-level static method
    that takes an enum flag mask WatchpointHardwareFeature and a user
    address and size, and returns a vector of WatchpointResources covering
    the request. It does not take into account the number of watchpoint
    registers the target has, or the number still available for use. Right
    now there is only one algorithm, which monitors power-of-2 regions of
    memory. For up to pointer-size, this is what Intel hardware supports.
    AArch64 Byte Address Select watchpoints can watch any number of
    contiguous bytes in a pointer-size memory granule, but that is not
    currently supported, so if you ask to watch bytes 3-5, the algorithm will
    watch the entire doubleword (8 bytes). The new default "modify" style
    means we will silently ignore modifications to bytes outside the watched
    range.
    
    I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
    only run on Darwin when using the in-tree debugserver, which was a proxy
    for "debugserver supports MASK watchpoints". I'll be adding the
    aforementioned feature flag from the stub and enabling full mask
    watchpoints when a debugserver with that feature is enabled, and
    re-enabling this test.
    
    I added a new TestUnalignedLargeWatchpoint.py which only has one test
    but it's a great one, watching a 22-byte range that is unaligned and
    requires four 8-byte watchpoints to cover.
    
    I also added a unit test, WatchpointAlgorithmsTests, which has a number
    of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
    think there are interesting alternative approaches to how we cover
    these; I note in the unit test that a user requesting a watch on address
    0x12e0 of 120 bytes will be covered by two watchpoints today, 128-byte
    watchpoints at 0x1280 and at 0x1300. But it could be done with a 32-byte
    watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
    false positives/private stops. As we try refining this one, it's helpful
    to have a collection of tests to make sure things don't regress.
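
The covering behaviour described above can be sketched roughly as follows. This is a simplified illustration, not LLDB's actual implementation: the function names and the `min_size` / `max_size` parameters (the smallest and largest region one hardware watchpoint is assumed to cover) are assumptions made for the example.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def power_of_2_regions(addr: int, size: int, min_size: int, max_size: int):
    """Return (start, size) regions, each a power of two in size and aligned
    to its own size, that together cover [addr, addr + size)."""
    if size == 0:
        return []
    end = addr + size
    region = bit_ceil(max(size, min_size))
    # First try a single region, then a single region twice as large.
    for candidate in (region, region * 2):
        if candidate <= max_size:
            start = addr & ~(candidate - 1)
            if start + candidate >= end:
                return [(start, candidate)]
    # Otherwise tile the request with the largest usable region size.
    region = min(region, max_size)
    start = addr & ~(region - 1)
    regions = []
    while start < end:
        regions.append((start, region))
        start += region
    return regions


# 120 bytes at 0x12e0 with 128-byte regions allowed: two 128-byte regions,
# at 0x1280 and 0x1300.
print([(hex(s), n) for s, n in power_of_2_regions(0x12E0, 120, 1, 128)])
# An unaligned 22-byte range with 8-byte regions: four regions are needed.
# The start address 0x1003 is a made-up example value.
print(len(power_of_2_regions(0x1003, 22, 8, 8)))
```

Note that this sketch, like the description above, ignores how many watchpoint registers the target actually has.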
    
    I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
    Ubuntu. I have not modified the Windows process plugins yet. I might try
    that as a standalone patch; I'd be making the change blind, but the
    necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
    small, so it might be obvious enough that I can change it and see what
    the Windows CI thinks.
    
    There isn't yet a packet (or a qSupported feature query) for the gdb
    remote serial protocol stub to communicate its watchpoint capabilities
    to lldb. I'll be doing that in a patch right after this is landed,
    having debugserver advertise its capability of AArch64 MASK watchpoints,
    and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
    WatchpointAlgorithms so we can watch larger than 32-byte requests on
    Darwin.
    
    I haven't yet tackled WatchpointResource *sharing* by multiple
    Watchpoints. This is all part of the goal, especially when we may be
    watching a larger memory range than the user requested, if they then add
    another watchpoint next to their first request, it may be covered by the
    same WatchpointResource (hardware watchpoint register). Also one "read"
    watchpoint and one "write" watchpoint on the same memory granule need to
    be handled, making the WatchpointResource cover all requests.
    
    As WatchpointResources aren't shared among multiple Watchpoints yet,
    there's no handling of running the conditions/commands/etc on multiple
    Watchpoints when their shared WatchpointResource is hit. The goal beyond
    "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
    behavior and commands. I have a feeling I may be slowly chipping away at
    this for a while.
    
    rdar://108234227
    jasonmolenda committed Jan 31, 2024
    57c66b3
  118. [gn build] Port 57c66b3

    llvmgnsyncbot committed Jan 31, 2024
    35a0089
  119. [Libomptarget] Remove handling of old ctor / dtor entries (#80153)

    Summary:
    A previous patch removed creating these entries in clang in favor of the
    backend emitting a callable kernel and having the runtime call that if
    present. The support for the old style was kept around in LLVM 18.0 but
    now that we have forked to 19.0 we should remove the support.
    
    The effect of this is that old-style constructors in an application
    linking against a newer libomptarget will no longer be called. In that
    case, users can either recompile or use the `libomptarget.so.18` that
    comes with the previous release.
    jhuber6 committed Jan 31, 2024
    2542876
  120. [libc++abi] Add temporary workaround to unblock Chrome

    Chrome rolls libc++ and libc++abi as separate projects. As a result, they
    may not always be updated in lockstep, and this can lead to build failures
    when mixing libc++ that doesn't have <__thread/support.h> with libc++abi
    that requires it.
    
    This patch adds a workaround to make libc++abi work with both versions.
    While Chrome's setup is not supported, this workaround will allow them
    to go back to green and do the required work needed to roll libc++ and
    libc++abi in lockstep. This workaround will be short-lived -- I have a
    reminder to go back and remove it by EOW.
    ldionne committed Jan 31, 2024
    372f7dd
  121. Add extra printing to TestWatchpointCount.py to debug CI fail

    The way the locals are laid out on the stack on x86-64 Debian is
    resulting in a test failure with the new large watchpoint support.
    Collecting more logging before I revert/debug it.
    jasonmolenda committed Jan 31, 2024
    dad50fe
  122. [DirectX][docs] Architecture and design philosophy of DXIL support

    This documents some of the architectural direction for DXIL and tries
    to provide a bit of a map for where to implement different aspects of
    DXIL support.
    
    Pull Request: llvm/llvm-project#78221
    bogner committed Jan 31, 2024
    151559c
  123. [SYCL][ESIMD] Implement unified memory API for scatter(usm, ...) (int…

    …el#12510)
    
    This implements the unified memory API for scatter with USM pointers.
    
    ---------
    
    Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
    sarnex committed Jan 31, 2024
    0bf2e66
  124. [lld] enable fixup chains by default (#79894)

    Enable chained fixups in lld when all platform and version criteria are
    met. This is an attempt at simplifying the logic used in ld 907:
    
    https://github.com/apple-oss-distributions/ld64/blob/93d74eafc37c0558b4ffb88a8bc15c17bed44a20/src/ld/Options.cpp#L5458-L5549
    
    Some changes were made to simplify the logic:
    - only enable chained fixups for macOS from 13.0 to avoid the arch check
    - only enable chained fixups for iphonesimulator from 16.0 to avoid the
    arch check
    - don't enable chained fixups for not specifically listed platforms
    - don't enable chained fixups for arm64_32
    rmaz committed Jan 31, 2024
    775c285
  125. Collecting more logging to debug CI bots

    Watchpoint test fails on arm-ubuntu and x86-64-debian
    jasonmolenda committed Jan 31, 2024
    cf2533e
  126. 09fc333
  127. Add logging to WatchpointAlgorithm

    When verbose lldb watch channel is enabled, print the
    user requested watchpoint and the resources we've
    broken it up into.
    jasonmolenda committed Jan 31, 2024
    d6e1ae2
  128. [CI][NFC] Unify naming scheme for SYCL workflows. (intel#12525)

    All GitHub Actions workflows added by intel/llvm project follow similar naming notation:
    1. Name starts with `sycl` prefix.
    2. Use dash `-` instead of underscore `_` to separate words.
    bader committed Jan 31, 2024
    16a368c
  129. fa42589
  130. Revert "[CI][NFC] Unify naming scheme for SYCL workflows." (intel#12567)

    Reverts intel#12525
    
    In addition to file renaming, we need to update file names referenced
    inside the workflow files.
    bader committed Jan 31, 2024
    1b5daa8
  131. [SYCL][ESIMD][E2E] Disable two LSC tests on DG2 (intel#12565)

    They started failing in the recent driver update. I can't reproduce it
    locally with the same driver version but the hardware we have is a
    little different, maybe that's why. I made an internal tracker for this.
    
    Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
    sarnex committed Jan 31, 2024
    7348207
  132. [clang-tidy] Remove cert-dcl21-cpp check (#80181)

    Deprecated since clang-tidy 17. The rule DCL21-CPP has been removed from
    the CERT guidelines, so it does not make sense to keep the check.
    
    Fixes #42788
    
    Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
    carlosgalvezp and Carlos Gálvez committed Jan 31, 2024
    4cb13f2
  133. [lldb][progress][NFC] Add unit test for progress reports (#79533)

    This test checks how progress events are broadcast when reports are
    started and ended with the current implementation of progress reports.
    Here we're mainly checking that progress events are broadcast
    individually and placed in the event queue in order of their creation
    and deletion.
    chelcassanova committed Jan 31, 2024
    51e0d1b
  134. c84f2ba
  135. [flang] DEALLOCATE(pointer) should use PointerDeallocate() (#79702)

    A DEALLOCATE statement on a pointer should always use
    PointerDeallocate() in the runtime, even if there's no STAT= or
    polymorphism or derived types, so that it can be checked to ensure that
    it is indeed a whole allocation of a pointer.
    klausler committed Jan 31, 2024
    dc15524
  136. [flang][runtime] Add limit check to MOD/MODULO (#80026)

    When testing the arguments to see whether they are integers, check first
    that they are within the maximum range of a 64-bit integer; otherwise, a
    value of larger magnitude will set an invalid operand exception flag.
    klausler committed Jan 31, 2024
    dbf547f
  137. [flang][preprocessor] Replace macros in some #include directives (#80…

    …039)
    
    Ensure that #include FOO undergoes macro replacement. But, as is the
    case with C/C++, continue to not perform macro replacement in a #include
    directive with <angled brackets>.
    klausler committed Jan 31, 2024
    6086007
  138. [flang] Downgrade a too-strong error message to a warning (#80095)

    When a compilation unit has an interface to an external subroutine or
    function, and there is a global object (like a module) with the same
    name, we're emitting an error. This is too strong; the program will
    still build. This comes up in real applications, too. Downgrade the
    error to a warning.
    klausler committed Jan 31, 2024
    2ba94bf
  139. Revert "[lldb][progress][NFC] Add unit test for progress reports (#79…

    …533)"
    
    This reverts commit 51e0d1b.
    That commit breaks a unit test:
    
    ```
    Failed Tests (1):
      lldb-unit :: Core/./LLDBCoreTests/4/8
    ```
    chelcassanova committed Jan 31, 2024
    209fe1f
  140. [SYCL] Fix resource leak related to SYCL_FALLBACK_ASSERT (intel#12532)

    intel#6837 enabled asynchronous buffer
    destruction for buffers constructed without host data. However, the initial
    fallback assert implementation in
    intel#3767 predates it and as such had to
    place the buffer inside `queue_impl` to avoid an unintended synchronization
    point. I don't know if the same crash was observed on the
    end-to-end test added as part of this PR prior to
    intel#3767, but it doesn't matter
    because the "new" implementation is both simpler and doesn't result in a
    crash.
    
    I suspect that without it (with the buffer for fallback assert
    implementation being a data member of `sycl::queue_impl`) we had a
    cyclic dependency somewhere leading to resource leak and ultimately to
    the assert in `DeviceGlobalUSMMem::~DeviceGlobalUSMMem()`.
    aelovikov-intel committed Jan 31, 2024
    b478d2f
  141. Fix conflict resolution fa36da7

    The conflict resolution removed SYCL-related changes;
    this brings them back.
    jsji committed Jan 31, 2024
    99852c0
  142. 9d41fba
  143. 19f429a
  144. Revert "Add logging to WatchpointAlgorithm"

    This reverts commit d6e1ae2.
    jasonmolenda committed Jan 31, 2024
    e95250c
  145. Revert "Collecting more logging to debug CI bots"

    This reverts commit cf2533e.
    jasonmolenda committed Jan 31, 2024
    46643e0
  146. cc4af03
  147. d347c56
  148. [SYCL][E2E] Disable USM/usm_pooling.cpp on gpu-intel-dg2 (intel#12564)

    See intel#12397, the test is flaky in
    post-commit.
    aelovikov-intel committed Jan 31, 2024
    85e461e
  149. [gn build] Port d347c56

    llvmgnsyncbot committed Jan 31, 2024
    742f88e
  150. [clang][DependencyScanner] Remove unused -fmodule-map-file arguments …

    …(#80090)
    
    Since we already add a `-fmodule-map-file=` argument for every used
    modulemap, we can remove all `ModuleMapFiles` entries before adding
    them.
    
    This reduces the number of module variants when `-fmodule-map-file=`
    appears on the original command line.
    Bigcheese committed Jan 31, 2024
    c003d85
  151. [LSR] Add a test case mentioned in review

    As mentioned in llvm/llvm-project#74747, this case is triggering a particularly high cost trip count expansion.
    preames committed Jan 31, 2024
    5282202
  152. [Github] Build PGO optimized toolchain in container (#80096)

    This patch adjusts the Docker container intended for CI use to contain a
    PGO+ThinLTO+BOLT optimized clang. The toolchain is built within a Github
    action and takes ~3.5 hours. No caching is utilized. The current PGO
    optimization is fairly minimal, only running clang over hello world.
    This can be adjusted as needed.
    boomanaiden154 committed Jan 31, 2024
    9107904
  153. [ORC] Merge MaterializationResponsibility notifyEmitted and addDepend…

    …encies
    
    Removes the MaterializationResponsibility::addDependencies and
    addDependenciesForAll methods, and transfers dependency registration to
    the notifyEmitted operation. The new dependency registration allows
    dependencies to be specified for arbitrary subsets of the
    MaterializationResponsibility's symbols (rather than just single symbols
    or all symbols) via an array of SymbolDependenceGroups (pairs of symbol
    sets and corresponding dependencies for that set).
    
    This patch aims to both improve emission performance and simplify
    dependence tracking. By eliminating some states (e.g. symbols having
    registered dependencies but not yet being resolved or emitted) we make
    some errors impossible by construction, and reduce the number of error
    cases that we need to check. NonOwningSymbolStringPtrs are used for
    dependence tracking under the session lock, which should reduce
    ref-counting operations, and intra-emit dependencies are resolved
    outside the session lock, which should provide better performance when
    JITing concurrently (since some dependence tracking can happen in
    parallel).
    
    The Orc C API is updated to account for this change, with the
    LLVMOrcMaterializationResponsibilityNotifyEmitted API being modified and
    the LLVMOrcMaterializationResponsibilityAddDependencies and
    LLVMOrcMaterializationResponsibilityAddDependenciesForAll operations
    being removed.
    lhames committed Jan 31, 2024
    ebe8733
  154. [libc] Fix condition ordering in scanf (#80083)

    The inf and nan string index bounds checks were after the index was
    being used. This patch moves the index usage to the end of the
    condition.
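
The ordering issue can be illustrated with a small sketch. This is not the libc code; the function names are made up for the example. In a short-circuiting condition, the bounds check has to come before the indexing expression, otherwise the index is used before it has been validated. In C an out-of-range read is undefined behaviour; Python raises `IndexError` instead, which makes the difference easy to see.

```python
def matches_at(s: str, i: int, ch: str) -> bool:
    # Bounds check first: `and` short-circuits, so s[i] is only evaluated
    # when i is known to be in range.
    return i < len(s) and s[i] == ch


def matches_at_unchecked(s: str, i: int, ch: str) -> bool:
    # Wrong order: s[i] is evaluated before the bounds check can stop it.
    return s[i] == ch and i < len(s)


print(matches_at("inf", 3, "f"))  # index 3 is out of range, so no match
try:
    matches_at_unchecked("inf", 3, "f")
except IndexError:
    print("index used before the bounds check")
```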
    
    Fixes #79988
    michaelrj-google committed Jan 31, 2024
    22773e5
  155. [AIX] [XCOFF] Add support for common and local common symbols in the …

    …TOC (#79530)
    
    This patch adds support for common and local common symbols in the TOC
    for AIX. Note that we need to update isVirtualSection: a common symbol
    in the TOC will have the symbol type XTY_CM and will be initialized when
    placed in the TOC, so sections with this type are no longer virtual.
    
    ---------
    
    Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
    syzaara and syzaara committed Jan 31, 2024
    a03a6e9
  156. [analyzer] Unbreak [[clang::suppress]] on checkers without decl-with-…

    …issue. (#79398)
    
    There are currently a few checkers that don't fill in the bug report's
    "decl-with-issue" field (typically a function in which the bug is
    found).
    
    The new attribute `[[clang::suppress]]` uses decl-with-issue to reduce
    the size of the suppression source range map so that it doesn't need to
    be built for the entire translation unit.
    
    I'm already seeing a few problems with this approach so I'll probably
    redesign it at some point as it looks like a premature optimization. Not
    only should checkers not be required to pass decl-with-issue (consider
    clang-tidy checkers that never had such a notion), but it's also not
    necessarily uniquely determined (consider leak suppressions at the
    allocation site).
    
    For now I'm adding a simple stop-gap solution that falls back to
    building the suppression map for the entire TU whenever decl-with-issue
    isn't specified. This won't happen in the default setup because, luckily,
    all default checkers do provide decl-with-issue.
    
    ---------
    
    Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
    haoNoQ and steakhal committed Jan 31, 2024
    56e241a
  157. [AArch64][SVE2] Generate urshr rounding shift rights (#78374)

    Add a new node `AArch64ISD::URSHR_I_PRED`.
    
    `srl(add(X, 1 << (ShiftValue - 1)), ShiftValue)` is transformed to
    `urshr`, or to `rshrnb` (as before) if the result is truncated.
    
    `uzp1(rshrnb(uunpklo(X),C), rshrnb(uunpkhi(X), C))` is converted to
    `urshr(X, C)` (tested by the wide_trunc tests).
    
    Pattern matching code in `canLowerSRLToRoundingShiftForVT` is taken
    from prior code in rshrnb. It returns true if the add has NUW or if the
    number of bits used in the return value allows us to not care about the
    overflow (tested by rshrnb test cases).
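
To see why the no-unsigned-wrap condition matters, here is a small model of the two computations on fixed-width unsigned values. It is an illustration only: the function names and the `bits` parameter are assumptions made for the example, and the rounding shift is modelled with Python's unbounded integers so that its internal add cannot wrap.

```python
def urshr(x: int, shift: int, bits: int) -> int:
    """Model of an unsigned rounding shift right on a `bits`-wide value.
    The rounding constant is added without wrapping."""
    mask = (1 << bits) - 1
    return ((x & mask) + (1 << (shift - 1))) >> shift


def srl_add(x: int, shift: int, bits: int) -> int:
    """srl(add(X, 1 << (shift - 1)), shift), where the add wraps at `bits`."""
    mask = (1 << bits) - 1
    return (((x & mask) + (1 << (shift - 1))) & mask) >> shift


# The two agree whenever the add does not wrap (the NUW case).
for x in range(256):
    if x + 2 <= 255:
        assert urshr(x, 2, 8) == srl_add(x, 2, 8)

# Once the add wraps, the results differ, so the rewrite would be wrong.
print(urshr(255, 2, 8), srl_add(255, 2, 8))
```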
    UsmanNadeem committed Jan 31, 2024
    1d14323
  158. 4eee045
  159. 0f728a0
  160. [NVPTX] improve Boolean ISel (#80166)

    Add TableGen patterns to convert more instructions to boolean
    expressions:
    
    - **mul -> and/or**: i1 multiply instructions currently cannot be
    selected, causing the compiler to crash. See
    llvm/llvm-project#57404
    - **select -> and/or**: Converting selects to and/or can enable more
    optimizations. `InstCombine` cannot do this as aggressively due to
    poison semantics.
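
The boolean identities behind these rewrites can be checked exhaustively on 1-bit values. This is a sketch of the well-known identities, not of the TableGen patterns themselves; which exact `select` forms the patch matches is an assumption here, and poison is not modelled at all.

```python
from itertools import product


def mul_i1(a: int, b: int) -> int:
    """Multiply two 1-bit values, keeping only the low bit of the result."""
    return (a * b) & 1


def select(cond: int, if_true: int, if_false: int) -> int:
    """Plain select on fully defined values (no poison)."""
    return if_true if cond else if_false


for a, b in product((0, 1), repeat=2):
    assert mul_i1(a, b) == (a & b)     # mul on i1 is a logical and
    assert select(a, b, 0) == (a & b)  # select(a, b, false) is a and b
    assert select(a, 1, b) == (a | b)  # select(a, true, b) is a or b

print("all 1-bit identities hold")
```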
    AlexMaclean committed Jan 31, 2024
    5e3ae4c
  161. [RISCV] Improve legalization of e8 m8 VL>256 shuffles (#79330)

    If we can't produce a large enough index vector in i8, we may need to legalize
    the shuffle (via scalarization - which in turn gets lowered into stack usage).
    This change makes two related changes:
    * Defer legalization until we actually need to generate the vrgather
      instruction.  With the new recursive structure, this only happens when
      doing the fallback for one of the arms.
    * Check the actual mask values for something outside of the representable
      range.
    
    Both are covered by recently added tests.
    preames committed Jan 31, 2024
    ff53d50
  162. [lldb][NFCI] Remove m_being_created from Breakpoint classes (#79716)

    The purpose of m_being_created in these classes was to prevent
    broadcasting an event related to these Breakpoints during the creation
    of the breakpoint (i.e. in the constructor). In Breakpoint and
    Watchpoint, m_being_created had no effect. That is to say, removing it
    does not change behavior.
    However, BreakpointLocation does still use m_being_created. In the
    constructor, SetThreadID is called which does broadcast an event only if
    `m_being_created` is false. Instead of having this logic be roundabout,
    the constructor instead calls `SetThreadIDInternal`, which actually
    changes the thread ID. `SetThreadID` also will call
    `SetThreadIDInternal` in addition to broadcasting a changed event.
    bulbazord committed Jan 31, 2024
    db68e92
  163. [lsr][term-fold] Restrict transform to low cost expansions (#74747)

    This is a follow up to an item I noted in my submission comment for
    e947f95. I don't have a real world example where this is triggering
    unprofitably, but avoiding the transform when we estimate the loop to be
    short running from profiling seems quite reasonable. It's also now come
    up as a possibility in a regression twice in two days, so I'd like to
    get this in to close out the possibility if nothing else.
    
    The original review dropped the threshold for short trip count loops. I
    will return to that in a separate review if this lands.
    preames committed Jan 31, 2024
    f264da4
  164. Partial revert "[HIP] Fix -mllvm option for device lld linker" (#80202)

    This partially reverts commit aa964f1
    because it caused perf regressions in rccl due to drop of -mllvm
    -amgpu-kernarg-preload-count=16 from the linker step. Potentially it
    could cause similar regressions for other HIP apps using -mllvm options
    with -fgpu-rdc.
    
    Fixes: SWDEV-443345
    yxsamliu committed Jan 31, 2024
    7c2e32d
  165. [CI][NFC] Unify naming scheme for SYCL workflows. (intel#12568)

    All GitHub Actions workflows added by the intel/llvm project are expected to
    use the following naming convention:
    
    1. The name starts with the `sycl` prefix.
    2. Words are separated by a dash `-` (instead of an underscore `_`).
    
    This patch fixes the naming of workflows that do not follow this
    convention.
    bader committed Jan 31, 2024
    435845b
  166. Reland "[lldb][progress][NFC] Add unit test for progress reports (#79…

    …533)"
    
    This reverts commit 209fe1f.
    
    The original commit failed due to an assertion failure in the unit test
    `ProgressReportTest` that the commit added. The Debugger::Initialize()
    function was called more than once which triggered the assertion, so
    this commit calls that function under a `std::call_once`.
    chelcassanova committed Jan 31, 2024
    a5a8cbb
  167. 5561bea
  168. Revert "Reland "[lldb][progress][NFC] Add unit test for progress repo…

    …rts (#79533)""
    
    This reverts commit a5a8cbb.
    
    The test being added by that commit still fails on the assertion that
    Debugger::Initialize has been called more than once.
    chelcassanova committed Jan 31, 2024
    40ebe52
  169. [RISCV] Use Zacas for AtomicRMWInst::Nand i32 and XLen. (#80119)

    We don't have an AMO instruction for Nand, so with the A extension we
    use an LR/SC loop. If we have Zacas we can use a CAS loop instead.
    
    According to the Zacas spec, a CAS loop scales to highly parallel
    systems better than LR/SC.
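As an illustration of the CAS-loop shape, here is a small single-threaded Python model. It is a sketch under stated assumptions, not the generated RISC-V code: `Cell` and its `compare_and_swap` method are hypothetical stand-ins for a memory word and a compare-and-swap instruction, and the nand result follows the `~(old & operand)` definition of an atomic nand.

```python
class Cell:
    """Hypothetical model of one memory word."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell holds `expected`; return the value seen."""
        old = self.value
        if old == expected:
            self.value = new
        return old


def atomic_nand(cell, operand, bits=32):
    """Apply nand through a CAS loop and return the previous value."""
    mask = (1 << bits) - 1
    while True:
        old = cell.value
        new = ~(old & operand) & mask
        # Retry if another writer changed the cell between the load and the CAS.
        if cell.compare_and_swap(old, new) == old:
            return old
```

In this model the loop always succeeds on the first attempt because nothing else writes to the cell; on real hardware the retry path handles contention.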
    topperc committed Jan 31, 2024
    cf401f7
  170. [libc][docs] fix stdbit.h docs (#80070)

    Fix rst comment, add checks for recently implemented functions+macro.
    nickdesaulniers committed Jan 31, 2024
    0e0d155

Commits on Feb 1, 2024

  1. [libc] Fix read under msan (#80203)

    The read function wasn't properly unpoisoning its result under msan,
    causing test failures downstream when I tried to roll it out. This patch
    adds the msan unpoison call that fixes the issue.
    michaelrj-google committed Feb 1, 2024
    0e8eb44
  2. [mlir][Vector] Add support for sub-byte transpose emulation (#80110)

    This PR adds patterns to convert a sub-byte vector transpose into a
    sequence of instructions that perform the transpose on i8 vector
    elements. Whereas this rewrite may not lead to the absolute peak
    performance, it should ensure correctness when dealing with sub-byte
    transposes.
    dcaballe committed Feb 1, 2024
    8ba018d
  3. [mlir][arith] Improve truncf folding (#80206)

    * Use APFloat conversion function instead of going through double to
    check if fold results in information loss.
    * Support folding vector constants.
    kuhar committed Feb 1, 2024
    730f498
  4. f8be7f2
  5. [clang][NFC] Move isSimpleTypeSpecifier() from Sema to Token (#80101)

    So that it can be used by clang-format.
    owenca committed Feb 1, 2024
    a8279a8
  6. [clang-format] Simplify the AfterPlacementOperator option (#79796)

    Change AfterPlacementOperator to a boolean and deprecate SBPO_Never,
    which meant never inserting a space except when after new/delete.
    
    Fixes #78892.
    owenca committed Feb 1, 2024
    908fd09
  7. 994493c
  8. [clang][dataflow] Display line numbers in the HTML logger timeline. (…

    …#80130)
    
    This makes it easier to count how many iterations an analysis takes to
    complete.
    It also makes it easier to compare how a change to the analysis code
    affects the timeline.
    
    Here's a sample screenshot:
    
    
    ![image](https://github.com/llvm/llvm-project/assets/29098113/b3f44b4d-7037-4f28-9532-5418663250e1)
    martinboehme committed Feb 1, 2024
    0c36127
  9. [lldb] Add support for large watchpoints in lldb (#79962)

    This patch is the next piece of work in my Large Watchpoint proposal,
    https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116
    
    This patch breaks a user's watchpoint into one or more
    WatchpointResources which reflect what the hardware registers can cover.
    This means we can watch objects larger than 8 bytes, and we can watch
    unaligned address ranges. On a typical 64-bit target with 4 watchpoint
    registers you can watch 32 bytes of memory if the start address is
    doubleword aligned.
    
    Additionally, if the remote stub implements AArch64 MASK style
    watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
    size region of memory up to 2GB, aligned to that same size.
    
    I updated the Watchpoint constructor and CommandObjectWatchpoint to
    create a CompilerType of Array<UInt8> when the size of the watched
    region is greater than pointer-size and we don't have a variable type to
    use. For pointer-size and smaller, we can display the watched granule as
    an integer value; for larger-than-pointer-size we will display as an
    array of bytes.
    
    I have `watchpoint list` now print the WatchpointResources used to
    implement the watchpoint.
    
    I added a WatchpointAlgorithm class which has a top-level static method
    that takes an enum flag mask WatchpointHardwareFeature and a user
    address and size, and returns a vector of WatchpointResources covering
    the request. It does not take into account the number of watchpoint
    registers the target has, or the number still available for use. Right
    now there is only one algorithm, which monitors power-of-2 regions of
    memory. For up to pointer-size, this is what Intel hardware supports.
    AArch64 Byte Address Select watchpoints can watch any number of
    contiguous bytes in a pointer-size memory granule; that is not currently
    supported, so if you ask to watch bytes 3-5, the algorithm will watch the
    entire doubleword (8 bytes). The newly default "modify" style means we
    will silently ignore modifications to bytes outside the watched range.
    
    I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
    only run on Darwin when using the in-tree debugserver, which was a proxy
    for "debugserver supports MASK watchpoints". I'll be adding the
    aforementioned feature flag from the stub and enabling full mask
    watchpoints when a debugserver with that feature is enabled, and
    re-enable this test.
    
    I added a new TestUnalignedLargeWatchpoint.py which only has one test
    but it's a great one, watching a 22-byte range that is unaligned and
    requires four 8-byte watchpoints to cover.
    
    I also added a unit test, WatchpointAlgorithmsTests, which has a number
    of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
    think there are interesting alternative approaches to how we cover
    these; I note in the unit test that a user requesting a watch on address
    0x12e0 of 120 bytes will be covered by two watchpoints today: 128-byte
    watchpoints at 0x1280 and at 0x1300. But it could be done with a 16-byte
    watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
    false positives/private stops. As we try refining this one, it's helpful
    to have a collection of tests to make sure things don't regress.
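The power-of-2 covering described above can be sketched in a few lines of Python. This is a simplified illustration, not the LLDB implementation: `power_of_2_regions`, its parameters, and the 0x1003 and 0x1004 addresses are assumptions made for the example. It first tries a single aligned power-of-2 region (allowing one doubling), and otherwise tiles the request with aligned blocks of the largest permitted size.

```python
def bit_ceil(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def power_of_2_regions(addr, size, min_size=1, max_size=8):
    """Return (start, length) regions that cover [addr, addr + size)."""
    end = addr + size
    region = max(bit_ceil(size), min_size)
    # Try one aligned region, then one twice as large, if the hardware allows it.
    for candidate in (region, region * 2):
        if candidate <= max_size:
            start = addr & ~(candidate - 1)
            if start + candidate >= end:
                return [(start, candidate)]
    # Otherwise tile the range with aligned blocks of the largest usable size.
    block = min(region, max_size)
    start = addr & ~(block - 1)
    regions = []
    while start < end:
        regions.append((start, block))
        start += block
    return regions
```

With these assumptions, watching bytes 3-5 of a doubleword at 0x1000 yields one 8-byte region, a 22-byte request at 0x1004 needs four 8-byte regions, and the 120-byte request at 0x12e0 (with a large `max_size`) yields the two 128-byte regions at 0x1280 and 0x1300 mentioned in the text.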
    
    I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
    Ubuntu. I have not modified the Windows process plugins yet. I might try
    that as a standalone patch; I'd be making the change blind, but the
    necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
    small so it might be obvious enough that I can change it and see what
    the Windows CI thinks.
    
    There isn't yet a packet (or a qSupported feature query) for the gdb
    remote serial protocol stub to communicate its watchpoint capabilities
    to lldb. I'll be doing that in a patch right after this is landed,
    having debugserver advertise its capability of AArch64 MASK watchpoints,
    and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
    WatchpointAlgorithms so we can watch larger than 32-byte requests on
    Darwin.
    
    I haven't yet tackled WatchpointResource *sharing* by multiple
    Watchpoints. This is all part of the goal, especially when we may be
    watching a larger memory range than the user requested, if they then add
    another watchpoint next to their first request, it may be covered by the
    same WatchpointResource (hardware watchpoint register). Also one "read"
    watchpoint and one "write" watchpoint on the same memory granule need to
    be handled, making the WatchpointResource cover all requests.
    
    As WatchpointResources aren't shared among multiple Watchpoints yet,
    there's no handling of running the conditions/commands/etc on multiple
    Watchpoints when their shared WatchpointResource is hit. The goal beyond
    "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
    behavior and commands. I have a feeling I may be slowly chipping away at
    this for a while.
    
    Re-landing this patch after fixing two undefined behaviors in
    WatchpointAlgorithms found by UBSan and by failures on different
    CI bots.
    
    rdar://108234227
    jasonmolenda committed Feb 1, 2024
    147d7a6
  10. [gn build] Port 147d7a6

    llvmgnsyncbot committed Feb 1, 2024
    19a10c1
  11. 995d21b
  12. [llvm-gsymutil] Print one-time DWO file missing warning under --quiet…

    … flag (#79882)
    
    FileCheck test added
    ```
    ./bin/llvm-lit -sv llvm/test/tools/llvm-gsymutil/X86/elf-dwo.yaml
    ```
    
    Manual test steps:
    
    - Create binary with split-dwarf:
    ```
    clang++ -g -gdwarf-4 -gsplit-dwarf main.cpp -o main_split
    ```
    
    - Remove the dwo file or rename it so llvm-gsymutil can't find it
    ```
    mv main_split-main.dwo main_split-main__.dwo
    ```
    
    - Now run the llvm-gsymutil conversion; it should print a warning both with and
    without the `--quiet` flag
    ```
    $ ./bin/llvm-gsymutil --convert=./main_split
    Input file: ./main_split
    Output file (x86_64): ./main_split.gsym
    warning: Unable to retrieve DWO .debug_info section for main_split-main.dwo
    Loaded 0 functions from DWARF.
    Loaded 12 functions from symbol table.
    Pruned 0 functions, ended with 12 total
    ```
    
    ```
    $ ./bin/llvm-gsymutil --convert=./main_split --quiet
    Input file: ./main_split
    Output file (x86_64): ./main_split.gsym
    warning: Unable to retrieve DWO .debug_info section for some object files. (Remove the --quiet flag for full output)
    Pruned 0 functions, ended with 12 total
    ```
    kusmour committed Feb 1, 2024
    5a8f290
  13. 3b76b86
  14. c82a645
  15. [C++20] [Modules] Introduce -fskip-odr-check-in-gmf (#79959)

    Close llvm/llvm-project#79240
    
    Cite the comment from @mizvekov in
    //github.com/llvm/llvm-project/issues/79240:
    
    > There are two kinds of bugs / issues relevant here:
    >
    > Clang bugs that this change hides
    > Here we can add a Frontend flag that disables the GMF ODR check, just
    > so we can keep tracking, testing and fixing these issues.
    > The Driver would just always pass that flag.
    > We could add that flag in this current issue.
    > Bugs in user code:
    > I don't think it's worth adding a corresponding Driver flag for
    > controlling the above Frontend flag, since we intend its behavior to
    > become default as we fix the problems, and users interested in testing
    > the more strict behavior can just use the Frontend flag directly.
    
    This patch follows the suggestion:
    - Introduce the CC1 flag `-fskip-odr-check-in-gmf`, which is off by
    default, so that every existing test will still be tested with checking
    ODR violations.
    - Pass `-fskip-odr-check-in-gmf` in the driver to keep the behavior
    we intended.
    - Edit the document to tell users who are still interested in
    stricter checks that they can use `-Xclang -fno-skip-odr-check-in-gmf` to get the
    existing behavior.
    ChuanqiXu9 committed Feb 1, 2024
    8eea582
  16. [clang-tidy] Add AllowStringArrays option to modernize-avoid-c-arrays…

    … (#71701)
    
    Add AllowStringArrays option, enabling the exclusion of array types with
    deduced sizes constructed from string literals. This includes only var
    declarations of array of characters constructed directly from c-strings.
    
    Closes #59475
    PiotrZSL committed Feb 1, 2024
    b777bb7
  17. [clang-format] Allow decltype in requires clause (#78847)

    If clang-format is not sure whether a `requires` keyword starts a
    requires clause or a requires expression, it looks ahead to see if any
    token disqualifies it from being a requires clause. Among these tokens
    was `decltype`, since it fell through the switch.
    
    This patch allows decltype to exist in a requires clause.
    
    I'm not 100% sure this change won't have repercussions, but that just
    means we need more test coverage!
    
    Fixes llvm/llvm-project#78645
    rymiel committed Feb 1, 2024
    9b68c09
  18. Skip 2 of the three test sets to narrow down the arm-ubuntu

    CI bot crash when running this unittest.  The printfs aren't
    printing into the CI log output.
    jasonmolenda committed Feb 1, 2024
    fdd98e5
  19. [clang][Interp] complex binary operators aren't always initializing

    The added test case would trigger the removed assertion.
    tbaederr committed Feb 1, 2024
    a8f317a
  20. [Github] Build stage2-clang-bolt target for CI container

    Only the stage2-distribution target is built by default for the
    stage2 distribution installation target. This means that we don't get a
    BOLT optimized binary. This patch explicitly builds the
    stage2-clang-bolt target before the distribution installation target so
    that the clang binary is optimized before it gets installed.
    boomanaiden154 committed Feb 1, 2024
    5d9ffcd
  21. [clang][Interp] Handle imaginary literals (#79130)

    Initialize the first element to 0 and the second element to the value of
    the subexpression.
    tbaederr committed Feb 1, 2024
    6ff431b
  22. [X86][CodeGen] Set mayLoad = 1 for LZCNT/POPCNT/TZCNTrm_(EVEX|NF)

    Promoted and NF LZCNT/POPCNT/TZCNT were supported in #79954.
    Because null_frag is used in the patterns for these variants, TableGen
    cannot infer mayLoad = 1 for them.
    
    This can be tested by MCA tests, which will be added after
    -mcpu=<cpu_with_apx> is supported.
    KanRobert committed Feb 1, 2024
    1395e58
  23. 021a2b4
  24. [clang][Interp] Protect Inc/Dec ops against dummy pointers

    We create them more often in C, so it's more likely to happen there.
    tbaederr committed Feb 1, 2024
    a9e8309
  25. fa98e28
  26. [clang][Interp] Support GenericSelectionExprs

    Just delegate to the resulting expression.
    tbaederr committed Feb 1, 2024
    48f8b74
  27. 54f324f
  28. [mlir] Use create instead of createOrFold for ConstantOp as foldi…

    …ng has no effect (NFC) (#80129)
    
    This aims to clean-up confusing uses of
    builder.createOrFold<ConstantOp> since folding of constants fails.
    nujaa committed Feb 1, 2024
    65066c0
  29. 7ec996d
  30. e851278
  31. 39fa304
  32. b67ce7e
  33. Done iterating with arm-ubuntu bot, I see the problem test.

    Go back to the original form of this file before I added the temporary
    workaround.
    jasonmolenda committed Feb 1, 2024
    eaa3d5e
  34. Skip two WatchpointAlgorithm tests for 32-bit lldb's

    After iterating with the arm-ubuntu CI bot, I found the crash (a
    std::bad_alloc exception being thrown) was caused by these two
    entries when built on a 32-bit machine.  I probably have an assumption
    about size_t being 64-bits in WatchpointAlgorithms and we have a
    problem when it's actually 32-bits and we're dealing with a real
    64-bit address.  All of the cases where the address can be represented
    in the low 32-bits of the addr_t work correctly, so for now I'm
    skipping these two unit tests when building lldb on a 32-bit host
    until I can review that method and possibly switch to explicit
    uint64_t's.
    jasonmolenda committed Feb 1, 2024
    90e6808
  35. [mlir][Transforms] GreedyPatternRewriteDriver: Hash ops separately …

    …(#78312)
    
    The greedy pattern rewrite driver has multiple "expensive checks" to
    detect invalid rewrite pattern API usage. As part of these checks, it
    computes fingerprints for every op that is in scope, and compares the
    fingerprints before and after an attempted pattern application.
    
    Until now, each computed fingerprint took into account all nested
    operations. That is quite expensive because it walks the entire IR
    subtree. It is also redundant in the expensive checks because we already
    compute a fingerprint for every op.
    
    This commit significantly improves the running time of the "expensive
    checks" in the greedy pattern rewrite driver.
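The cost difference can be illustrated with a toy model. The sketch below is not the MLIR driver; `Op`, `nested_fingerprint`, `shallow_fingerprint`, and `fingerprint_all` are hypothetical names, and the fingerprints are plain tuples rather than real hashes. It counts how many ops are visited when every op in scope is fingerprinted with and without its nested ops.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy operation: a name plus nested operations."""
    name: str
    children: list = field(default_factory=list)


def walk(op):
    """Yield `op` and every op nested inside it."""
    yield op
    for child in op.children:
        yield from walk(child)


def nested_fingerprint(op, counter):
    """Fingerprint that includes the whole nested subtree."""
    counter[0] += 1
    return (op.name, tuple(nested_fingerprint(c, counter) for c in op.children))


def shallow_fingerprint(op, counter):
    """Fingerprint of the op alone; nested ops get their own fingerprints."""
    counter[0] += 1
    return (op.name, len(op.children))


def fingerprint_all(root, fingerprint):
    """Fingerprint every op in scope; return the fingerprints and visit count."""
    counter = [0]
    prints = [fingerprint(op, counter) for op in walk(root)]
    return prints, counter[0]


def make_chain(depth):
    """Build a chain of `depth` ops, each nested inside the previous one."""
    root = Op("op0")
    current = root
    for i in range(1, depth):
        child = Op(f"op{i}")
        current.children.append(child)
        current = child
    return root
```

For a chain of 10 nested ops this model visits 55 ops with nested fingerprints and 10 with shallow ones, while a change to the innermost op is still caught because that op has its own fingerprint.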
    matthias-springer committed Feb 1, 2024
    5fdf8c6
  36. [flang][NFC] Cache derived type translation in lowering (#80179)

    Derived type translation is proving expensive in modern Fortran apps
    with many big derived types with dozens of components and parents.
    
    Extending the cache that prevents recursion is proving to have little
    cost on apps with small derived types and significant gain (can divide
    compile time by 2) on modern Fortran apps.
    
    It is legal since the cache lifetime is less than the MLIRContext
    lifetime that owns the cached mlir::Type.
    
    Doing so also exposed that the current caching was incorrect: the type
    symbol is the same for kind parametrized derived types regardless of the
    kind parameters. Instances with different kinds should lower to
    different MLIR types. See added test.
    Using the type scopes fixes the problem.
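The caching bug can be shown with a toy model. This Python sketch is only an illustration of the keying problem, not flang code: `lower_derived_type`, `SymbolKeyedCache`, and `ScopeKeyedCache` are hypothetical, the returned strings are made up, and a `(name, kind)` pair stands in for the type scope.

```python
def lower_derived_type(name, kind):
    """Stand-in for the expensive translation; returns a made-up type string."""
    return f"{name}<kind={kind}>"


class SymbolKeyedCache:
    """Buggy variant: keyed by the type symbol only."""

    def __init__(self):
        self._cache = {}

    def get(self, name, kind):
        if name not in self._cache:
            self._cache[name] = lower_derived_type(name, kind)
        return self._cache[name]


class ScopeKeyedCache:
    """Keyed by a stand-in for the type scope, which differs per kind."""

    def __init__(self):
        self._cache = {}

    def get(self, name, kind):
        key = (name, kind)
        if key not in self._cache:
            self._cache[key] = lower_derived_type(name, kind)
        return self._cache[key]
```

With the symbol-keyed cache, a second request for the same type with a different kind returns the type lowered for the first kind; keying on the scope stand-in gives each instance its own entry.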
    jeanPerier committed Feb 1, 2024
    84564e1
  37. [Clang][test] Limit library search when linking shared lib (#80253)

    Don't search for unnecessary libs when linking the shared lib. This
    allows the test to run in chroot environment.
    apolloww committed Feb 1, 2024
    ae931b4
  38. [mlir][EmitC] Add func, call and return operations and conversions (…

    …#79612)
    
    This adds a `func`, `call` and `return` operation to the EmitC dialect,
    closely related to the corresponding operations of the Func dialect. In
    contrast to the operations of the Func dialect, the EmitC operations do
    not support multiple results. The `emitc.func` op features a
    `specifiers` argument that for example allows, with corresponding
    support in the emitter, to emit `inline static` functions.
    
    Furthermore, this adds patterns and a pass to convert the Func dialect
    to EmitC. A `func.func` op that is `private` is converted to
    `emitc.func` with a `"static"` specifier.
    marbre committed Feb 1, 2024
    e7d40a8
  39. d0dbd50
  40. [bazel] Merge TableGenGlobalISel into the tablegen target

    These two are intertwined enough so it doesn't really make sense to have
    it standalone and hack around it by putting headers into both.
    d0k committed Feb 1, 2024
    468b239
  41. [bazel] Put back the pieces of TableGenGlobalISel that unittests depe…

    …nd on
    
    This is a mess and needs to be cleaned up some day.
    d0k committed Feb 1, 2024
    395c817
  42. [llvm-exegesis] Replace --num-repetitions with --min-instructions (#7…

    …7153)
    
    This patch replaces --num-repetitions with --min-instructions to make it
    more clear that the value refers to the minimum number of instructions
    in the final assembled snippet rather than the number of repetitions of
    the snippet. This patch also refactors some llvm-exegesis internal
    variable names to reflect the name change.
    
    Fixes #76890.
    boomanaiden154 committed Feb 1, 2024
    415bf20
  43. [bazel] Fix a typo from e7d40a8

    d0k committed Feb 1, 2024
    ca7fd25
  44. [flang][HLFIR] Relax verifiers of intrinsic operations (#80132)

    The verifiers are currently very strict: requiring intrinsic operations
    to be used only in cases where the Fortran standard permits the
    intrinsic to be used.
    
    There have now been a lot of cases where these verifiers have caused
    bugs in corner cases. In a recent ticket, @jeanPerier pointed out that
    it could be useful for future optimizations if somewhat invalid uses of
    these operations could be allowed in dead code. See this comment:
    llvm/llvm-project#79995 (comment)
    
    In response to all of this, I have decided to relax the intrinsic
    operation verifiers. The intention is now to only disallow operation
    uses that are likely to crash the compiler. Other checks are still
    available under `-strict-intrinsic-verifier`.
    
    The disadvantage of this approach is that IR can now represent intrinsic
    invocations which are incorrect. The lowering and implementation of
    these intrinsic functions is unlikely to do the right thing in all of
    these cases, and as they should mostly be impossible to generate using
    normal Fortran code, these edge cases will see very little testing,
    before some new optimization causes them to become more common.
    
    Fixes #79995
    tblah committed Feb 1, 2024
    e9e0167
  45. [Clang][AArch64] Add ACLE macros for FEAT_PAuth_LR (#80163)

    This updates clang's target defines to include the ACLE changes covering
    the FEAT_PAuth_LR architecture extension.
    The changes include:
    * The new `__ARM_FEATURE_PAUTH_LR` feature macro, which is set to 1 when
      FEAT_PAuth_LR is available in the target.
    * A new bit field for the existing `__ARM_FEATURE_PAC_DEFAULT` macro,
      indicating the use of PC as a diversifier for Pointer Authentication
      (from -mbranch-protection=pac-ret+pc).
    
    The approved changes to the ACLE spec can be found here:
    ARM-software/acle#292
    pratlucas committed Feb 1, 2024
    1bbb797
  46. [HWASAN] Remove DW_OP_LLVM_tag_offset from DIExpression::isImplicit (…

    …#79816)
    
    According to its doc-comment `isImplicit` is meant to return true if the
    expression is an implicit location description (describes an object or part of
    an object which has no location by computing the value from available program
    state).
    
    There's a brief entry for `DW_OP_LLVM_tag_offset` in the LangRef and there's
    some info in the original commit fb9ce10.
    
    From what I can tell it doesn't look like `DW_OP_LLVM_tag_offset` affects
    whether or not the location is implicit; the opcode doesn't get included in the
    final location description but instead is added as an attribute to the variable.
    
    This was tripping an assertion in the latest application of the fix to #76545,
    #78606, where an expression containing a `DW_OP_LLVM_tag_offset` is split into
    a fragment (i.e., describes a part of the whole variable).
    OCHyams committed Feb 1, 2024
    f34418c
  47. [GitHub][workflows] Reflow some text in buildbot info PR comment

    When the markdown link renders the line gets a lot shorter.
    DavidSpickett committed Feb 1, 2024
    96a3d05
  48. b5c0b67
  49. [SCEVExp] Keep NUW/NSW if both original inc and isomorphic inc agree…

    …. (#79512)
    
    We are replacing with a wider increment. If both OrigInc and
    IsomorphicInc are NUW/NSW, then we can preserve them on the wider
    increment; the narrower IsomorphicInc would wrap before the wider
    OrigInc, so the replacement won't make IsomorphicInc's uses more
    poisonous.
    
    PR: llvm/llvm-project#79512
    fhahn committed Feb 1, 2024
    da43733
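The wrap argument in the commit above can be illustrated with plain unsigned arithmetic. This is only a sketch of the reasoning, not the SCEVExpander logic: it counts how many `start += step` increments complete before an unsigned counter of a given bit width would wrap. The function name `incrementsBeforeUnsignedWrap` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of unsigned increments `value += step` that complete without
// wrapping for a counter that is `bits` wide and begins at `start`.
std::uint64_t incrementsBeforeUnsignedWrap(unsigned bits, std::uint64_t start,
                                           std::uint64_t step) {
  assert(bits >= 1 && bits <= 32 && step > 0);
  const std::uint64_t maxValue = (std::uint64_t{1} << bits) - 1;
  assert(start <= maxValue);
  return (maxValue - start) / step;
}
```

For the same start and step, the count for an 8-bit counter is never larger than the count for a 32-bit counter, so a narrow increment that is known not to wrap implies the corresponding wide increment does not wrap either.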
  50. 7d78ccf
  51. ea29842
  52. [SYCL][Fusion] Handle fusion leading to synchronization issues (intel…

    …#12538)
    
    Do not allow fusion when one of the kernels has an explicit local size
    and it requires ID remapping, i.e., it has a different number of
    dimensions w.r.t. the fused ND-range or different global size in
    dimensions [2, N). In this case, two work-items belonging to the same
    work-group may not belong to the same work-group in the fused ND-range.
    
    Signed-off-by: Victor Perez <victor.perez@codeplay.com>
    
    ---------
    
    Signed-off-by: Victor Perez <victor.perez@codeplay.com>
    victor-eds committed Feb 1, 2024
    af448b0
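The synchronization hazard described in the commit above can be shown with a small index calculation. The sketch assumes a 2D kernel whose global ID is remapped into a 1D fused ND-range by row-major linearization; that remapping rule, the sizes used, and the function names `groupIn2D` and `groupInFused1D` are assumptions for illustration, not the fusion pass's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Id2 = std::array<std::size_t, 2>;

// Work-group index (row-major) of a work-item in the original 2D ND-range.
std::size_t groupIn2D(Id2 id, Id2 global, Id2 local) {
  assert(global[0] % local[0] == 0 && global[1] % local[1] == 0);
  const std::size_t groupsPerRow = global[1] / local[1];
  return (id[0] / local[0]) * groupsPerRow + id[1] / local[1];
}

// Work-group index after remapping to a 1D fused range: the 2D global ID is
// linearized row-major, then divided by the fused 1D local size (assumption).
std::size_t groupInFused1D(Id2 id, Id2 global, std::size_t fusedLocal) {
  assert(fusedLocal > 0);
  const std::size_t linear = id[0] * global[1] + id[1];
  return linear / fusedLocal;
}
```

With global size {4, 4} and local size {2, 2}, work-items (0, 0) and (1, 0) share work-group 0 in the original range. After remapping to a 1D range with local size 4 they land in groups 0 and 1, so a barrier in the original kernel would no longer synchronize them.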
  53. c105848
  54. [UR][CUDA] Use new variant of the enableCUDATracing function (intel#1…

    …2521)
    
    oneapi-src/unified-runtime#1070 and
    intel#11952 introduced a new variant of the
    `enableCUDATracing` function that takes a context pointer parameter,
    replacing the parameterless variant of that function. The older variant
    will be removed from UR once this PR is merged.
    pasaulais committed Feb 1, 2024
    e402523
  55. [SYCL][CUDA] Improved joint_matrix layout test coverage. (intel#12483)

    Improved joint_matrix layout test coverage.
    
    The test framework that the CUDA backend tests use has been updated to
    support all possible `joint_matrix` gemm API combinations, including all
    matrix layouts. The gemm header is backend agnostic; hence all backends
    could use this test framework in the future.
    
    This test framework can also act as an example to show how to deal with
    different layout combinations when computing a general GEMM.
    
    Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
    JackAKirk committed Feb 1, 2024
    f9e4f10
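To show what handling different layout combinations in a general GEMM involves, here is a host-only sketch in plain C++. It does not use `joint_matrix` or the test framework's gemm header; the `Layout` enum and the `offset` and `gemm` functions are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Layout { RowMajor, ColMajor };

// Linear offset of element (r, c) in a rows x cols matrix with the given layout.
std::size_t offset(Layout layout, std::size_t r, std::size_t c,
                   std::size_t rows, std::size_t cols) {
  return layout == Layout::RowMajor ? r * cols + c : c * rows + r;
}

// Computes C = A * B, where A is m x k, B is k x n, and C is returned as a
// row-major m x n matrix. A and B may each be row- or column-major.
std::vector<float> gemm(const std::vector<float> &a, Layout layoutA,
                        const std::vector<float> &b, Layout layoutB,
                        std::size_t m, std::size_t n, std::size_t k) {
  assert(a.size() == m * k && b.size() == k * n);
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t p = 0; p < k; ++p)
        c[i * n + j] += a[offset(layoutA, i, p, m, k)] *
                        b[offset(layoutB, p, j, k, n)];
  return c;
}
```

Only the index computation depends on the layout, so all four A/B layout combinations give the same product when each input is stored consistently with its declared layout.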
  56. [RISCV][NFC] Simplify calls.ll and autogenerate checks for tail-calls.ll

    Split out from #78417.
    
    Reviewers: topperc, asb, kito-cheng
    
    Reviewed By: asb
    
    Pull Request: llvm/llvm-project#79248
    wangpc-pp committed Feb 1, 2024
    178719e
  57. 4bdd647
  58. b0c60b0
  59. add support for out of bounds load/store (intel#2277)

    Add support for load/store operations on a cooperative matrix such that the original matrix shape is known and implementations can reason about how to handle out-of-bounds accesses.
    
    CapabilityCooperativeMatrixCheckedInstructionsINTEL = 6192
    CooperativeMatrixLoadCheckedINTEL = 6193
    CooperativeMatrixStoreCheckedINTEL = 6194
    
    Original commit:
    KhronosGroup/SPIRV-LLVM-Translator@b62cb55
    VyacheslavLevytskyy authored and sys-ce-bb committed Feb 1, 2024
    6f35f7c
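A host-only sketch of why knowing the original matrix shape matters for a tile load. The commit does not say what a checked load returns for out-of-bounds elements; filling them with a caller-supplied value is one possible policy, chosen here purely for illustration. The function `loadTileChecked` is hypothetical and is not part of the translator or of the SPIR-V instructions named above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loads a tileRows x tileCols tile whose top-left corner is (row0, col0) from
// a row-major rows x cols matrix. Tile elements that fall outside the matrix
// receive `fill` (an assumed policy, see the lead-in).
std::vector<float> loadTileChecked(const std::vector<float> &matrix,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t row0, std::size_t col0,
                                   std::size_t tileRows, std::size_t tileCols,
                                   float fill) {
  assert(matrix.size() == rows * cols);
  std::vector<float> tile(tileRows * tileCols, fill);
  for (std::size_t r = 0; r < tileRows; ++r) {
    for (std::size_t c = 0; c < tileCols; ++c) {
      const std::size_t srcRow = row0 + r;
      const std::size_t srcCol = col0 + c;
      if (srcRow < rows && srcCol < cols)
        tile[r * tileCols + c] = matrix[srcRow * cols + srcCol];
    }
  }
  return tile;
}
```

Without `rows` and `cols` the bounds test in the inner loop could not be written, which is the information an unchecked load lacks.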
  60. add API to query error message by an error code (intel#2304)

    The goal of the PR is to add an API to the SPIR-V LLVM Translator to query an error message by error code, as discussed in intel#2298
    
    One need and possible application is generating human-readable error info from the error codes returned by other SPIRV Translator API calls, including getSpirvReport().
    
    Original commit:
    KhronosGroup/SPIRV-LLVM-Translator@afe1971
    VyacheslavLevytskyy authored and sys-ce-bb committed Feb 1, 2024
    f0ac661
  61. Support llvm.frexp intrinsic translation (intel#2252)

    Map @llvm.frexp intrinsic to OpenCL Extended Instruction frexp builtin.
    
    The difference in signatures and return values is covered by extracting values from and combining them into a composite type.
    
    LLVM IR:
    { float %fract, i32 %exp }  @llvm.frexp.f32.i32(float %val)
    SPIR-V:
    { float %fract } ExtInst frexp (float %val, i32 %exp)
    
    Original commit:
    KhronosGroup/SPIRV-LLVM-Translator@e8b2018
    vmaksimo authored and sys-ce-bb committed Feb 1, 2024
    cccbd9e
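The signature difference between the two forms can be mirrored in standard C++. `std::frexp` has the out-parameter shape (fraction returned, exponent written through a pointer), while the struct wrapper below has the "both results in one composite" shape of the LLVM intrinsic. This is an analogy only; `FrexpResult` and `frexpAsStruct` are hypothetical names and not translator code.

```cpp
#include <cassert>
#include <cmath>

// Composite result, analogous to the { float, i32 } returned by the intrinsic.
struct FrexpResult {
  float fract;
  int exp;
};

// Wraps the out-parameter form (std::frexp) into the composite-returning form.
FrexpResult frexpAsStruct(float val) {
  int exp = 0;
  const float fract = std::frexp(val, &exp);
  return {fract, exp};
}
```

For example, 8.0f decomposes into a fraction of 0.5 and an exponent of 4, since 0.5 * 2^4 = 8.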
  62. Fix SPIRVRegularizeLLVMBase::regularize fix for shl i1 and lshr i1 (i…

    …ntel#2288)
    
    The translator failed the V->user_empty() assertion in the regularize function when the result of shl i1 or lshr i1 is used. E.g.
    
    %2 = shl i1 %0 %1
    store %2, ptr addrspace(1) @G.1, align 1
    
    Instruction shl i1 is converted to lshr i32, whose arithmetic has the same behavior.
    
    Original commit:
    KhronosGroup/SPIRV-LLVM-Translator@239fbd4
    bwlodarcz authored and sys-ce-bb committed Feb 1, 2024
    6732fee
  63. 805f842
  64. Updates UR branch

    mfrancepillois committed Feb 1, 2024
    96812b9
  65. f589d9b
  66. [SYCL][NFC] Fix some 'startswith/endswith' related to SYCL (intel#12573)

    Replace some deprecated 'startswith' and 'endswith' with 'starts_with'
    and 'ends_with' to clear some warnings when building SYCL compiler.
    
    ---------
    
    Signed-off-by: jinge90 <ge.jin@intel.com>
    jinge90 committed Feb 1, 2024
    f7a360d
  67. Revert "Update add-ir-annotations tests after 5518a9d"

    This reverts commit 3d4c6c7.
    
    Due to
    | * 6e6aa44 2024-01-31 Revert
    "[Clang][Sema] fix outline member function template with defau… (#80144)
    ekeane@nvidia.com
    sys-ce-bb committed Feb 1, 2024
    21e703a
  68. [Driver] Allow for -O3 on Windows using clang-cl (intel#12504)

    We currently support -O3 for Linux compilations, expand this to also be
    available on Windows. This also better aligns with our existing product
    offerings.
    mdtoguchi committed Feb 1, 2024
    0af4ac7
  69. [SYCL] Fix compiler crash. (intel#12324)

    The compiler was crashing when the user requested fp-accuracy for the
    functions in a call of the form f1(f2(f3 ...), where f1, f2 and f3 were
    fpbuiltin but the innermost function didn't have an fpbuiltin. The
    current builtinID was used instead of getting the builtinID from the
    current function. That caused a crash in the compiler.
    This patch fixes the issue and renames the function
    EmitFPBuiltinIndirectCall to MaybeEmitFPBuiltinofFD.
    zahiraam committed Feb 1, 2024
    4fdcb58
  70. [SYCL][HIP][CUDA] Use new version of piMemGetNativeHandle and add test (

    intel#12297)
    
    We want to change the signature of `piMemGetNativeHandle` for reasons
    explained here oneapi-src/unified-runtime#1199
    
    Corresponding UR PR:
    oneapi-src/unified-runtime#1226
    
    A previous PR added a new entry point
    intel#12199 but it was decided that it is
    better to modify the existing entry point.
    hdelan committed Feb 1, 2024
    8427bd2
  71. [SYCL][libdevice] Add sqrt with rounding mode supported in sycl::ext:…

    …:intel::math (intel#12571)
    
    Signed-off-by: jinge90 <ge.jin@intel.com>
    jinge90 committed Feb 1, 2024
    6c1dde4
  72. 0dc97ec
  73. [SYCL][ESIMD] Fix a few issues with scatter(usm, ...) (intel#12585)

    Problems found by Gregory (thanks!):
    
    1) There were some duplicated tests, remove those
    
    2) We didn't test non-LSC mask on Gen12
    
    3) We get an ambiguous call because we had an old function that didn't
    have VS, but the new functions have default VS=1, so we don't need the
    old one.
    
    4) When we pass a simd_view for the vals, we got a template match
    failure. This is the same issue we hit in the compile-time tests where
    even if we have a simd_view overload the compiler can't infer N, so we
    need to provide T,N anyway, so add that in the tests.
    
    I tested this on Gen12.
    
    Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
    sarnex committed Feb 1, 2024
    8bfc56f
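Item 4 of the commit above describes a template match failure that goes away once T and N are given explicitly. The sketch below reproduces the general mechanism with simplified stand-in types: template argument deduction does not look through a user-defined conversion, so a view object cannot supply T and N to a function that takes the underlying vector type. `Vec`, `VecView`, `sum` and `sumThroughView` are hypothetical and far simpler than the real ESIMD `simd` and `simd_view` classes.

```cpp
#include <cassert>

// Stand-in for a fixed-size vector type.
template <typename T, int N> struct Vec {
  T data[N];
};

// Stand-in for a view: convertible to Vec<T, N>, but not itself a Vec.
template <typename T, int N> struct VecView {
  const Vec<T, N> &ref;
  operator Vec<T, N>() const { return ref; }
};

// Takes the vector type by value; T and N are normally deduced from it.
template <typename T, int N> T sum(Vec<T, N> v) {
  T total{};
  for (int i = 0; i < N; ++i)
    total += v.data[i];
  return total;
}

int sumThroughView() {
  Vec<int, 4> v{{1, 2, 3, 4}};
  VecView<int, 4> view{v};
  // `sum(view)` would fail to compile: deduction ignores the conversion
  // operator, so T and N cannot be inferred from a VecView argument.
  return sum<int, 4>(view); // explicit T, N make the conversion usable
}
```

Passing the template arguments explicitly, as the updated tests do, turns the call into an ordinary implicit conversion.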

Commits on Feb 2, 2024

  1. [SYCL] [NATIVECPU] Add OCK subdirectory with EXCLUDE_FROM_ALL (intel#…

    …12579)
    
    Adding `EXCLUDE_FROM_ALL` to the `add_subdirectory` for the OneAPI
    Construction Kit, in order to avoid building its components unless
    they are required by the SYCL toolchain.
    PietroGhg committed Feb 2, 2024
    71eee2c
  2. 262b44a
  3. [SYCL] Disable dynamic_address_cast test on FPGA (intel#12561)

    The FPGA emulator is currently affected by the same issue as the CPU
    runtime.
    
    Signed-off-by: John Pennycook <john.pennycook@intel.com>
    Pennycook committed Feb 2, 2024
    9b2e77a
  4. Updates function name

    mfrancepillois committed Feb 2, 2024
    46bce9c
  5. [CI] Modify Nightly task to run opencl:cpu testing on different CPUs (i…

    …ntel#12548)
    
    We have flakiness in nightly testing results. Having more variety would
    hopefully provide some insight into the conditions under which it happens.
    
    The task is only executed once a day, so extra resources needed
    shouldn't affect the load on the runners much.
    aelovikov-intel committed Feb 2, 2024
    35f9696
  6. faad41d
  7. 30ab2fe