[SYCL][Graph] Update doc for UR PR moving reset commands to a dedicated cmd-list #357
Commits on Feb 13, 2024
[mlir][nvgpu] Make `phaseParity` of `mbarrier.try_wait` `i1` (#81460)
Currently, the `phaseParity` argument of `nvgpu.mbarrier.try_wait.parity` is index. This can cause a problem if it is passed any value other than 0 or 1, because the PTX instruction only accepts an even or odd phase. This PR makes the phaseParity argument i1 to avoid misuse. Here is the information from the PTX doc:
```
The .parity variant of the instructions test for the completion of the phase indicated by the operand phaseParity, which is the integer parity of either the current phase or the immediately preceding phase of the mbarrier object. An even phase has integer parity 0 and an odd phase has integer parity of 1. So the valid values of phaseParity operand are 0 and 1.
```
See for more information: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-mbarrier-try-wait
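To illustrate why a 1-bit type is enough here, the following is a minimal Python sketch of how a caller might reduce a running phase counter to its parity before passing it on. The helper name `phase_parity` is hypothetical and is not part of MLIR, NVGPU, or PTX.

```python
def phase_parity(phase: int) -> bool:
    """Return the integer parity of a phase counter as a 1-bit value.

    Hypothetical helper for illustration only: even phases map to
    False (0) and odd phases map to True (1), the only two values the
    quoted PTX text describes as valid.
    """
    if phase < 0:
        raise ValueError("phase must be non-negative")
    return bool(phase & 1)
```

With a boolean result, values such as 2 or 7 can no longer reach the operation unreduced, which is the misuse the commit message describes.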
Commit 0a600c3
Commit 4588525
[clang][dataflow] Add `Environment::initializeFieldsWithValues()`. (#81239)
This function will be useful when we change the behavior of record-type prvalues so that they directly initialize the associated result object. See also the comment here for more details: https://github.com/llvm/llvm-project/blob/9e73656af524a2c592978aec91de67316c5ce69f/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h#L354 As part of this patch, we document and assert that synthetic fields may not have reference type. There is no practical use case for this: A `StorageLocation` may not have reference type, and a synthetic field of the corresponding non-reference type can serve the same purpose.
Commit 270f2c5
Commit 5b01522
[HWASAN] Update dbg.assign intrinsics in HWAsan pass (#79864)
llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update the second expression. Fixes #76545. This is #78606 rebased and with the addition of DPValue handling. Note the addition of --try-experimental-debuginfo-iterators in the tests and some shuffling of code in MemoryTaggingSupport.cpp.
Commit d860ea9
[InstCombine] Don't add fcmp instructions to strictfp functions (#81498)
The strictfp attribute has the requirement that "LLVM will not introduce any new floating-point instructions that may trap". The llvm.is.fpclass intrinsic is documented as "The function never raises floating-point exceptions", and the fcmp instruction may raise one, so we can't transform the former into the latter in functions with the strictfp attribute.
Commit 44706bd
Revert "[CVP] Check whether the default case is reachable (#79993)" (#81585)
This reverts commit a034e65. Some protobuf users reported that this patch caused a significant compile-time regression because `TailDuplicator` works poorly with a specific pattern. We will reland it once the codegen issue is fixed.
Commit ca61e6a
[clang-tidy] ignore local variable with [maybe_unused] attribute in bugprone-unused-local-non-trivial-variable (#81563)
Commit ebe77cc
Commit 8c6e96d
Commit f506192
[AMDGPU][NFC] Get rid of some operand decoders defined using macros. (#81482)
Use templates instead. Part of <llvm/llvm-project#62629>.
Commit 4c93109
[lld] Add target support for SystemZ (s390x) (#75643)
This patch adds full support for linking SystemZ (ELF s390x) object files. Support should be generally complete:
- All relocation types are supported.
- Full shared library support (DYNAMIC, GOT, PLT, ifunc).
- Relaxation of TLS and GOT relocations where appropriate.
- Platform-specific test cases.
In addition to new platform code and the obvious changes, there were a few additional changes to common code:
- Add three new RelExpr members (R_GOTPLT_OFF, R_GOTPLT_PC, and R_PLT_GOTREL) needed to support certain s390x relocations. I chose not to use a platform-specific name since nothing in the definition of these relocs is actually platform-specific; it is well possible that other platforms will need the same.
- A couple of tweaks to TLS relocation handling, as the particular semantics of the s390x versions differ slightly. See comments in the code.
This was tested by building and testing >1500 Fedora packages, with only a handful of failures; as these also have issues when building with LLD on other architectures, they seem unrelated. Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
Commit fe3406e
[flang][Driver] Add -masm option to flang (#81490)
The motivation here was a suggestion over in Compiler Explorer. You can use `-mllvm` already to do this but since gfortran supports `-masm`, I figured I'd try to add it. This is done by flang expanding `-masm` into `-mllvm x86-asm-syntax=`, then passing that to fc1, which then collects all the `-mllvm` options and forwards them on. The code to expand it comes from clang `Clang::AddX86TargetArgs` (there are some other places doing the same thing too). However I've removed the `-inline-asm` that clang adds, as fortran doesn't have inline assembly. So `-masm` for flang purely changes the style of assembly output.
```
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu
<...>
pushq %rbp
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu -masm=att
<...>
pushq %rbp
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu -masm=intel
<...>
push rbp
```
The test is adapted from `clang/test/Driver/masm.c` by removing the clang-cl related lines and changing the 32 bit triples to 64 bit triples since flang doesn't support 32 bit targets.
Commit 9ca1a15
[dataflow] CXXOperatorCallExpr equal operator might not be a glvalue (#80991)
Although in a normal implementation the assumption is reasonable, it seems that some esoteric implementations do not return a T&. This should be handled correctly and the values be propagated. --------- Co-authored-by: martinboehme <mboehme@google.com>
Commit a8fb0dc
[mlir][VectorOps] Add conversion of 1-D vector.interleave ops to LLVM (#80966)
The 1-D case directly maps to LLVM intrinsics. The n-D case will be handled by unrolling to 1-D first (in a later patch). Depends on: #80965
Commit 79ce2c9
Commit e678e6e
[ADT] Allow std::next to work on BitVector's set_bits_iterator (#80830)
Without this I would hit errors with libstdc++-12 like: /usr/include/c++/12/bits/stl_iterator_base_funcs.h:230:5: note: candidate template ignored: substitution failure [with _InputIterator = llvm::const_set_bits_iterator_impl<llvm::BitVector>]: argument may not have 'void' type next(_InputIterator __x, typename ^
Commit 8456e0c
[mlir][openmp] - Add the depend clause to omp.target and related offloading directives (#81081)
This patch adds support for the depend clause in a number of OpenMP directives/constructs related to offloading. Specifically, it adds the handling of the depend clause when it is used with the following constructs:
- target
- target enter data
- target update data
- target exit data
Commit 55d6643
Commit e79ad7b
[RemoveDIs][ValueMapper] Remap DIAssignIDs in DPValues (#81595)
Fix crash raised in comments for 5c9f768
Commit 97088b2
[mlir][linalg] Document ops not supported by the vectoriser (nfc) (#81500)
Adds a test to help document Linalg Ops that are currently not supported by the vectoriser (i.e. the logic to vectorise these is missing). The list is not exhaustive.
Commit bfc0b7c
[mlir][vector] ND vectors linearization pass (#81159)
Common backends (LLVM, SPIR-V) only support 1D vectors. The LLVM conversion handles ND vectors (N >= 2) as `array<array<... vector>>`, and the SPIR-V conversion doesn't handle them at all at the moment. Sometimes it's preferable to treat multidim vectors as linearized 1D. Add a pass to do this. Only constants and simple elementwise ops are supported for now. @krzysz00 I've extracted your result type conversion code from LegalizeToF32 and moved it to a common place. Also, add a ConversionPattern class operating on traits.
Commit 35ef399
Commit 990896a
[RISCV] Fix assertion in lowerEXTRACT_SUBVECTOR
This fixes a crash when lowering an extract_subvector like: t0:v1i64 = extract_subvector t1:v2i64, 1 Whilst we never need a vslidedown with M1 on scalable vector types, we might need to do it for v1i64/v1f64, since the smallest container type for it is nxv1i64/nxv1f64. The lowering code is still correct for this case, but the assertion was too strict. The actual invariant we're relying on is that ContainerSubVecVT's LMUL <= M1, not < M1. Hence why we handled v2i32 fine, because its container type was nxv1i32 and MF2.
Commit 208edf7
[clang][Interp] Handle CXXUuidofExprs
Allocate storage and initialize it with the given APValue contents.
Commit 9b718c0
[SystemZ][z/OS][libcxx] mark aligned allocation tests XFAIL on z/OS (#80735)
z/OS doesn't support aligned allocation, so mark these testcases as unsupported. Continuation of https://reviews.llvm.org/D102798
Commit a70077e
[MC/DC] Refactor: Make `MCDCParams` as `std::variant` (#81227)
Introduce `mcdc::DecisionParameters` and `mcdc::BranchParameters` and make sure they are not initialized as zero. FIXME: Could we make `CoverageMappingRegion` as a smart tagged union?
Commit a17a3e9
[TableGen] Use vectors instead of sets for testing intersection. NFC. (#81602)
In a few places we test whether sets (i.e. sorted ranges) intersect by computing the set_intersection and then testing whether it is empty. For this purpose it should be more efficient to use a std::vector instead of a std::set to hold the result of the set_intersection, since insertion is simpler.
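The pattern described above can be sketched in Python for readers less familiar with `std::set_intersection`. This is only an illustration under stated assumptions: the function names `sorted_intersection` and `intersects` are hypothetical, and a Python list stands in for the `std::vector` that receives the result. It is not the TableGen code.

```python
from typing import List, Sequence


def sorted_intersection(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Merge-style intersection of two sorted ranges.

    Follows the same two-cursor walk as std::set_intersection: advance
    whichever side holds the smaller element, and append to the output
    only when both sides agree.
    """
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif b[j] < a[i]:
            j += 1
        else:
            out.append(a[i])  # appending to a flat list is a cheap insertion
            i += 1
            j += 1
    return out


def intersects(a: Sequence[int], b: Sequence[int]) -> bool:
    """Test for overlap by checking whether the intersection is empty."""
    return len(sorted_intersection(a, b)) > 0
```

Because the output is produced in sorted order, appending at the end of a flat container is sufficient, which is the reason the commit message gives for preferring a vector over a set.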
Commit 880afa1
[clang][Interp] Handle Requires- and ConceptSpecializationExprs
Just emit their satisfaction state, which is what the current interpreter does as well.
Commit bb60c06
[OpenACC] Implement AST for OpenACC Compute Constructs (#81188)
'serial', 'parallel', and 'kernel' constructs are all considered 'Compute' constructs. This patch creates the AST type, plus the required infrastructure for such a type, plus some base types that will be useful in the future for breaking this up. The only difference between the three is the 'kind'( plus some minor clause legalization rules, but those can be differentiated easily enough), so rather than representing them as separate AST nodes, it seems to make sense to make them the same. Additionally, no clause AST functionality is being implemented yet, as that fits better in a separate patch, and this is enough to get the 'naked' constructs implemented. This is otherwise an 'NFC' patch, as it doesn't alter execution at all, so there aren't any tests. I did this to break up the review workload and to get feedback on the layout.
Commit f655778
Commit af56bea
Commit 742ec3a
[Object][COFF][NFC] Make writeImportLibrary NativeExports argument optional. (#81600)
It's not interesting for the majority of downstream users.
Commit 4612208
Reapply "[DebugInfo][RemoveDIs] Turn on non-instrinsic debug-info by default"
This reapplies commit bdde5f9 by undoing the revert bc66e0c. The previous reapplication 5c9f768 was reverted due to a crash (reproducer in comments for 5c9f768) which was fixed in #81595. As noted in the original commit, this commit may break downstream tests. If this commit is breaking your downstream tests, please see comment 12 in [0], which documents the kind of variation in tests we'd expect to see from this change and what to do about it. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Commit d759618
[TableGen] Use std::move instead of swap. NFC. (#81606)
Historically TableGen has used `A.swap(B)` to move containers without the expense of copying them. Perhaps this predated rvalue references. In any case `A = std::move(B)` seems like a more direct way to implement this when only A is required after the operation.
Commit f7cddf8
Fix warning by removing unused variable (#81604)
Apparently, some compilers [correctly] warn that the variable that was created prior to this change is unused. This removes the variable.
Commit d1f510c
Commit 5e5e51e
[GitHub][workflows] Ask reviewers to merge PRs when author cannot (#81142)
This uses https://pygithub.readthedocs.io/en/stable/github_objects/Repository.html?highlight=get_collaborator_permission#github.Repository.Repository.get_collaborator_permission. Which does https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user and returns the top level "permission" key. This is less detailed than the user/permissions key but should be fine for this use case. When a review is submitted we check:
* If it's an approval.
* Whether we have already left a merge on behalf comment (by looking for a hidden HTML comment).
* Whether the author has permissions to merge their own PR.
* Whether the reviewer has permissions to merge.
If needed we leave a comment tagging the reviewer. If the reviewer also doesn't have merge permission, then it asks them to find someone else who does.
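The checks listed in this commit message can be read as a small decision function. The sketch below is one possible reading, not the workflow script itself: the function name `merge_on_behalf_action`, its boolean parameters, and the returned labels are all hypothetical, and the real workflow obtains these inputs from the GitHub API.

```python
from typing import Optional


def merge_on_behalf_action(
    is_approval: bool,
    already_commented: bool,
    author_can_merge: bool,
    reviewer_can_merge: bool,
) -> Optional[str]:
    """Decide which comment, if any, to leave after a review is submitted.

    Returns None when no comment is needed; otherwise returns a
    hypothetical label naming the kind of comment to post.
    """
    if not is_approval:
        return None  # only approvals trigger the comment
    if already_commented:
        return None  # the hidden HTML marker shows a comment already exists
    if author_can_merge:
        return None  # the author can merge their own PR
    if reviewer_can_merge:
        return "ask_reviewer_to_merge"
    # Neither author nor reviewer can merge: ask the reviewer to find someone.
    return "ask_reviewer_to_find_merger"
```

For example, an approval on a PR whose author lacks merge rights yields `"ask_reviewer_to_merge"` when the reviewer has them.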
Commit 38c706e
[ARM] __ARM_ARCH macro definition fix (#81493)
This patch changes how the macro __ARM_ARCH is defined to match its definition in the ACLE. In ACLE 5.4.1, __ARM_ARCH is defined as equal to the major architecture version for ISAs up to and including v8. From v8.1 onwards, its definition is changed to include minor versions, such that for an architecture vX.Y, __ARM_ARCH = X*100 + Y. Before this patch, LLVM defined __ARM_ARCH using only the major architecture version for all architecture versions. This patch adds functionality to define __ARM_ARCH correctly for architectures greater than or equal to v8.1.
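The rule quoted from the ACLE can be written out as a short function. This is a sketch of a literal reading of the commit message, assuming a hypothetical helper `arm_arch_macro` that takes the major and minor version as integers; it is not the clang implementation.

```python
def arm_arch_macro(major: int, minor: int = 0) -> int:
    """Compute the __ARM_ARCH value for architecture v<major>.<minor>.

    Up to and including v8 (that is, v8.0) the value is the major
    version alone. From v8.1 onwards it is major * 100 + minor.
    """
    if major < 8 or (major == 8 and minor == 0):
        return major
    return major * 100 + minor
```

Under this reading v7 gives 7, v8 gives 8, and v8.1 gives 801, while the pre-patch behavior described above would have produced 8 for v8.1.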
Commit 89c1bf1
[DAGCombine] Fix multi-use miscompile in load combine (#81586)
The load combine replaces a number of original loads with one new load and also replaces the output chains of the original loads with the output chain of the new load. This is incorrect if the original load is retained (due to multi-use), as it may get incorrectly reordered. Fix this by using makeEquivalentMemoryOrdering() instead, which will create a TokenFactor with both chains. Fixes llvm/llvm-project#80911.
Commit 25b9ed6
ci: Temporarily disable the buildkite job on Windows (#81538)
The failure rate is too high. See https://discourse.llvm.org/t/rfc-future-of-windows-pre-commit-ci/76840
Commit 4ad9f5b
[SLP] Add X86 version of non-power-of-2 vectorization tests.
Extra X86 tests for llvm/llvm-project#77790.
Commit 192c23b
Commit 485ebbf
[NFC][LLVM][AsmWriter] Extract logic to write out ConstantFP from WriteConstantInternal.
This makes it easier to extend the code to support vector types.
Commit 4f13f35
[Flang] Add __powerpc__ macro to set c_intmax_t to c_int64_t rather than c_int128_t as PowerPC only supports up to c_int64_t. (#81222)
PowerPC only supports up to `c_int64_t`. Add macro `__powerpc__` and preprocess it for setting `c_intmax_t` in `iso_c_binding` intrinsic module.
Commit 987258f
[clang][Driver][HLSL] Fix formatting of clang-dxc options group title
Some extra `<>` and a missing full stop.
Commit 381a00d
[LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331)
Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have, which returns a fixed frequency clock that can be used to determine elapsed time. This differs from the cycle counter, which often has variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.
Commit 11fcae6
[TableGen] Do not speculatively grow RegUnitSets. NFC.
This seems to be a trick to avoid copying a RegUnitSet, but it can be done more simply using std::move.
Commit 1f90af1
[DirectX][NFC] Change specification of overload types and attribute in DXIL.td (#81184)
- Specify overload types of DXIL Operation as list of types instead of a string.
- Add supported DXIL type record definitions to `DXIL.td` leveraging `LLVMType` to avoid duplicate definitions.
- Spell out DXIL Operation Attribute specification string.
- Make corresponding changes to process the records in DXILEmitter.cpp
Commit 8ba4ff3
Commit 1d84792
Merge from 'sycl' to 'sycl-web'
iclsrc committed Feb 13, 2024
Commit a23c262
[lldb-dap][NFC] Add Breakpoint struct to share common logic. (#80753)
This adds a layer between `SourceBreakpoint`/`FunctionBreakpoint` and `BreakpointBase` to have better separation and encapsulation so we are not directly operating on `SBBreakpoint`. I basically moved the `SBBreakpoint` and the methods that require it from `BreakpointBase` to `Breakpoint`. This makes it easier to add support for data watchpoints by sharing the logic inside `BreakpointBase`.
Commit d58c128
[clang][docs] Fix warning in LanguageExtensions
build-llvm/tools/clang/docs/LanguageExtensions.rst:2768: WARNING: Title underline too short.
Commit 7a5c1a4
[mlir][nfc] Add tests for linalg.mmt4d (#81422)
linalg.mmt4d was added a while back (https://reviews.llvm.org/D105244), but there are virtually no tests in-tree. In the spirit of documenting through test, this PR adds a few basic examples.
Commit 7a47113
[libc] Rework the RPC interface to accept runtime wave sizes (#80914)
Summary: The RPC interface needs to handle an entire warp or wavefront at once. This is currently done by using a compile time constant indicating the size of the buffer, which right now defaults to some value on the client (GPU) side. However, there are currently attempts to move the `libc` library to a single IR build. This is problematic as the size of the wavefronts changes between ISAs on AMDGPU. The builtin `__builtin_amdgcn_wavefrontsize()` will return the appropriate value, but it is only known at runtime now. In order to support this, this patch restructures the packet. Now instead of having an array of arrays, we simply have a large array of buffers and slice it according to the runtime value if we don't know it ahead of time. This also has the advantage of making the buffer contiguous within a page now that the header has been moved out of it.
Commit f879ac0
[flang][cuda] Lower launch_bounds values (#81537)
This PR adds a new attribute to carry over the information from `launch_bounds`. The new attribute `CUDALaunchBoundsAttr` holds 2 to 3 integer attributes and is added to `func.func` operation.
Commit d79c3c5
[libc] Round up time for GPU nanosleep implementation (#81630)
Summary: The GPU `nanosleep` tests would occasionally fail. This was due to the fact that we used integer division to determine how many ticks we had to sleep for. This would then truncate, leaving us with a value just slightly below the requested value. This would then occasionally leave us with a return value of `-1`. This patch just changes the code to round up by 1 so we always sleep for at least the requested value.
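The truncation problem described here is easy to reproduce with plain integer arithmetic. The sketch below is an illustration under assumptions: the function names and the 1 MHz clock frequency are hypothetical, and it uses a ceiling division rather than the patch's own "round up by 1" adjustment, so it shows the idea rather than the libc code.

```python
NS_PER_SEC = 1_000_000_000


def ticks_truncated(ns: int, freq_hz: int) -> int:
    """Ticks to sleep using plain integer division (rounds down)."""
    return ns * freq_hz // NS_PER_SEC


def ticks_rounded_up(ns: int, freq_hz: int) -> int:
    """Ticks to sleep using ceiling division, so the sleep never falls short."""
    return (ns * freq_hz + NS_PER_SEC - 1) // NS_PER_SEC
```

With a 1 MHz clock, a request of 1500 ns truncates to 1 tick (1000 ns, shorter than requested), while the rounded-up version sleeps for 2 ticks (2000 ns).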
Commit 1dacfd1
Commit e847abc
Commit a7cebad
[IRGen][AArch64][RISCV] Generalize bitcast between i1 predicate vector and i8 fixed vector. (#76548)
Instead of only handling vscale x 16 x i1 predicate vectors, handle any scalable i1 vector where the known minimum is divisible by 8. This is used on RISC-V where we have multiple sizes of predicate types.
Commit 9be7b0a
[clang] Remove #undef alloca workaround (#81534)
Added in 26670dc to workaround intel#4885. Windows CI and a local Windows build are happy with this change, so it seems like this has been properly fixed at some point. If this does break somebody, this can be easily reverted. (Also, Linux does the same `#define alloca` in system headers, so I'm not sure why it'd be different on Windows) This is tech debt that caused breakages, see comments on #71709.
Commit 742a06f
Commit 9838c85
[RISCV] Enable the TypePromotion pass from AArch64/ARM.
This pass looks for unsigned icmps that have illegal types and tries to widen the use/def graph to improve the placement of the zero extends that type legalization would need to insert. I've explicitly disabled it for i32 by adding a check for isSExtCheaperThanZExt to the pass. The generated code isn't perfect, but my data shows a net dynamic instruction count improvement on spec2017 for both base and Zba+Zbb+Zbs.
Commit 7d40ea8
[flang][cuda] Lower cluster_dims values (#81636)
This PR adds a new attribute to carry over the information from `cluster_dims`. The new attribute `CUDAClusterDimsAttr` holds 3 integer attributes and is added to `func.func` operation.
Commit 5e3c7e3
Commit 502a88b
[libc] Remove remaining GPU architecture dependent instructions (#81612)
Summary: Recent patches have added solutions to the remaining sources of divergence. This patch simply removes the last occurrences of things like `has_builtin`, `ifdef` or builtins with feature requirements. The one exception here is `nanosleep`, but I made changes in the `__nvvm_reflect` pass to make usage like this actually work at O0. Depends on llvm/llvm-project#81331
Commit 63198e0
Merge from 'main' to 'sycl-web' (110 commits)
CONFLICT (content): Merge conflict in clang/include/clang/Serialization/ASTBitCodes.h
Commit 6eae6b9
[mlir][ROCDL] Add synchronization primitives (#80888)
This PR adds two LLVM intrinsics to MLIR:
- llvm.amdgcn.s.setprio which sets the priority of a wave for the GPU scheduler
- llvm.amdgcn.sched.barrier which sets a software barrier so that the scheduler cannot move instructions around
Commit 16140ff
[libc] Remove leftover target dependent intrinsic
Summary: I forgot to remove these because I thought I did it already. This caused the build to fail when actually linked.
Commit c830c12
[NFC][InstrProf] Factor out getCanonicalName to compute the canonical name given a pgo name. (#81547)
- Also update the `InstrProf::addFuncWithName` to call the newly added `getCanonicalName`.
Commit 2422e96
[InstCombine] Extend `(lshr/shl (shl/lshr -1, x), x)` -> `(lshr/shl -1, x)` for multi-use
We previously did this iff the inner `(shl/lshr -1, x)` was one-use. No instructions are added even if the inner `(shl/lshr -1, x)` is multi-use and this canonicalization both makes the resulting instruction easier to analyze and shrinks its dependency chain. Closes #81576
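The fold named in the title can be checked exhaustively at a small bit width. The sketch below models `shl` and `lshr` on 8-bit values in Python; the width, the helper names, and the function `fold_holds` are assumptions chosen for illustration, and shift amounts at or above the bit width are excluded because LLVM treats them as poison.

```python
BITS = 8
MASK = (1 << BITS) - 1
ALL_ONES = MASK  # the 8-bit encoding of -1


def shl(value: int, amount: int) -> int:
    """Shift left, discarding bits that fall outside the 8-bit width."""
    return (value << amount) & MASK


def lshr(value: int, amount: int) -> int:
    """Logical shift right on an 8-bit value (zero fill from the top)."""
    return (value & MASK) >> amount


def fold_holds() -> bool:
    """Check both directions of the fold for every in-range shift amount."""
    for x in range(BITS):
        if lshr(shl(ALL_ONES, x), x) != lshr(ALL_ONES, x):
            return False
        if shl(lshr(ALL_ONES, x), x) != shl(ALL_ONES, x):
            return False
    return True
```

For x = 3, for instance, `shl(-1, 3)` is 0xF8 and shifting that right by 3 gives 0x1F, the same as `lshr(-1, 3)`, so the inner shift contributes nothing to the result.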
Commit 79ce933
Revert "[clang] Remove #undef alloca workaround" (#81649)
Reverts llvm/llvm-project#81534 llvm/llvm-project#81534 breaks building (Fuchsia) Clang toolchain on Windows. Log: https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8756186536543250705/+/u/clang/install/stdout Builder: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8756186536543250705/overview ``` FAILED: tools/clang/tools/extra/clang-include-fixer/tool/CMakeFiles/clang-include-fixer.dir/ClangIncludeFixer.cpp.obj C:\b\s\w\ir\x\w\cipd\bin\clang-cl.exe /nologo -TP -DCLANG_REPOSITORY_STRING=\"https://llvm.googlesource.com/llvm-project\" -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\b\s\w\ir\x\w\llvm_build\tools\clang\tools\extra\clang-include-fixer\tool -IC:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool -IC:\b\s\w\ir\x\w\llvm-llvm-project\clang\include -IC:\b\s\w\ir\x\w\llvm_build\tools\clang\include -IC:\b\s\w\ir\x\w\recipe_cleanup\tensorflow-venv\store\python_venv-q9i5kpsp0iun0ktmqgab125ti8\contents\Lib\site-packages\tensorflow\include -IC:\b\s\w\ir\x\w\llvm_build\include -IC:\b\s\w\ir\x\w\llvm-llvm-project\llvm\include -IC:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\.. 
-imsvcC:\b\s\w\ir\x\w\zlib_install_target\include -imsvcC:\b\s\w\ir\x\w\zstd_install\include /DWIN32 /D_WINDOWS /Zc:inline /Zc:__cplusplus /Oi /Brepro /bigobj /permissive- /W4 -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported /Gw -no-canonical-prefixes /O2 /Ob2 -std:c++17 -MT /EHs-c- /GR- -UNDEBUG /showIncludes /Fotools\clang\tools\extra\clang-include-fixer\tool\CMakeFiles\clang-include-fixer.dir\ClangIncludeFixer.cpp.obj /Fdtools\clang\tools\extra\clang-include-fixer\tool\CMakeFiles\clang-include-fixer.dir\ -c -- C:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\ClangIncludeFixer.cpp In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\ClangIncludeFixer.cpp:11: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang-tools-extra\clang-include-fixer\tool\..\IncludeFixer.h:15: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Sema/ExternalSemaSource.h:15: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/AST/ExternalASTSource.h:18: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/AST/DeclBase.h:18: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/AST/DeclarationName.h:18: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/IdentifierTable.h:18: In file included from C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/Builtins.h:63: C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(151,1): error: redefinition of enumerator 'BI_alloca' 151 | LANGBUILTIN(_alloca, "v*z", "n", ALL_MS_LANGUAGES) | ^ C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(15,54): note: expanded from 
macro 'LANGBUILTIN' 15 | # define LANGBUILTIN(ID, TYPE, ATTRS, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS) | ^ C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/Builtins.h(62,34): note: expanded from macro 'BUILTIN' 62 | #define BUILTIN(ID, TYPE, ATTRS) BI##ID, | ^ <scratch space>(72,1): note: expanded from here 72 | BI_alloca | ^ C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(150,1): note: previous definition is here 150 | LIBBUILTIN(alloca, "v*z", "fn", STDLIB_H, ALL_GNU_LANGUAGES) | ^ C:\b\s\w\ir\x\w\llvm_build\tools\clang\include\clang/Basic/Builtins.inc(11,61): note: expanded from macro 'LIBBUILTIN' 11 | # define LIBBUILTIN(ID, TYPE, ATTRS, HEADER, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS) | ^ C:\b\s\w\ir\x\w\llvm-llvm-project\clang\include\clang/Basic/Builtins.h(62,34): note: expanded from macro 'BUILTIN' 62 | #define BUILTIN(ID, TYPE, ATTRS) BI##ID, | ^ <scratch space>(71,1): note: expanded from here 71 | BI_alloca | ^ ```
Commit f79f58d
[StatepointLowering] Use Constant instead of TargetConstant for undef…
… value (#81635) Prevents isel errors when trying to lower gc relocate of undef value (which turns into CopyToReg of TargetConstant). Such relocates may occur after DCE (e.g. after GVN removes some dead blocks) if no passes like instcombine are scheduled afterwards to clean them up. Fixes #80294 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Commit e20462a
InstCombine: Enable SimplifyDemandedUseFPClass and remove flag (#81108)
This completes the unrevert of ef38833.
Commit 9dd2c59
[libc++][modules] Re-add build dir CMakeLists.txt. (#81370)
This CMakeLists.txt is used to build modules without build system support. It was removed in d06ae33, but the documentation on how to use modules relies on it. Made some minor changes to make it work with the std.compat module using the std module. Note the CMakeLists.txt in the build dir should be removed once build system support is generally available.
Commit fc0e9c8
Don't count all the frames just to skip the current inlined ones. (#8…
…0918) The algorithm to find the DW_OP_entry_value requires you to find the nearest non-inlined frame. It did that by counting the number of stack frames so that it could use that as a loop stopper. That is unnecessary and inefficient. Unnecessary because GetFrameAtIndex will return a null frame when you step past the oldest frame, so you already have the "got to the end" signal without counting all the stack frames. And counting all the stack frames can be expensive.
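The loop shape described above can be sketched as follows. This is an illustration only: `Frame`, `FakeThread`, and `get_frame_at_index` are hypothetical stand-ins, not the lldb API, and the only behavior borrowed from the commit message is that a lookup past the oldest frame yields a null result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    name: str
    is_inlined: bool


class FakeThread:
    """Hypothetical stand-in for a thread whose frames are looked up by index."""

    def __init__(self, frames: List[Frame]):
        self._frames = frames
        self.lookups = 0  # how many frames we had to fetch

    def get_frame_at_index(self, index: int) -> Optional[Frame]:
        self.lookups += 1
        # Stepping past the oldest frame returns None: the "end" signal.
        return self._frames[index] if index < len(self._frames) else None


def nearest_non_inlined_frame(thread: FakeThread, start_index: int) -> Optional[Frame]:
    """Walk outward from start_index without counting the whole stack first."""
    index = start_index
    while True:
        frame = thread.get_frame_at_index(index)
        if frame is None:
            return None  # ran off the end of the stack
        if not frame.is_inlined:
            return frame
        index += 1
```

With a stack of two inlined frames followed by a real one, the walk stops after three lookups regardless of how deep the rest of the stack is.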
Commit a04c636
Add the ability to define a Python based command that uses CommandObj…
…ectParsed (#70734) This allows you to specify options and arguments and their definitions and then have lldb handle the completions, help, etc. in the same way that lldb does for its parsed commands internally. This feature has some design considerations as well as the code, so I've also set up an RFC, but I did this one first and will put the RFC address in here once I've pushed it... Note, the lldb "ParsedCommand interface" doesn't actually do all the work that it should. For instance, saying the type of an option that has a completer doesn't automatically hook up the completer, and ditto for argument values. We also do almost no work to verify that the arguments match their definition, or do auto-completion for them. This patch allows you to make a command that's bug-for-bug compatible with built-in ones, but I didn't want to stall it on getting the auto-command checking to work all the way correctly. As an overall design note, my primary goal here was to make an interface that worked well in the script language. For that I needed, for instance, to have a property-based way to get all the option values that were specified. It was much more convenient to do that by making a fairly bare-bones C interface to define the options and arguments of a command, and set their values, and then wrap that in a Python class (installed along with the other bits of the lldb python module) which you can then derive from to make your new command. This approach will also make it easier to experiment. See the file test_commands.py in the test case for examples of how this works.
Commit a69ecb2
[mlir][flang][openmp] Rework wsloop reduction operations (#80019)
This patch reworks the way that wsloop reduction operations function to better match the expected semantics from the OpenMP specification, following the rework of parallel reductions. The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. These block arguments follow the loop control block arguments. --------- Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Commit be9f8ff
Commit 3985eda
[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate c…
…onstants (#73056) In this case, a trivial GEP chain has the form: ``` %ptr = getelementptr sameType, %base, constant %val = getelementptr sameType, %ptr, %variable ``` That is, a one-index GEP consumes another (of the same basis and result type) one-index GEP, where the inner GEP uses a constant index and the outer GEP uses a variable index. For chains of this type, it is trivial to reorder them (by simply swapping the indexes). The result of doing so is better AddrMode matching for users of the ultimate ptr produced by GEP chain. Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).
Commit 1b65742
[Clang][Sema] Diagnose friend declarations with enum elaborated-type-…
…specifier in all language modes (#80171) According to [dcl.type.elab] p4: > If an _elaborated-type-specifier_ appears with the `friend` specifier as an entire _member-declaration_, the _member-declaration_ shall have one of the following forms: > `friend` _class-key_ _nested-name-specifier_(opt) _identifier_ `;` > `friend` _class-key_ _simple-template-id_ `;` > `friend` _class-key_ _nested-name-specifier_ `template`(opt) _simple-template-id_ `;` Notably absent from this list is the `enum` form of an _elaborated-type-specifier_ "`enum` _nested-name-specifier_(opt) _identifier_", which appears to be intentional per the resolution of CWG2363. Most major implementations accept these declarations, so the diagnostic is a pedantic warning across all C++ versions. In addition to the trivial cases previously diagnosed in C++98, we now diagnose cases where the _elaborated-type-specifier_ has a dependent _nested-name-specifier_: ``` template<typename T> struct A { enum class E; }; struct B { template<typename T> friend enum A<T>::E; // pedantic warning: elaborated enumeration type cannot be a friend }; template<typename T> struct C { friend enum T::E; // pedantic warning: elaborated enumeration type cannot be a friend }; ```
Commit 3a48630
Commit 2772692
This patch fixes: mlir/lib/Target/LLVMIR/AttrKindDetail.h:65:1: error: unused function 'getAttrNameToKindMapping' [-Werror,-Wunused-function]
Commit f5cc961
[SeparateConstOffsetFromGEP] Fix test after 1b65742
Change-Id: I7ced7774c80997d21969ab7886fc30c0c1e1cc81
Commit ec0aa16
Merge from 'sycl' to 'sycl-web' (4 commits)
iclsrc committed Feb 13, 2024
Commit ca8cb53
[OpenMP][AIX]Define struct kmp_base_tas_lock with the order of two me…
…mbers swapped for big-endian (#79188) The direct lock data structure has bit `0` (the least significant bit) of the first 32-bit word set to `1` to indicate it is a direct lock. On the other hand, the first word (in 32-bit mode) or first two words (in 64-bit mode) of an indirect lock are the address of the entry allocated from the indirect lock table. The runtime checks bit `0` of the first 32-bit word to tell if this is a direct or an indirect lock. This works fine for 32-bit and 64-bit little-endian because its memory layout of a 64-bit address is (`low word`, `high word`). However, this causes problems for big-endian where the memory layout of a 64-bit address is (`high word`, `low word`). If an address of the indirect lock table entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it is treated as a direct lock. This patch defines `struct kmp_base_tas_lock` with the ordering of the two 32-bit members flipped for big-endian PPC64 so that when checking/setting tags in member `poll`, the second word (the low word) is used. This patch also changes places where `poll` is not already explicitly specified for checking/setting tags.
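The misclassification described above can be reproduced with byte packing alone. This is a minimal sketch, not the OpenMP runtime code: the helper names are hypothetical, and it only models "read the first 32-bit word of a 64-bit address and test bit 0" using the example address from the commit message.

```python
import struct


def first_word(address: int, byte_order: str) -> int:
    """Return the first 32-bit word in memory of a 64-bit address.

    byte_order is "<" for little-endian or ">" for big-endian.
    """
    raw = struct.pack(byte_order + "Q", address)
    (word,) = struct.unpack_from(byte_order + "I", raw, 0)
    return word


def looks_like_direct_lock(address: int, byte_order: str) -> bool:
    """Apply the tag check: bit 0 of the first 32-bit word set means "direct"."""
    return (first_word(address, byte_order) & 1) == 1


INDIRECT_ENTRY = 0x110035300  # example table-entry address from the text
```

On little-endian the first word is the low word `0x10035300`, whose bit 0 is clear, so the lock is correctly seen as indirect. On big-endian the first word is the high word `0x1`, so the same address is mistaken for a direct lock, which is the case the member swap addresses.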
Commit ac97562
[Sparc] limit MaxAtomicSizeInBitsSupported to 32 for 32-bit Sparc. (#…
…81655) When in 32-bit mode, the backend doesn't currently implement 64-bit atomics, even though the hardware is capable if you have specified a V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO. This fixes a regression triggered by PR #73176.
Commit c1a99b2
[TypePromotion] Remove an unreachable 'return false'. NFC
The if and the else above this both return so this is unreachable. Delete it and remove the else after return.
Commit d0a1bf8
[libc] Allow BigInt class to use base word types other than uint64_t.…
… (#81634) This will allow DyadicFloat class to replace NormalFloat class.
Commit 4e00551
Temporarily disable the TestAddParsedCommand.py while I figure out
why it's crashing on the x86_64 Debian Linux worker.
Commit f0b271e
[mlir][sparse] add assemble test for Batched-CSR and CSR-Dense (#81660)
These are formats supported by PyTorch sparse, so good to make sure that our assemble instructions work on these.
Commit 2400f70
[DWARFDump] Make --verify handle all sections by default (#81559)
The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all". This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF: 1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue llvm/llvm-project#81558 2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the _middle_ of the string section: error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character If we look at the closest offset to 0x455D in debug_str: ``` 0x0000454e: "__SLONG32_TYPE int" ``` 0x455D points to "int".
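The rule quoted in the error message (an offset must be zero or immediately follow a null character) can be sketched as a small check. This is an illustration under assumptions, not the llvm-dwarfdump verifier: the function name and the tiny string section are made up, and the bounds check is an addition of this sketch.

```python
def is_valid_string_offset(str_section: bytes, offset: int) -> bool:
    """Return True if offset points at the start of a string in the section."""
    if offset == 0:
        return True
    if offset < 0 or offset >= len(str_section):
        return False  # assumption of this sketch: out-of-range offsets are invalid
    # A string starts right after the null terminator of the previous one.
    return str_section[offset - 1] == 0


# Hypothetical two-string section: "first" at offset 0, the macro text at offset 6.
SECTION = b"first\x00__SLONG32_TYPE int\x00"
MIDDLE = SECTION.index(b"int")  # points into the middle of the second string
```

Here offset 6 is accepted because it follows a null byte, while the offset of `int` is rejected because the preceding byte is a space, mirroring the `0x455D` case above.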
Commit 5296149
[lldb][DWARFIndex] Use IDX_parent to implement GetFullyQualifiedType …
…query (#79932) This commit changes DebugNamesDWARFIndex so that it now overrides `GetFullyQualifiedType` and attempts to use DW_IDX_parent, when available, to speed up such queries. When this type of information is not available, the base-class implementation is used. With this commit, we now achieve the 4x speedups reported in [1]. [1]: https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151/38
Commit 91f4a84
[DebugInfo][RemoveDIs] Convert back to intrinsic form for ThinLTO
As explained on discourse [0] (comment 12), to get the non-intrinsic form of debug-info records enabled and testing, we're only using it inside of the pass manager in LLVM right now. Things like the textual IR writer and bitcode writing _passes_ are instrumented to convert back to intrinsic-form when writing a module out, but it turns out we missed the ThinLTO bitcode writing pass. That causes uh, all variable location debug-info to be dropped in ThinLTO mode (oops). This patch adds that conversion; it should be low risk as it's identical to what happens in all the other passes. However should this commit turn out to cause trouble, please instead revert d759618 or whichever is the most recent commit to set UseNewDbgInfoFormat to default to true. That'll revert LLVM back to the definitely-correct behaviour. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Commit fa77e1f
Revert "[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to se…
Commit 99c5a66
[lldb-dap] Add support for data breakpoint. (#81541)
This implements functionality to handle `DataBreakpointInfo` request and `SetDataBreakpoints` request. If variablesReference is 0 or not provided, interpret name as ${number of bytes}@${expression} to set data breakpoint at the given expression because the spec https://microsoft.github.io/debug-adapter-protocol/specification#Requests_DataBreakpointInfo doesn't say how the client could specify the number of bytes to watch. This is based on top of llvm/llvm-project#80753.
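The `${number of bytes}@${expression}` convention can be illustrated with a small parser. This is a sketch of the naming scheme only, not lldb-dap's code: the function name, the error handling, and the choice to split at the first `@` are assumptions.

```python
from typing import Tuple


def parse_watch_request(name: str) -> Tuple[int, str]:
    """Split "<bytes>@<expression>" into (byte count, expression)."""
    size_text, separator, expression = name.partition("@")
    if not separator or not expression:
        raise ValueError(f"expected '<bytes>@<expression>', got {name!r}")
    size = int(size_text)  # raises ValueError for a non-numeric byte count
    if size <= 0:
        raise ValueError("byte count must be positive")
    return size, expression
```

For example, a client that wants to watch 4 bytes at the address of a variable would send the name `4@&my_var`, which this sketch splits into `(4, "&my_var")`.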
Commit 8c56e78
Merge from 'main' to 'sycl-web' (32 commits)
CONFLICT (content): Merge conflict in llvm/test/CodeGen/RISCV/O3-pipeline.ll Also revert 5c9f768. See:KhronosGroup/SPIRV-LLVM-Translator#2357 intel#12698
Commit 735e88e
[WebAssembly] Demote PHIs in catchswitch BB only (#81570)
`DemoteCatchSwitchPHIOnly` option in `WinEHPrepare` pass was added in llvm/llvm-project@99d60e0, because Wasm EH uses `WinEHPrepare`, but it doesn't need to demote all PHIs. PHIs in `catchswitch` BBs have to be removed (= demoted) because `catchswitch`s are removed in ISel and `catchswitch` BBs are removed as well, so they can't have other instructions. But because Wasm EH doesn't use funclets, PHIs in `catchpad` or `cleanuppad` BBs don't need to be demoted. That was the reason the `DemoteCatchSwitchPHIOnly` option was added: to avoid demoting more instructions than necessary. The problem is it should have been set to `true` for Wasm EH. (Its default value is `false` for WinEH.) And I mistakenly set it to `false` and wasn't aware of this for more than 5 years. This was not the end of the world; it just means we've been demoting more instructions than we should, possibly hurting code size. In practice I think it would've had hardly any effect on real performance, given that PHIs in `catchpad` or `cleanuppad` BBs are not very frequent and many people run other optimizers like Binaryen anyway.
Commit 473ef10
Revert "Reapply "[DebugInfo][RemoveDIs] Turn on non-instrinsic debug-…
…info by default"" This reverts commit d759618. Causes crashes, see comments in llvm/llvm-project@d759618.
Commit fd3a0c1
[libc][stdfix] Generate stdfix.h header with fixed point precision ma…
…cros according to ISO/IEC TR 18037:2008 standard, and add fixed point type support detection. (#81255) Fixed point extension standard: https://standards.iso.org/ittf/PubliclyAvailableStandards/c051126_ISO_IEC_TR_18037_2008.zip
Commit 84277fe
Commit 9f87bfe
Commit c92bf6b
[lldb][test] Switch LLDB API tests from vendored unittest2 to unittes…
…t (#79945) This removes the dependency LLDB API tests have on lldb/third_party/Python/module/unittest2, and instead uses the standard one provided by Python. This does not actually remove the vendored dep yet, nor update the docs. I'll do both those once this sticks. Non-trivial changes to call out: - expected failures (i.e. "bugnumber") don't have a reason anymore, so those params were removed - `assertItemsEqual` is now called `assertCountEqual` - When a test is marked xfail, our copy of unittest2 considers failures during teardown to be OK, but modern unittest does not. See TestThreadLocal.py. (Very likely could be a real bug/leak). - Our copy of unittest2 was patched to print all test results, even ones that don't happen, e.g. `(5 passes, 0 failures, 1 errors, 0 skipped, ...)`, but standard unittest prints a terser message that omits test result types that didn't happen, e.g. `OK (skipped=1)`. Our lit integration parses this stderr and needs to be updated w/ that expectation. I tested this w/ `ninja check-lldb-api` on Linux. There's a good chance non-Linux tests have similar quirks, but I'm not able to uncover those.
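Two of the behaviors called out above, the `assertCountEqual` name and the terse summary line, can be seen with the standard library alone. This is a small self-contained sketch; the test class and helper function are made up for illustration and are not part of the LLDB test suite.

```python
import io
import unittest


class CountEqualExample(unittest.TestCase):
    def test_same_elements_any_order(self):
        # assertCountEqual is the standard unittest name for what unittest2
        # called assertItemsEqual: same elements, same counts, any order.
        self.assertCountEqual([3, 1, 2, 1], [1, 1, 2, 3])

    @unittest.skip("present only to show the terse summary line")
    def test_skipped(self):
        pass


def run_example():
    """Run the example tests and capture the text runner's output."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualExample)
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=0).run(suite)
    return result, stream.getvalue()
```

The captured output ends with `OK (skipped=1)` and lists only the result types that occurred, which is the format any parser of this output has to expect.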
Commit 5b38615
[flang] Register LLVMTranslationDialectInterface for FIR. (#81668)
Register the LLVM IR translation interface for FIR to avoid warnings about "Unhandled parameter attribute" after #78228.
Commit 137bd78
[-Wunsafe-buffer-usage] Emit fixits for array decayed to pointer (#80…
…347) Covers cases where a DeclRefExpr referring to a const-size array decays to a pointer and is used "as a pointer" (e.g. passed to a pointer type parameter). Since std::array<T, N> doesn't implicitly convert to a pointer to its element type T*, the cast needs to be done explicitly as part of the fixit when we retrofit std::array to code that previously worked with a constant size array. The std::array::data() method is used for the explicit cast. In terms of the fixit machine this covers the UPC(DRE) case for the Array fixit strategy. The emitted fixit inserts a call to the std::array::data() method, similarly to the analogous fixit for the Span strategy.
Commit e06f352
[attributes][analyzer] Generalize [[clang::suppress]] to declarations…
…. (#80371) The attribute is now allowed on an assortment of declarations, to suppress warnings related to declarations themselves, or all warnings in the lexical scope of the declaration. I don't necessarily see a reason to have a list at all, but it does look as if some of those more niche items aren't properly supported by the compiler itself so let's maintain a short safe list for now. The initial implementation raised a question whether the attribute should apply to lexical declaration context vs. "actual" declaration context. I'm using "lexical" here because it results in fewer warnings suppressed, which is the conservative behavior: we can always expand it later if we think this is wrong, without breaking any existing code. I also think that this is the correct behavior that we will probably never want to change, given that the user typically desires to keep the suppressions as localized as possible.
Commit 017675f
[RISCV] Register fixed stack slots for callee saved registers for -ms…
…ave-restore/Zcmp (#81392) PEI previously used fake frame indices for these callee saved registers. These fake frame indices are not registered with MachineFrameInfo. This required them to be deleted from CalleeSavedInfo after PEI to avoid breaking later passes. See #79535 Unfortunately, removing the registers from CalleeSavedInfo pessimizes Interprocedural Register Allocation. The RegUsageInfoCollector pass runs after PEI and uses CalleeSavedInfo. This patch replaces #79535 by properly creating fixed stack objects through MachineFrameInfo. This changes the stack size and offsets returned by MachineFrameInfo, which requires changes to how RISCVFrameLowering uses that information. In addition to the individual object for each register, I've also created a single large fixed object that covers the entire stack area covered by cm.push or the libcalls. cm.push must always push a multiple of 16 bytes and the save restore libcall pushes a multiple of stack align. I think this leaves holes in the stack where we could spill other registers, but it matches what we did previously. Maybe we can optimize this in the future. The only test changes are due to stack alignment handling after the callee save registers. Since we now have the fixed objects on the stack, the offset is non-zero when an aligned object is processed, so it gets rounded up, increasing the stack size. I suspect we might need some more updates for RVV related code. There is very little or maybe even no testing of RVV mixed with Zcmp and save-restore.
Commit 0de2b26
[InstSimplify] Add trivial simplifications for gc.relocate intrinsic …
…(#81639) Fold gc.relocate of undef and null to undef and null respectively. Similar transform is currently done by instcombine, but there is no reason to not include it here as well.
Commit cb1a9f7
The missing trailing comma confuses the sync script.
Commit 4bc2a4f
Commit bf3d5db
Commit a6b846a
Commit 9168a21
[mlir][sparse] add doubly compressed test case to assembly op (#81687)
Removes audit TODO
Commit 3122969
Commits on Feb 14, 2024
Used std::vector::reserve when I meant std::vector::resize.
The Linux std has more asserts enabled by default, so it complained, even though this worked on Darwin...
Commit 3647ff1
[RISCV] Add canonical ISA string as Module metadata in IR. (#80760)
In an LTO build, we don't set the ELF attributes to indicate what extensions were compiled with. The target CPU/Attrs in RISCVTargetMachine do not get set for an LTO build. Each function gets a target-cpu/feature attribute, but this isn't usable to set ELF attributes since we wouldn't know what function to use. We can't just pick one since it might have been compiled with an attribute like target_version. This patch adds the ISA as Module metadata so we can retrieve it in the backend. Individual translation units can still be compiled with different strings so we need to collect the unique set when Modules are merged. The backend will need to combine the unique ISA strings to produce a single value for the ELF attributes. This will be done in a separate patch.
Commit f45b9d9
[X86][CodeGen] Restrict F128 lowering to GNU environment (#81664)
Otherwise it breaks some environments, like X64 Android, that don't have f128 functions available in their libc. Followup to #79611.
Commit 21630ef
[mlir][sparse][pybind][CAPI] remove LevelType enum from CAPI, construct LevelType from LevelFormat and properties instead. (#81682)
**Rationale** We used to explicitly declare every possible combination of `LevelFormat` and `LevelProperties`, which becomes difficult to scale as more properties/level formats are introduced.
Commit 429919e
Temporarily skip this test for Python 3.9.
When the parsed command python code is run on 3.9, I get:

    File ".../lib/python3.9/site-packages/lldb/plugins/parsed_cmd.py", line 124, in translate_value
      return cls.translators[value_type](value)
    TypeError: 'staticmethod' object is not callable

But this works correctly in Python 3.10 on macOS and Linux. I'm guessing something changed between those versions, and I'll have to do something to work around the difference. But I'm going to skip the test on 3.9 while I figure that out.
Commit 1ec8197
[SeparateConstOffsetFromGEP] Reland: Reorder trivial GEP chains to separate constants (#81671)
Actually update tests w.r.t llvm/llvm-project@9e5a77f and reland llvm/llvm-project#73056
Commit 7180c23
Merge from 'sycl' to 'sycl-web' (3 commits)
iclsrc committed Feb 14, 2024
Commit 1dbb9fd
[AMDGPU][MLIR] Add shmem-optimization as an op using transform dialect (#81550)
This PR adds functionality to use shared memory optimization as an op using transform dialect.
Commit 29d1aca
Merge from 'main' to 'sycl-web' (33 commits)
CONFLICT (content): Merge conflict in llvm/lib/IR/BasicBlock.cpp
Commit d33f6f4
Move the parsed_cmd conversion def's to module level functions.
Python3.9 does not allow you to put a reference to a class staticmethod in a table and call it from there. Python3.10 and following do allow this, but we still support 3.9. staticmethod was slightly cleaner, but this will do.
Commit 22d2f3a
[llvm][Support] Add ExponentialBackoff helper (#81206)
This provides a simple way to implement exponential backoff using a do while loop. Usage example (also see the change to LockFileManager.cpp):
```
ExponentialBackoff Backoff(10s);
do {
  if (tryToDoSomething())
    return ItWorked;
} while (Backoff.waitForNextAttempt());
return Timeout;
```
Abstracting this out of `LockFileManager` as the module build daemon will need it.
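As a rough illustration of the do/while pattern above, the following sketch implements a small stand-in helper. `SketchBackoff`, `retryWithBackoff`, and the wait bounds are hypothetical names and values chosen for this example; this is not LLVM's `ExponentialBackoff` implementation, which may choose its wait times differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the helper described in the commit message.
class SketchBackoff {
public:
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::milliseconds;

  explicit SketchBackoff(Millis Timeout, Millis MinWait = Millis(1),
                         Millis MaxWait = Millis(50))
      : Deadline(Clock::now() + Timeout), NextWait(MinWait), WaitCap(MaxWait) {
    assert(MinWait.count() > 0 && MaxWait >= MinWait);
  }

  // Sleeps before the next attempt and returns true, or returns false
  // once the overall deadline has passed.
  bool waitForNextAttempt() {
    auto Now = Clock::now();
    if (Now >= Deadline)
      return false;
    auto Remaining = std::chrono::duration_cast<Millis>(Deadline - Now);
    std::this_thread::sleep_for(std::min(NextWait, Remaining));
    NextWait = std::min(NextWait * 2, WaitCap); // double the wait, up to the cap
    return true;
  }

private:
  Clock::time_point Deadline;
  Millis NextWait;
  Millis WaitCap;
};

// Retries Attempt until it succeeds or the timeout expires.
template <typename Fn>
bool retryWithBackoff(Fn Attempt, SketchBackoff::Millis Timeout) {
  SketchBackoff Backoff(Timeout);
  do {
    if (Attempt())
      return true;
  } while (Backoff.waitForNextAttempt());
  return false;
}
```

The doubling here is deterministic to keep the sketch short; production helpers often add randomness so that competing processes do not retry in lockstep.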
Commit edff3ff
Commit 14b0d0d
[clang][InstallAPI] Introduce basic driver to write out tbd files (#81571)
This introduces a basic outline of installapi as a clang driver option. It captures relevant information as cc1 args, which are common arguments already passed to the linker to encode into TBD file outputs. This is effectively an upstream for what already exists as `tapi installapi` in Xcode toolchains, but directly in Clang. This patch does not handle any AST traversing on input yet. InstallAPI is broadly an operation that takes a series of header files that represent a single dynamic library and generates a TBD file out of it which represents all the linkable symbols and necessary attributes for statically linking in clients. It is the linkable object in all Apple SDKs and when building dylibs in Xcode. `clang -installapi` also will support verification where it compares all the information recorded for the TBD files against the already built binary, to catch possible mismatches like when a declaration is missing a definition for an exported symbol.
Commit 09e9895
[SHT_LLVM_BB_ADDR_MAP][obj2yaml] Implements PGOAnalysisMap for elf2yaml and tests. (#80924)
Adds support to obj2yaml for PGO Analysis Map. Adds a test to both obj2yaml and yaml2obj.
Commit a3f61c8
Commit ec5f4a4
[Sanitizers][ABI] Remove too strong assert in asan_abi_shim (#81696)
Recently we enabled building the shim for the arm64_32 arch. On this arch, sizeof(uptr) == sizeof(unsigned long) == 4, so this assert fails at runtime. We just need to remove the assert. rdar://122927166 Co-authored-by: Mariusz Borsa <m_borsa@apple.com>
Commit 3f738a4
[mlir][tensor] Add support for tensor.pack static shapes inference. (#80848)
Fixes iree-org/iree#16317
Commit bc08cc2
[RISCV] Use SelectionDAG::getVScale in lowerVPReverseExperimental. NFCI (#81694)
Use a slightly more idiomatic way of getting vscale. getVScale performs additional constant folding, but I presume computeKnownBits catches these cases too.
Commit b9567bc
Commit 69bcb69
Commit a854982
Commit d2f0676
Commit 153661d
Apply clang-tidy fixes for performance-unnecessary-value-param in TransformOps.cpp (NFC)
Commit 70ebc78
Revert "[clang-format][NFC] Make LangOpts global in namespace Format"
This reverts commit 32e65b0. It seems to break some PowerPC bots. See llvm/llvm-project#81390 (comment).
Commit 61c83e9
Commit eafe98f
Commit 3537ccc
[DAGCombiner] Remove unnecessary commonAlignment from CombineExtLoad. (#81705)
The getAlign function for a load returns the commonAlignment of the "base align" and the offset stored in the MachinePointerInfo. We're splitting a load here, so we should take the base alignment from the original load without any offset that may already exist in the original load. The new load can then maintain its own alignment using just the base alignment and its own offset. Noticed by inspection.
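The arithmetic behind this can be sketched with plain integers. The example below assumes that the common alignment of a power-of-two alignment and an offset is the largest power of two dividing both; `SketchLoad`, `commonAlign`, and the two `split...` helpers are hypothetical names for illustration and do not reproduce the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both A and Offset, i.e. the lowest set bit
// of (A | Offset). A is expected to be a nonzero power of two.
constexpr uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  return (A | Offset) & (~(A | Offset) + 1);
}

// Hypothetical model of a load: a base alignment plus a pointer-info offset.
struct SketchLoad {
  uint64_t BaseAlign;
  uint64_t Offset;
  uint64_t getAlign() const { return commonAlign(BaseAlign, Offset); }
};

// Splitting while keeping the original *base* alignment.
SketchLoad splitUsingBaseAlign(const SketchLoad &Orig, uint64_t Extra) {
  return {Orig.BaseAlign, Orig.Offset + Extra};
}

// Splitting while (pessimistically) seeding the new load with getAlign(),
// which already had the original offset folded in.
SketchLoad splitUsingGetAlign(const SketchLoad &Orig, uint64_t Extra) {
  return {Orig.getAlign(), Orig.Offset + Extra};
}
```

With a base alignment of 8 and an original offset of 4, a second half at offset 8 is 8-byte aligned when the base alignment is carried over, but only 4-byte aligned when the already-reduced `getAlign()` value is used as the base.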
Commit e625310
[DAGCombiner] Remove unneeded commonAlignment from reduceLoadWidth. (#81707)
We already have the PtrOff factored into MachinePointerInfo. Any calls to getAlign on the new load will do commonAlignment with the MachinePointerInfo offset and the base alignment.
Commit 86ce491
[mlir][nvvm] Introduce `nvvm.barrier` OP (#81487)
This PR introduces the `nvvm.barrier` OP to the NVVM dialect. Currently, NVVM only supports the `nvvm.barrier0`, which synchronizes all threads using barrier resource 0. The new `nvvm.barrier` has two essential arguments: the barrier resource and the number of threads. This added flexibility allows for selective synchronization of threads within a CTA, aligning with the capabilities provided by LLVM intrinsics or the PTX model. I think we can deprecate `nvvm.barrier0` in favor of the more generic `nvvm.barrier`.
```
// Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA)
nvvm.barrier
// Synchronize all threads using the 3rd barrier resource.
nvvm.barrier id = 3
// Synchronize %numberOfThreads threads using the 3rd barrier resource.
nvvm.barrier id = 3 number_of_threads = %numberOfThreads
```
Commit b5d694b
[ValueTracking] Move the `isSignBitCheck` helper into ValueTracking. NFC. (#81704)
This patch moves the `isSignBitCheck` helper into ValueTracking to reuse the logic in ValueTracking/InstSimplify. Addresses the comment llvm/llvm-project#80740 (comment).
Commit dc866ae
[clang][analyzer] Reformat code of BoolAssignmentChecker (NFC). (#81461)
This is only a code reformatting and rename of variables to the newer format.
Commit a2eb234
[RISCV] Remove -riscv-v-fixed-length-vector-lmul-max from tests. NFC (#78299)
Some fixed vector tests in test/CodeGen/RISCV/rvv have multiple run lines that check various configurations of -riscv-v-fixed-length-vector-lmul-max. From what I understand this flag was introduced in the early days of fixed length vector support, but now that fixed vector codegen has matured I'm not sure if it's as relevant today. This patch proposes to remove the various lmul-max run lines from the tests to make them more readable, and any changes to fixed vector codegen easier to review. We have removed them before for the same reason, so this would take care of the remaining test cases: https://reviews.llvm.org/D157973#4593268 (I don't have any strong motivation to remove the actual flag itself, my own personal motivation is just to clean up the tests)
Commit 0fee211
Commit bd2f7bb
clangCodeGen: Introduce `MCDC::State` with `MCDCState.h` (#81497)
This packs:
* `BitmapBytes`
* `BitmapMap`
* `CondIDMap`

into `MCDC::State`.
Commit 5c8985e
Commit 243f14d
[InstSimplify][InstCombine] Remove unnecessary `m_c_*` matchers. (#81712)
This patch removes unnecessary `m_c_*` matchers since we always canonicalize `commutive_op Cst, X` into `commutive_op X, Cst`. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=bfc0b7c6891896ee8e9818f22800472510093864&to=d27b058bb9acaa43d3cadbf3cd889e8f79e5c634&stat=instructions:u
Commit 470c5b8
Commit 5932f3f
Commit 855bac2
Commit 8f0435f
Commit 17ac5b1
[AMDGPU] Do not test both wave sizes for DSDIR disassembly (#81719)
There is nothing in these instruction definitions that depends on wave size so testing both seems like overkill. The corresponding assembler tests do not do it.
Commit cb8f910
[DeadStoreElimination] Optimize tautological assignments (#75744)
If a store is dominated by a condition that ensures that the value being stored in a memory location is already present at that memory location, consider the store a noop. Fixes #63419
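A source-level shape of such a tautological assignment might look like the sketch below. The function names are hypothetical, and whether the optimizer actually removes the store in a given build depends on the pass pipeline; the second function only shows what the first is equivalent to once the store is treated as a no-op.

```cpp
#include <cassert>

// The store is guarded by a condition that already guarantees *P == V,
// so writing V back does not change memory.
void storeIfEqual(int *P, int V) {
  assert(P != nullptr);
  if (*P == V)
    *P = V; // tautological store
}

// Observable behavior after dropping the redundant store: nothing is written.
void storeIfEqualSimplified(int *P, int V) {
  assert(P != nullptr);
  (void)V;
}
```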
Commit 65b5647
[mlir][nfc] Move Op signature to one line
This was accidentally split with a comment
Commit 55a7ff8
Revert "[GitHub][workflows] Ask reviewers to merge PRs when author cannot (#81142)"
This reverts commit 38c706e. This workflow always fails in cases where it needs to create a comment, due to a permissions issue, see the discussion at: https://discourse.llvm.org/t/rfc-fyi-pull-request-greetings-for-new-contributors/75458/20
Commit 124cd11
[X86] Use explicit const SDValue& to avoid implicit copy in for-range across op_values(). NFC.
Fixes static analysis warning.
Commit 786537e
[X86] Add v8i64/v16i32/v16i64 ctpop reduction test coverage
Add test coverage for types wider than legal
Commit f82e080
[VPlan] Properly retain flags when cloning VPReplicateRecipe.
This makes sure the correct flags are used for the clone (i.e. the ones present on the recipe), instead of the ones on the original IR instruction. At the moment, this should not change anything, as flags of replicate recipes should not be dropped before they are cloned. But that will change in a follow-up patch.
Commit ca56966
Commit f1b2865
[llvm-dlltool][NFC] Factor out parseModuleDefinition helper. (#81620)
In preparation for ARM64EC support.
Commit 0c8b594
[MLIR][Python] Added a base class to all builtin floating point types (#81720)
This makes it possible to:
* check if a given ir.Type is a floating point type via isinstance() or issubclass()
* get the bitwidth of a floating point type

See motivation and discussion in https://discourse.llvm.org/t/add-floattype-to-mlir-python-bindings/76959.
Commit 82f3cbc
[AArch64] Add tests for fusion on Ampere1/1A/1B (#81725)
As commented on the PR #81293, the Ampere1-family does not have test cases for the common fusion cases it implements. This adds the Ampere1 targets to the relevant misched-fusion testcases:
* addadrp
* addr
* aes
Commit 6cab375
[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC)
Move collectPoisonGeneratingFlags from InnerLoopVectorizer to VPlanTransforms and also update its name. collectPoisonGeneratingFlags already directly drops poison-generating flags, rather than only collecting them. This means it is more appropriate to integrate it directly into the VPlan transform pipeline. The current implementation still calls back to legal to check if a block needs predication, which should be improved in the future.
Commit debca7e
[clang][NFC] Use "notable" for "interesting" identifiers in `IdentifierInfo` (#81542)
This patch expands the notion of "interesting" in `IdentifierInfo` to also cover ObjC keywords and builtins, which matches the notion of "interesting" in the serialization layer. What was previously "interesting" in `IdentifierInfo` is now called "notable". Beyond clearing up confusion between serialization and the rest of the compiler, it also resolves a naming problem: ObjC keywords, notable identifiers, and builtin IDs are all stored in the same bit-field. Now we can use "interesting" to name it and its corresponding type, instead of the `ObjCKeywordOrInterestingOrBuiltin` abomination.
Commit 5027569
[clang][docs] Remove trailing whitespace
Which is causing CI checks to fail.

    clang/docs/LanguageExtensions.rst:2794:takes no arguments and produces an unsigned long long result. The builtin does
    clang/docs/LanguageExtensions.rst:2795:not guarantee any particular frequency, only that it is stable. Knowledge of the
    + echo '*** Trailing whitespace has been found in Clang source files as described above ***'

Commit c5e1384
[ValueTracking] Compute known FPClass from signbit idiom (#80740)
This patch improves `computeKnownFPClass` by using context-sensitive information from `DomConditionCache`. The motivation of this patch is to optimize the following case found in [fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566):
```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```
We can prove the sign bit of %value is always zero. Then the fabs can be eliminated. This pattern also exists in cpython/duckdb/oiio/openexr.

Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f82e0809ba12170e2f648f8a1ac01e78ef06c958&to=041218bf5491996edd828cc15b3aec5a59ddc636&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.00%|-0.03%|+0.00%|+0.00%|+0.02%|
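The reasoning can be checked at the source level with a small sketch. The functions below are hypothetical C++ counterparts of the IR, not code from fmt or LLVM, and the `%cond` branch is omitted because both of its paths carry the unmodified `%x`. They negate the value exactly when its sign bit is set, so the result always has a clear sign bit and the trailing `fabs` changes nothing.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 bits of a float.
uint32_t bitsOf(float X) {
  uint32_t B;
  std::memcpy(&B, &X, sizeof(B));
  return B;
}

// Mirrors the IR: test the sign bit through an integer compare and negate
// only when it is set.
float clearSignViaBranch(float X) {
  int32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits < 0 ? -X : X;
}

// The same value with the (redundant) fabs applied afterwards.
float clearSignThenFabs(float X) { return std::fabs(clearSignViaBranch(X)); }
```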
Commit 16a0629
[libc] Add user defined literals to initialize `BigInt` and `__uint128_t` constants (#81267)
Adds user defined literal to construct unsigned integer constants. This is useful when constructing constants for non native C++ types like `__uint128_t` or our custom `BigInt` type.
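To show the general technique, here is a minimal sketch of a raw literal operator that builds a `__uint128_t` from decimal digits. The suffix `_u128` and the helper `parseDecimal` are assumptions made for this example and are not taken from the libc patch; the sketch also assumes a GCC- or Clang-compatible compiler, since `__uint128_t` is an extension, and it handles only plain decimal digits.

```cpp
#include <cassert>
#include <cstdint>

// Folds a NUL-terminated string of decimal digits into a 128-bit value.
// No validation is performed: hex prefixes and digit separators are not
// supported in this sketch.
constexpr __uint128_t parseDecimal(const char *Digits) {
  __uint128_t Value = 0;
  for (; *Digits != '\0'; ++Digits)
    Value = Value * 10 + static_cast<unsigned>(*Digits - '0');
  return Value;
}

// Raw literal operator: receives the literal's spelling as a string, so the
// literal may exceed the range of unsigned long long.
constexpr __uint128_t operator""_u128(const char *Digits) {
  return parseDecimal(Digits);
}
```

Because the operator takes the spelling rather than a cooked integer, a constant such as 2^64 can be written directly with the suffix instead of being assembled from two 64-bit halves.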
Commit 0323235
[TableGen] Stop using make_pair and make_tuple. NFC. (#81730)
These are unnecessary since C++17.
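For readers unfamiliar with the C++17 feature involved, the sketch below contrasts the older helper-function style with class template argument deduction. The function names are made up for this example and the code is unrelated to TableGen itself.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Pre-C++17 style: the helper deduces the element types.
std::pair<int, std::string> pairWithHelper() {
  return std::make_pair(1, std::string("one"));
}

// C++17 style: class template argument deduction on the constructor.
std::pair<int, std::string> pairWithDeduction() {
  return std::pair(1, std::string("one"));
}

// The same applies to tuples.
std::tuple<int, char, double> tupleWithDeduction() {
  return std::tuple(1, 'a', 2.5);
}
```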
Commit f723260
[AArch64] Materialize constants via fneg. (#80641)
This is something that is already done as a special case for copysign; this patch extends it to be more generally applied. If we are trying to materialize a negative constant (notably -0.0, 0x80000000), then there may be no movi encoding that creates the immediate, but a fneg(movi) might. Some of the existing patterns for RADDHN needed to be adjusted to keep them in line with the new immediates.
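The bit-level fact this relies on can be sketched as follows: negating a float only flips its sign bit, so the pattern 0x80000000 is what a negation of an all-zero lane produces. `floatBits` and `fnegBits` are hypothetical helpers for illustration and say nothing about which immediates the AArch64 backend can actually encode.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 bits of a float.
uint32_t floatBits(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof(B));
  return B;
}

// Models a floating-point negate on raw bits: only the sign bit changes.
constexpr uint32_t fnegBits(uint32_t Bits) { return Bits ^ 0x80000000u; }
```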
Commit 6c84709
[mlir][python] expose LLVMStructType API (#81672)
Expose the API for constructing and inspecting StructTypes from the LLVM dialect. Separate constructor methods are used instead of overloads for better readability, similarly to IntegerType.
Commit bd8fcf7
[C23] Do not diagnose binary literals as an extension (#81658)
We previously would diagnose them as a GNU extension in C mode, but they are now a feature of C23. The -Wgnu-binary-literal warning group no longer controls any diagnostics as this is no longer a GNU extension. The warning group is retained as a noop to help avoid "unknown warning" diagnostics. This also adds the companion compatibility warning which existed for C++ but not for C. Fixes llvm/llvm-project#72017
Commit 8e24bc0
[MC/DC] Refactor: Introduce `ConditionIDs` as `std::array<2>` (#81221)
Its 0th element corresponds to `FalseID` and 1st to `TrueID`. CoverageMappingGen.cpp: `DecisionIDPair` is replaced with `ConditionIDs`
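One convenience of this layout is that the outcome of a condition can index the array directly. The sketch below assumes an `int` element type and a hypothetical `nextCondition` helper; the real definitions in the MC/DC code may differ.

```cpp
#include <array>
#include <cassert>

using ConditionID = int; // assumed element type for this sketch
using ConditionIDs = std::array<ConditionID, 2>;

// Index 0 holds the ID followed on a false outcome, index 1 the ID followed
// on a true outcome, so a bool selects the right slot.
ConditionID nextCondition(const ConditionIDs &IDs, bool Outcome) {
  return IDs[Outcome];
}
```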
Commit 1a1fcac
[AMDGPU] Replace '.' with '-' in generic target names (#81718)
The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.
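The kind of misparse described here is easy to reproduce with a naive "everything after the last dot" rule. `naiveExtension` and the file names in the usage below are hypothetical and only meant to illustrate why a dot inside a target name is troublesome.

```cpp
#include <cassert>
#include <string>

// Returns the text from the last '.' onward, or an empty string if there is
// no dot. Many tools use a rule like this to find a file extension.
std::string naiveExtension(const std::string &Name) {
  auto Dot = Name.rfind('.');
  return Dot == std::string::npos ? std::string() : Name.substr(Dot);
}
```

A temporary named after a target containing `10.3-generic` would be reported as having the extension `.3-generic`, whereas the dash-separated spelling yields no extension at all.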
Commit 43c7eb5
[AArch64] Initial Ampere1B scheduling model (#81341)
The Ampere1B core is enabled with a new scheduling/pipeline model, as it provides significant updates over the Ampere1 core; it reduces latencies on many instructions, has some micro-ops reassigned between the XY and X units, and provides modelling for the instructions added since Ampere1 and Ampere1A. As this is the first model implementing the CSSC instructions, we update the UnsupportedFeatures on all other models (that have CompleteModel set). Testcases are added under llvm-mca: these showed the FullFP16 feature missing, so we are adding it in as part of this commit. This *adds tests and additional fixes* compared to the reverted #81338.
Commit dd1897c
Commit 2d7fdfa
[RemoveDIs] Replicate dbg intrinsic movement pattern in SelectOptimize (#81737)
Fix crash mentioned in comments on d759618. The assertion being hit was complaining that we had dangling DPValues; the DPValues attached to the terminator of StartBlock become dangling after the terminator is erased, and they're never "flushed" back onto the new terminator once it's added. Doing that makes the crash go away, but doesn't replicate existing dbg.* behaviour. See the comment in the patch. This change both fixes the crash (because there are now no DPValues left on the terminator to dangle) and replicates existing behaviour (moves those DPValues down to the new block).
Commit a50bd0d
[clang][Interp][NFC] Add missing special cases for implicit functions
We have this special case in getSource() and getRange(), but we were missing it in getExpr() and getLocation().
Commit b37bd78
Commit 232cf94
[AMDGPU] Refactor export instruction definitions. NFC. (#81738)
Using multiclasses for the Real instruction definitions has a couple of benefits:
- It avoids repeating information that was already specified when defining the corresponding pseudo, like the row and done bits.
- It allows commoning up the Real definitions for architectures which are mostly the same, like GFX11 and GFX12.
Commit 9c06b07
[NFC] Add API documentation and annotations (#78635)
This change adds SM 6.2 availability annotation to 16-bit APIs (16-bit types require SM 6.2), and adds Doxygen API documentation.
Commit 457c179
Commit 995c906
[mlir][Transforms][NFC] Improve listener layering in dialect conversion (#81236)
Context: Conversion patterns provide a `ConversionPatternRewriter` to modify the IR. `ConversionPatternRewriter` provides the public API. Most function calls are forwarded/handled by `ConversionPatternRewriterImpl`. The dialect conversion uses the listener infrastructure to get notified about op/block insertions. In the current design, `ConversionPatternRewriter` inherits from both `PatternRewriter` and `Listener`. The conversion rewriter registers itself as a listener. This is problematic because listener functions such as `notifyOperationInserted` are now part of the public API and can be called from conversion patterns; that would bring the dialect conversion into an inconsistent state. With this commit, `ConversionPatternRewriter` no longer inherits from `Listener`. Instead `ConversionPatternRewriterImpl` inherits from `Listener`. This removes the problematic public API and also simplifies the code a bit: block/op insertion notifications were previously forwarded to the `ConversionPatternRewriterImpl`. This is no longer needed.
Commit ea2d938
[LoopVectorize] Fix divide-by-zero bug (#80836) (#81721)
When attempting to use the estimated trip count to refine the costs of the runtime memory checks we should also check for sane trip counts to prevent divide-by-zero faults on some platforms. Fixes #80836
Commit 1c10821
[mlir][Transforms][NFC] Modularize block actions (#81237)
Throughout the rewrite process, the dialect conversion maintains a list of "block actions" that can be rolled back upon failure. This commit encapsulates the existing block actions into separate classes, making it easier to add additional actions in the future. This commit also renames "block actions" to "IR rewrites". In a subsequent commit, an "operation rewrite" class that allows rolling back movements of single operations is added. This is to support `moveOpBefore` in the dialect conversion. Rewrites have two methods: `commit()` commits an action. It can no longer be rolled back afterwards. `rollback()` undoes a rewrite. It can no longer be committed afterwards.
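The commit/rollback contract described above can be sketched with a tiny undo log. Everything here (`SketchRewrite`, `RenameRewrite`, `RewriteLog`) is a hypothetical illustration operating on a string rather than on IR, and is not the MLIR class hierarchy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A reversible change: either committed (made permanent) or rolled back.
class SketchRewrite {
public:
  virtual ~SketchRewrite() = default;
  virtual void commit() {}
  virtual void rollback() = 0;
};

// Applies a rename immediately and remembers the old name for rollback.
class RenameRewrite : public SketchRewrite {
public:
  RenameRewrite(std::string &Name, std::string NewName)
      : Target(Name), OldName(Name) {
    Target = std::move(NewName);
  }
  void rollback() override { Target = OldName; }

private:
  std::string &Target;
  std::string OldName;
};

// Keeps rewrites in application order so they can be undone in reverse.
class RewriteLog {
public:
  void record(std::unique_ptr<SketchRewrite> R) {
    Rewrites.push_back(std::move(R));
  }
  void commitAll() {
    for (auto &R : Rewrites)
      R->commit();
    Rewrites.clear(); // committed rewrites can no longer be rolled back
  }
  void rollbackAll() {
    for (auto It = Rewrites.rbegin(); It != Rewrites.rend(); ++It)
      (*It)->rollback();
    Rewrites.clear(); // rolled-back rewrites can no longer be committed
  }

private:
  std::vector<std::unique_ptr<SketchRewrite>> Rewrites;
};
```

Giving each kind of change its own class means a new reversible action only needs to supply its own `rollback()`, which is the extensibility benefit the commit message points to.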
Commit 8faefe3
Reapply "[DebugInfo][RemoveDIs] Turn on non-instrinsic debug-info by default"
This reapplies commit bdde5f9 by undoing the revert fd3a0c1. The previous reapplication d759618 was reverted due to a crash (reproducer in comments for d759618) which was fixed in #81737. As noted in the original commit, this commit may break downstream tests. If this commit is breaking your downstream tests, please see comment 12 in [0], which documents the kind of variation in tests we'd expect to see from this change and what to do about it. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Commit a93a4ec
Commit 2347a47
[mlir][Transforms] Support `moveOpBefore`/`After` in dialect conversion (#81240)
Add a new rewrite class for "operation movements". This rewrite class can roll back `moveOpBefore` and `moveOpAfter`. `RewriterBase::moveOpBefore` and `RewriterBase::moveOpAfter` are no longer virtual. (The dialect conversion can gather all required information for rollbacks from listener notifications.)
Commit 8f4cd2c
[libc][__support][bit] remove compiler has builtin checks (#81679)
We only support building llvmlibc with modern compilers. https://libc.llvm.org/compiler_support.html#minimum-supported-versions All versions of these compilers support these builtins; GCC does not support the short variants.
Commit 4efbf52
[libc][__support][bit] simplify FLZ (#81678)
`countl_zero(~x)` *is* `countl_one(x)`
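The identity can be checked with a portable sketch that counts bits by hand. `countLeadingZeros`, `countLeadingOnes`, and `identityHolds` are hypothetical helpers written for this example; C++20 provides `std::countl_zero` and `std::countl_one` in `<bit>` for the same purpose.

```cpp
#include <cassert>
#include <cstdint>

// Number of consecutive 0 bits starting from the most significant bit.
int countLeadingZeros(uint32_t X) {
  int N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Number of consecutive 1 bits starting from the most significant bit.
int countLeadingOnes(uint32_t X) {
  int N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) != 0; Mask >>= 1)
    ++N;
  return N;
}

// Inverting the bits turns leading ones into leading zeros and vice versa.
bool identityHolds(uint32_t X) {
  return countLeadingZeros(~X) == countLeadingOnes(X);
}
```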
Commit 0f6f5bf
Commit 7c4c274
Commit 6059671
[lld/ELF] Avoid unnecessary TPOFF relocations in GOT for -pie (#81739)
With the new SystemZ port we noticed that -pie executables generated from files containing R_390_TLS_IEENT relocations will have unnecessary relocations in their GOT:

    9e8d8: R_390_TLS_TPOFF *ABS*+0x18

This is caused by the config->isPic condition in addTpOffsetGotEntry:

    static void addTpOffsetGotEntry(Symbol &sym) {
      in.got->addEntry(sym);
      uint64_t off = sym.getGotOffset();
      if (!sym.isPreemptible && !config->isPic) {
        in.got->addConstant({R_TPREL, target->symbolicRel, off, 0, &sym});
        return;
      }

It is correct that we need to retain a TPOFF relocation if the target symbol is preemptible or if we're building a shared library. But when building a -pie executable, those values are fixed at link time and there's no need for any remaining dynamic relocation. Note that the equivalent MIPS-specific code in MipsGotSection::build checks for config->shared instead of config->isPic; we should use the same check here. (Note also that on many other platforms we're not even using addTpOffsetGotEntry in this case as an IE->LE relaxation is applied before; we don't have this type of relaxation on SystemZ.)
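The difference between the two checks comes down to a small truth table, sketched below with hypothetical stand-ins (`SketchSymbol`, `SketchConfig`, and the two predicate functions) rather than lld's real data structures. It assumes, as is usual, that a -pie link is position independent but not a shared library.

```cpp
#include <cassert>

// Hypothetical stand-ins for the relevant symbol and configuration flags.
struct SketchSymbol {
  bool IsPreemptible;
};
struct SketchConfig {
  bool IsPic;  // true for both -shared and -pie outputs
  bool Shared; // true only for -shared outputs
};

// Check before the change: any PIC output keeps a dynamic TPOFF relocation.
bool needsDynamicTpOffOld(const SketchSymbol &Sym, const SketchConfig &Cfg) {
  return Sym.IsPreemptible || Cfg.IsPic;
}

// Check suggested by the commit message: only shared libraries (or
// preemptible symbols) keep the dynamic relocation.
bool needsDynamicTpOffNew(const SketchSymbol &Sym, const SketchConfig &Cfg) {
  return Sym.IsPreemptible || Cfg.Shared;
}
```

Only the -pie case with a non-preemptible symbol changes: the old predicate keeps a dynamic relocation there, while the new one allows a link-time constant.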
Commit 6f90773
Commit 411554a
[polly][ScheduleOptimizer] Use IslMaxOperationsGuard helper instead of explicit restoration (#79303)
To fix the long compile time issue of the Schedule optimizer, patch #77280 sets an upper cap on max ISL operations. When bailing out because the ISL quota is hit, error handling behavior was restored manually. This commit replaces the restoration code with the IslMaxOperationsGuard helper and also removes a redundant early return.
Commit 0f33c54
Revert "[libc][NFC] Use user defined literals to build 128 and 256 bit constants." (#81771)
Reverts llvm/llvm-project#81746
Commit 78d401b
[Clang][CodeGen] Loose the cast check when emitting builtins (#81669)
This patch loosens the cast check (`canLosslesslyBitCastTo`) and leaves it to the one inside `CreateBitCast`. It seems too conservative for the use case here.
Commit 630f82e
[lldb] Fix the flakey Concurrent tests on macOS (#81710)
The concurrent tests all do a pthread_join at the end, and concurrent_base.py stops after that pthread_join and sanity checks that only 1 thread is running. On macOS, after pthread_join() has completed, there can be an extra thread still running which is completing the details of that task asynchronously; this causes testsuite failures. When this happens, we see the second thread is in
```
frame #0: 0x0000000180ce7700 libsystem_kernel.dylib`__ulock_wake + 8
frame #1: 0x0000000180d25ad4 libsystem_pthread.dylib`_pthread_joiner_wake + 52
frame #2: 0x0000000180d23c18 libsystem_pthread.dylib`_pthread_terminate + 384
frame #3: 0x0000000180d23a98 libsystem_pthread.dylib`_pthread_terminate_invoke + 92
frame #4: 0x0000000180d26740 libsystem_pthread.dylib`_pthread_exit + 112
frame #5: 0x0000000180d26040 libsystem_pthread.dylib`_pthread_start + 148
```
There are none of the functions from the test file present on this thread.

In this patch, instead of counting the number of threads, I iterate over the threads looking for functions from our test file (by name) and only count threads that have at least one of them.

It's a lower frequency failure than the darwin kernel bug causing an extra step instruction mach exception when hardware breakpoint/watchpoints are used, but once I fixed that, this came up as the next most common failure for these tests.

rdar://110555062
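The counting strategy described above can be sketched as follows. The actual change is in the Python test harness; this C++ model uses a hypothetical `ThreadSketch` type that holds only the function names of a thread's frames, and the function names in the usage example are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a debugger thread: the function names of its frames.
struct ThreadSketch {
  std::vector<std::string> frameFunctions;
};

// Count only threads that have at least one frame in a function from the test file.
std::size_t countTestThreads(const std::vector<ThreadSketch>& threads,
                             const std::vector<std::string>& testFunctions) {
  auto isTestFunction = [&](const std::string& fn) {
    return std::find(testFunctions.begin(), testFunctions.end(), fn) !=
           testFunctions.end();
  };
  auto n = std::count_if(
      threads.begin(), threads.end(), [&](const ThreadSketch& t) {
        return std::any_of(t.frameFunctions.begin(), t.frameFunctions.end(),
                           isTestFunction);
      });
  return static_cast<std::size_t>(n);
}

// Usage example: one test thread plus one thread still tearing down in libsystem.
inline void exampleCountTestThreads() {
  const std::vector<std::string> testFns = {"main", "thread_func"};
  std::vector<ThreadSketch> threads = {
      ThreadSketch{{"main", "thread_func"}},
      ThreadSketch{{"__ulock_wake", "_pthread_joiner_wake", "_pthread_terminate"}},
  };
  assert(countTestThreads(threads, testFns) == 1);
}
```

A thread that is only running system teardown code contributes nothing to the count, so the sanity check no longer depends on how quickly the OS retires it.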
Commit dbc40b3
Apply clang-tidy fixes for readability-simplify-boolean-expr in TransformOps.cpp (NFC)
Commit 1ddc541
Commit 8383bf2
Commit 89dc313
Apply clang-tidy fixes for readability-identifier-naming in SparseTensorRuntime.cpp (NFC)
Commit bf4480d
Commit d99d258
[RISCV] Split long build_vector sequences to reduce critical path (#81312)
If we have a long chain of vslide1down instructions to build e.g. a <16 x i8> from scalar, we end up with a critical path going through the entire chain. We can instead build two halves, and then combine them with a vselect. This costs one additional temporary register, but reduces the critical path by roughly half.

To avoid needing to change VL, we fill each half with undefs for the elements which will come from the other half. The vselect will at worst become a vmerge, but is often folded back into the final instruction of the sequence building the lower half.

A couple of notes on the heuristic here:
* This is restricted to LMUL1 to avoid quadratic costing reasoning.
* This only splits once. In future work, we can explore recursive splitting here, but I'm a bit worried about register pressure and thus decided to be conservative. It also happens to be "enough" at the default zvl of 128.
* "8" is picked somewhat arbitrarily as being "long". In practice, our build_vector codegen for 2 defined elements in a VL=4 vector appears to need some work. 4 defined elements in a VL=8 vector seems to generally produce reasonable results.
* Halves may not be an optimal split point. I went down the rabbit hole of trying to find the optimal one, and decided it wasn't worth the effort to start with.

---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
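The data flow of the split can be modelled in plain C++. This is a sketch of the idea only, not the RISC-V lowering: `Lane`, `buildPartial`, `selectLanes` and `buildVectorSplit` are hypothetical names, and `std::nullopt` is used to model an undef element.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Lane = std::optional<int>;  // std::nullopt models an undef element

// Full-width vector with real values only in [begin, end); other lanes stay undef.
std::vector<Lane> buildPartial(const std::vector<int>& elems,
                               std::size_t begin, std::size_t end) {
  std::vector<Lane> v(elems.size());
  for (std::size_t i = begin; i < end; ++i) v[i] = elems[i];
  return v;
}

// Lane-wise select: lanes below `split` come from `lo`, the rest from `hi`.
std::vector<int> selectLanes(const std::vector<Lane>& lo,
                             const std::vector<Lane>& hi, std::size_t split) {
  assert(lo.size() == hi.size());
  std::vector<int> out(lo.size());
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (i < split ? lo[i] : hi[i]).value();  // never reads an undef lane
  return out;
}

// Build the two halves independently, then combine them with one select.
std::vector<int> buildVectorSplit(const std::vector<int>& elems) {
  const std::size_t split = elems.size() / 2;
  auto lo = buildPartial(elems, 0, split);
  auto hi = buildPartial(elems, split, elems.size());
  return selectLanes(lo, hi, split);
}
```

Because `lo` and `hi` do not depend on each other, the two partial builds can proceed in parallel, and only the final select joins them.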
Commit 275eeda
[lldb][NFCI] Remove CommandObjectProcessHandle::VerifyCommandOptionValue (#79901)
I was refactoring something else but ran into this function. It was somewhat confusing to read through and understand, but it boils down to two steps:
- First we try `OptionArgParser::ToBoolean`. If that works, then we're good to go.
- Second, we try `llvm::to_integer` to see if it's an integer. If it parses to 0 or 1, we're good.
- Failing either of the steps above means we cannot parse it into a bool.

Instead of having an integer out param and a bool return value, the interface is better served with an optional<bool> -- Either it parses into true or false, or you get back nothing (nullopt).
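The optional-returning interface described above might look roughly like this. It is a sketch under stated assumptions: `parseBoolWord` is a hypothetical stand-in for `OptionArgParser::ToBoolean` that accepts only "true" and "false", and `std::from_chars` is swapped in for `llvm::to_integer` so the example needs nothing beyond the standard library.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for the first step; the real parser may accept more spellings.
std::optional<bool> parseBoolWord(std::string_view s) {
  if (s == "true") return true;
  if (s == "false") return false;
  return std::nullopt;
}

// Either the text parses to true or false, or the caller gets std::nullopt.
std::optional<bool> parseBoolOption(std::string_view s) {
  if (auto word = parseBoolWord(s)) return word;
  int value = 0;
  const char* end = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(s.data(), end, value);
  // Accept only a fully consumed integer that is exactly 0 or 1.
  if (ec == std::errc{} && ptr == end && (value == 0 || value == 1))
    return value == 1;
  return std::nullopt;
}

// Usage example covering both steps and the failure case.
inline void exampleParseBoolOption() {
  assert(parseBoolOption("true").value());
  assert(!parseBoolOption("0").value());
  assert(!parseBoolOption("2").has_value());
}
```

Returning `std::optional<bool>` removes the need for a separate out parameter: a caller checks `has_value()` once and then reads the boolean.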
Commit 307cd88
Commit 16e7d68
[clang][CodeGen] Shift relink option implementation away from module cloning (#81693)
We recently implemented a new option allowing relinking of bitcode modules via the "-mllvm -relink-builtin-bitcode-postop" option. This implementation relied on llvm::CloneModule() in order to pass copies to modules and preserve the original modules for later relinking. However, cloning modules has been found to be prohibitively expensive, significantly increasing compilation time for large bitcode libraries.

In this patch, we shift the relink option implementation to instead link the original modules initially, and reload modules from the file system if relinking is requested. This approach results in significantly reduced overhead.

We accomplish this by creating a new ReloadModules() routine that can be called from a BackendConsumer class, to mimic the behavior of ASTConsumer's loadLinkModules(), but without access to the CompilerInstance.

Because loading the bitcodes from the filesystem requires access to the FileManager class, we also forward a reference to the CompilerInstance class to the BackendConsumer. This mirrors what is already done for several CompilerInstance members, such as TargetOptions and CodeGenOptions.

Finally, we needed to add a const specifier to the FileManager::getBufferForFile() routine to allow it to be called using the const reference returned from CompilerInstance::getFileManager()
Commit 6d4ffbd
Merge from 'sycl' to 'sycl-web'
iclsrc committed Feb 14, 2024
Commit 0728f6d
Commits on Feb 15, 2024
[SYCL][Graph] Add node and graph queries for mixed usage (intel#12366)
This PR adds queries to both nodes and modifiable graphs which enable better mixed usage of both the explicit and record & replay APIs in a single program.

It also reworks how subgraphs are handled: previously nodes were merged into the modifiable graph, but this would pose a problem for users querying the graph since they would not see a single subgraph node, and this merging behaviour was an implementation detail. This has been changed so that now subgraph nodes are only merged in the executable graph, and are stored as a single node of type `subgraph` in the modifiable graph. As a consequence of this change all nodes are now also copied when making the executable graph, where previously they were not.

- Reworked how subgraphs are handled
- Add graph and node queries to the SYCL-Graph spec
- Implement graph and node queries
- New node_type enum
- Explicit nodes now also have associated events (fixes mixed usage issue)
- New tests for queries
- Update ABI symbols
Commit 5337a8a
[SYCL][Fusion] Set `IsNewDbgInfoFormat` when creating new functions (intel#12712)
Set `IsNewDbgInfoFormat` to the default value for functions created in the SYCL Kernel Fusion pipeline. This prepares `sycl-fusion` for migration to the new debug info format.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Commit f910a4c
[UR] bump tag to f11823e1 (intel#12721)
oneapi-src/unified-runtime#1343 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit 62a0010
[SYCL][Matrix] Add joint matrix query for CUDA and HIP backends (intel#12075)
This PR adds joint matrix query for CUDA and HIP backends as described in [sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc#query-interface)
---------
Co-authored-by: Konrad Kusiak <konradk@login01.chn>
Commit 00eebe1
[CUDA][HIP][TEST-E2E] Include the necessary environment paths during the test-e2e build for CUDA and HIP backends. (intel#12606)
Include the necessary environment paths during the test-e2e build for `CUDA` and `HIP` backends. The absence of the added path leads to the inability to locate libdevice for specific architectures, resulting in a failure. Below is the reported error when the expected `CUDA_PATH` is missing:

    clang++: error: cannot find libdevice for sm_50; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
Commit 6b8792c
[SYCL] fix for syclcompat test on Windows (intel#12696)
The -shared flag is a clang/Linux option. On Windows we need to be cognizant of possibly using an MSVC-compatible driver (e.g. icx), which needs `/clang` passthrough when using non-MSVC options.
Commit 3f445cf
[SYCL] Revert friend changes to assignment and incr/decr for swizzles (intel#12682)
This commit does a partial revert of intel#12396. This is to avoid an issue where the new friend operators wouldn't accept the arguments as l-value references.
---------
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Commit 6194f3c
[ESIMD] Fix atomic_update() implementation for N=16 and N=32 on Gen12 (intel#12722)
atomic_update() for USM and ACC with N=16,32 was lowered to SVM/DWORD atomic intrinsics even though the HW instructions on Gen12 support only N up to 8 for USM and up to 16 for ACC. The GPU has a legalization pass for N that splits longer vectors into smaller ones that are available in HW. That GPU optimization/legalization works incorrectly for USM, as it splits longer vectors assuming the instruction is available for N=16 in the USM case, which is not correct. The patch here implements splitting of the N=16 and N=32 cases for atomic_update(usm, ...) into N=8 vectors until the GPU fixes the legalization for USM atomic_update.
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
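The splitting strategy can be sketched in ordinary C++. This is a model of the chunking only, not ESIMD code: `kHwWidth`, `updateChunk` and `updateSplit` are hypothetical names, and the per-chunk operation is a plain add standing in for one hardware-width atomic update.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Per the commit message, USM atomics on Gen12 handle at most N=8.
constexpr std::size_t kHwWidth = 8;

// Hypothetical stand-in for one hardware-width update; a plain add, not a real atomic.
void updateChunk(int* mem, const int* src, std::size_t n) {
  assert(n <= kHwWidth);
  for (std::size_t i = 0; i < n; ++i) mem[i] += src[i];
}

// Split an N-wide update into N / kHwWidth chunks; returns the number of chunks issued.
template <std::size_t N>
std::size_t updateSplit(std::array<int, N>& mem, const std::array<int, N>& src) {
  static_assert(N % kHwWidth == 0, "N must be a multiple of the hardware width");
  std::size_t chunks = 0;
  for (std::size_t off = 0; off < N; off += kHwWidth, ++chunks)
    updateChunk(mem.data() + off, src.data() + off, kHwWidth);
  return chunks;
}
```

With this shape, N=16 issues two chunk updates and N=32 issues four, and each chunk stays within the width the hardware supports.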
Commit 44a74d0
Merge from 'main' to 'sycl-web' (107 commits)
1> Add code in CodeGenAction.cpp:
   - Basic change: add new field "const FileManager &FileMgr"
   - Add new function ReloadModules
   - Code change in function LinkInModules
2> Revert "[DebugInfo][RemoveDIs] Turn on non-instrinsic debug-info by default."

CONFLICT (modify/delete): clang/lib/CodeGen/BackendConsumer.h deleted in HEAD and modified in 6d4ffbd. Version 6d4ffbd of clang/lib/CodeGen/BackendConsumer.h left in tree.
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenAction.cpp
CONFLICT (modify/delete): clang/lib/CodeGen/LinkInModulesPass.cpp deleted in HEAD and modified in 6d4ffbd. Version 6d4ffbd of clang/lib/CodeGen/LinkInModulesPass.cpp left in tree.
Commit 4565039
Implement SPV_INTEL_task_sequence extension (intel#2340)
Spec: KhronosGroup/SPIRV-Registry#192 Original commit: KhronosGroup/SPIRV-LLVM-Translator@fc9896b1fff0057
Commit a617aad
Remove internal values for SPV_INTEL_cache_controls (intel#2346)
The Headers for this extension were published so we should use them instead: KhronosGroup/SPIRV-Headers@a8af2ce Original commit: KhronosGroup/SPIRV-LLVM-Translator@95d70a9ab4077ed
Commit 6695b8a
Fix SPIR-V consumption of DebugInfoNone for debug types (intel#2341)
OpenCL and NonSemantic DebugInfo specifications are flexible in terms of allowing any debug information to be replaced with DebugInfoNone, so various SPIR-V producers follow that and generate it for base types of several debug instructions, leaving SPIR-V consumers to handle this. By default the translator replaces missing debug info with tag: null, which is in most cases correct. Yet there are situations where it's not allowed by both LLVM and DWARF; for example, the DWARF spec states that the DW_AT_type attribute is mandatory for DW_TAG_array_type. For such cases a new transNonNullDebugType wrapper function was added to the translator, generating "DIBasicType(tag: DW_TAG_unspecified_type, name: "SPIRV unknown type")" where DebugInfoNone was used as the type. This function doesn't replace all calls to transDebugInst<DIType>, as there are cases where we can generate a null type; for example, DWARF doesn't require it for DW_TAG_typedef, hence I'm not changing the translation flow in this case. In addition, while DWARF requires the type attribute for DW_TAG_pointer_type, LLVM does not, hence I'm not changing the translation flow in this case either.
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Original commit: KhronosGroup/SPIRV-LLVM-Translator@ec023805a0ce26f
Commit 64cefa5
Fix DebugTypeVector test (intel#2347)
It should have tested DebugInfoNone base type Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@e0aef72fee42e0a
Commit ae1d570
Map to unordered_map for SPIRVIdToEntryMap (intel#2348)
Small fix, but it yields around a 30% speedup when translating SPIR-V to IR. Original commit: KhronosGroup/SPIRV-LLVM-Translator@513b9578d310282
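The kind of container swap named in the commit title can be sketched like this. Only the alias name pattern comes from the title; `SPIRVIdSketch`, `EntrySketch`, `IdToEntryMapSketch` and `findEntry` are hypothetical and do not reflect the translator's actual types. The general trade-off is that `std::unordered_map` gives average constant-time lookup by id at the cost of losing the ordered iteration that `std::map` provides.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using SPIRVIdSketch = std::uint32_t;

// Hypothetical entry payload; the real entry type is not shown in the commit message.
struct EntrySketch {
  std::string name;
};

// Hash-based id-to-entry map: average O(1) lookup, no ordering guarantee.
using IdToEntryMapSketch = std::unordered_map<SPIRVIdSketch, EntrySketch>;

// Returns the entry for `id`, or nullptr if the id is unknown.
const EntrySketch* findEntry(const IdToEntryMapSketch& entries, SPIRVIdSketch id) {
  auto it = entries.find(id);
  return it == entries.end() ? nullptr : &it->second;
}

// Usage example: one known id and one unknown id.
inline void exampleFindEntry() {
  IdToEntryMapSketch entries;
  entries.emplace(7u, EntrySketch{"OpTypeInt"});
  assert(findEntry(entries, 7u) != nullptr);
  assert(findEntry(entries, 8u) == nullptr);
}
```

Such a swap is only safe where no caller depends on iterating ids in sorted order.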
Commit f7b658f
Fix BufferLocationINTEL decoration translation (intel#2335)
There was an assumption that a ptr.annotation encoding buffer_location should be used by load or store instructions, but there is no such restriction in the specification. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@7a37ea920f730e0
Commit d18a70f
Prepare for non-instrinsic debug info (intel#2362)
For now just convert BB with convertFromNewDbgValues, will figure out something smarter a bit later. I've updated several tests with dbg.declare intrinsic adding --experimental-debuginfo-iterators=1 to check if it works. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@0e87aefecf7c500
Commit 272ba9e
Fix allowed types for OpConstantNull (intel#2361)
The SPIR-V Specification allows `OpConstantNull` types to be scalar or vector booleans, integers, or floats. Update an assert for this and add a SPIR-V -> LLVM IR test. Original commit: KhronosGroup/SPIRV-LLVM-Translator@9ec969c1c379bde
Commit 55a143b
Map FPFastMathModeINTEL on SPV_INTEL_fp_fast_math_mode (intel#2360)
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com> Original commit: KhronosGroup/SPIRV-LLVM-Translator@262395da9234fe4
Commit 339c1c6
[SYCL] Fix malloc shared by throwing when usm_shared_allocations not supported (intel#12700)
Final PR in the series of intel#12636. Refer to it for a description. After a discussion with @AlexeySachkov we've decided it's best not to rewrite USM and syclcompat tests with buffers/accessors. For USM, the reason is obvious, and for syclcompat you can reach out to Alexey. Therefore, these tests are handled using if statements or by requiring the aspect to be supported. Once this PR is merged, the behavior of malloc_shared will be to throw if usm_shared_allocations is not supported, which is conformant with the spec.
Commit 1bec982
Commit 90bcc32
[SYCL] Disable warnings when compiling plugins from UR sources (intel#12730)
This is a generalization of the existing workarounds: https://github.com/intel/llvm/blob/sycl/sycl/plugins/unified_runtime/CMakeLists.txt#L40-L54 etc.
Commit 1c223e1
Commits on Feb 16, 2024
Bump cryptography from 41.0.6 to 42.0.0 in llvm/utils/git/requirements.txt (intel#12714)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0 to resolve an identified security vulnerability in a third-party dependency. Refer to [cryptography's changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst).
Commit 746ed9f
LLVM and SPIRV-LLVM-Translator pulldown (WW07 2024)
LLVM: llvm/llvm-project@16e7d68 SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@262395da9234fe4
Commit 0c74e16
[SYCL] add overlooked default context test . (intel#12728)
Despite having a unit test for the default context, we realized there is not one to affirm the new default configuration.
Commit 79d775e
Commit aa015f3
[SYCL][Graph] Clean-up E2E Tests (intel#12685)
Some clean-up for SYCL-Graph E2E tests:
* Remove redundant `Event` variables that are initialized over loop iterations but never used.
* Remove all instances of the no immediate command-list property, and use environment variable instead to test both paths.
* Always use FileCheck leak checking rather than `CHECK-NOT: Leak`.
* Remove unnecessary threading code from `Inputs/basic_usm.cpp`
Commit d747667
[ESIMD] Enable -fsycl-esimd-force-stateless-mem by default (intel#9452)
Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
Commit f316273
[SYCL][ESIMD] Implement scatter for local accessors accepting compile time properties (intel#12675)
Commit 27c9546
Commit 76bbf93
Commit 1a98c4c
Commits on Feb 19, 2024
[SYCL][Graph] Avoid unnecessary inter-partition dependencies (intel#12680)
Improves management of inter-partition dependencies, so that only required dependencies are added. As removing these dependencies can result in multiple execution paths, we have added a map to track all events returned from submitted partitions. All these events are linked to the main event returned to the user. Adds tests.
Commit 54a67eb
[SYCL][Bindless] Fix Grad flag (intel#12729)
Grad flag was set to 0x3 (meaning Lod + Bias) instead of 0x4. See https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Image_Operands Signed-off-by: Victor Lomuller <victor@codeplay.com>
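The bit arithmetic behind this fix can be written out directly. The values for Bias, Lod and Grad follow the Image Operands table linked above; the enum and helper names (`ImageOperandsSketch`, `hasOperand`) are hypothetical and are not the identifiers used in the SYCL sources.

```cpp
#include <cassert>
#include <cstdint>

// Image operand bits, as listed in the SPIR-V Image Operands table.
enum ImageOperandsSketch : std::uint32_t {
  Bias = 0x1,
  Lod = 0x2,
  Grad = 0x4,
};

// 0x3 is not a single flag: it is the Bias and Lod bits combined.
static_assert((Bias | Lod) == 0x3, "0x3 means Lod + Bias");

// True if `mask` has the bit for `op` set.
constexpr bool hasOperand(std::uint32_t mask, ImageOperandsSketch op) {
  return (mask & op) != 0;
}

// Usage example: the old value versus the corrected one.
inline void exampleGradFlag() {
  assert(hasOperand(0x3, Bias) && hasOperand(0x3, Lod));
  assert(!hasOperand(0x3, Grad));  // the old value never selected Grad
  assert(hasOperand(0x4, Grad));   // the corrected value does
}
```

Because the operands form a bit mask, each flag must be a distinct power of two, which is why 0x3 cannot denote a third flag.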
Commit f614781
UR fix for MaxRegsPerBlock check in setKernelParams (intel#12549)
Bring the fix for MaxRegsPerBlock check from oneapi-src/unified-runtime#1299 to `intel/llvm`. No changes needed other than updating the UR repo hash. --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Commit 5310b20
Fix a leak in pi_unified_runtime.cpp. (intel#12589)
`LoaderConfig` is created and stored in a local pointer and never released when done using, causing it to be leaked. This patch releases the `LoaderConfig` when finished using it.
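The commit message describes releasing the config explicitly once it is no longer needed. As an illustration of the same ownership rule, the sketch below swaps in RAII via `std::unique_ptr` with a custom deleter, which is a different technique from the patch itself. All names here (`LoaderConfigSketch`, `createConfig`, `releaseConfig`, `g_liveConfigs`) are hypothetical stand-ins, not Unified Runtime APIs.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a handle with paired create/release calls.
struct LoaderConfigSketch {};

int g_liveConfigs = 0;  // tracks outstanding handles so a leak is observable

LoaderConfigSketch* createConfig() {
  ++g_liveConfigs;
  return new LoaderConfigSketch{};
}

void releaseConfig(LoaderConfigSketch* config) {
  --g_liveConfigs;
  delete config;
}

// Deleter that forwards to the release call.
struct ConfigReleaser {
  void operator()(LoaderConfigSketch* config) const { releaseConfig(config); }
};
using ConfigPtr = std::unique_ptr<LoaderConfigSketch, ConfigReleaser>;

// The handle is released on every exit path when `config` goes out of scope.
void initializeWithConfig() {
  ConfigPtr config(createConfig());
  assert(config != nullptr);
  // ... pass config.get() to whatever needs it ...
}
```

Storing the raw pointer in a local without a matching release, as described in the commit message, would leave `g_liveConfigs` at 1 in this model.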
Commit d697024
[NFC][SYCL] Move a helper to its single legacy use (intel#12740)
The old builtins implementation is going to be removed in the next ABI-breaking window, and that helper is only used there.
Commit 8293a5c
Commits on Feb 20, 2024
Commit b36cdd1
Bump cryptography from 42.0.0 to 42.0.2 in /llvm/utils/git (intel#12746)
Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.

Changelog (sourced from [cryptography's changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)):

42.0.2 - 2024-01-30
* Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.2.1.
* Fixed an issue that prevented the use of Python buffer protocol objects in ``sign`` and ``verify`` methods on asymmetric keys.
* Fixed an issue with incorrect keyword-argument naming with ``EllipticCurvePrivateKey`` :meth:`~cryptography.hazmat.primitives.asymmetric.ec.EllipticCurvePrivateKey.exchange`, ``X25519PrivateKey`` :meth:`~cryptography.hazmat.primitives.asymmetric.x25519.X25519PrivateKey.exchange`, ``X448PrivateKey`` :meth:`~cryptography.hazmat.primitives.asymmetric.x448.X448PrivateKey.exchange`, and ``DHPrivateKey`` :meth:`~cryptography.hazmat.primitives.asymmetric.dh.DHPrivateKey.exchange`.

42.0.1 - 2024-01-24
* Fixed an issue with incorrect keyword-argument naming with ``EllipticCurvePrivateKey`` :meth:`~cryptography.hazmat.primitives.asymmetric.ec.EllipticCurvePrivateKey.sign`.
* Resolved compatibility issue with loading certain RSA public keys in :func:`~cryptography.hazmat.primitives.serialization.load_pem_public_key`.

Commits:
* [`2202123`](https://github.com/pyca/cryptography/commit/2202123b50de1b8788f909a3e5afe350c56ad81e) changelog and version bump 42.0.2 (#10268)
* [`f7032bd`](https://github.com/pyca/cryptography/commit/f7032bdd409838f67fc2b93343f897fb5f397d80) bump openssl in CI (#10298) (#10299)
* [`002e886`](https://github.com/pyca/cryptography/commit/002e886f16d8857151c09b11dc86b35f2ac9aec3) Fixes #10294 -- correct accidental change to exchange kwarg (#10295) (#10296)
* [`92fa9f2`](https://github.com/pyca/cryptography/commit/92fa9f2f606caea5d499c825e832be5bac6f0c23) support bytes-like consistently across our asym sign/verify APIs (#10260) (#1...
* [`6478f7e`](https://github.com/pyca/cryptography/commit/6478f7e28be54b51931277235de01b249ceabd96) explicitly support bytes-like for signature/data in RSA sign/verify (#10259) ...
* [`4bb8596`](https://github.com/pyca/cryptography/commit/4bb8596ae02d95bb054dbcf55e8771379dbe0c19) fix the release script (#10233) (#10254)
* [`337437d`](https://github.com/pyca/cryptography/commit/337437dc2e62772bde4ad5544f4b1db9ee7572d9) 42.0.1 bump (#10252)
* [`56255de`](https://github.com/pyca/cryptography/commit/56255de6b2d1a2d2e502b0275231ca81907f33f1) allow SPKI RSA keys to be parsed even if they have an incorrect delimiter (#1...
* [`12f038b`](https://github.com/pyca/cryptography/commit/12f038b38af76e36efe8cef09597010c97647e8f) fixes #10237 -- correct EC sign parameter name (#10239) (#10240)
* See full diff in [compare view](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)

---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
Commit 5fae0aa
[ESIMD][NFC][E2E] Fix 570 compilation warnings in ESIMD E2E tests (intel#12748)
Warnings fixed:
- deprecated scatter_rgba
- deprecated get_cl_code
- deprecated lsc_fence
- deprecated uchar type usage
- deprecated get_access on HOST
- deprecated get_pointer
- usage of isfinite with -ffast-math
- deprecated dpas_argument_type::s1
- deprecated gpu_selector()

Also, the memory alloc/free in historgram*.cpp tests were updated to simplify the potential memory leak avoidance.
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
Commit 436e687
[GHA] Uplift Linux GPU RT version to 24.05.28454.6 (intel#12764)
Scheduled drivers uplift Co-authored-by: GitHub Actions <actions@github.com>
Commit 6863dfc
[SYCL][Graph] Update doc for UR PR moving reset commands to a dedicated cmd-list
Update the design doc. Update the UR tag.
Commit 8e21a1d
Commit ed730fe