forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge from main with loop wrappers + composite support + liboffload #71
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I made a small typo when writing a test for MathExtras.h, sorry!
… tests (llvm#87094) This also adds a few tests that were missing.
…89625) This reverts commit 8b2ba6a. The uild errors (see below) were likely due to the same issue PR llvm#88074 fixed. Addressed by following that PR. https://lab.llvm.org/buildbot/#/builders/165/builds/52789 https://lab.llvm.org/buildbot/#/builders/91/builds/25273
When responding to review comments, `return {}` was accidentally replaced by `std::nullptr` instead of `return std::nullptr`.
…lvm#89263) This implements a RISCV specific version of the SHL_ADD node proposed in llvm#88791. If that lands, the infrastructure from this patch should seamlessly switch over the to generic DAG node. I'm posting this separately because I've run out of useful multiply strength reduction work to do without having a way to represent MUL X, 3/5/9 as a single instruction. The majority of this change is moving two sets of patterns out of tablgen and into the post-legalize combine. The major reason for this is that I have an upcoming change which needs to reuse the expansion logic, but it also helps common up some code between zba and the THeadBa variants. On the test changes, there's a couple major categories: * We chose a different lowering for mul x, 25. The new lowering involves one fewer register and the same critical path, so this seems like a win. * The order of the two multiplies changes in (3,5,9)*(3,5,9) in some cases. I don't believe this matters. * I'm removing the one use restriction on the multiply. This restriction doesn't really make sense to me, and the test changes appear positive.
This PR massively reorganizes the Test dialect's source files. It moves manually-written op hooks into `TestOpDefs.cpp`, moves format custom directive parsers and printers into `TestFormatUtils`, adds missing comment blocks, and moves around where generated source files are included for types, attributes, enums, etc. into their own source file. This will hopefully help navigate the test dialect source code, but also speeds up compile time of the test dialect by putting generated source files into separate compilation units. This also sets up the test dialect to shard its op definitions, done in the next PR.
Fixes -Werror build after 40137ff.
…lvm#88156) A recent patch added an error message for whole optional dummy argument usage as optional arguments (third or later) to MAX and MIN when those names required type conversion, since that conversion only works when the optional arguments are present. This check shouldn't care about character lengths. Make it so.
…lvm#88184) When a symbol is known to be a procedure due to its being referenced as a function or subroutine, improve the error messages that appear if the symbol is also used as an object by attaching the source location of its procedural use. Also, for errors spotted in name resolution due to how a given symbol has been used, don't unconditionally set the symbol's error flag (which is otherwise generally a good idea, to prevent cascades of errors), so that more unrelated errors related to usage will appear.
llvm#88188) …er powers The code that folds exponentiation by an integer power can report a spurious overflow warning because it calculates one last unnecessary square of the base value. 10.**(+/-32) exposes the problem -- the value of 10.**64 is calculated but not needed. Rearrange the implementation to only calculate squares that are necessary. Fixes llvm#88151.
Addresses llvm#85984 Signed-off-by: Troy-Butler <squintik@outlook.com> Co-authored-by: Troy-Butler <squintik@outlook.com>
…ilt (llvm#89164) Currently, `check-gwp_asan` is added no matter its dependencies are built or not, this is wrong and will cause cmake error when scudo is not built. This patch includes the target in the dependencies check.
Constant folding had a CHECK on array subscript rank that should more gracefully handle a bad program with a subscript that is a matrix or higher rank. Fixes llvm#88112.
When a statement function in a nested scope has a name that clashes with a name that exists in the host scope, the compiler can handle it correctly (with a portability warning)... unless the host scope acquired the name via USE association. Fix. Fixes llvm#88678.
This replaces the old macros LIBC_COPT_TEST_USE_FUCHSIA and LIBC_COPT_TEST_USE_PIGWEED with LIBC_COPT_TEST_ZXTEST and LIBC_COPT_TEST_GTEST, respectively. These are really not about whether the code is in the Fuchsia build or in the Pigweed build, but just about what test framework is being used. The gtest framework can be used in many contexts, and the zxtest framework is not always what's used in the Fuchsia build. The test/UnitTest/Test.h wrapper header now provides the macro LIBC_TEST_HAS_MATCHERS() for use in `#if` conditionals on use of gmock-style matchers, replacing `#if` conditionals that test the framework selection macros directly.
…vm#89429) When the characteristics of a procedure depend on a procedure that hasn't yet been defined, the compiler currently emits an unconditional error message. This includes the case of a procedure whose characteristics depend, perhaps indirectly, on itself. However, in the case where the characteristics of a procedure are needed to resolve a generic, we should not emit an error for a hitherto undefined procedure -- either the call will resolve to another specific procedure, in which case the error is spurious, or it won't, and then an error will issue anyway. Fixes llvm#88677.
Rewrite `LLVM_PARALLEL_{}_JOBS` and `LLVM_RAM_PER_{}_JOB` documentation.
The standard defines C_LOC as being PURE (actually SIMPLE now in F'2023); characterize it appropriately. Fixes llvm#88747.
The intrinsic function OUT_OF_RANGE() lacks support in lowering and the runtime. This patch obviates a need for any such support by implementing OUT_OF_RANGE() via rewriting in semantics. This rewriting of OUT_OF_RANGE() calls replaces the existing code that folds OUT_OF_RANGE() calls with constant arguments. Some changes and fixes were necessary outside of OUT_OF_RANGE()'s folding code (now rewriting code), whose testing exposed some other issues worth fixing. - The common::RealDetails<> template class was recoded in terms of a new base class with a constexpr constructor, so that the the characteristics of the various REAL kinds could be queried dynamically as well. This affected some client usage. - There were bugs in the code that folds TRANSFER() when the type of X or MOLD was REAL(10) -- this is a type that occupies 16 bytes per element in execution memory but only 10 bytes (was 12) in the data of std::vector<Scalar<>> in a Constant<>. - Folds of REAL->REAL conversions weren't preserving infinities.
Trying to address the build failure on the `clang-ve-ninja`bot, which appears hard to repro locally. The target isn't needed currently (there are unit tests exercising the new functionality). Removing it for now to green-ify the build bot.
It has been using isZeroValue(), which is for floats, not integers.
…Control. Updates ExecutionSession to use the ExecutorProcessControl object's TaskDispatcher rather than having a separate dispatch function. This gives the TaskDispatcher a global view of all tasks to be executed, and provides a single point to wait on for tasks to complete when shutting down the JIT.
A recent patch had three declared but unused variables in it, triggering a warning in some build bots. Remove them.
…rProcessControl." This reverts commit 6094b3b. Multiple bots are broken.
This patch introduces HWASan memaccess intrinsics that assume a fixed shadow (with the offset provided by --hwasan-mapping-offset=...), with and without short granule support. The behavior of HWASan is not meaningfully changed by this patch; future work ("Optimize outlined memaccess for fixed shadow on Aarch64": llvm#88544) will make HWASan use these intrinsics. We currently only support lowering the LLVM IR intrinsic to AArch64. The test case is adapted from hwasan-check-memaccess.ll.
…e tests to fail on multiple bots. (llvm#89689) Update the check lines added in llvm#87247 after 14e6f63 updated the output causing the tests to fail. This should hopefully unbreak the bots failing due to these two tests failing.
…ties (llvm#89119) Made the createReadOrMaskedRead and isValidMaskedInputVector utility functions - to be accessible outside of the CU. Needed by the IREE new TopK implementation.
to reflect that there are three variants.
…m#89211) This patch updates verifiers for `omp.ordered`, `omp.ordered.region`, `omp.cancel` and `omp.cancellation_point`, which check for a parent `omp.wsloop`. After transitioning to a loop wrapper-based approach, the expected direct parent will become `omp.loop_nest` instead, so verifiers need to take this into account. This PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and passes tests.
This patch makes changes to the `scf.parallel` to `omp.parallel` + `omp.wsloop` lowering pass in order to introduce a nested `omp.loop_nest` as well, and to follow the new loop wrapper role for `omp.wsloop`. This PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and passes tests.
…9214) This patch introduces minimal changes to the MLIR to LLVM IR translation of `omp.wsloop` to support the loop wrapper approach. There is `omp.loop_nest` related translation code that should be extracted and shared among all loop operations (e.g. `omp.simd`). This would possibly also help in the addition of support for compound constructs later on. This first approach is only intended to keep things running after the transition to loop wrappers and not to add support for other use cases enabled by that transition. This PR on its own will not pass premerge tests. All patches in the stack are needed before it can be compiled and passes tests.
This patch updates lowering from PFT to MLIR of workshare loops to follow the loop wrapper approach. Unit tests impacted by this change are also updated. As the last patch of the stack, this should compile and pass unit tests.
Semantics usually fold SHAPE into an array constructor, but sometimes it cannot (like when the source is a function result that cannot be duplicated in expression analysis). Add lowering handling for shape.
…lane.mask (llvm#89068) When SVE is available we can lower calls to get.active.lane.mask using the SVE whilelo instruction, however in practice since vXi1 types are not legal for NEON we often end up expanding the predicate into a vector of integers, e.g. v4i1 -> v4i32. This usually happens when we have to keep the predicate live out of the block, for example when the predicate is the incoming value to a PHI node in a tail-folded vector loop. Currently in such cases the intrinsic call has a cost of 1, which is far too low when considering the extra instructions required to expand the predicate. This patch fixes that by basing the cost on the number of lane moves required for expansion. This is required for a follow-on patch that adds the cost of the intrinsic call to the vectorisation cost model, so that we can teach the vectoriser to make better choices.
…nctionDefinition (llvm#89801) In the lambda function within clang::Sema::InstantiateFunctionDefinition, the return value of a function that may return null is now checked before dereferencing to avoid potential null pointer dereference issues which can lead to crashes or undefined behavior in the program.
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. Fixes: SWDEV-438421
…th r2 and pre-R2 (llvm#89881) About unsigned max/min, ANDi is available for all ISA revisions in extend before slt insn. So that we can reduce one instruction.
…ructions (llvm#89867) Since the requirement is EEW=32, it's impossible that EGW=128 needs LMUL=8.
skatrak
requested review from
gregrodgers,
ergawy,
jsjodin,
abidh,
mjklemm,
raghavendhra,
agozillon,
DominikAdamski,
dpalermo,
TIFitis and
kparzysz
May 3, 2024 14:28
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This merge brings upstream changes to the representation of OpenMP loop-related operations to the wrapper approach, as well as refactoring of the handling of combined and composite constructs during the PFT to MLIR lowering pass in Flang. The side effects of these changes that needed addressing by this merge are the following:
omp.wsloop
,omp.simd
,omp.distribute
,omp.taskloop
andomp.parallel
(only if part of a composite construct in the case ofomp.parallel
). This means that Flang's early privatization had to be updated to create these allocations outside of all loop wrappers. The unfortunate result of this is that, at the end of the compilation flow, some unused copies of private variables end up being passed in as arguments to outlined functions. This is expected to be addressed by transitioning to the delayed privatization approach, which is still under development.omp.parallel
operation can represent a regular block construct but it can also represent a leaf construct on a composite construct, taking a loop wrapper role. For its first use case, it needs to have theOutlineableOpenMPOpInterface
MLIR trait, which is used by some Flang MLIR passes to search for a spot to insert allocations. However, when it's taking a loop wrapper role, allocations must be hoisted outside all wrappers. The solution proposed here is to update these passes to check for this situation, but it kind of defeats the purpose of theOutlineableOpenMPInterface
. Long term, it might be preferable to create separate MLIR operations for block and wrapper versions ofomp.parallel
or dynamically attach which interfaces apply to each instance depending on its role, if possible.DataSharingProcessor
were necessary.DoConcurrentConversion
pass needed to be slightly updated to create the loop wrapper andomp.loop_nest
operations, as expected by the new loop representation approach.SingleBlockImplicitTerminator
trait. Sometimes they would be skipped and other times they would not be, especially when multiple nested operations having it were nested one inside of the other. This made theLoopWrapperInterface::isWrapper
method mistakenly return false whenever the terminator was skipped. The decision made to address this was to mark all wrappers with theSingleBlock
trait instead, and to update tests to always explicitly specify the terminator operation. It's possible to update theLoopWrapperInterface::isWrapper
method instead to consider the option of not finding a terminator, but I think it's best to have a single consistent representation rather than playing with "syntactic sugar" and complicating some functions as a result.omp.loop_nest
is still replicated inconvertOmpWsloop
andconvertLoopNestHelper
, and composite constructs are currently handled in a rather unintuitive way.After applying these changes, all check-mlir, check-flang, check-offload, check-llvm Lit tests are passing locally, and smoke-fort tests are passing as well. To be tested is whether there are any significant performance regressions as a result of passing unused variables to kernels, and checking whether kernel attributes (num teams, num threads, etc) are properly populated in all cases. I have noticed that now implicit map clauses are being introduced for variables that are also explicitly mapped, which doesn't seem to introduce any errors but we should also address.