feature/fused-ops #55

Closed
wants to merge 3,330 commits into from

mgehre-amd
Collaborator

Not for merging; just convenience to look at our changes.

mgehre-amd and others added 30 commits August 15, 2024 14:46
Bump with fixes to fa6e4338 (6)
josel-amd and others added 29 commits October 28, 2024 16:47
OpaqueType: Use format string
Refactored @Max191's PR llvm#94637
to move it to `Tensor`

From the original PR
>This PR adds fusion by expansion patterns to push a tensor.expand_shape
up through a tensor.collapse_shape with non-intersecting reassociations.
Sometimes parallel collapse_shape ops like this can block propagation of
expand_shape ops, so this allows them to pass through each other.
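The condition in the quoted description can be sketched in standalone C++. This is a minimal model, assuming "non-intersecting" means that no dimension of the intermediate tensor is both produced by a multi-dimension collapse group and consumed by a multi-dimension expand group. `ReassociationIndices` here is a plain alias and `areReshapesParallel` is a hypothetical helper name, not the upstream API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One reassociation group lists the dimensions that fold into (or unfold
// from) a single dimension of the intermediate tensor.
using ReassociationIndices = std::vector<int64_t>;

// Hypothetical helper for illustration only: returns true when no
// intermediate dimension is touched by a non-trivial group on both sides.
bool areReshapesParallel(
    const std::vector<ReassociationIndices> &collapseGroups,
    const std::vector<ReassociationIndices> &expandGroups) {
  // Both reshapes index the same intermediate tensor, so the number of
  // groups has to match.
  if (collapseGroups.size() != expandGroups.size())
    return false;
  for (std::size_t i = 0; i < collapseGroups.size(); ++i) {
    // A group of size 1 leaves that intermediate dimension untouched.
    if (collapseGroups[i].size() > 1 && expandGroups[i].size() > 1)
      return false;
  }
  return true;
}
```

For example, collapse groups `[[0, 1], [2]]` and expand groups `[[0], [1, 2]]` pass the check because each reshape works on a different intermediate dimension, while using `[[0, 1], [2]]` on both sides fails it.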

I'm not sure if I put the code/tests in the right places, so let me know
where those go if they aren't.

cc @MaheshRavishankar @hanhanW

---------

Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
Add the missing `getIterationDomainTileFromOperandTile` and `getTiledImplementationFromOperandTile` methods to `tensor.pack` and enable fusing it as a consumer. Note that it currently expects only the perfect-tiling scenario, without padding semantics.
…#96184)

To support conv2d input data of arbitrary size, implement
TilingInterface for winograd operations. Before converting winograd
operations into nested loops with matrix multiplies, first tile the
conv2d input into the supported size.

Add a transform operation, structured.decompose_winograd_op, to decompose
winograd operations. Before applying the transform op, use
tile_using_for to tile the input data into the supported size. The test
case shows how to tile and decompose winograd operations.
…to continue tile + fuse. (llvm#107882)

The current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at the operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations, and uses those to populate the
worklist for continued fusion. This implicit assumption does not always
hold. Instead, make the implementations of `getTiledImplementation`
return the slices to use to continue fusion.

This is a breaking change

- To keep the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementations of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification of the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.
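The optional-returning control function described in the second bullet can be illustrated with a standalone sketch. It does not use the MLIR headers: `CandidateSlice`, its two flags, and `makeControlFn` are hypothetical stand-ins, and the `yieldProducerReplacement` field is an assumption about what a result object might carry. Only the contract (return `std::nullopt` to skip fusion) comes from the commit message.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical stand-in for the slice/producer pair the real callback sees.
struct CandidateSlice {
  bool producerImplementsTiling; // assumed flag, for this sketch only
  bool producerHasOtherUses;     // assumed flag, for this sketch only
};

// Mock of the result object; the field is an assumption for illustration.
struct ControlFnResult {
  bool yieldProducerReplacement = false;
};

using ControlFn =
    std::function<std::optional<ControlFnResult>(const CandidateSlice &)>;

// Builds a control function that declines fusion with std::nullopt and
// otherwise returns a populated result.
ControlFn makeControlFn() {
  return [](const CandidateSlice &slice) -> std::optional<ControlFnResult> {
    if (!slice.producerImplementsTiling)
      return std::nullopt; // signal: do not fuse this producer
    ControlFnResult result;
    result.yieldProducerReplacement = slice.producerHasOtherUses;
    return result;
  };
}
```

Folding the "fuse or not" decision and the extra fusion settings into one optional return value means a caller cannot read settings for a fusion that was declined.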

Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
…m#109554)

The SCF helper for tiling an operation implementing the TilingInterface
and greedily fusing consumers requires an uninterrupted chain of
operations implementing the tiling interface to succeed. There can be
cases with intermediate ops that don't implement the interface but have
producers that could be fused if various canonicalization/simplification
patterns could run in between fusion steps.

This adds an option to SCFTileAndFuseOptions for a pattern set that runs
between fusion steps on the ops that result from fusion/tiling. Removed
and newly inserted slices are tracked for continued fusion applications.

See this RFC for more discussion:

https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155
The auto-generated builder created an emitc.tu that had an empty region.
This is a bit cumbersome to work with, because you always had to create
a block in it manually.
Do what ModuleOp::build does and always create that block.

Also accept StringRef as argument for id instead of requiring a StringAttr.
`#include` makes sense everywhere, and in particular we need to allow it inside an `emitc.tu`.
But sometimes we might even want to have an `#include` in a function body.
emitc.include: don't require the parent to be a ModuleOp
emitc.tu: Automatically create block for body
…ape_fold

fix: fuse locations of double reshapes when folding.
Backport various improvements to fusion from upstream
…ations

feat: improve CSE by fusing locations when replacing one op for the other.
Make EliminateLibm work on EmitC::FuncOp
emitc: Do not add newlines after ModuleOp, TranslationUnitOp
* Fix conversion for scf.for and scf.if
* Add -mlir-reproducer-before-all
…403)

* Include comments with template arg names in Cpp code from EmitC

* Apply suggestions from code review

Co-authored-by: Corentin Ferry <corentin.ferry@amd.com>
Co-authored-by: Matthias Gehre <matthias.gehre@amd.com>

* Test for the presence of template arg names when there are no template args

---------

Co-authored-by: Corentin Ferry <corentin.ferry@amd.com>
Co-authored-by: Matthias Gehre <matthias.gehre@amd.com>
* Readability: Add option to emit constant values in place, instead of producing dedicated variables.
[AutoBump] Merge with 894d3ee (Aug 15) (4)
[AutoBump] Merge with fixes of 2d50029 (Aug 15, needs torch-mlir bump) (5)
@mgehre-amd mgehre-amd closed this Nov 29, 2024