Unify accelerator and non-accelerator paths #1544

djramic · 2024-06-06T02:17:36Z

This PR tracks the path-unification ticket.
The current commit focuses on threadwise unification.
The NormalizeView and TransformingForOp components remain non-unified (hardcoded).
For NormalizeView, the kPack parameter is problematic, whereas for TransformingForOp the concern is the differing indexing method.
I am currently investigating whether they should be integrated through the AccelEmitter or whether they can be refactored through a blockwise path unification.

krzysz00 · 2024-06-06T19:35:38Z

I'll note that kpack should actually be used in the non-accelerator path

krzysz00 · 2024-06-06T19:36:29Z

Like, while you're here, the FMA/dot product-based path should absolutely handle kpack

And if there isn't room for it in the perf config, a new perf config version might be in order

krzysz00 · 2024-06-06T19:43:10Z

The fundamental observation, IMO, is that an FMA is a 1 x kbase * kbase x 1 multiplication (I say K here because this could also be easily extended to dot products, so you'd have, for example, kbase = 2 for fp16).

Then, with that, you can use the general accelerator lowering, which is pretty independent of exactly what the accelerator does.

(For practical reasons, you might want to announce the accelerator as mPerThread x kbase x nPerThread ... but also, mRepeats and nRepeats exist for a reason and ought to cover most of this)
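A minimal standalone sketch of the observation above, not code from this PR: a plain FMA is modelled as a 1 x kbase * kbase x 1 "accelerator" instruction, and a single generic loop over mRepeats, nRepeats and K drives it. The names `dotAccumulate` and `threadwiseGemm`, and the K-contiguous layouts of A and B, are assumptions made for illustration; they are not the Rock dialect API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::int64_t;

// Hypothetical "instruction": a (1 x kBase) * (kBase x 1) product added to acc.
// kBase == 1 is a single multiply-add; kBase == 2 models a 2-wide dot product.
float dotAccumulate(const float *a, const float *b, int64_t kBase, float acc) {
  for (int64_t k = 0; k < kBase; ++k)
    acc += a[k] * b[k];
  return acc;
}

// Generic threadwise loop that only knows kBase, not what the instruction is.
// Assumed layouts: A is [mRepeats][kPerThread], B is [nRepeats][kPerThread]
// (K contiguous in both), C is [mRepeats][nRepeats].
void threadwiseGemm(int64_t kBase, int64_t mRepeats, int64_t nRepeats,
                    int64_t kPerThread, const std::vector<float> &a,
                    const std::vector<float> &b, std::vector<float> &c) {
  assert(kBase > 0 && kPerThread % kBase == 0);
  assert(a.size() == static_cast<std::size_t>(mRepeats * kPerThread));
  assert(b.size() == static_cast<std::size_t>(nRepeats * kPerThread));
  assert(c.size() == static_cast<std::size_t>(mRepeats * nRepeats));
  for (int64_t k = 0; k < kPerThread; k += kBase)
    for (int64_t m = 0; m < mRepeats; ++m)
      for (int64_t n = 0; n < nRepeats; ++n) {
        float &acc = c[m * nRepeats + n];
        acc = dotAccumulate(&a[m * kPerThread + k], &b[n * kPerThread + k],
                            kBase, acc);
      }
}
```

Calling `threadwiseGemm` with kBase = 1 and with kBase = 2 on the same inputs accumulates the same products, which is the sense in which the surrounding loop structure is independent of the instruction it drives.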

djramic · 2024-06-07T12:42:46Z

I plan to start by unifying the lowering paths as they are currently implemented, in order to get a better understanding of each lowering level and to make verification easier by using the existing tests. After that, I can do additional refactoring and improvements. (We can open a separate issue based on your comments above.) In the spirit of my current approach, do you have any additional feedback? I would very much appreciate it. I'm not quite sure I'm on the right path.

krzysz00 · 2024-06-06T19:37:11Z

mlir/include/mlir/Dialect/Rock/IR/AccelEmitter.h

+  void emitThreadwiseLoop(OpBuilder &b, Location loc, Value argA, Value argB,
+                          Value bufferC, ValueRange regCOffset) override;
+
+ virtual Value


You don't need to repeat virtual here
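For context, a small standalone illustration of why the keyword is redundant: `override` only compiles when the base-class function is already virtual, so the overriding function is virtual whether or not the keyword is written again. The class names `Emitter` and `FmaLikeEmitter` are hypothetical stand-ins, not the classes in AccelEmitter.h.

```cpp
#include <string>

// Hypothetical base class with one virtual hook.
struct Emitter {
  virtual ~Emitter() = default;
  virtual std::string name() const { return "base"; }
};

// `override` requires Emitter::name to be virtual, so repeating
// `virtual` on this declaration would add nothing.
struct FmaLikeEmitter : Emitter {
  std::string name() const override { return "fma"; }
};

// Dispatches through the base reference to the most-derived override.
std::string describe(const Emitter &e) { return e.name(); }
```

Passing a `FmaLikeEmitter` to `describe` yields "fma", confirming that dynamic dispatch works without the extra keyword.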

krzysz00 · 2024-06-06T19:37:34Z

mlir/include/mlir/Dialect/Rock/IR/FmaInsnGroup.h

+} // namespace rock
+} // namespace mlir
+
+#endif // MLIR_FMA_INSN_GROUP_H


Nit: newline

krzysz00 · 2024-06-18T14:32:51Z

mlir/include/mlir/Dialect/Rock/IR/RockOps.td

+                   StrAttr:$arch,
+                   Rock_GemmFeaturesAttr:$features,
+                   RockTuningParamAttrInterface:$params)> {
+  let assemblyFormat = [{


This is a temporary thing, right? Why's this needed?

The idea is that this op will replace threadwise_gemm and threadwise_accel_gemm. I am creating a new operation so I don't have to delete the existing code.

krzysz00 · 2024-06-18T14:35:52Z

mlir/lib/Dialect/Rock/utility/AccelEmitter.cpp

@@ -130,27 +131,110 @@ Value AccelEmitter::generateThreadwiseViewBufferC(PatternRewriter &b,
  return viewC;
 }

+// **************************


Weird comment format

... though that's file style. Maybe worth fixing to the //=---------------=// style seen elsewhere in the codebase both here and below

krzysz00 · 2024-06-18T14:36:17Z

mlir/lib/Dialect/Rock/utility/AccelEmitter.cpp

+void FmaEmitter::emitThreadwiseLoop(OpBuilder &b, Location loc, Value argA, Value argB,
+                          Value bufferC, ValueRange regCOffset){
+
+      Type dataType = fmaInsn.argTypeA;


I think there's a git clang-format main in order

krzysz00 · 2024-06-18T14:39:39Z

mlir/lib/Dialect/Rock/utility/AccelEmitter.cpp

+      int64_t inNPerThread, bool doSwapThreadIterSubDimsForM,
+      bool doSwapThreadIterSubDimsForN){
+
+  //TO-DO


This should be reasonably straightforward?

krzysz00 · 2024-06-18T14:58:20Z

mlir/lib/Dialect/Rock/Transforms/ThreadwiseGemmLowering.cpp

+    TopDownTMBuilder td(b, names, sizes, loc);
+    // Convert the normalizedView to a real view by ignoring
+    // the names contained in `ignoreNames` and letting the rest pass through
+    for (pos = 0; pos < names.size(); pos++) {


Nit: move unsigned pos into the for loop.

... bigger nit: for (auto [pos, name] : llvm::enumerate(names))
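A rough sketch of the first nit in plain standard C++, assuming the loop only needs each name and its position: the index is declared in the loop header, so it stays scoped to the loop. The helper `passThroughNames` and its return type are hypothetical and are not the TopDownTMBuilder API. With LLVM's STLExtras available, the header would instead take the range-based form the reviewer suggests.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns (position, name) for every name that is not in ignoreNames,
// i.e. the names that would pass through unchanged.
std::vector<std::pair<std::size_t, std::string>>
passThroughNames(const std::vector<std::string> &names,
                 const std::set<std::string> &ignoreNames) {
  std::vector<std::pair<std::size_t, std::string>> kept;
  // `pos` is declared in the loop header and is not visible after the loop.
  for (std::size_t pos = 0; pos < names.size(); ++pos) {
    const std::string &name = names[pos];
    if (ignoreNames.count(name) != 0)
      continue; // ignored names are dropped
    kept.emplace_back(pos, name);
  }
  return kept;
}
```

Keeping the original position alongside each surviving name mirrors what an enumerate-style loop hands back on every iteration.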

krzysz00 · 2024-06-18T14:59:06Z

mlir/lib/Dialect/Rock/Transforms/ThreadwiseGemmLowering.cpp

+    // Loop properties
+    auto computeStart = llvm::to_vector(op.getComputeIndices());
+
+    if(isMfma || isWmma){


Why? Don't WMMA and MFMA go down threadwise_gemm?

This is temporary, for the parts that are not unified yet. The goal is to remove it once the path unification is complete.

krzysz00 · 2024-06-18T15:00:02Z

mlir/lib/Dialect/Rock/Transforms/ThreadwiseGemmLowering.cpp

+        OpBuilder::InsertionGuard guard(b);
+        b.setInsertionPointToStart(gemmLoop.getBody());
+
+        auto coordsA = gemmLoop.getLowerCoords(/*domain=*/0);


Why do you need this case? What prevents you from using emitAccelLoop here?

The problem is caused by the different ways the gemmLoop and accelLoop TransformingForOps are created. I want to try to reconcile these approaches by refactoring the blockwise level.

Working on threadwise unification

5630edd

djramic requested a review from krzysz00 June 6, 2024 02:17

djramic requested review from jerryyin and sjw36 as code owners June 6, 2024 02:17

krzysz00 reviewed Jun 18, 2024
