Add ONNXToLinalg Conversion #1848

Open
chentong319 opened this issue Nov 9, 2022 · 9 comments

@chentong319
Collaborator

I found the proposal from Microsoft to use Linalg worth exploring. Linalg has some good features, and there are existing optimization passes and backends for it.
I am working on a draft that adds passes to lower some ONNX ops to Linalg while keeping the current lowering to Krnl working. The passes will look like:

existing passes
1. ONNXToLinalg 
2. ONNXToKrnl
3. KrnlToAffine
4. LinalgToAffine
existing passes

The order of 1 and 2, and that of 3 and 4, may be swappable.
In my experiment, I will translate only one ONNX op to Linalg (currently ONNXMatMul, chosen for simplicity). I will use memref for the Linalg op. I feel it may be easier to reuse the ONNX shape inference results for allocation and the ONNXToKrnl conversion to lower to memref, instead of using the Linalg bufferization (detensoring) pass. Will this decision be a problem for future optimization?
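
As a rough sketch only (not actual pass output; SSA names, shapes, and the alignment attribute are illustrative), the memref-based lowering of a statically shaped onnx.MatMul could look like:

%alloc = memref.alloc() {alignment = 16 : i64} : memref<2x4xf32>
linalg.matmul ins(%arg0, %arg1 : memref<2x3xf32>, memref<3x4xf32>) outs(%alloc : memref<2x4xf32>)

Here the static shape known from ONNX shape inference drives the memref.alloc directly, so no separate bufferization step is needed for this op.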

Once the framework is set up, collaboration will be needed to implement conversions of more ONNX ops to Linalg. Which lowering is applied to each ONNX op may be controlled by options and restricted by the expressiveness of the dialects. If the conversion to Linalg is disabled, onnx-mlir works as it does now.

Comments are welcome.

@ashay
Contributor

ashay commented Nov 16, 2022

This is great! The fact that this translation enables the use of existing upstream passes is a huge plus.

But in the same spirit, how do you feel about lowering to linalg-on-tensor instead of memref, and then using the upstream bufferization passes to lower to memref? I worry that by trying to lower to memref directly, the ONNXToLinalg pass might get too complicated.
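
For comparison, a minimal tensor-level sketch of the matmul from the proposal above (SSA names illustrative) would use tensor.empty as the output initializer and leave all buffer allocation to the upstream bufferization passes:

%0 = tensor.empty() : tensor<2x4xf32>
%1 = linalg.matmul ins(%arg0, %arg1 : tensor<2x3xf32>, tensor<3x4xf32>) outs(%0 : tensor<2x4xf32>) -> tensor<2x4xf32>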

@chentong319
Collaborator Author

chentong319 commented Nov 17, 2022

I think it is doable to lower to linalg-on-tensor. There are two possible paths:

  1. If ONNX is lowered to Krnl before Linalg is bufferized, we just need to add bufferization.to_memref for an input to a Krnl op whenever that input comes from Linalg.
  2. If the Linalg bufferization runs before ONNX is lowered to Krnl, I think the bufferization pass will add bufferization.to_tensor for the ONNX ops automatically.

It seems to me that no extra work is needed for the second path, but we can try both. Adding to_memref and to_tensor around the Krnl ops is needed to handle IR that mixes tensor-level dialects; a sketch of the first path follows.
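
A minimal sketch of the first path (assuming the Linalg matmul is still on tensors while its consumer has already been lowered to Krnl; SSA names are illustrative, and %i, %j stand for Krnl induction variables):

%2 = tensor.empty() : tensor<2x4xf32>
%3 = linalg.matmul ins(%1, %0 : tensor<2x3xf32>, tensor<3x4xf32>) outs(%2 : tensor<2x4xf32>) -> tensor<2x4xf32>
// The Krnl-level consumer reads the Linalg result through an explicit buffer cast.
%4 = bufferization.to_memref %3 : memref<2x4xf32>
%5 = krnl.load %4[%i, %j] : memref<2x4xf32>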

@chentong319
Collaborator Author

chentong319 commented Dec 2, 2022

@sstamenova
I did an experiment (#1891) that lowers onnx.MatMulOp to Linalg.MatmulOp at the tensor level, then calls the Linalg bufferization pass to convert tensors to memrefs and the Linalg-to-affine pass to convert Linalg to affine. With some changes to the lowering of ONNX to Krnl, the compilation can reach the lowering-to-LLVM stage. I ran into one issue: I did not find an existing pass that lowers bufferization.alloc_tensor to memref.alloc. I do not think I should lower that op myself, although it would be straightforward. Does anyone know the solution?
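
For reference, this is the leftover pattern and the rewrite I would expect (illustrative only; the alignment attribute just mirrors the other allocations in the dump):

%0 = bufferization.alloc_tensor() : tensor<2x4xf32>
%1 = bufferization.to_memref %0 : memref<2x4xf32>
// expected to become something like:
%alloc = memref.alloc() {alignment = 128 : i64} : memref<2x4xf32>
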
Should we discuss the ONNXToLinalg conversion at the meeting next Tuesday?

@chentong319
Collaborator Author

I put some results of #1891 here.
Model:

func.func @matmul(%arg0 : tensor<2x3xf32>, %arg1 : tensor<3x4xf32>) -> tensor<4x2xf32> {
  %1 = "onnx.MatMul"(%arg0, %arg1) : (tensor<2x3xf32>, tensor<3x4xf32>) -> tensor<2x4xf32>
  %2 = "onnx.Transpose"(%1) {perm = [1, 0]} : (tensor<2x4xf32>) -> tensor<4x2xf32>
  return %2 : tensor<4x2xf32>
}

After lowering to Linalg:

// -----// IR Dump After onnx_mlir::ONNXToLinalgLoweringPass (convert-onnx-to-linalg) //----- //
module attributes {llvm.data_layout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-apple-darwin20.3.0"} {
  func.func @matmul(%arg0: tensor<2x3xf32>, %arg1: tensor<3x4xf32>) -> tensor<4x2xf32> {
    %0 = tensor.empty() : tensor<2x4xf32>
    %1 = linalg.matmul ins(%arg0, %arg1 : tensor<2x3xf32>, tensor<3x4xf32>) outs(%0 : tensor<2x4xf32>) -> tensor<2x4xf32>
    %2 = "onnx.Transpose"(%1) {perm = [1, 0]} : (tensor<2x4xf32>) -> tensor<4x2xf32>
    return %2 : tensor<4x2xf32>
  }
}

After lowering to Krnl:

// -----// IR Dump After onnx_mlir::FrontendToKrnlLoweringPass (convert-onnx-to-krnl) //----- //
module attributes {llvm.data_layout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-apple-darwin20.3.0"} {
  func.func @matmul(%arg0: memref<2x3xf32>, %arg1: memref<3x4xf32>) -> memref<4x2xf32> {
    %0 = bufferization.to_tensor %arg1 : memref<3x4xf32>
    %1 = bufferization.to_tensor %arg0 : memref<2x3xf32>
    %2 = tensor.empty() : tensor<2x4xf32>
    %3 = linalg.matmul ins(%1, %0 : tensor<2x3xf32>, tensor<3x4xf32>) outs(%2 : tensor<2x4xf32>) -> tensor<2x4xf32>
    %4 = bufferization.to_memref %3 : memref<2x4xf32>
    %c4 = arith.constant 4 : index
    %c2 = arith.constant 2 : index
    %alloc = memref.alloc() {alignment = 16 : i64} : memref<4x2xf32>
    %5:2 = krnl.define_loops 2
    %c0 = arith.constant 0 : index
    %c2_0 = arith.constant 2 : index
    %c4_1 = arith.constant 4 : index
    krnl.iterate(%5#0, %5#1) with (%5#0 -> %arg2 = 0 to 2, %5#1 -> %arg3 = 0 to 4){
      %6:2 = krnl.get_induction_var_value(%5#0, %5#1) : (!krnl.loop, !krnl.loop) -> (index, index)
      %7 = krnl.load %4[%6#0, %6#1] : memref<2x4xf32>
      krnl.store %7, %alloc[%6#1, %6#0] : memref<4x2xf32>
    }
    return %alloc : memref<4x2xf32>
  }
}

After Linalg bufferization:

// -----// IR Dump After LinalgBufferize (linalg-bufferize) //----- //
func.func @matmul(%arg0: memref<2x3xf32>, %arg1: memref<3x4xf32>) -> memref<4x2xf32> {
  %0 = bufferization.to_tensor %arg1 : memref<3x4xf32>
  %1 = bufferization.to_tensor %arg0 : memref<2x3xf32>
  %2 = tensor.empty() : tensor<2x4xf32>
  %3 = bufferization.to_memref %2 : memref<2x4xf32>
  %alloc = memref.alloc() {alignment = 128 : i64} : memref<2x4xf32>
  memref.copy %3, %alloc : memref<2x4xf32> to memref<2x4xf32>
  %4 = bufferization.to_tensor %alloc : memref<2x4xf32>
  linalg.matmul ins(%arg0, %arg1 : memref<2x3xf32>, memref<3x4xf32>) outs(%alloc : memref<2x4xf32>)
  %5 = bufferization.to_tensor %alloc : memref<2x4xf32>
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %alloc_0 = memref.alloc() {alignment = 16 : i64} : memref<4x2xf32>
  %6:2 = krnl.define_loops 2
  %c0 = arith.constant 0 : index
  %c2_1 = arith.constant 2 : index
  %c4_2 = arith.constant 4 : index
  krnl.iterate(%6#0, %6#1) with (%6#0 -> %arg2 = 0 to 2, %6#1 -> %arg3 = 0 to 4){
    %7:2 = krnl.get_induction_var_value(%6#0, %6#1) : (!krnl.loop, !krnl.loop) -> (index, index)
    %8 = krnl.load %alloc[%7#0, %7#1] : memref<2x4xf32>
    krnl.store %8, %alloc_0[%7#1, %7#0] : memref<4x2xf32>
  }
  return %alloc_0 : memref<4x2xf32>
}

After lowering both Linalg and Krnl to affine:

func.func @matmul(%arg0: memref<2x3xf32>, %arg1: memref<3x4xf32>) -> memref<4x2xf32> attributes {llvm.emit_c_interface} {
  %0 = bufferization.alloc_tensor() : tensor<2x4xf32>
  %1 = bufferization.to_memref %0 : memref<2x4xf32>
  %alloc = memref.alloc() {alignment = 128 : i64} : memref<2x4xf32>
  memref.copy %1, %alloc : memref<2x4xf32> to memref<2x4xf32>
  affine.for %arg2 = 0 to 2 {
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 3 {
        %2 = affine.load %arg0[%arg2, %arg4] : memref<2x3xf32>
        %3 = affine.load %arg1[%arg4, %arg3] : memref<3x4xf32>
        %4 = affine.load %alloc[%arg2, %arg3] : memref<2x4xf32>
        %5 = arith.mulf %2, %3 : f32
        %6 = arith.addf %4, %5 : f32
        affine.store %6, %alloc[%arg2, %arg3] : memref<2x4xf32>
      }
    }
  }
  %alloc_0 = memref.alloc() {alignment = 16 : i64} : memref<4x2xf32>
  affine.for %arg2 = 0 to 2 {
    affine.for %arg3 = 0 to 4 {
      %2 = affine.load %alloc[%arg2, %arg3] : memref<2x4xf32>
      affine.store %2, %alloc_0[%arg3, %arg2] : memref<4x2xf32>
    }
  }
  return %alloc_0 : memref<4x2xf32>
}

@sstamenova
Collaborator

We have a team event this Tuesday, so we won't be able to attend. However, we can do this the following Tuesday.

@chentong319
Collaborator Author

We have a team event this Tuesday, so we won't be able to attend. However, we can do this the following Tuesday.

Let's try Dec 13 (Tuesday).

@hunterzju

This is a good idea: if ONNX is lowered to Linalg, we can do tiling and packing at the Linalg level. What is the current progress?

@AlexandreEichenberger
Collaborator

Microsoft was looking into this; I have not heard much on this front in a while.

@ashay
Contributor

ashay commented Oct 25, 2023

I can't speak on behalf of the Microsoft folks, but it's now possible to convert ONNX to StableHLO and use the changes in openxla/stablehlo#1817 to lower to Linalg. The ONNX-to-StableHLO conversion still has some gaps, so it needs work, but most ONNX operations go through without issues.
