
Export ONNX QOperator #882

Open
surajpandey353 opened this issue Feb 27, 2024 · 5 comments

@surajpandey353

Hi Team Brevitas,

I am trying a simple toy model to check what the exported ONNX model with QOps looks like. As per ONNX_export_tutorial.ipynb, you can either pass the input through a QuantIdentity layer with return_quant_tensor=True or set input_quant=Uint8ActPerTensorFloat on the first layer. I have the following toy model:

import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import Int8WeightPerChannelFloat, Uint8ActPerTensorFloat, Int32Bias

model = nn.ModuleList()
model.append(qnn.QuantConv2d(514, 256, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=Uint8ActPerTensorFloat,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=True))
model.append(qnn.QuantReLU(return_quant_tensor=True))
model.append(qnn.QuantConv2d(256, 256, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=None,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=True))
model.append(qnn.QuantReLU(return_quant_tensor=True))
model.append(qnn.QuantConv2d(256, 256, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=None,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=True))
model.append(qnn.QuantReLU(return_quant_tensor=True))
model.append(qnn.QuantConv2d(256, 514, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=None,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=False))

Ideally, as per the model definition, there should be one QuantizeLinear before the first layer and no dequantization in the middle of the model. A DequantizeLinear should only appear at the end of the model, since return_quant_tensor=False in the last layer.

However, the graph visualization in Netron shows a DequantizeLinear before every ReLU op, which I find odd, since each QuantReLU receives a quant tensor as input and returns a quant tensor. If I skip the ReLU activations between the convolutions, I get the expected graph, with QuantizeLinear before the first layer and DequantizeLinear after the last layer.
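For reference, the export call is roughly along these lines (a simplified sketch, not the exact notebook code; the input shape and file name are arbitrary, and export_onnx_qop is the QOp helper from the export tutorial):

import torch
from brevitas.export import export_onnx_qop

net = nn.Sequential(*model).eval()        # ModuleList has no forward, so wrap it for export
dummy_input = torch.randn(1, 514, 8, 8)   # arbitrary spatial size, all convs are 1x1
export_onnx_qop(net, dummy_input, export_path='toy_qop.onnx')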

Dependencies
torch 1.13.0
brevitas 0.10.2

Could someone explain this behaviour? Is it intended, or is there something wrong with the way I have defined the model?

Thanks!

@Giuseppe5
Collaborator

Would you be able to provide the full script to generate the onnx model?

I know it's just a few more lines beyond what you have already posted here, but it's to make sure we replicate exactly what you see.

Many thanks!

@surajpandey353
Author

Hi @Giuseppe5,

Here is a minimal reproducible example:
toy_model.ipynb.zip

@Giuseppe5
Collaborator

Thanks for sharing!
The structure that you see is due to the fact that there is no real QuantReLU op in ONNX; instead, we need to rely on the floating-point version of it. This means that the output of QLinearConv has to be dequantized and then re-quantized before we get to the next QLinearConv.

I hope this explains this behaviour.
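One quick way to see this is to list the node types in the exported graph (a sketch, assuming the file was exported as toy_qop.onnx as in the snippet above):

import onnx

m = onnx.load('toy_qop.onnx')
print([n.op_type for n in m.graph.node])
# Expect a repeating pattern along the lines of:
# QuantizeLinear, QLinearConv, DequantizeLinear, Relu, QuantizeLinear, QLinearConv, ...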

@Barrot

Barrot commented Mar 13, 2024

Thanks for sharing! The structure that you see is due to the fact that there is no real QuantReLU op in ONNX; instead, we need to rely on the floating-point version of it. This means that the output of QLinearConv has to be dequantized and then re-quantized before we get to the next QLinearConv.

I hope this explains this behaviour.

The regular ONNX ReLU supports float and integer values: https://github.com/onnx/onnx/blob/main/docs/Operators.md#relu

@Giuseppe5
Collaborator

In general, we are working to deprecate support of QOp in favor of QCDQ (#834), so we probably won't change this behavior.
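For reference, switching to QCDQ export is a one-line change (a sketch, reusing the wrapped model and dummy input from the earlier snippet):

from brevitas.export import export_onnx_qcdq

# QCDQ keeps explicit QuantizeLinear/DequantizeLinear pairs instead of QLinear* ops
export_onnx_qcdq(net, dummy_input, export_path='toy_qcdq.onnx')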

Sorry for any inconvenience.
