
Export ONNX QOperator #882

Open
surajpandey353 opened this issue Feb 27, 2024 · 5 comments

@surajpandey353

Hi Team Brevitas,

I am trying a simple toy model to check what the exported ONNX model with QOps looks like. As per ONNX_export_tutorial.ipynb, you can either pass the input through a QuantIdentity layer with return_quant_tensor=True or set input_quant=Uint8ActPerTensorFloat on the first layer. I have the following toy model:

import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import Int8WeightPerChannelFloat, Uint8ActPerTensorFloat, Int32Bias

model = nn.ModuleList()
model.append(qnn.QuantConv2d(514, 256, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=Uint8ActPerTensorFloat,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=True))
model.append(qnn.QuantReLU(return_quant_tensor=True))
model.append(qnn.QuantConv2d(256, 256, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=None,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=True))
model.append(qnn.QuantReLU(return_quant_tensor=True))
model.append(qnn.QuantConv2d(256, 256, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=None,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=True))
model.append(qnn.QuantReLU(return_quant_tensor=True))
model.append(qnn.QuantConv2d(256, 514, kernel_size=1,
                             weight_quant=Int8WeightPerChannelFloat,
                             input_quant=None,
                             output_quant=Uint8ActPerTensorFloat,
                             bias_quant=Int32Bias,
                             return_quant_tensor=False))

Ideally, as per the model definition, there should be one QuantizeLinear before the first layer and no dequantization in the middle of the model. A DequantizeLinear should only appear at the end of the model, since return_quant_tensor=False in the last layer.

However, the graph visualization in Netron shows a DequantizeLinear before every ReLU op, which I find odd, since each QuantReLU receives a quant tensor as input and returns a quant tensor. If I skip the ReLU activations between the convolutions, I get the expected graph, with QuantizeLinear before the first layer and DequantizeLinear after the last layer.
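For reference, the export call is roughly along these lines (a simplified sketch, not the exact notebook code; the input shape and file name are arbitrary, and export_onnx_qop is the QOp helper from the export tutorial):

import torch
from brevitas.export import export_onnx_qop

net = nn.Sequential(*model).eval()        # ModuleList has no forward, so wrap it for export
dummy_input = torch.randn(1, 514, 8, 8)   # arbitrary spatial size, all convs are 1x1
export_onnx_qop(net, dummy_input, export_path='toy_qop.onnx')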

Dependencies
torch 1.13.0
brevitas 0.10.2

Could someone explain this behaviour? Is it intended, or is there something wrong with the way I have defined the model?

Thanks!

@Giuseppe5
Collaborator

Would you be able to provide the full script to generate the onnx model?

I know it's just a few more lines beyond what you have already posted here, but it's to make sure we replicate exactly what you see.

Many thanks!

@surajpandey353
Author

Hi @Giuseppe5,

Here is a minimal reproducible example:
toy_model.ipynb.zip

@Giuseppe5
Collaborator

Thanks for sharing!
The structure that you see is due to the fact that there is no real QuantReLU op in ONNX; instead, we need to rely on the floating-point version of it. This means that the output of QLinearConv has to be dequantized and then re-quantized before we get to the next QLinearConv.

I hope this explains this behaviour.
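One quick way to see this is to list the node types in the exported graph (a sketch, assuming the file was exported as toy_qop.onnx as in the snippet above):

import onnx

m = onnx.load('toy_qop.onnx')
print([n.op_type for n in m.graph.node])
# Expect a repeating pattern along the lines of:
# QuantizeLinear, QLinearConv, DequantizeLinear, Relu, QuantizeLinear, QLinearConv, ...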

@Barrot

Barrot commented Mar 13, 2024

Thanks for sharing! The structure that you see is due to the fact that there is no real QuantReLU op in ONNX; instead, we need to rely on the floating-point version of it. This means that the output of QLinearConv has to be dequantized and then re-quantized before we get to the next QLinearConv.

I hope this explains this behaviour.

The regular ONNX ReLU supports float and integer values: https://github.com/onnx/onnx/blob/main/docs/Operators.md#relu

@Giuseppe5
Collaborator

In general, we are working to deprecate support of QOp in favor of QCDQ (#834), so we probably won't change this behavior.
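For reference, switching to QCDQ export is a one-line change (a sketch, reusing the wrapped model and dummy input from the earlier snippet):

from brevitas.export import export_onnx_qcdq

# QCDQ keeps explicit QuantizeLinear/DequantizeLinear pairs instead of QLinear* ops
export_onnx_qcdq(net, dummy_input, export_path='toy_qcdq.onnx')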

Sorry for any inconvenience.
