Deprecate QOp Export #834

Open
Giuseppe5 opened this issue Feb 8, 2024 · 4 comments · Fixed by #917

Comments

Giuseppe5 (Collaborator) commented Feb 8, 2024

Although we will keep the interface for layer-wise export handlers, we will be deprecating support for QOp in favour of QCDQ.
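For reference, a rough sketch of what the QCDQ export flow looks like (the function name export_onnx_qcdq follows the Brevitas docs at the time of writing; exact signatures and defaults may differ between versions):

    import torch
    from brevitas.nn import QuantConv2d
    from brevitas.export import export_onnx_qcdq

    # Toy single-layer quantized model using Brevitas defaults.
    model = QuantConv2d(3, 16, kernel_size=3, weight_bit_width=8)
    model.eval()

    # Export with the QCDQ representation: the resulting ONNX graph contains
    # standard float ops wrapped in Quantize/Clip/DeQuantize nodes.
    export_onnx_qcdq(model, args=torch.randn(1, 3, 32, 32), export_path="model_qcdq.onnx")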

Giuseppe5 changed the title from "Deprecatie QOp ONNX Export" to "Deprecate QOp ONNX Export" on Feb 8, 2024
Giuseppe5 changed the title from "Deprecate QOp ONNX Export" to "Deprecate QOp Export" on Feb 12, 2024

Barrot commented Feb 21, 2024

What is the reason for deprecation?

Giuseppe5 (Collaborator, Author) commented

Generally, QCDQ is much easier to use given its flexibility, whereas the ONNX and Torch QOp exports impose several constraints on how a layer's input, weights, and output must be quantized in order to work correctly.

Similarly, QCDQ is also much easier to support and work with compared to QOp.


Barrot commented Feb 22, 2024

Thanks @Giuseppe5

Giuseppe5 linked a pull request on Mar 22, 2024 that will close this issue

prathameshd8 commented Jul 16, 2024

Hi @Giuseppe5,

I have tried both the QCDQ and QOp ONNX exports. QCDQ indeed provides great flexibility for exporting models to ONNX, whereas the QOp export requires one to satisfy a lot of constraints.

However, when performing full-integer inference by generating C code with frameworks such as TVM, QCDQ adds several Quantize and Dequantize nodes to the ONNX graph, so all the computation essentially happens in floating point.

For this case, where you want full-integer inference, QOp worked quite well: the integer tensors are passed on to the next layer if you set return_quant_tensor=True when defining the QuantLayer, and one can see that the generated C code performs the computations on integers as expected.
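A minimal sketch of the kind of setup described above (layer names and keyword arguments follow the Brevitas API as I understand it; the specific bit widths are only placeholders):

    import torch.nn as nn
    from brevitas.nn import QuantIdentity, QuantConv2d, QuantReLU

    # return_quant_tensor=True makes each layer hand an integer-backed
    # QuantTensor (values plus scale, zero-point, and bit width) to the next
    # layer, which is what the QOp export path relies on.
    model = nn.Sequential(
        QuantIdentity(bit_width=8, return_quant_tensor=True),
        QuantConv2d(3, 16, kernel_size=3, weight_bit_width=8, return_quant_tensor=True),
        QuantReLU(bit_width=8, return_quant_tensor=True),
    )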

Since QOp export will be deprecated, is there any way to perform full-integer inference with the QCDQ export?

3 participants