TensorRT fails to build engine from pytorch_quantization ONNX #3577

Closed
talcs opened this issue Dec 31, 2023 · 18 comments
Labels: internal-bug-tracked (Tracked internally, will be fixed in a future release), triaged (Issue has been triaged by maintainers)

talcs commented Dec 31, 2023

Description

I created a quantized model in PyTorch using pytorch_quantization and exported it to ONNX.
Then, I executed the following command on Jetson Orin:

/usr/src/tensorrt/bin/trtexec --onnx=model_quantized.onnx --int8 --saveEngine=model_quantized.trt
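For context, the quantize-and-export step with pytorch_quantization typically looks roughly like the sketch below. This is a minimal, hypothetical example: it assumes a torchvision ResNet-18 as a stand-in for the real model and skips the calibration step; it is not the exact script used here.

import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()                        # monkey-patch torch.nn layers with quantized variants
model = torchvision.models.resnet18().eval()      # stand-in model (assumption, not the model from this issue)

# ... calibrate the quantizers on representative data here ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True  # emit QuantizeLinear/DequantizeLinear nodes during ONNX export
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model_quantized.onnx", opset_version=13)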

Here is part of the trtexec output that includes the error:

[12/31/2023-11:17:12] [I] Start parsing network model
[12/31/2023-11:17:12] [I] [TRT] ----------------------------------------------------------------
[12/31/2023-11:17:12] [I] [TRT] Input filename:   model_quantized.onnx
[12/31/2023-11:17:12] [I] [TRT] ONNX IR version:  0.0.7
[12/31/2023-11:17:12] [I] [TRT] Opset version:    13
[12/31/2023-11:17:12] [I] [TRT] Producer name:    pytorch
[12/31/2023-11:17:12] [I] [TRT] Producer version: 1.12.1
[12/31/2023-11:17:12] [I] [TRT] Domain:           
[12/31/2023-11:17:12] [I] [TRT] Model version:    0
[12/31/2023-11:17:12] [I] [TRT] Doc string:       
[12/31/2023-11:17:12] [I] [TRT] ----------------------------------------------------------------
[12/31/2023-11:17:12] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/31/2023-11:17:13] [I] Finish parsing network model
[12/31/2023-11:17:13] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[12/31/2023-11:17:13] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[12/31/2023-11:17:13] [E] Error[2]: [qdqGraphOptimizer.cpp::matchInt8ConstantDQ::3582] Error Code 2: Internal Error (onnx::QuantizeLinear_898: Int8 constant is only allowed before DQ node)
[12/31/2023-11:17:13] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[12/31/2023-11:17:13] [E] Engine could not be created from network
[12/31/2023-11:17:13] [E] Building engine failed
[12/31/2023-11:17:13] [E] Failed to create engine from model or file.
[12/31/2023-11:17:13] [E] Engine set up failed

The error refers to the node QuantizeLinear_898, and the message is "Int8 constant is only allowed before DQ node".

Looking at the ONNX graph, I can see that there is a node related to QuantizeLinear_898 that has no input:

[Screenshot of the ONNX graph around QuantizeLinear_898, showing a related node with no input]

Any idea what went wrong and how to solve it?
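For reference, here is a rough sketch of how the inputs of the offending node can be inspected with the onnx Python package. Matching the node by the tensor name from the error ("898") is an assumption about how the exporter named things, not something taken from the model itself.

import onnx

model = onnx.load("model_quantized.onnx")
initializers = {init.name for init in model.graph.initializer}

for node in model.graph.node:
    names = list(node.input) + list(node.output)
    if node.op_type == "QuantizeLinear" and any("898" in n for n in names):
        # QuantizeLinear inputs are: x, y_scale, and an optional y_zero_point
        for name in node.input:
            source = "initializer" if name in initializers else "output of another node"
            print(f"{node.name}: input {name} ({source})")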

Environment

Model compilation:

TensorRT Version: TensorRT v8502 (Jetson Orin)

Model quantization and export to ONNX:

OS: Windows 10
Python Version (if applicable): 3.9.12
PyTorch Version (if applicable): 1.12.1+cu116
pytorch_quantization version: 2.1.3

talcs changed the title from "TensorRT fails to build computational graph from pytorch_quantization" to "TensorRT fails to build engine from pytorch_quantization ONNX" on Dec 31, 2023
zerollzeng (Collaborator) commented

What is the datatype of QuantizeLinear_338's input x?

zerollzeng self-assigned this on Jan 1, 2024
zerollzeng added the triaged label on Jan 1, 2024
zerollzeng (Collaborator) commented

Does the ONNX model work with ONNX Runtime? This can be quickly checked with polygraphy run model.onnx --onnxrt

Check https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy for more info.

talcs (Author) commented Jan 7, 2024

Yes, it passed:

> polygraphy run quantized_model.onnx --onnxrt
[I] RUNNING | Command: D:\Anaconda3\Scripts\polygraphy run quantized_model.onnx --onnxrt
[I] onnxrt-runner-N0-01/07/24-13:36:18  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[E] Module: 'torch' version '1.12.1+cu116' is installed, but version '>=1.13.0' is required.
    Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
    Attempting to continue with the currently installed version of this module, but note that this may cause errors!
[I] onnxrt-runner-N0-01/07/24-13:36:18
    ---- Inference Input(s) ----
    {input [dtype=float32, shape=(<<removed content>>)]}
[I] onnxrt-runner-N0-01/07/24-13:36:18
    ---- Inference Output(s) ----
    {output [dtype=float32, shape=(<<removed content>>)],
     649 [dtype=float32, shape=(<<removed content>>)],
     714 [dtype=float32, shape=(<<removed content>>)],
     726 [dtype=float32, shape=(<<removed content>>)],
     791 [dtype=float32, shape=(<<removed content>>)],
     803 [dtype=float32, shape=(<<removed content>>)]}
[I] onnxrt-runner-N0-01/07/24-13:36:18  | Completed 1 iteration(s) in 3696 ms | Average inference time: 3696 ms.
[I] PASSED | Runtime: 12.594s | Command: D:\Anaconda3\Scripts\polygraphy run quantized_model.onnx --onnxrt

zerollzeng (Collaborator) commented

Could you please share the onnx here? Thanks!

talcs (Author) commented Jan 11, 2024

Could you please share the onnx here? Thanks!

Would it be possible to send it to you in private?

zerollzeng (Collaborator) commented

Please share a private Google Drive link and I'll request access. Thanks!

talcs (Author) commented Jan 16, 2024

Great. The ONNX file is available here

talcs (Author) commented Jan 18, 2024

Could you please let me know when you have sent the file access request, so that I know it is you and can approve it?

I have already received access requests from two different users.

zerollzeng (Collaborator) commented

I've just requested access.

zerollzeng (Collaborator) commented

I can reproduce the issue, but the Identity layer is weird here; it serves as the zero point of the Q/DQ pair... cc @ttyio for visibility, should I file an internal bug for this?
[Screenshot of the ONNX graph showing the Q/DQ zero point produced by an Identity layer]

zerollzeng (Collaborator) commented

The zero point should be provided the same way as the y_scale, i.e. as a constant.
[Screenshot of the ONNX graph showing y_scale supplied as a constant input]

ttyio (Collaborator) commented Feb 6, 2024

@zerollzeng could you try the internal nightly build? We should already support this. If not, let's create a bug, thanks!

zerollzeng (Collaborator) commented

Filed internal bug 4491468.

zerollzeng (Collaborator) commented

Fixed in TRT 10, closed.

RuRo commented Feb 20, 2024

@zerollzeng Hi, is the fix currently available anywhere (EA? OSS?)? I am willing to build from source. If not, when/where can we expect it to become available?

zerollzeng (Collaborator) commented

Please wait for the TRT 10 release; I guess the EA will come out in March/April.
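In the meantime, a possible workaround on TRT 8.x (untested against this particular model, so it may or may not get past the assertion) is to constant-fold the exported ONNX so that the Identity feeding the zero point is collapsed into a plain Int8 initializer before the Q/DQ pair, for example with Polygraphy:

polygraphy surgeon sanitize model_quantized.onnx --fold-constants -o model_quantized_folded.onnx

and then build the engine from the folded model with trtexec as before.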

talcs (Author) commented Apr 24, 2024

Has the fix already been released? How can I get it on Jetson Orin?

zerollzeng (Collaborator) commented

Has the fix already been released? How can I get it on Jetson Orin?

You have to wait for the JetPack update, but JetPack has its own release schedule, so unfortunately I cannot help with this.
