TensorRT fails to build engine from pytorch_quantization ONNX #3577

Closed
talcs opened this issue Dec 31, 2023 · 18 comments
Labels: internal-bug-tracked (Tracked internally, will be fixed in a future release), triaged (Issue has been triaged by maintainers)

talcs commented Dec 31, 2023

Description

I created a quantized model in PyTorch using pytorch_quantization and exported it to ONNX.
Then, I executed the following command on Jetson Orin:

/usr/src/tensorrt/bin/trtexec --onnx=model_quantized.onnx --int8 --saveEngine=model_quantized.trt
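For context, the quantize-and-export step with pytorch_quantization typically looks roughly like the sketch below. This is a minimal, hypothetical example: it assumes a torchvision ResNet-18 as a stand-in for the real model and skips the calibration step; it is not the exact script used here.

import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()                        # monkey-patch torch.nn layers with quantized variants
model = torchvision.models.resnet18().eval()      # stand-in model (assumption, not the model from this issue)

# ... calibrate the quantizers on representative data here ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True  # emit QuantizeLinear/DequantizeLinear nodes during ONNX export
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model_quantized.onnx", opset_version=13)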

Here is part of the trtexec output that includes the error:

[12/31/2023-11:17:12] [I] Start parsing network model
[12/31/2023-11:17:12] [I] [TRT] ----------------------------------------------------------------
[12/31/2023-11:17:12] [I] [TRT] Input filename:   model_quantized.onnx
[12/31/2023-11:17:12] [I] [TRT] ONNX IR version:  0.0.7
[12/31/2023-11:17:12] [I] [TRT] Opset version:    13
[12/31/2023-11:17:12] [I] [TRT] Producer name:    pytorch
[12/31/2023-11:17:12] [I] [TRT] Producer version: 1.12.1
[12/31/2023-11:17:12] [I] [TRT] Domain:           
[12/31/2023-11:17:12] [I] [TRT] Model version:    0
[12/31/2023-11:17:12] [I] [TRT] Doc string:       
[12/31/2023-11:17:12] [I] [TRT] ----------------------------------------------------------------
[12/31/2023-11:17:12] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/31/2023-11:17:13] [I] Finish parsing network model
[12/31/2023-11:17:13] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[12/31/2023-11:17:13] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[12/31/2023-11:17:13] [E] Error[2]: [qdqGraphOptimizer.cpp::matchInt8ConstantDQ::3582] Error Code 2: Internal Error (onnx::QuantizeLinear_898: Int8 constant is only allowed before DQ node)
[12/31/2023-11:17:13] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[12/31/2023-11:17:13] [E] Engine could not be created from network
[12/31/2023-11:17:13] [E] Building engine failed
[12/31/2023-11:17:13] [E] Failed to create engine from model or file.
[12/31/2023-11:17:13] [E] Engine set up failed

The error refers to the node QuantizeLinear_898, and the message is "Int8 constant is only allowed before DQ node".

Looking at the ONNX graph, I can see that there is a node related to QuantizeLinear_898 that has no input:

[Screenshot of the ONNX graph around QuantizeLinear_898, showing a related node with no input]

Any idea what went wrong and how to solve it?
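For reference, here is a rough sketch of how the inputs of the offending node can be inspected with the onnx Python package. Matching the node by the tensor name from the error ("898") is an assumption about how the exporter named things, not something taken from the model itself.

import onnx

model = onnx.load("model_quantized.onnx")
initializers = {init.name for init in model.graph.initializer}

for node in model.graph.node:
    names = list(node.input) + list(node.output)
    if node.op_type == "QuantizeLinear" and any("898" in n for n in names):
        # QuantizeLinear inputs are: x, y_scale, and an optional y_zero_point
        for name in node.input:
            source = "initializer" if name in initializers else "output of another node"
            print(f"{node.name}: input {name} ({source})")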

Environment

Model compilation:

TensorRT Version: TensorRT v8502 (Jetson Orin)

Model quantization and export to ONNX:

OS: Windows 10
Python Version (if applicable): 3.9.12
PyTorch Version (if applicable): 1.12.1+cu116
pytorch_quantization version: 2.1.3

talcs changed the title from "TensorRT fails to build computational graph from pytorch_quantization" to "TensorRT fails to build engine from pytorch_quantization ONNX" on Dec 31, 2023
zerollzeng (Collaborator) commented

What is the datatype of QuantizeLinear_338's input x?

zerollzeng self-assigned this on Jan 1, 2024
zerollzeng added the triaged label on Jan 1, 2024
zerollzeng (Collaborator) commented

Does the ONNX model work with ONNX Runtime? This can be quickly checked with polygraphy run model.onnx --onnxrt

Check https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy for more info.

talcs (Author) commented Jan 7, 2024

Yes, it passed:

> polygraphy run quantized_model.onnx --onnxrt
[I] RUNNING | Command: D:\Anaconda3\Scripts\polygraphy run quantized_model.onnx --onnxrt
[I] onnxrt-runner-N0-01/07/24-13:36:18  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[E] Module: 'torch' version '1.12.1+cu116' is installed, but version '>=1.13.0' is required.
    Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
    Attempting to continue with the currently installed version of this module, but note that this may cause errors!
[I] onnxrt-runner-N0-01/07/24-13:36:18
    ---- Inference Input(s) ----
    {input [dtype=float32, shape=(<<removed content>>)]}
[I] onnxrt-runner-N0-01/07/24-13:36:18
    ---- Inference Output(s) ----
    {output [dtype=float32, shape=(<<removed content>>)],
     649 [dtype=float32, shape=(<<removed content>>)],
     714 [dtype=float32, shape=(<<removed content>>)],
     726 [dtype=float32, shape=(<<removed content>>)],
     791 [dtype=float32, shape=(<<removed content>>)],
     803 [dtype=float32, shape=(<<removed content>>)]}
[I] onnxrt-runner-N0-01/07/24-13:36:18  | Completed 1 iteration(s) in 3696 ms | Average inference time: 3696 ms.
[I] PASSED | Runtime: 12.594s | Command: D:\Anaconda3\Scripts\polygraphy run quantized_model.onnx --onnxrt

zerollzeng (Collaborator) commented

Could you please share the onnx here? Thanks!

talcs (Author) commented Jan 11, 2024

Could you please share the onnx here? Thanks!

Would it be possible to send it to you in private?

zerollzeng (Collaborator) commented

Please share a private Google Drive link and I'll request access. Thanks!

talcs (Author) commented Jan 16, 2024

Great. The ONNX file is available here

talcs (Author) commented Jan 18, 2024

Could you please let me know when you have sent the file access request, so that I know it is you and can approve it?

I have already received access requests from two different users.

zerollzeng (Collaborator) commented

I've just requested access.

zerollzeng (Collaborator) commented

I can reproduce the issue, but the Identity layer is weird here; it serves as the zero point of the Q/DQ pair... cc @ttyio for visibility, should I file an internal bug for this?
[Screenshot of the ONNX graph showing the Q/DQ zero point produced by an Identity layer]

zerollzeng (Collaborator) commented

The zero point should be provided the same way as the y_scale, i.e. as a constant.
[Screenshot of the ONNX graph showing y_scale supplied as a constant input]

ttyio (Collaborator) commented Feb 6, 2024

@zerollzeng could you try the internal nightly build? We should already support this. If not, let's create a bug, thanks!

zerollzeng (Collaborator) commented

Filed internal bug 4491468.

zerollzeng (Collaborator) commented

Fixed in TRT 10, closed.

RuRo commented Feb 20, 2024

@zerollzeng Hi, is the fix currently available anywhere (EA? OSS?)? I am willing to build from source. If not, when/where can we expect it to become available?

zerollzeng (Collaborator) commented

Please wait for the TRT 10 release; I guess the EA will come out in March/April.
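In the meantime, a possible workaround on TRT 8.x (untested against this particular model, so it may or may not get past the assertion) is to constant-fold the exported ONNX so that the Identity feeding the zero point is collapsed into a plain Int8 initializer before the Q/DQ pair, for example with Polygraphy:

polygraphy surgeon sanitize model_quantized.onnx --fold-constants -o model_quantized_folded.onnx

and then build the engine from the folded model with trtexec as before.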

talcs (Author) commented Apr 24, 2024

Has the fix already been released? How can I get it on Jetson Orin?

zerollzeng (Collaborator) commented

Has the fix already been released? How can I get it on Jetson Orin?

You have to wait for the JetPack update, but JetPack has its own release schedule, so unfortunately I cannot help with this.
