
num_io_tensors get error of TensorRT 8.5 when running on GPU 4090 #3803

Open
peter5232 opened this issue Apr 16, 2024 · 7 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@peter5232

Description

I have four input tensors [ "kpts0", "kpts1", "desc0", "desc1" ].

torch.onnx.export(
            lightglue,
            (kpts0, kpts1, desc0, desc1),
            lightglue_path,
            input_names=["kpts0", "kpts1", "desc0", "desc1"],
            output_names=["matches0", "mscores0"],
            opset_version=17,
            dynamic_axes={
                "kpts0": {1: "num_keypoints0"},
                "kpts1": {1: "num_keypoints1"},
                "desc0": {1: "num_keypoints0"},
                "desc1": {1: "num_keypoints1"},
                "matches0": {0: "num_matches0"},
                "mscores0": {0: "num_matches0"},
            },
        )
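As a quick sanity check (this snippet is illustrative, not part of the original report), plain Python can verify that every key in dynamic_axes matches a declared input or output name, since torch.onnx.export typically only warns, rather than errors, on mismatched keys:

```python
# Names copied from the torch.onnx.export call above.
input_names = ["kpts0", "kpts1", "desc0", "desc1"]
output_names = ["matches0", "mscores0"]
dynamic_axes = {
    "kpts0": {1: "num_keypoints0"},
    "kpts1": {1: "num_keypoints1"},
    "desc0": {1: "num_keypoints0"},
    "desc1": {1: "num_keypoints1"},
    "matches0": {0: "num_matches0"},
    "mscores0": {0: "num_matches0"},
}

# Every dynamic_axes key must be a declared input or output name.
declared = set(input_names) | set(output_names)
unknown = sorted(set(dynamic_axes) - declared)
print(unknown)  # [] -> all keys are valid
```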

I convert the engine with the following command (ONNX file attached).

trtexec --onnx=superpoint_lightglue.onnx --saveEngine=superpoint_lightglue.engine

But when I use the Python API to list the I/O tensors, I only get desc0, desc1, matches0, mscores0.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

with open("superpoint_lightglue.engine", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
        tensor_names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
        print(tensor_names)

I get the following output.

['desc0', 'desc1', 'matches0', 'mscores0']
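One plausible explanation (an assumption, not confirmed in this thread) is that kpts0 and kpts1 are declared as graph inputs but, after export-time tracing and constant folding, no node actually consumes them, so the TensorRT builder drops them from the engine's I/O set. The pruning idea can be sketched with a toy graph representation (plain Python, not the real ONNX or TensorRT API; the node list is hypothetical):

```python
# Toy stand-in for an ONNX graph: declared inputs plus a node list,
# where each node records the tensor names it reads.
declared_inputs = ["kpts0", "kpts1", "desc0", "desc1"]
nodes = [
    {"op": "MatMul", "inputs": ["desc0", "desc1"]},  # hypothetical consumer
    {"op": "TopK", "inputs": ["scores"]},            # reads an internal tensor
]

# An input survives into the engine only if some node reads it.
consumed = {name for node in nodes for name in node["inputs"]}
engine_inputs = [name for name in declared_inputs if name in consumed]
print(engine_inputs)  # ['desc0', 'desc1'] -- kpts0/kpts1 are pruned
```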

Environment

TensorRT Version: v8.5.3 and v8.6.1

NVIDIA GPU: 4090

NVIDIA Driver Version: 535.129.03

CUDA Version: 11.8

CUDNN Version: 8.9.6

Operating System:

Python Version (if applicable): 3.11

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 2.1.0

Baremetal or Container (if so, version):

@lix19937

lix19937 commented Apr 20, 2024

You can try the following:

trtexec --onnx=superpoint_lightglue.onnx  --loadEngine=superpoint_lightglue.engine   --verbose  2>&1 |tee log   

cat log |grep "Using random values for input"   
cat log |grep "Using random values for output"   

These commands will show all inputs and outputs.

@peter5232
Author

I tried this command and got the following output.

[04/21/2024-23:12:14] [I] Using random values for input desc0
[04/21/2024-23:12:14] [I] Using random values for input desc1

So the engine actually has only two inputs, but the ONNX file declares four input tensors.

torch.onnx.export(
            lightglue,
            (kpts0, kpts1, desc0, desc1),
            lightglue_path,
            input_names=["kpts0", "kpts1", "desc0", "desc1"],
            output_names=["matches0", "mscores0"],
            opset_version=17,
            dynamic_axes={
                "kpts0": {1: "num_keypoints0"},
                "kpts1": {1: "num_keypoints1"},
                "desc0": {1: "num_keypoints0"},
                "desc1": {1: "num_keypoints1"},
                "matches0": {0: "num_matches0"},
                "mscores0": {0: "num_matches0"},
            },
        )

@lix19937

@peter5232
can you run the following command

trtexec --onnx=superpoint_lightglue.onnx  --saveEngine=superpoint_lightglue.engine  --verbose 2>&1 | tee  build.log

and then upload the build.log file?

@zerollzeng
Collaborator

What does polygraphy inspect model superpoint_lightglue.onnx output? And how many inputs can you see in Netron?

@zerollzeng zerollzeng self-assigned this Apr 25, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 25, 2024
@lix19937

lix19937 commented May 4, 2024

Checking inputs/outputs with Netron is not always reliable; sometimes Netron cannot see hidden inputs/outputs.

@lix19937

lix19937 commented May 6, 2024

@zerollzeng I came across one case where an ONNX file (39 MB) opened in Netron shows nothing, but trtexec can still build it successfully.

[05/06/2024-11:23:47] [I] Engine deserialized in 0.113882 sec.
[05/06/2024-11:23:47] [V] [TRT] Total per-runner device persistent memory is 0
[05/06/2024-11:23:47] [V] [TRT] Total per-runner host persistent memory is 0
[05/06/2024-11:23:47] [V] [TRT] Allocated activation device memory of size 0
[05/06/2024-11:23:47] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 39 (MiB)
[05/06/2024-11:23:47] [I] Setting persistentCacheLimit to 0 bytes.
[05/06/2024-11:23:47] [V] Using enqueueV3.
[05/06/2024-11:23:47] [I] Using random values for output 82
[05/06/2024-11:23:47] [I] Created output binding for 82 with dimensions 1x256x200x200
[05/06/2024-11:23:47] [I] Starting inference
[05/06/2024-11:23:50] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[05/06/2024-11:23:50] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[05/06/2024-11:23:50] [I]
[05/06/2024-11:23:50] [I] === Profile (1032 iterations ) ===
[05/06/2024-11:23:50] [I]                                            Layer   Time (ms)   Avg. Time (ms)   Median Time (ms)   Time %
[05/06/2024-11:23:50] [I]  Reformatting CopyNode for Output Tensor 0 to 82      384.10           0.3722             0.3758    100.0
[05/06/2024-11:23:50] [I]                                            Total      384.10           0.3722             0.3758    100.0
[05/06/2024-11:23:50] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # trtexec --onnx=positional_encoding_poly.onnx --verbose --dumpProfile
