Triton 24.05 crashes on Ubuntu when loading TensorRT RetinaNet model trained with TAO #7397

mar-jas commented Jul 1, 2024

Description
Triton 24.05 crashes without any additional information during model initialization; see the log below.
We use a RetinaNet model trained with the TAO Toolkit (two versions checked: 5.0.0 and 5.3.0), run with the TensorRT backend (platform: "tensorrt_plan"). We used it successfully with the older Triton 23.05.

The model.plan file is converted from the ONNX file using trtexec inside the tritonserver:24.05 container. We checked that trtexec successfully performs inference with this model.plan. We have also run the model on a Jetson AGX Orin with JetPack 6 and Triton 24.05 without any problem, so we assume the issue is specific to Triton on Ubuntu (x86) rather than to the model itself.
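
For reference, a rough sketch of the conversion and standalone check run inside the container (the first command is the one from the repro steps below; the exact flags of the verification run are from memory, and the --shapes value is an assumption that simply matches the Input dims in the config further down):

# build the engine (same command as in the repro steps)
trtexec --onnx=model.onnx --saveEngine=model.plan
# standalone inference check with the freshly built engine
trtexec --loadEngine=model.plan --shapes=Input:1x3x512x608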

Triton Information
tritonserver:24.05

Are you using the Triton container or did you build it yourself?
downloaded: nvcr.io/nvidia/tritonserver:24.05-py3

To Reproduce

  • Train RetinaNet model with TAO Toolkit v5.3.0 (also with v5.0.0)
  • Export to ONNX with “tao model retinanet export …”
  • Run triton container: nvcr.io/nvidia/tritonserver:24.05-py3
  • Convert to plan: “trtexec --onnx=model.onnx --saveEngine=model.plan”
  • Use model.plan in a Triton model repository with the config.pbtxt below

name: "v5.3.0_20240624-internal"
platform: "tens

orrt_plan"
input {
name: "Input"
data_type: TYPE_FP32
dims: [-1, 3, 512, 608]
}
output {
name: "NMS"
data_type: TYPE_FP32
dims: [-1, 1, 200, 7]
}
output {
name: "NMS_1"
data_type: TYPE_FP32
dims: [-1, 1, 1, 1]
}
default_model_filename: "model.plan"
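
For completeness, a sketch of the model repository layout and launch commands we use (mount path and docker flags here are placeholders, not the exact ones from our setup; the tritonserver invocation matches the log below; the repository also contains a second model, v5.3.0_20240624, omitted here):

/models/
  v5.3.0_20240624-internal/
    config.pbtxt
    1/
      model.plan

docker run --gpus all -it --rm -v /path/to/models:/models nvcr.io/nvidia/tritonserver:24.05-py3
tritonserver --model-repository=/models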

log:

root@53c28d76c621:/work# tritonserver --model-repository=/models
I0627 14:08:29.972225 2613 pinned_memory_manager.cc:275] "Pinned memory pool is created at '0x7fdb4c000000' with size 268435456"
I0627 14:08:29.972367 2613 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0627 14:08:29.973512 2613 model_lifecycle.cc:472] "loading: v5.3.0_20240624:1"
I0627 14:08:29.973531 2613 model_lifecycle.cc:472] "loading: v5.3.0_20240624-internal:1"
I0627 14:08:29.982372 2613 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0627 14:08:29.982382 2613 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0627 14:08:29.982387 2613 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0627 14:08:29.982390 2613 tensorrt.cc:105] "backend configuration:\n{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}"
I0627 14:08:29.982574 2613 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: v5.3.0_20240624-internal (version 1)"
I0627 14:08:30.050148 2613 logging.cc:46] "Loaded engine size: 91 MiB"
Segmentation fault (core dumped)
root@53c28d76c621:/work# I0627 14:08:30.329108 2637 pb_stub.cc:2121] Non-graceful termination detected.

Expected behavior
The model should be initialised and ready for use, as with the older Triton 23.05.
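
When a model loads correctly, its readiness can be confirmed via Triton's standard HTTP endpoint; a minimal sketch, assuming the default HTTP port 8000 is reachable from the host:

# returns HTTP 200 once the model is loaded and ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/v5.3.0_20240624-internal/ready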
