[Bug]After converting sensevoice's onnx to trt via trtexec, an error is reported #4307

wjj19950828 · 2024-12-31T10:17:40Z

The SenseVoice ONNX model converts to a TensorRT engine through trtexec without problems, but the benchmark inference run then reports an error, as shown below:

ONNX Runtime (ORT) runs the same ONNX model successfully with the code below, which suggests that the ONNX model itself is fine:

import onnxruntime
import torch

option = onnxruntime.SessionOptions()
option.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
option.intra_op_num_threads = 1
providers = [
    "CUDAExecutionProvider" 
    if torch.cuda.is_available() else "CPUExecutionProvider"
]
model = onnxruntime.InferenceSession(
    "model_sensevoice.onnx",
    sess_options=option, providers=providers)

batch_size = 4
feats_length = 256
speech = torch.randn(batch_size, feats_length, 560).cuda()
speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()

ort_inputs = {
    'speech': speech.cpu().numpy(),
    'speech_lengths': speech_lengths.cpu().numpy(),
    'language': language.cpu().numpy(),
    'textnorm': textnorm.cpu().numpy(),
}
output = model.run(None, ort_inputs)[0]
print("output:", output, output.shape)

trtexec conversion command:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:1x128x560,speech_lengths:1,language:1,textnorm:1 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:8x512x560,speech_lengths:8,language:8,textnorm:8 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

TRT version: TensorRT-10.7.0.23
ONNX version: 1.17.0

What is causing this error? Thank you~

lix19937 · 2025-01-01T14:13:24Z

Try the following:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

wjj19950828 · 2025-01-02T13:00:24Z

@lix19937 Thanks for your reply~

Both the batch-size dimension and feats_length need to be dynamic, so is it reasonable to set the min, opt, and max shapes to the same value?

The ONNX model is exported with the following dynamic axes:

def export_dynamic_axes(self):
    return {
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens":  {0: "batch_size"},
    }

So are there any other solutions?

Thanks~
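To make the shape question concrete, here is a minimal sketch of the range rule that a min/max shape profile implies: a runtime shape is accepted only if every dimension lies between the profile's min and max. The helpers `parse_shapes` and `shape_in_profile` are hypothetical names written for this illustration, not part of trtexec or the TensorRT API; the shape strings are the ones from the two trtexec commands in this thread.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def parse_shapes(spec: str) -> Dict[str, Shape]:
    """Parse a trtexec-style spec such as 'speech:1x128x560,language:1'."""
    shapes: Dict[str, Shape] = {}
    for item in spec.split(","):
        name, dims = item.rsplit(":", 1)
        shapes[name] = tuple(int(d) for d in dims.split("x"))
    return shapes


def shape_in_profile(shape: Shape, min_shape: Shape, max_shape: Shape) -> bool:
    """True if every dimension of `shape` lies within [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


# Ranged profile from the original command.
RANGED_MIN = parse_shapes("speech:1x128x560,speech_lengths:1,language:1,textnorm:1")
RANGED_MAX = parse_shapes("speech:8x512x560,speech_lengths:8,language:8,textnorm:8")
# Fixed profile from the suggested command (min == opt == max).
FIXED = parse_shapes("speech:4x256x560,speech_lengths:4,language:4,textnorm:4")

if __name__ == "__main__":
    for request in [(4, 256, 560), (2, 300, 560)]:
        ranged_ok = shape_in_profile(request, RANGED_MIN["speech"], RANGED_MAX["speech"])
        fixed_ok = shape_in_profile(request, FIXED["speech"], FIXED["speech"])
        print(request, "ranged profile:", ranged_ok, "fixed profile:", fixed_ok)
```

Under this rule, a profile with identical min, opt, and max shapes admits exactly one input shape, so it is useful for isolating the error but does not cover variable batch size or feats_length.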

asfiyab-nvidia · 2025-01-02T23:07:53Z

@wjj19950828 does the speech_lengths input need to have its last dimension be feats_length? If yes, note that the trtexec command you provided fills the speech_lengths input with random values.
Can you try providing the same inputs as ORT using the --loadInputs flag in trtexec?
The --loadInputs flag accepts a binary file for each input. You can save a numpy array to a binary file using np_array.tofile("arr.bin").
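As a rough sketch of that suggestion, the snippet below writes the same kind of inputs used in the ORT test to raw binary files and assembles the `name:file` pairs that `--loadInputs` takes. The helpers `example_inputs` and `save_inputs` and the output directory are assumptions made for this illustration; the input names, shapes, and dtypes are taken from the ORT script earlier in the thread.

```python
import os
from typing import Dict

import numpy as np


def example_inputs(batch_size: int = 4, feats_length: int = 256) -> Dict[str, np.ndarray]:
    """Build inputs with the names, shapes, and dtypes used in the ORT test."""
    rng = np.random.default_rng(0)
    return {
        "speech": rng.standard_normal((batch_size, feats_length, 560)).astype(np.float32),
        "speech_lengths": np.array([6, 30, 31, feats_length], dtype=np.int32),
        "language": np.zeros(batch_size, dtype=np.int32),
        "textnorm": np.full(batch_size, 15, dtype=np.int32),
    }


def save_inputs(inputs: Dict[str, np.ndarray], out_dir: str = ".") -> str:
    """Write each array as raw bytes and return the value for --loadInputs."""
    pairs = []
    for name, arr in inputs.items():
        path = os.path.join(out_dir, f"{name}.bin")
        # tofile writes raw element bytes only: no shape or dtype header.
        np.ascontiguousarray(arr).tofile(path)
        pairs.append(f"{name}:{path}")
    return ",".join(pairs)


if __name__ == "__main__":
    print("--loadInputs=" + save_inputs(example_inputs()))
```

Because the files carry no shape or dtype information, the dtype of each array has to match what the engine expects for that input, and with a dynamic-shape engine the runtime shapes still have to be given to trtexec separately (for example through its --shapes flag).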

wjj19950828 · 2025-01-04T05:23:15Z

@asfiyab-nvidia Thanks for your reply!

After using --loadInputs, the benchmark inference runs without any problems, as shown below:

But when I run the TensorRT engine with the following script, I still get an error. What could be the reason? Thanks~
Script:

import tensorrt as trt
import torch
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

engine_filepath = 'engine_fp16.plan'
# input
batch_size = 4
feats_length = 256
# speech = torch.randn(batch_size, feats_length, 560, dtype=torch.float32).cuda()
# speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
# language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
# textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()
speech = torch.Tensor(np.fromfile('speech.bin', dtype=np.float32).reshape(batch_size, feats_length, 560)).cuda()
speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()
# output
ctc_logits = torch.empty(batch_size, feats_length + 4, 25055, dtype=torch.float32).cuda()
encoder_out_lens = torch.empty(batch_size, dtype=torch.int32).cuda()

with open(engine_filepath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    context.set_input_shape('speech', (batch_size, feats_length, 560))
    context.set_input_shape('speech_lengths', (batch_size,))
    context.set_input_shape('language', (batch_size,))
    context.set_input_shape('textnorm', (batch_size,))

    bindings = [speech.data_ptr(), speech_lengths.data_ptr(), language.data_ptr(), textnorm.data_ptr(), ctc_logits.data_ptr(), encoder_out_lens.data_ptr()]
    for i in range(len(bindings)):
        print("name:", engine.get_tensor_name(i))
        context.set_tensor_address(engine.get_tensor_name(i), bindings[i])
    handle = torch.cuda.current_stream().cuda_stream
    context.execute_async_v3(stream_handle=handle)

    print('all_binding_shapes_specified: ', context.all_binding_shapes_specified)

    print('ctc_logits shape: ', context.get_tensor_shape('ctc_logits'))
    print('encoder_out_lens', context.get_tensor_shape('encoder_out_lens'))
    print('ctc_logits: ', ctc_logits)
    print('encoder_out_lens: ', encoder_out_lens)
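One thing that may be worth checking in the script above, offered as an assumption and not as a confirmed diagnosis: `torch.Tensor(...)` constructs a tensor of the default dtype (float32 unless changed), so the int32 arrays loaded from `speech_lengths.bin`, `language.bin`, and `textnorm.bin` may end up on the GPU as float32 data. If the engine reads those buffers as int32, it would see the float bit patterns instead of the intended values. The NumPy-only sketch below shows what that reinterpretation looks like; the helper name `as_seen_by_int32_reader` is hypothetical.

```python
import numpy as np


def as_seen_by_int32_reader(values) -> np.ndarray:
    """Convert int32 values to float32, then reinterpret the bytes as int32."""
    as_float = np.asarray(values, dtype=np.int32).astype(np.float32)
    # view() reinterprets the same 4-byte elements without converting them.
    return as_float.view(np.int32)


if __name__ == "__main__":
    lengths = [6, 30, 31, 256]
    print("intended int32 values:      ", lengths)
    print("float32 bytes read as int32:", as_seen_by_int32_reader(lengths).tolist())
```

If this turns out to be the cause, building the tensors with `torch.from_numpy(arr).cuda()` keeps the NumPy dtype, and printing `speech_lengths.dtype` before execution is a quick way to confirm which case applies.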

error:

asfiyab-nvidia added the triaged Issue has been triaged by maintainers label Jan 2, 2025

asfiyab-nvidia self-assigned this Jan 2, 2025
