[Bug] After converting sensevoice's onnx to trt via trtexec, an error is reported #4307

Open
wjj19950828 opened this issue Dec 31, 2024 · 4 comments

@wjj19950828

The SenseVoice ONNX model converts to a TensorRT engine through trtexec without problems, but the benchmark inference that trtexec runs afterwards fails with the error shown below:
Image

ONNX Runtime (ORT) runs inference on the same ONNX model successfully with the code below, which suggests the ONNX model itself is fine:

import onnxruntime
import torch

option = onnxruntime.SessionOptions()
option.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
option.intra_op_num_threads = 1
providers = [
    "CUDAExecutionProvider" 
    if torch.cuda.is_available() else "CPUExecutionProvider"
]
model = onnxruntime.InferenceSession(
    "model_sensevoice.onnx",
    sess_options=option, providers=providers)

batch_size = 4
feats_length = 256
speech = torch.randn(batch_size, feats_length, 560).cuda()
speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()

ort_inputs = {
    'speech': speech.cpu().numpy(),
    'speech_lengths': speech_lengths.cpu().numpy(),
    'language': language.cpu().numpy(),
    'textnorm': textnorm.cpu().numpy(),
}
output = model.run(None, ort_inputs)[0]
print("output:", output, output.shape)

trtexec conversion command:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:1x128x560,speech_lengths:1,language:1,textnorm:1 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:8x512x560,speech_lengths:8,language:8,textnorm:8 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

TRT version: TensorRT-10.7.0.23
ONNX version: 1.17.0

What is causing this error? Thank you~

@lix19937

lix19937 commented Jan 1, 2025

Try the following command:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

@wjj19950828
Author

@lix19937 Thanks for your reply~

Both the batch-size dimension and feats_length need to be dynamic, so is it reasonable to set the min, opt, and max shapes to the same value?

The ONNX model is exported with the following dynamic axes:

def export_dynamic_axes(self):
    return {
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens":  {0: "batch_size"},
    }

So are there any other solutions?

Thanks~
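
For context on the question above: the min and max shapes bound what an engine accepts at runtime, so setting min, opt, and max to the same value produces an engine that accepts only that one shape. A minimal sketch of the range check, using the profile from the first trtexec command in this thread (the helper name `fits_profile` is hypothetical, not a TensorRT API):

```python
# Profile for the "speech" input, taken from the first trtexec command above.
MIN_SHAPE = (1, 128, 560)
MAX_SHAPE = (8, 512, 560)


def fits_profile(shape, min_shape=MIN_SHAPE, max_shape=MAX_SHAPE):
    """Return True if every dimension lies within [min, max] of the profile."""
    if len(shape) != len(min_shape):
        return False
    return all(lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape))


print(fits_profile((4, 256, 560)))   # batch 4, 256 frames: inside the range
print(fits_profile((16, 256, 560)))  # batch 16: above the max batch size
```

With the fixed-shape command suggested earlier, min and max would both be (4, 256, 560), so only that exact shape would pass this check.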

@asfiyab-nvidia asfiyab-nvidia added the triaged Issue has been triaged by maintainers label Jan 2, 2025
@asfiyab-nvidia asfiyab-nvidia self-assigned this Jan 2, 2025
@asfiyab-nvidia
Collaborator

@wjj19950828 does the speech_lengths input need to have its last dimension be feats_length? If yes, note that the trtexec command you provided fills the speech_lengths input with random values.
Can you try providing the same inputs as ORT using the --loadInputs flag in trtexec?
The --loadInputs flag accepts one binary file per input. You can save a numpy array to a binary file using np_array.tofile("arr.bin").
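
As a minimal sketch of that suggestion, the inputs from the ORT script could be written to raw binary files as follows. The file names and the temporary output directory are assumptions chosen for illustration; the shapes, dtypes, and values mirror the ORT script earlier in the thread.

```python
import os
import tempfile

import numpy as np

batch_size = 4
feats_length = 256

# Same shapes and dtypes as the inputs in the ORT script.
rng = np.random.default_rng(0)
inputs = {
    "speech": rng.standard_normal((batch_size, feats_length, 560)).astype(np.float32),
    "speech_lengths": np.array([6, 30, 31, feats_length], dtype=np.int32),
    "language": np.zeros(batch_size, dtype=np.int32),
    "textnorm": np.full(batch_size, 15, dtype=np.int32),
}

out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = {}
for name, arr in inputs.items():
    paths[name] = os.path.join(out_dir, f"{name}.bin")
    arr.tofile(paths[name])  # raw bytes only: no shape or dtype header is stored

# Build the value for trtexec's --loadInputs flag (comma-separated name:file pairs).
load_inputs = ",".join(f"{name}:{path}" for name, path in paths.items())
print(f"--loadInputs={load_inputs}")
```

Because tofile stores neither shape nor dtype, the runtime shapes still have to be given to trtexec separately, for example with --shapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 alongside --loadEngine=engine_fp16.plan.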

@wjj19950828
Author

wjj19950828 commented Jan 4, 2025

@asfiyab-nvidia Thanks for your reply!

After using --loadInputs, the trtexec benchmark inference runs without any problems, as shown below:
Image

But when I run the TRT engine with the following script, there is still an error. What is the reason? Thanks~
Script:

import tensorrt as trt
import torch
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

engine_filepath = 'engine_fp16.plan'
# input
batch_size = 4
feats_length = 256
# speech = torch.randn(batch_size, feats_length, 560, dtype=torch.float32).cuda()
# speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
# language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
# textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()
speech = torch.Tensor(np.fromfile('speech.bin', dtype=np.float32).reshape(batch_size, feats_length, 560)).cuda()
speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()
# output
ctc_logits = torch.empty(batch_size, feats_length + 4, 25055, dtype=torch.float32).cuda()
encoder_out_lens = torch.empty(batch_size, dtype=torch.int32).cuda()

with open(engine_filepath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    context.set_input_shape('speech', (batch_size, feats_length, 560))
    context.set_input_shape('speech_lengths', (batch_size,))
    context.set_input_shape('language', (batch_size,))
    context.set_input_shape('textnorm', (batch_size,))

    bindings = [speech.data_ptr(), speech_lengths.data_ptr(), language.data_ptr(), textnorm.data_ptr(), ctc_logits.data_ptr(), encoder_out_lens.data_ptr()]
    for i in range(len(bindings)):
        print("name:", engine.get_tensor_name(i))
        context.set_tensor_address(engine.get_tensor_name(i), bindings[i])
    handle = torch.cuda.current_stream().cuda_stream
    context.execute_async_v3(stream_handle=handle)

    print('all_binding_shapes_specified: ', context.all_binding_shapes_specified)

    print('ctc_logits shape: ', context.get_tensor_shape('ctc_logits'))
    print('encoder_out_lens', context.get_tensor_shape('encoder_out_lens'))
    print('ctc_logits: ', ctc_logits)
    print('encoder_out_lens: ', encoder_out_lens)

error:
Image
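
One thing that may be worth checking in the script above (an observation, not a diagnosis confirmed in this thread): as far as I know, torch.Tensor(...) builds a float32 tensor regardless of the numpy dtype, so the int32 arrays loaded with np.fromfile would reach the engine as float32 data, whereas torch.from_numpy(...) keeps the original dtype. The sketch below uses only the standard library to show what the bytes of a float32 value look like when read back as int32, which is what a binding that expects int32 would see. The helper name `float32_bits_as_int32` is hypothetical.

```python
import struct


def float32_bits_as_int32(value):
    """Reinterpret the 4 bytes of a float32 as a little-endian int32."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


# speech_lengths values from the script, as they would look after a float32 conversion.
for length in (6, 30, 31, 256):
    print(length, "->", float32_bits_as_int32(float(length)))
```

If the lengths really do arrive in this form, they would be far larger than feats_length, which is the same class of invalid input as the random values trtexec generated before --loadInputs was used.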
