
How to convert T5_v1.1_xxl from ONNX to TRT engine? #2167

Closed
TracelessLe opened this issue Jul 19, 2022 · 8 comments

Comments


TracelessLe commented Jul 19, 2022

Description

I am trying to use the sample script in TensorRT/demo/HuggingFace/notebooks/t5.ipynb to convert the google/t5-v1_1-xxl 11B model to ONNX format and then to a TRT engine file. The PyTorch->ONNX step works fine, but when I load the ONNX model and convert it to a TRT engine, it always fails after running for about 2 hours with the error below:

tensorrt_model_path:  ./models/google/t5-v1_1-xxl/tensorrt
[07/19/2022-11:52:45] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/19/2022-11:53:14] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.5.1
[07/19/2022-13:36:09] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 13) [Constant] + (Unnamed Layer* 14) [Shuffle]...Mul_1732]}.)
[07/19/2022-13:36:09] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
Traceback (most recent call last):
  File "t5_onnx2trt.py", line 60, in <module>
    ).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine", profiles=[encoder_profile])
  File "/root/TensorRT-8.2.5.1/demo/HuggingFace/NNDF/models.py", line 426, in as_trt_engine
    profiles,
  File "/root/TensorRT-8.2.5.1/demo/HuggingFace/T5/export.py", line 293, in onnx_to_trt
    return super().onnx_to_trt(output_fpath, input_fpath, network_metadata, profiles)
  File "/root/TensorRT-8.2.5.1/demo/HuggingFace/NNDF/models.py", line 129, in onnx_to_trt    network_definition, config=self.trt_inference_config
  File "<string>", line 3, in func_impl
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/backend/base/loader.py", line 41, in __call__
    return self.call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/backend/trt/loader.py", line 645, in call_impl
    return engine_from_bytes(super().call_impl)
  File "<string>", line 3, in func_impl
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/backend/base/loader.py", line 41, in __call__
    return self.call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/backend/trt/loader.py", line 669, in call_impl
    buffer, owns_buffer = util.invoke_if_callable(self._serialized_engine)
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/util/util.py", line 646, in invoke_if_callable
    ret = func(*args, **kwargs)
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/backend/trt/loader.py", line 603, in call_impl
    G_LOGGER.critical("Invalid Engine. Please ensure the engine was built correctly")
  File "/root/miniconda3/envs/trt/lib/python3.7/site-packages/polygraphy/logger/logger.py", line 349, in critical
    raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Invalid Engine. Please ensure the engine was built correctly

The code I use is below:

batch_size = 1
T5_VARIANT = 'google/t5-v1_1-xxl'
max_sequence_length = T5ModelTRTConfig.MAX_SEQUENCE_LENGTH[T5_VARIANT]

# Encoder optimization profiles
encoder_profile = Profile()
encoder_profile.add(
    "input_ids",
    min=(batch_size, 1),
    opt=(batch_size, max_sequence_length // 2),
    max=(batch_size, max_sequence_length),
)

encoder_onnx_model_fpath = "t5-xxl-encoder.onnx"
metadata=NetworkMetadata(variant=T5_VARIANT, precision=Precision(fp16=True), other=T5Metadata(kv_cache=False))
t5_trt_encoder_engine = T5EncoderONNXFile(
                os.path.join(onnx_model_path, encoder_onnx_model_fpath), metadata
            ).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine", profiles=[encoder_profile])
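
For reference, here is a minimal standalone sketch of the same encoder build done directly with Polygraphy's TensorRT backend (which is what as_trt_engine wraps underneath). The paths, sequence lengths, and workspace size are illustrative assumptions, not values taken from the demo:

# Illustrative Polygraphy-only sketch of the encoder engine build (paths and sizes are assumptions).
from polygraphy.backend.trt import (
    CreateConfig,
    Profile,
    engine_from_network,
    network_from_onnx_path,
    save_engine,
)

onnx_fpath = "./models/google/t5-v1_1-xxl/onnx/t5-xxl-encoder.onnx"          # hypothetical path
engine_fpath = "./models/google/t5-v1_1-xxl/tensorrt/t5-xxl-encoder.onnx.engine"

# Dynamic-shape profile for the encoder input (sequence lengths assumed here).
profile = Profile().add(
    "input_ids",
    min=(1, 1),
    opt=(1, 256),
    max=(1, 512),
)

config = CreateConfig(
    tf32=True,
    fp16=False,                                  # FP32/TF32 build; see the FP16 NaN discussion below
    max_workspace_size=30 * 1024 * 1024 * 1024,  # ~30 GB builder workspace
    profiles=[profile],
)

engine = engine_from_network(network_from_onnx_path(onnx_fpath), config=config)
save_engine(engine, engine_fpath)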

P.S. I used the same Jupyter notebook to convert t5-small, t5-large and t5-3b with no problem; it only fails when I move to t5-v1.1-xxl... :(

Environment

TensorRT Version: 8.2.5.1
NVIDIA GPU: Tested on A100 and 3090Ti
NVIDIA Driver Version: 470.57.02
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.7
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.11.0+cu113
Baremetal or Container (if so, version):

Relevant Files

I found some similar issues, such as #1686, #1937 and #1917, and tried increasing the workspace, but it had no effect.

I enabled TRT verbose logging; the output from the run is attached below:

t5xxl_trt_log.txt

Steps To Reproduce


nrakltx commented Nov 7, 2022

@TracelessLe Could you please share how you fixed this? I would really appreciate it!

TracelessLe (Author) commented:

@TracelessLe Could you please share how you fixed this? I would really appreciate it!

Hi @nrakltx, I just increased the workspace in the TRT config (in the NNDF/models.py script) to 10 times the base value as below, and it succeeded:

From:

DEFAULT_TRT_WORKSPACE_MB = 3072

self.trt_inference_config = CreateConfig(
    tf32=True,
    fp16=network_metadata.precision.fp16,
    max_workspace_size=result.DEFAULT_TRT_WORKSPACE_MB * 1024 * 1024,
    profiles=profiles,
    obey_precision_constraints=result.use_obey_precision_constraints()
)

To:

DEFAULT_TRT_WORKSPACE_MB = 3072

self.trt_inference_config = CreateConfig(
    tf32=True,
    fp16=network_metadata.precision.fp16,
    max_workspace_size=result.DEFAULT_TRT_WORKSPACE_MB * 10 * 1024 * 1024,
    profiles=profiles,
    obey_precision_constraints=result.use_obey_precision_constraints()
)
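
For what it's worth, 3072 MB x 10 = 30720 MB, i.e. roughly the 30 GB of builder workspace discussed further down in this thread. On newer TensorRT/Polygraphy releases max_workspace_size is deprecated; a hedged equivalent (assuming a recent Polygraphy with memory-pool support) would look like:

# Sketch for newer Polygraphy/TensorRT, where the workspace is set via memory pool limits.
import tensorrt as trt
from polygraphy.backend.trt import CreateConfig

config = CreateConfig(
    tf32=True,
    fp16=network_metadata.precision.fp16,
    memory_pool_limits={trt.MemoryPoolType.WORKSPACE: 30 * 1024 * 1024 * 1024},  # ~30 GB
    profiles=profiles,  # same optimization profiles as above
)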


nrakltx commented Nov 8, 2022

This is with FP32 and not FP16, correct?

TracelessLe (Author) commented:

This is with FP32 and not FP16, correct?

Yes. Some NaN errors may occur when using FP16 with the T5 XXL model, as mentioned in:

  1. transformers issues
  2. huggingface discuss

You can give it a try. :)
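
If you want to check whether a given checkpoint is affected before building the engine, here is a minimal sketch that runs one FP16 forward pass in PyTorch and looks for NaNs (the prompt and decoder start token are illustrative; loading the FP16 XXL weights needs roughly 22 GB of GPU memory):

# Illustrative FP16 NaN check for google/t5-v1_1-xxl (assumes enough GPU memory for the FP16 weights).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/t5-v1_1-xxl", torch_dtype=torch.float16
).cuda().eval()

inputs = tokenizer("translate English to German: Hello world", return_tensors="pt").to("cuda")
decoder_input_ids = torch.zeros((1, 1), dtype=torch.long, device="cuda")  # decoder start token (pad id 0)

with torch.no_grad():
    logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits

print("NaNs in logits:", torch.isnan(logits).any().item())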


nrakltx commented Nov 8, 2022

Cool, so 30GB VRAM was enough for the FP32 T5 v1.1 XXL TensorRT engine building process?


drxmy commented Jan 3, 2023


Did you use an 80GB or a 40GB A100? I tried increasing DEFAULT_TRT_WORKSPACE_MB, but it gave an "OutOfMemory" message on both a 32GB V100 and a 40GB A100.


nrakltx commented Jan 3, 2023

80GB; 40GB is not enough. My average VRAM usage was around 45GB during compilation.
Note that if you have access to both versions of the GPU, you can build the engine on the 80GB card and run inference on the 40GB card.
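
Once the engine has been built on the larger card, loading it for inference is cheap in comparison. A minimal sketch with Polygraphy (the engine path, input shape, and dummy data are illustrative assumptions):

# Illustrative: deserialize a pre-built engine and run the encoder with Polygraphy.
import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

engine = EngineFromBytes(BytesFromPath("t5-xxl-encoder.onnx.engine"))  # hypothetical path

with TrtRunner(engine) as runner:
    input_ids = np.zeros((1, 128), dtype=np.int32)  # dummy token ids
    outputs = runner.infer(feed_dict={"input_ids": input_ids})
    print({name: arr.shape for name, arr in outputs.items()})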


drxmy commented Jan 3, 2023

80GB; 40GB is not enough. My average VRAM usage was around 45GB during compilation. Note that if you have access to both versions of the GPU, you can build the engine on the 80GB card and run inference on the 40GB card.

Thank you!
