[Feature request] allow uint8 output without an ICastLayer before #4282

Open
QMassoz opened this issue Dec 12, 2024 · 3 comments
Labels
Enhancement New feature or request quantization Issues related to Quantization triaged Issue has been triaged by maintainers

Comments

@QMassoz

QMassoz commented Dec 12, 2024

Context

I work in the broadcast sector, where frames processed by our TensorRT (TRT) engines can have various pixel formats (encoded or decoded, bit depths, color spaces, etc).

We developed a custom codec Plugin that converts all those pixel formats to and from fp16/fp32, enabling TRT to process these frames. This codec layer accepts a format input that specifies the pixel format of the input, allowing the plugin to determine the appropriate codec for conversion. This codec layer is inserted at the beginning and the end of our TRT engines.

Implementing this plugin with uint8 inputs and outputs simplifies the design and results in a cleaner implementation.

Problem

During the network-building stage, the following error is encountered:

Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (Network-level output tensor output has datatype UInt8 but is not produced by an IIdentityLayer or ICastLayer.)

Request

Provide a mechanism to allow my custom plugin to produce uint8 network outputs. There is no need for a casting layer here.

Dirty workaround

Our current workaround involves bypassing TRT's restrictions by misrepresenting the uint8 byte array as an fp16 array with half the number of elements. While this approach allows the engine to build, it is not ideal:

We serve the TRT engine via Triton using the TensorRT backend. The TRT datatype determines the datatype specified in the config.pbtxt. This datatype propagates to the Triton client, leading to potential discrepancies or confusion. A proper solution would remove the need for this workaround and ensure clean, consistent datatype handling.
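To make the aliasing in this workaround concrete, here is a minimal sketch using only the Python standard library. It is not the plugin's actual code: the helper names (`fp16_shape_for_uint8`, `alias_as_16bit`, `recover_uint8`) are hypothetical, and a plain 16-bit integer view stands in for fp16 storage, since only the 2-byte element width matters for the reinterpretation.

```python
# Hypothetical illustration of the aliasing workaround; not the plugin's real code.


def fp16_shape_for_uint8(shape):
    """Return the shape an fp16 tensor needs in order to alias a uint8 tensor.

    Only the innermost dimension is halved, so it must be even.
    """
    *outer, inner = shape
    if inner % 2 != 0:
        raise ValueError("innermost dimension must be even to alias as fp16")
    return (*outer, inner // 2)


def alias_as_16bit(buf):
    """View a uint8 buffer as 2-byte elements without copying."""
    return memoryview(buf).cast("B").cast("H")


def recover_uint8(words):
    """Undo the aliasing on the client side: view the 2-byte elements as bytes."""
    return memoryview(words).cast("B")


frame = bytes(range(16))           # 16 uint8 samples
aliased = alias_as_16bit(frame)    # what the engine would declare: 8 elements
restored = bytes(recover_uint8(aliased))
```

Every client has to know about this convention and apply the reverse view itself, because the datatype and shape it sees (fp16, half the elements) no longer describe the real payload.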

@QMassoz

QMassoz commented Dec 12, 2024

I would like to extend the request to more TensorFormat and DataType combinations.

  1. Fix the confusing documentation.
    Let's focus on kHWC kHALF and kHWC kFLOAT.

    • According to the C++ API and Python API references, both kHWC kHALF and kHWC kFLOAT are valid and supported.
    • According to section 6.10 of the developer guide, kHWC kHALF is not supported but kHWC kFLOAT is.
    • According to section 10.7.1 of the developer guide, neither kHWC kHALF nor kHWC kFLOAT is supported.
    • When developing a custom plugin, kHWC kFLOAT is supported but kHWC kHALF is not (IBuilder::buildSerializedNetwork: Error Code 9: Internal Error (/MyPlugin: could not find any supported formats consistent with input/output data types))
  2. Allow a custom plugin to produce any combination of TensorFormat and DataType that TensorRT supports internally.

@lix19937

uint8 output is currently not supported; see https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin_v2.html#af502120d140358f8d8139ade5005b9f5

Warning
for the format field, the values PluginFormat::kCHW4, PluginFormat::kCHW16, and PluginFormat::kCHW32 will not be passed in, this is to keep backward compatibility with TensorRT 5.x series. Use PluginV2IOExt or PluginV2DynamicExt for other PluginFormats.
DataType::kBOOL and DataType::kUINT8 are not supported.

You can also use a workaround (WAR); see #3959.

@QMassoz

QMassoz commented Dec 17, 2024

I am implementing my plugin with IPluginV3, and its documentation does not mention uint8 or any plugin format being unsupported (https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1v__1__0_1_1_i_plugin_v3_one_build.html#aa21f8c693d2c9e44908a8896a528fbbe). In fact, uint8 appears to be supported by IPluginV3, but a uint8 plugin output cannot be used as a network output, as shown by the error quoted above. It is possible that this error masks a different one about uint8 not being supported. Either way, the behavior is confusing and could be improved.

@asfiyab-nvidia asfiyab-nvidia added Enhancement New feature or request triaged Issue has been triaged by maintainers quantization Issues related to Quantization labels Dec 23, 2024