
int8 input not supported for average pooling in MLIR_TRT #457

Open
farazkh80 opened this issue Dec 18, 2024 · 6 comments

Comments

@farazkh80
Collaborator

This happened when running test_dtype_constraints[avgpool-valid:T1-int8]:

summary = 'MTRTException: InternalError: failed to run compilation on module with symbol name: ins_t9521_outs_t9522_988\n\nAddit...s._api.MTRTException: InternalError: failed to run compilation on module with symbol name: ins_t9521_outs_t9522_988\n.'
details = ["IBuilder::buildSerializedNetwork: Error Code 1: Internal Error (Node [tensorrt.pooling] (t9522)cannot be quantized b...s=[1, 1, 1, 1], padding=[(0, 0), (0, 0), (0, 0), (0, 0)])\n      | ", '\n', '\nThis operation was introduced to ', ...]

    def raise_error(summary: str, details: List[Any] = []):
        """
        Raises a Tripy exception with a formatted message.
    
        Args:
            summary: A summary of the error message. This will be displayed before any other details.
            details: Details on the error. This function handles objects in this list as follows:
                - If they include a `stack_info` member, then information on the first user frame is displayed,
                    including file/line information as well as the line of code.
    
                    IMPORTANT: Any stack frames from the function registry are not displayed since
                    the function registry is an implementation detail used to dispatch to the real functions
                    we care about. Additionally, any code defined in the functions listed in ``EXCLUDE_FUNCTIONS``
                    is omitted.
    
                - In all other cases, the object is just converted to a string.
    
        Raises:
            TripyException
        """
    
        pre_summary = ""
        stack_info = utils.get_stack_info()
        user_frame_index = stack_info.get_first_user_frame_index()
        if user_frame_index is not None:
            stack_info.fetch_source_code()
            pre_summary = str_from_source_info(stack_info[user_frame_index])
    
        detail_msg = ""
        for detail in details:
            stack_info_message = None
            if hasattr(detail, "stack_info"):
                stack_info_message = str_from_stack_info(detail.stack_info)
            elif isinstance(detail, utils.StackInfo):
                stack_info_message = str_from_stack_info(detail)
    
            if stack_info_message is not None:
                detail_msg += stack_info_message
            else:
                detail_msg += str(detail)
    
        msg = f"{pre_summary}{summary}\n" + indent(detail_msg, " " * 4)
        # We use `from None` to suppress output from previous exceptions, since we want to handle them internally.
>       raise TripyException(msg) from None
E       tripy.common.exception.TripyException: 
E       
E       --> /tripy/tests/wrappers/test_interface.py:221 in _run_dtype_constraints_subtest()
E             |
E         221 |     ret_val.eval()
E             | 
E       
E       MTRTException: InternalError: failed to run compilation on module with symbol name: ins_t9521_outs_t9522_988
E       
E       Additional context:
E       Traceback (most recent call last):
E         File "/tripy/tripy/backend/mlir/compiler.py", line 86, in compile
E           executable = compiler.compiler_stablehlo_to_executable(
E       mlir_tensorrt.runtime._mlir_libs._api.MTRTException: InternalError: failed to run compilation on module with symbol name: ins_t9521_outs_t9522_988
E       .
E           IBuilder::buildSerializedNetwork: Error Code 1: Internal Error (Node [tensorrt.pooling] (t9522)cannot be quantized by arg0. You might want to add a DQ node before [tensorrt.pooling] (t9522).
E           )
E           (t9522)error: failed to translate function 'tensorrt_cluster' to a TensorRT engine
E       
E           This error occured while trying to compile the following FlatIR expression:
E                 |
E                 | t_inter2: [rank=(4), shape=((-1, -1, -1, -1)), dtype=(int8), loc=(gpu:0)] = ReduceWindowOp(t9521, t_inter3, reduce_mode='avg', window_dims=[1, 1, 2, 2], window_strides=[1, 1, 1, 1], padding=[(0, 0), (0, 0), (0, 0), (0, 0)])
E                 | 
E       
E           This operation was introduced to create the output of reduce `avg` operation..
E       
E           Note: This originated from the following expression:
E       
E           --> <string>:7 in <module>()
E       
E           Input 0:
E       
E           --> /tripy/tests/wrappers/object_builders.py:35 in tensor_builder()
E                 |
E              35 |         out = tp.cast(out, dtype=namespace[dtype])
E                 |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@pranavm-nvidia
Collaborator

@farazkh80 could you post the trace if you still have it?

@yizhuoz004
Collaborator

How do I reproduce this error? The test passes locally.

@pranavm-nvidia
Collaborator

@yizhuoz004 the error only happens when the pooling layer receives a runtime input. My suspicion is that the operation is being constant-folded in the other cases. It should repro if you tp.compile tp.avgpool and pass it an int8 input.
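For reference, a minimal repro sketch along those lines, assuming Tripy's tp.compile and tp.InputInfo APIs and avgpool argument names that mirror the trace output in this issue (the exact signatures are not verified here):

```python
import tripy as tp

def pool(x):
    # 2x2 average pooling, stride 1, no padding, mirroring the failing op in the traceback.
    return tp.avgpool(x, kernel_dims=[2, 2], stride=[1, 1], padding=[(0, 0), (0, 0)])

# Compiling with an int8 *input* (rather than a baked-in constant) should prevent
# constant folding and surface the TensorRT error at engine build time.
compiled = tp.compile(pool, args=[tp.InputInfo((1, 1, 8, 8), dtype=tp.int8)])

inp = tp.cast(tp.ones((1, 1, 8, 8)), dtype=tp.int8)
print(compiled(inp))
```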

@farazkh80
Collaborator Author

Here is the trace:

inputs:
    t35: [shape=([1, 1, 8, 8]), dtype=(int8), loc=(gpu:0)]
t36 = pooling(t35, kind=Kind.AVG, kernel_dims=[2, 2], stride=[1, 1], padding=[(0, 0), (0, 0)])
outputs:
    t36: [shape=([-1, -1, -1, -1]), dtype=(int8), loc=(gpu:0)]

@yizhuoz004
Collaborator

I can reproduce it by making the int8 tensor an input. This is most likely a TensorRT constraint; I will file a bug. We can waive the test for now. Also, torch average pooling does not support int8, so this should be a rare use case.
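If this needs to be unblocked before the TensorRT bug is resolved, one possible workaround sketch (not verified against this build) is to pool in a float dtype and cast back, in line with the 'add a DQ node before [tensorrt.pooling]' hint in the error message:

```python
import tripy as tp

def avgpool_int8_workaround(x):
    # Hypothetical helper: dequantize-like cast to float32, pool, then cast back to int8,
    # since TensorRT refuses to quantize the pooling node directly from an int8 input.
    x_fp = tp.cast(x, dtype=tp.float32)
    pooled = tp.avgpool(x_fp, kernel_dims=[2, 2], stride=[1, 1], padding=[(0, 0), (0, 0)])
    return tp.cast(pooled, dtype=tp.int8)
```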
