Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueV3::2666, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) || getPtrOrNull(mOutputAllocators, profileObliviousIndex)) #4224
Comments
You can try to use …
Okay, so running the command, at the end I get the error below, even though running the model normally only gives the previous error I mentioned and not a segfault.
It seems that libtensorrt_scatter.so has a bug. You can modify the enqueue function to run as a dummy (as soon as it is entered, return 0;), then recompile the plugin and rerun trtexec.
I did that and the error is exactly the same, so I guess the bug is not in the enqueue function?
The error message in the OP usually indicates that a buffer was not properly assigned to an output binding of an engine. Given that trtexec doesn't throw the same error, it may be related to data-dependent shapes, where an OutputAllocator class needs to be provided and assigned to a data-dependent output binding. On the trtexec side, are you able to repro the same issue with the unquantized model? If the model can be shared for us to debug, that would be useful to have as well.
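For readers hitting the same message: a minimal sketch of such an allocator with the TensorRT Python API, assuming PyCUDA for device allocations and a data-dependent output binding named "output" (both assumptions, not taken from the attached model):

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)

class ScatterOutputAllocator(trt.IOutputAllocator):
    """Grows a device buffer on demand for a data-dependent output tensor."""

    def __init__(self):
        super().__init__()
        self.buffer = None
        self.size = 0
        self.shape = None

    def reallocate_output(self, tensor_name, memory, size, alignment):
        # Called by TensorRT once the actual output size is known.
        if size > self.size:
            self.buffer = cuda.mem_alloc(size)
            self.size = size
        return int(self.buffer) if self.buffer is not None else 0

    def notify_shape(self, tensor_name, shape):
        # Final shape of the data-dependent output.
        self.shape = tuple(shape)

# Hypothetical usage, before calling execute_async_v3:
#   allocator = ScatterOutputAllocator()
#   context.set_output_allocator("output", allocator)
```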
With the unquantized model, there is no issue at all. The issue in the OP comes when I try to replace the unquantized model with the INT8 version of it. I attach the ONNX files for the quantized …
Finally got around to taking a look at the issue. Using TensorRT 8.6, even with the non-quantized model I'm seeing a similar CUDA error:
I'm debugging further (trying to repro with TensorRT 10.X). Were you able to successfully run the non-quantized model with the same command?
You're right, I get the same error with …, but when I run the model normally I don't get any error; it runs properly.
What do you mean when you say you run the model "normally"? Is it in your inference script? I've modified the model to use the native ScatterElements op so that it works with TensorRT 10.X, whose plugin library includes the ScatterElements plugin. After doing so, I'm seeing the following error:
This looks to be an internal builder issue. I've filed an internal bug to track this.
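For anyone wanting to reproduce that step, one way to swap a custom scatter node for the native ONNX ScatterElements is onnx-graphsurgeon; the custom op name and file paths below are placeholders, not the ones in the attached model:

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("gnn.onnx"))

for node in graph.nodes:
    # "TRT_ScatterElements" stands in for whatever op name the custom plugin node uses.
    if node.op == "TRT_ScatterElements":
        node.op = "ScatterElements"
        # Keep the axis and request the additive reduction (needs opset >= 16).
        node.attrs = {"axis": node.attrs.get("axis", 0), "reduction": "add"}

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "gnn_native_scatter.onnx")
```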
Yes, indeed: by "normally" I mean running inference inside our pipeline. There is no error and the results are as expected, so with the unquantized GNN the plugin seems to be running as expected. The problem (the OP) starts when I try passing the quantized GNN ONNX file with INT8 precision. Regarding the error, what does that mean? That it's in the internal builder? Does it mean that there's nothing we can do at this point?
Hi, do we have any news on this? |
I'm wondering what the difference is between the code in your pipeline and trtexec. We use trtexec as our baseline, and I'm unable to run inference through trtexec with your attached model. We've made some progress internally and believe that the root cause of the illegal CUDA memory access stems from the scatter nodes. I've modified the custom scatter nodes back to regular …
In our case, we use the scatter-add operation for the message passing of the GNN, and the …
The ScatterAdd operation doesn't work if the ranks of the inputs do not match, and in ONNX the shapes of indices and updates must match as well. This looks to be an export issue where the shape of …
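To illustrate the constraint with a toy example (the shapes chosen here are assumptions, not the attached model's): PyTorch's scatter_add_ requires the index to have the same rank as src, and the ONNX ScatterElements it exports to requires indices and updates to have identical shapes:

```python
import torch

messages = torch.randn(6, 16)           # one message per edge (the "updates"/src)
edge_dst = torch.randint(0, 4, (6,))    # destination node id per edge (the "indices")
out = torch.zeros(4, 16)

# Raises a RuntimeError: index is rank 1 while src is rank 2.
# out.scatter_add_(0, edge_dst, messages)

# Expanding the index to src's shape satisfies both PyTorch and ONNX ScatterElements.
index = edge_dst.unsqueeze(-1).expand(-1, messages.size(1))
out.scatter_add_(0, index, messages)
```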
Is this due to something on our side? |
Yes, it looks to be a model definition issue. Can you double-check the scatter module in your PyTorch model?
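For comparison, a sketch of a scatter-add aggregation module that exports with matching indices/updates shapes; the module and argument names are illustrative, not the reporter's code:

```python
import torch
import torch.nn as nn

class ScatterAggregate(nn.Module):
    """Sums per-edge messages into per-node features via scatter_add."""

    def forward(self, messages: torch.Tensor, edge_dst: torch.Tensor, num_nodes: int) -> torch.Tensor:
        # messages: (num_edges, feat_dim), edge_dst: (num_edges,)
        index = edge_dst.unsqueeze(-1).expand_as(messages)   # (num_edges, feat_dim)
        out = messages.new_zeros(num_nodes, messages.size(1))
        return out.scatter_add_(0, index, messages)
```

Exported with opset 16 or newer, this should lower to a single ScatterElements node with reduction="add".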
Description
We have a PyTorch GNN model that we run on an NVIDIA GPU with TensorRT (TRT). For the scatter_add operation we are using the ScatterElements plugin for TRT. We are now trying to quantize the model.
We are following the same procedure that worked for the quantization of a simple multilayer perceptron. After quantizing to INT8 with pytorch-quantization and exporting to ONNX, I pass the model to TRT with precision=INT8 without errors. However, at runtime I get the error shown in the title.
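For context, a condensed sketch of the pytorch-quantization export flow we mean; the model constructor, calibration step, and file names are placeholders rather than the exact script:

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Swap supported torch layers for quantized counterparts before building the model.
quant_modules.initialize()
model = build_gnn().eval().cuda()        # build_gnn() is a placeholder

# ... run calibration data through the model and load the amax values here ...

# Export fake-quant (Q/DQ) nodes that TensorRT's INT8 path understands.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy_inputs = make_dummy_inputs()       # placeholder matching the model's signature
torch.onnx.export(model, dummy_inputs, "gnn_int8.onnx", opset_version=17)
```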
The plugin states that it does not support INT8, but I do not see why it cannot be left at FP32 precision while the rest of the model is quantized. Any idea what is causing the problem?
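In principle that is possible at build time: request INT8 globally, ask the builder to obey per-layer precision constraints, and pin the plugin layer to FP32. A hedged sketch with the TensorRT Python API follows; the file name and the way the plugin layer is identified are assumptions:

```python
import ctypes
import tensorrt as trt

# The custom scatter plugin .so must be loaded before parsing the ONNX file.
ctypes.CDLL("libtensorrt_scatter.so")

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")

builder = trt.Builder(logger)
# EXPLICIT_BATCH is needed on TensorRT 8.x; it is deprecated/absent on newer versions.
flags = 0
if hasattr(trt.NetworkDefinitionCreationFlag, "EXPLICIT_BATCH"):
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)

parser = trt.OnnxParser(network, logger)
with open("gnn_int8.onnx", "rb") as f:               # path is illustrative
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
# Honour per-layer precision requests instead of letting the builder override them.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Keep plugin layers (the ScatterElements plugin here) in FP32.
    if layer.type == trt.LayerType.PLUGIN_V2:
        layer.precision = trt.float32

engine_bytes = builder.build_serialized_network(network, config)
```

Note that with a Q/DQ (explicitly quantized) ONNX model, INT8 placement is already driven by the Q/DQ nodes, so whether this helps depends on where the Q/DQ pairs sit around the plugin.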