feat: Client-side input shape/element validation #742
base: main
Conversation
src/c++/library/CMakeLists.txt
Outdated
@@ -275,6 +277,10 @@ if(TRITON_ENABLE_CC_HTTP OR TRITON_ENABLE_PERF_ANALYZER)
  http-client-library EXCLUDE_FROM_ALL OBJECT
  ${REQUEST_SRCS} ${REQUEST_HDRS}
)
add_dependencies(
  http-client-library
  proto-library
Isn't proto-library used for protobuf<->grpc? Why is it needed for HTTP client?
edit: guessing the requirement is here.
Are there any concerns with introducing the new protobuf dependency to the HTTP client, or any alternatives? CC @GuanLuo @tanmayv25
There is a known issue with TensorRT (Jira DLIS-6805) which causes TRT tests to fail again at the client (CI job 102924904). There is no way to know the platform of the inference model on the client side. Should we wait until @pskiran1 finishes his change first?
@yinggeh, I just merged DLIS-6805 changes, could you please try with the latest code?
src/c++/library/common.cc
Outdated
@@ -232,6 +236,26 @@ InferInput::SetBinaryData(const bool binary_data)
  return Error::Success;
}

Error
InferInput::ValidateData() const
Moving TRT reformat conversation to a thread 🧵
Yingge:
There is a known issue with TensorRT (Jira DLIS-6805) which causes TRT tests to fail again at the client (CI job 102924904). There is no way to know the platform of the inference model on the client side. Should we wait until @pskiran1 finishes his change first?
CC @tanmayv25 @GuanLuo @rmccorm4
Sai:
@yinggeh, I just merged DLIS-6805 changes, could you please try with the latest code?
I just merged DLIS-6805 changes, could you please try with the latest code?
Sai's changes will allow the check to work on the core side, but probably not on the client side, right? @yinggeh
There is no way to know the platform of the inference model on the client side.
You can query the platform/backend through the model config APIs on the client side, which would work when inferring on a TensorRT model directly. You can probably even query is_non_linear_format_io from the model config if needed (sketched below).
For an ensemble model containing one of these TRT models with non-linear inputs, you may need to follow the ensemble definition to find out if it's calling a TRT model with its inputs, which can be a pain. It may be simpler to skip the check on ensemble models and let the core check handle it (but it feels like we're starting to introduce a lot of special checks and cases with this feature).
For a BLS model, I think it's fine: it will work like any other Python model, and it will trigger the core check internally if the BLS calls the TRT model.
CC @tanmayv25
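A minimal sketch of that client-side query, assuming a running Triton server, the standard tritonclient package, and a hypothetical model name my_trt_model (is_non_linear_format_io is the config field mentioned above):

```python
import tritonclient.grpc as grpcclient

# Connect to the server's gRPC endpoint (address is an assumption).
client = grpcclient.InferenceServerClient("localhost:8001")

# as_json=True returns the model config as a plain dict.
config = client.get_model_config("my_trt_model", as_json=True)["config"]

if config.get("platform") == "tensorrt_plan":
    # If any input is marked non-linear, its data byte size can legitimately
    # differ from the size implied by its shape, so the client-side check
    # would have to be skipped.
    non_linear = any(
        inp.get("is_non_linear_format_io", False)
        for inp in config.get("input", [])
    )
    print("skip client-side byte-size check:", non_linear)
```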
Another alternative is to introduce a new flag on client Input/Output tensors to skip the byte size check on the client side.
We can document when we expect the user to provide this option (when using a non-linear format), so the user is aware of what they are doing. A sketch follows the list below.
Pros:
- A generic API change allows for all the flexibility.
- Powerful expressiveness for the client-side code.
Cons:
- Adding a flag to skip these checks seems counter-intuitive and makes us question even the requirement of such checks in the first place.
  a. This can be alleviated to some degree by an additional check validating that the skip-byte-size-check flag is set for the correct scenario.
- Breaks backwards compatibility, as the user now has to set a new flag to use models with non-linear tensors.
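A hypothetical sketch of the proposed flag; set_skip_byte_size_check is not part of the current tritonclient API and is named here purely for illustration:

```python
import numpy as np
import tritonclient.http as httpclient

inp = httpclient.InferInput("INPUT0", [16, 1], "FP32")
inp.set_data_from_numpy(np.zeros((16, 1), dtype=np.float32))

# Proposed opt-out for non-linear (e.g. TRT reformat-free) IO, where the data
# byte size may legitimately differ from the shape-implied size.
inp.set_skip_byte_size_check(True)  # hypothetical method, not in the API today
```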
For an ensemble model containing one of these TRT models with non-linear inputs, you may need to follow the ensemble definition to find out if it's calling a TRT model with its inputs, which can be a pain.
@rmccorm4 Can you elaborate on this?
Can you elaborate on this?
If you have an ensemble with ENSEMBLE_INPUT0 where the first step is a TRT model with non-linear IO INPUT0 and a mapping of ENSEMBLE_INPUT0 -> INPUT0, do we require an ensemble config to mention that the ENSEMBLE_INPUT0 is non-linear IO too? Or is it inferred internally?
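For context, a rough sketch (field names assumed from the model config schema; my_ensemble is a placeholder) of what "following the ensemble definition" client-side might look like, illustrating why it can be a pain:

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")

def ensemble_feeds_trt(ensemble_name):
    """Return True if any ensemble step feeds the ensemble's own inputs
    directly into a TensorRT model."""
    cfg = client.get_model_config(ensemble_name, as_json=True)["config"]
    if cfg.get("platform") != "ensemble":
        return False
    ensemble_inputs = {i["name"] for i in cfg.get("input", [])}
    for step in cfg["ensemble_scheduling"]["step"]:
        # input_map maps the step model's input name -> upstream tensor name.
        feeds = step.get("input_map", {}).values()
        if any(tensor in ensemble_inputs for tensor in feeds):
            sub = client.get_model_config(step["model_name"], as_json=True)
            if sub["config"].get("platform") == "tensorrt_plan":
                return True
    return False

print(ensemble_feeds_trt("my_ensemble"))
```

(This only handles one level of nesting; nested ensembles would need recursion, which is part of the pain being described.)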
Adding a flag to skip these checks seems to be counter-intuitive and makes us question even the requirement of such checks in the first place
+1, I also think this is counter-intuitive to the goal.
This can be alleviated by an additional check to some degree by validating the skip_byte_size check flag is set for the correct scenario.
If we are able to internally determine "the correct scenario" programmatically, isn't this the same as being able to skip internally without user specification?
cnt += self._raw_content != None
cnt += "shared_memory_region" in self._input.parameters
if cnt != 1:
    return
Shouldn't we return an error when more than one field is specified in the inputs?
This error is handled by the server.
data_num_elements = num_elements(self._data_shape)
if expected_num_elements != data_num_elements:
    raise_error(
        "input '{}' got unexpected elements count {}, expected {}".format(
Can you also include the respective shapes in the error message? Something like:
input 'XYZ' got unexpected elements count 8 (shape: 8,1), expected 16 (shape: 16,1)
I think you are trying to keep supporting the case where a user might just want to call set_shape() with a different shape but the same underlying data.
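A sketch of what that message could look like; the attribute names on self._input are assumptions mirroring the diff above:

```python
raise_error(
    "input '{}' got unexpected elements count {} (shape: {}), "
    "expected {} (shape: {})".format(
        self._input.name,  # assumed attribute
        data_num_elements,
        self._data_shape,
        expected_num_elements,
        self._input.shape,  # assumed attribute
    )
)
```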
Are shapes (8,2), (4,4) also valid?
What does the PR do?
Add a client-side input size check to make sure the byte size implied by the input shape matches the byte size of the input data; see the sketch below.
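A minimal sketch, with hypothetical tensor names, of the kind of mismatch this check catches before the request leaves the client:

```python
import numpy as np
import tritonclient.http as httpclient

inp = httpclient.InferInput("INPUT0", [8, 1], "FP32")
inp.set_data_from_numpy(np.zeros((8, 1), dtype=np.float32))

# Re-declare a shape that no longer matches the 8 elements of attached data;
# with this change the client raises an error at request time instead of
# sending the malformed request to the server.
inp.set_shape([16, 1])
```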
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/server#7427
Where should the reviewer start?
src/c++/library/common.cc
src/python/library/tritonclient/grpc/_infer_input.py
src/python/library/tritonclient/http/_infer_input.py
Test plan:
n/a
17202351
Caveats:
Shared memory byte size checks for string inputs are not implemented.
Background
Stop malformed input requests on the client side before they are sent to the server.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Relates to triton-inference-server/server#7171