Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Client-side input shape/element validation #7427

Draft
wants to merge 3,378 commits into
base: main
Choose a base branch
from

Conversation

yinggeh
Copy link
Contributor

@yinggeh yinggeh commented Jul 9, 2024

What does the PR do?

Add client input size check to make sure input shape byte size matches input data byte size.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • test

Related PRs:

triton-inference-server/client#742

Where should the reviewer start?

Should look at triton-inference-server/client#742 first.

Test plan:

n/a

  • CI Pipeline ID:
    17202351

Caveats:

Shared memory byte size checks for string inputs is not implemented.

Background

Stop malformed input request at client side before sending to the server.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to #7171

kthui and others added 30 commits October 13, 2023 13:56
* Fix gRPC test failure and refactor

* Add gRPC AsyncIO cancellation tests

* Better check if a request is cancelled

* Use f-string
* Fixing torch version for vllm
* Switch Jetson model TensorRT models generation to container

* Adding missed file

* Fix typo

* Fix typos

* Remove extra spaces

* Fix typo
* Ensure notify_state_ gets properly destructed

* Fix inflight state tracking to properly erase states

* Prevent removing the notify_state from being erased

* Wrap notify_state_ object within unique_ptr
* TRTLLM backend post release

* TRTLLM backend post release

* Update submodule url for permission issue

* Update submodule url

* Fix up

* Not using postbuild function to workaround submodule url permission issue
Co-authored-by: Neelay Shah <neelays@nvidia.com>
* Minor fix for L0_model_config
* Test with different sizes of CUDA memory pool

* Check the server log for error message

* Improve debugging

* Fix syntax
Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>
Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>
* Update README and versions for 23.10 branch (#6399)

* Cherry-picking vLLM backend changes (#6404)

* Update build.py to build vLLM backend (#6394)

* Add Python backend when vLLM backend built (#6397)

---------

Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>

* Add documentation on request cancellation (#6403) (#6407)

* Add documentation on request cancellation

* Include python backend

* Update docs/user_guide/request_cancellation.md

* Update docs/user_guide/request_cancellation.md

* Update docs/README.md

* Update docs/user_guide/request_cancellation.md

* Remove inflight term from the main documentation

* Address review comments

* Fix

* Update docs/user_guide/request_cancellation.md

* Fix

---------

Co-authored-by: Iman Tabrizian <iman.tabrizian@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>

* Fixes in request cancellation doc (#6409) (#6410)

* TRT-LLM backend build changes (#6406) (#6430)

* Update url

* Debugging

* Debugging

* Update url

* Fix build for TRT-LLM backend

* Remove TRTLLM TRT and CUDA versions

* Fix up unused var

* Fix up dir name

* FIx cmake patch

* Remove previous TRT version

* Install required packages for example models

* Remove packages that are only needed for testing

* Fixing vllm build (#6433) (#6437)

* Fixing torch version for vllm

Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

* Update TRT-LLM backend url (#6455) (#6460)

* TRTLLM backend post release

* TRTLLM backend post release

* Update submodule url for permission issue

* Update submodule url

* Fix up

* Not using postbuild function to workaround submodule url permission issue

* remove redundant lines

* Revert "remove redundant lines"

This reverts commit 86be7ad.

* restore missed lines

* Update build.py

Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

* Update build.py

Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

---------

Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <iman.tabrizian@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: Kris Hung <krish@nvidia.com>
Co-authored-by: Katherine Yang <80359429+jbkyang-nvi@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
…ycle) (#6490)

* Test torch allocator gpu memory usage directly rather than global gpu memory for more consistency
* Add testing backend and test

* Add test to build / CI. Minor fix on L0_http

* Format. Update backend documentation

* Fix up

* Address comment

* Add negative testing

* Fix up
* Use postbuild function

* Remove updating submodule url
* Added testing for python_backend autocomplete: optional input and model_transaction_policy
Co-authored-by: Francesco Petrini <francescogpetrini@gmail.com>
* Fixing L0_io
* Bumped vllm version

* Add python-bsed backends testing

* Add python-based backends CI

* Fix errors

* Add vllm backend

* Fix pre-commit

* Modify test.sh

* Remove vllm_opt qa model

* Remove vLLM ackend tests

* Resolve review comments

* Fix pre-commit errors

* Update qa/L0_backend_python/python_based_backends/python_based_backends_test.py

Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>

* Remove collect_artifacts_from_subdir function call

---------

Co-authored-by: oandreeva-nv <oandreeva@nvidia.com>
Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple questions

@yinggeh yinggeh force-pushed the yinggeh-DLIS-6657-client-input-byte-size-check branch from f432d41 to 3863c39 Compare July 31, 2024 02:22
@yinggeh yinggeh requested a review from rmccorm4 July 31, 2024 02:23
@yinggeh yinggeh force-pushed the yinggeh-DLIS-6657-client-input-byte-size-check branch from 17053e9 to 482409e Compare August 4, 2024 22:56
@rmccorm4 rmccorm4 changed the title feat: Client input byte size checks test: Client-side input shape/element validation Aug 5, 2024
}
EOL

cp -r $DATADIR/qa_model_repository/graphdef_object_int32_int32 models/.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use model "simple_identity" instead to test string inputs.

Comment on lines 206 to 222
if client_type == "http":
triton_client = tritonhttpclient.InferenceServerClient("localhost:8000")
else:
triton_client = tritongrpcclient.InferenceServerClient("localhost:8001")

# Example using BYTES input tensor with utf-8 encoded string that
# has an embedded null character.
null_chars_array = np.array(
["he\x00llo".encode("utf-8") for i in range(16)], dtype=np.object_
)
null_char_data = null_chars_array.reshape([1, 16])
identity_inference(triton_client, null_char_data, True) # Using binary data
identity_inference(triton_client, null_char_data, False) # Using JSON data

# Example using BYTES input tensor with 16 elements, where each
# element is a 4-byte binary blob with value 0x00010203. Can use
# dtype=np.bytes_ in this case.
bytes_data = [b"\x00\x01\x02\x03" for i in range(16)]
np_bytes_data = np.array(bytes_data, dtype=np.bytes_)
np_bytes_data = np_bytes_data.reshape([1, 16])
identity_inference(triton_client, np_bytes_data, True) # Using binary data
identity_inference(triton_client, np_bytes_data, False) # Using JSON data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from client/src/python/examples/simple_http_string_infer_client.py. It looks like the example demonstrated two ways of preparing string input data. I'll remove one of them.

@yinggeh yinggeh requested review from rmccorm4 and GuanLuo August 6, 2024 17:24
inputs[0].set_shape([2, 8])
inputs[1].set_shape([2, 8])

with self.assertRaises(InferenceServerException) as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with self.assertRaises(InferenceServerException) as e:
# If number of elements (volume) is correct but shape is wrong, the core will return an error.
with self.assertRaises(InferenceServerException) as e:

@yinggeh yinggeh requested a review from rmccorm4 August 6, 2024 17:48
@yinggeh yinggeh added PR: test Adding missing tests or correcting existing test and removed PR: feat A new feature labels Aug 7, 2024
@yinggeh yinggeh marked this pull request as draft September 18, 2024 18:38
@pvijayakrish pvijayakrish force-pushed the yinggeh-DLIS-6657-client-input-byte-size-check branch from dff25f4 to 48c9b25 Compare January 15, 2025 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: test Adding missing tests or correcting existing test
Development

Successfully merging this pull request may close these issues.