fix: Release GIL during server.stop() to allow request release callbacks to complete #381
What does the PR do?
This PR fixes an issue where server.stop() in the L0_python_api::test_api::test_stop() unit test would intermittently block for the full server exit timeout while waiting for all "live models" to be unloaded. The "live models" were not getting unloaded because the relevant request object was not destructed before server.stop() was called. The Triton C++ request object holds a reference to a Triton Model object, which prevents the model from being destructed and unloaded, and therefore prevents the server from shutting down gracefully.

The root cause was that the final reference to the request object is decremented by the request release callback inside the python core bindings, and this callback tries to acquire the Python GIL. If server.stop() executed first and acquired the GIL, the request release callback (and request destruction) would be blocked for the full exit timeout until server.stop() returned or raised. Similarly, the server.stop() call would be blocked for the full exit timeout, waiting for the request (and model) to be destructed.

The solution in this PR is to release the GIL internally while making the call to TRITONSERVER_ServerStop, which allows PyTritonRequestReleaseCallback to acquire the GIL, proceed, release the final reference to the request, destroy the request and model, and allow the server to shut down gracefully.

NOTE: See the Caveats below.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
N/A
Where should the reviewer start?
Test plan:
L0_python_api
Caveats:
There was a pre-existing issue with the current bindings and test: if you omit the server.stop() call and let the server object simply go out of scope, it may run into the same issue that this PR fixes for the manual call to server.stop(). This is because the C++ implementation of TRITONSERVER_ServerDelete also calls server->Stop().

This leads to an "Exit timeout expired" message being printed to STDOUT for some tests when running pytest -s -v test_api.py.

This issue shouldn't be ignored, and may have a similar solution to this PR. However, a couple of naive attempts to apply the same fix caused crashes/segfaults, so it will require further investigation. Since this was already broken beforehand, I'd like to merge this fix first to reduce flakiness in CI and investigate the follow-up separately.
Background
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A