Triton Crash with Signal 11 while using python backend #7400

Open
burling opened this issue Jul 1, 2024 · 1 comment
burling commented Jul 1, 2024

Description
While using the Python vLLM backend, Triton crashed with signal 11. The model had been loaded and warmed up for some time before the crash occurred.
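
For context, the vLLM backend is implemented on top of Triton's Python backend and typically runs in decoupled mode. A minimal sketch of a decoupled model.py of that general shape is shown below; the tensor names "text_input"/"text_output" are illustrative only and not the actual vLLM backend interface:

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] holds the model configuration as a JSON string.
        pass

    def execute(self, requests):
        # In decoupled mode, responses go through a response sender instead of
        # being returned from execute().
        for request in requests:
            sender = request.get_response_sender()
            text = pb_utils.get_input_tensor_by_name(request, "text_input").as_numpy()
            out = pb_utils.Tensor("text_output", text)
            sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # Mark the request as complete so the gRPC frontend can finalize it.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None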

Triton Information
What version of Triton are you using?

  • Triton: v2.42.0
  • Python backend: r24.01
  • GPU: A100
  • OS: CentOS 7

Are you using the Triton container or did you build it yourself?
Yes

Trace info:

Signal (11) received.
 0# triton::server::(anonymous namespace)::ErrorSignalHandler(int) at triton_signal.cc:?
 1# 0x00007F2477AC8B50 in /usr/lib64/libc.so.6
 2# 0x00007F24780CE7F2 in /usr/lib64/libm.so.6
 3# 0x00007F24780CF49C in /usr/lib64/libm.so.6
 4# pow in /usr/lib64/libm.so.6
 5# grpc_core::chttp2::TransportFlowControl::PeriodicUpdate() in /opt/tritonserver/bin/tritonserver
 6# finish_bdp_ping_locked(void*, absl::lts_20220623::Status) at chttp2_transport.cc:?
 7# grpc_combiner_continue_exec_ctx() in /opt/tritonserver/bin/tritonserver
 8# grpc_core::ExecCtx::Flush() in /opt/tritonserver/bin/tritonserver
 9# end_worker(grpc_pollset*, grpc_pollset_worker*, grpc_pollset_worker**) at ev_epoll1_linux.cc:?
10# pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) at ev_epoll1_linux.cc:?
11# pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) at ev_posix.cc:?
12# grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) in /opt/tritonserver/bin/tritonserver
13# cq_next(grpc_completion_queue*, gpr_timespec, void*) at completion_queue.cc:?
14# grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) in /opt/tritonserver/bin/tritonserver
15# triton::server::grpc::InferHandler<inference::GRPCInferenceService::WithAsyncMethod_ServerLive<inference::GRPCInferenceService::WithAsyncMethod_ServerReady<inference::GRPCInferenceService::WithAsyncMethod_ModelReady<inference::GRPCInferenceService::WithAsyncMethod_ServerMetadata<inference::GRPCInferenceService::WithAsyncMethod_ModelMetadata<inference::GRPCInferenceService::WithAsyncMethod_ModelInfer<inference::GRPCInferenceService::WithAsyncMethod_ModelStreamInfer<inference::GRPCInferenceService::WithAsyncMethod_ModelConfig<inference::GRPCInferenceService::WithAsyncMethod_ModelStatistics<inference::GRPCInferenceService::WithAsyncMethod_RepositoryIndex<inference::GRPCInferenceService::WithAsyncMethod_RepositoryModelLoad<inference::GRPCInferenceService::WithAsyncMethod_RepositoryModelUnload<inference::GRPCInferenceService::WithAsyncMethod_SystemSharedMemoryStatus<inference::GRPCInferenceService::WithAsyncMethod_SystemSharedMemoryRegister<inference::GRPCInferenceService::WithAsyncMethod_SystemSharedMemoryUnregister<inference::GRPCInferenceService::WithAsyncMethod_CudaSharedMemoryStatus<inference::GRPCInferenceService::WithAsyncMethod_CudaSharedMemoryRegister<inference::GRPCInferenceService::WithAsyncMethod_CudaSharedMemoryUnregister<inference::GRPCInferenceService::WithAsyncMethod_TraceSetting<inference::GRPCInferenceService::WithAsyncMethod_LogSettings<inference::GRPCInferenceService::Service> > > > > > > > > > > > > > > > > > > >, grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::Start()::{lambda()#1}::operator()() const in /opt/tritonserver/bin/tritonserver
16# 0x00007F247849BB13 in /usr/lib64/libstdc++.so.6
17# 0x00007F24787761CA in /usr/lib64/libpthread.so.0
18# clone in /usr/lib64/libc.so.6

Markovvn1w commented Jul 2, 2024

I am getting a very similar problem, though I am not sure it is exactly the same error. I also have a decoupled Python backend. After starting tritonserver I run a stress test that sends a large number of requests to the server. Within the first 10 minutes of testing I quite consistently hit this error, which completely crashes my tritonserver. Unfortunately I have a custom build of tritonserver based on 24.05, so I don't know how relevant this information is. However, I did not have this problem on version 23.10.

E0702 19:02:48.658289 148148 infer_handler.h:187] "[INTERNAL] Attempting to access current response when it is not ready"
Signal (11) received.
0.773678183555603
 0# 0x0000561EA6BD83ED in tritonserver
 1# 0x00007F6E5E5D3090 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x0000561EA6C4DBE4 in tritonserver
 3# 0x0000561EA6C4E740 in tritonserver
 4# 0x0000561EA6C46DFA in tritonserver
 5# 0x0000561EA6C31AB5 in tritonserver
 6# 0x00007F6E5E9D4793 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F6E5EB64609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
 8# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Segmentation fault (core dumped)

I assume the error occurs because of this check; however, I have no clue why that is the case:

ResponseType* GetCurrentResponse()
{
  std::lock_guard<std::mutex> lock(mtx_);
  if (current_index_ >= ready_count_) {
    LOG_ERROR << "[INTERNAL] Attempting to access current response when it "
                 "is not ready";
    return nullptr;
  }
  return responses_[current_index_];
}
