vosk-server-gpu Segmentation fault (core dumped) #224

cdgraff opened this issue May 20, 2023 · 6 comments

cdgraff commented May 20, 2023

Hi! Can you help me identify what I'm doing wrong? After some transcriptions I get a
Segmentation fault (core dumped)

I send 30-second audio chunks to transcribe, one after the other. In some cases we split them across multiple workers, as you can see below (using 3 workers).

The server path is created dynamically so that it is unique per chunk.
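
For reference, the per-chunk client logic is roughly the following (a minimal Python sketch of what we do; the URI, path and file names are illustrative only, our real client code is different):

    import asyncio
    import json
    import websockets  # pip install websockets

    async def transcribe_chunk(uri, pcm_path, sample_rate=16000):
        # one websocket connection per 30-second chunk of raw 16-bit mono PCM
        async with websockets.connect(uri) as ws:
            await ws.send(json.dumps({"config": {"sample_rate": sample_rate, "words": 1}}))
            with open(pcm_path, "rb") as f:
                while True:
                    data = f.read(8000)          # ~0.25 s of audio at 16 kHz
                    if not data:
                        break
                    await ws.send(data)
                    await ws.recv()              # partial result, ignored here
            await ws.send('{"eof" : 1}')         # ask the server to finalize
            return json.loads(await ws.recv())   # final result for this chunk

    # chunks go one after the other, each on its own dynamically generated path:
    # asyncio.run(transcribe_chunk("ws://host:2700/<unique-chunk-id>", "chunk-0001.pcm"))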

root@f66db52da9e0:/opt/vosk-server/websocket-gpu-batch# python3 ./asr_server_gpu.py 
WARNING ([5.5.1089~1-a25f2]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1089~1-a25f2]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1089~1-a25f2]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla T4  free:14791M, used:118M, total:14910M, free/total:0.992023
LOG ([5.5.1089~1-a25f2]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.992023
LOG ([5.5.1089~1-a25f2]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1089~1-a25f2]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.992023
LOG ([5.5.1089~1-a25f2]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla T4   free:14455M, used:454M, total:14910M, free/total:0.969489 version 7.5
LOG ([5.5.1089~1-a25f2]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1089~1-a25f2]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1089~1-a25f2]:BatchModel():batch_model.cc:52) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1089~1-a25f2]:BatchModel():batch_model.cc:56) Loading words from model/graph/words.txt
LOG ([5.5.1089~1-a25f2]:BatchModel():batch_model.cc:64) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1089~1-a25f2]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1089~1-a25f2]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
server listening on 0.0.0.0:2700
INFO:websockets.server:server listening on 0.0.0.0:2700
connection open
INFO:websockets.server:connection open
INFO:root:Connection from ('34.30.88.55', 49442)
INFO:root:Config {'words': 1, 'sample_rate': 16000}
connection closed
INFO:websockets.server:connection closed
connection open
INFO:websockets.server:connection open
INFO:root:Connection from ('34.30.88.55', 49456)
INFO:root:Config {'words': 1, 'sample_rate': 16000}
connection open
INFO:websockets.server:connection open
INFO:root:Connection from ('35.224.62.142', 60480)
INFO:root:Config {'words': 1, 'sample_rate': 16000}
connection open
INFO:websockets.server:connection open
INFO:root:Connection from ('34.134.42.203', 40700)
INFO:root:Config {'words': 1, 'sample_rate': 16000}
connection closed
INFO:websockets.server:connection closed
Segmentation fault (core dumped)
root@f66db52da9e0:/opt/vosk-server/websocket-gpu-batch# gdb python3 core 

Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
Reading symbols from /usr/lib/debug/.build-id/14/8e086667839ef13939196984d6f717c331bd76.debug...

warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing
[New LWP 2564]
[New LWP 2563]
[New LWP 2568]
[New LWP 2046]
[New LWP 2049]
[New LWP 2562]
[New LWP 2565]
[New LWP 2566]
[New LWP 2045]
[New LWP 2567]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 ./asr_server_gpu.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc915ea7b36 in BatchRecognizer::PushLattice(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >&, float) () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
[Current thread is 1 (Thread 0x7fc67e5d2640 (LWP 2564))]
(gdb) 
(gdb) bt
#0  0x00007fc915ea7b36 in BatchRecognizer::PushLattice(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >&, float) () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
#1  0x00007fc915eb9f81 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::FinalizeDecoding(int) () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
#2  0x00007fc915eae5a5 in kaldi::cuda_decoder::ThreadPoolLightWorker::Work() () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
#3  0x00007fc8d8cb22b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fc91ab2eb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007fc91abbfbb4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

cdgraff commented May 24, 2023

Hi @nshmyrev, do you have any advice? Is something wrong in my setup? I tested with the CPU, same model and same code, and it works without issue, but with the GPU I hit the same problem in every test. Thanks in advance!

@nshmyrev

Do you close the connection before getting the results, without sending eof? I need to reproduce this somehow.

The "connection closed" message worries me.
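
In other words, what I want to rule out is a client that closes the socket right after the last chunk, something like this sketch (illustrative only):

    import websockets

    async def bad_client(uri, pcm_bytes):
        async with websockets.connect(uri) as ws:
            await ws.send('{ "config" : { "sample_rate" : 16000 } }')
            await ws.send(pcm_bytes)
        # the connection closes here, with no '{"eof" : 1}' sent and without
        # waiting for the final result, so the server may still be finalizing
        # the stream in a worker thread when the socket goes away

versus sending '{"eof" : 1}' and reading the final result before closing.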

@fdipilla

Hi @nshmyrev, I'm working with @cdgraff on this particular implementation. We are using a Node.js PassThrough to read the ffmpeg output and feed it to the Vosk server via websocket. Our logs from Node look something like this:

starting
sending chunk
... <- a bunch of chunks
sending chunk
sending chunk
sending eof
closing websocket

Let me know if this answers your question. Thanks!

@GianvitoBono

Hi!

I'm having the same issue, but I'm using the Python lib.
I'm running the asr_server_gpu.py from this repo in Docker (using this image: alphacep/kaldi-vosk-server-gpu:latest).

From my debugging, the problem occurs when we start to close the websocket connection and the FinishStream() function of the BatchRecognizer gets called.
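
To show where this happens, my handler is essentially the following (a simplified sketch using the vosk Python API as I understand it, not the exact asr_server_gpu.py; result polling and the websockets.serve() boilerplate are omitted):

    from vosk import GpuInit, BatchModel, BatchRecognizer

    GpuInit()
    model = BatchModel()   # loads from ./model, as in the log above

    async def recognize(websocket, path):
        rec = BatchRecognizer(model, 16000.0)
        async for message in websocket:
            if isinstance(message, str) and "eof" in message:
                break
            rec.AcceptWaveform(message)   # queued into the CUDA batch pipeline
        rec.FinishStream()                # finalizing on close -> segfault in PushLattice
        del rec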

Here is the error:

[Thread 0x73ae8288c640 (LWP 1751) exited]
[Thread 0x73ae5affd640 (LWP 1752) exited]
LOG ([5.5.1089~1-a25f2]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
[New Thread 0x73ae5affd640 (LWP 1753)]
[New Thread 0x73ae8288c640 (LWP 1754)]
[New Thread 0x73ae80888640 (LWP 1755)]
[New Thread 0x73ae8188a640 (LWP 1756)]
[New Thread 0x73b012fde640 (LWP 1757)]
[New Thread 0x73ae8208b640 (LWP 1758)]
[New Thread 0x73ae81089640 (LWP 1759)]
[New Thread 0x73ae5bfff640 (LWP 1760)]
[New Thread 0x73ae5b7fe640 (LWP 1761)]
[New Thread 0x73ae58db1640 (LWP 1762)]
[New Thread 0x73ae3cb69640 (LWP 1763)]
INFO:websockets.server:server listening on 0.0.0.0:2700
INFO:websockets.server:connection open
INFO:root:Connection from ('10.36.2.192', 35730)
INFO:root:Config {'sample_rate': 16000}
INFO:websockets.server:connection closed

Thread 523 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x73ae80888640 (LWP 1755)]
0x000073b08a937b36 in BatchRecognizer::PushLattice(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >&, float) () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so

Here is the backtrace taken from gdb:

(gdb) bt
#0  0x000073b08a937b36 in BatchRecognizer::PushLattice(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >&, float) () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
#1  0x000073b08a949f81 in kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::FinalizeDecoding(int) () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
#2  0x000073b08a93e5a5 in kaldi::cuda_decoder::ThreadPoolLightWorker::Work() () from /usr/local/lib/python3.10/dist-packages/vosk-0.3.45-py3.10.egg/vosk/libvosk.so
#3  0x000073b04d6b22b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x000073b08f5bdac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x000073b08f64ea04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

If we delete the FinishStream() call the server works, but memory usage increases really fast without ever going down; I think this is because the memory used by the recognizer never gets released.

I tried to implement the same thing starting from the C++ server (but using the batch model and recognizer), and the same error occurs.

Can you help me solve this issue?

Thanks!

@nshmyrev

There is a race condition in Kaldi here:

https://github.com/kaldi-asr/kaldi/blob/master/src/cudadecoder/batched-threaded-nnet3-cuda-online-pipeline.cc#L574

I'll try to fix it in the coming days.

@GianvitoBono

Wonderful!
Thanks for the fast reply :)
