-
Could you try building …
-
Right... I wrote the CUDA code to calculate the mel spectrogram more efficiently, but I just assumed the max audio length is 30 sec, and the code preallocates some work buffers with this assumption :) My bad. I'll push a fix today. In the meantime, @jgoer, to get it working right now you can change this line here: Line 3169 in c2bdb96 to just …
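To make the failure mode concrete, here is a minimal hedged sketch, not the actual whisper.cpp source: `MelCalc` and its layout are invented for illustration, while `WHISPER_SAMPLE_RATE` (16000) and the 30-second chunk assumption are real whisper.cpp constants. If the scratch buffer is sized once at init for 30 s of audio, any longer input exceeds it:

```cpp
// Hypothetical sketch of the bug, NOT the real whisper.cpp code.
#include <cstdio>
#include <vector>

constexpr int WHISPER_SAMPLE_RATE = 16000; // real whisper.cpp constant
constexpr int WHISPER_CHUNK_SIZE  = 30;    // real: max chunk length in seconds

struct MelCalc {
    std::vector<float> work; // scratch buffer, preallocated once

    // Sized under the assumption that no input exceeds 30 s of 16 kHz audio.
    MelCalc() : work(WHISPER_SAMPLE_RATE * WHISPER_CHUNK_SIZE) {}

    void compute(const std::vector<float>& samples) {
        if (samples.size() > work.size()) {
            // This is the case the 30-second assumption missed: a longer
            // input no longer fits the preallocated scratch space, so the
            // mel data comes out wrong and decoding loops on [BLANK_AUDIO].
            fprintf(stderr, "input (%zu samples) exceeds work buffer (%zu)\n",
                    samples.size(), work.size());
        }
        // ... mel spectrogram computation using `work` ...
    }
};
```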
-
So, @ggerganov, I added a PR which addresses this: #2227. Better strategies than this exist which may be employed in the future.
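For illustration, one such strategy could be to size the scratch buffer on demand rather than once at init. A minimal sketch, assuming the buffer is a `std::vector` (the actual PR may take a different approach):

```cpp
// Hypothetical sketch of one alternative strategy, NOT necessarily
// what PR #2227 implements: grow the scratch buffer lazily instead of
// fixing its size when the state is created.
#include <cstddef>
#include <vector>

void ensure_capacity(std::vector<float>& work, std::size_t needed) {
    if (work.size() < needed) {
        work.resize(needed); // reallocate only when a longer input arrives
    }
}
```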
-
Hello,
I am encountering an issue with Whisper.cpp. When I try to transcribe a WAV file longer than 10 minutes (around 40 MB), Whisper.cpp gets stuck in a loop, endlessly emitting "[BLANK_AUDIO]" or sometimes "– Subtitling: Le Crayon d'oreille".
This is the output I get with the command line: ./main ./test.wav --model ./models/ggml-large-v3.bin --language AUTO
```
whisper_init_from_file_with_params_no_state: loading model from '/SWAPI/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 3 CUDA devices:
Device 0: NVIDIA A40, compute capability 8.6, VMM: yes
Device 1: NVIDIA A40, compute capability 8.6, VMM: yes
Device 2: NVIDIA A40, compute capability 8.6, VMM: yes
whisper_model_load: CUDA0 total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using CUDA backend
whisper_mel_init: n_len = 3001, n_len_org = 1, n_mel = 80
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.68 MB
whisper_init_state: compute buffer (encode) = 594.22 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 142.09 MB
system_info: n_threads = 4 / 96 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0
main: processing './test.wav ' (19197516 samples, 1199.8 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...
whisper_mel_init: n_len = 122984, n_len_org = 119984, n_mel = 80
whisper_full_with_state: auto-detected language: en (p = 0.361893)
[00:00:00.000 --> 00:00:02.060] [BLANK_AUDIO]
[00:00:03.060 --> 00:00:05.120] [BLANK_AUDIO]
[00:00:06.120 --> 00:00:08.180] [BLANK_AUDIO]
[00:00:09.180 --> 00:00:11.240] [BLANK_AUDIO]
[00:00:12.240 --> 00:00:14.300] [BLANK_AUDIO]
[00:00:15.300 --> 00:00:17.360] [BLANK_AUDIO]
[00:00:18.360 --> 00:00:20.420] [BLANK_AUDIO]
[00:00:21.420 --> 00:00:23.480] [BLANK_AUDIO]
[00:00:24.480 --> 00:00:26.540] [BLANK_AUDIO]
...
```
I am using version v1.6.2 of Whisper.cpp. My OS is Ubuntu 20.04, with NVIDIA driver 535.171.04 and CUDA 12.3. My GPU is an NVIDIA A40. I have tested with the models ggml-large-v2.bin and ggml-large-v3.bin, but the problem remains the same. When I disable the GPU with the --no-gpu flag, the transcription proceeds without any issue.
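For reference, that CPU-only run is the same command as above with --no-gpu appended:

```
./main ./test.wav --model ./models/ggml-large-v3.bin --language AUTO --no-gpu
```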
Has anyone else encountered this problem?
Thanks in advance.