Getting OOM (out of memory) when using big files #1206

Closed
shamamayair opened this issue Dec 17, 2024 · 6 comments

Comments

@shamamayair

When transcribing large files we run out of memory. After some debugging, we found that it happens here:

output = np.fft.rfft(input_array, n=n_fft, axis=-1, norm=norm)

It only happens on long files (a few hours); on short files it works fine.

This is a very basic draft of the code we are using (the OOM, of course, occurs in the transcribe call):

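from faster_whisper import WhisperModel
from faster_whisper.vad import VadOptions

# model_size and audio_file are defined elsewhere in our code.
model = WhisperModel(model_size)

# Voice activity detection settings.
vo = VadOptions(
    min_speech_duration_ms=250,
    speech_pad_ms=30,
    min_silence_duration_ms=5000,
)

options = {
    "word_timestamps": True,
    "vad_filter": True,
    "condition_on_previous_text": False,
    "hallucination_silence_threshold": 3,
    "log_prob_threshold": -0.5,
}

options["vad_parameters"] = vo

# The out-of-memory error is raised during this call.
segments, info = model.transcribe(audio_file, **options)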

Have you encountered this issue? Any ideas?

@Purfview
Contributor

Purfview commented Dec 18, 2024

It was fixed in the last commit: #1198

Install the latest master:
Press the "Code" button, then press "Download ZIP", then run: pip install "faster-whisper-master.zip"
Or with git: pip install git+https://github.com/SYSTRAN/faster-whisper.git

@MahmoudAshraf97
Collaborator

> It was fixed in the last commit: #1198
>
> Install the latest master: Press the "Code" button, then press "Download ZIP", then run: pip install "faster-whisper-master.zip" Or with git: pip install git+https://github.com/SYSTRAN/faster-whisper.git

I guess this is more related to the feature extraction than to VAD. If the VAD used less memory, that might help, but the problem would still be there.
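To give a sense of scale for the feature-extraction step, here is a rough back-of-the-envelope sketch of how much memory a one-shot STFT over a whole file could need. The function name `stft_memory_estimate` is hypothetical, and the defaults (16 kHz audio, `n_fft=400`, `hop_length=160`) are assumed Whisper-style values, not figures taken from this thread. It also assumes that all frames are materialized at once as float32 and that the `rfft` output is complex128 (16 bytes per value); newer NumPy versions may keep single precision, which would halve the spectrum figure.

```python
def stft_memory_estimate(duration_s, sampling_rate=16000, n_fft=400,
                         hop_length=160, bytes_per_complex=16):
    """Rough memory estimate for a one-shot STFT over a whole recording.

    All defaults are assumptions (Whisper-style parameters), not measured values.
    """
    n_samples = int(duration_s * sampling_rate)
    # Approximate frame count for a centered STFT.
    n_frames = 1 + n_samples // hop_length
    # rfft keeps only the non-negative frequency bins.
    n_bins = n_fft // 2 + 1
    # Every frame of n_fft float32 samples held in memory at once.
    frames_bytes = n_frames * n_fft * 4
    # Complex spectrum produced by the rfft call.
    spectrum_bytes = n_frames * n_bins * bytes_per_complex
    return {
        "n_frames": n_frames,
        "frames_bytes": frames_bytes,
        "spectrum_bytes": spectrum_bytes,
    }


if __name__ == "__main__":
    est = stft_memory_estimate(3 * 3600)  # a three-hour recording
    gib = 1024 ** 3
    print(f"frames:   {est['frames_bytes'] / gib:.2f} GiB")
    print(f"spectrum: {est['spectrum_bytes'] / gib:.2f} GiB")
```

Under these assumptions the cost grows linearly with audio length, which would be consistent with the error appearing only on recordings that are several hours long.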

@jhj0517

jhj0517 commented Dec 21, 2024

Can you please reopen this? According to jhj0517/Whisper-WebUI#424 (comment), this still seems to be reproducible on Colab with a large file (1 hour 40 minutes).

@Purfview
Contributor

> Can you please reopen this? According to jhj0517/Whisper-WebUI#424 (comment), this still seems to be reproducible on Colab with a large file (1 hour 40 minutes).

Maybe memory was already low even before VAD. What RAM usage does it show without VAD?

@jhj0517

jhj0517 commented Dec 21, 2024

@Purfview I just tried to reproduce it on my side with 2 hours of video, but CPU RAM only reached 5.4 GB.
So I guess the problem is not related to this issue; it's just a problem on my side. Sorry for the confusion.

This is the peak CPU RAM I observed with 2 hours of audio (in the screenshots, "시스템 RAM" ("System RAM") on the left means CPU RAM):

  1. With VAD and faster-whisper together, 2 hours of audio: 5.4 GB peak RAM (CPU)

  2. With faster-whisper only, 2 hours of audio: 3.5 GB peak RAM (CPU)
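For anyone who wants to compare runs with and without VAD in code rather than from a system monitor, here is a minimal sketch using the standard-library `tracemalloc` module. The helper name `peak_traced_memory` is hypothetical. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory used inside native libraries may not be counted, and the numbers will not match the system RAM figures above.

```python
import tracemalloc


def peak_traced_memory(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Stand-in workload; replace with your own transcription function.
    data, peak = peak_traced_memory(lambda: bytearray(50_000_000))
    print(f"peak traced memory: {peak / 1024 ** 2:.1f} MiB")
```

When wrapping a transcription call, keep in mind that `model.transcribe` returns the segments lazily, so the wrapped function should also consume them (for example with `list(segments)`), otherwise most of the work happens outside the measurement.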

@pablopla

Can you please make a new release so we don't have to install from git?
