Best place to host Whisper #327
Replies: 6 comments 5 replies
-
Hello, I have implemented faster-whisper with the large-v2 model in a professional environment. It is hosted on old hardware with 8 GB RAM and a GTX 1060 GPU with 6 GB VRAM, running Ubuntu. This is an excellent price-performance ratio.
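For anyone wanting to reproduce this: a minimal sketch of what such a setup might look like with the faster-whisper Python API. The input file name and beam size are placeholders, and the int8 compute type is my assumption for fitting large-v2 into 6 GB of VRAM; the poster didn't share their exact configuration.

```python
from faster_whisper import WhisperModel

# int8 quantization is an assumption here: large-v2 at fp16 needs more
# than 6 GB of VRAM, so a GTX 1060 likely requires a quantized compute type.
model = WhisperModel("large-v2", device="cuda", compute_type="int8")

# "audio.wav" is a placeholder input file.
segments, info = model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```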
-
AWS g4dn.xlarge EC2
-
Are there any updates on this topic? I'm interested in hosting either faster-whisper or whisper.cpp. As I understand it, whisper.cpp could be more cost-effective because it can run quickly on inexpensive VMs. However, faster-whisper is faster when used with a high-end GPU-based VM.
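To make the whisper.cpp side of that comparison concrete, here is a rough sketch of driving its CLI from Python on a CPU-only VM. It assumes you have already built whisper.cpp (`make` in the repo) and fetched a ggml model with the repo's download script; the binary path, model path, thread count, and input file are all placeholders.

```python
import subprocess

# Placeholder paths: adjust to wherever whisper.cpp was built and the
# ggml model was downloaded. whisper.cpp expects 16 kHz mono WAV input.
WHISPER_BIN = "./main"
MODEL_PATH = "models/ggml-large-v2.bin"

result = subprocess.run(
    [WHISPER_BIN, "-m", MODEL_PATH, "-f", "audio.wav", "-t", "4"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # transcript with timestamps, as printed by whisper.cpp
```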
-
Use an AWS g4dn.xlarge or g5.xlarge EC2 instance; both work great.
-
Have you tried an AWS Inf1 instance?
-
We use 10 AWS g4dn.xlarge EC2 instances that turn on and off depending on traffic levels.
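The poster didn't describe how the switching works, so this is only a hypothetical sketch of one way to do it with boto3: size the running pool from an SQS backlog and start or stop instances accordingly. The queue URL, instance IDs, and jobs-per-instance capacity are all made up.

```python
import boto3

# All of these values are hypothetical placeholders.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcribe-jobs"
INSTANCE_IDS = ["i-0123456789abcdef0", "i-0123456789abcdef1"]  # worker pool
JOBS_PER_INSTANCE = 20  # assumed throughput per g4dn.xlarge worker

sqs = boto3.client("sqs")
ec2 = boto3.client("ec2")

def scale_pool() -> None:
    """Start or stop workers based on the queue backlog (run periodically)."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    wanted = min(len(INSTANCE_IDS), -(-backlog // JOBS_PER_INSTANCE))  # ceil

    active, idle = INSTANCE_IDS[:wanted], INSTANCE_IDS[wanted:]
    if active:
        ec2.start_instances(InstanceIds=active)  # no-op if already running
    if idle:
        # A real setup would drain in-flight jobs before stopping workers.
        ec2.stop_instances(InstanceIds=idle)
```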
-
For a medical app, I'm looking for either a HIPAA-compliant Whisper API endpoint (I believe OpenAI's is not) or a way to self-host, either on our own GPU hardware or on a cloud instance from AWS or Azure. What is the most cost-effective way to host the large-v2 model?
Does anyone have performance metrics for different instances running whisper, faster-whisper, or whisper.cpp? For the large model, I would like to know whether running on a GPU is generally faster and more cost-effective (instance cost vs. runtime), and which type of cloud instance is recommended for this model. We want to be able to scale if needed without too much hassle.
Thanks so much.
Mark