
📝 Use Whisper speech-to-text models directly in your browser

Transcribe

This prototype demonstrates the potential of local AI models for speech-to-text transcription, offering a cost-effective and privacy-friendly solution. Running directly in the browser, it eliminates the need for complicated setups or expensive services. However, transcription can be slow when using larger models.

Transcribe is based on Whisper Web, built with Transformers.js, using ONNX Whisper models from Hugging Face. Whisper is an open-source speech recognition model developed by OpenAI.

Live Demo: https://stekhn.github.io/transcribe/

Transcribe preview image

Usage

  1. Clone the repository: git clone git@github.com:stekhn/transcribe.git
  2. Install dependencies: npm install
  3. Start the development server: npm run dev
  4. Build the website: npm run build

The project requires Node.js to run locally. The development server runs on http://localhost:5173/transcribe/.

Firefox users might need to change the dom.workers.modules.enabled setting in about:config to true to enable Web Workers. Check out this issue for more details.
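The Firefox setting matters because the transcription work is offloaded to a module Web Worker, which that preference gates. A minimal sketch of how such a worker is typically created in a Vite project like this one (the file name worker.ts and the message shape are assumptions, not taken from the repository):

```typescript
// Module workers ({ type: "module" }) are what the Firefox preference
// dom.workers.modules.enabled controls. Browser-only code, shown for
// illustration; the worker path and message are hypothetical.
const worker = new Worker(new URL("./worker.ts", import.meta.url), {
    type: "module",
});
worker.postMessage({ command: "transcribe" });
```

Vite rewrites the `new URL(..., import.meta.url)` pattern at build time so the worker file is bundled correctly.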

Configuration

Configure the most important settings in the ./src/config.ts file.

Update the list of available Whisper models and the default model:

export const DEFAULT_MODEL = "onnx-community/whisper-tiny";

export const MODELS: { [key: string]: number } = {
    "onnx-community/whisper-tiny": 120,
    "onnx-community/whisper-base": 206,
    "onnx-community/whisper-small": 586,
};

The numeric value is the size of the model in megabytes. Models must be provided as ONNX files. You can find suitable ONNX Whisper models on Hugging Face. Optimum is a great tool for converting models to ONNX. Additionally, the ONNX community provides great tutorials on how to create ONNX models from various machine learning frameworks.

Warning: using very large models (> 500 MB) will likely lead to memory issues in the browser.
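Since the size in megabytes is stored alongside each model, the app can flag risky entries before they are downloaded. A minimal sketch, repeating the MODELS map from above for self-containedness; the helper name and the 500 MB threshold are assumptions, not part of the project:

```typescript
// MODELS maps a Hugging Face model ID to its download size in megabytes,
// as in ./src/config.ts.
const MODELS: { [key: string]: number } = {
    "onnx-community/whisper-tiny": 120,
    "onnx-community/whisper-base": 206,
    "onnx-community/whisper-small": 586,
};

// Hypothetical threshold above which in-browser memory issues are likely.
const MEMORY_WARNING_MB = 500;

// Return the model IDs that exceed the threshold.
function oversizedModels(models: { [key: string]: number }): string[] {
    return Object.keys(models).filter((id) => models[id] > MEMORY_WARNING_MB);
}

console.log(oversizedModels(MODELS)); // ["onnx-community/whisper-small"]
```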

Update the list of Whisper languages and the default language:

export const DEFAULT_LANGUAGE = "en";

export const LANGUAGES: { [key: string]: string } = {
    en: "english",
    fr: "french",
    de: "german",
    es: "spanish",
};

See the full list of languages supported by Whisper. Note, however, that less widely spoken languages are poorly supported by the smaller Whisper models, resulting in bad speech recognition quality. For those languages, or if performance is key, you might want to look into training your own Distil-Whisper model.
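A UI will usually want to look up a display name from a language code and fall back to the default when the code is unknown. A minimal sketch, repeating the config values from above for self-containedness; the helper languageLabel is an assumption, not part of the project:

```typescript
// Language codes map to lowercase names, as in ./src/config.ts.
const DEFAULT_LANGUAGE = "en";

const LANGUAGES: { [key: string]: string } = {
    en: "english",
    fr: "french",
    de: "german",
    es: "spanish",
};

// Resolve a display label for a language code, falling back to the default.
function languageLabel(code: string): string {
    const name = LANGUAGES[code] ?? LANGUAGES[DEFAULT_LANGUAGE];
    // Capitalize for display, e.g. "german" -> "German".
    return name.charAt(0).toUpperCase() + name.slice(1);
}

console.log(languageLabel("de")); // German
console.log(languageLabel("xx")); // English (unknown code falls back)
```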

Deployment

Create a production build of the web application:

npm run build

Add the build folder ./dist to Git:

git add dist -f

Create a commit:

git commit -m "Add build"

Push local changes to GitHub:

git subtree push --prefix dist origin gh-pages
