A powerful transcription and translation tool that leverages the ivrit-ai/whisper-large-v2-tuned model for high-quality processing of unlimited-length audio, with enhanced paragraph splitting and automatic cleanup of temporary files for a tidy workspace.
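The underlying model is publicly available on the Hugging Face Hub, so it can also be loaded directly with the transformers library. The snippet below is a minimal sketch of that, independent of this app's own code; the file name sample.wav is a placeholder, and transformers, torch, and ffmpeg are assumed to be installed.

# Minimal sketch: transcribe a clip with the tuned checkpoint via transformers.
# "sample.wav" is a placeholder; chunk_length_s enables chunked long-form decoding.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # use the GPU when one is present
asr = pipeline(
    "automatic-speech-recognition",
    model="ivrit-ai/whisper-large-v2-tuned",
    chunk_length_s=30,
    device=device,
)
result = asr("sample.wav")
print(result["text"])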
Installing inside a Python virtual environment is recommended so the project's dependencies stay isolated.
Clone the repository
git clone https://github.com/ShmuelRonen/hebrew_whisper.git
cd hebrew_whisper
Double-click on:
init_env.bat
Or, to set things up manually, create and activate a virtual environment and install the requirements:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
For PyTorch with CUDA 11.8 support, use the following command
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
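To confirm that the CUDA build of PyTorch is active (if it is not, the app simply falls back to the CPU), you can run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"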
After the installation, you can run the app by navigating to the directory containing app.py
and executing:
python app.py
This will start a Gradio interface locally, which you can access through the provided URL in your command line interface.
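If you are curious how an interface like this is wired up, the following is a stripped-down sketch of a Gradio app with similar inputs; it is an illustration only, and the real app.py may use different function names, languages, and options.

# Illustrative sketch of a Gradio interface with an audio upload and a language dropdown.
# transcribe() is a stand-in for the project's actual processing function.
import gradio as gr

def transcribe(audio_path, source_language):
    # Placeholder: the real app would run Whisper here and return the transcript.
    return f"Transcript of {audio_path} (source language: {source_language})"

demo = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.Audio(type="filepath", label="Audio file"),
        gr.Dropdown(["Hebrew", "English"], label="Source language"),
    ],
    outputs=gr.Textbox(label="Transcribed / translated text"),
)

demo.launch()  # prints a local URL such as http://127.0.0.1:7860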
Once the application is running, follow these steps:
- Upload your audio file through the Gradio interface.
- Select the source language of your audio file.
- Click submit to start the transcription and translation process.
- The transcribed and translated text is displayed in the textbox, and a text file containing the output is saved to the specified output directory (a sketch of this saving step follows below).
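As a rough illustration of that saving step, the sketch below splits a transcript into paragraphs (naively, every few sentences) and writes it to a text file; the output folder name and the splitting rule are assumptions for the example, not the project's exact behavior.

# Sketch: break a transcript into paragraphs and save it as a UTF-8 text file.
# The 4-sentences-per-paragraph rule and the "output" folder are illustrative only.
import os
import re

def split_into_paragraphs(text, sentences_per_paragraph=4):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n\n".join(
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    )

def save_transcript(text, output_dir="output", filename="transcript.txt"):
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(split_into_paragraphs(text))
    return path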
- Supports audio files of unlimited length (see the chunking sketch after this feature list).
- Splits transcribed text into well-structured paragraphs.
- Deletes temporary files automatically, leaving a clean workspace.
- Uses CUDA for accelerated processing if available.
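Two of the features above, handling long recordings and cleaning up temporary files, are commonly implemented by cutting the audio into chunks, transcribing each chunk, and deleting the intermediate files afterwards. The sketch below shows one way to do that with pydub and Python's tempfile module; the 30-second chunk length and the transcribe_chunk callable are assumptions for illustration, not the project's actual code.

# Sketch: transcribe a long recording chunk by chunk, then discard the temporary files.
# Requires pydub (and ffmpeg); transcribe_chunk is any function that takes a wav path.
import os
import tempfile
from pydub import AudioSegment

CHUNK_MS = 30 * 1000  # 30-second chunks, matching Whisper's native window

def transcribe_long_audio(path, transcribe_chunk):
    audio = AudioSegment.from_file(path)
    texts = []
    with tempfile.TemporaryDirectory() as tmp_dir:  # removed automatically on exit
        for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
            chunk_path = os.path.join(tmp_dir, f"chunk_{i}.wav")
            audio[start:start + CHUNK_MS].export(chunk_path, format="wav")
            texts.append(transcribe_chunk(chunk_path))
    return " ".join(texts)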
Special thanks to OpenAI for providing the Whisper model, making high-quality transcription and translation accessible to developers.
This project is intended for educational and development purposes. It leverages publicly available models and APIs; please make sure to comply with the terms of use of the underlying models and frameworks.