A local Whisper AI transcriber bot for Telegram, using GPU or CPU for processing. Runs on Python 3.10+.

This program is a Whisper-based Telegram transcriber bot (Python 3.10+) that transcribes audio from any media source supported by `yt-dlp` (listed here), from Telegram voice messages, and from uploaded audio files (`.mp3`, `.wav`). It runs OpenAI's Whisper model locally to process the audio and returns the transcription in multiple formats.
- 🎥 Downloads and processes media URLs from any source supported by `yt-dlp`.
- 📲 Accepts Telegram voice messages as well as `.mp3` and `.wav` file uploads for transcription.
- 🤖 Uses a local Whisper model from the `openai-whisper` package for transcription (no API required; runs on your own PC and any available CUDA GPU).
- 🖥️ Automatically uses `GPUtil` to select the best available CUDA-enabled local GPU (auto-switches to CPU-only mode if no CUDA GPU is available).
- 📝 Transcribes audio using OpenAI's Whisper model (user-selectable with `/model`; see openai/whisper for more info on Whisper).
- 📄 Returns the transcription in text, SRT, and VTT formats.
- 🔄 Handles concurrent transcription requests efficiently with an asynchronous automatic task queue.
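The queuing behavior described above can be sketched roughly as follows. This is a minimal illustration using Python's `asyncio.Queue`, not the bot's actual internals; all names here are hypothetical:

```python
import asyncio

async def worker(queue: asyncio.Queue) -> None:
    """Process transcription jobs one at a time, in arrival order."""
    while True:
        job = await queue.get()
        # In the real bot this step would run Whisper on the audio;
        # here we just mark the job as handled.
        await asyncio.sleep(0)
        job["done"] = True
        queue.task_done()

async def main() -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    jobs = [{"url": f"https://example.com/{i}", "done": False} for i in range(3)]
    for job in jobs:
        queue.put_nowait(job)
    task = asyncio.create_task(worker(queue))
    await queue.join()  # wait until every queued job has been processed
    task.cancel()
    return jobs

results = asyncio.run(main())
```

Requests from multiple users land in one queue and are transcribed in order, so a long job delays later ones but never drops them.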
To set up the Whisper Transcriber Telegram Bot, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/FlyingFathead/whisper-transcriber-telegram-bot.git
  cd whisper-transcriber-telegram-bot
  ```

- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your Telegram bot token, either in `config/bot_token.txt` or as the environment variable `TELEGRAM_BOT_TOKEN`.

- Run the bot:

  ```bash
  python src/main.py
  ```
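The token step above offers two sources; a lookup honoring both might look like this (a hypothetical sketch for reference only; the bot's actual loading code may differ):

```python
import os
from pathlib import Path

def get_bot_token(config_dir: str = "config") -> str:
    """Return the Telegram bot token from the env var or the config file."""
    token = os.environ.get("TELEGRAM_BOT_TOKEN")
    if token:
        return token.strip()
    token_file = Path(config_dir) / "bot_token.txt"
    if token_file.is_file():
        return token_file.read_text().strip()
    raise RuntimeError(
        "No Telegram bot token found; set TELEGRAM_BOT_TOKEN "
        "or create config/bot_token.txt"
    )
```

In this sketch the environment variable takes precedence over the file, which is convenient for containerized deployments.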
- Docker installed on your machine.
- Docker Compose (optional, for easier handling of environment variables and settings).

- Navigate to the root directory of the project, where the `Dockerfile` is located.

- Build the Docker image:

  ```bash
  docker build -t whisper-transcriber-telegram-bot .
  ```

  This builds a Docker image named `whisper-transcriber-telegram-bot` based on the instructions in your `Dockerfile`.
To run the bot using Docker:

```bash
docker run --name whisper-bot -d \
  -e TELEGRAM_BOT_TOKEN='YourTelegramBotToken' \
  -v $(pwd)/config:/app/config \
  -v whisper_cache:/root/.cache/whisper \
  whisper-transcriber-telegram-bot
```

Replace `'YourTelegramBotToken'` with your actual Telegram bot token. This command also mounts the `config` directory and the Whisper model cache directory to preserve settings and downloaded models across container restarts.
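If you use Docker Compose (listed above as optional), an equivalent service definition might look like this. This is a hypothetical sketch mirroring the `docker run` command above; the repository may ship its own compose file with different names:

```yaml
services:
  whisper-bot:
    build: .
    image: whisper-transcriber-telegram-bot
    environment:
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    volumes:
      - ./config:/app/config
      - whisper_cache:/root/.cache/whisper

volumes:
  whisper_cache:
```

With this in place, `docker compose up -d` replaces the longer `docker run` invocation.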
After launching the bot, you can interact with it via Telegram:

- Send a video URL, voice message, or audio file (in WAV/MP3 format) to the bot.
- The bot will acknowledge the request and begin processing.
- Once processing is complete, the bot will send the transcription files to you.

Available commands:

- `/info` - view current settings, uptime, GPU info, and queue status
- `/help` and `/about` - get help on using the bot, the version number, available models and commands, etc.
- `/model` - view the model in use, or change to another available model
- `/language` - set the model's transcription language (`auto` = autodetect); if you know the language spoken in the audio, setting it manually with this command may improve both transcription speed and accuracy
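As an illustration of the SRT output mentioned earlier, Whisper-style segments (start/end times in seconds plus text) map onto SRT cues roughly like this. This is a simplified sketch, not the bot's actual writer:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render a list of {'start', 'end', 'text'} segments as SRT cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

VTT output is nearly identical, using a `.` instead of a `,` in timestamps and a `WEBVTT` header line.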
- v0.1601 - process order fixes for transcripts (send as msg <> file)
- v0.16 - added configurable cooldowns & rate limits, see `config.ini`:
  - under `[RateLimitSettings]`: `cooldown_seconds`, `max_requests_per_minute`
- v0.15 - added `config.ini` options `sendasfiles` and `sendasmessages` - can be set to `true` or `false` depending on your preferences
  - `sendasmessages` (when set to `true`) sends the transcripts as Telegram messages in chat
  - `sendasfiles` (when set to `true`) sends the transcripts as `.srt`, `.vtt` and `.txt`
  - small fixes, e.g. to URL handling (`allowallsites` checks; YouTube)
- v0.14.6 - fixed occasional queue hangs with sent audio files (wav/mp3)
- v0.14.5 - fixed following the "keep/don't keep audio files" config rule
- v0.14.4 - added the `/info` command for viewing current settings & queue status
- v0.14.3 - Whisper model language selection via the `/language` command
- v0.14.2 - display duration & estimates
- v0.14.1 - small fixes to the file handler; more detailed exception catching
- v0.14 - now handles both Telegram's audio messages as well as audio files (.wav, .mp3)
- v0.13 - added `GPUtil` GPU mapping to figure out the best available CUDA GPU instance to use
  - (by default, uses the CUDA-enabled GPU on the system with the most free VRAM available)
- v0.12 - async handling & user model change fixes, improved error handling
- v0.11.1 - bot logic + layout changes, model list with `/model` (also in `config.ini`)
- v0.11 - bugfixes & rate limits for `/model` command changes for users
- v0.10 - `/help` & `/about` commands added for further assistance; `config.ini` now has a list of supported models that can be changed as needed
- v0.09 - users can now change the Whisper model with the `/model` command
- v0.08 - auto-retry TG connection on start-up connection failure
  - can be set in `config.ini` with `RestartOnConnectionFailure`
- v0.07.7 - log output from `whisper` to logging
- v0.07.6 - update interval for logging `yt-dlp` downloads now configurable from `config.ini`
- v0.07.5 - 10-second interval updates for `yt-dlp` logging
- v0.07.4 - fixes for non-YouTube URLs
- v0.07.2 - job queues fine-tuned to be more informative
- v0.07.1 - job queues introduced
- v0.07 - transcript queuing, more precise transcript time estimates
- v0.06 - better handling of details for all video sources, transcription time estimates
- v0.05 - universal video description parsing (platform-agnostic)
- v0.04.1 - version number printouts and added utils
- v0.04 - expanded support for various media sources via `yt-dlp`, supported sites listed here
- v0.03 - better logging to console; Whisper model + keep audio y/n can now be set in `config.ini`
- v0.02 - add video information to the transcript text file
  - (see: `config.ini` => `IncludeHeaderInTranscription = True`)
- v0.01 - initial commit
Contributions are welcome! If you have suggestions for improvements or bug fixes, please open an issue or submit a pull request.
- FlyingFathead - Project creator
- ChaosWhisperer - Contributions to the Whisper integration and documentation
- Thanks for additional code contributions: GRbit (Dockerization)