Speech Transcriber

A web app and library for transcribing speech.

Installation

Install Python 3.9
Install ffmpeg
- Windows: Download the zip and add ffmpeg/bin to the PATH environment variable
- Linux: apt-get install ffmpeg
Run pip install -r requirements.txt
(Optional) Download the punctuator model and save it as INTERSPEECH-T-BRNN.pcl
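
As a quick sanity check before running the app, the prerequisites above can be verified from Python. This is a minimal sketch and not part of the project: the check_prerequisites function is hypothetical, it treats 3.9 as a minimum version (an assumption), and it only checks that an ffmpeg executable can be found on the PATH.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 9), tool="ffmpeg"):
    """Return a list of human-readable problems; the list is empty if all checks pass."""
    problems = []
    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < min_version:
        required = ".".join(str(part) for part in min_version)
        problems.append(f"Python {required} or newer is required")
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Problem:", problem)
```

If nothing is printed, both checks passed.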

Run pip install flask before running the web app.

Then run python app.py to open the web app at http://localhost:5000/

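To transcribe a file from the command line, run python main.py --path filename --transcriber transcriber, where filename is the audio file to transcribe and transcriber is the name of the transcription model to use.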
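
The README does not show how main.py reads these options. A minimal sketch of a command-line parser accepting the same two flags might look like the following. The parse_args helper, the help strings, the choice to make both flags required, and the example values sample.wav and vosk are all assumptions for illustration, not the project's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse --path and --transcriber (hypothetical stand-in for main.py's parser)."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file.")
    parser.add_argument("--path", required=True,
                        help="Path to the audio file to transcribe")
    parser.add_argument("--transcriber", required=True,
                        help="Name of the transcription model to use")
    # argv=None makes argparse read sys.argv[1:], as a real CLI would.
    return parser.parse_args(argv)


# Example values only; the accepted transcriber names are not listed in this README.
args = parse_args(["--path", "sample.wav", "--transcriber", "vosk"])
print(args.path, args.transcriber)
```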

When selecting transcription models, the following requirements were used:

Must be supported in Python 3.9
Must work locally (without the usage of an API)
Must have a straightforward installation process
- Should not require building from source
- Should not require additional OS libraries
- Should not require manually downloading additional files

Below is a comparison of transcription model performance, produced using the LibriSpeech test-clean dataset and the analysis script.

Name	Dependencies	Model Size	Average processing time	Score
Wav2Vec2 CommonVoice	speechbrain	1.18GB	3.351s	0.87
Librispeech	torch, transformers, torchaudio, librosa	113MB	0.558s	0.85
Wav2Vec2	torch, transformers, torchaudio, librosa	360MB	1.325s	0.8
Whisper	whisper	138MB	3.848s	0.77
Vosk	vosk	67.7MB	1.206s	0.76
Silero	torch, transformers, torchaudio, librosa, omegaconf	111MB	0.261s	0.68
CMU Sphinx	SpeechRecognition, pocketsphinx	33.9MB*	1.123s	0.55

*Size of the pocketsphinx package.
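
The README does not say how the score or the average processing time were computed, and the analysis script is not reproduced here. As a rough sketch of how such a comparison could be produced, the code below times a transcriber over a set of samples and scores each output against its reference text. The similarity metric (the ratio from difflib's SequenceMatcher), the benchmark function, and the transcribe callable are all assumptions for illustration, not the project's method.

```python
import time
from difflib import SequenceMatcher


def similarity(reference, hypothesis):
    """Similarity in [0, 1] between two transcripts, ignoring case (assumed metric)."""
    return SequenceMatcher(None, reference.lower(), hypothesis.lower()).ratio()


def benchmark(transcribe, samples):
    """Return (average seconds per sample, average similarity score).

    transcribe: hypothetical callable mapping an audio path to text.
    samples: list of (audio_path, reference_text) pairs.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    total_time = 0.0
    total_score = 0.0
    for audio_path, reference in samples:
        # Time only the transcription call, not the scoring.
        start = time.perf_counter()
        hypothesis = transcribe(audio_path)
        total_time += time.perf_counter() - start
        total_score += similarity(reference, hypothesis)
    return total_time / len(samples), total_score / len(samples)
```

Any function that takes an audio path and returns text can be passed as transcribe, so the same loop could be reused for each model in the table.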