Automatic Speech Recognition Robot

This project runs Whisper and the Google Speech-to-Text API for the MCV listening experiments.

Build

Create a Python virtual environment, activate it, clone the project inside it, and install the required dependencies.

python3 -m venv mcv-whisper-robot
source mcv-whisper-robot/bin/activate
cd mcv-whisper-robot
git clone git@github.com:irtlab/mcv-whisper-robot.git
pip3 install -r mcv-whisper-robot/requirements.txt

To install Whisper, run the following command (see the setup instructions for more details: https://github.com/openai/whisper#setup):

pip3 install git+https://github.com/openai/whisper.git

Usage

Make sure you are in the root directory of the source code. Then run the project by executing speech_to_text.py with the appropriate arguments.

Arguments:
--id - Experiment run ID (MongoDB _id field)
--engine - Engine mode, either whisper or GoogleSTT (Google Speech-to-Text)
--model - Whisper model. If --engine=whisper, the model must be set to base, medium, or large
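
The argument rules above could be expressed with argparse roughly as follows. This is only a sketch and not the actual parsing code in speech_to_text.py, which is not shown here; the function names build_parser and parse_args are hypothetical, and treating --model as required only for the whisper engine is an assumption drawn from the description above.

```python
import argparse

# Model names taken from the argument description above.
WHISPER_MODELS = ("base", "medium", "large")


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Run a speech recognition engine on an experiment run."
    )
    parser.add_argument("--id", required=True,
                        help="Experiment run ID (MongoDB _id field)")
    parser.add_argument("--engine", required=True,
                        choices=("whisper", "GoogleSTT"),
                        help="Engine mode")
    parser.add_argument("--model", choices=WHISPER_MODELS, default=None,
                        help="Whisper model (used when --engine=whisper)")
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce that whisper runs name a model."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a model is mandatory for whisper and optional otherwise.
    if args.engine == "whisper" and args.model is None:
        parser.error("--model is required when --engine=whisper")
    return args
```

With this sketch, an invalid engine or model name is rejected by argparse before any audio is processed.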

For example,

python speech_to_text.py --id=64822d9847d575f5c76aa2b9 --engine=whisper --model=medium

Running Output

The system creates an output directory and writes the results to JSON files named in the following format:
<experiment run ID>_<engine name>_<model name if provided>.json
For example, 64822d9847d575f5c76aa2b9_GoogleSTT.json
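
As an illustration of the naming format above, a small helper might assemble the file name like this. The function name output_filename is hypothetical and is not taken from the project's source; the sketch simply joins the parts with underscores and leaves out the model when none is given.

```python
def output_filename(run_id, engine, model=None):
    """Return '<run ID>_<engine>[_<model>].json' (hypothetical helper)."""
    parts = [run_id, engine]
    # The model name is appended only when one was provided.
    if model:
        parts.append(model)
    return "_".join(parts) + ".json"


print(output_filename("64822d9847d575f5c76aa2b9", "GoogleSTT"))
print(output_filename("64822d9847d575f5c76aa2b9", "whisper", "medium"))
```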

Output JSON file format:

{
  "id": "<experiment run ID>",
  "engine": "<engine>",
  "mode": "<if whisper engine is selected>",
  "blue_rate": "<Blue rate>",
  "combined_rate": "<combined rate>",
  "avg_normalized_lscore": "<average normalized lscore>"
}
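
A result file in the format above could be read back and checked with a few lines of Python. This is a sketch under assumptions: the helper name load_result is hypothetical, and the "mode" key is treated as optional because the format describes it as present only when the whisper engine is selected.

```python
import json

# Keys listed in the output format above; "mode" is treated as optional.
REQUIRED_KEYS = {"id", "engine", "blue_rate", "combined_rate",
                 "avg_normalized_lscore"}


def load_result(path):
    """Load one result file and verify the expected keys (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        result = json.load(f)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing keys in {path}: {sorted(missing)}")
    return result
```

Checking the keys up front gives a clear error when a file was truncated or produced by a different version of the script.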
