
## vosk module

The vosk module contains the VoskLearner class, which inherits from the abstract class Learner.

### Class VoskLearner

Bases: `engine.learners.Learner`

The VoskLearner class wraps the Vosk speech recognition library [1]. It is integrated for the speech transcription task.

The VoskLearner class has the following public methods:

#### VoskLearner constructor

```python
VoskLearner(self, device="cpu", sample_rate=16000)
```

Constructor parameters:

  • device: str, default="cpu"
    The device to use for computations. Currently only supports cpu.

  • sample_rate: int, default=16000
    The sample rate to be used by the Vosk model.
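Audio passed to the learner should match this sample rate. As a minimal sketch of how a recording at another rate can be brought down to 16 kHz, the following uses linear interpolation (the `resample_linear` helper is hypothetical and not part of VoskLearner; real pipelines typically use a library such as librosa for higher-quality resampling):

```python
import numpy as np

def resample_linear(signal: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D signal to target_sr via linear interpolation.

    Hypothetical helper for illustration only; not part of VoskLearner.
    """
    if orig_sr == target_sr:
        return signal
    duration = signal.shape[0] / orig_sr
    n_out = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=signal.shape[0], endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, signal)

# A one-second 440 Hz tone recorded at 44100 Hz, brought down to 16000 Hz.
tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
resampled = resample_linear(tone, orig_sr=44100, target_sr=16000)
print(resampled.shape)  # (16000,)
```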

#### VoskLearner.eval

```python
VoskLearner.eval(self, dataset, save_path_csv=None)
```

This method evaluates the Vosk model on the given dataset.

Returns a dictionary of evaluation metrics, such as word error rate (WER).

Parameters:

  • dataset: DatasetIterator
    A speech dataset.
  • save_path_csv: Optional[str], default=None
    If provided, the path of a CSV file to which the evaluation results are saved.
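The word error rate reported by eval is conventionally the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. A minimal sketch of that metric (illustrative only; the exact implementation inside VoskLearner.eval may differ):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative sketch of the WER metric, not VoskLearner's own code.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic program over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Two word errors ("of", "light") over a four-word reference.
print(word_error_rate("turn on the lights", "turn of the light"))  # 0.5
```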

#### VoskLearner.infer

```python
VoskLearner.infer(self, audio)
```

This method runs inference on an audio sample. The load() method must be called before calling this method.

Returns the transcription as a VoskTranscription object, which contains the transcription text and other side information.

Parameters:

  • audio: Union[Timeseries, torch.Tensor, np.ndarray, bytes]
    The audio sample, given as a Timeseries, torch.Tensor, np.ndarray, or raw bytes.
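For the bytes input path, Vosk conventionally consumes raw 16-bit mono PCM. The sketch below builds a one-second 16 kHz test tone with the standard library and extracts both raw PCM bytes and a normalized float array; the tone and variable names are illustrative, not part of the toolkit, so verify the exact expected encoding against the toolkit documentation:

```python
import io
import wave
import numpy as np

# Synthesize a short 16 kHz mono WAV in memory to stand in for a recording.
sr = 16000
samples = (np.sin(2 * np.pi * 440 * np.arange(sr) / sr) * 32767).astype(np.int16)
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 16-bit PCM
    wf.setframerate(sr)
    wf.writeframes(samples.tobytes())
buf.seek(0)

# Raw PCM bytes: one form of audio input.
with wave.open(buf, "rb") as wf:
    pcm_bytes = wf.readframes(wf.getnframes())

# The same audio as a float np.ndarray in [-1, 1]: another form of input.
float_signal = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32) / 32768.0
print(len(pcm_bytes), float_signal.shape)  # 32000 (16000,)
```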

#### VoskLearner.load

```python
VoskLearner.load(self, name, language, model_path, download_dir)
```

This method loads the Vosk model and initializes the recognizer, downloading the model first if necessary.

Parameters:

  • name: Optional[str], default=None
    Full name of the Vosk model.

  • language: Optional[str], default=None
    Language of the Vosk model. If only the language is given, Vosk selects its default model for that language.

  • model_path: Optional[str], default=None
    Path to the Vosk model.

  • download_dir: Optional[str], default=None
    Directory to download the Vosk model to.

#### VoskLearner.download

```python
VoskLearner.download(self, model_name)
```

Downloads the model to the given local path, which must include the full name of the Vosk model.

Parameters:

  • model_name: Path
    Path to download model to, including the full name of the Vosk model.

#### VoskLearner.reset

```python
VoskLearner.reset(self)
```

This method sets the Vosk model, model name, language, and KaldiRecognizer to None. Use it before loading a new model.

#### VoskLearner.reset_rec

```python
VoskLearner.reset_rec(self)
```

This method resets the KaldiRecognizer.

#### Examples

  • Download and load a model by its language and run inference on a sample from an existing file:

```python
import librosa
import numpy as np

from opendr.engine.data import Timeseries
from opendr.perception.speech_transcription import VoskLearner

learner = VoskLearner()
learner.load(language="en-us")

# Assuming you have recorded your own voice sample in command.wav in the current directory
signal, sampling_rate = librosa.load("command.wav", sr=learner.sample_rate)
signal = np.expand_dims(signal, axis=0)
timeseries = Timeseries(signal)
result = learner.infer(timeseries)
print(result)
```

#### References

[1] GitHub: [alphacep/vosk-api](https://github.com/alphacep/vosk-api).