The vosk module contains the VoskLearner class, which inherits from the abstract class Learner.
Bases: engine.learners.Learner
The VoskLearner class is a wrapper of libary [1] implementation. It is integrated for the speech transcription task.
The VoskLearner class has the following public methods:
VoskLearner(self, device, sample_rate)
Constructor parameters:
-
device: str, default="cpu"
The device to use for computations. Currently only supports cpu. -
sample_rate: int, default=16000
The sample rate to be used by the Vosk model.
VoskLearner.eval(self, dataset, save_path_csv)
This method is used to evaluate Vosk model on the given dataset.
Returns a dictionary containing evaluation metrics such as word error rate.
Parameters:
- dataset: DatasetIterator
A speech dataset. - save_path_csv: Optional[str], default=None
The path to save the evaluation results.
VoskLearner.infer(self, audio)
This method runs inference on an audio sample. Please call the load() method before calling this method.
Return transcription as VoskTranscription
that contains transcription text and other side information.
Parameters:
- audio: Union[Timeseries, torch.Tensor, np.ndarray, bytes]
The audio sample as aTimeseries
,torch.Tensor
, ornp.ndarray
orbytes
.
VoskLearner.load(self, name, language, model_path, download_dir)
This method loads the Vosk model and initializes the recognizer. The method will download model if necessary
Parameters:
-
name: Optional[str], default=None
Full name of the Vosk model. -
language: Optional[str], default=None
Language of the Vosk model. Vosk will decide the default model for this language. -
model_path: Optional[str], default=None
Path to the Vosk model. -
download_dir: Optional[bool], default=False
Directory to download the Vosk model to.
VoskLearner.download(self, model_name)
Download model given a local path including the full name of the Vosk model.
Parameters:
- model_name: Path
Path to download model to, including the full name of the Vosk model.
VoskLearner.reset(self)
This method sets Vosk model, model name, language, and KalidRecognizer to None
. Use before load a new model.
VoskLearner.reset_rec(self)
This method resets the KalidRecognizer.
- Download and load a model by its name and infer a sample from an existing file.
import librosa
import numpy as np
from opendr.engine.data import Timeseries
from opendr.perception.speech_transcription import VoskLearner
learner = VoskLearner()
learner.load(language="en-us")
# Assuming you have recorded your own voice sample in command.wav in the current directory
signal, sampling_rate = librosa.load("video.wav", sr=learner.sample_rate)
signal = np.expand_dims(signal, axis=0)
timeseries = Timeseries(signal)
result = learner.infer(timeseries)
print(result)
[1] Github: alphacep/vosk-api.