Making model more resilient to bad recordings #66

etlweather · 2021-12-30T00:12:01Z


I am using the Vosk API with the vosk-model-en-us-daanzu-20200905 model to transcribe recordings of meetings and phone calls. Your model, with Vosk, produced the best transcripts of all the models and solutions I tried (DeepSpeech, wav2vec, etc.).

This works great for good recordings. However, the quality of the transcript quickly degrades when there is AC noise, the speaker is farther away from the recorder, the audio tonality is not as good, etc.

So I was thinking I could take the Common Voice dataset, apply various transformations to make the audio quality match my recordings, and train a model on the result.
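A minimal sketch of what such transformations could look like, assuming each clip is already decoded into a non-empty list of float samples in [-1, 1]. The function names, parameter values, and the choice of white noise as a stand-in for background noise are all illustrative assumptions; none of this comes from Vosk, Kaldi, or the model's actual training scripts.

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def attenuate(samples, gain_db):
    """Scale the signal by gain_db decibels (negative = quieter, e.g. a distant speaker)."""
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples]

def low_pass(samples, alpha):
    """One-pole low-pass filter; a smaller alpha (0 < alpha <= 1) gives a duller sound."""
    out, prev = [], 0.0
    for s in samples:
        prev = prev + alpha * (s - prev)
        out.append(prev)
    return out

def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at roughly the requested signal-to-noise ratio (dB)."""
    noise_rms = rms(samples) / (10 ** (snr_db / 20))
    return [s + rng.gauss(0.0, noise_rms) for s in samples]

def degrade(samples, snr_db=15.0, gain_db=-12.0, alpha=0.3, seed=0):
    """Hypothetical pipeline: make the clip quieter, duller, then noisier."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    quiet = attenuate(samples, gain_db)
    dull = low_pass(quiet, alpha)
    return add_noise(dull, snr_db, rng)

# Example input: one second of a 440 Hz tone at 16 kHz standing in for clean speech.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
degraded = degrade(clean)
```

In practice the noise would more likely be mixed in from real recordings of the target environment, and the parameters would be randomized per clip so the training set covers a range of conditions.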

So I am wondering whether you have a script for building your already good model. I saw several issues asking similar questions, and I am sorry for more or less duplicating them, but I am not really familiar with this and I get lost in those issues.

etlweather · 2021-12-30T08:50:50Z


Perhaps part of the answer is found in this repo: https://github.com/daanzu/kaldi_ag_training
