WeNet Python Binding

This is a Python binding of WeNet.

WeNet is a production-first and production-ready end-to-end speech recognition toolkit.

The highlights of the binding are:

  1. Support for multiple languages, including English and Chinese; more languages are in development.
  2. Non-streaming and streaming APIs.
  3. N-best, contextual biasing, and timestamp support, which are important for production speech applications.
  4. Alignment support. Phone-level alignment with this tool is under development.

Install

Python 3.6+ is required.

pip3 install wenetruntime

Usage

Note:

  1. On macOS, wenetruntime bundles its own libtorch, so you cannot import torch and wenetruntime at the same time.
  2. On Windows and Linux, wenetruntime depends on torch. Please install and import the same torch version that wenetruntime requires; a quick check is sketched below.
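
For example, on Windows or Linux you can import torch first, confirm its version, and then import the binding. This is only a minimal sanity check; the required torch version is whatever the installed wenetruntime release was built against, so consult its requirements rather than any value shown here.

import torch
print(torch.__version__)  # should match the torch version wenetruntime requires

import wenetruntime as wenet  # import after torch on Windows/Linux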

Non-streaming Usage

import sys
import torch
import wenetruntime as wenet

wav_file = sys.argv[1]
decoder = wenet.Decoder(lang='chs')
ans = decoder.decode_wav(wav_file)
print(ans)
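
If the decoding result is a JSON string with an "nbest" list of hypotheses, as produced by the WeNet runtime, you can inspect it as sketched below; field names may differ between wenetruntime versions.

import json

result = json.loads(ans)  # parse the JSON result string
for hyp in result.get('nbest', []):
    print(hyp.get('sentence'))  # recognized text of each hypothesis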

You can also specify the following parameters in wenet.Decoder:

  • lang (str): The language to use, chs for Chinese and en for English.

  • model_dir (str): The Runtime Model directory, which contains the following files. If not provided, the official model for the given lang is downloaded automatically.

    • final.zip: the runtime TorchScript ASR model.
    • units.txt: the modeling units file.
    • TLG.fst: optional; when given, decoding uses the language model (LM).
    • words.txt: optional; the word-level symbol table for decoding with TLG.fst.

    Please refer to https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.md for details of the Runtime Model.

  • nbest (int): Output the top-n best results.

  • enable_timestamp (bool): Whether to enable word-level timestamps.

  • context (List[str]): A list of context biasing words.

  • context_score (float): The bonus score for context biasing words.

  • continuous_decoding (bool): Whether to enable continuous (long-form) decoding.

For example:

model_dir = '/path/to/your/model_dir'  # local Runtime Model directory (optional)
decoder = wenet.Decoder(model_dir,
                        lang='chs',
                        nbest=5,
                        enable_timestamp=True,
                        context=['不忘初心', '牢记使命'],
                        context_score=3.0)

Streaming Usage

import sys
import torch
import wave
import wenetruntime as wenet

test_wav = sys.argv[1]

with wave.open(test_wav, 'rb') as fin:
    assert fin.getnchannels() == 1
    wav = fin.readframes(fin.getnframes())

decoder = wenet.Decoder(lang='chs')
# We assume the wav is 16 kHz, 16-bit mono, and decode every 0.5 seconds
interval = int(0.5 * 16000) * 2  # 0.5 s of samples * 2 bytes per sample
for i in range(0, len(wav), interval):
    last = (i + interval) >= len(wav)  # True only for the final chunk
    chunk_wav = wav[i: min(i + interval, len(wav))]
    ans = decoder.decode(chunk_wav, last)
    print(ans)

You can use the same parameters introduced above to control the behavior of wenet.Decoder.
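
For example, a streaming decoder for long-form audio can combine continuous decoding with n-best output and timestamps. This is only a sketch of how the parameters listed above compose; the values are illustrative.

decoder = wenet.Decoder(lang='chs',
                        nbest=3,
                        enable_timestamp=True,
                        continuous_decoding=True)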

Build on Your Local Machine

git clone https://github.com/wenet-e2e/wenet.git
cd wenet/runtime/binding/python
python setup.py install
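
If setup.py-based installs are deprecated in your environment, running pip from the same directory is an alternative, assuming a standard setuptools build of the binding:

pip3 install .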