-
I see. I don't think we have ever tried to make RNNT char models work in buffered inference mode. There's not much of a reason to, actually, because character encoding can be simulated with SentencePiece using the "char" SPE type. Anyway, now that you have a char model, here are a few options -
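On the SentencePiece point, a char-level tokenizer can be trained directly with the sentencepiece library. This is only a minimal sketch (the file path and vocab_size are placeholders); the resulting .model file would then back a tokenizer-based NeMo model, which does expose model.tokenizer:

```python
# Sketch: train a character-level SentencePiece model so that "char" encoding
# is reproduced by a tokenizer. The path and vocab_size are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",  # one transcript per line (placeholder path)
    model_prefix="char_tokenizer",  # writes char_tokenizer.model / char_tokenizer.vocab
    model_type="char",              # character-level pieces instead of BPE
    vocab_size=60,                  # must cover all characters in the corpus plus specials
)

# Quick check that the pieces really are single characters.
sp = spm.SentencePieceProcessor(model_file="char_tokenizer.model")
print(sp.encode("hello world", out_type=str))  # list of character-level pieces
```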
-
Thanks so much for the reply and the help @titu1994. Still working on this, but I thought I'd let you know that I tried substituting model.decoding for self.tokenizer as you suggested. It turns out the tokenizer is passed into streaming_utils.py, where it is used quite a bit in various places, so it wasn't just two places. But, just to check whether this works, I made the following wrapper..
And then did this in a couple of places
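Roughly, the idea is a shim like the one below. This is a sketch, not the exact code; the method names that streaming_utils actually calls should be checked against your NeMo version:

```python
# Sketch: a tokenizer-like shim over model.decoding for a char EncDecRNNTModel.
# Assumption: the buffered-inference code only needs ids -> tokens / ids -> text
# (check streaming_utils.py in your NeMo version for the exact methods it uses).
class DecodingAsTokenizer:
    def __init__(self, decoding):
        # decoding is asr_model.decoding (RNNTDecoding for char vocabularies)
        self.decoding = decoding

    def ids_to_tokens(self, ids):
        # decode_ids_to_tokens maps label ids to their characters
        return self.decoding.decode_ids_to_tokens(list(ids))

    def ids_to_text(self, ids):
        # decode_tokens_to_str joins the ids back into a plain string
        return self.decoding.decode_tokens_to_str(list(ids))


# Then, wherever streaming_utils reaches for asr_model.tokenizer, hand it the shim:
# tokenizer = DecodingAsTokenizer(asr_model.decoding)
```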
Unfortunately I am getting very poor performance, in particular almost entirely blank or empty transcriptions. So even with a 19-minute-long source audio (which transcribes OK otherwise) I only end up with a tiny bit of text.
It's so 'wrong' that I wonder if I am just not doing the transcribe step correctly. I have other code that transcribes with EncDecRNNTModel just fine, using the transcribe() method. This model was trained with 'streaming' in mind (in terms of context buffers etc.). Maybe it's to do with mismatching stride lengths or context buffer sizes or something. To help me debug this, is there any simpler version of this streaming code? That is, one that does true one-at-a-time 'streaming' (as opposed to batch based), and that doesn't calculate the buffers up front but instead processes the audio as it comes in?
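For what it's worth, this is roughly the stride/delay bookkeeping the buffered-inference scripts do, which is where a mismatch like that would show up. It's only a sketch; the subsampling factor and window stride below are assumptions and should be read from the model config:

```python
# Sketch of the chunk/buffer arithmetic used by NeMo's buffered inference scripts.
# The concrete numbers are assumptions; read the real ones from asr_model.cfg.
import math

chunk_len_in_secs = 8.0      # new audio consumed per step
total_buffer_in_secs = 16.0  # chunk plus the left/right context kept in the buffer

window_stride = 0.01         # asr_model.cfg.preprocessor.window_stride (often 10 ms)
subsampling_factor = 4       # Conformer encoders typically subsample 4x (check your config)
model_stride_in_secs = window_stride * subsampling_factor

# How many encoder frames one chunk corresponds to, and how far into the buffer
# the decoder should look when merging chunk outputs.
tokens_per_chunk = math.ceil(chunk_len_in_secs / model_stride_in_secs)
mid_delay = math.ceil(
    (chunk_len_in_secs + (total_buffer_in_secs - chunk_len_in_secs) / 2)
    / model_stride_in_secs
)

# If these don't match what the merging code assumes, the stitched transcript
# can come out mostly empty.
print(tokens_per_chunk, mid_delay)
```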
-
Hi!
We have a custom-trained NeMo model: a Conformer with an RNNT decoder using char encoding.
At the risk of repeating myself, it's a Conformer model using char encoding (instead of BPE) and an RNNT/Transducer decoder (instead of CTC). The model class is EncDecRNNTModel.
I'm trying to get this working in streaming aka buffered inference mode.
There are some excellent notebooks with explanations and example code for how to do streaming with NeMo: this, here and here.
(Yes, I do realize that these notebooks are in the NeMo GitHub, not on Google per se.)
I'm getting problems that might be because the examples have not been updated to the latest versions? Or maybe it's something else. Anyway, I would really appreciate any help.
The short version of the problem I'm having is that I get this error when I try to use it.
Specifically, the LongestCommonSubsequenceBatchedFrameASRRNNT class (from nemo/collections/asr/parts/utils/streaming_utils.py) makes a reference to the model.tokenizer object.
It does that on line 715.
The problem is that the asr_model I'm using, aka the EncDecRNNTModel from NeMo 1.20, doesn't have a tokenizer. Methods like decode_ids_to_tokens are on the model.decoding object.
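A quick way to see the mismatch (a sketch; the checkpoint name is a placeholder for the custom .nemo file):

```python
# Sketch: confirm where the id-to-text methods live on a char RNNT model.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecRNNTModel.restore_from("my_char_rnnt_model.nemo")

print(hasattr(asr_model, "tokenizer"))                      # expected: False for char models
print(hasattr(asr_model.decoding, "decode_ids_to_tokens"))  # expected: True
# e.g. asr_model.decoding.decode_ids_to_tokens([1, 2, 3]) returns a list of characters
```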
Any help very much appreciated! Thanks in advance.
BTW I'm using ..