ASR for rapid speech #6360

Closed Answered by titu1994
OllieBroadhurst asked this question in Q&A
Note that Conformer is a 4x stride model, and we use a window stride of 0.01 s, so each Conformer output frame effectively covers 0.04 s of audio. If you increase the window stride, remember that you will have to deal with a longer and longer delay between token emissions. It is also not guaranteed to do better: RNNT can predict multiple tokens per timestep, but it will not predict overlapping tokens in the same time frame without being specifically trained for such a task. So be careful about arbitrarily changing the window stride.
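To make the arithmetic concrete, here is a minimal sketch of how the window stride and the encoder stride combine. The function name `encoder_frame_duration` is hypothetical and is not part of NeMo; the 0.01 s stride and the 4x factor come from the answer above, and the 0.02 s stride is only an illustrative alternative.

```python
def encoder_frame_duration(window_stride_s: float, subsampling_factor: int) -> float:
    """Seconds of audio covered by one encoder output frame.

    Hypothetical helper for illustration only, not a NeMo API.
    """
    return window_stride_s * subsampling_factor


# Compare the default 0.01 s window stride with a doubled one (assumed value),
# both passed through a 4x stride encoder.
for stride in (0.01, 0.02):
    duration = encoder_frame_duration(stride, 4)
    print(
        f"window_stride={stride:.2f} s -> "
        f"output frame={duration:.2f} s, {1 / duration:.1f} frames per second"
    )
```

Doubling the window stride doubles the audio covered by each output frame, so the decoder gets half as many timesteps in which to emit tokens for the same stretch of speech.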

Answer selected by OllieBroadhurst