I think @titu1994 might be able to answer this - I just didn't want to clutter the issue thread I started with this tangent 😉
Please correct me where I've got this wrong - I'm sure there are more than a few things I'm not quite piecing together correctly!
With the EncDecCTCModel, computing posteriors is as easy as calling the forward() method, and the transcribe() method also has an option to return posteriors.
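Concretely, something like this is what I have in mind for the CTC case (just a sketch of my understanding; the checkpoint name and the dummy 1-second waveform are placeholders I made up):

```python
import torch
import nemo.collections.asr as nemo_asr

# Placeholder checkpoint name; any EncDecCTCModel checkpoint should behave the same way.
ctc_model = nemo_asr.models.EncDecCTCModel.from_pretrained("stt_en_quartznet15x5")
ctc_model.eval()

# Placeholder 1-second, 16 kHz waveform instead of a real audio file.
audio = torch.randn(1, 16000)
audio_len = torch.tensor([16000])

with torch.no_grad():
    # forward() runs preprocessor -> encoder -> decoder and returns framewise log-posteriors.
    log_probs, encoded_len, greedy_preds = ctc_model.forward(
        input_signal=audio, input_signal_length=audio_len
    )

# log_probs: [B, T, V + 1] log-softmax over the vocabulary plus the CTC blank.
```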
For the EncDecRNNTModel, it's a bit more complicated. From looking at the EncDecRNNTModel.training_step() method, I believe posteriors can be computed using these two steps:
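Here is roughly how I read those two steps (a sketch of my understanding of training_step(), not lifted from the docs; the checkpoint name and token ids are placeholders, and if the joint is configured with fused loss/WER computation this call may need extra arguments):

```python
import torch
import nemo.collections.asr as nemo_asr

# Placeholder checkpoint name for an RNNT/transducer model.
rnnt_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    "stt_en_conformer_transducer_small"
)
rnnt_model.eval()

audio = torch.randn(1, 16000)                 # placeholder 16 kHz waveform
audio_len = torch.tensor([16000])
transcript_ids = torch.tensor([[5, 12, 7]])   # hypothetical target token ids
transcript_len = torch.tensor([3])

with torch.no_grad():
    # Step 1: acoustic encoder; forward() returns the encoded features and their lengths.
    encoded, encoded_len = rnnt_model.forward(
        input_signal=audio, input_signal_length=audio_len
    )
    # Step 2: prediction network over the targets, then the joint network.
    decoder_out, _, _ = rnnt_model.decoder(
        targets=transcript_ids, target_length=transcript_len
    )
    joint_out = rnnt_model.joint(encoder_outputs=encoded, decoder_outputs=decoder_out)

# joint_out: [B, T, U + 1, V + 1] lattice over the vocabulary plus blank; whether it is
# already log-softmaxed depends on the joint's log_softmax setting.
```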
For the EncDecHybridRNNTCTCModel, things are a bit different. I believe you can compute the RNNT posteriors using the same steps above, but for CTC posteriors you would call the ctc_decoder member (which I think is a ConvASRDecoder created from the 'aux_ctc' config of the hybrid model):
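Something along these lines (again just a sketch; the checkpoint name is a placeholder, and the encoder_output= keyword is what I believe ConvASRDecoder expects):

```python
import torch
import nemo.collections.asr as nemo_asr

# Placeholder checkpoint name for a hybrid RNNT+CTC model.
hybrid_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    "stt_en_fastconformer_hybrid_large_pc"
)
hybrid_model.eval()

audio = torch.randn(1, 16000)    # placeholder 16 kHz waveform
audio_len = torch.tensor([16000])

with torch.no_grad():
    # Same encoder forward pass as in the RNNT case.
    encoded, encoded_len = hybrid_model.forward(
        input_signal=audio, input_signal_length=audio_len
    )
    # Auxiliary CTC head on top of the shared encoder output.
    ctc_log_probs = hybrid_model.ctc_decoder(encoder_output=encoded)

# ctc_log_probs: [B, T, V + 1] framewise CTC log-posteriors.
```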
From what I can tell, the method EncDecHybridRNNTCTCModel.change_decoding_strategy() doesn't significantly impact either of the posterior computation steps above, but it definitely impacts how the EncDecHybridRNNTCTCModel.transcribe() method works. If the decoding strategy is set to ctc, then the transcribe() method can return log posteriors (like the CTC model method does); otherwise it calls the RNNT version of transcribe(), which does not return posteriors.
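For completeness, this is how I understand the switch (the decoder_type keyword and return_hypotheses flag are my assumptions about the current API and may differ between NeMo versions; "audio.wav" is a placeholder path):

```python
# Point the hybrid model's decoding at the aux CTC head, then transcribe.
hybrid_model.change_decoding_strategy(decoding_cfg=None, decoder_type="ctc")
hypotheses = hybrid_model.transcribe(["audio.wav"], return_hypotheses=True)
```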
Does that sound about right? Honestly it would be nice if there was a simple way to "view" a HybridRNNTCTC model as a CTC model and have all the APIs work the same way they do with a CTC model... It would also be nice if there was a single method to return log posteriors from the RNNT models when those are needed. I understand I could be wrong about all of that though... 🤣