Replies: 5 comments 1 reply
-
Are both models using the hybrid loss, with only the encoder changed? Depending on sequence length, CTC might not work well. Fast Conformer is an 8x-stride model, so please make sure your transcript is always at least slightly longer than the audio sequence length divided by 8. If the RNNT loss converged but the CTC loss diverged, you can try a smaller CTC aux weight (0.1 instead of the default 0.3). The hyperparameters are tuned for fairly large datasets, so they might not fit an overfitting task of 900 samples.
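For reference, a minimal sketch of where that weight could be lowered in the config. The exact key path (`model.aux_ctc.ctc_loss_weight`) is my assumption based on the hybrid transducer-CTC configs; verify it against your own YAML:

```yaml
# Hedged sketch: lowering the auxiliary CTC weight in a hybrid
# transducer-CTC config. Key path is an assumption; check your config.
model:
  aux_ctc:
    ctc_loss_weight: 0.1   # default is 0.3
```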
-
Thanks @titu1994, I tried CTC aux weight = 0.1.
-
@bmwshop, I do not quite understand the 8x condition with respect to the audio sequence. I printed the signal_len and transcript_len values coming from the dataloader; the average values are about 118000 for signal_len and 65 for transcript_len. I also tried training the CTC models separately.
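To make the 8x condition concrete, here is a minimal sketch of the length check, assuming the default 16 kHz audio with a 10 ms feature hop (the hop size is my assumption; check your featurizer config):

```python
# A sketch of the 8x-stride length condition for CTC.
HOP_SAMPLES = 160   # 10 ms hop at 16 kHz (assumed preprocessor default)
STRIDE = 8          # Fast Conformer downsampling factor

def ctc_feasible(signal_len: int, transcript_len: int) -> bool:
    """CTC requires at least as many encoder frames as target tokens."""
    mel_frames = signal_len // HOP_SAMPLES
    encoder_frames = mel_frames // STRIDE
    return encoder_frames >= transcript_len

# With the averages reported above (118000 samples, 65 tokens):
# 118000 // 160 = 737 mel frames, 737 // 8 = 92 encoder frames >= 65 tokens.
print(ctc_feasible(118000, 65))  # True
```

So on average the lengths themselves satisfy the condition; if CTC still diverges, the cause is likely elsewhere (e.g. the tiny dataset or the aux weight).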
-
Returning to the hybrid model, now using BPE tokens instead of char. Same micro dataset as in the previous experiments.
-
Hi,
I tried to train two hybrid models from scratch on a micro dataset of 900 sentences using char tokens.
The hyperparameters were taken from the default NeMo config
conformer/conformer_transducer_char.yaml
The only difference between them was the encoder: Conformer vs. FastConformer. The first model converges well, the second doesn't. Can anyone give a hint as to why that could be?