Replies: 5 comments 1 reply
-
Are both models using the hybrid loss, with only the encoder changed? Depending on sequence length, CTC might not work well. Fast Conformer is an 8x-stride model, so please make sure your transcript is always at least slightly longer than the audio sequence length divided by 8. If the RNNT loss converged but the CTC loss diverged, you can try a smaller CTC aux weight (0.1 instead of the default 0.3). The hyperparameters are tuned for fairly large datasets, so they might not fit an overfitting task of 900 samples.
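For reference, a minimal sketch of where that weight could be lowered in the config. The exact key path (`model.aux_ctc.ctc_loss_weight`) is my assumption based on the hybrid transducer-CTC configs; verify it against your own YAML:

```yaml
# Hedged sketch: lowering the auxiliary CTC weight in a hybrid
# transducer-CTC config. Key path is an assumption; check your config.
model:
  aux_ctc:
    ctc_loss_weight: 0.1   # default is 0.3
```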
-
Thanks @titu1994, I tried CTC aux weight = 0.1.
-
@bmwshop, I do not quite understand the 8x condition with respect to the audio sequence. I printed the signal_len and transcript_len values coming from the dataloader; the average values are about 118000 for signal_len and 65 for transcript_len. I also tried training the CTC models separately.
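To make the 8x condition concrete, here is a minimal sketch of the length check, assuming the default 16 kHz audio with a 10 ms feature hop (the hop size is my assumption; check your featurizer config):

```python
# A sketch of the 8x-stride length condition for CTC.
HOP_SAMPLES = 160   # 10 ms hop at 16 kHz (assumed preprocessor default)
STRIDE = 8          # Fast Conformer downsampling factor

def ctc_feasible(signal_len: int, transcript_len: int) -> bool:
    """CTC requires at least as many encoder frames as target tokens."""
    mel_frames = signal_len // HOP_SAMPLES
    encoder_frames = mel_frames // STRIDE
    return encoder_frames >= transcript_len

# With the averages reported above (118000 samples, 65 tokens):
# 118000 // 160 = 737 mel frames, 737 // 8 = 92 encoder frames >= 65 tokens.
print(ctc_feasible(118000, 65))  # True
```

So on average the lengths themselves satisfy the condition; if CTC still diverges, the cause is likely elsewhere (e.g. the tiny dataset or the aux weight).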
-
Returning to the hybrid model, now using BPE tokens instead of char. Same micro dataset as in the previous experiments.
-
Hi,
I tried to train two hybrid models from scratch on a micro dataset of 900 sentences using char tokens.
The hyperparameters were taken from the default NeMo config
conformer/conformer_transducer_char.yaml
The only difference between them was the encoder: Conformer vs. FastConformer. The first model converges well, the second doesn't. Can anyone give a hint as to why that could be?