-
Hello! I'm new to ASR (and AI in general) and I'm struggling to fine-tune Conformer CTC (small) on the UASpeech dataset. If I fine-tune from the NeMo pre-trained checkpoint, the loss starts around 40, jumps to 4.14e+03 at the next step, then to values around e+05, and stays in that range. Val_wer is nearly a constant 100%. It occasionally predicts long sequences of repeated characters (e.g.
Here's my
I'm thinking that there might be an issue with the
The loss explosion doesn't happen if I start training from scratch (using a config file). The loss starts around 400 but decreases rapidly to 11-12 within the first epoch, then plateaus. After 60 epochs the predictions still don't seem to improve (mostly random single letters), but maybe it just needs more training? Val_wer fluctuates between 99.9% and 100%.
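To illustrate the kind of setup I mean, here is a simplified sketch of the fine-tuning path (placeholder manifest paths and trainer settings, not my exact config):

```python
# Simplified sketch of the fine-tuning setup (placeholder paths, not the
# exact config): load the pre-trained Conformer CTC small checkpoint and
# point it at new train/validation manifests while keeping its tokenizer.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("stt_en_conformer_ctc_small")

with open_dict(model.cfg):
    model.cfg.train_ds.manifest_filepath = "uaspeech_train_manifest.json"      # placeholder
    model.cfg.validation_ds.manifest_filepath = "uaspeech_val_manifest.json"   # placeholder
model.setup_training_data(model.cfg.train_ds)
model.setup_validation_data(model.cfg.validation_ds)

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=50)
model.set_trainer(trainer)
trainer.fit(model)
```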
Thank you so much in advance!
-
@VahidooX could you take a look at this one?
-
After some investigation, I think the exploding loss could have been caused by the vocabulary size being too large. I initially thought the vocab size was supposed to be the number of unique words in my dataset, but since we're using CTC it should be the number of distinct characters (or subword tokens) in the dataset, not words. And in order to reuse the pre-trained weights, the vocabulary size needs to match the checkpoint's.
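For anyone who hits the same thing, here is a rough illustration (BPE Conformer CTC variant; exact API names may differ between NeMo versions): the decoder has one output per token plus the CTC blank, so swapping the tokenizer resizes it and the pretrained decoder weights no longer fit.

```python
# Rough illustration (BPE Conformer CTC; API names may differ by NeMo version):
# the "vocabulary" is the tokenizer's token set, not the unique words in the
# dataset, and the decoder has vocab_size + 1 outputs (the +1 is the CTC blank).
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("stt_en_conformer_ctc_small")
print(model.tokenizer.vocab_size)  # the size the pretrained decoder was built for

# Only swap the tokenizer if you really need to (e.g. another language);
# this re-creates the decoder with a new shape, losing its pretrained weights.
model.change_vocabulary(
    new_tokenizer_dir="my_uaspeech_tokenizer",  # hypothetical tokenizer dir
    new_tokenizer_type="bpe",
)
```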
-
What is the init_weights_from_model option you have used? I don't think we have this one in NeMo, right? You need to add +init_from_nemo_model=model_nemo_file.nemo instead.

If you have a very small dataset, it is suggested to use the same tokenizer as the pretrained model; you mostly need to change the tokenizer when you train on another language. In any case, when you change the token size, the decoder weights cannot be loaded as they have different shapes. You can skip loading those weights by using the exclude option, like here:

What is the accuracy of the pretrained model on your dataset without fine-tuning? If you want to fine-tune on a small dataset, I suggest keeping the same tokenizer. .nemo files are regular archives (gzip-compressed tar), so you can extract the .nemo file and get the tokenizer out. Use a lower lr, like lr=1 or lr=0.5, so that the model doesn't diverge significantly.

@titu1994 do we have details of how users can load these checkpoints in our docs?
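Something like the sketch below (the dict-style exclude syntax is from memory and may differ between NeMo versions, so double-check it against the example training script; all paths are placeholders):

```python
# Sketch only -- the dict form of init_from_nemo_model with "exclude" is from
# memory and may differ between NeMo versions; paths are placeholders.
import tarfile
from omegaconf import OmegaConf

# Equivalent of the command-line overrides
#   +init_from_nemo_model.model0.path=stt_en_conformer_ctc_small.nemo
#   +init_from_nemo_model.model0.exclude=[decoder]
#   model.optim.lr=0.5
# which the training script merges into its config and passes to
# maybe_init_from_pretrained_checkpoint, so the mismatched decoder is skipped.
init_cfg = OmegaConf.create({
    "init_from_nemo_model": {
        "model0": {
            "path": "stt_en_conformer_ctc_small.nemo",  # placeholder .nemo file
            "exclude": ["decoder"],
        }
    }
})

# .nemo checkpoints are ordinary (possibly gzipped) tar archives, so the
# tokenizer files can be extracted and reused for fine-tuning.
with tarfile.open("stt_en_conformer_ctc_small.nemo") as archive:
    archive.extractall("conformer_ctc_small_unpacked")
```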
-
Hi @VahidooX, @titu1994, my new data contains some new words. Please suggest what I should do, as I have limited data and want to fine-tune the model in the same language.