Many algorithms are used to process speech and "correctly" convert it into words.
For deep learning (DL), neural networks such as CNN, SNN, and SRNN are used as models for the task. They take as input the MFCC (Mel-frequency cepstral coefficient) features computed from the audio sample.
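To make the MFCC step concrete, here is a minimal sketch that computes the coefficients for a single audio frame using only the Python standard library. The function names, the frame length (256 samples), the sample rate (8000 Hz), and the filter and coefficient counts are all illustrative assumptions, not values taken from this document. Real pipelines usually rely on an audio library and add steps such as pre-emphasis and overlapping frames.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge frequencies, converted to DFT bin indices
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising slope
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling slope
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc_frame(frame, sample_rate, n_filters=10, n_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    # Small constant avoids log(0) for empty filters
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct2(log_energies, n_coeffs)

# Synthetic test signal (assumption): a 440 Hz sine sampled at 8000 Hz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))
```

Each frame of audio yields one such coefficient vector, and the sequence of vectors over time is what a network would receive as input.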
For continuous speech recognition, the difficulty increases because the system must predict each upcoming word so that it stays consistent with the meaning and logic of the sentence. Algorithms and techniques used for this include:
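- Viterbi decoding
- N-gram language models
- HMM (hidden Markov model)
- MLP (multilayer perceptron)
- PLP (perceptual linear prediction)
- Hybrid HMM-MLP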
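
As an illustration of how several of these pieces fit together, the sketch below runs Viterbi decoding over a toy HMM whose states are words. The transition table plays the role of a bigram (2-gram) model, and the emission table stands in for acoustic scores. The vocabulary, the sound labels `ai` and `si`, and every probability are made-up assumptions chosen so the example can be checked by hand. They are not taken from any real recognizer.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # best[t][s] = (log-prob of the best path ending in s at time t, previous state)
    best = [{s: (log_start[s] + log_emit[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + log_emit[s][obs], prev)
        best.append(column)
    # Backtrack from the best final state
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}

# Toy vocabulary with homophones: the acoustics alone cannot separate
# "I" from "eye" or "see" from "sea"; the bigram transitions do.
states = ["I", "eye", "see", "sea"]
start = {"I": 0.6, "eye": 0.2, "see": 0.1, "sea": 0.1}
trans = {
    "I":   {"I": 0.05, "eye": 0.05, "see": 0.8,  "sea": 0.1},
    "eye": {"I": 0.25, "eye": 0.25, "see": 0.25, "sea": 0.25},
    "see": {"I": 0.3,  "eye": 0.1,  "see": 0.1,  "sea": 0.5},
    "sea": {"I": 0.25, "eye": 0.25, "see": 0.25, "sea": 0.25},
}
emit = {
    "I":   {"ai": 0.9, "si": 0.1},
    "eye": {"ai": 0.9, "si": 0.1},
    "see": {"ai": 0.1, "si": 0.9},
    "sea": {"ai": 0.1, "si": 0.9},
}

path, log_prob = viterbi(["ai", "si", "si"], states,
                         to_log(start), to_log(trans), to_log(emit))
print(path)  # ['I', 'see', 'sea']
```

Working in log-probabilities avoids numerical underflow on longer utterances. In a hybrid HMM-MLP system, the emission scores would typically come from the network rather than from a fixed table.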