How µSpeech Detects Phonemes
µSpeech can currently detect two phonemes with great accuracy and a number of others with lower accuracy. To do this, µSpeech uses an innovative series of algorithms. Don't expect hidden Markov models (HMMs), mixture models, or any other machine-learning techniques: the Arduino does not have the memory or processing power to run them.
introduced in version 1.0
In speech we use three basic types of phonemes: vowels, fricatives, and plosives. µSpeech is best at handling fricatives, which include /s/, /sh/, and /f/. µSpeech has two ways of identifying them. The first relies on the fact that vowels have very clear, low-frequency waveforms, so the "complexity" of their waveform is much lower. Sounds like /s/, /sh/, and /ch/ have very complex waveforms that are almost like white noise. They are generally made by air moving quickly through the mouth, and the voice box takes no part in them. µSpeech ranks sounds in order of their complexity, defined as:
complexity = abs(derivative of sound)/abs(Integral of sound)
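The formula can be sketched in C++ (the language Arduino sketches are written in) under one assumption about how it is discretized: for a window of samples, the derivative is approximated by the differences between successive samples, and the integral by the sum of absolute sample values. The function name `complexity`, the window-based interface, and the ×100 integer scaling are illustrative choices, not µSpeech's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of the complexity score for one window of samples.
// Assumption: |derivative| ~ sum of |x[i] - x[i-1]| and
// |integral| ~ sum of |x[i]|. The result is multiplied by 100 so the
// ratio can be kept in integer arithmetic.
long complexity(const int* samples, std::size_t n) {
    assert(samples != nullptr || n == 0);
    long change = 0;  // discrete |derivative|: total sample-to-sample change
    long power = 0;   // discrete |integral|: total absolute amplitude
    for (std::size_t i = 0; i < n; ++i) {
        power += std::labs(samples[i]);
        if (i > 0) {
            change += std::labs(static_cast<long>(samples[i]) - samples[i - 1]);
        }
    }
    if (power == 0) {
        return 0;  // silent window: avoid dividing by zero
    }
    return (change * 100) / power;
}
```

The samples are assumed to be centered around zero (for example, with the microphone's DC offset subtracted from `analogRead` values); otherwise the constant offset inflates the denominator and pushes every score toward zero. A smooth, vowel-like window changes little from sample to sample and scores low, while a noise-like window that flips sign on every sample scores high.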
Sounds like /v/ and /z/ are made by a combination of sound from our voice box and our mouths, so they have mid-range complexity values. For the algorithm to classify them accurately, these values must be calibrated by the user. Vowels have a very low complexity score.
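Once a window has a complexity score, ranking it comes down to comparing the score against calibrated thresholds. The sketch below is a minimal illustration that assumes three hypothetical calibration values (`minPower`, `vowelMax`, `voicedMax`) and a hypothetical `classify` function. The silence gate on signal power is our addition, since a silent window would otherwise score as a vowel.

```cpp
#include <cassert>

// Hypothetical sound classes, ordered from lowest to highest complexity.
enum class SoundClass { Silence, Vowel, VoicedFricative, Fricative };

// Hypothetical calibration values; the user would tune these for their
// own microphone and voice.
struct Calibration {
    long minPower;   // below this total amplitude the window is silence
    long vowelMax;   // complexity at or below this counts as a vowel
    long voicedMax;  // complexity at or below this counts as /v/, /z/
};

SoundClass classify(long power, long complexity, const Calibration& cal) {
    assert(cal.vowelMax < cal.voicedMax);  // thresholds must be ordered
    if (power < cal.minPower) return SoundClass::Silence;
    if (complexity <= cal.vowelMax) return SoundClass::Vowel;
    if (complexity <= cal.voicedMax) return SoundClass::VoicedFricative;
    return SoundClass::Fricative;  // noise-like sounds such as /s/, /sh/
}
```

One way to calibrate would be to record a few vowels and fricatives and place `vowelMax` and `voicedMax` between the scores observed for each class.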
introduced in version 2.0
One of the most elusive fricatives is /f/. To deal with it, we use a different algorithm: a simple low-pass filter that lets us determine when the user says /f/. This filter also requires calibration to function properly.
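The page does not say how the filter output is turned into a decision, so the following is only a hedged sketch. It implements a standard first-order (one-pole) IIR low-pass filter, then applies a hypothetical rule (`looksLikeF`, with a user-calibrated `threshold`) that flags a window when the filter passes only a small fraction of its amplitude, meaning the sound is dominated by high frequencies. Neither the rule nor the names come from µSpeech.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// First-order (one-pole) IIR low-pass filter:
//   y[i] = y[i-1] + alpha * (x[i] - y[i-1]),  with 0 < alpha <= 1.
// A smaller alpha gives a lower cutoff frequency.
class LowPass {
public:
    explicit LowPass(float alpha) : alpha_(alpha), y_(0.0f) {
        assert(alpha > 0.0f && alpha <= 1.0f);
    }
    float step(float x) {
        y_ += alpha_ * (x - y_);
        return y_;
    }
private:
    float alpha_;
    float y_;
};

// Hypothetical decision rule (an assumption, not µSpeech's method):
// report /f/ when the low-passed signal keeps less than `threshold`
// of the window's total absolute amplitude.
bool looksLikeF(const float* samples, std::size_t n, float alpha, float threshold) {
    LowPass lp(alpha);
    float total = 0.0f;  // total absolute amplitude of the raw window
    float low = 0.0f;    // absolute amplitude surviving the filter
    for (std::size_t i = 0; i < n; ++i) {
        total += std::fabs(samples[i]);
        low += std::fabs(lp.step(samples[i]));
    }
    if (total == 0.0f) return false;  // silence is never /f/
    return (low / total) < threshold;
}
```

By itself this rule would also respond to other noise-like fricatives such as /s/, so it illustrates the filtering step rather than forming a complete /f/ detector.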
3.0 working-branch