2. Settings
There are multiple settings that you can change by modifying variables in the SETTINGS section of nala.py. Here is a brief list of them along with what they can be changed to and their descriptions.
Variable | Options | Description |
---|---|---|
transcription_type | 'sphinx', 'google' | The type of transcription. Default is 'sphinx'. If 'google' is selected but the path in the environment variable cannot be found, Nala reverts to 'sphinx'. |
wake_type | 'sphinx', 'snowboy', 'porcupine' | Wakeword detector used to detect user queries. Default is 'porcupine' as it is the most accurate wakeword detector. |
query_time | 2 | Time in seconds of each query when Nala is activated. The default query time is 2 seconds (from trial-and-error). |
multi_query | True, False | Multi-query capability allows you to separate queries with AND in the transcript, so it doesn’t stop after one query. Default is True. |
query_save | True, False | Ability to save queries once they have been propagated. Otherwise, they are deleted. This is useful if you want to cache query data or build a dataset. Default is True. |
register_face | True, False | Store face when user registers to authenticate later with facial recognition. Default is True. |
sleep_time | 30 | The time (in minutes) that Nala will sleep if you trigger the “Go to sleep” action query. Default is 30 minutes. |
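As a rough illustration of the table above, the sketch below mirrors the listed defaults and shows how two of the settings might behave. The variable names and default values come from the table. The helper functions `resolve_transcription_type` and `split_queries`, and the use of `GOOGLE_APPLICATION_CREDENTIALS` as the environment variable, are assumptions made for illustration and may not match how nala.py implements them.

```python
import os
import re

# Defaults as listed in the settings table.
transcription_type = 'sphinx'
wake_type = 'porcupine'
query_time = 2        # seconds per query
multi_query = True
query_save = True
register_face = True
sleep_time = 30       # minutes

# Assumption: name of the environment variable holding the credentials path.
GOOGLE_ENV_VAR = 'GOOGLE_APPLICATION_CREDENTIALS'


def resolve_transcription_type(requested, environ=None):
    """Return 'sphinx' when 'google' is requested but no credentials file is found."""
    environ = os.environ if environ is None else environ
    if requested == 'google':
        credentials_path = environ.get(GOOGLE_ENV_VAR, '')
        if not credentials_path or not os.path.exists(credentials_path):
            return 'sphinx'
    return requested


def split_queries(transcript, multi_query=True):
    """Split a transcript into separate queries on the word 'and'.

    Assumption: with multi_query disabled, the whole transcript is
    treated as a single query.
    """
    transcript = transcript.strip()
    if not multi_query:
        return [transcript] if transcript else []
    parts = re.split(r'\s+and\s+', transcript, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]
```

For example, `split_queries('get the weather and play music')` yields two queries, while words that merely contain "and" (such as "sandwich") are left intact because the split requires whitespace on both sides.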
You also need to set some environment variables if you'd like to use a few actions. Specifically, actions such as shutting down or restarting the computer require access to the root account. If you don't want to set this up, that's totally fine; you just won't be able to use these commands.
You can modify the voice type by changing the speak.py script in the ./actions/ folder. Nala uses the pyttsx3 library, and the default voice is Fiona ('com.apple.speech.synthesis.voice.fiona'). You only need to modify this one script for the voice to change across all Nala experiences. Check out the pyttsx3 documentation for more information on how to modify things like the speech rate and pitch.
import sys
import pyttsx3 as pyttsx

def say(text):
    engine = pyttsx.init()
    engine.setProperty('voice','com.apple.speech.synthesis.voice.fiona')
    engine.say(text)
    engine.runAndWait()

say(str(sys.argv[1]))
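If you want to change more than the voice, a variation along these lines could work. This is a minimal sketch, not the speak.py that ships with Nala: the helper names `configure_voice`, `say_with_settings`, and `list_voice_ids` are hypothetical, and it only uses the pyttsx3 properties `voice`, `voices`, `rate`, and `volume`. Pitch is not covered here.

```python
def configure_voice(engine, voice_id=None, rate=None, volume=None):
    """Apply optional voice settings to a pyttsx3-style engine and return it."""
    if voice_id is not None:
        engine.setProperty('voice', voice_id)
    if rate is not None:
        # Speech rate in words per minute.
        engine.setProperty('rate', int(rate))
    if volume is not None:
        # pyttsx3 expects a volume between 0.0 and 1.0, so clamp the value.
        engine.setProperty('volume', max(0.0, min(1.0, float(volume))))
    return engine


def say_with_settings(text, voice_id=None, rate=None, volume=None):
    """Speak text with the given settings (hypothetical helper)."""
    import pyttsx3  # imported lazily so configure_voice works without it
    engine = configure_voice(pyttsx3.init(), voice_id, rate, volume)
    engine.say(text)
    engine.runAndWait()


def list_voice_ids():
    """Return the voice ids installed on this machine."""
    import pyttsx3
    engine = pyttsx3.init()
    return [voice.id for voice in engine.getProperty('voices')]
```

Calling `list_voice_ids()` first is a reasonable way to find a valid id on your system, since 'com.apple.speech.synthesis.voice.fiona' is a macOS voice and may not exist elsewhere.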
By default, Nala uses PocketSphinx rather than Google so as not to drive up costs (Google charges $0.006 per query). As a result, as you code more actions into Nala, you may need to train a new transcription model based on a new language model.
This is quite easy to do. Here are some quick instructions.
- First, we need to create a text document (.txt) with keywords to train the language model. These are the words that the transcription model will be trained to recognize. The fewer the words in the master corpus, the better accuracy you’ll likely achieve. Here are the words the current model uses that you can add onto; you should keep these core words in there if you'd like to use the default actions.
play music
get the weather
get social
get coffee
get the news
get sports
get food
get ice cream
get beers
get beer
get social
get food
get nightlife
find a bar
plan trip
set alarm
stop alarm
make a poem
make a joke
record audio
record video
open atom
open sublime
open spotify
open twitter
open linkedin
open facebook
open github
chill out
exercise
I love you
search
be grateful
meditate
shut down
restart
log out
sleep
- Now that we have our text corpus, we can go to the LMTool page. Simply click on the “Browse…” button, select the corpus.txt file you created, then click “COMPILE KNOWLEDGE BASE” and download all the files (Figure 7.3.5.2). In this case, there is a TAR4311.tgz file at the top of the page that I can easily download into the downloads folder.
- Now that we have all the required files, all we need to do is load them in PocketSphinx via the ./data/models/ps_transcribe.py script. You just need to replace the 4437.lm and 4437.dic file names with those of the .lm and .dic files you just generated.
from os import environ, path
import sys
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

def transcribe(HOSTDIR, SAMPLE):
    # Add a trailing '/' to the host directory if it is missing
    if HOSTDIR[-1] != '/':
        HOSTDIR = HOSTDIR+'/'
    SAMPLEDIR = HOSTDIR+SAMPLE
    MODELDIR = HOSTDIR+"data/models"
    DATADIR = HOSTDIR+"data/wakewords"
    # Create a decoder with the acoustic model, language model, and dictionary
    config = Decoder.default_config()
    config.set_string('-hmm', MODELDIR+'/en-us')
    config.set_string('-lm', MODELDIR+'/4437.lm')
    config.set_string('-dict', MODELDIR+'/4437.dic')
    decoder = Decoder(config)
    # Decode the audio file in 1024-byte chunks
    decoder.start_utt()
    with open(SAMPLEDIR, 'rb') as stream:
        while True:
            buf = stream.read(1024)
            if buf:
                decoder.process_raw(buf, False, False)
            else:
                break
    decoder.end_utt()
    segments = [seg.word for seg in decoder.seg()]
    print('Best hypothesis segments: ', segments)
    # Drop the sentence start/end markers, which may not always be present
    output = [word for word in segments if word not in ('<s>', '</s>')]
    transcript = ' '.join(output).lower()
    print('transcript: '+transcript)
    return transcript
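The first and last steps above can be partly automated. The sketch below is one possible approach, not part of Nala itself: `dedupe_phrases`, `write_corpus`, and `find_model_files` are hypothetical helpers. It writes a corpus.txt with duplicate phrases removed (the list above contains "get social" and "get food" twice), then unpacks the archive downloaded from LMTool and locates the .lm and .dic files to reference in ps_transcribe.py. It assumes the archive is a standard tar file containing exactly the model files you want.

```python
import tarfile
from pathlib import Path


def dedupe_phrases(phrases):
    """Normalize whitespace and drop blank or repeated phrases, keeping order."""
    seen = set()
    unique = []
    for phrase in phrases:
        cleaned = " ".join(phrase.split())
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


def write_corpus(phrases, corpus_path="corpus.txt"):
    """Write one phrase per line to the corpus file and return the phrases written."""
    unique = dedupe_phrases(phrases)
    Path(corpus_path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


def find_model_files(archive_path, dest_dir):
    """Extract the downloaded archive and return (lm_path, dic_path)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Only extract archives you downloaded yourself from a trusted source.
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest)
    lm_files = sorted(dest.rglob("*.lm"))
    dic_files = sorted(dest.rglob("*.dic"))
    if not lm_files or not dic_files:
        raise FileNotFoundError("No .lm or .dic file found in " + str(archive_path))
    return lm_files[0], dic_files[0]
```

A call such as `find_model_files("TAR4311.tgz", "data/models")` would return the two paths whose file names replace 4437.lm and 4437.dic in the `config.set_string` calls.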
Wake word detection can use PocketSphinx, Snowboy, or Porcupine. It defaults to Porcupine because it's the most accurate.
Note that Snowboy and Porcupine require licenses for commercial use of their models.