
2. Settings

Jim Schwoebel edited this page Aug 1, 2018 · 38 revisions

List of modifiable settings

There are multiple settings that you can change by modifying variables in the SETTINGS section of nala.py. Here is a brief list of them, the values they accept, and what they do.

| Variable | Options | Description |
| --- | --- | --- |
| `transcription_type` | `'sphinx'`, `'google'` | The type of transcription. Default is `'sphinx'`; if `'google'` is selected and the required environment variable cannot be found, it reverts back to `'sphinx'`. |
| `wake_type` | `'sphinx'`, `'snowboy'`, `'porcupine'` | Wakeword detector used to detect user queries. Default is `'porcupine'`, as it is the most accurate wakeword detector. |
| `query_time` | e.g. `2` | Time in seconds of each query once Nala is activated. The default query time is 2 seconds (from trial and error). |
| `multi_query` | `True`, `False` | Multi-query capability lets you separate queries with AND in the transcript, so Nala doesn't stop after one query. Default is `True`. |
| `query_save` | `True`, `False` | Save queries once they have been propagated instead of deleting them. Useful if you want to cache query data or build a dataset. Default is `True`. |
| `register_face` | `True`, `False` | Store the user's face at registration to authenticate later with facial recognition. Default is `True`. |
| `sleep_time` | e.g. `30` | The time in minutes that Nala will sleep if you trigger the "Go to sleep" action query. Default is 30 minutes. |
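For reference, the SETTINGS block might look like the following sketch. The variable names and defaults come from the table above; the exact layout in nala.py may differ:

```python
# SETTINGS (sketch -- names and defaults taken from the table above)
transcription_type = 'sphinx'   # 'sphinx' or 'google'
wake_type = 'porcupine'         # 'sphinx', 'snowboy', or 'porcupine'
query_time = 2                  # seconds recorded per query
multi_query = True              # split queries on "AND" in the transcript
query_save = True               # keep query data instead of deleting it
register_face = True            # store face at registration
sleep_time = 30                 # minutes to sleep on "Go to sleep"
```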

You also need to set some environment variables if you'd like to use a few of the actions. Specifically, Nala requires access to the root account for actions like shutting down or restarting the computer. If you'd rather not grant this, that's fine too; you just won't be able to use those commands.

Modifying voice type

You can modify the voice type by changing the speak.py script in the ./actions/ folder. Nala uses the pyttsx3 library, and the default voice is Fiona ('com.apple.speech.synthesis.voice.fiona'). You just need to modify this one script for the voice to change across all Nala experiences. Check out the pyttsx3 documentation for more information on how to modify things like the speech rate and pitch.

import sys
import pyttsx3 as pyttsx

def say(text):
    # Initialize the TTS engine and select the Fiona voice
    engine = pyttsx.init()
    engine.setProperty('voice', 'com.apple.speech.synthesis.voice.fiona')
    engine.say(text)
    engine.runAndWait()

# Speak the first command-line argument
say(str(sys.argv[1]))

Training new transcription models

Nala transcribes queries with PocketSphinx by default to keep costs down (Google charges $0.006/query). Therefore, as you code more actions into Nala, you may need to retrain the transcription model on a new language model.

This is quite easy to do. Here are some quick instructions.

  1. First, we need to create a text document (.txt) with keywords to train the language model. These are the words that the transcription model will be trained to recognize. The fewer the words in the master corpus, the better the accuracy you'll likely achieve. Here are the words the current model uses, which you can add to; you should keep these core words if you'd like to use the default actions.
play music 
get the weather
get social 
get coffee 
get the news 
get sports 
get food
get ice cream
get beers
get beer 
get social 
get food 
get nightlife 
find a bar 
plan trip 
set alarm 
stop alarm 
make a poem 
make a joke 
record audio 
record video 
open atom
open sublime 
open spotify 
open twitter
open linkedin
open facebook
open github
chill out 
exercise
I love you
search  
be grateful 
meditate 
shut down
restart 
log out
sleep 
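To generate this corpus programmatically, a small script like the following can write the keyword phrases to corpus.txt, one per line. The filename and the abbreviated keyword list here are illustrative; extend the list with the full set of phrases above:

```python
# Write a keyword corpus for LMTool, one phrase per line.
# This list is abbreviated -- extend it with the full list above.
keywords = [
    'play music',
    'get the weather',
    'set alarm',
    'stop alarm',
    'shut down',
]

with open('corpus.txt', 'w') as f:
    f.write('\n'.join(keywords) + '\n')
```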
  2. Now that we have our text corpus, we can go to the LMTool page. Simply click on the "Browse…" button, select the corpus.txt file you created, then click "COMPILE KNOWLEDGE BASE" and download all the files (Figure 7.3.5.2). In this case, there is a TAR4311.tgz file at the top that downloads easily into the downloads folder.

  3. Now that we have all the required files, all we need to do is load them into PocketSphinx via the ./data/models/ps_transcribe.py script. You just need to change the 4437.lm and 4437.dic filenames to whichever .lm and .dic files you just trained.
from os import environ, path
import sys
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

def transcribe(HOSTDIR, SAMPLE):

    # Fix host directory if it doesn't end with a '/'
    if HOSTDIR[-1] != '/':
        HOSTDIR = HOSTDIR + '/'
    SAMPLEDIR = HOSTDIR + SAMPLE
    MODELDIR = HOSTDIR + "data/models"
    DATADIR = HOSTDIR + "data/wakewords"

    # Create a decoder with the trained language model and dictionary
    config = Decoder.default_config()
    config.set_string('-hmm', MODELDIR + '/en-us')
    config.set_string('-lm', MODELDIR + '/4437.lm')
    config.set_string('-dict', MODELDIR + '/4437.dic')
    decoder = Decoder(config)

    # Decode streaming data
    decoder.start_utt()
    with open(SAMPLEDIR, 'rb') as stream:
        while True:
            buf = stream.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    # Collect the recognized words, dropping the <s>/</s> sentence markers
    output = [seg.word for seg in decoder.seg()]
    print('Best hypothesis segments: ', output)
    output = [word for word in output if word not in ('<s>', '</s>')]

    transcript = ' '.join(output).lower()
    print('transcript: ' + transcript)

    return transcript
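The transcript assembly at the end of transcribe() can be illustrated in isolation. Here the decoder segments are faked with a hard-coded list, since PocketSphinx brackets its hypothesis with `<s>` and `</s>` sentence markers that must be stripped before joining:

```python
# Hypothetical decoder output: PocketSphinx brackets the hypothesis
# with <s> and </s> sentence markers, which we strip before joining.
segments = ['<s>', 'PLAY', 'MUSIC', '</s>']

words = [w for w in segments if w not in ('<s>', '</s>')]
transcript = ' '.join(words).lower()
print(transcript)  # play music
```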

Training new wakewords

You can train new wakewords with PocketSphinx, Snowboy, or Porcupine. Nala defaults to Porcupine because it is the most accurate.

Note that Snowboy and Porcupine require licenses for commercial use of their models.
