
2. Settings

Jim Schwoebel edited this page Aug 2, 2018 · 38 revisions

List of modifiable settings

You can change Nala's behavior by modifying variables in the SETTINGS.json database used by nala.py. The list below gives each variable, the values it can take, and a description.

| Variable | Options | Description |
| --- | --- | --- |
| alarm | True, False | Whether the alarm is turned on or off at the designated time. |
| alarm time | 8 | The hour the alarm goes off (in 24-hour time: 8 = 8 AM, 13 = 1 PM) if the alarm action is turned on. |
| greeting | True, False | If True, Nala greets you and gets the weather every time you log in (default). If False, she does not. |
| end | 1531914937.172238 | The last time you updated the database, as a Unix timestamp (useful for understanding sessions). |
| transcription_type | 'sphinx', 'google' | The type of transcription. Default is 'sphinx'. If 'google' is selected and the path to the environment variable cannot be found, it reverts to 'sphinx'. |
| wake_type | 'sphinx', 'snowboy', 'porcupine' | Wakeword detector used to detect user queries. Default is 'porcupine', as it is the most accurate wakeword detector. |
| query_time | 2 | Time in seconds of each query when Nala is activated. The default of 2 seconds was chosen by trial and error. |
| multi_query | True, False | Multi-query capability lets you separate queries with AND in the transcript, so Nala doesn't stop after one query. Default is True. |
| query_save | True, False | Save queries once they have been propagated; otherwise they are deleted. Useful if you want to cache query data or build a dataset. Default is True. |
| register_face | True, False | Store the user's face at registration to authenticate later with facial recognition. Default is True. |
| sleep_time | 30 | The time (in minutes) that Nala sleeps if you trigger the "Go to sleep" action query. Default is 30 minutes. |
| query_json | True, False | Also save .json queries in the data/queries folder to match the audio (e.g. sample.wav --> sample.json with query info). |
| budget | 30 | Budget the user has to go out with friends (for actions). |
| genre | 'classical' | Music genre the user prefers (for actions). |
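
As a rough illustration of changing these variables from a script rather than by hand, the sketch below loads a settings file, updates a few keys, and writes it back. It assumes SETTINGS.json is a flat JSON object keyed by the variable names listed above; the helper name `update_settings` and the path you pass to it are hypothetical and not part of Nala.

```python
import json


def update_settings(path, changes):
    """Load a flat JSON settings file, apply `changes`, and save it back.

    Assumption: the file holds a single JSON object keyed by the
    variable names listed above. This helper is hypothetical.
    """
    with open(path) as f:
        settings = json.load(f)

    # Refuse unknown keys so a typo does not silently add a new setting.
    unknown = [key for key in changes if key not in settings]
    if unknown:
        raise KeyError('unknown setting(s): ' + ', '.join(sorted(unknown)))

    settings.update(changes)

    with open(path, 'w') as f:
        json.dump(settings, f, indent=2)
    return settings
```

For example, `update_settings('SETTINGS.json', {'wake_type': 'snowboy', 'query_time': 3})` would switch the wakeword detector and lengthen each query, provided the file sits at that (assumed) path.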

You also need to set some environment variables if you'd like to use a few actions. Specifically, actions such as shutting down or restarting the computer require access to the root account. If you'd rather not set this up, that's fine; those commands just won't be available.

Modifying voice type

You can modify the voice type by changing the speak.py script in the ./actions/ folder. Nala uses the pyttsx3 library, and the default voice is Fiona ('com.apple.speech.synthesis.voice.fiona'). Modifying this one script changes the voice across all Nala experiences.

import sys
import pyttsx3 as pyttsx

def say(text):
    engine = pyttsx.init()
    engine.setProperty('voice','com.apple.speech.synthesis.voice.fiona')
    engine.say(text)
    engine.runAndWait()

say(str(sys.argv[1]))

Check out the Pyttsx3 documentation for more information on how to modify things like the speech rate and pitch.
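
To illustrate the kind of change that documentation describes, here is a minimal sketch that sets the voice, rate, and volume through `engine.setProperty`. The helper names `configure_engine` and `say_with_settings` are hypothetical, and any rate or volume values you pass are your own choices rather than Nala defaults.

```python
def configure_engine(engine, voice_id=None, rate=None, volume=None):
    """Apply optional voice, rate, and volume settings to a pyttsx3 engine."""
    if voice_id is not None:
        engine.setProperty('voice', voice_id)
    if rate is not None:
        # pyttsx3 takes the speaking rate as an integer (words per minute).
        engine.setProperty('rate', int(rate))
    if volume is not None:
        # pyttsx3 expects a volume between 0.0 and 1.0, so clamp the value.
        engine.setProperty('volume', max(0.0, min(1.0, float(volume))))
    return engine


def say_with_settings(text, **settings):
    """Speak `text` once (requires pyttsx3 and a working audio backend)."""
    import pyttsx3  # imported here so configure_engine works without it

    engine = configure_engine(pyttsx3.init(), **settings)
    engine.say(text)
    engine.runAndWait()
```

A call such as `say_with_settings('Hello', voice_id='com.apple.speech.synthesis.voice.fiona', rate=150, volume=0.8)` would keep the default Fiona voice while adjusting how fast and how loudly it speaks.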

Training new transcription models

By default, Nala uses PocketSphinx to avoid driving up costs (Google charges $0.006 per query). As you code more actions into Nala, you may therefore need to train a new transcription model based on a new language model.

This is quite easy to do. Here are some quick instructions.

  1. First, create a text document (.txt) containing the keywords used to train the language model. These are the words the transcription model will be trained to recognize. The fewer words in the master corpus, the better the accuracy you're likely to achieve. The current model uses the words below, which you can add to; keep these core words if you'd like to use the default actions. You can find this text file at ./data/models/nala_2.txt.
play music 
get the weather
get social 
get coffee 
get the news 
get sports 
get food
get ice cream
get beers
get beer 
get social 
get food 
get nightlife 
find a bar 
plan trip 
set alarm 
stop alarm 
make a poem 
make a joke 
record audio 
record video 
open atom
open sublime 
open spotify 
open twitter
open linkedin
open facebook
open github
chill out 
exercise
I love you
search  
be grateful 
meditate 
shut down
restart 
log out
sleep 
  2. Now that we have our text corpus, go to the LMTool page. Click the “Browse…” button, select the corpus.txt file you created, then click “COMPILE KNOWLEDGE BASE” and download all the files (Figure 7.3.5.2). In this case, there is a TAR4311.tgz file at the top of the page that can be downloaded into the downloads folder.

  3. Now that we have all the required files, load them into PocketSphinx via the ./data/models/ps_transcribe.py script. Change the 4437.lm and 4437.dic file names in the script to those of the .lm and .dic files you just generated. You're now good to go with a new transcription model!
import os
from pocketsphinx.pocketsphinx import Decoder
from sphinxbase.sphinxbase import *  # kept from the original script


def transcribe(HOSTDIR, SAMPLE):
    # Build paths relative to the host directory; os.path.join copes with
    # a host directory that does not end in '/'.
    SAMPLEDIR = os.path.join(HOSTDIR, SAMPLE)
    MODELDIR = os.path.join(HOSTDIR, 'data', 'models')

    # Create a decoder with the acoustic model, language model, and dictionary.
    # Swap 4437.lm and 4437.dic for the files you just trained.
    config = Decoder.default_config()
    config.set_string('-hmm', os.path.join(MODELDIR, 'en-us'))
    config.set_string('-lm', os.path.join(MODELDIR, '4437.lm'))
    config.set_string('-dict', os.path.join(MODELDIR, '4437.dic'))
    decoder = Decoder(config)

    # Decode the audio file in 1024-byte chunks.
    decoder.start_utt()
    with open(SAMPLEDIR, 'rb') as stream:
        while True:
            buf = stream.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    segments = [seg.word for seg in decoder.seg()]
    print('Best hypothesis segments: ', segments)

    # Drop the sentence start/end markers, then join the remaining words.
    output = [word for word in segments if word not in ('<s>', '</s>')]
    transcript = ' '.join(output).lower()
    print('transcript: ' + transcript)

    return transcript
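
The archive downloaded from LMTool has to be unpacked before its .lm and .dic files can be referenced in the script above. A minimal sketch of that step using only the standard library follows; the function name `find_model_files` is hypothetical, and it assumes the archive contains one .lm and one .dic file, which is an assumption rather than something LMTool or Nala guarantees.

```python
import os
import shutil
import tarfile


def find_model_files(archive_path, dest_dir):
    """Copy the .lm and .dic files out of a .tgz archive into dest_dir.

    Returns (lm_path, dic_path). Hypothetical helper, not part of Nala.
    """
    os.makedirs(dest_dir, exist_ok=True)
    found = {}
    with tarfile.open(archive_path, 'r:gz') as archive:
        for member in archive.getmembers():
            ext = os.path.splitext(member.name)[1]
            if member.isfile() and ext in ('.lm', '.dic'):
                # Flatten to the base name so nothing lands outside dest_dir.
                target = os.path.join(dest_dir, os.path.basename(member.name))
                with archive.extractfile(member) as src, open(target, 'wb') as dst:
                    shutil.copyfileobj(src, dst)
                found[ext] = target

    missing = [ext for ext in ('.lm', '.dic') if ext not in found]
    if missing:
        raise FileNotFoundError('archive has no ' + ' or '.join(missing) + ' file')
    return found['.lm'], found['.dic']
```

The two returned paths are the values that would take the place of the 4437.lm and 4437.dic entries passed to `config.set_string`.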

Training new wakewords

The best wakeword engine to use is Porcupine (as shown in the following figure taken from this repo). A further advantage is that Porcupine is completely open source for Mac, Windows, and Linux operating systems.

If you'd like to train a new wakeword, you are first going to need to clone Porcupine's repository:

git clone https://github.com/Picovoice/Porcupine.git
cd Porcupine

Porcupine enables developers to build models for any wake word. This is done using Porcupine's optimizer utility, which finds optimal model hyper-parameters for a given hotword and stores them in a so-called keyword file. You can create your own keyword file by running Porcupine's optimizer from the command line:

tools/optimizer/${SYSTEM}/${MACHINE}/pv_porcupine_optimizer -r resources/ -w ${WAKE_WORD} \
-p ${TARGET_SYSTEM} -o ${OUTPUT_DIRECTORY}

In the example above, replace ${SYSTEM} and ${TARGET_SYSTEM} with the current and target (runtime) operating systems (linux, mac, or windows). ${MACHINE} is the CPU architecture of the current machine (x86_64 or i386). ${WAKE_WORD} is the chosen wake word. Finally, ${OUTPUT_DIRECTORY} is the directory where the keyword file will be stored.
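
If you would rather fill in those placeholders from Python than type them by hand, the sketch below assembles the same command as an argument list. The function name `build_optimizer_command` and its parameters are hypothetical; only the optimizer path layout and the -r, -w, -p, and -o flags come from the command shown above.

```python
def build_optimizer_command(wake_word, output_dir, system, machine,
                            target_system=None, repo_root='.'):
    """Return the pv_porcupine_optimizer call as an argument list."""
    systems = ('linux', 'mac', 'windows')
    machines = ('x86_64', 'i386')
    if target_system is None:
        target_system = system  # build for the current OS by default
    if system not in systems or target_system not in systems:
        raise ValueError('system must be one of: ' + ', '.join(systems))
    if machine not in machines:
        raise ValueError('machine must be one of: ' + ', '.join(machines))

    # Mirrors tools/optimizer/${SYSTEM}/${MACHINE}/pv_porcupine_optimizer.
    optimizer = '{}/tools/optimizer/{}/{}/pv_porcupine_optimizer'.format(
        repo_root, system, machine)
    return [
        optimizer,
        '-r', repo_root + '/resources/',
        '-w', wake_word,
        '-p', target_system,
        '-o', output_dir,
    ]
```

Run from inside the cloned Porcupine directory, the list can be handed to `subprocess.run(..., check=True)`. Passing the command as a list also means a multi-word wake word needs no shell quoting.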

Now that you have your wake word, edit the ./data/models/porcupine file so that it matches your wake word of interest; Nala will then respond to that wake word.

If you would instead like to train models with PocketSphinx or Snowboy, see their respective documentation.
