llm voice assistant in python

Picovoice · May 28, 2024 · c22661d
1 parent 040177d

Showing 7 changed files with 494 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -1 +1,6 @@
-# pico-cookbook
+# Pico Cookbook
+
+Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
+
+[![Twitter URL](https://img.shields.io/twitter/url?label=%40AiPicovoice&style=social&url=https%3A%2F%2Ftwitter.com%2FAiPicovoice)](https://twitter.com/AiPicovoice)<!-- markdown-link-check-disable-line -->
+[![YouTube Channel Views](https://img.shields.io/youtube/channel/views/UCAdi9sTCXLosG1XeqDwLx7w?label=YouTube&style=social)](https://www.youtube.com/channel/UCAdi9sTCXLosG1XeqDwLx7w)
diff --git a/recipes/.gitkeep b/recipes/.gitkeep
diff --git a/recipes/llm-voice-assistant/README.md b/recipes/llm-voice-assistant/README.md
@@ -0,0 +1,14 @@
+# LLM-Powered Voice Assistant
+
+Hands-free voice assistant powered by a large language model (LLM). Voice recognition, LLM inference, and speech synthesis all run on-device.
+
+## Components
+
+- [Porcupine Wake Word](https://picovoice.ai/docs/porcupine/)
+- [Cheetah Streaming Speech-to-Text](https://picovoice.ai/docs/cheetah/)
+- [picoLLM Inference Engine](https://github.com/Picovoice/picollm)
+- [Orca Streaming Text-to-Speech](https://picovoice.ai/docs/orca/)
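The component list above implies a simple per-turn flow: wait for the wake word, transcribe the user's request, stream the LLM's reply, and synthesize it as it arrives. Below is a minimal sketch of that chaining, assuming one callable per engine. The callables and `run_turn` are hypothetical stand-ins, not the actual Porcupine, Cheetah, picoLLM, or Orca APIs, and the recipe's `main.py` may be organized differently.

```python
from typing import Callable, Iterable, Iterator


def run_turn(
    wait_for_wake_word: Callable[[], None],
    transcribe_utterance: Callable[[], str],
    generate_tokens: Callable[[str], Iterable[str]],
    synthesize_text: Callable[[str], bytes],
) -> Iterator[bytes]:
    """One conversational turn: wake word -> speech-to-text -> LLM -> text-to-speech.

    All four callables are hypothetical placeholders for the real engines.
    """
    wait_for_wake_word()                    # wake word role: block until the phrase is heard
    prompt = transcribe_utterance()         # speech-to-text role: audio until endpoint -> text
    for token in generate_tokens(prompt):   # LLM role: stream the reply piece by piece
        yield synthesize_text(token)        # streaming TTS role: audio for each text piece


# Usage with trivial stand-ins in place of the real engines.
chunks = list(run_turn(
    lambda: None,
    lambda: "what time is it",
    lambda prompt: ["It ", "is ", "noon."],
    lambda text: text.encode(),
))
print(chunks)  # [b'It ', b'is ', b'noon.']
```

Streaming each piece of LLM output straight into speech synthesis, rather than waiting for the full reply, is what lets the user start hearing the answer before generation finishes.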
+
+## Implementations
+
+- [Python](python)
diff --git a/recipes/llm-voice-assistant/python/README.md b/recipes/llm-voice-assistant/python/README.md
@@ -0,0 +1,74 @@
+## Compatibility
+
+- Python 3.8+
+- Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64), and Raspberry Pi (5 and 4).
+
+## AccessKey
+
+AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who uses
+Picovoice needs a valid AccessKey, and you must keep your AccessKey secret. Internet connectivity is required to validate
+your AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely free for
+open-weight models. Everyone who signs up for
+[Picovoice Console](https://console.picovoice.ai/) receives a unique AccessKey.
+
+## picoLLM Model
+
+picoLLM Inference Engine supports many open-weight models. The models are available for download on
+[Picovoice Console](https://console.picovoice.ai/).
+
+## Usage
+
+Install the required packages:
+
+```console
+pip install -r requirements.txt
+```
+
+Run the demo:
+
+```console
+python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} 
+```
+
+Replace `${ACCESS_KEY}` with the AccessKey you obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path to the
+model you downloaded from Picovoice Console.
+
+To see all available options, type the following:
+
+```console
+python3 main.py --help
+```
+
+## Custom Wake Word
+
+The demo's default wake phrase is `Picovoice`. You can generate a custom (branded) wake word using Picovoice Console by following the [Porcupine Wake Word documentation](https://picovoice.ai/docs/porcupine/). Once the model is trained, pass it to the demo
+application using the `--keyword_model_path` argument.
+
+## Profiling
+
+To see the runtime profiling metrics, run the demo with the `--profile` argument:
+
+```console
+python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} --profile 
+```
+
+As in the Usage section, replace `${ACCESS_KEY}` with your AccessKey and `${PICOLLM_MODEL_PATH}` with the path to your
+downloaded model.
+
+The demo profiles three metrics: Real-time Factor (RTF), Tokens per Second (TPS), and Latency.
+
+### Real-time Factor (RTF)
+
+RTF is a standard metric for measuring the speed of speech processing (e.g., wake word, speech-to-text, and 
+text-to-speech). RTF is the CPU time divided by the processed (recognized or synthesized) audio length. Hence, a lower RTF means a more efficient engine.
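As a worked illustration of the RTF definition, here is a minimal Python sketch. `process_frame`, the 512-sample frame size, and the 16 kHz sample rate used in the usage note are assumptions for illustration, and the demo's own profiler may account for CPU time differently.

```python
import time
from typing import Callable, Sequence


def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """RTF = CPU time spent processing / duration of the processed audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


def measure_rtf(
    process_frame: Callable[[Sequence[int]], object],
    frames: Sequence[Sequence[int]],
    sample_rate: int,
) -> float:
    """Run `process_frame` over every frame and return the resulting RTF.

    `process_frame` is a hypothetical stand-in for an engine call that consumes
    one frame of PCM samples (e.g. wake word detection or speech-to-text).
    """
    cpu_start = time.process_time()  # CPU time of this process, not wall-clock time
    for frame in frames:
        process_frame(frame)
    cpu_seconds = time.process_time() - cpu_start
    audio_seconds = sum(len(frame) for frame in frames) / sample_rate
    return real_time_factor(cpu_seconds, audio_seconds)


# Worked example: 0.5 s of CPU time to process 10 s of audio.
print(real_time_factor(0.5, 10.0))  # 0.05
```

An RTF of 0.05 means the engine spends 0.05 s of CPU time per second of audio. For example, `measure_rtf(engine_call, frames, 16000)` would time a hypothetical `engine_call` over 16 kHz frames. For text-to-speech, `audio_seconds` would instead be the length of the synthesized audio.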
+
+### Tokens per Second (TPS)
+
+Tokens per second (TPS) is the standard metric for measuring the speed of LLM inference engines. TPS is the number of
+generated tokens divided by the compute time used to generate them. A higher TPS is better.
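The TPS arithmetic can be sketched the same way. This minimal version assumes the LLM exposes its output as an iterable of tokens (`token_stream` is a hypothetical stand-in) and times it with the wall clock, so it also counts any consumer overhead. The demo may instead time only the engine's own compute.

```python
import time
from typing import Iterable


def tokens_per_second(num_tokens: int, compute_seconds: float) -> float:
    """TPS = number of generated tokens / time spent generating them."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return num_tokens / compute_seconds


def measure_tps(token_stream: Iterable[str]) -> float:
    """Consume a token stream and return its TPS using wall-clock time.

    `token_stream` is a hypothetical stand-in for a streaming LLM generation call.
    """
    start = time.perf_counter()
    num_tokens = sum(1 for _ in token_stream)  # drain the stream, counting tokens
    elapsed = time.perf_counter() - start
    return tokens_per_second(num_tokens, elapsed)


# Worked example: 120 tokens generated in 4 s.
print(tokens_per_second(120, 4.0))  # 30.0
```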
+
+### Latency
+
+We measure the latency as the delay between the end of the user's utterance (i.e., the time when the user finishes talking) and the 
+time that the voice assistant generates the first chunk of the audio response (i.e., when the user starts hearing the response).
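The latency definition above reduces to two timestamps per turn. The following minimal sketch uses a hypothetical `LatencyTimer` helper. It assumes the application can hook the moment the speech-to-text engine detects the end of the utterance and each moment the text-to-speech engine returns audio. These hook points are assumptions, not the demo's actual code.

```python
import time
from typing import Optional


class LatencyTimer:
    """Hypothetical helper: delay from end of utterance to first response audio chunk."""

    def __init__(self) -> None:
        self._end_of_utterance: Optional[float] = None
        self.latency_seconds: Optional[float] = None

    def mark_end_of_utterance(self) -> None:
        # Start a new measurement when the user stops talking.
        self._end_of_utterance = time.perf_counter()
        self.latency_seconds = None

    def mark_audio_chunk(self) -> None:
        # Only the first chunk after the end of the utterance counts;
        # chunks before any utterance, or later chunks in the same turn, are ignored.
        if self._end_of_utterance is not None and self.latency_seconds is None:
            self.latency_seconds = time.perf_counter() - self._end_of_utterance
```

Call `mark_end_of_utterance()` when the speech-to-text endpoint fires and `mark_audio_chunk()` whenever synthesized audio is produced. `latency_seconds` then holds the delay the user perceives before hearing the response.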
+