examples: added trigger-phrase agent example #800
base: main
Conversation
- switched to elevenlabs for tts
- switched tts audio publishing into a streamed method
- added boost trigger for deepgram stt
- added references to the returns of asyncio.create_task (see the sketch below)
- added readme
- removed unused variable
- …com/livekit/agents into hamdan/trigger-phrase-agent-example
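A minimal sketch of the asyncio.create_task point, in case it's unclear why the commit keeps references to the returned tasks: the event loop only holds weak references to tasks, so an unreferenced task can be garbage collected before it finishes. The say/schedule_say names below are placeholders, not the example's actual functions.

import asyncio

background_tasks: set[asyncio.Task] = set()

async def say(text: str) -> None:
    # stand-in for the TTS synthesis / audio publishing work in the example
    await asyncio.sleep(0.1)

def schedule_say(text: str) -> None:
    # keep a strong reference so the task isn't garbage collected early
    task = asyncio.create_task(say(text))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)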
a few comments, plus run ruff check . && ruff format . for linting
examples/trigger-phrase/agent.py
Outdated
    tokenize.basic.WordTokenizer(ignore_punctuation=True)
)

trigger_phrase = "Hi Bob!"
nit: TRIGGER_PHRASE instead, to show that this is a changeable constant
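A quick sketch of the suggested rename, assuming nothing else about the file: a module-level constant makes it obvious the phrase is meant to be edited by whoever runs the example.

# module-level constant instead of a lowercase local variable
TRIGGER_PHRASE = "Hi Bob!"

# later code would then reference the constant, e.g.
# if transcript.lower().startswith(TRIGGER_PHRASE.lower()):
#     ...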
oh okay, didn't know that - thanks!
examples/trigger-phrase/agent.py
Outdated
initial_ctx = llm.ChatContext().append(
    role="system",
    text=(
        f"You are {trigger_phrase}, a voice assistant created by LiveKit. Your interface with users will be voice. "
weird misleading use of trigger_phrase here. this implies that it can only be used as a name, i think it's best to drop it
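A rough sketch of the suggested change (based on the ChatContext usage shown in the diff above, not the PR's final code): keep the system prompt generic instead of interpolating the trigger phrase into it.

initial_ctx = llm.ChatContext().append(
    role="system",
    text=(
        "You are a voice assistant created by LiveKit. "
        "Your interface with users will be voice."
    ),
)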
yeah I agree. I was also debating about this but my thought was that it might be helpful to give the LLM a bit more context
- changed trigger phrase variable into a constant
- removed passing trigger phrase to the LLM context
looks good! i noticed a few things:
- there are no STT transcriptions in chat, can you add those? (see the sketch after this list)
- it's a bit slow. worth looking into
- semantically this should probably be inside the voice_assistant examples directory
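On the transcription point, a hedged sketch of one way to do it, assuming the STTSegmentsForwarder helper from livekit.agents.transcription that other agents examples use; forward_transcripts and its arguments are placeholder names, and wiring audio frames into the STT stream is omitted here.

import asyncio

from livekit import rtc
from livekit.agents import stt, transcription

def forward_transcripts(room: rtc.Room, participant: rtc.RemoteParticipant,
                        track: rtc.Track,
                        stt_stream: stt.SpeechStream) -> asyncio.Task:
    # relay interim/final STT segments so clients can render them in the chat UI
    forwarder = transcription.STTSegmentsForwarder(
        room=room, participant=participant, track=track
    )

    async def _run() -> None:
        async for ev in stt_stream:
            forwarder.update(ev)

    return asyncio.create_task(_run())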
examples/trigger-phrase/agent.py
Outdated
vad = silero.VAD.load(
    min_speech_duration=0.01,
    min_silence_duration=0.5,
)
you should have this be in the prewarm function so it doesn't block the job from starting
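A sketch of that suggestion, following the prewarm pattern used in other agents examples (the VAD parameters are just carried over from the diff above):

from livekit.agents import JobContext, JobProcess, WorkerOptions, cli
from livekit.plugins import silero

def prewarm(proc: JobProcess):
    # load the Silero model once per process, before any job is accepted
    proc.userdata["vad"] = silero.VAD.load(
        min_speech_duration=0.01,
        min_silence_duration=0.5,
    )

async def entrypoint(ctx: JobContext):
    vad = ctx.proc.userdata["vad"]  # reuse the preloaded instance
    ...

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))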
oops, I see it in the docs now, should have read it better 🤦‍♂️
I think it is mainly due to the 0.5 sec timeout set for the VAD, and maybe partly due to the computation that needs to happen on every END_OF_SPEECH event. I am not sure of the best way to address them, though. Since the primary goal of this example is to show users a way to use transcribed words to trigger the LLM, I didn't go down the path of ensuring the minimum possible latency like
Even though technically this is a voice assistant, since we are not using the VoiceAssistant class, I feel like it would be confusing and counterintuitive to users if we placed it in that directory, and hence I resorted to a standalone example directory. What do you think?
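For readers following along, a minimal sketch of the trigger idea being discussed (placeholder names, not the PR's exact code): on each final transcript, only hand text to the LLM if it starts with the trigger phrase.

TRIGGER_PHRASE = "Hi Bob!"

def extract_triggered_text(transcript: str) -> str | None:
    # return the words after the trigger phrase, or None if it wasn't spoken
    normalized = transcript.strip().lower()
    trigger = TRIGGER_PHRASE.lower().rstrip("!.,?")
    if normalized.startswith(trigger):
        return transcript.strip()[len(trigger):].lstrip(" ,.!?") or None
    return None

# usage on a FINAL_TRANSCRIPT event:
# text = extract_triggered_text(ev.alternatives[0].text)
# if text:
#     ...  # send `text` to the LLM and speak the reply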
in my testing i encountered closer to three or sometimes four seconds of silence before the response started playing. this doesn't need to be fully optimized as an example, but at this point it is hurting the effectiveness of the demo. re: directory, disregard; did not notice this doesn't actually use VoicePipelineAgent.
- removed VAD
- added STT transcription
- removed first participant constraint
@s-hamdananwar this is how I was able to manage "multiple" participants in a single raise hand queue; check out the PR and let me know if this can help resolve the issue of still only listening to the first participant that joins the room.
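A hedged sketch of the general idea (every name here is a placeholder, not the linked PR's code): react to every subscribed audio track instead of only the first participant, and start one transcription/trigger task per track.

import asyncio

from livekit import rtc

def listen_to_all_participants(ctx, stt_impl, handle_track) -> None:
    # handle_track is a hypothetical coroutine that runs STT and the
    # trigger-phrase check for a single participant's audio track
    tasks: set[asyncio.Task] = set()

    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track: rtc.Track,
                            publication: rtc.RemoteTrackPublication,
                            participant: rtc.RemoteParticipant):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            task = asyncio.create_task(handle_track(participant, track, stt_impl))
            tasks.add(task)
            task.add_done_callback(tasks.discard)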