DRAFT: device keyphrase listening #9

Open · wants to merge 2 commits into main

@youmustfight (Member) commented Jan 2, 2024

I really want to be able to say "Lablink, start recording" instead of using a button. So I gave this a whirl. A few considerations/challenges:

  • You can only have one active audio listener, which means we have to record in one location and push the audio to another.
  • I tried using an additional thread/process, but it seems like we hit resource constraints (I think the pyaudio resource lives in another process).
  • Unconfirmed for the RPi, but with I2S you could only do output or input at a time and had to switch drivers for the board. I'm unsure whether playing audio is interfering with things.
  • I was pushing audio data over the queue fine, but for some reason it would hang, I think when we got to shutting down picamera2 in the record process.
  • I tried the super small background listener (pocketsphinx via SpeechRecognition) but hit two problems: the background listener function they provide runs very infrequently, so I'd need to work with the raw audio data myself, and Pocketsphinx is pretty bad at keyword recognition. It was all over the place.
  • Before that, I tried the Whisper tiny model, but it created a conflict with other packages, I think because of some config issue with torch. So that could have worked but needed more fiddling. I assume I'd hit a similar issue if I went the transformers package route.
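For reference, the "record in one location and push to another" idea from the first bullet can be sketched with a single capture thread that owns the (one allowed) audio stream and fans chunks out over a bounded queue. Everything here is a stand-in: the iterable `source` replaces the real pyaudio/picamera2 stream, and substring matching replaces actual keyphrase recognition.

```python
# Hypothetical sketch: one capture thread owns the audio device,
# a consumer thread scans queued chunks for the keyphrase.
import queue
import threading

SENTINEL = None  # tells the consumer the stream ended


def capture_loop(source, chunk_queue):
    """Single owner of the audio device; everything else reads the queue."""
    for chunk in source:
        chunk_queue.put(chunk)
    chunk_queue.put(SENTINEL)


def listen_for_keyphrase(chunk_queue, keyphrase, hits):
    """Consumer: scan each chunk (real code would run ASR here instead)."""
    while True:
        chunk = chunk_queue.get()
        if chunk is SENTINEL:
            break
        if keyphrase in chunk:  # placeholder for actual recognition
            hits.append(chunk)


def run(source, keyphrase="lablink"):
    q = queue.Queue(maxsize=16)  # bounded, so capture can't outrun the consumer
    hits = []
    consumer = threading.Thread(
        target=listen_for_keyphrase, args=(q, keyphrase, hits)
    )
    consumer.start()
    capture_loop(source, q)
    consumer.join()
    return hits


# Simulated stream of already-"transcribed" chunks in place of raw audio:
fake_chunks = ["background noise", "lablink start recording", "more noise"]
print(run(fake_chunks))  # → ['lablink start recording']
```

The bounded queue is the point: capture never blocks on recognition for long, and only one thread ever touches the audio resource.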

All in all, it feels close, but the refinements and head-hitting aren't really worth it right now. Alternatively, one hackier approach could be pushing the audio to S3 and letting the server side process it, but I think that would still hit the same thread/process constraint. The combined video/audio capture is a little wonky, although it has kept things very simple so far.

https://github.com/Uberi/speech_recognition/blob/master/examples/threaded_workers.py
https://github.com/cmusphinx/pocketsphinx
https://github.com/Infatoshi/chatgpt-voice-assistant/blob/main/main.py
https://picovoice.ai/platform/porcupine/
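If this gets picked back up, the Porcupine route linked above would look roughly like the sketch below. This is hedged and untested on the device: the pvporcupine dependency, the access key, and the custom "lablink" keyword file are all assumptions; only the `pcm_frame()` helper is plain stdlib.

```python
# Hedged sketch of on-device keyword spotting via Porcupine.
import struct


def pcm_frame(buf: bytes, frame_length: int):
    """Unpack a raw little-endian 16-bit PCM buffer into the tuple of
    ints that a keyword engine like Porcupine consumes per frame."""
    return struct.unpack_from("<" + "h" * frame_length, buf)


def main():
    # Non-stdlib pieces kept inside main() so the helper stays importable.
    import pvporcupine  # assumed dependency, per the Porcupine link
    import pyaudio

    porcupine = pvporcupine.create(
        access_key="YOUR_PICOVOICE_KEY",  # assumption: requires an account
        keyword_paths=["lablink_raspberry-pi.ppn"],  # hypothetical keyword file
    )
    pa = pyaudio.PyAudio()
    stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length,
    )
    try:
        while True:
            buf = stream.read(
                porcupine.frame_length, exception_on_overflow=False
            )
            if porcupine.process(pcm_frame(buf, porcupine.frame_length)) >= 0:
                print("heard the keyphrase: start recording")
    finally:
        stream.close()
        pa.terminate()
        porcupine.delete()


if __name__ == "__main__":
    main()
```

The appeal over pocketsphinx is that Porcupine is purpose-built for wake words, so it should sidestep the accuracy problems noted above, at the cost of an API key and a custom keyword file.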

vercel bot commented Jan 2, 2024

The latest updates on your projects:

lattice: ✅ Ready (updated Jan 2, 2024 4:48pm UTC)
