DRAFT: device keyphrase listening #9

Open · wants to merge 2 commits into main

@youmustfight (Member) commented Jan 2, 2024

I really want to be able to say "Lablink, start recording" instead of using a button. So I gave this a whirl. A few considerations/challenges:

  • You can only have one active audio listener, which means we have to record in one location and push the audio to another.
  • I tried using an additional thread/process, but it seems like we hit resource constraints (I think the pyaudio resource lives in another process).
  • Unconfirmed for the RPi, but with I2S you could only do output or input at a time and had to switch drivers for the board. I'm unsure whether playing audio is interfering with things.
  • I was pushing audio data over the queue fine, but for some reason it would hang, I think when we got to shutting down picamera2 in the record process.
  • I tried the super small background listener (pocketsphinx via SpeechRecognition) but hit two problems: the background listener function they provide runs very infrequently, so I'd need to work with the raw audio data myself, and Pocketsphinx is pretty bad at keyword recognition. It was all over the place.
  • Before that, I tried the Whisper tiny model, but it created a conflict with other packages, I think because of some config issue with torch. So that could have worked but needed more fiddling. I assume I'd hit a similar issue if I went the transformers package route.
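For reference, the "record in one location and push to another" idea from the first bullet can be sketched with a single capture thread that owns the (one allowed) audio stream and fans chunks out over a bounded queue. Everything here is a stand-in: the iterable `source` replaces the real pyaudio/picamera2 stream, and substring matching replaces actual keyphrase recognition.

```python
# Hypothetical sketch: one capture thread owns the audio device,
# a consumer thread scans queued chunks for the keyphrase.
import queue
import threading

SENTINEL = None  # tells the consumer the stream ended


def capture_loop(source, chunk_queue):
    """Single owner of the audio device; everything else reads the queue."""
    for chunk in source:
        chunk_queue.put(chunk)
    chunk_queue.put(SENTINEL)


def listen_for_keyphrase(chunk_queue, keyphrase, hits):
    """Consumer: scan each chunk (real code would run ASR here instead)."""
    while True:
        chunk = chunk_queue.get()
        if chunk is SENTINEL:
            break
        if keyphrase in chunk:  # placeholder for actual recognition
            hits.append(chunk)


def run(source, keyphrase="lablink"):
    q = queue.Queue(maxsize=16)  # bounded, so capture can't outrun the consumer
    hits = []
    consumer = threading.Thread(
        target=listen_for_keyphrase, args=(q, keyphrase, hits)
    )
    consumer.start()
    capture_loop(source, q)
    consumer.join()
    return hits


# Simulated stream of already-"transcribed" chunks in place of raw audio:
fake_chunks = ["background noise", "lablink start recording", "more noise"]
print(run(fake_chunks))  # → ['lablink start recording']
```

The bounded queue is the point: capture never blocks on recognition for long, and only one thread ever touches the audio resource.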

All in all, it feels close, but the refinements and head-hitting aren't really worth it right now. Alternatively, one hackier approach could be pushing the audio to S3 and letting the server side process it, but I think that would still hit the same thread/process constraint. The combined video/audio capture is a little wonky, although it has kept things very simple so far.

https://github.com/Uberi/speech_recognition/blob/master/examples/threaded_workers.py
https://github.com/cmusphinx/pocketsphinx
https://github.com/Infatoshi/chatgpt-voice-assistant/blob/main/main.py
https://picovoice.ai/platform/porcupine/
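If this gets picked back up, the Porcupine route linked above would look roughly like the sketch below. This is hedged and untested on the device: the pvporcupine dependency, the access key, and the custom "lablink" keyword file are all assumptions; only the `pcm_frame()` helper is plain stdlib.

```python
# Hedged sketch of on-device keyword spotting via Porcupine.
import struct


def pcm_frame(buf: bytes, frame_length: int):
    """Unpack a raw little-endian 16-bit PCM buffer into the tuple of
    ints that a keyword engine like Porcupine consumes per frame."""
    return struct.unpack_from("<" + "h" * frame_length, buf)


def main():
    # Non-stdlib pieces kept inside main() so the helper stays importable.
    import pvporcupine  # assumed dependency, per the Porcupine link
    import pyaudio

    porcupine = pvporcupine.create(
        access_key="YOUR_PICOVOICE_KEY",  # assumption: requires an account
        keyword_paths=["lablink_raspberry-pi.ppn"],  # hypothetical keyword file
    )
    pa = pyaudio.PyAudio()
    stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length,
    )
    try:
        while True:
            buf = stream.read(
                porcupine.frame_length, exception_on_overflow=False
            )
            if porcupine.process(pcm_frame(buf, porcupine.frame_length)) >= 0:
                print("heard the keyphrase: start recording")
    finally:
        stream.close()
        pa.terminate()
        porcupine.delete()


if __name__ == "__main__":
    main()
```

The appeal over pocketsphinx is that Porcupine is purpose-built for wake words, so it should sidestep the accuracy problems noted above, at the cost of an API key and a custom keyword file.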

vercel bot commented Jan 2, 2024

The latest updates on your projects:

lattice: ✅ Ready (updated Jan 2, 2024 4:48pm UTC)
