The SpeechRecognition package provides a high-level interface to record and process audio inputs in Python.
Reference:
- https://github.com/Uberi/speech_recognition
- https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst
- https://github.com/s2t2/learning-new-sounds
- https://github.com/prof-rossetti/voice-interface-demo-py
This package depends on another Python package called "pyaudio", which itself depends on a lower-level library called "portaudio" (not a Python package). To install "portaudio":
- On a Mac, use homebrew (brew install portaudio).
- On Windows, use pipwin within an active virtual environment (see installation steps below).
Run the following installation steps after activating a virtual environment.
Windows:
pip install pipwin
pipwin install pyaudio # will install along with lower level binaries
pip install SpeechRecognition # depends on the "pyaudio" Python package
Mac:
pip install pyaudio
pip install SpeechRecognition
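To confirm the installation worked, one option is to check that both packages can be found from the active virtual environment. The sketch below uses the standard library's importlib for this. The helper name is_installed is hypothetical, and the sketch assumes the import names are "pyaudio" and "speech_recognition".

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # assumption: these are the import names of the two packages installed above
    for name in ["pyaudio", "speech_recognition"]:
        status = "OK" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If either line reports MISSING, the most likely cause is that the virtual environment was not active when the install commands were run.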
Record audio using your computer's built-in microphone, and save that to a file:
import speech_recognition as sr

client = sr.Recognizer()

# capture a single phrase from the default microphone:
with sr.Microphone() as mic:
    print("Say something!")
    audio = client.listen(mic)

# save the captured audio as a FLAC file:
with open("my-recording.flac", "wb") as f:
    f.write(audio.get_flac_data())
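As a quick sanity check on the saved recording, it may help to confirm that the file exists and begins with the four-byte "fLaC" marker that opens a FLAC stream. This is a minimal sketch, not part of the SpeechRecognition package. The helper name looks_like_flac is hypothetical, and the check only inspects the first bytes rather than validating the whole file.

```python
import os

FLAC_MARKER = b"fLaC"  # the four bytes that begin a FLAC stream


def looks_like_flac(path):
    """Return True if the file exists and starts with the FLAC stream marker."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(len(FLAC_MARKER)) == FLAC_MARKER


if __name__ == "__main__":
    # assumes the recording was saved under the filename used above
    print(looks_like_flac("my-recording.flac"))
```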
Record audio using your computer's built-in microphone, and recognize the spoken words:
import speech_recognition as sr

client = sr.Recognizer()

with sr.Microphone() as mic:
    print("Say something!")
    audio = client.listen(mic)

# returns the transcript with the highest confidence:
transcript = client.recognize_google(audio)
#> 'how old is the Brooklyn Bridge'

# returns all transcripts:
response = client.recognize_google(audio, show_all=True)
#> {
#>     'alternative': [
#>         {
#>             'transcript': 'how old is the Brooklyn Bridge',
#>             'confidence': 0.987629
#>         }
#>     ],
#>     'final': True
#> }
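When requesting all transcripts, the caller has to pick one out of the response. The sketch below shows one way to do that, assuming the response has the shape shown above: a dictionary with an "alternative" list whose items carry a "transcript" and, possibly, a "confidence". The helper name best_transcript is hypothetical. Treating a non-dictionary response as "no result" and ranking alternatives without a confidence last are assumptions made for illustration.

```python
def best_transcript(response):
    """Return the highest-confidence transcript, or None if there is none."""
    # assumption: anything other than a dict (e.g. an empty list) means no result
    if not isinstance(response, dict):
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # assumption: alternatives lacking a 'confidence' key are ranked last
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


if __name__ == "__main__":
    example = {
        "alternative": [
            {"transcript": "how old is the Brooklyn Bridge", "confidence": 0.987629}
        ],
        "final": True,
    }
    print(best_transcript(example))
```

Returning None instead of raising keeps the calling code simple: it can test the result and prompt the user to repeat themselves.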