0.2. Collection

Jim Schwoebel edited this page Aug 16, 2018 · 21 revisions

When I was growing up, I played basketball pretty much every day. My coaches always stressed that fundamentals were important. Things like how consistently you shot the ball, hedged on defense, turned the ball over, and boxed out and rebounded dramatically affected the outcome of most games we played. It was often something simple, like turning the ball over too many times or getting dramatically out-rebounded by the other team, that caused us to lose games. In other words, we “deviated from our habits” and the other team “got us.”

Voice computing is very much like basketball. It’s quite important to understand the fundamentals: microphone selection, saving and manipulating audio files, using audio codecs to compress audio, etc. It’s often something simple, like forgetting to record files with a particular microphone, that makes the difference between poor and good audio quality. If you don’t understand these things, you’ll be ‘outcompeted’ by the big corporate players like Google, Nuance, IBM, or Amazon. In contrast, if you understand and stick to the voice computing fundamentals, you can ‘outcompete’ these corporate giants (in terms of the software and datasets that you make).

The goal of this chapter is to help get you up to speed with how to develop tools in this ‘voice-first’ era. Specifically, we will cover:

  • 1.1 - Basic principles of voice computing
  • 1.2 - Installing dependencies
  • 1.3 - How to read and write audio files
  • 1.4 - Manipulating audio files
  • 1.5 - Playing audio files
  • 1.6 - Recording streaming audio
  • 1.7 - Converting audio formats
  • 1.8 - Transcribing audio
  • 1.9 - Text-to-speech systems

In this way, you will have the foundations necessary to thrive and build interesting voice applications in Python.