We were inspired by the recent release of the OpenAI Whisper and ChatGPT APIs. The ChatGPT API lets apps connect to OpenAI's servers and run queries against their LLMs, while the Whisper API lets them send voice data to OpenAI for transcription. We realized that by combining these APIs, we could build a far more immersive AI conversation experience.
The app allows users to record their voice and send it to ChatGPT, which then responds to them through speech. This gives the user the immersive experience of conversing with the AI much like they would with another person.
We built an API pipeline that converts spoken audio to text, sends the text to ChatGPT, and plays back its response as audio. We use the Whisper API for speech-to-text, ChatGPT for generating the text response, and Google's Text-to-Speech service to synthesize the spoken reply.
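The pipeline above can be sketched in JavaScript. This is a minimal illustration, not our exact code: `sendToBackend` and the `/transcribe`, `/chat`, and `/speak` routes are hypothetical names for calls into our Flask server (which holds the API keys); the payload shapes follow the public OpenAI chat completions and Google Cloud Text-to-Speech REST docs.

```javascript
// Payload for OpenAI's chat completions endpoint.
function buildChatPayload(transcript, history = []) {
  return {
    model: "gpt-3.5-turbo",
    messages: [...history, { role: "user", content: transcript }],
  };
}

// Payload for Google Cloud Text-to-Speech's text:synthesize endpoint.
function buildTtsPayload(replyText) {
  return {
    input: { text: replyText },
    voice: { languageCode: "en-US", name: "en-US-Standard-A" },
    audioConfig: { audioEncoding: "MP3" },
  };
}

// Glue: recorded audio -> transcript -> ChatGPT reply -> spoken audio.
// sendToBackend is a hypothetical helper that POSTs to our Flask server.
async function converse(audioUri, sendToBackend) {
  const transcript = await sendToBackend("/transcribe", { uri: audioUri });
  const reply = await sendToBackend("/chat", buildChatPayload(transcript));
  return sendToBackend("/speak", buildTtsPayload(reply)); // playable audio
}
```

Keeping the API keys on the Flask backend, rather than in the Expo app, means they never ship inside the app bundle.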
We used React Native and the Expo framework to build the app. Getting the Expo framework running on our computers was a challenge, as it's computationally intense, and stringing all the APIs together was another. At one point, we wanted the AI response to play automatically after you're finished recording, but the way that Flask handles API requests made this impossible for us.
- Getting our original goal done
- Making the UI look good
- Combining three different APIs
- Flask
- Expo
- React Native
- JavaScript
- Tailwind
- How to work in a team
- Connect it to more APIs, such as VALL-E
- Submission to the App Store and Google Play Store
- Text prompt processing
- Add integration with other apps such as calendar, messages, etc.
- cd into the audiogpt folder
- cd app
- in terminal: `npm install --global expo-cli`
- in terminal: `npm i`
- download 'Expo Go' on your phone
- in terminal: `expo start`
- scan the QR code with Expo Go
- the app should open on your phone
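The setup steps above, condensed into one terminal session (assuming you start from the repo root and have Node.js installed):

```shell
cd app                          # from the audiogpt folder
npm install --global expo-cli   # one-time install of the Expo CLI
npm i                           # install project dependencies
expo start                      # prints a QR code; scan it with Expo Go
```

After scanning, the app should open inside Expo Go on your phone.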
- finish README
- audio recording button