

WhisCall

A framework for AI WhatsApp calls using Whisper, Coqui TTS, GPT-3.5 Turbo, Virtual Audio Cable, and the WhatsApp Desktop App.

Demo.mp4

Tools used in this framework

  • Whisper (Speech to Text)
  • OpenAI GPT-3.5 Turbo
  • Coqui TTS
  • Virtual Audio Cable
  • WhatsApp Desktop App

Installation

Install Build Tools from Visual Studio 2022

Download Visual Studio Installer

App Screenshot

Install CUDA Toolkit 12.4

Download CUDA Toolkit 12.4

App Screenshot

Install eSpeak

Download eSpeak from here

App Screenshot

Install VB-Audio Cable

Note: You need two separate virtual audio cables. I am using VB-Audio Cable and Virtual Audio Cable (VAC); install both.

App Screenshot
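Why two cables? One cable carries the caller's voice out of WhatsApp into this framework (so Whisper can transcribe it), and the other carries the generated speech back into WhatsApp as its microphone. The PyAudio sketch below only illustrates that routing; the device indices are hypothetical placeholders, and the real values come from the device-listing script further down.

import pyaudio

# Hypothetical indices -- replace them with the values reported by the
# device-listing script in the section below.
CALL_AUDIO_MIC_INDEX = 3     # recording end of the cable WhatsApp plays its speaker into
WHATSAPP_MIC_OUT_INDEX = 7   # playback end of the cable WhatsApp uses as its microphone

p = pyaudio.PyAudio()

# Capture the caller's voice for Whisper: WhatsApp's speaker is set to cable A,
# so recording from cable A gives us the incoming call audio.
capture = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, input_device_index=CALL_AUDIO_MIC_INDEX,
                 frames_per_buffer=1024)

# Send the reply back: play the TTS audio into cable B,
# which WhatsApp is using as its microphone.
playback = p.open(format=pyaudio.paInt16, channels=1, rate=22050,
                  output=True, output_device_index=WHATSAPP_MIC_OUT_INDEX)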

Install the WhatsApp Desktop App

Download WhatsApp

App Screenshot

Now Clone the Repo

  git clone https://github.com/skshadan/WhisCall.git
  cd WhisCall
  pip install -r requirements.txt

Find Speaker And Microphone Index

Run the code below to find the index of each virtual audio cable to use as the microphone and speaker.

import pyaudio

def list_audio_devices():
    p = pyaudio.PyAudio()
    num_devices = p.get_device_count()  # enumerate every audio device PyAudio can see

    # Lists of devices to return
    speakers = []
    microphones = []

    # Scan through devices and add to list
    for i in range(0, num_devices):
        device = p.get_device_info_by_index(i)
        if device.get('maxInputChannels') > 0:
            microphones.append((i, device.get('name')))
        if device.get('maxOutputChannels') > 0:
            speakers.append((i, device.get('name')))

    p.terminate()
    return microphones, speakers

microphones, speakers = list_audio_devices()

print("Microphones:")
for idx, name in microphones:
    print(f"Index: {idx}, Name: {name}")

print("\nSpeakers:")
for idx, name in speakers:
    print(f"Index: {idx}, Name: {name}")

Select the Virtual Audio Cables as the Input (Microphone) and Output (Speaker) in the WhatsApp App

App Screenshot

Run the code

main.py

from voice import select_microphone, transcribe_audio
from response import generate_response, text_to_speech, PlayAudio

def main():
    # Pick the virtual-cable microphone that carries the WhatsApp call audio
    mic_index = select_microphone()

    # Transcribe the call, generate a reply, and speak it back into the call
    for text in transcribe_audio(mic_index):
        if text:
            gpt_response = generate_response(text)
            text_to_speech(gpt_response)
            PlayAudio()


if __name__ == "__main__":
    main()

Select the microphone index. The TTS and Whisper models will load, and that's it!

App Screenshot
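For context, here is a rough sketch of what the transcription and response steps could look like. The repo's own voice.py and response.py may be organized differently; the helper names, Whisper model size, and prompt below are illustrative assumptions based on the openai-whisper and openai packages.

import whisper                 # openai-whisper package
from openai import OpenAI      # openai >= 1.0 client

# Illustrative only -- not the repo's actual implementation.
stt_model = whisper.load_model("base")   # Whisper speech-to-text model
client = OpenAI()                        # reads OPENAI_API_KEY from the environment

def transcribe_file(wav_path: str) -> str:
    """Transcribe one recorded chunk of call audio with Whisper."""
    result = stt_model.transcribe(wav_path)
    return result["text"].strip()

def generate_reply(text: str) -> str:
    """Ask GPT-3.5 Turbo for a conversational reply to the caller."""
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a friendly voice assistant on a phone call."},
            {"role": "user", "content": text},
        ],
    )
    return chat.choices[0].message.content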

If you want a different voice, change the Coqui TTS model.

Download Models From Here:
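For example, with the Coqui TTS Python API you can point the TTS class at a different pretrained model to change the voice; the model name below is just one publicly listed Coqui model, used here as an illustration.

from TTS.api import TTS

# Any model name from the Coqui model list changes the voice;
# "tts_models/en/ljspeech/tacotron2-DDC" is used purely as an example.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

# Synthesize a reply to a WAV file that can then be played into the virtual cable.
tts.tts_to_file(text="Hello, this is WhisCall speaking.", file_path="response.wav")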

Facing Any Issues?

Feel free to open an issue if you run into any problems. Contributions are also welcome.

fin.