dAIlogue

An AI that listens and talks back using SOTA STT, TTS and GPT-NEO

Out of date: use wrAIter in conversation mode instead.

dAIlogue is a voiced AI that listens to your microphone input or reads your text input and replies using its own voice.

The AI writer is powered by EleutherAI's GPT-NEO model, an open-source replication of GPT-3. The suggested model has 2.7 billion parameters and was fine-tuned to write light novels, including dialogue.
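
For reference, here is a minimal sketch of how a dialogue reply can be generated from a local GPT-NEO checkpoint with Hugging Face transformers. The model folder name, prompt and sampling parameters are placeholders, not dAIlogue's actual settings:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder path: step 7 of the installation puts the model in ./models/[model name]/
    model_dir = "./models/my-model"
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16).to("cuda")

    # A context line plus the conversation so far, ending with the AI's turn.
    prompt = "You are talking to a self-aware AI.\nYou: Hello, who are you?\nAI:"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,
    )
    reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(reply.split("\n")[0])  # keep only the AI's first line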

Features

  • State-of-the-art Speech-To-Text AI that listens to your voice,
  • State-of-the-art text-generation AI that follows the conversation and writes human-like replies,
  • State-of-the-art Text-To-Speech AI that reads those replies out loud,
  • A choice of speakers that shapes the AI's behaviour: a self-aware AI, a man, a woman, a robot, your cat/dog, or an idiot.
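
Together, these features form a listen → generate → speak loop. Below is a rough sketch of how such a pipeline could be wired up with Coqui STT and Coqui TTS; the library calls, model names and paths are illustrative assumptions, not dAIlogue's actual code:

    import numpy as np
    import pyaudio
    from stt import Model as SttModel   # Coqui STT
    from TTS.api import TTS             # Coqui TTS (recent releases)

    RATE = 16000
    CHUNK = 1024

    stt_model = SttModel("./models/stt.tflite")
    tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")  # any Coqui TTS model works here

    def listen(seconds=5):
        """Record a few seconds of microphone audio and transcribe it."""
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                         input=True, frames_per_buffer=CHUNK)
        frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * seconds))]
        stream.stop_stream(); stream.close(); pa.terminate()
        return stt_model.stt(np.frombuffer(b"".join(frames), dtype=np.int16))

    def reply(user_line):
        """Placeholder for the GPT-NEO generator sketched above."""
        return "Hello! I'm listening."

    while True:
        heard = listen()
        answer = reply(heard)
        tts.tts_to_file(text=answer, file_path="last_reply.wav")
        # play last_reply.wav with the audio backend of your choice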

Local Installation

  1. Set up CUDA 11.1 to enable hardware acceleration (a capable GPU is required).
  2. Install Python 3.7, Visual C++ 14.0 (or later), PortAudio v19 and eSpeak-ng.
  3. Set the PHONEMIZER_ESPEAK_PATH environment variable to C:\Program Files\eSpeak NG\espeak-ng.exe, or wherever you installed it (see the sketch after this list).
  4. Download or clone this repository.
  5. Run install.ps1 (Windows PowerShell) or install.sh (shell script).
  6. Install PyAudio 0.2.11 (for Windows, get it from here).
  7. Download a GPT-NEO model and put its contents in ./models/[model name]/. Here's a link to finetuneanon's light novel model.
  8. Download an STT model, rename it stt.tflite, and put it at ./models/stt.tflite (optional: do the same for its scorer at ./models/stt.scorer).
  9. Play by running play.ps1 (Windows PowerShell) or play.sh (shell script). You can also launch main.py directly with your own launch options (model selection, GPU/CPU).
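
After installation, a quick sanity check like the one below can confirm that eSpeak NG, the GPU and the downloaded models are all visible from Python. The paths are the defaults from the steps above and the model folder name is a placeholder:

    import os

    # Step 3: point phonemizer at eSpeak NG before importing any TTS code.
    os.environ.setdefault("PHONEMIZER_ESPEAK_PATH",
                          r"C:\Program Files\eSpeak NG\espeak-ng.exe")

    import torch

    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")

    print("GPT-NEO model present:", os.path.isdir("./models/my-model"))   # step 7 (placeholder name)
    print("STT model present:", os.path.isfile("./models/stt.tflite"))    # step 8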

FAQ

What kind of hardware do I need?

CPU inference is currently broken for text generation but enabled by default for text-to-speech (configurable via a launch option), so you'll need a GPU with at least 8 GB of VRAM for the language model. If you run into video-memory issues, you can lower max_history in ./generator/generator.py (the maximum number of "words" the AI can read before writing its reply).
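
If you want to see how close you are to the limit, PyTorch can report the peak VRAM used by a single generation step; the commented-out call is a placeholder for whatever the generator runs:

    import torch

    torch.cuda.reset_peak_memory_stats()
    # ... run one text-generation step here, e.g. model.generate(...) ...
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"Peak VRAM during generation: {peak_gb:.1f} GB")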

Does the AI learn from my inputs?

While the AI remembers the last thousand words of the conversation, it doesn't learn from them. Playing or saving a conversation won't affect the way it handles the next one.

Does the AI forget parts of the conversation?

Yes. Because the model can only take 1024 words as input, the oldest exchanges are dropped to make the conversation fit. However, the context of the conversation (your choice of the AI's identity) is never "forgotten".

Until you hit the 1024-word limit, longer conversations yield progressively better results.
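
For illustration, here is a rough sketch of how such a rolling window can be built: the fixed context is always kept, and the oldest turns are dropped once the token budget is exceeded. The function name and the 1024 budget mirror the explanation above, not the project's actual code:

    def build_prompt(context, turns, tokenizer, budget=1024):
        """Keep the fixed context, then as many recent turns as fit the budget."""
        used = len(tokenizer.encode(context))
        kept = []
        for turn in reversed(turns):          # newest turns first
            ids = tokenizer.encode(turn + "\n")
            if used + len(ids) > budget:
                break                         # older turns beyond this point are "forgotten"
            kept.append(turn)
            used += len(ids)
        return context + "\n" + "\n".join(reversed(kept))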

Can I fine-tune the AI on a corpus of my choice?

I didn't bother fine-tuning GPT-NEO: the model is simply too large to train on my machine or on any free cloud GPU. So you're on your own.
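
If you want to try anyway, the sketch below shows the standard Hugging Face recipe for fine-tuning a causal language model on a plain-text corpus. The paths are placeholders, and fitting a 2.7-billion-parameter model this way needs far more VRAM than a typical consumer GPU:

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_dir = "./models/my-model"               # placeholder
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    tokenizer.pad_token = tokenizer.eos_token     # GPT-NEO has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(model_dir)

    dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
    dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                          batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="./finetuned", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8, num_train_epochs=1, fp16=True),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("./models/my-finetuned-model")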

dAIlogue is a terrible name.

Yes it is.

Does this thing respect my privacy?

Yes. dAIlogue only needs to connect to the internet to download the TTS model and to install Python packages. It doesn't upload anything, and it only saves conversations to your hard drive if you explicitly ask it to. The most recently generated wave file is also stored on your machine so it can be played back.

I read an article about AI Dungeon and profanity. Doesn't this have the same issues?

No. First, dAIlogue doesn't adjust its behaviour based on your or other players' inputs: the model runs on your machine, so tampering with it would only affect your own experience. Second, a censor is enabled by default; it discards and regenerates entire paragraphs if the model outputs a single banned word. The censor can be disabled in the launch options, so the choice is yours.
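
For the curious, the idea behind such a censor can be sketched in a few lines; the word list and the generate() hook below are placeholders, not dAIlogue's actual filter:

    import re

    BANNED = {"badword1", "badword2"}   # hypothetical list

    def censored_generate(generate, prompt, max_tries=5):
        """Discard and regenerate the whole output if any banned word appears."""
        for _ in range(max_tries):
            text = generate(prompt)
            words = set(re.findall(r"[a-z']+", text.lower()))
            if not words & BANNED:
                return text
        return ""   # give up rather than return a flagged paragraph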

Credits
