An Audio-Interactive Document Assistant for the Visually Impaired powered by Google's Gemini

sankeer28/VoiceAid

VoiceAid is an Audio-Interactive Document Assistant designed to assist the visually impaired in navigating documents. Powered by Google's Gemini, VoiceAid allows users to interact with documents using voice commands and provides audio responses.

A Chromium-based browser is recommended.

Please refrain from sharing personal information with any AI systems, as it may be used to train them. Protect your privacy by avoiding the disclosure of sensitive data.

Features

  • Supports various file formats including PDF, DOCX, PNG, JPG, and JPEG.
  • Utilizes Google's Gemini for generating AI-powered responses.
  • Provides live transcription of user commands.
  • Allows users to upload documents and ask questions via voice or text input.
  • Offers audio feedback for responses.
  • Uses Chromium's built-in TTS voice: Microsoft Liam Online (Natural) - English (Canada).
  • Offers dark and light themes.
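
The audio feedback feature can be illustrated with the browser's Web Speech API. The following is a minimal sketch, not VoiceAid's actual code: `pickVoice` and `speak` are hypothetical helper names, and the fallback order (preferred voice, then any English voice, then the first available voice) is an assumption for cases where the named voice is not installed.

```javascript
// Voice name taken from the feature list above; availability depends on the browser and OS.
const PREFERRED_VOICE = "Microsoft Liam Online (Natural) - English (Canada)";

// Pick the preferred voice if present, else any English voice, else the first voice.
// `voices` is an array of objects with `name` and `lang`, such as SpeechSynthesisVoice.
function pickVoice(voices, preferredName = PREFERRED_VOICE) {
  return (
    voices.find((v) => v.name === preferredName) ||
    voices.find((v) => typeof v.lang === "string" && v.lang.toLowerCase().startsWith("en")) ||
    voices[0] ||
    null
  );
}

// Hypothetical helper: speak `text` aloud. Returns false where speechSynthesis is unavailable.
function speak(text) {
  if (typeof speechSynthesis === "undefined") return false;
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(speechSynthesis.getVoices());
  if (voice) utterance.voice = voice;
  speechSynthesis.speak(utterance);
  return true;
}
```

In some browsers `speechSynthesis.getVoices()` returns an empty list until the `voiceschanged` event has fired, so a real page would likely wait for that event before the first call to `speak`.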

Supported Devices

Devices Voice Input Voice Output Text Output File Upload
iOS
Android ⚠️ ⚠️
Windows 10/11
Mac
  • ⚠️: Unstable; may not work properly
  • ❌: Does not work
  • ✅: Fully works

Getting Started

To use VoiceAid, you'll need an API key from Google AI Studio. Follow these steps to get started:

  1. Obtain an API key from Google AI Studio.
  2. Clone this repository to your local machine.
  3. Open index.html in your preferred web browser.
  4. Enable microphone access.
  5. Enter your API key in the provided input field.
  6. Start interacting with VoiceAid by asking questions or uploading documents.
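
To show what the API key from the steps above is used for, here is a minimal sketch of a text-only request to the Gemini REST API. It is not VoiceAid's actual code: `buildGeminiRequest` and `askGemini` are hypothetical helpers, and the model name is a placeholder, since this README does not say which Gemini model the app calls.

```javascript
// Assumption: the model name is a placeholder; the README does not name the model used.
const DEFAULT_MODEL = "gemini-1.5-flash";
const API_BASE = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the URL and fetch options for a generateContent call (hypothetical helper).
function buildGeminiRequest(apiKey, question, model = DEFAULT_MODEL) {
  if (!apiKey || !apiKey.trim()) throw new Error("An API key is required");
  return {
    url: `${API_BASE}/${encodeURIComponent(model)}:generateContent`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey.trim(),
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: question }] }] }),
    },
  };
}

// Send the question and return the first candidate's text, or "" if there is none.
async function askGemini(apiKey, question) {
  const { url, options } = buildGeminiRequest(apiKey, question);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Gemini request failed: ${response.status}`);
  const data = await response.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```

Keeping request construction separate from the network call makes it easy to check the payload without spending API quota. A key entered in a browser page is visible to anyone with access to that browser session, so it should be treated like a password.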

Commands

VoiceAid supports the following voice commands:

  • "Stop talking" or "Stop": Stops audio playback.
  • "Delete file", "Remove file", "Remove the file", or "Delete the file": Clears the uploaded document.
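
One way to recognize the commands listed above in a speech transcript is sketched below. This is an illustration under stated assumptions, not VoiceAid's actual logic: the action names `stopAudio` and `clearFile` are hypothetical, and the sketch matches the whole utterance exactly so that a question which merely contains the word "stop" is not treated as a command.

```javascript
// Phrases come from the command list above; the action names are hypothetical.
const COMMANDS = [
  { action: "stopAudio", phrases: ["stop talking", "stop"] },
  {
    action: "clearFile",
    phrases: ["delete file", "remove file", "remove the file", "delete the file"],
  },
];

// Lowercase, drop punctuation, and collapse whitespace so "Stop talking." matches "stop talking".
function normalize(transcript) {
  return transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the action for an exact command phrase, or null so the text can be handled as a question.
function matchCommand(transcript) {
  const spoken = normalize(transcript);
  const hit = COMMANDS.find((c) => c.phrases.includes(spoken));
  return hit ? hit.action : null;
}
```

For example, `matchCommand("Stop talking.")` yields `"stopAudio"`, while `matchCommand("How do I stop the upload?")` yields `null` and would be passed on as a regular question.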

Contributions

Contributions are welcome! If you'd like to contribute to VoiceAid, feel free to fork this repository and submit a pull request with your changes.

Demo

VoiceAid_Demo.mp4
