SentryCoderDev/Gemini_SeeSpeakTranslate_GUI

SeeSpeakTranslate_GUI

Description

This project provides a content description application that can analyze frames from real-time video streams, supports multiple languages, and includes text-to-speech (TTS) capabilities. Utilizing powerful AI models from Google, the application analyzes images and delivers descriptions in various languages, which can also be audibly presented. It features a user-friendly interface with customizable style options.

Features

  • Real-Time Video Streaming: View and manage video streams from different camera sources.
  • Frame Description: AI-powered content description that analyzes and describes video frames.
  • Multi-Language Support: Content descriptions in English, Turkish, German, and Arabic.
  • Text-to-Speech Conversion: Listen to descriptions in the selected language.
  • User-Friendly Interface: Intuitive and easy-to-use interface.
  • Customizable Style: Personalize the application's appearance with style settings.
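In practice, multi-language support like the above usually comes down to mapping the menu's display names to the ISO 639-1 codes that a describer prompt and a TTS engine expect. A minimal sketch (the dictionary and function names are illustrative assumptions, not the repository's actual code):

```python
# Hypothetical mapping from the UI's language menu to ISO 639-1 codes,
# in the form a TTS engine such as gTTS expects ("en", "tr", "de", "ar").
LANGUAGE_CODES = {
    "English": "en",
    "Turkish": "tr",
    "German": "de",
    "Arabic": "ar",
}

def language_code(display_name: str, default: str = "en") -> str:
    """Return the description/TTS language code for a menu selection,
    falling back to English for unknown entries."""
    return LANGUAGE_CODES.get(display_name, default)
```

The same code can then be passed to both the frame describer and the text-to-speech call, so the two stay in sync with one menu selection.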

Files and Modules

  1. content_description.py:
    • Contains the content description class and functions.
    • Provides user input handling, frame description, and text-to-speech functionality.
  2. main.py:
    • The main entry point of the application.
    • Sets up the user interface using Tkinter and applies style settings.
    • Initializes the video stream handler and content describer objects.
  3. video_stream.py:
    • Contains the class that manages video streaming.
    • Processes video streams from the camera and displays them on a Tkinter Canvas.
    • Provides functions to start, stop, and update the video stream.
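One way content_description.py's describer could tie language selection to the model request is a per-language prompt template chosen before the frame is sent to the vision model. The templates and function below are a sketch under that assumption, not the repository's actual code:

```python
# Illustrative per-language prompts; the selected one would be sent to the
# Google vision model together with the captured frame.
PROMPT_TEMPLATES = {
    "en": "Describe the contents of this video frame in English.",
    "tr": "Bu video karesinin içeriğini Türkçe olarak açıkla.",
    "de": "Beschreibe den Inhalt dieses Videobilds auf Deutsch.",
    "ar": "صف محتوى هذا الإطار من الفيديو باللغة العربية.",
}

def build_prompt(lang_code: str) -> str:
    """Pick the instruction text for the requested language,
    defaulting to English for unsupported codes."""
    return PROMPT_TEMPLATES.get(lang_code, PROMPT_TEMPLATES["en"])
```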

Usage

  1. Install Required Libraries:

    pip install -r requirements.txt
  2. Set Up Environment Variables:

    Add your Google API key to the .env file.
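Loading the key from .env is typically done with python-dotenv, but the effect can be sketched in stdlib Python: read KEY=VALUE lines and export them into the process environment. The variable name GOOGLE_API_KEY below is an assumption, since the README does not specify it:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv(): parse
    KEY=VALUE lines, skip blanks and comments, export to os.environ
    without overwriting variables that are already set."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# load_env()
# api_key = os.environ["GOOGLE_API_KEY"]  # variable name is an assumption
```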

  3. Run the Application:

    python main.py
  4. Select Camera and Language:

    In the application interface, select a camera from the available options and choose the desired description language.

  5. Start Video Stream and Get Description:

    • Press the "Select Camera" button to start the video stream.
    • Press the "Describe the frame" button to get the description of the video frame.
    • Press the "Text-to-Speech" button to hear the description audibly.

Example

A typical session looks like this:

  1. Select Camera: Choose the desired camera source from the dropdown menu.
  2. Select Language: Choose the language for the description from the language menu.
  3. Start Video Stream: Click "Select Camera" to start the video stream.
  4. Describe Frame: Click "Describe the frame" to get a textual description of the current video frame.
  5. Text-to-Speech: Click "Text-to-Speech" to hear the description spoken aloud.
