Welcome to the VisualSense repository! This project addresses a significant barrier faced by visually impaired individuals: understanding the visual content in their surroundings. Whether encountering images in real time through a camera or uploading pictures for analysis, they often cannot interpret an image without sighted assistance. VisualSense bridges this gap with a web application built on Visual Question Answering (VQA): users interactively ask questions about an image and receive answers describing its content, allowing them to understand visual information independently. In doing so, the project aims to improve the accessibility of visual information for visually impaired individuals and to promote their autonomy and inclusion.
- Real-time Image Analysis: Instantly analyze images from the device's camera.
- Interactive Questioning: Users ask questions about image content using natural language.
- AI-driven Answering: Spoken answers are generated from image analysis and user queries (a minimal sketch of this flow follows the list).
- Flexible Image Upload: Upload images for analysis from device storage.
- Accessibility Features: User-friendly interface with screen-reader compatibility.
- Promotes Independence: Empowers visually impaired individuals to access visual information on their own.
- Inclusion: Reduces reliance on sighted assistance, fostering greater inclusion.
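
To make the questioning and answering flow concrete, here is a minimal sketch of VQA inference using the Hugging Face Transformers pipeline. The model name (`dandelin/vilt-b32-finetuned-vqa`) and the image path are illustrative assumptions, not necessarily what VisualSense's back end actually uses.

```python
# Minimal VQA sketch; the model and file path below are assumptions for illustration.
from PIL import Image
from transformers import pipeline

# ViLT fine-tuned on VQAv2 is a common lightweight model for VQA demos.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

image = Image.open("example.jpg")   # placeholder image path
question = "What is on the table?"

# The pipeline returns candidate answers ranked by confidence.
for candidate in vqa(image=image, question=question, top_k=3):
    print(f"{candidate['answer']} (score: {candidate['score']:.2f})")
```

In the application itself, the resulting answer is then spoken aloud to the user (e.g., via speech synthesis), per the AI-driven Answering feature above.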
To run the project locally:

1. Front-End

```bash
cd frontend
npm i
npm run dev
```
2. Back-End

```bash
cd backend
uvicorn index:app --reload
```
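
The command above assumes the FastAPI application is exposed as `app` in `backend/index.py`. As a rough, hypothetical sketch (the `/ask` route, request fields, and model are assumptions, not the repository's actual code), a VQA endpoint could look like this:

```python
# index.py -- hypothetical sketch; the actual routes and field names may differ.
import io

from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image
from transformers import pipeline

app = FastAPI()

# Load the VQA model once at startup so each request doesn't reload it.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

@app.post("/ask")
async def ask(image: UploadFile = File(...), question: str = Form(...)):
    """Answer a natural-language question about an uploaded image."""
    pil_image = Image.open(io.BytesIO(await image.read())).convert("RGB")
    best = vqa(image=pil_image, question=question, top_k=1)[0]
    return {"answer": best["answer"], "score": best["score"]}
```

The Next.js front end would then POST the captured or uploaded image together with the user's question to such an endpoint.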
Technologies used in the project:
- PyTorch
- FastAPI
- Next.js
- Hugging Face Transformers
- Gemini API (see the sketch below)
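
Since the stack lists the Gemini API, here is a hedged sketch of how an image question could be sent to Gemini using the `google-generativeai` Python SDK. The model name, API-key handling, and prompt are assumptions; whether and where the repository calls Gemini this way is not shown here.

```python
# Hypothetical sketch of asking Gemini a question about an image.
# The model name and API-key handling are assumptions, not the repository's actual code.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GEMINI_API_KEY")     # placeholder; use an env var in practice
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed multimodal model

image = Image.open("example.jpg")                  # placeholder image path
response = model.generate_content(["What objects are in this picture?", image])
print(response.text)
```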
If you encounter any issues or have suggestions for improvement, please open an issue or submit a pull request on GitHub.