
📄 Query PDF (Enhancing Accessibility For All Users)

Solution Overview:

Query PDF is a voice-powered AI RAG (Retrieval-Augmented Generation) application 🎤 designed to simplify working with PDFs 📚. Users can upload documents and interact via voice commands 🗣️, receiving accurate summaries and real-time responses ⚡.

Process Flow:

diagram
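
The repository's exact ingestion code isn't reproduced here, but the flow in the diagram roughly corresponds to the minimal sketch below, assembled from the stack listed under Technology & Languages. The use of pdf-parse for text extraction, the index name query-pdf, the chunk size, and the embedding model are all illustrative assumptions, not the project's actual choices.

```js
// Minimal indexing sketch: PDF ➜ text ➜ chunks ➜ embeddings ➜ vector store.
// Assumptions: pdf-parse for extraction, index name, chunk size, embedding model.
// npm install pdf-parse openai @pinecone-database/pinecone
const fs = require('fs');
const pdfParse = require('pdf-parse');
const OpenAI = require('openai');
const { Pinecone } = require('@pinecone-database/pinecone');

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index('query-pdf'); // hypothetical index name

// Split the extracted text into fixed-size chunks so each one stays small
// enough to embed and retrieve individually.
function chunkText(text, size = 1000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

async function indexPdf(path) {
  const { text } = await pdfParse(fs.readFileSync(path)); // 1. extract raw text
  const chunks = chunkText(text);                         // 2. chunk it
  const { data } = await openai.embeddings.create({       // 3. embed every chunk
    model: 'text-embedding-3-small',
    input: chunks,
  });
  await index.upsert(                                     // 4. store vectors + source text
    data.map((d, i) => ({
      id: `chunk-${i}`,
      values: d.embedding,
      metadata: { text: chunks[i] },
    }))
  );
}

indexPdf('./document.pdf').then(() => console.log('PDF indexed'));
```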

Why We Built This Solution:

We built this solution to address common challenges people face with large, complex documents 📑. Traditional search tools can be limiting and are often inaccessible to individuals with disabilities. By integrating RAG and voice technology 🤖, we aimed to create an app that lets users interact with documents naturally, using conversation 💬.

🎯 Target Users:

  • Individuals with Visual Impairments or Learning Disabilities: Benefit from having documents read aloud and from interacting via voice commands, promoting accessibility.
  • Business Professionals 📊: Work with lengthy contracts, proposals, or reports and require a fast, accessible way to review documents.
  • Multitaskers 💼: Engage with documents hands-free, listening to summaries or searching documents while focusing on other tasks.
  • Students and Researchers 🧑‍🎓: Need to extract and interact with large volumes of information from academic PDFs, reports, or textbooks quickly.

How RAG Helped:

RAG ensures the app provides accurate, relevant answers by retrieving specific data from PDFs 📂 and generating real-time voice summaries 🗣️📄. This reduces errors, making the app a trustworthy tool for users needing precise document-based information ✅.
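
In concrete terms, answering a question means embedding it, pulling the closest chunks back out of the vector store, and asking the language model to answer using only those excerpts. Below is a minimal sketch that continues the indexing example above; the model name, topK value, and prompt wording are illustrative assumptions rather than the project's actual code.

```js
// Minimal retrieval + generation sketch, reusing `openai` and `index`
// from the indexing example above.
async function askPdf(question) {
  // 1. Embed the question with the same model used for the chunks.
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });

  // 2. Retrieve the most similar chunks from the vector store.
  const { matches } = await index.query({
    vector: data[0].embedding,
    topK: 5,
    includeMetadata: true,
  });
  const context = matches.map((m) => m.metadata.text).join('\n---\n');

  // 3. Generate an answer grounded only in the retrieved excerpts.
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: 'Answer using only the provided PDF excerpts. If the answer is not in them, say so.',
      },
      { role: 'user', content: `Excerpts:\n${context}\n\nQuestion: ${question}` },
    ],
  });
  return completion.choices[0].message.content;
}
```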


Innovation 💡:

The app combines voice interaction 🎙️ with RAG technology 🛠️ to offer an easy, hands-free way to explore PDFs. It’s particularly helpful for users who may find traditional document navigation challenging, such as those with visual impairments 👀 or those who prefer voice over reading 📖.


Impact 🌍:

The app is set to transform how people engage with digital documents. By providing voice-driven summaries 🔊 and search 🔍, students, professionals, and individuals with accessibility needs can easily access key information without manually scrolling through long PDFs ⏳.


Usability 🔧:

The app is designed to be simple and accessible. Users upload a PDF, use voice commands to interact with content 🎤, and receive voice-based responses 🗣️. It’s intuitive and user-friendly, with no technical skills required 💻🚫.


Technology & Languages

  • JavaScript
  • Java
  • .NET
  • Python
  • AI Studio
  • AI Search
  • PostgreSQL
  • Cosmos DB
  • Azure SQL

Other Technologies Used

  • Next.js
  • React
  • JavaScript
  • Hugging Face
  • Pinecone
  • OpenAI API

YouTube Presentation Link:

Live Website Demo:

App Overview

1. Landing Page

The app begins with a Landing Page that welcomes users. To start using the app, click the "Start to PDF Now" button, which navigates you to the page where you can upload a PDF document.

home1

2. Homepage Features

On the homepage, users can explore the app's three main features by clicking the "Features" tab:

  • PDF Summary: Automatically generates a summary of the uploaded PDF.
  • Ask Questions: Allows users to ask specific questions about the PDF content (see the API-route sketch after the screenshot below).
  • Voice Chat: Engage in a voice-based conversation to send messages and interact with the PDF content.

home2
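
Under the hood, the "Ask Questions" feature plausibly reduces to a thin Next.js API route in front of a retrieval helper like the askPdf() sketch above. The route path /api/ask, the lib/rag module path, and the handler shape are assumptions for illustration, not the repository's actual code.

```js
// pages/api/ask.js — hypothetical Next.js API route for the "Ask Questions" feature.
// It validates the request and delegates to the retrieval helper sketched earlier.
import { askPdf } from '../../lib/rag'; // hypothetical module wrapping the RAG sketch

export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Only POST is supported' });
  }
  const { question } = req.body ?? {};
  if (!question) {
    return res.status(400).json({ error: 'A "question" field is required' });
  }
  try {
    const answer = await askPdf(question); // retrieval + generation
    return res.status(200).json({ answer });
  } catch (err) {
    console.error(err);
    return res.status(500).json({ error: 'Failed to answer the question' });
  }
}
```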

3. Meet the Team

By clicking on the "Meet the Team" section from the homepage, users can view the GitHub repositories of the contributors involved in building the app.

home3

4. Chatbot Interaction Sample

Here’s an example of user interaction:

  • After clicking "Start PDF Chat Now", the user uploads a PDF file.
  • The app generates a summary of the uploaded document (e.g., a hackathon PDF).
  • The user can then prompt the chatbot (e.g., "When is submission due?"), and the bot will scan the document to respond accordingly.
Pasted Graphic 1
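
On the client side, that interaction could be a single call to the hypothetical /api/ask route sketched above, for example with the question shown in the screenshot:

```js
// Hypothetical client-side call for the sample question above.
async function askSampleQuestion() {
  const response = await fetch('/api/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question: 'When is submission due?' }),
  });
  const { answer } = await response.json();
  console.log(answer); // e.g. the submission deadline found in the hackathon PDF
}

askSampleQuestion();
```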

By integrating RAG, this app ensures high-quality, context-aware interactions with PDF documents, enhancing the overall user experience.

5. Additional Sample

home4

Meet The Team

Team

Further Research 🔍

To enhance the app's effectiveness and inclusivity, additional research and development can focus on the following areas:

1. Voice Interaction for Differently Abled Individuals

Conduct studies to assess and refine the voice interaction feature for users with various disabilities, including:

  • Speech Impairments: Tailor voice recognition and response features to better accommodate users with speech disabilities.
  • Hearing Impairments: Ensure that voice commands and responses are accessible and clear, possibly integrating text-to-speech and speech-to-text functionalities.
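
As a starting point for both directions, the browser's built-in Web Speech API already provides speech recognition and speech synthesis. The sketch below is a minimal illustration (Chrome-style webkitSpeechRecognition, reusing the hypothetical /api/ask route from earlier), not the app's actual implementation.

```js
// Minimal browser sketch: speech-to-text in, text-to-speech out (Web Speech API).
// Recognition support varies by browser; Chrome exposes it as webkitSpeechRecognition.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

function listenOnce(onTranscript) {
  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US'; // could be made configurable for accents and dialects
  recognition.interimResults = false;
  recognition.onresult = (event) => onTranscript(event.results[0][0].transcript);
  recognition.onerror = (event) => console.error('Recognition error:', event.error);
  recognition.start();
}

function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 0.9; // slightly slower speech can aid comprehension
  window.speechSynthesis.speak(utterance);
}

// Example: capture a spoken question and read the app's answer back aloud.
listenOnce(async (question) => {
  const res = await fetch('/api/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  });
  const { answer } = await res.json();
  speak(answer);
});
```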

2. Usability Studies with Impaired Groups

Perform detailed usability studies to evaluate how individuals with cognitive, visual, or physical impairments interact with the app. This can include:

  • Cognitive Impairments: Simplify interactions and improve the clarity of instructions and feedback.
  • Visual Impairments: Enhance compatibility with screen readers and ensure that visual elements are accessible.

3. Language Processing and Adaptation

Improve natural language processing (NLP) capabilities to handle diverse speech patterns, accents, and speeds. Research could focus on:

  • Accent and Dialect Recognition: Adapt the app to accurately understand and respond to various accents and dialects.
  • Contextual Understanding: Enhance the app’s ability to comprehend and generate relevant responses based on contextual nuances in user queries.

By addressing these research areas, the app can become more inclusive, user-friendly, and effective for a broader range of users.