Merge pull request #194 from J-B-Mugundh/navi-bot
Added GPS-Free Navigational Chatbot
Showing 5 changed files with 455 additions and 0 deletions.
@@ -0,0 +1,102 @@
# **NaviBot-Voice-Assistant**

### 🎯 **Goal**

The main goal of NaviBot-Voice-Assistant is to assist visually impaired users by providing a voice-based interaction system that helps them navigate complex routes. The assistant uses voice commands and PDF-based route data to offer step-by-step guidance. It enhances accessibility by combining voice recognition, text-to-speech, and AI-powered query answering.

### 🧵 **Dataset**

NaviBot does not require any pre-existing dataset. Instead, it uses user-uploaded PDF route data for navigation assistance. If you are working with specific datasets for training or additional use cases, please link them here.

### 🧾 **Description**

NaviBot-Voice-Assistant lets users upload PDF route data, process it into text chunks, and query it using either text or voice input. It uses Google Generative AI for question answering, FAISS for vector storage, and LangChain to build the conversational chain. The assistant is designed for seamless interaction and voice guidance, making navigation more accessible for blind users.

### 🧮 **What I had done!**

- Integrated **speech recognition** using `speech_recognition` to enable voice input.
- Added **text-to-speech** functionality using `pyttsx3` for voice-based responses.
- Allowed users to **upload PDF documents** to extract and process route data.
- Utilized **Google Generative AI embeddings** to transform text data into vectors.
- Implemented **FAISS** for efficient similarity search of navigation instructions.
- Built a conversational chain for answering questions based on the processed text.
- Deployed the application using **Streamlit** to create a user-friendly web interface.

### 🚀 **Models Implemented**

- **Google Generative AI for Embeddings**: chosen for its strong natural language understanding and meaningful embeddings.
- **FAISS (Facebook AI Similarity Search)**: selected for its high-performance similarity search over large vector collections, allowing efficient retrieval of relevant route instructions.

### 📚 **Libraries Needed**

- `streamlit`
- `pyttsx3`
- `speech_recognition`
- `PyPDF2`
- `langchain`
- `langchain_google_genai`
- `google.generativeai`
- `langchain_community.vectorstores`
- `FAISS`
- `dotenv`
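
Note that several pip package names differ from the import names above. A plausible `requirements.txt` is sketched below; the exact package set and versions are assumptions, so adjust to match the repository's own file:

```text
streamlit
pyttsx3
SpeechRecognition
PyAudio           # required by speech_recognition's Microphone
PyPDF2
langchain
langchain-community
langchain-google-genai
google-generativeai
faiss-cpu
python-dotenv
gTTS
```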

### 📊 **Exploratory Data Analysis Results**

This project does not involve a traditional EDA, as it focuses on real-time PDF processing and AI-powered voice interaction. However, if visualizations or processing statistics (e.g., text chunk sizes, similarity scores) are generated, they can be displayed here.
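
If such statistics are of interest, the sketch below (not part of the project; `chunk_stats` is a hypothetical helper) reports chunk-size numbers using the same splitter settings as the app:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_stats(text: str) -> dict:
    # Same settings as the app's get_text_chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    sizes = [len(chunk) for chunk in splitter.split_text(text)]
    # Assumes non-empty input text, so sizes is non-empty
    return {
        "num_chunks": len(sizes),
        "min_size": min(sizes),
        "max_size": max(sizes),
        "mean_size": sum(sizes) / len(sizes),
    }
```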

### 📈 **Performance of the Models based on the Accuracy Scores**

The performance of the system can be evaluated based on:
- **Response accuracy**: how well the system provides relevant answers from the context.
- **Speech recognition accuracy**: how accurately it converts voice input into text (one way to quantify this is shown in the sketch below).
- **Embedding quality**: how well the AI model embeds text for similarity search.
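
A standard way to quantify speech recognition accuracy is the word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, normalized by the reference length. The project does not compute this itself; a minimal self-contained sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substituted word in a five-word reference -> WER 0.2
print(word_error_rate("turn left at the bank", "turn left at the tank"))
```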

### 💻 **How to run**

To get started with NaviBot-Voice-Assistant, follow these steps:

1. Navigate to the project directory:

```bash
cd NaviBot-Voice-Assistant
```

2. Create and activate a virtual environment (optional):

```bash
conda create -n venv python=3.10
conda activate venv
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Configure environment variables: rename `.env-sample` to `.env` and replace the placeholder with your Google API key. Refer to this page to get [your own key](https://ai.google.dev/tutorials/setup).
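
The app reads a single variable, `GOOGLE_API_KEY`, so the finished `.env` should look like this (placeholder value shown):

```text
GOOGLE_API_KEY=your-google-api-key-here
```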

5. Run the chatbot:

```bash
streamlit run app.py
```

PS: Try running the other files as well for different use cases.

### 📢 **Conclusion**

NaviBot-Voice-Assistant integrates voice recognition and AI-powered question answering to assist visually impaired users in navigating routes. It leverages Google Generative AI and FAISS for efficient retrieval and processing, aiming to provide a reliable and accessible navigation experience for its users.

### ✒️ **Signature**

**J B Mugundh**
GitHub: [J-B-Mugundh](https://github.com/J-B-Mugundh)
LinkedIn: [mugundhjb](https://www.linkedin.com/in/mugundhjb/)
@@ -0,0 +1,161 @@
import streamlit as st
import pyttsx3
import speech_recognition as sr
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
import google.generativeai as genai
from langchain_community.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configure the Generative AI client with the API key from the environment
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Initialize pyttsx3 for voice output (Text-to-Speech engine)
engine = pyttsx3.init()


# Function to speak the given text using pyttsx3
def speak(text):
    engine.say(text)
    engine.runAndWait()


# Function to listen for voice input using the microphone
def listen():
    r = sr.Recognizer()
    # Use the microphone as the audio source
    with sr.Microphone() as source:
        st.write("Listening...")  # Notify the user that it's listening
        r.adjust_for_ambient_noise(source)  # Adjust for noise in the environment
        audio = r.listen(source)  # Capture the audio

    try:
        # Use Google Speech Recognition to convert audio to text
        user_input = r.recognize_google(audio)
        st.write(f"You said: {user_input}")
        return user_input
    except sr.UnknownValueError:
        # Handle case where speech is not understood
        st.write("Sorry, I could not understand what you said.")
        return None
    except sr.RequestError as e:
        # Handle request errors with Google Speech Recognition API
        st.write(f"Could not request results from Google Speech Recognition service; {e}")
        return None

# Function to extract text from uploaded PDF files
def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        # Read PDF and extract text from each page
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # extract_text() can return None (e.g., image-only pages), so guard with "or"
            text += page.extract_text() or ""
    return text

# Function to split large text into smaller chunks using RecursiveCharacterTextSplitter
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)  # Split the text into manageable chunks
    return chunks


# Function to create and save a FAISS vector store from the text chunks
def get_vector_store(text_chunks):
    # Use Google Generative AI embeddings to convert text into vectors
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    # Create a FAISS vector store from the text chunks and embeddings
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    # Save the vector store locally for later use
    vector_store.save_local("faiss_index")


# Function to build the conversational question-answering chain from a template
def get_conversational_chain():
    # Define a prompt template for generating detailed answers
    prompt_template = """
    Answer the question as detailed as possible from the provided context, make sure to provide all the details, if the answer is not in
    provided context just say, "answer is not available in the context", don't provide the wrong answer\n\n
    Context:\n {context}?\n
    Question: \n{question}\n
    Answer:
    """

    # Initialize the Google Generative AI model (gemini-pro) with a specific temperature setting
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)

    # Set up the prompt template with input variables for context and question
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    # Load the QA chain; "stuff" concatenates all retrieved documents into a single prompt
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)

    return chain


# Function to handle the user input (both text and voice), perform similarity search, and respond
def user_input(user_question):
    # Initialize Google Generative AI embeddings for vector search
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Load the local FAISS index; it is our own file, so deserialization is explicitly allowed
    new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    # Perform similarity search in the vector store based on the user's question
    docs = new_db.similarity_search(user_question)

    # Get the question-answering conversational chain
    chain = get_conversational_chain()

    # Run the chain to get the answer from the retrieved documents
    response = chain(
        {"input_documents": docs, "question": user_question},
        return_only_outputs=True
    )

    # Use pyttsx3 to speak the response
    speak(response["output_text"])
    # Display the response on the Streamlit UI
    st.write("Reply: ", response["output_text"])

# Main function for the Streamlit app
def main():
    # Set the page configuration with a title
    st.set_page_config("Beyond GPS Navigation")
    st.header("Beyond GPS Navigator for Blind")  # App header

    # Get user's text input from Streamlit input box
    user_question = st.text_input("Ask your query")
    # Button to trigger voice input
    voice_input_button = st.button("Voice Input")

    # If the voice input button is clicked, capture the user's voice;
    # this replaces any text input for this run
    if voice_input_button:
        user_question = listen()

    # Process the question exactly once, whether it came from text or voice
    if user_question:
        user_input(user_question)

    # Sidebar menu for uploading PDF documents
    with st.sidebar:
        st.title("Menu:")
        # Allow users to upload multiple PDF files
        pdf_docs = st.file_uploader("Upload your route data and Click on the Submit & Process Button", accept_multiple_files=True)
        # Button to process the uploaded PDF documents
        if st.button("Submit & Process"):
            with st.spinner("Processing..."):  # Show spinner while processing
                raw_text = get_pdf_text(pdf_docs)  # Extract text from the PDF files
                text_chunks = get_text_chunks(raw_text)  # Split the text into chunks
                get_vector_store(text_chunks)  # Save the text chunks in a vector store
                st.success("Done")  # Notify the user when processing is complete


# Run the Streamlit app
if __name__ == "__main__":
    main()
@@ -0,0 +1,88 @@
import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as gen_ai
import speech_recognition as sr
from gtts import gTTS
from tempfile import NamedTemporaryFile

# Load environment variables
load_dotenv()

# Configure Streamlit page settings
st.set_page_config(
    page_title="Chat with Gemini-Pro!",
    page_icon=":brain:",  # Favicon emoji
    layout="centered",  # Page layout option
)

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Set up Google Gemini-Pro AI model
gen_ai.configure(api_key=GOOGLE_API_KEY)
model = gen_ai.GenerativeModel('gemini-pro')


# Function to translate roles between Gemini-Pro and Streamlit terminology
def translate_role_for_streamlit(user_role):
    if user_role == "model":
        return "assistant"
    else:
        return user_role


# Function to recognize speech input
def recognize_speech():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        st.write("Speak now...")
        audio = r.listen(source)

    try:
        user_prompt = r.recognize_google(audio)
        st.write("You said:", user_prompt)
        return user_prompt
    except sr.UnknownValueError:
        st.write("Sorry, I could not understand your audio.")
        return ""
    except sr.RequestError as e:
        st.write("Could not request results from Google Speech Recognition service; {0}".format(e))
        return ""

# Function to output voice: gTTS produces MP3 audio, which st.audio can play
def speak(text):
    tts = gTTS(text=text, lang='en')
    # NamedTemporaryFile (not TemporaryFile) supports delete=False and provides a usable path
    with NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        tts.write_to_fp(f)
        filename = f.name
    st.audio(filename, format='audio/mp3')

# Initialize chat session in Streamlit if not already present
if "chat_session" not in st.session_state:
    st.session_state.chat_session = model.start_chat(history=[])

# Display the chatbot's title on the page
st.title("🤖 Gemini Pro - ChatBot")

# Display the chat history
for message in st.session_state.chat_session.history:
    with st.chat_message(translate_role_for_streamlit(message.role)):
        st.markdown(message.parts[0].text)

# Input field for user's message
voice_input = st.checkbox("Voice Input")
if voice_input:
    user_prompt = recognize_speech()
else:
    user_prompt = st.text_input("Ask Gemini-Pro...")

if user_prompt:
    # Add user's message to chat and display it
    st.chat_message("user").markdown(user_prompt)

    # Send user's message to Gemini-Pro and get the response
    gemini_response = st.session_state.chat_session.send_message(user_prompt)

    # Display Gemini-Pro's response and speak it aloud
    with st.chat_message("assistant"):
        st.markdown(gemini_response.text)
        speak(gemini_response.text)