Merge pull request #194 from J-B-Mugundh/navi-bot
Added GPS-Free Navigational Chatbot
UTSAVS26 authored Oct 8, 2024
2 parents d75ddf8 + 589b599 commit 7825fb7
Showing 5 changed files with 455 additions and 0 deletions.
102 changes: 102 additions & 0 deletions Advanced_Projects/NaviBot-Voice-Assistant/README.md
@@ -0,0 +1,102 @@
# **NaviBot-Voice-Assistant**

### 🎯 **Goal**

The main goal of NaviBot-Voice-Assistant is to assist visually impaired users by providing a voice-based interaction system that helps them navigate complex routes. The assistant uses voice commands and PDF-based route data to offer step-by-step guidance. It enhances accessibility by combining voice recognition, text-to-speech, and advanced AI-powered query systems.

### 🧵 **Dataset**

NaviBot does not require any pre-existing dataset. Instead, it utilizes user-uploaded PDF route data for navigation assistance. However, if you're working with specific datasets for training or additional use cases, please link them here.

### 🧾 **Description**

NaviBot-Voice-Assistant lets users upload PDF route data, splits the extracted text into chunks, and answers queries given as text or voice input. It uses Google Generative AI for embeddings and question answering, FAISS for vector storage, and LangChain to build the question-answering chain. The assistant speaks its answers aloud, making navigation more accessible for blind users.
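
To make the chunking step concrete, here is a minimal sketch of splitting text into overlapping character windows. It is an illustration only: `split_into_chunks` is a hypothetical helper, not the `RecursiveCharacterTextSplitter` that `app.py` uses (which also tries to break on separators such as paragraphs), and the sample route text and sizes are made up.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical route text, used only for demonstration
route = "Exit the lobby. Turn left. Walk twenty steps. The lift is on your right."
for chunk in split_into_chunks(route, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

The overlap means an instruction that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap.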

### 🧮 **What I have done!**

- Integrated **speech recognition** using `speech_recognition` to enable voice inputs.
- Added **text-to-speech** functionality using `pyttsx3` for voice-based responses.
- Allowed users to **upload PDF documents** to extract and process route data.
- Utilized **Google Generative AI embeddings** to transform text data into vectors.
- Implemented **FAISS** for efficient similarity search of navigation instructions.
- Built a conversational chain for answering questions based on the processed text.
- Deployed the application using **Streamlit** to create a user-friendly web interface.

### 🚀 **Models Implemented**

- **Google Generative AI for Embeddings**: Chosen for its advanced natural language understanding and high accuracy in generating meaningful embeddings.
- **FAISS (Facebook AI Similarity Search)**: Selected for its high-performance similarity search on large-scale vectors, allowing for efficient retrieval of relevant route instructions.
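
As a rough illustration of what similarity search over embeddings does, the sketch below ranks a few texts by cosine similarity to a query vector. Everything in it is an assumption for demonstration: the three-dimensional vectors are invented (real embeddings come from the embedding model and have many more dimensions), `cosine_similarity` and `top_k` are hypothetical helpers, and FAISS itself uses optimized indexes and may rank by a different metric such as L2 distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=1):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up (text, vector) pairs standing in for embedded route chunks
TOY_INDEX = [
    ("Turn left at the reception desk.", [0.9, 0.1, 0.0]),
    ("The lift is beside the stairs.", [0.1, 0.9, 0.2]),
    ("The cafeteria is on the second floor.", [0.0, 0.2, 0.9]),
]

# A made-up query vector that points mostly along the second dimension
print(top_k([0.2, 0.8, 0.1], TOY_INDEX, k=1))
```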

### 📚 **Libraries Needed**

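- `streamlit`
- `pyttsx3`
- `SpeechRecognition` (imported as `speech_recognition`; microphone input also needs `PyAudio`)
- `PyPDF2`
- `langchain`
- `langchain-google-genai` (imported as `langchain_google_genai`)
- `google-generativeai` (imported as `google.generativeai`)
- `langchain-community` (provides `langchain_community.vectorstores.FAISS`)
- `faiss-cpu`
- `python-dotenv` (imported as `dotenv`)
- `gTTS` (used by `main.py`)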

### 📊 **Exploratory Data Analysis Results**

This project does not involve a traditional EDA, as it focuses on real-time PDF processing and AI-powered voice interaction. However, if visualizations or processing statistics (e.g., text chunk sizes, similarity scores) are generated, they can be displayed here.

### 📈 **Performance of the Models based on the Accuracy Scores**

The performance of the system can be evaluated based on:
- **Response accuracy**: How well the system provides relevant answers from the context.
- **Speech recognition accuracy**: How accurately it converts voice input into text.
- **Embedding quality**: How well the AI model embeds text for similarity search.
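
Speech recognition accuracy, for instance, is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below is one way to compute it; `word_error_rate` is a hypothetical helper that is not part of this project, and the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One word ("the") is missing from the recognized text
print(word_error_rate("turn left at the door", "turn left at door"))
```

A lower value is better; 0.0 means the recognized text matches the reference word for word.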

### 💻 How to run

To get started with NaviBot-Voice-Assistant, follow these steps:

1. Navigate to the project directory:

```bash
cd NaviBot-Voice-Assistant
```

2. Create and activate a virtual environment (optional):

```bash
conda create -n venv python=3.10
conda activate venv
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Configure environment variables:

Rename `.env-sample` to `.env` and replace the placeholder value with your Google API key (`app.py` reads it from `GOOGLE_API_KEY`).

Refer to this site to get [your own key](https://ai.google.dev/tutorials/setup).
<br/>

5. Run the chatbot:

```bash
streamlit run app.py
```

PS: Try running the other files as well (for example, `main.py`) for different use cases.
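
If the app fails at startup, a missing key is a likely cause. The sketch below shows one way to fail early with a clear message. It assumes the variable name `GOOGLE_API_KEY` that `app.py` reads; `require_api_key` is a hypothetical helper, not part of the project, and in the real app `load_dotenv()` would run first so that values from `.env` are present in the environment.

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


# Demonstration with an explicit mapping instead of the real environment
print(require_api_key({"GOOGLE_API_KEY": "example-key"}))
```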

### 📢 **Conclusion**

NaviBot-Voice-Assistant integrates voice recognition and AI-powered question answering to assist visually impaired users in navigating routes. It relies on Google Generative AI for embeddings and answers and on FAISS for efficient retrieval of the relevant route instructions, with the aim of providing a reliable and accessible navigation experience.

### ✒️ **Signature**

**[J B Mugundh]**
GitHub: [Github](https://github.com/J-B-Mugundh)
LinkedIn: [Linkedin](https://www.linkedin.com/in/mugundhjb/)

161 changes: 161 additions & 0 deletions Advanced_Projects/NaviBot-Voice-Assistant/app.py
@@ -0,0 +1,161 @@
import streamlit as st
import pyttsx3
import speech_recognition as sr
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import google.generativeai as genai
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
# Configure the Generative AI API with the key read from the environment
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Initialize pyttsx3 for voice output (Text-to-Speech engine)
engine = pyttsx3.init()

# Function to speak the given text using pyttsx3
def speak(text):
    engine.say(text)
    engine.runAndWait()

# Function to listen for voice input using the microphone
def listen():
    r = sr.Recognizer()
    # Use the microphone as the audio source
    with sr.Microphone() as source:
        st.write("Listening...")  # Notify the user that it's listening
        r.adjust_for_ambient_noise(source)  # Adjust for noise in the environment
        audio = r.listen(source)  # Capture the audio

    try:
        # Use Google Speech Recognition to convert audio to text
        user_input = r.recognize_google(audio)
        st.write(f"You said: {user_input}")
        return user_input
    except sr.UnknownValueError:
        # Handle case where speech is not understood
        st.write("Sorry, I could not understand what you said.")
        return None
    except sr.RequestError as e:
        # Handle request errors with Google Speech Recognition API
        st.write(f"Could not request results from Google Speech Recognition service; {e}")
        return None

# Function to extract text from uploaded PDF files
def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        # Read PDF and extract text from each page
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # extract_text() can return None for pages without a text layer
            text += page.extract_text() or ""
    return text

# Function to split large text into smaller chunks using RecursiveCharacterTextSplitter
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)  # Split the text into manageable chunks
    return chunks

# Function to create and save a FAISS vector store from the text chunks
def get_vector_store(text_chunks):
    # Use Google Generative AI embeddings to convert text into vectors
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    # Create a FAISS vector store from the text chunks and embeddings
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    # Save the vector store locally for later use
    vector_store.save_local("faiss_index")

# Function to get the conversational chain for question-answering using a template
def get_conversational_chain():

    # Define a prompt template for generating detailed answers
    prompt_template = """
    Answer the question as detailed as possible from the provided context, make sure to provide all the details, if the answer is not in
    provided context just say, "answer is not available in the context", don't provide the wrong answer\n\n
    Context:\n {context}?\n
    Question: \n{question}\n
    Answer:
    """

    # Initialize the Google Generative AI model (gemini-pro) with a specific temperature setting
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)

    # Set up the prompt template with input variables for context and question
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    # Load the question-answering chain with the model and prompt
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)

    return chain

# Function to handle the user input (both text and voice), perform similarity search, and respond
def user_input(user_question):
    # The index only exists after a PDF has been processed; bail out early otherwise
    if not os.path.exists("faiss_index"):
        st.warning("No route data found. Upload a PDF and click Submit & Process first.")
        return

    # Initialize Google Generative AI embeddings for vector search
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Load the local FAISS index with embeddings; deserialization is allowed
    # because the index was created by this app itself
    new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    # Perform similarity search in the vector store based on the user's question
    docs = new_db.similarity_search(user_question)

    # Get the question-answering conversational chain
    chain = get_conversational_chain()

    # Run the chain to get the answer from the retrieved documents
    response = chain(
        {"input_documents": docs, "question": user_question},
        return_only_outputs=True
    )

    # Use pyttsx3 to speak the response
    speak(response["output_text"])
    # Display the response on the Streamlit UI
    st.write("Reply: ", response["output_text"])

# Main function for the Streamlit app
def main():
    # Set the page configuration with a title
    st.set_page_config("Beyond GPS Navigation")
    st.header("Beyond GPS Navigator for Blind")  # App header

    # Get user's text input from Streamlit input box
    user_question = st.text_input("Ask your query")
    # Button to trigger voice input
    voice_input_button = st.button("Voice Input")

    # If the voice input button is clicked, the spoken query replaces the typed one
    if voice_input_button:
        user_question = listen()  # Capture the user's voice input

    # Process the query once, whether it was typed or spoken
    if user_question:
        user_input(user_question)

    # Sidebar menu for uploading PDF documents
    with st.sidebar:
        st.title("Menu:")
        # Allow users to upload multiple PDF files
        pdf_docs = st.file_uploader("Upload your route data and Click on the Submit & Process Button", accept_multiple_files=True)
        # Button to process the uploaded PDF documents
        if st.button("Submit & Process"):
            if not pdf_docs:
                # Nothing to index yet
                st.warning("Please upload at least one PDF file first.")
            else:
                with st.spinner("Processing..."):  # Show spinner while processing
                    raw_text = get_pdf_text(pdf_docs)  # Extract text from the PDF files
                    text_chunks = get_text_chunks(raw_text)  # Split the text into chunks
                    get_vector_store(text_chunks)  # Save the text chunks in a vector store
                st.success("Done")  # Notify the user when processing is complete

# Run the Streamlit app
if __name__ == "__main__":
    main()
88 changes: 88 additions & 0 deletions Advanced_Projects/NaviBot-Voice-Assistant/main.py
@@ -0,0 +1,88 @@
import os
from io import BytesIO

import streamlit as st
from dotenv import load_dotenv
import google.generativeai as gen_ai
import speech_recognition as sr
from gtts import gTTS

# Load environment variables
load_dotenv()

# Configure Streamlit page settings
st.set_page_config(
    page_title="Chat with Gemini-Pro!",
    page_icon=":brain:",  # Favicon emoji
    layout="centered",  # Page layout option
)

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Set up Google Gemini-Pro AI model
gen_ai.configure(api_key=GOOGLE_API_KEY)
model = gen_ai.GenerativeModel('gemini-pro')

# Function to translate roles between Gemini-Pro and Streamlit terminology
def translate_role_for_streamlit(user_role):
    if user_role == "model":
        return "assistant"
    else:
        return user_role

# Function to recognize speech input
def recognize_speech():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        st.write("Speak now...")
        audio = r.listen(source)

    try:
        user_prompt = r.recognize_google(audio)
        st.write("You said:", user_prompt)
        return user_prompt
    except sr.UnknownValueError:
        st.write("Sorry, I could not understand your audio.")
        return ""
    except sr.RequestError as e:
        st.write("Could not request results from Google Speech Recognition service; {0}".format(e))
        return ""

# Function to output voice
def speak(text):
    # gTTS produces MP3 data, so collect it in an in-memory buffer and play it
    # as MP3 (TemporaryFile has no portable `delete` argument or file name)
    tts = gTTS(text=text, lang='en')
    buffer = BytesIO()
    tts.write_to_fp(buffer)
    st.audio(buffer.getvalue(), format='audio/mpeg')


# Initialize chat session in Streamlit if not already present
if "chat_session" not in st.session_state:
    st.session_state.chat_session = model.start_chat(history=[])

# Display the chatbot's title on the page
st.title("🤖 Gemini Pro - ChatBot")

# Display the chat history
for message in st.session_state.chat_session.history:
    with st.chat_message(translate_role_for_streamlit(message.role)):
        st.markdown(message.parts[0].text)

# Input field for user's message
voice_input = st.checkbox("Voice Input")
if voice_input:
    user_prompt = recognize_speech()
else:
    user_prompt = st.text_input("Ask Gemini-Pro...")

if user_prompt:
    # Add user's message to chat and display it
    st.chat_message("user").markdown(user_prompt)

    # Send user's message to Gemini-Pro and get the response
    gemini_response = st.session_state.chat_session.send_message(user_prompt)

    # Display Gemini-Pro's response
    with st.chat_message("assistant"):
        st.markdown(gemini_response.text)
        speak(gemini_response.text)