The AI-Powered Conversational Assistant is designed to provide intelligent, context-aware responses to user queries by leveraging state-of-the-art large language models (LLMs) and retrieval-augmented generation (RAG). It combines information retrieval, natural language understanding, and conversational AI to deliver accurate, concise, and user-friendly answers.
- Accepts user queries via a clean Streamlit interface.
- Provides answers grounded in relevant content retrieved from a vector database.
- Combines Pinecone for vector database storage and Hugging Face models for text embeddings.
- Re-ranks retrieved documents using BM25 for enhanced relevancy (see the sketch after this list).
- Employs Google Generative AI (Gemini) and Nvidia API for generating responses.
- Provides clarity and readability in generated answers.
- Extracts client names (e.g., JP, Z) and document types (e.g., MSA, SoW) using a tailored LLM-based pipeline.
- Extracts structured and unstructured data from documents using LlamaParse.
- Processes text, tables, and metadata while preserving original formatting.
- Summarizes extracted content using Nvidia's API to ensure concise and context-rich outputs.
- Appends metadata and stores summarized data efficiently.
- Embeds processed data using HuggingFace's Sentence Transformer model.
- Stores embedded vectors in a Pinecone vector database for efficient retrieval.
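The BM25 re-ranking step called out above can be illustrated with a minimal sketch using the `rank_bm25` package (an assumption; the project may use a different BM25 implementation). Candidates returned by the vector search are tokenized, scored against the query, and re-ordered:

```python
from rank_bm25 import BM25Okapi

def rerank_with_bm25(query, documents, top_k=5):
    """Re-order vector-search candidates by their BM25 score against the query."""
    tokenized_docs = [doc.lower().split() for doc in documents]
    bm25 = BM25Okapi(tokenized_docs)
    scores = bm25.get_scores(query.lower().split())
    # Highest-scoring documents first; keep only the best top_k
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```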
The system uses LlamaParse to parse documents and extract their contents into structured formats. Key instructions followed during parsing include:
- Text Extraction: Extract text exactly as it appears, preserving headers, bullet points, and formatting.
- Table Handling:
  - Maintain the integrity of multi-page tables as single entities.
  - Preserve column headers consistently across pages.
  - Treat new headers as indicators of new tables.
- Metadata Attachment: Attach client name, document type, and file name metadata to extracted elements.
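A minimal sketch of how these instructions might be passed to LlamaParse follows; the instruction wording below is an illustrative condensation of the rules above, not the project's exact prompt:

```python
from llama_parse import LlamaParse

# Illustrative parsing instruction condensed from the rules above
parsing_instruction = (
    "Extract text exactly as it appears, preserving headers, bullet points, and formatting. "
    "Keep multi-page tables together as single tables and repeat their column headers. "
    "Treat a new header row as the start of a new table."
)

# Assumes a LlamaCloud API key is configured (e.g., via the LLAMA_CLOUD_API_KEY environment variable)
parser = LlamaParse(
    result_type="markdown",                 # Markdown output keeps table structure intact
    parsing_instruction=parsing_instruction,
)
documents = parser.load_data("path/to/your/document.pdf")

# Attach client name, document type, and file name metadata to every extracted element
for doc in documents:
    doc.metadata.update({"client_name": "JP", "doc_type": "MSA", "file_name": "document.pdf"})
```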
After parsing, the extracted content is summarized using Nvidia's API for better retrieval and analysis. The summarization process involves:
- Maintaining context and avoiding critical information loss.
- Summarizing both textual content and tables.
- Attaching metadata such as client name, document type, and file name.
```python
from llama_parse import LlamaParse
from llama_index.core.node_parser import MarkdownElementNodeParser
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Parse the document, summarize its text and tables, and attach metadata.
# `process_and_attach_metadata` is the project's parsing/summarization helper;
# `ins1` holds the LlamaParse parsing instructions described above.
path = 'path/to/your/document.pdf'
chunk_size = 1024
chunk_overlap = 200
client_name = 'JP'
doc_type = 'MSA Agreement'
docs, tables = process_and_attach_metadata(path, chunk_size, chunk_overlap, client_name, doc_type, ins1)
```
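`process_and_attach_metadata` is a project-specific helper. As a rough illustration of the summarization step it wraps, a ChatNVIDIA call with a simple prompt might look like the sketch below (model name and prompt wording are assumptions):

```python
from langchain.prompts import PromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Illustrative prompt; the real pipeline's wording may differ
summary_prompt = PromptTemplate.from_template(
    "Summarize the following content without losing key facts, figures, or table values:\n\n{content}"
)
llm = ChatNVIDIA(model="meta/llama-3.1-70b-instruct")  # assumed model name

def summarize_chunk(content):
    """Summarize one parsed chunk while preserving its context."""
    response = llm.invoke(summary_prompt.format(content=content))
    return response.content
```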
After summarization, the extracted and summarized text and tables are embedded using HuggingFace's Sentence Transformer model (`all-mpnet-base-v2`) and stored in Pinecone for efficient vector-based similarity search.
```python
from langchain_huggingface import HuggingFaceEmbeddings
from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore

# Initialize the embedding model and the Pinecone index
pc = Pinecone(api_key="your-pinecone-api-key")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vector_store = PineconeVectorStore(index=pc.Index("your-index-name"), embedding=embeddings)

# Add the summarized documents to Pinecone
# (`combined_documents` holds the summarized text/table documents; `uuids` are their unique IDs)
vector_store.add_documents(documents=combined_documents, ids=uuids)
```
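Once documents are indexed, the most relevant chunks for a query can be retrieved with a similarity search (the query text below is illustrative):

```python
# Retrieve the chunks most relevant to a user query
results = vector_store.similarity_search("What are the payment terms in the JP MSA?", k=5)
for doc in results:
    print(doc.metadata.get("doc_type"), doc.page_content[:100])
```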
- Python 3.8+
- Required Python Libraries:
  - streamlit
  - langchain
  - pinecone
  - sentence-transformers
  - google-generativeai
  - langchain-nvidia-ai-endpoints
  - langchain-pinecone
  - langchain-huggingface
  - llama-parse
  - llama-index
- Clone the repository:
```bash
git clone https://github.com/your-repo/ai-conversational-assistant.git
cd ai-conversational-assistant
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Set up environment variables for API keys:
```bash
export GOOGLE_API_KEY="your-google-api-key"
export NVIDIA_API_KEY="your-nvidia-api-key"
export PINECONE_API_KEY="your-pinecone-api-key"
```
- Start the Streamlit app:
```bash
streamlit run app.py
```
- Enter your query in the text input field.
- View the AI-generated response and relevant retrieved content.
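For orientation, here is a minimal sketch of the query flow inside `app.py`, assuming Gemini is called through the `google-generativeai` SDK and `vector_store` is initialized as shown earlier (model name and prompt wording are illustrative):

```python
import streamlit as st
import google.generativeai as genai

genai.configure(api_key="your-google-api-key")
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

st.title("AI-Powered Conversational Assistant")
query = st.text_input("Enter your query")

if query:
    # Ground the answer in content retrieved from Pinecone
    context_docs = vector_store.similarity_search(query, k=5)
    context = "\n\n".join(doc.page_content for doc in context_docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = model.generate_content(prompt)
    st.subheader("Answer")
    st.write(response.text)
    st.subheader("Retrieved content")
    st.write(context)
```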
- Add support for multi-modal inputs (e.g., images, audio).
- Enhance summarization capabilities using other LLM APIs.
- Integrate additional LLMs for diverse use cases.
We welcome contributions! Please fork the repository and submit a pull request with your proposed changes.
This project is licensed under the MIT License. See the LICENSE file for details.