SimpleRAG is a repository designed to demonstrate the use of Retrieval-Augmented Generation (RAG) with Milvus and LangChain.
Before setting up SimpleRAG, ensure you have the following:
- LLamaParse API Key:
  - Sign up and obtain your API key from LLamaParse.
- Milvus Installation:
  - Follow the official Milvus installation guide to set up a standalone Milvus instance using Docker.
  - Linux and macOS are recommended.
- Environment Requirements:
  - Anaconda
  - (Recommended) A GPU server for hosting the embedding model & LLM.
- Clone the Repository:
  - `git clone https://github.com/haozhuang0000/SimpleRAG.git`
  - `cd SimpleRAG`
- Create a Conda Environment:
  - `conda create -n simplerag python=3.11`
  - `conda activate simplerag`
- Install Dependencies:
  - `pip install -r requirements.txt`
Create a `.env` file in the root directory of the project and configure the following variables (a short loading sketch follows the list):

    VDB_HOST=YOUR_MILVUS_IP_ADDRESS
    VDB_PORT=YOUR_MILVUS_PORT
    EMBEDDING_HOST=YOUR_EMBEDDING_MODEL_IP_ADDRESS
    EMBEDDING_PORT=YOUR_EMBEDDING_MODEL_PORT
    OLLAMA_HOST=YOUR_OLLAMA_IP_ADDRESS
    OLLAMA_PORT=YOUR_OLLAMA_PORT
    LLAMAPARSER_API_KEY=your_llamaparse_api_key
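
For reference, these settings can be read at runtime with `python-dotenv`. The snippet below is only a minimal sketch of loading them and assembling connection strings; how SimpleRAG itself consumes the `.env` file may differ.

```python
# Minimal sketch: read the .env settings with python-dotenv (an assumption;
# SimpleRAG's own configuration loading may differ).
import os
from dotenv import load_dotenv

load_dotenv()  # picks up .env from the project root

MILVUS_URI = f"http://{os.getenv('VDB_HOST')}:{os.getenv('VDB_PORT')}"
OLLAMA_URL = f"http://{os.getenv('OLLAMA_HOST')}:{os.getenv('OLLAMA_PORT')}"
LLAMAPARSE_KEY = os.getenv("LLAMAPARSER_API_KEY")

print(MILVUS_URI, OLLAMA_URL)
```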
- Create a folder named `_static` and put your PDF files under `_static`.
- Run the application:
  - `python main.py`
The Retrieval-Augmented Generation (RAG) process in SimpleRAG follows these steps (a rough end-to-end sketch follows the list):
- Initialize Collection: Create a new collection in Milvus to store document embeddings.
- Process PDFs: Use LLamaParse to process all PDF files located in the `_static` directory.
- Chunk Documents: Break the parsed documents into chunks for downstream processing.
- Generate Embeddings: Use the embedding model to generate embeddings for each document chunk.
- Store in Milvus: Insert the generated embeddings into the Milvus vector database for future retrieval.
- Query Embedding: Convert the user's query into an embedding using the same embedding model.
- Similarity Search: Perform a similarity search in Milvus to find chunks that are most relevant to the query embedding.
- Fetch Results: Retrieve the most relevant document chunks based on the similarity search results.
- Contextual Response: Use LangChain to generate a response from the language model, incorporating the retrieved context into the prompt.
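
The flow above can be approximated with stock LangChain components. The sketch below is illustrative only, not SimpleRAG's actual code: the collection name, chunk sizes, the `llama3` model name, and the use of a local `HuggingFaceEmbeddings` model as a stand-in for the remote embedding service are all assumptions.

```python
# Illustrative end-to-end RAG sketch (LLamaParse -> chunk -> embed -> Milvus -> Ollama).
# Collection name, chunk sizes, model names, and the local embedding stand-in are assumptions.
import os
from glob import glob

from dotenv import load_dotenv
from llama_parse import LlamaParse
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Milvus
from langchain_community.llms import Ollama

load_dotenv()

# Step 2: parse every PDF under _static with LLamaParse.
parser = LlamaParse(api_key=os.getenv("LLAMAPARSER_API_KEY"), result_type="text")
raw_text = "\n".join(doc.text for pdf in glob("_static/*.pdf") for doc in parser.load_data(pdf))

# Step 3: break the parsed text into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(raw_text)

# Steps 1, 4-5: create the collection, embed the chunks, and insert them into Milvus.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")  # stand-in model
vectorstore = Milvus.from_texts(
    chunks,
    embedding=embeddings,
    collection_name="simplerag_demo",
    connection_args={"host": os.getenv("VDB_HOST"), "port": os.getenv("VDB_PORT")},
)

# Steps 6-8: embed the query and retrieve the most similar chunks.
query = "What are the key findings of the report?"
hits = vectorstore.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in hits)

# Step 9: generate a contextual response with the LLM, grounded in the retrieved chunks.
llm = Ollama(
    base_url=f"http://{os.getenv('OLLAMA_HOST')}:{os.getenv('OLLAMA_PORT')}",
    model="llama3",
)
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```

In the real pipeline, the embedding step would call the service configured at `EMBEDDING_HOST`/`EMBEDDING_PORT` rather than a local model; the local embedding class is used here only to keep the sketch self-contained.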