This repository contains a complete implementation of Retrieval Augmented Generation (RAG) using Mistral-7B-Instruct-v0.1 for generating responses from a custom dataset. The main file `app.py` sets up a Flask web server to provide an interface for querying the RAG model.
This implementation is done in two key phases: indexing and retrieval & generation. First, during indexing, documents are split into text chunks, and their embeddings are stored in a vector database. Then, in the retrieval & generation phase, user queries are matched against these embeddings, prompting the LLM to generate responses based on the retrieved contexts.
Why Mistral-7B? Mistral-7B, especially in its 4-bit quantized version, offers impressive performance while remaining memory-efficient (other open-source LLMs such as Llama 2, Mindy-7B, MoMo-70B, etc. can also be used). Here, `Mistral-7B-Instruct-v0.2-Q4_K_M.gguf` is used for the retrieval and generation tasks.
The stack includes the LlamaIndex framework, which provides `SentenceWindowNodeParser`, `VectorStoreIndex`, `ServiceContext`, and `SentenceTransformerRerank` for powerful, effortless querying and access to domain-specific data, outperforming alternatives like LangChain.
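For reference, the sketch below shows how these components are typically wired together with the legacy `llama_index` API; the window size and reranker model shown here are assumptions for illustration, not necessarily the repository's exact configuration.

```python
# Illustrative sketch of the sentence-window retrieval components named above
# (legacy llama_index API). Window size and reranker model are assumed values.
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.postprocessor import (
    MetadataReplacementPostProcessor,
    SentenceTransformerRerank,
)

# Split documents into single-sentence nodes, each carrying a window of
# surrounding sentences in its metadata.
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# At query time, replace each retrieved sentence with its full window, then
# rerank the candidates with a cross-encoder before they reach the LLM.
postprocessors = [
    MetadataReplacementPostProcessor(target_metadata_key="window"),
    SentenceTransformerRerank(top_n=2, model="cross-encoder/ms-marco-MiniLM-L-2-v2"),
]
```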
Instead of fine-tuning an LLM, the embedding-based retrieval approach ensures scalability and avoids issues like model drift, cost, and complexity.
How to assess the RAG system? Check out the evaluation process and benchmarks in detail here. It offers further insights into the methodology and performance metrics.
(Note: A system with at least 12 GB of GPU memory and 16 GB of RAM is recommended for optimal performance.)
Set up the environment and install the necessary dependencies using the following steps:

- Clone this repository to your local machine:

  ```bash
  git clone https://github.com/ChanukaRavishan/MistralRAG-LlamaIndex.git
  ```

- Navigate to the repository directory:

  ```bash
  cd MistralRAG-LlamaIndex
  ```

- Install the required Python packages. You may use a virtual environment to manage dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To create the vector storage, follow the instructions in `LLaMaCPP_python_creating_vector_storage.ipynb`. Here, I'm using the Hugging Face `bge-small-en-v1.5` model for embedding generation and storing the embeddings in the `VectorStoreIndex` provided by LlamaIndex.
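If you just need the gist of that notebook, the following is a minimal sketch of building and persisting the index with the legacy `llama_index` API; the data and output directories below are placeholders, not the notebook's exact paths.

```python
# Minimal sketch of building and persisting the vector store (not the notebook itself).
# "./data" and "./vector_store" are placeholder paths.
from llama_index import SimpleDirectoryReader, ServiceContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding

documents = SimpleDirectoryReader("./data").load_data()

# bge-small-en-v1.5 embeddings from Hugging Face; no LLM is needed at indexing time.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
index.storage_context.persist(persist_dir="./vector_store")
```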
To start the Flask web server and interact with the RAG model, run the following command:

```bash
nohup python app.py &
```

This will start the server locally at http://localhost:3000/. You can now visit this URL in your web browser to access the interface, or, if you are running this implementation on a remote machine, visit http://<your-remote-ip-address>:3000/.
- `GET /query`: This endpoint accepts a query string as a parameter (`message`) and returns the response generated by the RAG model. Example usage: `http://localhost:3000/query?message=your_query_here`
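For example, the endpoint can be called from Python as shown below (this assumes the server is running locally on port 3000; the query text is just an example).

```python
# Simple client call to the /query endpoint; the message text is an example.
import requests

response = requests.get(
    "http://localhost:3000/query",
    params={"message": "What does the dataset say about X?"},
)
print(response.text)
```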
The `app.py` file contains the main implementation for setting up the Flask web server and integrating the RAG model. Here's a breakdown of its components:

- Initialization Functions:
  - `initialize_llm`: Initializes the LlamaCPP model with specified parameters.
  - `initialize_query_engine`: Initializes the query engine for executing queries with the RAG model.

- Flask App Routes:
  - `/`: Renders the `index.html` template in the templates folder.
  - `/query`: Accepts a query string and returns the response generated by the RAG model.
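The sketch below illustrates how these pieces could fit together. It is not the actual `app.py`: the model path, generation parameters, and persist directory are placeholders, and the code assumes the legacy `llama_index` API.

```python
# Illustrative sketch of app.py's structure -- not the repository's exact code.
# Model path, generation parameters, and persist_dir are placeholders.
from flask import Flask, render_template, request
from llama_index import ServiceContext, StorageContext, load_index_from_storage
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP


def initialize_llm():
    # Load the 4-bit quantized Mistral GGUF model through llama.cpp.
    return LlamaCPP(
        model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
        temperature=0.1,
        max_new_tokens=256,
        context_window=3900,
        model_kwargs={"n_gpu_layers": -1},  # offload layers to the GPU if available
    )


def initialize_query_engine(llm):
    # Reload the persisted vector store and build a query engine on top of it.
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
    storage_context = StorageContext.from_defaults(persist_dir="./vector_store")
    index = load_index_from_storage(storage_context, service_context=service_context)
    return index.as_query_engine(similarity_top_k=3)


app = Flask(__name__)
query_engine = initialize_query_engine(initialize_llm())


@app.route("/")
def home():
    # Renders templates/index.html.
    return render_template("index.html")


@app.route("/query")
def query():
    # Answer the user's question with the RAG pipeline.
    message = request.args.get("message", "")
    response = query_engine.query(message)
    return str(response)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3000)
```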
- Ensure that you have the necessary models and resources available in the specified paths.
- Customize the configuration and parameters according to your requirements.
Feel free to explore and modify the code to suit your needs. If you encounter any issues or have suggestions for improvement, please don't hesitate to open an issue or contribute to the repository.
Thanks!