From 0ed5ad98a5cc5e8ecaf4729216aa3f633b19c784 Mon Sep 17 00:00:00 2001
From: Daniele Martinoli
Date: Wed, 11 Dec 2024 11:13:04 +0100
Subject: [PATCH] split process and ingest sub-commands and other comments from
 latest review meeting

Signed-off-by: Daniele Martinoli
---
 docs/cli/ilab-rag-retrieval.md | 205 +++++++++++++++++++--------------
 1 file changed, 119 insertions(+), 86 deletions(-)

diff --git a/docs/cli/ilab-rag-retrieval.md b/docs/cli/ilab-rag-retrieval.md
index c7a4182..ad2331d 100644
--- a/docs/cli/ilab-rag-retrieval.md
+++ b/docs/cli/ilab-rag-retrieval.md
@@ -1,4 +1,9 @@
 # Design Proposal - Embedding Ingestion Pipeline And RAG-Based Chat
+**TODOs**:
+* Vector store authentication options.
+* Document versioning and data update policies.
+* Unify prompt management in InstructLab. See the `chat_template` [configuration][chat_template] and the
+`augment_chat_template` [function][augment_chat_template].
 
 **Author**: Daniele Martinoli
 
 ---
 
 ## 1. Introduction
@@ -8,45 +13,59 @@
 This document proposes enhancements to the `ilab` CLI to support workflows utilizing Retrieval-Augmented Generation (RAG)
 artifacts within `InstructLab`. The proposed changes introduce new commands and options for the embedding ingestion and
 RAG-based chat pipelines:
-* A new embedding ingestion command to process customer documentation, generate embeddings, and ingest them into a configured vector store.
+* A new `ilab data` sub-command to process customer documentation.
+* A new `ilab data` sub-command to generate and ingest embeddings from pre-processed documents into a configured vector store.
 * An option to enhance the chat pipeline by using the stored embeddings to augment the context of conversations, improving relevance and accuracy.
 
-## 2. Proposed Commands
+## 2. Proposed Pipelines
 ### 2.1 Working Assumption
 This proposal aims to serve as a reference design to develop a Proof of Concept for RAG workflows, while also laying
 the foundation for future implementations of state-of-the-art RAG artifacts tailored to specific use cases.
 
-To minimize impact on current and future users, the following boundaries are defined:
-* Existing commands will only be updated to add flags or configurations that are maintainable in future versions.
-  * Configuration parameters will be marked as `Optional` to not affect existing runtimes.
-* New commands will be added using flags (e.g., `--abc`) or environment variables (e.g., `ILAB_ABC`) to avoid any configuration changes.
-  This approach ensures configuration compatibility even if the commands are deprecated later.
+#### Command Options
+To maintain compatibility and simplicity, no new configurations will be introduced for new commands. Instead,
+the settings will be defined using the following hierarchy (options higher in the list overriding those below):
+* CLI flags (e.g., `--FLAG`).
+* Environment variables following a consistent naming convention, such as `ILAB_`.
+* Default values for all applicable use cases.
 
-### 2.2 Embedding Ingestion Pipeline Command
-The proposal is to add a `process` command to the `data` command group, with an explicit `--rag` flag to trigger
-the executions:
+For example, the `vectordb-uri` argument can be implemented using the `click` module like this:
+```py
+@click.option(
+    "--vectordb-uri",
+    default='rag-output.db',
+    envvar="ILAB_VECTORDB_URI",
+)
 ```
-ilab data process --rag /path/to/docs/folder
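+
+For illustration only, the snippet below expands the option into a minimal, standalone `click` command so that the
+flag/environment-variable/default hierarchy can be tried end to end. The command name (`ingest-demo`) and the wiring
+are hypothetical and not part of the actual `ilab` implementation; only the precedence behavior is the point here:
+```py
+import click
+
+
+@click.command()
+@click.option(
+    "--vectordb-uri",
+    default="rag-output.db",      # lowest precedence: built-in default
+    envvar="ILAB_VECTORDB_URI",   # overrides the default when the variable is set
+    show_default=True,
+)
+def ingest_demo(vectordb_uri):
+    # An explicit --vectordb-uri flag overrides both the environment variable and the default.
+    click.echo(f"Using vector DB at: {vectordb_uri}")
+
+
+if __name__ == "__main__":
+    ingest_demo()
+```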
+
+#### Local embedding models
+The embedding model used to generate text embeddings must be downloaded locally before executing the pipeline.
+
+For example, this command can be used to download the `sentence-transformers/all-minilm-l6-v2` model to the local models cache:
+```bash
+ilab model download -rp sentence-transformers/all-minilm-l6-v2
 ```
-The rationale behind this choice is that the `data process` command can support future workflows, making its
-introduction an investment to anticipate other needs.
+If the configured embedding model has not been cached, the command execution will terminate with an error. This requirement applies
+consistently to all new and updated commands.
 
-Since the RAG behavior is the only functionality of this new command, executions without the `--rag` option will result
-in no output for now.
+### 2.2 Document Processing Pipeline
+The proposal is to add a `process` sub-command to the `data` command group:
+```
+ilab data process --input /path/to/docs/folder --output /path/to/processed/folder
+```
 
 #### Command Purpose
-Generate the embeddings from the documents at */path/to/docs/folder* folder and store them in the
-configured vector database. These embeddings are intended to be used as augmented context in a RAG-based chat pipeline.
+Applies the transformation to the customer documents in `/path/to/docs/folder`. Processed artifacts are stored under `/path/to/processed/folder`.
 
-#### Assumptions
-The provided documents must be in JSON format according to the InstructLab schema: this is the schema generated
-when transforming knowledge documents with the `ilab data generate` command (see
-[Getting Started with Knowledge Contributions][1]).
+**Notes**:
+* In alignment with the current SDG implementation, the folder will not be navigated recursively. Only files located at the root level of the specified
+folder will be considered. The same principle applies to all other options outlined below.
+* To ensure consistency and avoid issues with document versioning or outdated artifacts, the destination folder will be cleared before execution.
+  This ensures it contains only the artifacts generated from the most recent run.
 
-To simplify the execution of the transformation step, we introduce a `--transform` option which also includes the documents
-transformation, leveraging on the `instructlab-sdg` modules.
+The transformation is based on the `instructlab-sdg` modules (the initial step of the `ilab data generate` pipeline).
 
 ### Why We Need It
 This command streamlines the `ilab data generate` pipeline and eliminates the requirement to define a `qna` document,
@@ -57,58 +76,57 @@ which typically includes:
 The goal is not to generate training data for InstructLab-trained models but to utilize the documents for RAG workflows
 with pre-tuned models.
 
-#### Supported Databases
-The command may support various vector database types. A default configuration will align with the selected
-InstructLab technology stack.
+#### Usage
+The generated artifacts can later be used to generate and ingest the embeddings into a vector database.
 
-### Local embedding models
-The embedding model used to generate the text embeddings must be downloaded locally before executing the pipeline.
+### 2.3 Document Processing Pipeline Options
 
-For example, this can be used to download the `sentence-transformers/all-minilm-l6-v2` model to the local models cache:
-```bash
-ilab model download -rp sentence-transformers/all-minilm-l6-v2
+
+| Option Description | Default Value | CLI Flag | Environment Variable |
+|--------------------|---------------|----------|----------------------|
+| Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
+| Name of the embedding model. | **TBD** | `--embedding-model` | `ILAB_EMBEDDING_MODEL_NAME` |
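+
+For illustration, a minimal sketch of the processing step described in 2.2, assuming the Docling converter from the
+stack in 2.11. The actual command delegates to the `instructlab-sdg` modules, and the JSON artifact format shown here
+is an assumption, not a committed schema:
+```py
+import json
+from pathlib import Path
+
+from docling.document_converter import DocumentConverter
+
+
+def process_docs(input_dir: Path, output_dir: Path) -> None:
+    # Clear the destination folder so it only contains artifacts from the most recent run.
+    output_dir.mkdir(parents=True, exist_ok=True)
+    for stale in output_dir.iterdir():
+        if stale.is_file():
+            stale.unlink()
+
+    converter = DocumentConverter()
+    # Only files at the root level are considered; the folder is not navigated recursively.
+    for source in sorted(p for p in input_dir.iterdir() if p.is_file()):
+        result = converter.convert(source)
+        target = output_dir / f"{source.stem}.json"
+        target.write_text(json.dumps(result.document.export_to_dict()), encoding="utf-8")
+
+
+if __name__ == "__main__":
+    process_docs(Path("/path/to/docs/folder"), Path("/path/to/processed/folder"))
+```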
+
+### 2.4 Embedding Ingestion Pipeline
+The proposal is to add an `ingest` sub-command to the `data` command group:
 ```
+ilab data ingest /path/to/docs/folder
+```
+
+#### Working Assumption
+The documents at the specified path have already been processed using the `data process` command or an equivalent method
+(see [Getting Started with Knowledge Contributions][ilab-knowledge]).
 
-If the configured embedding model has not been cached, the execution will terminate with an error.
+#### Command Purpose
+Generate the embeddings from the pre-processed documents in `/path/to/docs/folder` and store them in the
+configured vector database.
+
+**Notes**:
+* To ensure consistency and avoid issues with document versioning or outdated embeddings, the ingested collection will be cleared before execution.
+  This ensures it contains only the embeddings generated from the most recent run.
+
+### Why We Need It
+To populate embedding vector stores with pre-processed information that can be used at chat inference time.
+
+#### Supported Databases
+The command may support various vector database types. A default configuration will align with the selected
+InstructLab technology stack.
 
 #### Usage
 The generated embeddings can later be retrieved from a vector database and converted to text, enriching the context
 for RAG-based chat pipelines.
 
-#### Defining Command Options
-To maintain compatibility and simplicity, no new configurations will be introduced for this command. Instead,
-the settings will be defined using the following hierarchy (options higher in the list overriding those below):
-* CLI flags (e.g., `--rag`).
-* Environment variables following a consistent naming convention, such as `ILAB_`.
-* Default values, for all the applicable use cases.
-
-For example, the `vectordb-uri` argument can be implemented using the `click` module like this:
-```py
-@click.option(
-    "--vectordb-uri",
-    default='rag-output.db',
-    envvar="ILAB_VECTORDB_URI",
-)
-```
-
-### 2.3 Embedding Ingestion Pipeline Options
+### 2.5 Embedding Ingestion Pipeline Options
 
 | Option Description | Default Value | CLI Flag | Environment Variable |
 |--------------------|---------------|----------|----------------------|
-| Whether to include a transformation step. | `False` | `--transform` (boolean) | `ILAB_TRANSFORM` |
-| The output path of transformed documents (serve as input for the embedding ingestion pipeline). Mandatory when `--transform` is used. | | `--transform-output` | `ILAB_TRANSFORM_OUTPUT` |
 | Vector DB implementation, one of: `milvuslite`, **TBD** | `milvuslite` | `--vectordb-type` | `ILAB_VECTORDB_TYPE` |
 | Vector DB service URI. | `./rag-output.db` | `--vectordb-uri` | `ILAB_VECTORDB_URI` |
 | Vector DB collection name. | `IlabEmbeddings` | `--vectordb-collection-name` | `ILAB_VECTORDB_COLLECTION_NAME` |
-| Vector DB connection username. | | `--vectordb-username` | `ILAB_VECTORDB_USERNAME` |
-| Vector DB connection password. | | `--vectordb-password` | `ILAB_VECTORDB_PASSWORD` |
 | Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
 | Name of the embedding model. | **TBD** | `--embedding-model` | `ILAB_EMBEDDING_MODEL_NAME` |
-| Token to download private models. | | `--embedding-model-token` | `ILAB_EMBEDDING_MODEL_TOKEN` |
-
-**TODO**: vector store authentication options.
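+
+For illustration, a minimal ingestion sketch assuming the candidate stack from 2.9 and 2.11 (Haystack 2.x with the
+`milvus-haystack` integration). The defaults mirror the option table above, the pre-processed artifacts are assumed
+here to be plain text for simplicity, and the chunking uses the default splitter settings mentioned in 2.9; this is
+a sketch of the technique, not the committed implementation:
+```py
+from pathlib import Path
+
+from haystack import Document
+from haystack.components.embedders import SentenceTransformersDocumentEmbedder
+from haystack.components.preprocessors import DocumentSplitter
+from milvus_haystack import MilvusDocumentStore
+
+# Defaults mirror ILAB_VECTORDB_URI and ILAB_VECTORDB_COLLECTION_NAME above.
+document_store = MilvusDocumentStore(
+    connection_args={"uri": "./rag-output.db"},
+    collection_name="IlabEmbeddings",
+    drop_old=True,  # clear the collection so it only holds embeddings from the most recent run
+)
+
+# Read the pre-processed artifacts from the root level of the input folder.
+docs = [
+    Document(content=path.read_text(encoding="utf-8"))
+    for path in Path("/path/to/docs/folder").iterdir()
+    if path.is_file()
+]
+
+# Split with default settings, then embed with the locally cached embedding model.
+chunks = DocumentSplitter().run(documents=docs)["documents"]
+embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-minilm-l6-v2")
+embedder.warm_up()
+document_store.write_documents(embedder.run(documents=chunks)["documents"])
+```
+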
-### 2.4 RAG Chat Pipeline Command
+### 2.6 RAG Chat Pipeline Command
 The proposal is to add a `--rag` flag to the `model chat` command, like:
 ```
 ilab model chat --rag
@@ -124,11 +142,20 @@ enriching the conversational experience with relevant insights.
 * Append the retrieved context to the original LLM request.
 * Send the context augmented request to the LLM and return the response to the user.
 
-### Local embedding models
-Similar to the embedding ingestion pipeline, the embedding model required for generating text embeddings must be downloaded locally
-before running the pipeline.
+#### Prompt Template
+A default non-configurable template is used with parameters to specify the user query and the context, like:
+```text
+Given the following information, answer the question.
+Context:
+{context}
+Question:
+{user_query}
+Answer:
+```
+
+Future extensions should align prompt management with the existing InstructLab design.
 
-### 2.5 RAG Chat Commands
+### 2.7 RAG Chat Commands
 The `/r` command may be added to the `ilab model chat` command to dynamically toggle the execution of the RAG pipeline.
 The current status could be displayed with an additional marker on the chat status bar, as in (top right corner):
 ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ```
 
-### 2.6 RAG Chat Options
-As we stated in [3.1 Working Assumptions](#31-working-assumption), we will introduce new configuration options for the spceific `chat` command,
+### 2.8 RAG Chat Options
+As we stated in [2.1 Working Assumption](#21-working-assumption), we will introduce new configuration options for the specific `chat` command,
 but we'll use flags and environment variables for the options that come from the embedding ingestion pipeline command.
 
 | Configuration FQN | Description | Default Value | CLI Flag | Environment Variable |
 |-------------------|-------------|---------------|----------|----------------------|
 | chat.rag.enabled | Enable or disable the RAG pipeline. | `false` | `--rag` (boolean)| `ILAB_CHAT_RAG_ENABLED` |
 | chat.rag.retriever.top_k | The maximum number of documents to retrieve. | `10` | `--retriever-top-k` | `ILAB_CHAT_RAG_RETRIEVER_TOP_K` |
-| chat.rag.prompt | Prompt template for RAG-based queries. | Examples below | `--rag-prompt` | `ILAB_CHAT_RAG_PROMPT` |
 | | Vector DB implementation, one of: `milvuslite`, **TBD** | `milvuslite` | `--vectordb-type` | `ILAB_VECTORDB_TYPE` |
 | | Vector DB service URI. | `./rag-output.db` | `--vectordb-uri` | `ILAB_VECTORDB_URI` |
 | | Vector DB collection name. | `IlabEmbeddings` | `--vectordb-collection-name` | `ILAB_VECTORDB_COLLECTION_NAME` |
-| | Vector DB connection username. | | `--vectordb-username` | `ILAB_VECTORDB_USERNAME` |
-| | Vector DB connection password. | | `--vectordb-password` | `ILAB_VECTORDB_PASSWORD` |
 | | Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
 | | Name of the embedding model. | **TBD** | `--model` | `ILAB_EMBEDDING_MODEL_NAME` |
-| | Token to download private models. | | `--model-token` | `ILAB_EMBEDDING_MODEL_TOKEN` |
-
-**TODO**: vector store authentication options.
 
 Equivalent YAML document for the newly proposed options:
 ```yaml
 chat:
   rag:
     enabled: false
     retriever:
       top_k: 10
-    prompt: |
-      Given the following information, answer the question.
-      Context:
-      {{context}}
-      Question: {{question}}
-      Answer:
 ```
 
-**TODO**: unify prompt management in InstructLab
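+
+For illustration, a minimal sketch of the chat-time retrieval flow described in 2.6, assuming the Haystack 2.x text
+embedder and the `MilvusEmbeddingRetriever` referenced in 2.9. The default values mirror the tables above, and the
+final call to the served model with the augmented request is omitted:
+```py
+from haystack.components.embedders import SentenceTransformersTextEmbedder
+from milvus_haystack import MilvusDocumentStore
+from milvus_haystack.milvus_embedding_retriever import MilvusEmbeddingRetriever
+
+document_store = MilvusDocumentStore(
+    connection_args={"uri": "./rag-output.db"},
+    collection_name="IlabEmbeddings",
+)
+retriever = MilvusEmbeddingRetriever(document_store=document_store, top_k=10)
+
+query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-minilm-l6-v2")
+query_embedder.warm_up()
+
+user_query = "How do I configure the vector database?"
+query_embedding = query_embedder.run(text=user_query)["embedding"]
+context_docs = retriever.run(query_embedding=query_embedding)["documents"]
+
+# The retrieved text is folded into the default prompt template from 2.6 before
+# the augmented request is sent to the served model.
+context = "\n".join(doc.content for doc in context_docs)
+prompt = (
+    "Given the following information, answer the question.\n"
+    f"Context:\n{context}\n"
+    f"Question:\n{user_query}\n"
+    "Answer:"
+)
+```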
 
-### 2.7 References
+### 2.9 References
 * [Haystack-DocumentSplitter](https://github.com/deepset-ai/haystack/blob/f0c3692cf2a86c69de8738d53af925500e8a5126/haystack/components/preprocessors/document_splitter.py#L55) is temporarily adopted with default settings until a splitter based
 on the [docling chunkers][chunkers] is integrated in Haystack.
 * [MilvusEmbeddingRetriever](https://github.com/milvus-io/milvus-haystack/blob/77b27de00c2f0278e28b434f4883853a959f5466/src/milvus_haystack/milvus_embedding_retriever.py#L18)
 
-### 2.8 Workflow Visualization
+### 2.10 Workflow Visualization
 Embedding ingestion pipeline:
 ![ingestion-mvp](../images/ingestion-mvp.png)
 
 RAG-based Chat pipeline:
 ![rag-chat](../images/rag-chat.png)
 
-### 2.9 Proposed Implementation Stack
+### 2.11 Proposed Implementation Stack
 > **ℹ️ Note:** This stack is still under review. The proposed list represents potential candidates based on the current state of discussions.
 
 The following technologies form the foundation of the proposed solution:
 * [MilvusLite](https://milvus.io/docs/milvus_lite.md): The default vector database for efficient storage and retrieval of embeddings.
 * [Docling](https://github.com/DS4SD/docling): Document processing tool. For more details, refer to William’s blog, [Docling: The missing document processing companion for generative AI](https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai).
 
-## 3. Future Enhancements
-### 3.1 Model Evaluation
+## 3. Design Considerations
+This solution must adopt a pluggable design to facilitate the easy integration of additional components:
+* Vector stores: Support all selected implementations.
+* Embedding models: Handle embedding models using the appropriate embedder implementation for the chosen framework.
+* Indexing services: Future extensions should allow retrieval of embeddings from configurable APIs, thereby extending the concept of a vector store
+  to include these retrieval services.
+
+## 4. Future Enhancements
+### 4.1 Model Evaluation
 **TODO** A separate ADR will be defined.
 
-### 3.2 Additional RAG chat steps
+### 4.2 Advanced RAG Retrieval Steps
 - [Ranking retriever's result][ranking] (see the sketch after this list):
 ```bash
 ilab model chat --rag --ranking --ranking-top-k=5 --ranking-model=cross-encoder/ms-marco-MiniLM-L-12-v2
 ```
 - [Expanding user query][expansion]:
 ```bash
 ilab model chat --rag --query-expansion --query-expansion-prompt="$QUERY_EXPANSION_PROMPT" --query-expansion-num-of-queries=5
 ```
+- Using a configurable retrieval strategy:
+```bash
+ilab model chat --rag --retrieval-strategy query-expansion --retrieval-strategy-options="prompt=$QUERY_EXPANSION_PROMPT;num_of_queries=5"
+```
 - ...
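+
+For illustration, a sketch of the proposed ranking step, assuming Haystack 2.x's `TransformersSimilarityRanker` with
+the cross-encoder model named in the first bullet; feeding the re-ranked documents back into the chat context would
+follow the flow in 2.6:
+```py
+from haystack import Document
+from haystack.components.rankers import TransformersSimilarityRanker
+
+ranker = TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-12-v2", top_k=5)
+ranker.warm_up()
+
+# Documents returned by the embedding retriever (placeholder contents here).
+retrieved = [Document(content="first candidate chunk"), Document(content="second candidate chunk")]
+ranked = ranker.run(query="How do I configure the vector database?", documents=retrieved)["documents"]
+```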
 
-### 3.3 Containerized Index
-Generate a containerized RAG artifact to expose a `/query` endpoint:
+### 4.3 Containerized Indexing Service
+Generate a containerized RAG artifact that exposes a `/query` endpoint and can serve as an alternative retrieval source:
+```bash
+ilab data ingest --build-image --image-name=docker.io/user/my_rag_artifacts:1.0
+```
+
+Then serve it and use it in a chat session:
 ```bash
-ilab data process --rag --build-image --image-name=docker.io/user/my_rag_artifacts:1.0
+ilab serve --rag-embeddings --image-name=docker.io/user/my_rag_artifacts:1.0 --port 8123
+ilab model chat --rag --retriever-type api --retriever-uri http://localhost:8123
 ```
 
-[1]: https://github.com/instructlab/taxonomy?tab=readme-ov-file#getting-started-with-knowledge-contributions
+[ilab-knowledge]: https://github.com/instructlab/taxonomy?tab=readme-ov-file#getting-started-with-knowledge-contributions
+[chat_template]: https://github.com/instructlab/instructlab/blob/0a773f05f8f57285930df101575241c649f591ce/src/instructlab/configuration.py#L244
+[augment_chat_template]: https://github.com/instructlab/instructlab/blob/48e3f7f1574ae50036d6e342b8d78d8eb9546bd5/src/instructlab/model/backends/llama_cpp.py#L281
 [ranking]: https://docs.haystack.deepset.ai/v1.21/reference/ranker-api
 [expansion]: https://haystack.deepset.ai/blog/query-expansion
 [chunkers]: https://github.com/DS4SD/docling/blob/main/docs/concepts/chunking.md