Commit 0ed5ad9

split process and ingest sub-commands and other comments from latest review meeting

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
dmartinol committed Dec 11, 2024
1 parent c1e5104 commit 0ed5ad9
Showing 1 changed file with 119 additions and 86 deletions: docs/cli/ilab-rag-retrieval.md

# Design Proposal - Embedding Ingestion Pipeline And RAG-Based Chat
**TODOs**:
* Vector store authentication options.
* Document versioning and data update policies.
* Unify prompt management in InstructLab. See the `chat_template` [configuration][chat_template] and the
  `augment_chat_template` [function][augment_chat_template].

**Author**: Daniele Martinoli

This document proposes enhancements to the `ilab` CLI to support workflows utilizing Retrieval-Augmented Generation
(RAG) artifacts within `InstructLab`. The proposed changes introduce new commands and options for the embedding ingestion
and RAG-based chat pipelines:
* A new `ilab data` sub-command to process customer documentation.
* A new `ilab data` sub-command to generate and ingest embeddings from pre-processed documents into a configured vector store.
* An option to enhance the chat pipeline by using the stored embeddings to augment the context of conversations, improving relevance and accuracy.

## 2. Proposed Pipelines
### 2.1 Working Assumption
This proposal aims to serve as a reference design to develop a Proof of Concept for RAG workflows, while
also laying the foundation for future implementations of state-of-the-art RAG artifacts tailored to specific use
cases.

To minimize impact on current and future users, the following boundaries are defined:
* Existing commands will only be updated to add flags or configurations that are maintainable in future versions.
* Configuration parameters will be marked as `Optional` so they do not affect existing runtimes.
* New commands will be added using flags (e.g., `--abc`) or environment variables (e.g., `ILAB_ABC`) to avoid any configuration changes.
This approach ensures configuration compatibility even if the commands are deprecated later.
#### Command Options
To maintain compatibility and simplicity, no new configurations will be introduced for new commands. Instead,
the settings will be defined using the following hierarchy (options higher in the list overriding those below):
* CLI flags (e.g., `--FLAG`).
* Environment variables following a consistent naming convention, such as `ILAB_<UPPERCASE_ARGUMENT_NAME>`.
* Default values, for all applicable use cases.

For example, the `vectordb-uri` argument can be implemented using the `click` module like this:
```py
@click.option(
    "--vectordb-uri",
    default="rag-output.db",
    envvar="ILAB_VECTORDB_URI",
)
```

#### Local Embedding Models
The embedding model used to generate text embeddings must be downloaded locally before executing the pipeline.

For example, this command can be used to download the `sentence-transformers/all-minilm-l6-v2` model to the local models cache:
```bash
ilab model download -rp sentence-transformers/all-minilm-l6-v2
```

If the configured embedding model has not been cached, the command execution will terminate with an error. This requirement applies
consistently to all new and updated commands.
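
As an illustrative sketch (not part of the current CLI), such a pre-check could look like the following, assuming the
default models cache layout under `$HOME/.cache/instructlab/models`; the helper name is purely hypothetical:
```py
from pathlib import Path

import click

# Illustrative helper, not part of the current CLI: fail fast when the
# configured embedding model is missing from the local models cache.
def ensure_model_cached(model_dir: str, embedding_model: str) -> Path:
    model_path = Path(model_dir).expanduser() / embedding_model
    if not model_path.exists():
        raise click.ClickException(
            f"Embedding model '{embedding_model}' is not cached under '{model_dir}'. "
            f"Run: ilab model download -rp {embedding_model}"
        )
    return model_path
```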

### 2.2 Document Processing Pipeline
The proposal is to add a `process` sub-command to the `data` command group:
```
ilab data process --input /path/to/docs/folder --output /path/to/processed/folder
```

#### Command Purpose
Apply the transformation to the customer documents in `/path/to/docs/folder`. Processed artifacts are stored under `/path/to/processed/folder`.

**Notes**:
* In alignment with the current SDG implementation, the folder will not be navigated recursively. Only files located at the root level of the specified
folder will be considered. The same principle applies to all other options outlined below.
* To ensure consistency and avoid issues with document versioning or outdated artifacts, the destination folder will be cleared before execution.
This ensures it contains only the artifacts generated from the most recent run.

The transformation is based on the `instructlab-sdg` modules (the initial step of the `ilab data generate` pipeline).
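
The `instructlab-sdg` entry points are not detailed here; as a rough sketch, under the assumption that recent Docling
releases expose a `DocumentConverter`, the transformation step could be approximated as:
```py
import json
from pathlib import Path

from docling.document_converter import DocumentConverter

def process_docs(input_dir: str, output_dir: str) -> None:
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    converter = DocumentConverter()
    # Only files at the root level are considered, matching the SDG behavior.
    for source in Path(input_dir).glob("*.pdf"):
        result = converter.convert(source)
        artifact = out / f"{source.stem}.json"
        artifact.write_text(json.dumps(result.document.export_to_dict()))
```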

#### Why We Need It
This command streamlines the `ilab data generate` pipeline and eliminates the requirement to define a `qna` document,
which typically includes question-and-answer examples used to drive synthetic data generation.
The goal is not to generate training data for InstructLab-trained models but to utilize the documents for RAG
workflows with pre-tuned models.

#### Usage
The generated artifacts can later be used to generate and ingest the embeddings into a vector database.

### 2.3 Document Processing Pipeline Options

| Option Description | Default Value | CLI Flag | Environment Variable |
|--------------------|---------------|----------|----------------------|
| Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
| Name of the embedding model. | **TBD** | `--embedding-model` | `ILAB_EMBEDDING_MODEL_NAME` |
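
Following the same `click` pattern shown in [2.1 Working Assumption](#21-working-assumption), these options might be
declared as follows; the default embedding model below is only a placeholder, since the actual default is still TBD:
```py
import click

@click.command()
@click.option(
    "--model-dir",
    default="~/.cache/instructlab/models",
    envvar="ILAB_MODEL_DIR",
)
@click.option(
    "--embedding-model",
    default="sentence-transformers/all-minilm-l6-v2",  # placeholder; the default is TBD
    envvar="ILAB_EMBEDDING_MODEL_NAME",
)
def process(model_dir: str, embedding_model: str) -> None:
    """Document processing entry point (illustrative)."""
    ...
```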

### 2.4 Embedding Ingestion Pipeline
The proposal is to add an `ingest` sub-command to the `data` command group:
```
ilab data ingest /path/to/docs/folder
```

#### Working Assumption
The documents at the specified path have already been processed using the `data process` command or an equivalent method
(see [Getting Started with Knowledge Contributions][ilab-knowledge]).

#### Command Purpose
Generate the embeddings from the pre-processed documents in the */path/to/docs/folder* folder and store them in the
configured vector database.

**Notes**:
* To ensure consistency and avoid issues with document versioning or outdated embeddings, the ingested collection will be cleared before execution.
This ensures it contains only the embeddings generated from the most recent run.
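
A minimal sketch of the ingestion step, assuming the Haystack 2.x and `milvus-haystack` APIs from the proposed stack;
the artifact layout (one JSON file per processed document, with a `text` field) is an assumption:
```py
import json
from pathlib import Path

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from milvus_haystack import MilvusDocumentStore

# Clear the collection on each run so it holds only the latest embeddings,
# mirroring the note above.
store = MilvusDocumentStore(
    connection_args={"uri": "./rag-output.db"},
    collection_name="IlabEmbeddings",
    drop_old=True,
)

embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-minilm-l6-v2"  # placeholder; the default is TBD
)
embedder.warm_up()

# Assumed artifact layout: one JSON file per processed document, with a "text" field.
docs = [
    Document(content=json.loads(path.read_text())["text"])
    for path in Path("/path/to/docs/folder").glob("*.json")
]
store.write_documents(embedder.run(docs)["documents"])
```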

#### Why We Need It
To populate embedding vector stores with pre-processed information that can be used at chat inference time.

#### Supported Databases
The command may support various vector database types. A default configuration will align with the selected
InstructLab technology stack.

#### Usage
The stored embeddings can later be retrieved from the vector database, together with their source text, to enrich the
context of RAG-based chat pipelines.


### 2.5 Embedding Ingestion Pipeline Options

| Option Description | Default Value | CLI Flag | Environment Variable |
|--------------------|---------------|----------|----------------------|
| Vector DB implementation, one of: `milvuslite`, **TBD** | `milvuslite` | `--vectordb-type` | `ILAB_VECTORDB_TYPE` |
| Vector DB service URI. | `./rag-output.db` | `--vectordb-uri` | `ILAB_VECTORDB_URI` |
| Vector DB collection name. | `IlabEmbeddings` | `--vectordb-collection-name` | `ILAB_VECTORDB_COLLECTION_NAME` |
| Vector DB connection username. | | `--vectordb-username` | `ILAB_VECTORDB_USERNAME` |
| Vector DB connection password. | | `--vectordb-password` | `ILAB_VECTORDB_PASSWORD` |
| Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
| Name of the embedding model. | **TBD** | `--embedding-model` | `ILAB_EMBEDDING_MODEL_NAME` |
| Token to download private models. | | `--embedding-model-token` | `ILAB_EMBEDDING_MODEL_TOKEN` |

### 2.6 RAG Chat Pipeline Command
The proposal is to add a `--rag` flag to the `model chat` command, like:
```
ilab model chat --rag
```

When enabled, the chat pipeline augments the conversation with content retrieved from the configured vector store,
enriching the conversational experience with relevant insights. Among other steps, the pipeline will:
* Append the retrieved context to the original LLM request.
* Send the context-augmented request to the LLM and return the response to the user (see the sketch below).
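
A minimal sketch of the retrieval steps, again assuming the Haystack 2.x and `milvus-haystack` APIs; the query text is
only an example:
```py
from haystack.components.embedders import SentenceTransformersTextEmbedder
from milvus_haystack import MilvusDocumentStore, MilvusEmbeddingRetriever

store = MilvusDocumentStore(
    connection_args={"uri": "./rag-output.db"},
    collection_name="IlabEmbeddings",
)
retriever = MilvusEmbeddingRetriever(document_store=store, top_k=10)

query_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-minilm-l6-v2"  # must match the ingestion model
)
query_embedder.warm_up()

user_query = "What does the product documentation say about upgrades?"
embedding = query_embedder.run(user_query)["embedding"]
context_docs = retriever.run(query_embedding=embedding)["documents"]
```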

#### Prompt Template
A default non-configurable template is used with parameters to specify the user query and the context, like:
```text
Given the following information, answer the question.
Context:
{context}
Question:
{user_query}
Answer:
```
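
Filling the template then reduces to plain string substitution; a sketch, reusing `user_query` and `context_docs` from
the retrieval sketch above:
```py
PROMPT_TEMPLATE = """Given the following information, answer the question.
Context:
{context}
Question:
{user_query}
Answer:"""

# Join the retrieved chunks into the {context} slot.
context = "\n".join(doc.content for doc in context_docs)
prompt = PROMPT_TEMPLATE.format(context=context, user_query=user_query)
```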

Future extensions should align prompt management with the existing InstructLab design.

### 2.7 RAG Chat Commands
The `/r` command may be added to the `ilab model chat` command to dynamically toggle the execution of the RAG pipeline.

The current status could be displayed with an additional marker on the chat status bar (e.g., in the top right corner).

### 2.8 RAG Chat Options
As stated in [2.1 Working Assumption](#21-working-assumption), we will introduce new configuration options for the specific `chat` command,
but use flags and environment variables for the options that come from the embedding ingestion pipeline.

| Configuration FQN | Description | Default Value | CLI Flag | Environment Variable |
|-------------------|-------------|---------------|----------|----------------------|
| chat.rag.enabled | Enable or disable the RAG pipeline. | `false` | `--rag` (boolean)| `ILAB_CHAT_RAG_ENABLED` |
| chat.rag.retriever.top_k | The maximum number of documents to retrieve. | `10` | `--retriever-top-k` | `ILAB_CHAT_RAG_RETRIEVER_TOP_K` |
| | Vector DB implementation, one of: `milvuslite`, **TBD** | `milvuslite` | `--vectordb-type` | `ILAB_VECTORDB_TYPE` |
| | Vector DB service URI. | `./rag-output.db` | `--vectordb-uri` | `ILAB_VECTORDB_URI` |
| | Vector DB collection name. | `IlabEmbeddings` | `--vectordb-collection-name` | `ILAB_VECTORDB_COLLECTION_NAME` |
| | Vector DB connection username. | | `--vectordb-username` | `ILAB_VECTORDB_USERNAME` |
| | Vector DB connection password. | | `--vectordb-password` | `ILAB_VECTORDB_PASSWORD` |
| | Base directories where models are stored. | `$HOME/.cache/instructlab/models` | `--model-dir` | `ILAB_MODEL_DIR` |
| | Name of the embedding model. | **TBD** | `--embedding-model` | `ILAB_EMBEDDING_MODEL_NAME` |
| | Token to download private models. | | `--embedding-model-token` | `ILAB_EMBEDDING_MODEL_TOKEN` |

Equivalent YAML document for the newly proposed options:
```yaml
chat:
  rag:
    enabled: false
    retriever:
      top_k: 10
```
### 2.9 References
* [Haystack-DocumentSplitter](https://github.com/deepset-ai/haystack/blob/f0c3692cf2a86c69de8738d53af925500e8a5126/haystack/components/preprocessors/document_splitter.py#L55) is temporarily adopted with default settings until a splitter based on the [docling chunkers][chunkers] is integrated
in Haystack.
* [MilvusEmbeddingRetriever](https://github.com/milvus-io/milvus-haystack/blob/77b27de00c2f0278e28b434f4883853a959f5466/src/milvus_haystack/milvus_embedding_retriever.py#L18)
### 2.10 Workflow Visualization
<!-- https://excalidraw.com/#json=PN2h_LM-Wd2WZYBJfZMDs,WQCq5NDbRXUH2qr8maFFNg -->
Embedding ingestion pipeline:
![ingestion-mvp](../images/ingestion-mvp.png)
RAG-based Chat pipeline:
![rag-chat](../images/rag-chat.png)
### 2.11 Proposed Implementation Stack
> **ℹ️ Note:** This stack is still under review. The proposed list represents potential candidates based on the current state of discussions.
The following technologies form the foundation of the proposed solution:
* [MilvusLite](https://milvus.io/docs/milvus_lite.md): The default vector database for efficient storage and retrieval of embeddings.
* [Docling](https://github.com/DS4SD/docling): Document processing tool. For more details, refer to William’s blog, [Docling: The missing document processing companion for generative AI](https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai).
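
Assuming this stack, the indexing side could be wired from standard Haystack components; a sketch using the
`DocumentSplitter` defaults mentioned in the references:
```py
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from milvus_haystack import MilvusDocumentStore

store = MilvusDocumentStore(connection_args={"uri": "./rag-output.db"}, drop_old=True)

indexing = Pipeline()
# Default splitter settings, as noted in the References section.
indexing.add_component("splitter", DocumentSplitter())
indexing.add_component(
    "embedder",
    SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-minilm-l6-v2"),
)
indexing.add_component("writer", DocumentWriter(document_store=store))
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")

indexing.run({"splitter": {"documents": [Document(content="...processed text...")]}})
```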
## 3. Design Considerations
This solution must adopt a pluggable design to facilitate the easy integration of additional components:
* Vector stores: Support all selected implementations.
* Embedding models: Handle embedding models using the appropriate embedder implementation for the chosen framework.
* Indexing services: Future extensions should allow retrieval of embeddings from configurable APIs, thereby extending the concept of a vector store
to include these retrieval services.
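
A minimal sketch of what such a pluggable boundary could look like; the `Protocol` name and method signatures are
purely illustrative:
```py
from typing import Any, Protocol

class VectorStoreAdapter(Protocol):
    """Illustrative seam behind which milvuslite (or any other store) can sit."""

    def write(self, embeddings: list[list[float]], payloads: list[dict[str, Any]]) -> None: ...

    def query(self, embedding: list[float], top_k: int) -> list[dict[str, Any]]: ...
```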
## 4. Future Enhancements
### 4.1 Model Evaluation
**TODO** A separate ADR will be defined.
### 4.2 Advanced RAG retrieval steps
- [Ranking retriever's result][ranking]:
```bash
ilab model chat --rag --ranking --ranking-top-k=5 --ranking-model=cross-encoder/ms-marco-MiniLM-L-12-v2
```
- [Query expansion][expansion]:
```bash
ilab model chat --rag --query-expansion --query-expansion-prompt="$QUERY_EXPANSION_PROMPT" --query-expansion-num-of-queries=5
```
- Using a retrieval strategy:
```bash
ilab model chat --rag --retrieval-strategy query-expansion --retrieval-strategy-options="prompt=$QUERY_EXPANSION_PROMPT;num_of_queries=5"
```
- ...
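
As an example of how the ranking step might be implemented internally, assuming Haystack 2.x's
`TransformersSimilarityRanker` (the Haystack 1.x ranker API linked above differs); `user_query` and `context_docs` are
reused from the retrieval sketch in section 2.6:
```py
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-12-v2", top_k=5
)
ranker.warm_up()
# Re-order the retrieved documents by cross-encoder relevance to the query.
ranked_docs = ranker.run(query=user_query, documents=context_docs)["documents"]
```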

### 4.3 Containerized Indexing Service
Generate a containerized RAG artifact to expose a `/query` endpoint that can serve as an alternative retrieval source:
```bash
ilab data ingest --build-image --image-name=docker.io/user/my_rag_artifacts:1.0
```

Then serve it and use it in a chat session:
```bash
ilab serve --rag-embeddings --image-name=docker.io/user/my_rag_artifacts:1.0 --port 8123
ilab model chat --rag --retriever-type api --retriever-uri http://localhost:8123
```
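
The `/query` contract is not defined yet; purely as a hypothetical illustration, a chat client could call it like this:
```py
import requests

# Hypothetical request/response schema; the real /query contract is TBD.
response = requests.post(
    "http://localhost:8123/query",
    json={"query": "What does the documentation say about upgrades?", "top_k": 10},
    timeout=30,
)
response.raise_for_status()
context_chunks = [hit["text"] for hit in response.json()["results"]]
```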

[ilab-knowledge]: https://github.com/instructlab/taxonomy?tab=readme-ov-file#getting-started-with-knowledge-contributions
[chat_template]: https://github.com/instructlab/instructlab/blob/0a773f05f8f57285930df101575241c649f591ce/src/instructlab/configuration.py#L244
[augment_chat_template]: https://github.com/instructlab/instructlab/blob/48e3f7f1574ae50036d6e342b8d78d8eb9546bd5/src/instructlab/model/backends/llama_cpp.py#L281
[ranking]: https://docs.haystack.deepset.ai/v1.21/reference/ranker-api
[expansion]: https://haystack.deepset.ai/blog/query-expansion
[chunkers]: https://github.com/DS4SD/docling/blob/main/docs/concepts/chunking.md
