LLM - Azure OpenAI-based Retrieval Augmented Generation (RAG)

This custom step uses a Retrieval Augmented Generation (RAG) approach to provide the right context to an Azure OpenAI Large Language Model (LLM) for answering a question.

LLMs require relevant context to provide useful answers, especially for questions based on a local corpus of knowledge.

A RAG approach, explained in simple terms, retrieves relevant data from a knowledge base and provides it to an LLM as context. RAG-based approaches are expected to reduce LLM hallucinations (i.e., cases where an LLM provides irrelevant or false answers). This custom step implements RAG with a Chroma DB vector store and passes retrieved documents to an Azure OpenAI service.
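The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-count "embedding" and in-memory list stand in for the Azure OpenAI embedding deployment and the Chroma DB collection, and the function names, sample documents and prompt wording are all hypothetical rather than taken from the step's code.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question (a stand-in for a vector store query)."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Place the retrieved documents ahead of the question."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base and question, for illustration only.
knowledge_base = [
    "The warranty period for the X100 pump is 24 months.",
    "Office hours are 9am to 5pm on weekdays.",
    "The X100 pump requires servicing every 6 months.",
]
question = "How long is the warranty for the X100 pump?"

top_docs = retrieve(question, knowledge_base, k=2)
prompt = build_prompt(question, top_docs)
print(prompt)  # this text is what would be sent to the LLM
```

The unrelated document never reaches the prompt, which is the point of the retrieval stage: the LLM sees only the passages most likely to contain the answer.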

IMPORTANT: Be aware that this custom step uses an Azure OpenAI service that results in data being sent over to the service. Ensure you use this only in accordance with your organization's policies on calling external LLMs.

A general idea

This animated gif provides a basic idea:

Assumptions

Current assumptions for this initial version (future versions may improve upon these):

Users choose either an existing Chroma DB vector database collection, or load PDF, SAS dataset, pandas DataFrame or CSV files to an existing or new Chroma DB collection.
Users may load all PDFs in a directory on the SAS Server (filesystem), or select a PDF/sas7bdat/DataFrame/CSV of their choice.
The code assumes use of a Chroma DB vector store. Users may choose to replace this with other supported vector stores.
The code uses the langchain LLM framework.
PDFs (containing text), CSV files, SAS datasets and pandas DataFrames are currently the only loadable formats allowed. Users are, however, free to ingest other document types into a Chroma DB collection beforehand, using the "Vector Databases - Hydrate Chroma DB collection" SAS Studio Custom Step (refer to its documentation).
Users have already configured Azure OpenAI to deploy both an embedding model and an LLM service, or know the deployment names.

Requirements

A SAS Viya 4 environment version 2024.01 or later.
Python: Python version 3.10 is recommended to avoid package support or dependency issues.
Python packages to be installed:
A valid Azure OpenAI service with embedding and large language models deployed. Refer here for instructions.

Parameters

Input Parameters

Source file location (optional, default is Context already loaded): in case you wish to present new source files to use as context, choose a folder, a file, a SAS dataset, a pandas DataFrame or a CSV file. Otherwise, provide the name of an existing vector store collection in Configuration. Note that if you choose a SAS dataset, you must open an input port and attach a table to the custom step.
Source column (required if SAS dataset, DataFrame or CSV selected): if a SAS dataset, pandas DataFrame or CSV file is selected, users must specify a column within the data source as the main "document" source. The other fields will be considered metadata.
System prompt (text area, default provided, required): a default system prompt which instructs the LLM on how to handle the question is provided. Note that it makes use of the template variables {context} and {question}, referring to the context and question respectively. Edit this system prompt if you'd like to change the style of the response.
Question (text area, required): provide your question to the LLM. Note that this will be combined with the system prompt to create the prompt that is passed to the LLM.
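As a rough illustration of the Source column and System prompt parameters, the sketch below splits tabular rows into a document column plus metadata, then fills the {context} and {question} template variables. The helper names, sample CSV and prompt text are hypothetical, and plain `str.format` is used here as an assumption; the step itself may perform the substitution through langchain.

```python
import csv
import io


def split_rows(rows, source_column):
    """Split each row into a document (the source column) and metadata (all other fields)."""
    documents, metadatas = [], []
    for row in rows:
        documents.append(row[source_column])
        metadatas.append({k: v for k, v in row.items() if k != source_column})
    return documents, metadatas


def fill_prompt(system_prompt, context_docs, question):
    """Substitute the {context} and {question} template variables."""
    return system_prompt.format(context="\n\n".join(context_docs), question=question)


# Hypothetical CSV content; "review" plays the role of the Source column.
CSV_TEXT = "id,review,rating\n1,Battery lasts two days,5\n2,Screen scratches easily,2\n"
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
documents, metadatas = split_rows(rows, "review")

# Hypothetical system prompt, not the default shipped with the step.
SYSTEM_PROMPT = "Use the context to answer.\nContext:\n{context}\nQuestion: {question}"
prompt = fill_prompt(SYSTEM_PROMPT, documents, "How long does the battery last?")

print(metadatas)
print(prompt)
```

Note that a system prompt containing literal braces other than the two template variables would need those braces escaped (doubled) for `str.format` to work.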

Configuration

Embedding model (text field, required): provide the name of your Azure OpenAI deployment of an OpenAI embedding model. For convenience, it's suggested to use the same name as the model you wish to use. For example, if your OpenAI embedding model happens to be text-embedding-3-small, use the same name for your deployment.
Vector store persistent path (text field, defaults to /tmp if blank): provide a path to a Chroma DB database on the filesystem.
Chroma DB collection name (text field): provide the name of the Chroma DB collection you wish to use. If the collection does not exist, a new one will be created. Ensure you have write access to the persistent area.
Text generation model (text field, required): provide the name of an Azure OpenAI text generation deployment. For convenience, you may choose to use the same name as the OpenAI LLM. For example, name the deployment gpt-35-turbo if it serves the gpt-35-turbo model.
Azure OpenAI service details (file selector for key and text fields, required): provide a path to your Azure OpenAI access key. Ensure this key is saved within a text file in a secure location on the filesystem. Users are responsible for providing their keys to use this service. In addition, also refer to your Azure OpenAI service to obtain the service endpoint and region. The Azure OpenAI version can also be changed if required.
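The configuration values above might be gathered along the following lines. This is a minimal sketch, not the step's actual code: the function names, dictionary keys and endpoint URL are hypothetical, and the key file is created in a temporary directory purely so the example runs on its own.

```python
import tempfile
from pathlib import Path


def read_api_key(key_path):
    """Read the access key from a text file, dropping surrounding whitespace and newlines."""
    key = Path(key_path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"No key found in {key_path}")
    return key


def build_settings(key_path, endpoint, api_version, embedding_deployment,
                   generation_deployment, persist_path="", collection_name=""):
    """Collect the configuration values; a blank persistent path falls back to /tmp."""
    return {
        "api_key": read_api_key(key_path),
        "azure_endpoint": endpoint,
        "api_version": api_version,
        "embedding_deployment": embedding_deployment,
        "generation_deployment": generation_deployment,
        "persist_path": persist_path.strip() or "/tmp",
        "collection_name": collection_name,
    }


# Demo with a throwaway key file; in practice the key lives in a secured location.
with tempfile.TemporaryDirectory() as tmp_dir:
    key_file = Path(tmp_dir) / "azure_openai_key.txt"
    key_file.write_text("not-a-real-key\n", encoding="utf-8")
    settings = build_settings(
        key_path=key_file,
        endpoint="https://example-resource.openai.azure.com/",  # hypothetical endpoint
        api_version="2024-10-21",
        embedding_deployment="text-embedding-3-small",
        generation_deployment="gpt-35-turbo",
        collection_name="my_collection",  # hypothetical collection name
    )

# Avoid printing the key itself.
print({k: v for k, v in settings.items() if k != "api_key"})
```

With the langchain framework, values like these would typically be handed to classes such as `AzureChatOpenAI` and `AzureOpenAIEmbeddings` from `langchain_openai`; check the exact keyword names against your installed version.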

Output Specifications

Results (the answer from the LLM) are printed by default to the output window.

Temperature (numeric stepper, default 0, max 1): temperature affects how an LLM selects the next word while generating responses. As a rule of thumb, a temperature closer to 0 means the model picks the predicted next word with the highest probability and provides stable responses, whereas a temperature of 1 increases the randomness with which the model picks the next word, which may lead to more creative responses.
Context size (numeric stepper, default 10): select how many similar results from the vector store should be retrieved and provided as context to the LLM. Note that a higher number results in more tokens provided as part of the prompt.
Output table (output port, optional): attach either a CAS table or sas7bdat to the output port of this node to hold results. These results contain the LLM's answer, the original question and supporting retrieved results.
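To make the temperature rule of thumb concrete, here is a small sketch of the standard softmax-with-temperature formulation. The candidate words and their scores are hypothetical, and how the Azure OpenAI service applies temperature internally is not covered here; the example only shows why lower values concentrate probability on the top-scoring word.

```python
import math


def next_word_probabilities(scores, temperature):
    """Softmax over raw scores, scaled by temperature; temperature 0 picks the top score."""
    if temperature <= 0:
        best = max(scores, key=scores.get)
        return {word: (1.0 if word == best else 0.0) for word in scores}
    scaled = {word: s / temperature for word, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(v - peak) for word, v in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical raw scores for three candidate next words.
scores = {"blue": 2.0, "grey": 1.0, "green": 0.5}

results = {}
for t in (0, 0.5, 1.0):
    results[t] = next_word_probabilities(scores, t)
    print(t, {word: round(p, 3) for word, p in results[t].items()})
```

As the temperature rises from 0 to 1, the probability of the top word falls and the alternatives gain weight, which is where the extra variety in responses comes from.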

Run-time Control

Note: Run-time control is optional. You may choose whether to execute the main code of this step or not, based on upstream conditions set by earlier SAS programs. This includes nodes run prior to this custom step earlier in a SAS Studio Flow, or a previous program in the same session.

Refer to this blog (https://communities.sas.com/t5/SAS-Communities-Library/Switch-on-switch-off-run-time-control-of-SAS-Studio-Custom-Steps/ta-p/885526) for more details on the concept.

The following macro variable,

_aor_run_trigger

will initialize with a value of 1 by default, indicating an "enabled" status and allowing the custom step to run.

If you wish to control execution of this custom step, include code in an upstream SAS program to set this variable to 0. This "disables" execution of the custom step.

To "disable" this step, run the following code upstream:

%global _aor_run_trigger;
%let _aor_run_trigger =0;

To "enable" this step again, run the following (it's assumed that this has already been set as a global variable):

%let _aor_run_trigger =1;

IMPORTANT: Be aware that disabling this step means that none of its main execution code will run, and any downstream code which was dependent on this code may fail. Change this setting only if it aligns with the objective of your SAS Studio program.
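The gating behaviour can be pictured with the short Python sketch below. It is an illustration only: the function names are hypothetical, the step itself reads the SAS macro variable rather than a Python argument, and treating any value other than 1 as "disabled" is an assumption.

```python
def resolve_trigger(current_value):
    """Mimic the default: an unset or blank trigger initialises to 1 (enabled)."""
    if current_value is None or str(current_value).strip() == "":
        return 1
    return int(str(current_value).strip())


def run_step(trigger, main_code):
    """Run main_code only when the trigger resolves to 1; otherwise skip and return None."""
    if resolve_trigger(trigger) == 1:
        return main_code()
    return None


def main_code():
    """Placeholder for the step's main execution code."""
    return "answer from LLM"


print(run_step(None, main_code))  # trigger never set upstream: the step runs
print(run_step("0", main_code))   # trigger set to 0 upstream: the step is skipped
print(run_step("1", main_code))   # trigger set back to 1: the step runs again
```

Because a skipped step produces no output, anything downstream that expects the answer has to handle the missing result, which is the failure mode the note above warns about.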

Documentation

Azure OpenAI service
Documentation for the chromadb Python package
Documentation for the "Vector Databases - Hydrate Chroma DB collection" SAS Studio Custom Step
An important note regarding sqlite
SAS Communities article on configuring Viya for Python integration
The SAS Viya Platform Deployment Guide (refer to SAS Configurator for Open Source within)
Options for persistent clients and client connections in Chroma
Langchain Python documentation
OpenAI API versions change periodically. Keep track of them here

SAS Program

Refer here for the SAS program used by the step. With minor modifications, you may find this useful when you wish to execute this step outside the SAS Studio Custom Step interface, for example through the SAS Extension for Visual Studio Code.

Installation & Usage

Refer to the steps listed here.

Created/contact:

Samiul Haque (samiul.haque@sas.com)
Sundaresh Sankaran (sundaresh.sankaran@sas.com)

Change Log

Version 1.3.3 (14NOV2024)
- Fix Python code creation by Renato
- Update Azure OpenAI GA version to 2024-10-21
Version 1.3.1 (10JUL2024)
- Added option for load from SAS dataset
- README / About tab minor edits
Version 1.2.1 (25JUN2024)
- Added option for load from pandas DataFrame
- Patched Azure OpenAI version - surfaced to UI
Version 1.0.1 (10JUN2024)
- Bug fix - missing import for folder upload
Version 1.0 (15MAY2024)
- Initial version
