There are a number of environment variables that can be used to configure the Azure OpenAI API Simulator. Additionally, some configuration can be changed while the simulator is running using the config endpoint (see below).

When running the Azure OpenAI API Simulator, the following environment variables are available:
Variable | Description |
---|---|
`SIMULATOR_MODE` | The mode the simulator should run in. Current options are `record`, `replay`, and `generate`. |
`SIMULATOR_API_KEY` | The API key used by the simulator to authenticate requests. If not specified, a key is auto-generated (see the logs). It is recommended to set a deterministic key value in `.env`. |
`RECORDING_DIR` | The directory to store the recorded requests and responses (defaults to `.recording`). |
`OPENAI_DEPLOYMENT_CONFIG_PATH` | The path to a JSON file that contains the deployment configuration. See OpenAI Rate-Limiting. |
`ALLOW_UNDEFINED_OPENAI_DEPLOYMENTS` | If set to `True` (default), the simulator will generate OpenAI responses for any deployment. If set to `False`, the simulator will only generate responses for known deployments. |
`AZURE_OPENAI_ENDPOINT` | The endpoint for the Azure OpenAI service, e.g. `https://mysvc.openai.azure.com/`. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_KEY` | The API key for the Azure OpenAI service. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_DEPLOYMENT` | The deployment name for your GPT model. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_EMBEDDING_DEPLOYMENT` | The deployment name for your embedding model. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_IMAGE_DEPLOYMENT` | The deployment name for your image generation model. Used by the simulator when forwarding requests. |
`LOG_LEVEL` | The log level for the simulator. Defaults to `INFO`. |
`LATENCY_OPENAI_*` | The latency to add to the OpenAI service when using generated output. See Latency for more details. |
`RECORDING_AUTOSAVE` | If set to `True` (default), the simulator will save the recording after each request (see Large Recordings). |
`EXTENSION_PATH` | The path to a Python file that contains the extension configuration. This can be a single Python file or a package folder. See Extending the simulator. |
There is also a set of environment variables that the test clients and tests use. These are used to point the test clients at a deployment of the simulator (local, or in Azure). A sketch showing how a test client might consume these variables follows the table.
Variable | Description |
---|---|
`TEST_OPENAI_ENDPOINT` | Used by test client code only. Defines the OpenAI-like endpoint that the test client will call. Most likely set to the location of your simulator deployment. |
`TEST_OPENAI_KEY` | Used by test client code only. Defines the key sent to `TEST_OPENAI_ENDPOINT` when making requests. Most likely set to the value of `SIMULATOR_API_KEY`. |
`TEST_OPENAI_DEPLOYMENT` | Used by test client code only. Defines the GPT model deployment that the test client will request. |
`TEST_OPENAI_EMBEDDING_DEPLOYMENT` | Used by test client code only. Defines the embedding model deployment that the test client will request. |
`TEST_OPENAI_IMAGE_DEPLOYMENT` | Used by test client code only. Defines the image generation model deployment that the test client will request. |
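For illustration, a test client built on the openai Python package (v1+) might consume these variables as follows. This is a sketch, not code from the repository; the `api_version` value is an example and not something the simulator mandates:

```python
import os

from openai import AzureOpenAI

# Point the client at the simulator deployment defined by the TEST_OPENAI_* variables
client = AzureOpenAI(
    azure_endpoint=os.environ["TEST_OPENAI_ENDPOINT"],  # e.g. http://localhost:5000
    api_key=os.environ["TEST_OPENAI_KEY"],              # matches SIMULATOR_API_KEY
    api_version="2024-02-01",                           # example API version
)

response = client.chat.completions.create(
    model=os.environ["TEST_OPENAI_DEPLOYMENT"],  # deployment name, not model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```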
You can set the environment variables in the shell before running the simulator, or on the command line before running commands. However, when running the Azure OpenAI API Simulator locally you may find it more convenient to set them via a `.env` file in the root directory.

The file `sample.env` lives in the root of this repository and provides a starting point for the environment variables you may want to set. Copy this file, rename the copy to `.env`, and update the values as needed.
The `.http` files for testing the endpoints also use the `.env` file to set the environment variables for calling the API.
Note: when running the simulator it will auto-generate an API key. This needs to be passed to the API when making requests. To avoid the API key changing each time the simulator is run, set the `SIMULATOR_API_KEY` environment variable to a fixed value.
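For example, a minimal `.env` file might look like this (the key value is a placeholder; choose your own):

```
SIMULATOR_MODE=generate
SIMULATOR_API_KEY=my-fixed-api-key
LOG_LEVEL=INFO
```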
There are a number of environment variables that specify API endpoints; each of these has a name ending in `_ENDPOINT`. For all such environment variables the format is `scheme://fqdn` or `scheme://fqdn:port`, e.g. `http://localhost:5000` or `https://example.openai.azure.com`. You should not include a trailing forward slash in the value of the environment variable.
When running in `record` mode, the simulator captures the duration of the forwarded response. This is stored in the recording file and used to add latency to requests in `replay` mode.

When running in `generate` mode, the simulator can add latency to the response based on the `LATENCY_OPENAI_*` environment variables.
Variable Prefix | Description |
---|---|
`LATENCY_OPENAI_EMBEDDINGS` | Specify the latency to add to embeddings requests, in milliseconds, using `LATENCY_OPENAI_EMBEDDINGS_MEAN` and `LATENCY_OPENAI_EMBEDDINGS_STD_DEV`. |
`LATENCY_OPENAI_COMPLETIONS` | Specify the latency to add to completions, per completion token, in milliseconds, using `LATENCY_OPEN_AI_COMPLETIONS_MEAN` and `LATENCY_OPEN_AI_COMPLETIONS_STD_DEV`. |
`LATENCY_OPENAI_CHAT_COMPLETIONS` | Specify the latency to add to chat completions, per completion token, in milliseconds, using `LATENCY_OPEN_AI_CHAT_COMPLETIONS_MEAN` and `LATENCY_OPEN_AI_CHAT_COMPLETIONS_STD_DEV`. |
`LATENCY_OPENAI_TRANSLATIONS` | Specify the latency to add to translations, per MB of audio, in milliseconds, using `LATENCY_OPEN_AI_TRANSLATIONS_MEAN` and `LATENCY_OPEN_AI_TRANSLATIONS_STD_DEV`. |
The default values are:
Prefix | Mean | Std Dev |
---|---|---|
`LATENCY_OPENAI_EMBEDDINGS` | 100 | 30 |
`LATENCY_OPENAI_COMPLETIONS` | 15 | 2 |
`LATENCY_OPENAI_CHAT_COMPLETIONS` | 19 | 6 |
`LATENCY_OPENAI_TRANSLATIONS` | 15000 | 0.5 |
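For example, to increase the latency added to generated embeddings responses, you might set the following in your `.env` file (the numbers are illustrative; variable names as listed above):

```
LATENCY_OPENAI_EMBEDDINGS_MEAN=1000
LATENCY_OPENAI_EMBEDDINGS_STD_DEV=100
```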
The simulator contains built-in rate limiting for OpenAI endpoints, but this is still being refined. The current implementation is a combination of token- and request-based rate-limiting.

To control the rate-limiting, set the `OPENAI_DEPLOYMENT_CONFIG_PATH` environment variable to the path of a JSON config file that defines the deployments and their associated models and token limits. An example config file is shown below.
```json
{
  "deployment1": {
    "model": "gpt-3.5-turbo",
    "tokensPerMinute": 60000
  },
  "gpt-35-turbo-2k-token": {
    "model": "gpt-3.5-turbo",
    "tokensPerMinute": 2000
  },
  "gpt-35-turbo-1k-token": {
    "model": "gpt-3.5-turbo",
    "tokensPerMinute": 1000
  }
}
```
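If this content were saved as `deployment_config.json` (a hypothetical filename), you would point the simulator at it like so:

```
OPENAI_DEPLOYMENT_CONFIG_PATH=/path/to/deployment_config.json
```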
The simulator supports a set of basic OpenTelemetry configuration options. These are:
Variable | Description |
---|---|
`OTEL_SERVICE_NAME` | Sets the value of the service name reported to OpenTelemetry. Defaults to `aoai-api-simulator`. |
`OTEL_METRIC_EXPORT_INTERVAL` | The time interval (in milliseconds) between the start of two export attempts. |
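For example (both values illustrative):

```
OTEL_SERVICE_NAME=my-aoai-simulator
OTEL_METRIC_EXPORT_INTERVAL=10000
```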
The simulator exposes a `/++/config` endpoint that returns the current configuration of the simulator and allows the configuration to be updated dynamically. This can be useful when you want to test how your application adapts to changing behaviour of the OpenAI endpoints.

A `GET` request to this endpoint will return a JSON object with the current configuration:
```json
{
  "simulator_mode": "generate",
  "latency": {
    "open_ai_embeddings": { "mean": 100.0, "std_dev": 30.0 },
    "open_ai_completions": { "mean": 15.0, "std_dev": 2.0 },
    "open_ai_chat_completions": { "mean": 19.0, "std_dev": 6.0 }
  },
  "openai_deployments": {
    "deployment1": { "tokens_per_minute": 60000, "model": "gpt-3.5-turbo" },
    "gpt-35-turbo-1k-token": {
      "tokens_per_minute": 1000,
      "model": "gpt-3.5-turbo"
    }
  }
}
```
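For illustration, here is a sketch of fetching the configuration with the Python `requests` package. The base URL is an example value, and the sketch assumes the endpoint is authenticated with the same `api-key` header as the OpenAI endpoints:

```python
import requests

BASE_URL = "http://localhost:5000"  # example value; use your simulator's endpoint
API_KEY = "my-fixed-api-key"        # the SIMULATOR_API_KEY value

# Fetch the current simulator configuration
response = requests.get(f"{BASE_URL}/++/config", headers={"api-key": API_KEY})
response.raise_for_status()
config = response.json()
print(config["simulator_mode"])
```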
A `PATCH` request can be used to update the configuration. The body of the request should be a JSON object with the configuration values to update. For example, the following request body will update the mean latency for OpenAI embeddings to 1 second (1000 ms):
{ "latency": { "open_ai_embeddings": { "mean": 1000 } } }