There are a number of environment variables that can be used to configure the Azure OpenAI API Simulator. Additionally, some configuration can be changed while the simulator is running using the config endpoint (see below).

When running the Azure OpenAI API Simulator, the following environment variables are available:
Variable | Description |
---|---|
`SIMULATOR_MODE` | The mode the simulator should run in. Current options are `record`, `replay`, and `generate`. |
`SIMULATOR_API_KEY` | The API key used by the simulator to authenticate requests. If not specified, a key is auto-generated (see the logs). It is recommended to set a deterministic key value in `.env`. |
`RECORDING_DIR` | The directory to store the recorded requests and responses (defaults to `.recording`). |
`OPENAI_DEPLOYMENT_CONFIG_PATH` | The path to a JSON file that contains the deployment configuration. See OpenAI Rate-Limiting. |
`ALLOW_UNDEFINED_OPENAI_DEPLOYMENTS` | If set to `True` (default), the simulator will generate OpenAI responses for any deployment. If set to `False`, the simulator will only generate responses for known deployments. |
`AZURE_OPENAI_ENDPOINT` | The endpoint for the Azure OpenAI service, e.g. `https://mysvc.openai.azure.com/`. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_KEY` | The API key for the Azure OpenAI service. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_DEPLOYMENT` | The deployment name for your GPT model. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_EMBEDDING_DEPLOYMENT` | The deployment name for your embedding model. Used by the simulator when forwarding requests. |
`AZURE_OPENAI_IMAGE_DEPLOYMENT` | The deployment name for your image generation model. Used by the simulator when forwarding requests. |
`LOG_LEVEL` | The log level for the simulator. Defaults to `INFO`. |
`LATENCY_OPENAI_*` | The latency to add to the OpenAI service when using generated output. See Latency for more details. |
`RECORDING_AUTOSAVE` | If set to `True` (default), the simulator will save the recording after each request (see Large Recordings). |
`EXTENSION_PATH` | The path to a Python file that contains the extension configuration. This can be a single Python file or a package folder. See Extending the simulator. |
There is also a set of environment variables that the test clients and tests use. These are used to point the test clients at a deployment of the simulator (local, or in Azure). A sketch showing how a test client might consume these variables follows the table.
Variable | Description |
---|---|
`TEST_OPENAI_ENDPOINT` | Used by test client code only. Defines the OpenAI-like endpoint that the test client will call. Most likely set to the location of your simulator deployment. |
`TEST_OPENAI_KEY` | Used by test client code only. Defines the key sent to `TEST_OPENAI_ENDPOINT` when making requests. Most likely set to the value of `SIMULATOR_API_KEY`. |
`TEST_OPENAI_DEPLOYMENT` | Used by test client code only. Defines the GPT model deployment that the test client will request. |
`TEST_OPENAI_EMBEDDING_DEPLOYMENT` | Used by test client code only. Defines the embedding model deployment that the test client will request. |
`TEST_OPENAI_IMAGE_DEPLOYMENT` | Used by test client code only. Defines the image generation model deployment that the test client will request. |
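For illustration, a test client built on the openai Python package (v1+) might consume these variables as follows. This is a sketch, not code from the repository; the `api_version` value is an example and not something the simulator mandates:

```python
import os

from openai import AzureOpenAI

# Point the client at the simulator deployment defined by the TEST_OPENAI_* variables
client = AzureOpenAI(
    azure_endpoint=os.environ["TEST_OPENAI_ENDPOINT"],  # e.g. http://localhost:5000
    api_key=os.environ["TEST_OPENAI_KEY"],              # matches SIMULATOR_API_KEY
    api_version="2024-02-01",                           # example API version
)

response = client.chat.completions.create(
    model=os.environ["TEST_OPENAI_DEPLOYMENT"],  # deployment name, not model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```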
You can set the environment variables in the shell before running the simulator, or on the command line before running commands. However, when running the Azure OpenAI API Simulator locally you may find it more convenient to set them via a `.env` file in the root directory.

The file `sample.env` lives in the root of this repository and provides a starting point for the environment variables you may want to set. Copy this file, rename the copy to `.env`, and update the values as needed.
The `.http` files for testing the endpoints also use the `.env` file to set the environment variables for calling the API.
Note: when running the simulator it will auto-generate an API key. This needs to be passed to the API when making requests. To avoid the API key changing each time the simulator is run, set the `SIMULATOR_API_KEY` environment variable to a fixed value.
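For example, a minimal `.env` file might look like this (the key value is a placeholder; choose your own):

```
SIMULATOR_MODE=generate
SIMULATOR_API_KEY=my-fixed-api-key
LOG_LEVEL=INFO
```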
There are a number of environment variables that specify API endpoints; each of these has a name ending in `_ENDPOINT`. For all such environment variables the format is `scheme://fqdn` or `scheme://fqdn:port`, e.g. `http://localhost:5000` or `https://example.openai.azure.com`. You should not include a trailing forward slash in the value of the environment variable.
When running in `record` mode, the simulator captures the duration of the forwarded response. This is stored in the recording file and used to add latency to requests in `replay` mode.

When running in `generate` mode, the simulator can add latency to the response based on the `LATENCY_OPENAI_*` environment variables.
Variable Prefix | Description |
---|---|
`LATENCY_OPENAI_EMBEDDINGS` | Specify the latency to add to embeddings requests, in milliseconds, using `LATENCY_OPENAI_EMBEDDINGS_MEAN` and `LATENCY_OPENAI_EMBEDDINGS_STD_DEV`. |
`LATENCY_OPENAI_COMPLETIONS` | Specify the latency to add to completions, per completion token, in milliseconds, using `LATENCY_OPEN_AI_COMPLETIONS_MEAN` and `LATENCY_OPEN_AI_COMPLETIONS_STD_DEV`. |
`LATENCY_OPENAI_CHAT_COMPLETIONS` | Specify the latency to add to chat completions, per completion token, in milliseconds, using `LATENCY_OPEN_AI_CHAT_COMPLETIONS_MEAN` and `LATENCY_OPEN_AI_CHAT_COMPLETIONS_STD_DEV`. |
`LATENCY_OPENAI_TRANSLATIONS` | Specify the latency to add to translations, per MB of audio, in milliseconds, using `LATENCY_OPEN_AI_TRANSLATIONS_MEAN` and `LATENCY_OPEN_AI_TRANSLATIONS_STD_DEV`. |
The default values are:
Prefix | Mean | Std Dev |
---|---|---|
`LATENCY_OPENAI_EMBEDDINGS` | 100 | 30 |
`LATENCY_OPENAI_COMPLETIONS` | 15 | 2 |
`LATENCY_OPENAI_CHAT_COMPLETIONS` | 19 | 6 |
`LATENCY_OPENAI_TRANSLATIONS` | 15000 | 0.5 |
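For example, to increase the latency added to generated embeddings responses, you might set the following in your `.env` file (the numbers are illustrative; variable names as listed above):

```
LATENCY_OPENAI_EMBEDDINGS_MEAN=1000
LATENCY_OPENAI_EMBEDDINGS_STD_DEV=100
```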
The simulator contains built-in rate limiting for OpenAI endpoints, but this is still being refined. The current implementation is a combination of token- and request-based rate-limiting.

To control the rate-limiting, set the `OPENAI_DEPLOYMENT_CONFIG_PATH` environment variable to the path of a JSON config file that defines the deployments and their associated models and token limits. An example config file is shown below.
```json
{
  "deployment1": {
    "model": "gpt-3.5-turbo",
    "tokensPerMinute": 60000
  },
  "gpt-35-turbo-2k-token": {
    "model": "gpt-3.5-turbo",
    "tokensPerMinute": 2000
  },
  "gpt-35-turbo-1k-token": {
    "model": "gpt-3.5-turbo",
    "tokensPerMinute": 1000
  }
}
```
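If this content were saved as `deployment_config.json` (a hypothetical filename), you would point the simulator at it like so:

```
OPENAI_DEPLOYMENT_CONFIG_PATH=/path/to/deployment_config.json
```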
The simulator supports a set of basic OpenTelemetry configuration options. These are:
Variable | Description |
---|---|
`OTEL_SERVICE_NAME` | Sets the value of the service name reported to OpenTelemetry. Defaults to `aoai-api-simulator`. |
`OTEL_METRIC_EXPORT_INTERVAL` | The time interval (in milliseconds) between the start of two export attempts. |
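For example (both values illustrative):

```
OTEL_SERVICE_NAME=my-aoai-simulator
OTEL_METRIC_EXPORT_INTERVAL=10000
```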
The simulator exposes a `/++/config` endpoint that returns the current configuration of the simulator and allows the configuration to be updated dynamically. This can be useful when you want to test how your application adapts to changing behaviour of the OpenAI endpoints.

A `GET` request to this endpoint will return a JSON object with the current configuration:
```json
{
  "simulator_mode": "generate",
  "latency": {
    "open_ai_embeddings": { "mean": 100.0, "std_dev": 30.0 },
    "open_ai_completions": { "mean": 15.0, "std_dev": 2.0 },
    "open_ai_chat_completions": { "mean": 19.0, "std_dev": 6.0 }
  },
  "openai_deployments": {
    "deployment1": { "tokens_per_minute": 60000, "model": "gpt-3.5-turbo" },
    "gpt-35-turbo-1k-token": {
      "tokens_per_minute": 1000,
      "model": "gpt-3.5-turbo"
    }
  }
}
```
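For illustration, here is a sketch of fetching the configuration with the Python `requests` package. The base URL is an example value, and the sketch assumes the endpoint is authenticated with the same `api-key` header as the OpenAI endpoints:

```python
import requests

BASE_URL = "http://localhost:5000"  # example value; use your simulator's endpoint
API_KEY = "my-fixed-api-key"        # the SIMULATOR_API_KEY value

# Fetch the current simulator configuration
response = requests.get(f"{BASE_URL}/++/config", headers={"api-key": API_KEY})
response.raise_for_status()
config = response.json()
print(config["simulator_mode"])
```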
A `PATCH` request can be used to update the configuration. The body of the request should be a JSON object with the configuration values to update. For example, the following request body will update the mean latency for OpenAI embeddings to 1 second (1000 ms):
{ "latency": { "open_ai_embeddings": { "mean": 1000 } } }