This tool load-tests large language models running in different runtimes or behind different Azure APIs.
- Generate a custom testing dataset tailored to your use case.
- Load test with customizable concurrency, duration, and warmup options.
- Support for various Azure AI endpoints, including Azure OpenAI, Azure OpenAI Embeddings, and Azure Model Catalog (serverless and managed compute).
- Detailed throughput and latency statistics in JSON format.
- Modular design, enabling developers to integrate new endpoints with minimal effort.
- UI Support.
- Batch support for embedding endpoints.
We use the `cl100k_base` encoder from the `tiktoken` package to count tokens. To change the token encoder, modify the `num_tokens_from_string` function in `plugins/plugin.py`.
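As a rough sketch, a counting function of this kind typically looks like the following with `tiktoken` (the actual body in `plugins/plugin.py` may differ):

```python
import tiktoken

def num_tokens_from_string(string: str) -> int:
    # Swap "cl100k_base" for another encoding name to change the tokenizer.
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(string))
```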
- Azure Model as a Platform (Managed Compute).
- Azure Model as a Service (Serverless API).
- Azure-OpenAI.
- Azure-OpenAI-Embeddings.
- Cohere-Embeddings.
- Python 3.9 or newer
1- Generate Dataset (Optional):
```bash
python datasets/generate_dataset.py --tok_input_length 250 --tok_output_length 50 --N 100 --output_file datasets/random_text_dataset.jsonl
```
- `--tok_input_length`: The length of the input in tokens (minimum 25).
- `--tok_output_length`: The length of the output in tokens.
- `--N`: The number of samples to generate.
- `--output_file`: The name of the output file (default is `random_text_dataset.jsonl`).
2- Running the Tool:
```
usage: load_test.py [-h] [-c CONFIG] [-log {warn,warning,info,debug}]

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        config YAML file name
  -log {warn,warning,info,debug}, --log_level {warn,warning,info,debug}
                        Provide logging level. Example --log_level debug, default=warning
```
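For example, to run a test against the settings in `config.yaml` with more verbose logging:

```bash
python load_test.py --config config.yaml --log_level info
```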
The tool's behavior can be customized using a YAML configuration file; see `config.yaml` for a full example. The available options are described below, and a minimal sketch combining the general options follows the list.
- format: Specifies the format of the output file. Currently supports "json". Future support for "pickle" may be added.
- dir: Directory where the output file will be saved.
- file: Name of the output file.
- warmup: Boolean flag to enable or disable warmup requests before the actual load test.
- warmup_options:
- requests: Number of warmup requests to send.
- timeout_sec: Timeout for each warmup request in seconds.
- type: Specifies the type of storage. Currently, only "local" is supported. Future support for other storage types is planned.
- file: Path to the dataset file in JSONL format. You can also auto-generate a dataset using `datasets/generate_dataset.py`.
- max_queries: Maximum number of queries to use from the dataset.
- min_input_tokens: Minimum number of input tokens per query.
- max_input_tokens: Maximum number of input tokens per query.
- max_output_tokens: Maximum number of output tokens per query.
- max_sequence_tokens: Maximum number of tokens in a sequence.
- type: Type of load pattern. Currently supports "constant". Future options include "loadgen" and "stair-step".
- concurrency: Number of concurrent requests.
- duration: Duration of the load test in seconds. Future support for different time formats like "100s", "10m", etc., is planned.
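A minimal sketch combining the general options above. The top-level section names used here (`output`, `storage`, `dataset`) are assumptions for illustration; `load_options` appears in the example output further below. Check `config.yaml` in the repo for the authoritative layout.

```yaml
# Hypothetical layout; only the option names are taken from the descriptions above.
output:
  format: "json"        # currently only "json" is supported
  dir: "results"
  file: "results.json"
warmup: true
warmup_options:
  requests: 2
  timeout_sec: 30
storage:
  type: "local"         # currently only "local" is supported
dataset:
  file: "datasets/random_text_dataset.jsonl"
  max_queries: 100
  min_input_tokens: 25
  max_input_tokens: 250
  max_output_tokens: 50
  max_sequence_tokens: 300
load_options:
  type: "constant"      # currently only "constant" is supported
  concurrency: 8
  duration: 20          # seconds
```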
- plugin: "azure_maap_plugin"
- plugin_options:
- url: Endpoint URL for the Azure MaaP service.
- key: API key for authentication.
- deployment: Deployment name.
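A hedged sketch of this plugin's configuration section (all values are placeholders):

```yaml
plugin: "azure_maap_plugin"
plugin_options:
  url: "<managed-compute-endpoint-url>"  # placeholder
  key: "<api-key>"
  deployment: "<deployment-name>"
```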
- plugin: "azure_serverless_plugin"
- plugin_options:
- streaming: Boolean flag to enable or disable streaming.
- url: Endpoint URL for the Azure Serverless service. Don't forget to include `/v1/chat/completions`.
- model_name: Name of the model.
- key: API key for authentication.
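A hedged sketch for the serverless plugin (placeholder values; note the required `/v1/chat/completions` path):

```yaml
plugin: "azure_serverless_plugin"
plugin_options:
  streaming: true
  url: "https://<your-endpoint>/v1/chat/completions"  # placeholder host
  model_name: "<model-name>"
  key: "<api-key>"
```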
- plugin: "azure_openai_plugin"
- plugin_options:
- streaming: Boolean flag to enable or disable streaming.
- url: Endpoint URL for the Azure-OpenAI service.
- key: API key for authentication.
- deployment: Deployment name.
- api_version: API version to use.
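A hedged sketch for the Azure-OpenAI plugin (placeholder values):

```yaml
plugin: "azure_openai_plugin"
plugin_options:
  streaming: true
  url: "https://<resource-name>.openai.azure.com/"  # placeholder
  key: "<api-key>"
  deployment: "<deployment-name>"
  api_version: "<api-version>"
```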
- plugin: "azure_oai_embeddings_plugin"
- plugin_options:
- url: Endpoint URL for the Azure-OpenAI Embeddings service.
- key: API key for authentication.
- deployment: Deployment name.
- api_version: API version to use.
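And a hedged sketch for the embeddings plugin (placeholder values):

```yaml
plugin: "azure_oai_embeddings_plugin"
plugin_options:
  url: "https://<resource-name>.openai.azure.com/"  # placeholder
  key: "<api-key>"
  deployment: "<embedding-deployment-name>"
  api_version: "<api-version>"
```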
The tool logs a results summary to stdout and writes the detailed test results, along with the summary, in JSON format. The JSON output contains:
- An array of results, one element per request sent during the test.
- A detailed summary report of the run.
- All the configuration metadata for the run.
For example:
"results": [
...
],
"config": {
...
"load_options": {
"type": "constant",
"concurrency": 8,
"duration": 20
...
},
"summary": {
"output_tokens_throughput": 159.25729928295627,
"input_tokens_throughput": 1592.5729928295625,
"full_duration": 20.093270540237427,
"total_requests": 16,
"complete_request_per_sec": 0.79, # number of competed requests / full_duration
"total_failures": 0,
"failure_rate": 0.0
#time per ouput_token
"tpot": {
"min": 0.010512285232543946,
"max": 0.018693844079971312,
"median": 0.01216195583343506,
"mean": 0.012808671338217597,
"percentile_80": 0.012455177783966065,
"percentile_90": 0.01592913103103638,
"percentile_95": 0.017840550780296324,
"percentile_99": 0.018523185420036312
},
#time to first token
"ttft": {
"min": 0.4043765068054199,
"max": 0.5446293354034424,
"median": 0.46433258056640625,
"mean": 0.4660029411315918,
"percentile_80": 0.51033935546875,
"percentile_90": 0.5210948467254639,
"percentile_95": 0.5295632600784301,
"percentile_99": 0.54161612033844
},
#input token latency
"itl": {
"min": 0.008117493672586566,
"max": 0.01664590356337964,
"median": 0.009861880810416522,
"mean": 0.010531313198552402,
"percentile_80": 0.010261738599844314,
"percentile_90": 0.013813444118403915,
"percentile_95": 0.015781731761280615,
"percentile_99": 0.016473069202959836
},
#time to ack
"tt_ack": {
"min": 0.404374361038208,
"max": 0.544623851776123,
"median": 0.464330792427063,
"mean": 0.46600091457366943,
"percentile_80": 0.5103373527526855,
"percentile_90": 0.5210925340652466,
"percentile_95": 0.5295597910881042,
"percentile_99": 0.5416110396385193
},
"response_time": {
"min": 2.102457046508789,
"max": 3.7387688159942627,
"median": 2.3843793869018555,
"mean": 2.5091602653265,
"percentile_80": 2.4795608520507812,
"percentile_90": 2.992232322692871,
"percentile_95": 3.541854977607727,
"percentile_99": 3.6993860483169554
},
"output_tokens": {
"min": 200,
"max": 200,
"median": 200.0,
"mean": 200.0,
"percentile_80": 200.0,
"percentile_90": 200.0,
"percentile_95": 200.0,
"percentile_99": 200.0
},
"input_tokens": {
"min": 2000,
"max": 2000,
"median": 2000.0,
"mean": 2000.0,
"percentile_80": 2000.0,
"percentile_90": 2000.0,
"percentile_95": 2000.0,
"percentile_99": 2000.0
},
}
}
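As a sanity check on the example above, the throughput figures are total tokens divided by `full_duration`: 16 requests × 200 output tokens = 3,200 tokens, and 3,200 ÷ 20.09 s ≈ 159.26, matching `output_tokens_throughput` (likewise 16 × 2,000 ÷ 20.09 ≈ 1,592.57 for `input_tokens_throughput`).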
Contributions to this tool are welcome!