Releases · ray-project/llmperf
LLMPerf-v2.0
This release of LLMPerf brings a set of significant updates that make benchmarking LLM inference more in-depth and customizable. These updates include:
- Expanded metrics with quantile distributions (P25-P99): Comprehensive per-request data for deeper insight into latency behavior (a small example of the quantile computation follows this list).
- Customizable benchmarking parameters: Tailor settings such as input/output token lengths and request concurrency to specific use cases.
- Introduction of load tests and correctness tests: Assess performance under sustained request load and verify output accuracy.
- Broad compatibility: Supports a range of providers including Anyscale Endpoints, OpenAI, Anthropic, together.ai, Fireworks.ai, Perplexity, Huggingface, Lepton AI, and the various APIs supported by the LiteLLM project.
- Easy addition of new LLMs via the LLMClient API (see the client sketch after this list).
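The quantile summaries can also be reproduced from raw per-request measurements. The sketch below shows only the computation itself; the latency values are placeholders standing in for whatever a benchmark run records, and any file names or result keys in your output may differ.

```python
import numpy as np

# Per-request end-to-end latencies in seconds (placeholder values; in practice
# these would come from the per-request results written by a benchmark run).
latencies_s = [0.82, 0.91, 1.05, 1.10, 1.23, 1.31, 1.48, 1.72, 2.05, 2.60]

# The release reports quantiles spanning P25 through P99.
quantiles = [25, 50, 75, 90, 95, 99]
summary = {f"p{q}": float(np.percentile(latencies_s, q)) for q in quantiles}
print(summary)
```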
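New providers are added by implementing a small client class that issues a single request and reports per-request metrics. The sketch below mirrors that shape for a generic OpenAI-compatible endpoint; the class name, method name, and metric keys here are illustrative assumptions, so consult the LLMClient definition in the repository for the exact interface.

```python
import os
import time
from typing import Any, Dict, Tuple

import requests


class MyHTTPClient:
    """Illustrative client for an OpenAI-compatible /chat/completions endpoint.

    The real interface is defined by llmperf's LLMClient; the method name and
    return shape shown here are assumptions for illustration only.
    """

    def llm_request(
        self, request_config: Dict[str, Any]
    ) -> Tuple[Dict[str, Any], str, Dict[str, Any]]:
        base_url = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
        headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
        body = {
            "model": request_config["model"],
            "messages": [{"role": "user", "content": request_config["prompt"]}],
            "max_tokens": request_config.get("max_tokens", 256),
        }

        # Time the full request; a streaming client would also record time to
        # first token and inter-token latencies, which feed the quantile metrics.
        start = time.monotonic()
        resp = requests.post(
            f"{base_url}/chat/completions", json=body, headers=headers, timeout=600
        )
        resp.raise_for_status()
        end = time.monotonic()

        text = resp.json()["choices"][0]["message"]["content"]
        metrics = {
            "end_to_end_latency_s": end - start,  # illustrative metric key
            "num_output_tokens": resp.json().get("usage", {}).get("completion_tokens"),
        }
        return metrics, text, request_config
```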
The old LLMPerf code base can be found in the llmperf-legacy repo.