Releases: ray-project/ray-llm
v0.5.0
What's Changed
- Add TensorRT-LLM backend (v0.6.1).
- Add embedding backend.
- Add Mixtral serve config.
- Upgrade vLLM support to v0.2.5.
- Upgrade Ray to v2.9.1.
Thanks to the following contributors:
@avnishn
@csivanich
@sihanwang41
@Yard1
@tterrysun
v0.4.0
The following changes are introduced:
- Renamed aviary to rayllm.
- Added support for reading models from GCS in addition to AWS S3.
- Increased testing for prompting.
- Added new model configs for Falcon 7B and 40B.
- Made the frontend compatible with Ray Serve 2.7.
Thanks to the following contributors:
@avnishn
@csivanich
@shrekris-anyscale
@sihanwang41
@richardliaw
@Yard1
v0.3.1
What's Changed
- Add back LightGPT by @Yard1 in #64
- Add Serve config for LightGPT by @shrekris-anyscale in #65
- Fix serve config parsing by @gvspraveen in #66
- [docs] Update changes to RayLLM by @richardliaw in #63
- [docs] Update the application's config to be compatible with v0.3.0 by @YQ-Wang in #67
New Contributors
- @gvspraveen made their first contribution in #66
- @YQ-Wang made their first contribution in #67
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Please note that API stability is not expected until the 1.0 release. This update introduces breaking changes.
This release introduces a new vLLM backend and removes the dependency on TGI. TGI is no longer Apache 2.0 licensed, and its new license is too restrictive for most organizations to run in production; vLLM, by contrast, is Apache 2.0 licensed and is a better foundation to build on. There are some breaking changes to model configuration YAMLs related to the new vLLM backend.
Refer to the updated `ray-llm/models/README.md` file for details on the updated configuration file format.
What's changed?
- Documentation
  - Updated README and documentation
- API & SDK
  - Updated the format of model configuration YAMLs.
- Backend
  - Completely replaced the text-generation-inference based backend with a vLLM-based backend. This means RayLLM now supports all models that vLLM supports.
  - Improved observability and metrics.
  - Improved testing.
To use RayLLM, ensure you are using the official Docker image `anyscale/aviary:latest`.
v0.2.0
What's changed?
- Documentation
  - Updated README and documentation
- API & SDK
  - Full OpenAI API compatibility (Aviary can now be queried with the `openai` Python package; see the example after this list)
    - `/v1/completions`
      - Parameters not yet supported (will be ignored): `suffix`, `n`, `logprobs`, `echo`, `best_of`, `logit_bias`, `user`
      - Additional parameters not present in OpenAI API: `top_k`, `typical_p`, `watermark`, `seed`
    - `/v1/chat/completions`
      - Parameters not yet supported (will be ignored): `n`, `logprobs`, `echo`, `logit_bias`, `user`
      - Additional parameters not present in OpenAI API: `top_k`, `typical_p`, `watermark`, `seed`
    - `/v1/models`
    - `/v1/models/<MODEL>`
  - Added `frequency_penalty` and `presence_penalty` parameters
  - `aviary run` is now blocking by default and will clarify that rerunning `aviary run` will remove existing models
  - Streamlined model configuration YAMLs
  - Added model configuration YAMLs for llama-2
  - Frontend Gradio app will now be started on the `/frontend` route to avoid conflicts with the backend
  - `openai` package is now a dependency for Aviary
- Backend
  - Refactor of multiple internal APIs
    - Renamed `Predictor` to `Engine`. `Engine` combines the functionality of initializers, predictors and pipelines.
    - Removed `Predictor` and `Pipeline`
    - Removed shallow classes and simplified abstractions
    - Removed dead code
    - Broke up large files & improved file structure
  - Removal of static batching
  - Added OpenAI-style `frequency_penalty` and `presence_penalty` parameters
  - Fixed generated special tokens not being returned correctly
  - Standardization of modelling code on an Apache 2.0 fork of text-generation-inference
  - Improved performance and stability
  - Added automatic warmup for supported models, ensuring that memory is used efficiently
  - Made scheduler and scheduler policy less prone to errors
  - Made sure that the `HUGGING_FACE_HUB_TOKEN` env var is propagated throughout all Aviary Backend processes to allow access to gated models such as llama-2
  - Added unit testing for core Aviary components
  - Added validations for user supplied parameters
  - Improved error handling and reporting
  - Error responses will now have correct status codes
  - Added basic observability for tokens & requests through Ray Metrics (piped through to Prometheus/Grafana)
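For illustration, here is a minimal sketch of querying an Aviary deployment through the OpenAI-compatible API with the `openai` package (pre-1.0 interface). The base URL and model ID below are placeholders, not values shipped with Aviary; substitute your own deployment's address and a model listed by `/v1/models`:

```python
# Minimal sketch: query an Aviary deployment with the openai Python package
# (pre-1.0 interface). The base URL and model ID are placeholders
# (assumptions), not values shipped with Aviary.
import openai

openai.api_base = "http://localhost:8000/v1"  # assumed deployment address
openai.api_key = "not-used"  # placeholder; the openai client requires a key to be set

response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model ID
    messages=[{"role": "user", "content": "What is Ray Serve?"}],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```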
This update introduces breaking changes to model configuration YAMLs and the Aviary SDK. Refer to the migration guide below for more details.
To use the Aviary backend, ensure you are using the official Docker image `anyscale/aviary:latest`. Using the backend without Docker is not a supported use case. The `anyscale/aviary:latest-tgi` image has been superseded by `anyscale/aviary:latest`.
Migration Guide For Model YAMLs
The most recent version of Aviary introduces breaking changes to the model YAMLs. This guide will help you migrate your existing model YAMLs to the new format.
Changes
- Move any fields under `model_config.initialization` to be under `model_config`, and then remove `model_config.initialization`. Then remove the following sections/fields and everything under them:
  - `model_config.initializer`
  - `model_config.pipeline`
  - `model_config.batching`
- Rename `model_config` to `engine_config`. In v0.2, we introduce `Engine`, the Aviary abstraction for interacting with a model. In short, `Engine` combines the functionality of initializers, pipelines, and predictors. Pipeline and initializer parameters are no longer configurable. In v0.2 we remove the option to specify static batching and instead do continuous batching by default for performance improvement.
- Add the `Scheduler` and `Policy` configs. The scheduler is a component of the engine that determines which requests to run inference on. The policy is a component of the scheduler that determines the scheduling strategy. These components previously existed in Aviary, however they weren't explicitly configurable. Previously, the following parameters were specified under `model_config.generation`:
  - `max_batch_total_tokens`
  - `max_total_tokens`
  - `max_waiting_tokens`
  - `max_input_length`
  - `max_batch_prefill_tokens`

  Rename `max_waiting_tokens` to `max_iterations_curr_batch` and place these parameters under `engine_config.scheduler.policy`, for example:

  ```yaml
  engine_config:
    scheduler:
      policy:
        max_iterations_curr_batch: 100
        max_batch_total_tokens: 100000
        max_total_tokens: 100000
        max_input_length: 100
        max_batch_prefill_tokens: 100000
  ```
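As a worked illustration of the steps above, the sketch below applies them mechanically to an old-format YAML. This is not an official Aviary tool: it assumes `pyyaml` is available, and fields outside the ones named in this guide may still need manual review.

```python
# migrate_model_yaml.py -- illustrative sketch of the migration steps above.
# Not an official Aviary tool; assumes the old YAML matches the layout
# described in this guide.
import sys

import yaml

# Parameters that move from model_config.generation to
# engine_config.scheduler.policy.
POLICY_KEYS = {
    "max_batch_total_tokens",
    "max_total_tokens",
    "max_waiting_tokens",
    "max_input_length",
    "max_batch_prefill_tokens",
}


def migrate(config: dict) -> dict:
    model = config.pop("model_config")

    # Step 1: hoist fields from model_config.initialization into model_config,
    # then drop the removed sections.
    model.update(model.pop("initialization", {}))
    for removed in ("initializer", "pipeline", "batching"):
        model.pop(removed, None)

    # Step 3: move scheduler parameters out of model_config.generation and
    # rename max_waiting_tokens to max_iterations_curr_batch.
    generation = model.get("generation", {})
    policy = {k: generation.pop(k) for k in list(generation) if k in POLICY_KEYS}
    if "max_waiting_tokens" in policy:
        policy["max_iterations_curr_batch"] = policy.pop("max_waiting_tokens")
    model["scheduler"] = {"policy": policy}

    # Step 2: rename model_config to engine_config.
    config["engine_config"] = model
    return config


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        migrated = migrate(yaml.safe_load(f))
    yaml.safe_dump(migrated, sys.stdout, sort_keys=False)
```

Usage: `python migrate_model_yaml.py old_model.yaml > new_model.yaml`.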
v0.1.2
v0.1.1
What's Changed
- Performance, reliability and consistency improvements and fixes for continuous batching
- Progress on OpenAI API
- Execution hooks
- Fixed missing Ray Dashboard dependencies in docker images
Note: This update requires changes to model config YAMLs
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- Ray Serve-native continuous batching support through Hugging Face text-generation-inference models
- Fixed exceptions when frontend is deployed with non-default port
Note: This update breaks existing APIs and requires changes to model config YAMLs
Full Changelog: v0.0.3...v0.1.0
v0.0.3
What's Changed
- Added streaming support in both backend and frontend
- Aviary now follows the multi-application Ray Serve convention
- Refactored parts of SDK (more changes are coming)
- Added CI
- Minor tweaks to frontend
- Added `typing-extensions` as a dependency to fix import issues on Python < 3.9
New Contributors
- @kevin85421 made their first contribution in #12
- @eltociear made their first contribution in #10
Full Changelog: v0.0.2...v0.0.3
v0.0.2
What's Changed
- Slimmed down the Docker image, removed unnecessary requirements, and fixed the Ray Cluster Launcher configuration file that caused an infinite worker node initialization loop - 54d0ebb
- Increased maximum input length and reduced batch size - f063f15
- Added ability to query OpenAI models through CLI (for comparison purposes) - 8e4e965
- Added static news ticker to Gradio app - fe670ae
Full Changelog: v0.0.1...v0.0.2