This repository has been archived by the owner on May 28, 2024. It is now read-only.
v0.1.0
What's Changed
- Ray Serve-native continuous batching support through Hugging Face text-generation-inference models
- Fixed exceptions raised when the frontend is deployed on a non-default port
Note: This release breaks existing APIs and requires changes to model config YAML files.
Full Changelog: v0.0.3...v0.1.0