v1.7.2 - Continuous batching feature supports Qwen 1.0 & hybrid data types.
Functionality
- Add continuous batching support for Qwen 1.0 models.
- Enable hybrid data types for the continuous batching feature, including BF16_FP16, BF16_INT8, BF16_W8A8, BF16_INT4, BF16_NF4, W8A8_INT8, W8A8_INT4, and W8A8_NF4.
Bug fixes
- Fixed a model conversion fault in Baichuan1 models.
What's Changed
- [Doc] Add vllm benchmark docs. by @marvin-Yu in #448
- [Kernel] Add GPU kernels and enable LLaMA model. by @changqi1 in #372
- [Tools] Add Baichuan1/2 convert tool by @abenmao in #451
- [Layers] Add qwenRope support for Qwen1.0 in CB mode by @abenmao in #449
- [Framework] Remove duplicated code by @xiangzez in #450
- [Model] Support hybrid model in continuous batching. by @Duyi-Wang in #453
- [Version] v1.7.2. by @Duyi-Wang in #454
Full Changelog: v1.7.1...v1.7.2