v1.7.2 - Continuous batching feature supports Qwen 1.0 & hybrid data types.

Duyi-Wang released this 18 Jun 05:07



Functionality

Add continuous batching support for Qwen 1.0 models.
Enable hybrid data types for the continuous batching feature, including BF16_FP16, BF16_INT8, BF16_W8A8, BF16_INT4, BF16_NF4, W8A8_INT8, W8A8_INT4, and W8A8_NF4.
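Each hybrid data type name pairs two precisions. The sketch below assumes the convention that the first part is the data type used for the first token (prefill) and the second part is the one used for subsequent tokens (decode). The helper `split_hybrid_dtype` is hypothetical and is not part of the xFasterTransformer API; it only illustrates how the names listed above decompose.

```python
# Hypothetical helper, NOT an xFasterTransformer API.
# Assumption: a hybrid dtype name is "<first-token dtype>_<next-token dtype>".
HYBRID_DTYPES = {
    "BF16_FP16", "BF16_INT8", "BF16_W8A8", "BF16_INT4", "BF16_NF4",
    "W8A8_INT8", "W8A8_INT4", "W8A8_NF4",
}


def split_hybrid_dtype(name: str) -> tuple[str, str]:
    """Return the assumed (first_token_dtype, next_token_dtype) pair."""
    key = name.upper()  # accept lowercase spellings such as "bf16_int4"
    if key not in HYBRID_DTYPES:
        raise ValueError(f"not a hybrid data type: {name}")
    # Split on the first underscore only; both halves are single tokens here.
    first, rest = key.split("_", 1)
    return first, rest


if __name__ == "__main__":
    for dtype in sorted(HYBRID_DTYPES):
        print(dtype, "->", split_hybrid_dtype(dtype))
```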

Bug fix

Fixed a model conversion fault in Baichuan1 models.

What's Changed

Generated release notes

[Doc] Add vllm benchmark docs. by @marvin-Yu in #448
[Kernel] Add GPU kernels and enable LLaMA model. by @changqi1 in #372
[Tools] Add Baichuan1/2 convert tool by @abenmao in #451
[Layers] Add qwenRope support for Qwen1.0 in CB mode by @abenmao in #449
[Framework] Remove duplicated code by @xiangzez in #450
[Model] Support hybrid model in continuous batching. by @Duyi-Wang in #453
[Version] v1.7.2. by @Duyi-Wang in #454

Full Changelog: v1.7.1...v1.7.2

Contributors

marvin-Yu, abenmao, and 3 other contributors
