From df57cb20f5ce90c86e9e8de4cdfd54fb4d027046 Mon Sep 17 00:00:00 2001
From: Duyi-Wang
Date: Wed, 31 Jul 2024 15:26:46 +0800
Subject: [PATCH] [Version] v1.8.1. (#30)

---
 CHANGELOG.md | 12 ++++++++++++
 VERSION      |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 86d3122f..81592f69 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,16 @@
 # CHANGELOG
+# [Version v1.8.1](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.1)
+v1.8.1
+
+## Functionality
+- Expose the interface of embedding lookup.
+
+## Performance
+- Optimized the performance of grouped query attention (GQA).
+- Enhanced the performance of creating keys for the oneDNN primitive cache.
+- Set the [bs][nh][seq][hs] layout as the default for KV Cache, resulting in better performance.
+- Improved the task split imbalance issue in self-attention.
+
 # [Version v1.8.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.0)
 v1.8.0 Continuous Batching on Single ARC GPU and AMX_FP16 Support.
 
diff --git a/VERSION b/VERSION
index afa2b351..b9268dae 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.8.0
\ No newline at end of file
+1.8.1
\ No newline at end of file