diff --git a/CHANGELOG.md b/CHANGELOG.md
index 86d3122f..81592f69 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,16 @@
 # CHANGELOG
+# [Version v1.8.1](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.1)
+v1.8.1
+
+## Functionality
+- Expose the interface of embedding lookup.
+
+## Performance
+- Optimized the performance of grouped query attention (GQA).
+- Enhanced the performance of creating keys for the oneDNN primitive cache.
+- Set the [bs][nh][seq][hs] layout as the default for KV Cache, resulting in better performance.
+- Improved the task split imbalance issue in self-attention.
+
 # [Version v1.8.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.0)
 v1.8.0
 Continuous Batching on Single ARC GPU and AMX_FP16 Support.
diff --git a/VERSION b/VERSION
index afa2b351..b9268dae 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.8.0
\ No newline at end of file
+1.8.1
\ No newline at end of file