v1.8.1

Duyi-Wang released this 31 Jul 08:08
· 16 commits to main since this release
df57cb2

Functionality

  • Exposed the embedding lookup interface.

Performance

  • Optimized the performance of grouped query attention (GQA).
  • Sped up key creation for the oneDNN primitive cache.
  • Made [bs][nh][seq][hs] the default KV cache layout, which improves performance.
  • Reduced task-split imbalance in self-attention.
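
As a rough illustration of two items above, the sketch below shows how a flat offset into a contiguous [bs][nh][seq][hs] buffer can be computed, and how query heads are commonly mapped to shared KV heads in GQA. It assumes the abbreviations mean batch size, number of heads, sequence length, and head size; the function names `kv_offset` and `kv_head_for_query_head` are hypothetical and not part of this project's API. With this layout, all positions of one head are adjacent in memory, which is one plausible reason it performs better.

```python
def kv_offset(b, h, s, d, nh, seq, hs):
    """Flat index of element (b, h, s, d) in a contiguous, row-major
    [bs][nh][seq][hs] buffer (hypothetical helper, for illustration only)."""
    return ((b * nh + h) * seq + s) * hs + d


def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Index of the KV head shared by a given query head under GQA,
    assuming query heads are split into equal, consecutive groups."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    nh, seq, hs = 2, 4, 8
    # Consecutive sequence positions of one head are hs elements apart.
    print(kv_offset(0, 0, 1, 0, nh, seq, hs))
    # Moving to the next head skips a whole seq * hs block.
    print(kv_offset(0, 1, 0, 0, nh, seq, hs))
    # 8 query heads sharing 2 KV heads: query head 5 falls in group 1.
    print(kv_head_for_query_head(5, 8, 2))
```

Because the head index sits outside the sequence index, a per-head attention pass reads one contiguous block of seq * hs elements instead of striding across heads.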