v1.8.2

@Duyi-Wang Duyi-Wang released this 10 Oct 08:17
· 2 commits to main since this release
b43edc8

Performance

  • Enable flash attention by default for the W8A8 dtype to speed up generation of the first token.

Benchmark

  • When the number of ranks is 1, run in single mode to avoid the dependency on mpirun.
  • Support the SNC-3 platform.
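
The single-rank change in the first Benchmark item can be pictured with a minimal sketch. This is not the project's benchmark script: the helper name `build_launch_command` and the `benchmark.py` worker command are hypothetical and used only for illustration. The only external assumption is that `mpirun -n <N>` starts N processes, which is the standard MPI launcher usage.

```python
from typing import List, Sequence


def build_launch_command(num_ranks: int, worker_cmd: Sequence[str]) -> List[str]:
    """Return the command line used to start a benchmark run.

    Hypothetical helper for illustration; not part of the project's scripts.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    if num_ranks == 1:
        # Single mode: start the worker directly, so mpirun need not be installed.
        return list(worker_cmd)
    # Multi-rank mode: let mpirun start one process per rank.
    return ["mpirun", "-n", str(num_ranks), *worker_cmd]


if __name__ == "__main__":
    worker = ["python", "benchmark.py"]  # placeholder worker command (assumption)
    print(build_launch_command(1, worker))
    print(build_launch_command(2, worker))
```

With one rank the returned command contains no `mpirun` at all, so a machine without an MPI launcher on its `PATH` can still run the benchmark.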