v1.8.2
Performance
- Enable flash attention by default for W8A8 dtype to accelerate the performance of the 1st token.
Benchmark
- When the number of ranks is 1, run in single mode to avoid the dependency on mpirun.
- Support SNC-3 platform.
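
The single-mode behavior described in the first Benchmark item can be sketched roughly as follows. This is an illustration only, not the project's actual benchmark script: the helper name `build_launch_command` and the `benchmark.py` command are hypothetical, and the only external assumption is the standard `mpirun -n <ranks>` invocation.

```python
from typing import List


def build_launch_command(num_ranks: int, benchmark_cmd: List[str]) -> List[str]:
    """Hypothetical helper: decide how to launch a benchmark command.

    With a single rank the command is run directly (single mode), so
    mpirun does not need to be installed. With more ranks the command
    is wrapped in an mpirun invocation.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be >= 1")
    if num_ranks == 1:
        # Single mode: no MPI launcher involved.
        return list(benchmark_cmd)
    # Multi-rank mode: assumes mpirun is available on PATH.
    return ["mpirun", "-n", str(num_ranks)] + list(benchmark_cmd)


if __name__ == "__main__":
    # "benchmark.py" is a placeholder command for illustration.
    print(build_launch_command(1, ["python", "benchmark.py"]))
    print(build_launch_command(2, ["python", "benchmark.py"]))
```

Keeping the decision in one small function means the single-rank path never touches the MPI launcher, which is the point of the change noted above.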