Commit
Load a single DPAS B matrix instead of two per 2D block IO instruction from transposed memory (#2628)

Loading a single DPAS B matrix per 2D block IO instruction from a column-major matrix in memory gives better performance for flash attention. Unlike the row-major case, when a single transposed 2D block IO returns values that cover more than one DPAS B operand, those values cannot be used as DPAS operands directly. They have to be shuffled in registers before being passed to the DPAS instruction, and IGC does not optimize this shuffle for now.
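To make the shuffle argument concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions: the tile shape `K x N`, the row-by-row value order, and all variable names are hypothetical. It does not reproduce the actual register layout of the 2D block IO instruction or any code from this commit. It only shows that a wide tile covering two operands stores their values interleaved, so splitting it into two contiguous operands needs a permutation, while loading one operand at a time does not.

```python
K, N = 4, 2  # hypothetical tile shape of one B operand, kept small for readability

# One wide (K x 2N) tile covering two operands side by side,
# with values numbered in row-by-row order (an assumed layout).
wide = [[r * 2 * N + c for c in range(2 * N)] for r in range(K)]


def flatten(tile):
    """Return the values of a 2D tile in row-by-row order."""
    return [v for row in tile for v in row]


# Values as they would sit after ONE load of the wide tile.
regs_one_load = flatten(wide)

# Values as two separate per-operand loads would deliver them:
# operand 0 contiguous, followed by operand 1 contiguous.
b0 = [row[:N] for row in wide]
b1 = [row[N:] for row in wide]
regs_per_operand = flatten(b0) + flatten(b1)

# The layouts differ, so the single wide load needs a shuffle.
needs_shuffle = regs_one_load != regs_per_operand

# The permutation that the shuffle would have to apply.
shuffle = [regs_one_load.index(v) for v in regs_per_operand]

print(needs_shuffle)
print(shuffle)
```

In this model the per-operand loads produce the desired order with no extra step, which mirrors the trade-off described above: more load instructions, but no register shuffle.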
1 parent 9952acf, commit ee755e8
Showing 3 changed files with 35 additions and 24 deletions.