Enable AMD BF16 Grouped Gemm (#3526)
Summary:
Pull Request resolved: #3526

X-link: facebookresearch/FBGEMM#608

Implementation of CK based BF16 Grouped Gemm. Currently performance is quite poor :(
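For readers unfamiliar with the term, a grouped GEMM runs one independent matrix multiply per group, and the row count M may differ between groups. The sketch below is a pure-Python reference for those semantics only, not the CK kernel added in this commit. The helper names `matmul_nt` and `grouped_gemm_reference` are hypothetical, and the (N, K) weight layout (so each group computes `x @ w^T`) is an assumption rather than something this commit states.

```python
# Reference semantics only: a hypothetical stand-in, not the CK implementation.
# Assumption: each weight is stored as (N, K), so a group computes x @ w^T.

def matmul_nt(x, w):
    """Return x @ w^T for x of shape (M, K) and w of shape (N, K)."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def grouped_gemm_reference(xs, ws):
    """Run one independent GEMM per group; M may differ between groups."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must contain the same number of groups")
    return [matmul_nt(x, w) for x, w in zip(xs, ws)]


# Two groups sharing K = 2 and N = 1, with M = 1 and M = 2 respectively.
xs = [[[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]
ws = [[[3.0, 4.0]], [[5.0, 6.0]]]
outs = grouped_gemm_reference(xs, ws)
```

A GPU kernel fuses these per-group products into a single launch; the loop here is only meant to make the expected outputs easy to check by hand.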

Reviewed By: zjing14

Differential Revision: D67261862
jwfromm authored and facebook-github-bot committed Dec 31, 2024
1 parent b53af65 commit df2c145
Showing 63 changed files with 5,019 additions and 12 deletions.
6 changes: 3 additions & 3 deletions fbgemm_gpu/experimental/gen_ai/bench/quantize_ops.py
@@ -598,7 +598,7 @@ def quantize_fixed_nk(self, x, w):
return (
x,
w,
- torch.tensor(m_values).to(dtype=torch.int32, device=x[0].device),
+ torch.tensor(m_values).to(dtype=torch.int64, device=x[0].device),
output,
)

@@ -622,7 +622,7 @@ def quantize(self, x, w):
m_values = None
return x, w, m_values, output

- def compute(self, x, w, m_values, output):
+ def compute(self, x, w, m_values, _):
return torch.ops.fbgemm.bf16bf16bf16_grouped(
x,
w,
@@ -642,7 +642,7 @@ def name(self) -> str:

@property
def hip(self) -> bool:
- return False
+ return True

@property
def cuda(self) -> bool: