Enable AMD BF16 Grouped Gemm #3526
Conversation
This pull request was exported from Phabricator. Differential Revision: D67261862
Force-pushed from 6ae3c7a to 26ab8e8
Summary: Pull Request resolved: pytorch#3526. X-link: facebookresearch/FBGEMM#608. Implementation of CK-based BF16 grouped GEMM. Currently performance is quite poor :(
Reviewed By: zjing14
Differential Revision: D67261862
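For context, here is a minimal pure-PyTorch sketch of the semantics a grouped GEMM provides: one independent matmul per group, where each group may have its own problem size. This is only a reference for what the fused CK kernel computes in a single launch; the function name, the `[N, K]` weight layout, and the shapes are assumptions for illustration, not the FBGEMM API.

```python
import torch

def grouped_gemm_reference(xs, ws):
    # One independent matmul per group; each group may have its own M.
    # A fused grouped GEMM kernel (like the CK-based one in this PR)
    # performs all of these in a single launch instead of a Python loop.
    return [x @ w.t() for x, w in zip(xs, ws)]

# Three groups with different M, shared K=256 / N=512, in bf16.
xs = [torch.randn(m, 256, dtype=torch.bfloat16) for m in (16, 32, 64)]
ws = [torch.randn(512, 256, dtype=torch.bfloat16) for _ in range(3)]
outs = grouped_gemm_reference(xs, ws)
print([tuple(o.shape) for o in outs])  # [(16, 512), (32, 512), (64, 512)]
```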
Summary: This diff cleans up some of the APIs for FBGEMM grouped GEMM and updates the CUTLASS BF16 grouped GEMM to use a single kernel launch to initialize the GEMM arguments, which should help reduce overhead a bit. The only notable API change exposed to the user is that all grouped GEMM functions now return lists of outputs, where BF16 previously returned a single tensor blob. This does mean that in some cases we'll have to do an extra `torch.stack` to unify the groups; see the sketch below. If that turns out to be costly, we can instead have two grouped GEMM implementations: one for dynamic shapes, which returns a single tensor, and one for static shapes, which returns a list of tensors.
Reviewed By: jianyuh
Differential Revision: D67423469
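A minimal sketch of what that API change means on the caller side, assuming the grouped GEMM now returns a list with one output tensor per group. The helper name and shapes are hypothetical; the only confirmed piece is the `torch.stack` needed to recover the old single-blob layout when group shapes match.

```python
import torch

def consume_grouped_outputs(outputs):
    # The grouped GEMM is assumed to return one output tensor per group.
    # When every group has the same output shape, torch.stack recovers
    # the single-blob layout the BF16 path returned before this change.
    if all(o.shape == outputs[0].shape for o in outputs):
        return torch.stack(outputs)  # shape: [num_groups, M, N]
    return outputs  # ragged groups stay as a list

# Hypothetical outputs from a 3-group BF16 grouped GEMM with uniform shapes.
outputs = [torch.randn(16, 512, dtype=torch.bfloat16) for _ in range(3)]
blob = consume_grouped_outputs(outputs)
print(blob.shape)  # torch.Size([3, 16, 512])
```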
Force-pushed from 26ab8e8 to f2823db
Force-pushed from f2823db to df2c145
Force-pushed from df2c145 to 825c2cd
Force-pushed from 825c2cd to fe36ce2
Force-pushed from fe36ce2 to 92e2946
This pull request has been merged in 4c0d4f7.