Enable AMD BF16 Grouped Gemm #3526

Closed: 2 commits proposed for merge

@jwfromm jwfromm commented Dec 22, 2024

Summary: Implementation of a CK (Composable Kernel) based BF16 grouped GEMM for AMD GPUs. Performance is currently quite poor :(

Reviewed By: zjing14

Differential Revision: D67261862

netlify bot commented Dec 22, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 92e2946
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/677444d98b3c7e00083128d3
😎 Deploy Preview: https://deploy-preview-3526--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D67261862

jwfromm added a commit to jwfromm/FBGEMM that referenced this pull request Dec 31, 2024
Summary:
Pull Request resolved: pytorch#3526

X-link: facebookresearch/FBGEMM#608

Implementation of CK based BF16 Grouped Gemm. Currently performance is quite poor :(

Reviewed By: zjing14

Differential Revision: D67261862
Summary:
This diff cleans up some of the APIs for FBGEMM grouped gemm and updates CUTLASS bf16 grouped gemm to use a single kernel launch to initialize gemm arguments. This should help reduce overhead a bit.

The only notable API change exposed to the user is that all grouped gemm functions now return lists of outputs where bf16 previously returned a single tensor blob. This does mean that in some cases we'll have to do an extra `torch.stack` to unify the groups. If this turns out to be costly, I think we can instead have two grouped gemm implementations, one for dynamic (which returns a single tensor) and one for static shapes which returns a list of tensors.

Differential Revision: D67423469

Reviewed By: jianyuh
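
To make the API change above concrete, here is a minimal pure-Python sketch of the "list of outputs" convention. It is not the FBGEMM API: `grouped_gemm_reference`, `matmul_nt`, and `stack_groups` are hypothetical names, and the `(N, K)` weight layout is an assumption. `stack_groups` only plays the role that `torch.stack` would play on real tensors.

```python
from typing import List

Matrix = List[List[float]]


def matmul_nt(x: Matrix, w: Matrix) -> Matrix:
    """Return x @ w.T for x of shape (M, K) and w of shape (N, K) (assumed layout)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in w] for row in x]


def grouped_gemm_reference(xs: List[Matrix], ws: List[Matrix]) -> List[Matrix]:
    """Hypothetical reference: one output matrix per group, returned as a list."""
    if len(xs) != len(ws):
        raise ValueError("expected one weight matrix per input group")
    return [matmul_nt(x, w) for x, w in zip(xs, ws)]


def stack_groups(outputs: List[Matrix]) -> List[Matrix]:
    """Unify group outputs into one (G, M, N) nested list.

    Stand-in for torch.stack: it only works when every group has the same shape.
    """
    shapes = {(len(o), len(o[0]) if o else 0) for o in outputs}
    if len(shapes) > 1:
        raise ValueError(f"cannot stack groups with different shapes: {sorted(shapes)}")
    return [[row[:] for row in o] for o in outputs]


# Two groups with different M (1 and 2), same K = 2 and N = 2.
xs = [[[1, 2]], [[1, 0], [0, 1]]]
ws = [[[1, 1], [2, 0]], [[3, 4], [5, 6]]]
outputs = grouped_gemm_reference(xs, ws)
print(outputs)  # [[[3, 2]], [[3, 5], [4, 6]]]
```

With differing `M` per group, the list form is the only one that fits, and `stack_groups(outputs)` raises `ValueError`. When all groups share a shape, the extra stacking step is the cost the summary refers to.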

@facebook-github-bot

This pull request has been merged in 4c0d4f7.
