Support for tritongpu.upcast_mxfp operation #2700

Open: wants to merge 3 commits into main

@etiotto etiotto commented Nov 13, 2024

Add initial support for the new tritongpu upcast_mxfp operation.

Signed-off-by: Tiotto, Ettore <ettore.tiotto@intel.com>
@etiotto etiotto self-assigned this Nov 13, 2024
@etiotto etiotto linked an issue Nov 13, 2024 that may be closed by this pull request

etiotto commented Nov 13, 2024

Note: Merging upstream up to commit 1cf7b1b31cde8c62611e421becd4648c7284d76c should make this PR smaller, since the changes to the NVIDIA and AMD implementations of upcast_mxfp would come in from the merge.

// standalone values and returns them as a pair for (high 4 bits, low 4 bits).
std::pair<Value, Value> convertMxfp4x2ToBf16x2(RewriterBase &rewriter,
Location loc, Value v);
// Convert each value, which is an int8 containing 2 packed mxfp4 values,
Contributor Author

Note: This is identical to upstream code as of commit 1cf7b1b31cde8c62611e421becd4648c7284d76c
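
For readers unfamiliar with the packed format, here is a minimal standalone sketch of what converting one int8 holding two mxfp4 values into two bf16 bit patterns involves. It assumes the 4-bit E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1). The names `fp4E2M1ToBf16Bits`, `unpackFp4x2` and `bf16BitsToFloat` are hypothetical, and this is plain host C++, not the LLVM lowering touched by this PR.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helper: decode one 4-bit E2M1 value into the bit pattern of
// the equivalent bf16 (1 sign bit, 8 exponent bits with bias 127, 7 mantissa bits).
static uint16_t fp4E2M1ToBf16Bits(uint8_t nibble) {
  // Move the fp4 sign bit (bit 3) to the bf16 sign bit (bit 15).
  uint16_t sign = static_cast<uint16_t>((nibble & 0x8u) << 12);
  uint16_t em = nibble & 0x7u; // 2 exponent bits followed by 1 mantissa bit
  if (em == 0)
    return sign; // +-0
  if (em == 1)
    return static_cast<uint16_t>(sign | 0x3F00u); // subnormal: +-0.5
  // Normal value: shifting by 6 puts the mantissa bit in bf16 bit 6 and the
  // exponent in bits 7-8; adding (127 - 1) << 7 corrects the exponent bias.
  return static_cast<uint16_t>(sign | ((em << 6) + ((127 - 1) << 7)));
}

// Hypothetical helper: split a byte into its two fp4 values.
// first = value from the low 4 bits, second = value from the high 4 bits.
static std::pair<uint16_t, uint16_t> unpackFp4x2(uint8_t packed) {
  return {fp4E2M1ToBf16Bits(packed & 0xFu), fp4E2M1ToBf16Bits(packed >> 4)};
}

// Widen a bf16 bit pattern to float so the result is easy to inspect.
static float bf16BitsToFloat(uint16_t bits) {
  uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}
```

For example, under these assumptions `unpackFp4x2(0x72)` decodes the low nibble `0x2` to 1.0 and the high nibble `0x7` to 6.0. Which nibble comes first in the actual lowering is not something this sketch asserts.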


return {v0, v1};
}
SmallVector<Value> convertMxfp4x2ToBf16x2(RewriterBase &rewriter, Location loc,
Contributor Author

Note: This is identical to upstream code as of commit 1cf7b1b31cde8c62611e421becd4648c7284d76c

@@ -19,17 +19,6 @@ using namespace mlir::triton::gpu;

namespace {

Value mxfpScaleBf16(RewriterBase &rewriter, Location loc, Value v,
Contributor Author

Note: This is identical to upstream code as of commit 1cf7b1b31cde8c62611e421becd4648c7284d76c
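
As a rough illustration of what scaling a bf16 value by an MX scale means, the sketch below assumes the scale is an 8-bit E8M0 value (a biased power-of-two exponent with bias 127, where 0xFF encodes NaN). The names `scaleBf16ByE8M0` and `bf16BitsToFloat` are hypothetical; this is standalone host code under those assumptions, not the body of `mxfpScaleBf16`.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Widen a bf16 bit pattern to float.
static float bf16BitsToFloat(uint16_t bits) {
  uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// Hypothetical helper: multiply a bf16 value (given as raw bits) by an
// assumed E8M0 scale. The result is returned as float for easy inspection.
static float scaleBf16ByE8M0(uint16_t valueBits, uint8_t scale) {
  if (scale == 0xFF) // assumed NaN encoding of the scale
    return std::numeric_limits<float>::quiet_NaN();
  // Placing the 8 scale bits in the bf16 exponent field (bits 7-14) yields
  // the bf16 value 2^(scale - 127) for scale in 1..254.
  // Caveat: scale == 0 produces bf16 zero here rather than 2^-127.
  uint16_t scaleBits = static_cast<uint16_t>(static_cast<uint16_t>(scale) << 7);
  return bf16BitsToFloat(valueBits) * bf16BitsToFloat(scaleBits);
}
```

With these assumptions, `scaleBf16ByE8M0(0x3FC0, 128)` multiplies 1.5 by 2^1 and yields 3.0, while a scale of 127 leaves the value unchanged.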

@@ -30,47 +30,6 @@ class UpcastMXFPOpPattern : public ConvertOpToLLVMPattern<UpcastMXFPOp> {
: ConvertOpToLLVMPattern<UpcastMXFPOp>(typeConverter, benefit),
targetInfo(targetInfo) {}

llvm::SmallVector<Value> unpackFP4Elements(Location loc,
Contributor Author

Removed in 1cf7b1b31cde8c62611e421becd4648c7284d76c

@etiotto etiotto marked this pull request as ready for review November 14, 2024 14:12
@whitneywhtsang
Contributor

> Note: Merging upstream to 1cf7b1b31cde8c62611e421becd4648c7284d76c should make this PR smaller (changes to NVidia and AMD implementation of upcast_mxfp would be coming in from the merge).

Merging in #2707.

@victor-eds victor-eds left a comment

As this is just copying upstream code, LGTM. In previous cases we found that relying on logical bitwise operations for this kind of conversion was slower. Does it make sense to open a ticket to change the code in the future?

Successfully merging this pull request may close these issues.

Implement support for TritonGPU::UpcastMXFPOp for Intel XPU BE
3 participants