
Add MultiheadAttention to DirectMLX #600

Open

PatriceVignola wants to merge 5 commits into master

Conversation

PatriceVignola (Contributor)

No description provided.

@PatriceVignola requested a review from fdwr on June 24, 2024, 19:21
Optional<Expression> outputPresentValue;
};

inline MultiHeadAttentionOutputs MultiHeadAttention(
@fdwr (Contributor) commented on Jun 25, 2024:

    inline MultiheadAttentionOutputs MultiheadAttention(

"Multihead" is a single word (https://en.wiktionary.org/wiki/multihead, https://www.merriam-webster.com/dictionary/multiheaded), consistent with our enum DML_MULTIHEAD_ATTENTION_OPERATOR_DESC and with PyTorch (https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html). The people using hyphens don't know that "multi" is a prefix :b.

detail::GraphBuilder* builder = nullptr;

if (query)
{
@fdwr (Contributor) commented:

(minor 🤷) assert(stackedKeyValue || (key && value)); for assertion consistency with the other branches?
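For context, a minimal sketch of how the suggested assertion could sit inside this branch; the surrounding statements are assumptions for illustration, not the PR's actual branch body:

    if (query)
    {
        // Suggested addition: mirrors the assertion style of the other branches,
        // requiring either the stacked key/value tensor or both separate tensors.
        assert(stackedKeyValue || (key && value));
        // ... remainder of the query branch ...
    }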

{
assert(stackedQueryKeyValue);
assert(stackedQueryKeyValueTensor.sizes.size() >= 5);
batchSize = stackedQueryKeyValueTensor.sizes[stackedQueryKeyValueTensor.sizes.size() - 5];
@fdwr (Contributor) commented on Jun 25, 2024:

stackedQueryKeyValueTensor.sizes.size() - 5

I worry about callers using DMLX and directly populating tensors from some model description, and then DMLX accessing invalid negative indices here because the tensor size is too small, especially if that model description comes from external data that is not completely under the program's control. We could say that it's the responsibility of the caller to validate all these sizes up front before calling DMLX, but even DML validates tensor sizes before accessing any potentially invalid indices. Can we strengthen these mere asserts, which only run in debug builds, into an std::invalid_argument instead?

e.g.

            DMLX_THROW_IF_NOT(stackedQueryKeyValueTensor.sizes.size() >= 5, std::invalid_argument);
            batchSize = stackedQueryKeyValueTensor.sizes[stackedQueryKeyValueTensor.sizes.size() - 5];
#if __cpp_exceptions
    #if DMLX_USE_WIL
        #define DMLX_THROW_IF_FAILED(_hr) THROW_IF_FAILED(_hr)
        #define DMLX_THROW(_hr) THROW_HR(_hr)
        #define DMLX_THROW_IF_NOT(condition, exceptionType) if (!(condition)) { throw exceptionType(#condition); }
    #else
        #define DMLX_THROW_IF_FAILED(_hr) if (FAILED(_hr)) { throw std::runtime_error(#_hr); }
        #define DMLX_THROW(_hr) throw std::runtime_error(#_hr);
        #define DMLX_THROW_IF_NOT(condition, exceptionType) if (!(condition)) { throw exceptionType(#condition); }
    #endif
#else
    #define DMLX_THROW_IF_FAILED(_hr) if (FAILED(_hr)) { std::abort(); }
    #define DMLX_THROW(_hr) std::abort();
    #define DMLX_THROW_IF_NOT(condition, exceptionType) if (!(condition)) { std::abort(); }
#endif

I'm not proposing we turn every assert into an exception, as DML API validation will validate things too, and we don't need to doubly validate in DMLX, but at least to validate the cases where DMLX itself would access invalid memory.
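To make the hazard concrete, here is a standalone sketch (the function name and values are illustrative, not DMLX code). Because sizes.size() returns an unsigned size_t, size() - 5 wraps around to an enormous value (SIZE_MAX - 2 for a rank-2 tensor on a 64-bit build) rather than going negative, and a plain assert compiles away in release builds while the proposed throw does not:

    // Standalone illustration (not DMLX code) of why an assert-only check is risky.
    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    uint32_t GetBatchSize(const std::vector<uint32_t>& sizes)
    {
        // The throw survives release builds; a bare assert() would not.
        if (sizes.size() < 5)
        {
            throw std::invalid_argument("tensor must have at least 5 dimensions");
        }
        return sizes[sizes.size() - 5]; // safe: rank was verified above
    }

    int main()
    {
        try
        {
            // A rank-2 tensor from untrusted model data: unguarded, size() - 5
            // would wrap and index far out of bounds.
            GetBatchSize({2, 3});
        }
        catch (const std::invalid_argument& e)
        {
            std::cout << "rejected: " << e.what() << "\n";
        }
    }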

@fdwr (Contributor) left a review:

3 comments. LGTM otherwise.
