[TE/JAX] XLA FFI calls for Softmax and FusedAttnBackward #1319

huanghua1994 · 2024-11-07T21:03:49Z

Description

This PR implements the following primitives with the new XLA FFI custom calls:

ScaledSoftmaxFwdPrimitive
ScaledSoftmaxBwdPrimitive
ScaledMaskedSoftmaxFwdPrimitive
ScaledMaskedSoftmaxBwdPrimitive
ScaledUpperTriangMaskedSoftmaxFwdPrimitive
ScaledUpperTriangMaskedSoftmaxBwdPrimitive
FusedAttnBwdPrimitive
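
For reference, the math behind the six softmax primitives above can be sketched in pure JAX. This is a minimal illustration of the expected semantics only, not the Transformer Engine kernels or their FFI signatures. The function names, the `scale_factor` argument, and the convention that `True` in `mask` means "masked out" are assumptions made for this sketch.

```python
import jax
import jax.numpy as jnp


def scaled_softmax(logits, scale_factor=1.0):
    """Softmax over the last axis of scale_factor * logits."""
    return jax.nn.softmax(scale_factor * logits, axis=-1)


def scaled_masked_softmax(logits, mask, scale_factor=1.0):
    """Scaled softmax that ignores positions where mask is True.

    Assumption: True marks a position to exclude from the softmax.
    """
    # Replace masked logits with the most negative finite value so that
    # they contribute (numerically) zero probability.
    neg = jnp.finfo(logits.dtype).min
    scaled = jnp.where(mask, neg, scale_factor * logits)
    return jax.nn.softmax(scaled, axis=-1)


def scaled_upper_triang_masked_softmax(logits, scale_factor=1.0):
    """Causal variant: query i only attends to keys j <= i."""
    q_len, k_len = logits.shape[-2:]
    # True strictly above the diagonal, i.e. for "future" keys.
    causal_mask = jnp.arange(k_len)[None, :] > jnp.arange(q_len)[:, None]
    return scaled_masked_softmax(logits, causal_mask, scale_factor)


def softmax_bwd(probs, grad_out, scale_factor=1.0):
    """Gradient w.r.t. the unscaled logits, given the forward output.

    Standard softmax backward: dx = scale * y * (g - sum(g * y)).
    Masked positions have y == 0, so their gradient is zero as well.
    """
    inner = jnp.sum(grad_out * probs, axis=-1, keepdims=True)
    return scale_factor * probs * (grad_out - inner)
```

The hand-written backward formula can be checked against `jax.vjp` of the forward function, which is one way to validate a forward/backward primitive pair against a reference.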

This PR also adds DequantizeFFI() in transformer_engine/jax/csrc/extensions/quantization.cpp, although no Python function currently calls dequantize explicitly.
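
As a rough illustration of what a dequantize step computes, here is a pure-JAX sketch under the assumption of per-tensor FP8 scaling, where the stored values are multiplied by the inverse of the scale used at quantization time. The name `dequantize_reference` and its arguments are hypothetical and do not reflect the actual signature of DequantizeFFI().

```python
import jax.numpy as jnp


def dequantize_reference(x_fp8, scale_inv, out_dtype=jnp.bfloat16):
    """Map FP8 values back to higher precision: out = x_fp8 * scale_inv.

    Assumption: per-tensor scaling, with scale_inv = 1 / scale.
    """
    # Upcast first so the multiplication happens in float32, not in FP8.
    x_f32 = x_fp8.astype(jnp.float32)
    return (x_f32 * scale_inv).astype(out_dtype)


# Usage: quantize with scale 2.0, then recover the values with scale_inv 0.5.
x = jnp.array([0.5, 1.0, -2.0], dtype=jnp.float32)
x_fp8 = (x * 2.0).astype(jnp.float8_e4m3fn)
x_back = dequantize_reference(x_fp8, scale_inv=0.5)
```

The example values are chosen to be exactly representable in FP8 E4M3, so the round trip is lossless here; in general, quantization loses precision that dequantization cannot restore.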

After this PR, every C++ function in transformer_engine/jax/csrc/extensions has an FFI implementation.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

huanghua1994 · 2024-11-07T21:09:18Z

/te-ci jax L1

huanghua1994 · 2024-11-07T21:11:40Z

@zlsh80826 The forward test in test_fused_attn.py is turned on in this commit, and it takes about 15 minutes to run on a single H100. Since we need to run the L1 tests for any update related to fused attention, I think it is better to test both the forward and backward primitives in CI.

huanghua1994 · 2024-11-09T04:27:06Z

CI L1 tests passed. Rebased onto the main branch to verify #1314.

huanghua1994 · 2024-11-09T04:27:33Z

/te-ci jax L1

huanghua1994 · 2024-11-10T17:00:29Z

/te-ci jax L1

zlsh80826

LGTM for the fused attn

phu0ngng

LGTM

huanghua1994 requested review from zlsh80826 and phu0ngng November 7, 2024 21:03

huanghua1994 and others added 4 commits November 8, 2024 13:28

FFI for all softmax functions

bf683bb

Signed-off-by: Hua Huang <huah@nvidia.com>

FFI for FusedAttnBackward and Dequantize

79a7d4a

FusedAttnBackward passed all tests in test_fused_attn.py. Dequantize is not currently used; it is implemented for completeness.

Signed-off-by: Hua Huang <huah@nvidia.com>

Fix FusedAttnBackward FFI pybind & simplify

59895f0

Signed-off-by: Hua Huang <huah@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

e891678

for more information, see https://pre-commit.ci

huanghua1994 force-pushed the xla-ffi-softmax-attnbwd branch from 73e00ba to e891678 Compare November 9, 2024 04:27

zlsh80826 reviewed Nov 9, 2024

tests/jax/test_fused_attn.py (outdated, resolved)

huanghua1994 added 2 commits November 10, 2024 08:58

Revert changes to tests/jax/test_fused_attn.py

9e318c6

Signed-off-by: Hua Huang <huah@nvidia.com>

.

5686706

Signed-off-by: Hua Huang <huah@nvidia.com>

zlsh80826 approved these changes Nov 11, 2024

phu0ngng approved these changes Nov 12, 2024

Merge branch 'main' into xla-ffi-softmax-attnbwd

b15e7cb

huanghua1994 merged commit 237b493 into NVIDIA:main Nov 12, 2024
14 checks passed
