
[JAX] Consolidate FFI and old descriptor implementation for fused attention. #1295

Merged

Conversation

mgoldfarb-nvidia
Collaborator

Description

Some customers still depend on the older descriptor-based custom-call approach for fused attention rather than the new FFI one. This change consolidates the two code paths so their behavior cannot diverge and updates are not accidentally applied to one but not the other. A sketch of the pattern appears below.

Fixes # (issue)
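
A minimal sketch of the consolidation pattern, under simplified, assumed signatures: the legacy descriptor-based custom call and the new FFI handler both forward to one shared implementation, so a fix lands in one place no matter which entry point is used. Only `FusedAttnForwardImpl`, `FusedAttnForwardFFI`, `Error_Type`, and `ffi_with_cuda_error_check` are named in this PR's review thread; everything else here is hypothetical.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Placeholders for TE's FFI error type and CUDA error-check helper; the real
// definitions live in the Transformer Engine JAX extension sources.
using Error_Type = int;
static Error_Type ffi_with_cuda_error_check() {
  return cudaGetLastError() == cudaSuccess ? 0 : 1;
}

// Shared implementation: all fused-attention forward logic lives here, so
// both entry points below stay in sync by construction.
static void FusedAttnForwardImpl(cudaStream_t stream, void **buffers) {
  // ... launch the fused attention kernels on `stream` ...
}

// Legacy descriptor-based custom call (hypothetical signature): XLA passes an
// opaque descriptor blob that is unpacked and forwarded to the shared impl.
void FusedAttnForward(cudaStream_t stream, void **buffers, const char *opaque,
                      std::size_t opaque_len) {
  // auto descriptor = UnpackOpaque<FusedAttnDescriptor>(opaque, opaque_len);
  FusedAttnForwardImpl(stream, buffers);
}

// New XLA FFI handler (hypothetical signature): decodes typed FFI arguments,
// forwards to the same shared impl, and returns the CUDA error check.
Error_Type FusedAttnForwardFFI(cudaStream_t stream, void **buffers) {
  FusedAttnForwardImpl(stream, buffers);
  return ffi_with_cuda_error_check();
}
```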

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Consolidate duplicate code.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@mgoldfarb-nvidia
Collaborator Author

/te-ci jax

@phu0ngng
Collaborator

Hi @mgoldfarb-nvidia, thanks for pushing this PR.

@zlsh80826 and I have also discussed ways to improve the FusedAttn APIs by using the new FFI features. One related PR is #1289; Reese can comment more on this.

Note that we plan to remove all the legacy custom calls once the new FFI-based custom calls are stable and there are no unexpected changes coming from XLA. The legacy custom calls will be kept for a few months as a fallback only. Customers are expected to switch to the new custom calls, which are in fact enabled by default.

@mgoldfarb-nvidia
Collaborator Author

mgoldfarb-nvidia commented Oct 28, 2024

Thanks @phu0ngng. We have some customers depending on the older descriptor-based interface, and we definitely want to make sure we don't break too much until we can get them moved over. At a minimum, we could try to separate the FFI layer from the underlying TE code just so it's easier to rebase changes onto other branches.

It looks like #1289 could be merged with this in a straightforward way.

@huanghua1994
Collaborator

This code will not pass compilation (see the sketch after this list):

  • static FusedAttnForwardImpl() is declared without a return type, yet its body ends with return ffi_with_cuda_error_check();
  • Error_Type FusedAttnForwardFFI() declares a return type but never returns ffi_with_cuda_error_check();
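
A hedged before/after sketch of the fix those two points imply, with simplified signatures (only the names quoted above come from this thread; `Error_Type` and `ffi_with_cuda_error_check` placeholders as in the description sketch):

```cpp
// Before (does not compile): the impl has no return type yet returns a value,
// while the FFI wrapper declares Error_Type but falls off the end.
//   static FusedAttnForwardImpl(...) { ...; return ffi_with_cuda_error_check(); }
//   Error_Type FusedAttnForwardFFI(...) { FusedAttnForwardImpl(...); }

// After: keep the shared impl void (it only launches work) and let the FFI
// wrapper own the error-check return value.
static void FusedAttnForwardImpl(cudaStream_t stream) {
  // ... launch kernels ...
}

Error_Type FusedAttnForwardFFI(cudaStream_t stream) {
  FusedAttnForwardImpl(stream);
  return ffi_with_cuda_error_check();
}
```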

Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
@mgoldfarb-nvidia
Collaborator Author

/te-ci jax

@zlsh80826
Collaborator

This is great! I had wanted to do the same thing.

@mgoldfarb-nvidia
Collaborator Author

/te-ci jax

@phu0ngng phu0ngng merged commit c036765 into NVIDIA:main Oct 30, 2024
13 of 14 checks passed