Expose sliding window attn to TE-JAX API #1205

Open · wants to merge 5 commits into main

Conversation

@huanghua1994 (Collaborator) commented Sep 25, 2024

Description

Recent models employ sliding window attention (SWA). Some frameworks use cuDNN fused attention through the TE-JAX Flash Attention API, but SWA support has not yet been exposed through this API, even though the TE backend already supports it. This PR exposes SWA support in the Flash Attention API.
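
For reviewers who want the semantics in one place, here is a minimal, unfused JAX sketch of sliding window attention (illustrative only, not the cuDNN fused kernel this PR wires up). The `(left, right)` window convention, with `-1` leaving a side unbounded, follows TE's existing attention parameters; the helper names here are hypothetical.

```python
# Illustrative sketch: plain-JAX reference semantics for sliding window
# attention (SWA), not TE's cuDNN fused kernel. window_size is a
# (left, right) pair; -1 leaves that side unbounded.
import jax
import jax.numpy as jnp


def swa_mask(seqlen, window_size):
    """Boolean [seqlen, seqlen] mask; True means the query may attend to the key."""
    left, right = window_size
    q_pos = jnp.arange(seqlen)[:, None]   # query positions as a column
    kv_pos = jnp.arange(seqlen)[None, :]  # key positions as a row
    diff = q_pos - kv_pos                 # positive when the key is to the left
    ok_left = (diff <= left) if left >= 0 else jnp.ones_like(diff, dtype=bool)
    ok_right = (-diff <= right) if right >= 0 else jnp.ones_like(diff, dtype=bool)
    return ok_left & ok_right


def swa_attention(q, k, v, window_size=(128, 0)):
    """Unfused single-head attention over [seqlen, head_dim] inputs.

    window_size=(128, 0) is causal SWA: each query attends to itself and
    up to 128 keys to its left, none to its right.
    """
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    scores = jnp.where(swa_mask(q.shape[0], window_size), scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v
```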

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Please list the changes introduced in this PR:

Expose sliding window attention to the TE-JAX API
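
A hypothetical usage sketch of the exposed parameter follows. The argument names and overall `fused_attn` signature below are assumptions based on the existing `transformer_engine.jax.attention` module; the diff in this PR is the authoritative reference for the final API.

```python
# Hypothetical usage sketch: the signature shown is an assumption based on
# the existing transformer_engine.jax.attention module; consult the PR diff
# for the authoritative API.
import math
import jax.numpy as jnp
from transformer_engine.jax.attention import (
    AttnBiasType, AttnMaskType, QKVLayout, fused_attn,
)

batch, seqlen, heads, head_dim = 2, 2048, 16, 64
qkv = jnp.zeros((batch, seqlen, 3, heads, head_dim), dtype=jnp.bfloat16)

out = fused_attn(
    (qkv,),                                   # packed QKV, BS3HD layout
    bias=None,
    mask=None,
    seed=None,
    attn_bias_type=AttnBiasType.NO_BIAS,
    attn_mask_type=AttnMaskType.CAUSAL_MASK,
    qkv_layout=QKVLayout.BS3HD,
    scaling_factor=1.0 / math.sqrt(head_dim),
    dropout_probability=0.0,
    is_training=True,
    window_size=(128, 0),                     # new: 128-token causal window
)
```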

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Hua Huang <huah@nvidia.com>
@mgoldfarb-nvidia (Collaborator) left a comment

Thank you for this PR! Have a few small questions/comments.

Review threads (resolved):
  • tests/jax/test_fused_attn.py (2 threads)
  • transformer_engine/jax/cpp_extensions/attention.py (outdated)
@mingxu1067 (Collaborator) commented:

Could you port the SWA to flax and praxis modules as well?

Signed-off-by: Hua Huang <huah@nvidia.com>
@mgoldfarb-nvidia (Collaborator) left a comment


LGTM! Please address @mingxu1067's comments and this will be in good shape.

Hua Huang and others added 2 commits September 27, 2024 10:56
Labels: enhancement (New feature or request), jax