Hi all :)
I was wondering whether dilated local attention, as described in the Longformer paper, is already integrated. I could not find it, but the reference to the Longformer paper in the docstring of the LocalAttention class made me think something might already be implemented. If not, is an implementation planned?
Thanks in advance!
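To make the question concrete: below is a minimal sketch of what Longformer-style dilated local attention computes, assuming the standard definition (each position attends to every `dilation`-th token within its window). The helper names are hypothetical and not part of this library's API, and the dense mask illustrates the semantics only, not a memory-efficient implementation.

```python
import torch
import torch.nn.functional as F

def dilated_local_mask(seq_len: int, window: int, dilation: int) -> torch.Tensor:
    # True where query i may attend to key j: the offset j - i lies within
    # `window` dilated steps of i and is a multiple of `dilation`.
    idx = torch.arange(seq_len)
    rel = idx[None, :] - idx[:, None]            # offset j - i, shape (n, n)
    in_window = rel.abs() <= window * dilation   # inside the dilated span
    on_stride = rel % dilation == 0              # only every dilation-th token
    return in_window & on_stride

def dilated_local_attention(q, k, v, window: int = 4, dilation: int = 2):
    # q, k, v: (batch, heads, seq_len, head_dim)
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    mask = dilated_local_mask(n, window, dilation).to(q.device)
    scores = scores.masked_fill(~mask, torch.finfo(scores.dtype).min)
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 128, 64)
out = dilated_local_attention(q, k, v)           # (1, 8, 128, 64)
```

With `dilation = 1` this reduces to a plain sliding window, which is why the two patterns are often discussed together.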
Replies:

- Hi @Coluding
- Hi again, is your implementation of LocalAttention also sparse-aware, such that empty tokens are not represented in memory? I could not find any docs on that. Thanks in advance!
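For reference, the common baseline is to mask padding rather than skip it; here is a sketch under that assumption (the function name and arguments are hypothetical, not this repo's API). Masked keys receive zero attention weight after the softmax, but their scores are still allocated, which is exactly the distinction the question draws.

```python
import torch
import torch.nn.functional as F

def padded_local_attention(q, k, v, key_padding_mask, window: int = 4):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # key_padding_mask: (batch, seq_len) bool, True for real tokens.
    n, d = q.shape[-2], q.shape[-1]
    idx = torch.arange(n, device=q.device)
    local = (idx[None, :] - idx[:, None]).abs() <= window   # sliding window
    allowed = local[None, None] & key_padding_mask[:, None, None, :]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    # Padded keys get ~-inf scores (zero weight after softmax), but the
    # full (n, n) score matrix is still materialized: masking, not sparsity.
    scores = scores.masked_fill(~allowed, torch.finfo(scores.dtype).min)
    return F.softmax(scores, dim=-1) @ v
```

Truly leaving empty tokens out of memory would instead require length bucketing or a block-sparse kernel, which is beyond this sketch.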
- Great! Thanks for your quick replies!