How to use init_cache() in NNX MHA correctly? #4290

windmaple · 2024-10-12T05:15:40Z

windmaple
Oct 12, 2024

So I'm trying to enable the KV cache. Here is my code:

`
import flax.nnx as nnx
import jax.numpy as jnp

batch_size=2
seqlen=40
emb_size=256

x = jnp.ones((batch_size, seqlen, emb_size))

mha = nnx.MultiHeadAttention(
in_features=emb_size,
num_heads=2,
decode=True,
rngs=nnx.Rngs(0)
)
shape = x.shape

mha.init_cache((x.shape[0], x.shape[1], x.shape[-1]), dtype=x.dtype)
mha(x)
print('success')
`

I got this error:

Traceback (most recent call last): File "/Users/windmaple/Desktop/kv_cache.py", line 23, in <module> mha(x) File "/Users/windmaple/miniconda2/envs/python3.10/lib/python3.10/site-packages/flax/nnx/nnx/nn/attention.py", line 504, in __call__ raise ValueError( ValueError: Autoregressive cache shape error, expected query shape (2, 1, 2, 128) instead got (2, 40, 2, 128).

It seems 'seqlen' dimension is not being used. Looking at the init_cache() code, it looks like it should be. The docstring only uses a 2 dimensional array as an example, so I can't quite figure this one out.

Answered by cgarciae

Oct 12, 2024

Hey @windmaple, when in decode mode only a single token must be passed at a time e.g:

batch_size = 2
seqlen = 40
emb_size = 256

x = jnp.ones((batch_size, seqlen, emb_size))

mha = nnx.MultiHeadAttention(
  in_features=emb_size, num_heads=2, decode=True, rngs=nnx.Rngs(0)
)
shape = x.shape

mha.init_cache((x.shape[0], x.shape[1], x.shape[-1]), dtype=x.dtype)
for i in range(seqlen): # iterate all tokens
  y = mha(x[:, i : i + 1])
print('success')

View full answer

cgarciae · 2024-10-12T15:14:48Z

cgarciae
Oct 12, 2024
Maintainer

Hey @windmaple, when in decode mode only a single token must be passed at a time e.g:

batch_size = 2
seqlen = 40
emb_size = 256

x = jnp.ones((batch_size, seqlen, emb_size))

mha = nnx.MultiHeadAttention(
  in_features=emb_size, num_heads=2, decode=True, rngs=nnx.Rngs(0)
)
shape = x.shape

mha.init_cache((x.shape[0], x.shape[1], x.shape[-1]), dtype=x.dtype)
for i in range(seqlen): # iterate all tokens
  y = mha(x[:, i : i + 1])
print('success')

2 replies

cgarciae Oct 12, 2024
Maintainer

I've create #4291 to improve the docstrings.

windmaple Oct 13, 2024
Author

Thanks! It works now.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to use init_cache() in NNX MHA correctly? #4290

{{title}}

Replies: 1 comment 2 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

How to use init_cache() in NNX MHA correctly? #4290

windmaple Oct 12, 2024

Replies: 1 comment · 2 replies

cgarciae Oct 12, 2024 Maintainer

cgarciae Oct 12, 2024 Maintainer

windmaple Oct 13, 2024 Author

windmaple
Oct 12, 2024

Replies: 1 comment 2 replies

cgarciae
Oct 12, 2024
Maintainer

cgarciae Oct 12, 2024
Maintainer

windmaple Oct 13, 2024
Author