Prefix Tuning does not work with T5-3B #485
Comments
Hey @saparina, thanks for bringing this up. Your explanation of the issue makes a lot of sense; we'll look into a possible fix for prefix tuning. It's a bit odd however that the constraint
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
Is there any update on this issue?
Sorry for the delay on this, it should be fixed once #621 is merged.
Fixes #485. Allows passing the head dim (`n_embd_per_head`) explicitly to prefix tuning to accommodate models where the head dim is not equal to hidden dim / n_heads.
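A minimal usage sketch of what this could look like, assuming `n_embd_per_head` ends up as a field on `PrefixTuningConfig` (the parameter name comes from the description above, but its exact placement in the API is an assumption, as are the model and adapter names):

```python
# Hypothetical sketch: pass the head dimension explicitly for T5-3B,
# where d_kv (128) != d_model / num_heads (1024 / 32 = 32).
from transformers.adapters import AutoAdapterModel, PrefixTuningConfig

model = AutoAdapterModel.from_pretrained("t5-3b")

config = PrefixTuningConfig(
    prefix_length=30,
    n_embd_per_head=128,  # assumed field name, taken from the PR description
)
model.add_adapter("prefix", config=config)
model.train_adapter("prefix")
```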
Environment info
adapter-transformers version: 3.1.0
Information
Model I am using: T5-3B
Adapter setup I am using (if any): Prefix Tuning
To reproduce
I believe this bug occurs every time T5-3B and Prefix Tuning are used together; for example, it can be reproduced by running the summarization example:
The error is raised when the prefix embeddings should be concatenated with the keys and values:
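The summarization command and the full traceback are omitted above; as a rough stand-in, a minimal sketch that exercises the same code path (adding a Prefix Tuning adapter to T5-3B with adapter-transformers 3.1.0 and running a forward pass; the adapter name and inputs are illustrative) could look like this:

```python
# Minimal sketch to trigger the shape mismatch with T5-3B + Prefix Tuning.
import torch
from transformers.adapters import AutoAdapterModel, PrefixTuningConfig

model = AutoAdapterModel.from_pretrained("t5-3b")
model.add_adapter("prefix", config=PrefixTuningConfig())
model.train_adapter("prefix")

# A forward pass through the attention layers should raise the size mismatch,
# because the prefix states are built from hidden_size (d_model=1024) rather
# than num_heads * d_kv (32 * 128 = 4096).
input_ids = torch.tensor([[0, 1, 2, 3]])
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
```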
Expected behavior
There is a mismatch between the sizes of the prefix embeddings and the keys and values. I think this is because the prefix size is based on `hidden_size`, which is defined as `d_model` for T5 models, while it should be `num_heads * d_kv`. For T5-Small, T5-Base and T5-Large, `d_model == num_heads * d_kv` (and these models work fine), but this is not true for T5-3B, where `d_model=1024`, `num_heads=32` and `d_kv=128`.
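A quick check of the published T5 configurations illustrates the mismatch (a small sketch using the Hugging Face `T5Config`; the values it reports match those quoted above):

```python
from transformers import T5Config

for name in ["t5-small", "t5-base", "t5-large", "t5-3b"]:
    cfg = T5Config.from_pretrained(name)
    print(name, cfg.d_model, cfg.num_heads * cfg.d_kv,
          cfg.d_model == cfg.num_heads * cfg.d_kv)

# For t5-3b this prints d_model=1024 but num_heads * d_kv = 32 * 128 = 4096,
# so prefix states sized by hidden_size cannot be concatenated with the
# per-head key/value states.
```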