Prefix Tuning does not work with T5-3B #485
Comments
Hey @saparina, thanks for bringing this up. Your explanation of the issue makes a lot of sense; we'll look into a possible fix for prefix tuning. It's a bit odd however that the constraint
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
Is there any update on this issue?
Sorry for the delay on this, it should be fixed once #621 is merged.
Fixes #485. Allows passing the head dim (`n_embd_per_head`) explicitly to prefix tuning to accommodate models where the head dim is not equal to hidden dim / n_heads.
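A minimal usage sketch of what this could look like, assuming `n_embd_per_head` ends up as a field on `PrefixTuningConfig` (the parameter name comes from the description above, but its exact placement in the API is an assumption, as are the model and adapter names):

```python
# Hypothetical sketch: pass the head dimension explicitly for T5-3B,
# where d_kv (128) != d_model / num_heads (1024 / 32 = 32).
from transformers.adapters import AutoAdapterModel, PrefixTuningConfig

model = AutoAdapterModel.from_pretrained("t5-3b")

config = PrefixTuningConfig(
    prefix_length=30,
    n_embd_per_head=128,  # assumed field name, taken from the PR description
)
model.add_adapter("prefix", config=config)
model.train_adapter("prefix")
```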
Environment info
adapter-transformers version: 3.1.0
Information
Model I am using: T5-3B
Adapter setup I am using (if any): Prefix Tuning
To reproduce
I believe this bug occurs every time T5-3B and Prefix Tuning are used together; for example, it can be reproduced by running the summarization example:
The error is raised when the prefix embeddings should be concatenated with the keys and values:
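The summarization command and the full traceback are omitted above; as a rough stand-in, a minimal sketch that exercises the same code path (adding a Prefix Tuning adapter to T5-3B with adapter-transformers 3.1.0 and running a forward pass; the adapter name and inputs are illustrative) could look like this:

```python
# Minimal sketch to trigger the shape mismatch with T5-3B + Prefix Tuning.
import torch
from transformers.adapters import AutoAdapterModel, PrefixTuningConfig

model = AutoAdapterModel.from_pretrained("t5-3b")
model.add_adapter("prefix", config=PrefixTuningConfig())
model.train_adapter("prefix")

# A forward pass through the attention layers should raise the size mismatch,
# because the prefix states are built from hidden_size (d_model=1024) rather
# than num_heads * d_kv (32 * 128 = 4096).
input_ids = torch.tensor([[0, 1, 2, 3]])
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
```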
Expected behavior
There is a mismatch between the sizes of the prefix embeddings and the keys and values. I think this is because the prefix size is based on `hidden_size`, which is defined as `d_model` for T5 models, while it should be `num_heads * d_kv`. For T5-Small, T5-Base and T5-Large, `d_model == num_heads * d_kv` (and these models work fine), but this is not true for T5-3B, where `d_model=1024`, `num_heads=32` and `d_kv=128`.
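A quick check of the published T5 configurations illustrates the mismatch (a small sketch using the Hugging Face `T5Config`; the values it reports match those quoted above):

```python
from transformers import T5Config

for name in ["t5-small", "t5-base", "t5-large", "t5-3b"]:
    cfg = T5Config.from_pretrained(name)
    print(name, cfg.d_model, cfg.num_heads * cfg.d_kv,
          cfg.d_model == cfg.num_heads * cfg.d_kv)

# For t5-3b this prints d_model=1024 but num_heads * d_kv = 32 * 128 = 4096,
# so prefix states sized by hidden_size cannot be concatenated with the
# per-head key/value states.
```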