
T5 + FlexAttention

This project implements T5's bucketed relative position bias using FlexAttention, a PyTorch API (new in 2.5) for writing flexible and efficient attention variants.
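
FlexAttention expresses attention variants as a `score_mod` function applied to each query-key score. Below is a minimal, self-contained sketch of the idea; names and shapes are illustrative, not this repo's exact code (see `gym.py` and `patch_hf_t5.py` for that):

```python
import math
import torch
from torch.nn.attention.flex_attention import flex_attention

def t5_bucket(rel_pos, num_buckets=32, max_distance=128):
    # Bidirectional bucketing, following HF's T5Attention._relative_position_bucket:
    # half the buckets for each direction, exact buckets for short distances,
    # logarithmically spaced buckets for long ones.
    num_buckets //= 2
    out = (rel_pos > 0).long() * num_buckets
    rel_pos = rel_pos.abs()
    max_exact = num_buckets // 2
    if_large = max_exact + (
        torch.log(rel_pos.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    if_large = torch.minimum(if_large, torch.full_like(if_large, num_buckets - 1))
    return out + torch.where(rel_pos < max_exact, rel_pos, if_large)

B, H, L, D, NUM_BUCKETS = 1, 8, 1024, 64, 32
device = "cuda"
q, k, v = (torch.randn(B, H, L, D, device=device) for _ in range(3))
bias = torch.randn(H, NUM_BUCKETS, device=device)  # a learned table in the real model

pos = torch.arange(L, device=device)
buckets = t5_bucket(pos[None, :] - pos[:, None])  # [L, L], key_pos - query_pos

def score_mod(score, b, h, q_idx, kv_idx):
    # Add the per-head bias for this (query, key) pair's distance bucket.
    return score + bias[h, buckets[q_idx, kv_idx]]

# T5 does not scale scores by 1/sqrt(d), hence scale=1.0. Compiling
# flex_attention also avoids the uncompiled memory leak noted below.
out = torch.compile(flex_attention)(q, k, v, score_mod=score_mod, scale=1.0)
```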

Prerequisites

  • PyTorch (version 2.5 or higher)
  • Hugging Face Transformers

Attention Gym

gym.py implements T5's relative position bias as a FlexAttention score_mod, using the Attention Gym repository for visualization.

[Visualizations of the position bias produced by gym.py]

Integrate FlexAttention into Hugging Face T5

  • A simple monkey-patching implementation is provided in patch_hf_t5.py.
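
For illustration, an assumed usage pattern follows; the actual entry point in patch_hf_t5.py may differ (it may expose a patch function instead of patching on import):

```python
import torch
from transformers import T5EncoderModel

# Assumption: importing the module patches T5Attention.forward to use
# FlexAttention. Check patch_hf_t5.py for the real mechanism.
import patch_hf_t5

model = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl").cuda()
model = torch.compile(model)  # compiling the whole model performs best (see benchmarks)
```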

Check for correctness

python test_correctness.py

Note: trained weights can cause more than a 1% difference in the output, so some tests use re-initialized weights. (Notice in the visualizations above that some biases are relatively large, around 8.)
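
A sketch of the kind of comparison test_correctness.py makes; the metric and tolerance here are assumptions, and the tensors are stand-ins:

```python
import torch

def max_rel_err(out: torch.Tensor, ref: torch.Tensor) -> float:
    # Largest absolute deviation, relative to the reference's magnitude.
    return ((out - ref).abs().max() / ref.abs().max()).item()

# Large bias values (~8) in trained weights amplify floating-point
# differences; re-initialized weights keep the comparison under 1%.
ref = torch.randn(2, 1024, 512)           # stand-in for baseline hidden states
out = ref + 1e-4 * torch.randn_like(ref)  # stand-in for the patched output
assert max_rel_err(out, ref) < 0.01
```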

Benchmark

bash run.sh # runs `benchmark.py` a few times and generates the output files listed below.
  • output files (a sketch of how these artifacts can be produced follows this list):

    • trace*.json: open chrome://tracing/ in Chrome, click "Load", and select the JSON file.
    • memory_snapshot*.json: go to https://pytorch.org/memory_viz and drag and drop the JSON file.
    • log.txt: benchmark results.
  • setting: B=1, L=1024, model=google/t5-v1_1-xxl
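
The trace and memory-snapshot files can be produced with PyTorch's profiler and memory-history APIs; benchmark.py's actual setup may differ, and the model here is a stand-in:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the T5 model
x = torch.randn(1, 1024, device="cuda")

torch.cuda.memory._record_memory_history()  # start recording allocations
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)
    torch.cuda.synchronize()
prof.export_chrome_trace("trace_example.json")  # open in chrome://tracing/
torch.cuda.memory._dump_snapshot("memory_snapshot_example.json")  # pytorch.org/memory_viz
```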


📊 Performance Table

| Configuration (B=1, L=1024, model=google/t5-v1_1-xxl) | Time (RTX3090) | Time (A100) |
| --- | --- | --- |
| Baseline | 294 ms | 156 ms |
| Baseline + Compile | 174 ms | 63 ms |
| Flex Attention + Compile | 160 ms | 55 ms |
| (w/ mask) Baseline | 294 ms | 157 ms |
| (w/ mask) Baseline + Compile | 166 ms | 54 ms |
| (w/ mask) Flex Attention + Compile | 147 ms | 47 ms |

Limitations

  1. Tensors captured by score_mod do not yet support gradients (as of Sep 2024, torch nightly).
  2. There is a memory leak when running flex_attention() without torch.compile.
📊 Details

| Configuration (B=1, L=1024, model=google/t5-v1_1-xxl) | Time (RTX3090) | Time (A100) |
| --- | --- | --- |
| Baseline | 294 ms | 156 ms |
| Baseline + Compile | 174 ms | 63 ms |
| Flex Attention + no Compile | memory leak | memory leak |
| Flex Attention + Compile only flex_attention | 185 ms | 70 ms |
| Flex Attention + Compile only forward | 186 ms | 73 ms |
| Flex Attention + Compile whole model | 160 ms | 55 ms |

Run `bash run_memory_leak.sh` to reproduce the memory-leak results. The three compile granularities are sketched below.
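
Illustrative sketch of the three compile granularities in the table; `model` stands for a loaded HF T5 with the flex patch applied:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

flex_attention = torch.compile(flex_attention)  # compile only flex_attention
# model.forward = torch.compile(model.forward)  # compile only forward
# model = torch.compile(model)                  # compile the whole model (fastest)
```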

License

This project is dual-licensed:

  • MIT License: All files except gym.py
  • BSD 3-Clause License: gym.py (adapted from Attention Gym)
