workaround for gpt-j (#395)
Some models initialize tensors during the first forward pass and reuse
them in later iterations. This causes the model to recompile. One temporary
solution is to run the torch model once before compilation. The related issue is
here: CentML/hidet#291

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>
2 people authored and vadiklyutiy committed Dec 19, 2024
1 parent 15ec205 commit 6500c3c
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions tests/benchmarks/bench_transformer.py
@@ -59,6 +59,11 @@ def bench_causal_lm(model_name, bs, genlen, dtype, backend, mode):
    inputs = tokenizer(input_string_batch, return_tensors='pt')['input_ids'].cuda()

    with torch.no_grad(), torch.autocast("cuda"):
        # Temporary workaround for gpt-j
        # gpt-j initializes tensors during the first forward pass
        # which causes recompilation during the second forward pass
        if model_name == 'EleutherAI/gpt-j-6B':
            model(inputs)
        model = comp_backend.compile(model)
        latency = bench_gen_model(model, tokenizer, inputs, bs=bs, genlen=genlen)
        del model
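
Why the eager warm-up call helps can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `LazyModel`, `ToyCompiler`, and `count_compilations` are hypothetical names invented for illustration, and the "compiler" simply counts how often it sees a new model state. It is not how hidet or any real compiler backend tracks state.

```python
class LazyModel:
    """Toy stand-in for a model that creates a tensor on its first forward pass."""

    def __init__(self):
        self.cache = None  # created lazily, like the tensors described in the commit message

    def __call__(self, x):
        if self.cache is None:
            # First forward pass: initialize state that later calls reuse.
            self.cache = [0.5] * len(x)
        return [a + b for a, b in zip(x, self.cache)]


class ToyCompiler:
    """Hypothetical compiler that re-specializes whenever the model state it observes changes."""

    def __init__(self):
        self.compile_count = 0

    def compile(self, model):
        seen_states = set()

        def compiled(x):
            # Guard: has the lazy state been initialized yet?
            state = model.cache is None
            if state not in seen_states:
                seen_states.add(state)
                self.compile_count += 1  # a new state triggers a (re)compilation
            return model(x)

        return compiled


def count_compilations(warm_up):
    """Run two 'compiled' forward passes and report how many compilations happened."""
    model = LazyModel()
    inputs = [1.0, 2.0, 3.0]
    if warm_up:
        model(inputs)  # eager warm-up, analogous to `model(inputs)` in the diff
    compiler = ToyCompiler()
    compiled = compiler.compile(model)
    compiled(inputs)
    compiled(inputs)
    return compiler.compile_count


if __name__ == "__main__":
    print("without warm-up:", count_compilations(warm_up=False))
    print("with warm-up:   ", count_compilations(warm_up=True))
```

In this toy setting, the warm-up moves the one-time state change ahead of compilation, so both compiled calls observe the same state. That mirrors the intent of calling `model(inputs)` before `comp_backend.compile(model)` in the hunk above.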