workaround for gpt-j (#395)
Some models initialize tensors during the first forward pass and reuse them in subsequent iterations. This causes the model to recompile. One temporary solution is to run the torch model once before compilation. The related issue is here: CentML/hidet#291
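
For context, below is a minimal sketch of the warm-up pattern this commit applies, assuming a Hugging Face causal LM and a torch.compile-style backend; the model/tokenizer setup and backend shown here are illustrative, not the benchmark's actual code.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Illustrative setup; the benchmark constructs the model and backend differently.
    model_name = 'EleutherAI/gpt-j-6B'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda().eval()
    inputs = tokenizer('hello world', return_tensors='pt')['input_ids'].cuda()

    with torch.no_grad(), torch.autocast('cuda'):
        # Warm-up: run the eager model once so the tensors gpt-j creates
        # lazily in its first forward pass already exist before compilation.
        model(inputs)
        compiled = torch.compile(model)  # the benchmark uses comp_backend.compile(model)
        compiled(inputs)                 # compiled run no longer retraces over lazily created tensors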

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>
zhumakhan and Zhumakhan authored Aug 2, 2024
1 parent 14edb9a commit 6bfd95f
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions tests/benchmarks/bench_transformer.py
@@ -59,6 +59,11 @@ def bench_causal_lm(model_name, bs, genlen, dtype, backend, mode):
    inputs = tokenizer(input_string_batch, return_tensors='pt')['input_ids'].cuda()

    with torch.no_grad(), torch.autocast("cuda"):
        # Temporary workaround for gpt-j
        # gpt-j initializes tensors during the first forward pass,
        # which causes recompilation during the second forward pass
        if model_name == 'EleutherAI/gpt-j-6B':
            model(inputs)
        model = comp_backend.compile(model)
        latency = bench_gen_model(model, tokenizer, inputs, bs=bs, genlen=genlen)
    del model
