workaround for gpt-j (#395)
Some models initialize tensors during the first forward pass and reuse them in subsequent iterations. This causes the model to recompile. One temporary solution is to run the torch model once before compilation. The related issue is here: CentML/hidet#291
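
For context, below is a minimal sketch of the warm-up pattern this commit applies, assuming a Hugging Face causal LM and a torch.compile-style backend; the model/tokenizer setup and backend shown here are illustrative, not the benchmark's actual code.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Illustrative setup; the benchmark constructs the model and backend differently.
    model_name = 'EleutherAI/gpt-j-6B'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda().eval()
    inputs = tokenizer('hello world', return_tensors='pt')['input_ids'].cuda()

    with torch.no_grad(), torch.autocast('cuda'):
        # Warm-up: run the eager model once so the tensors gpt-j creates
        # lazily in its first forward pass already exist before compilation.
        model(inputs)
        compiled = torch.compile(model)  # the benchmark uses comp_backend.compile(model)
        compiled(inputs)                 # compiled run no longer retraces over lazily created tensors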

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>
zhumakhan and Zhumakhan authored Aug 2, 2024
1 parent 14edb9a commit 6bfd95f
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions tests/benchmarks/bench_transformer.py
@@ -59,6 +59,11 @@ def bench_causal_lm(model_name, bs, genlen, dtype, backend, mode):
    inputs = tokenizer(input_string_batch, return_tensors='pt')['input_ids'].cuda()

    with torch.no_grad(), torch.autocast("cuda"):
        # Temporary workaround for gpt-j
        # gpt-j initializes tensors during the first forward pass,
        # which causes recompilation during the second forward pass
        if model_name == 'EleutherAI/gpt-j-6B':
            model(inputs)
        model = comp_backend.compile(model)
        latency = bench_gen_model(model, tokenizer, inputs, bs=bs, genlen=genlen)
    del model
