Commit

Fix number of blocks when profiling contiguous pa (#496)
madamczykhabana authored Nov 14, 2024
1 parent 0548200 commit eca9a83
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion vllm/worker/hpu_model_runner.py
@@ -1121,8 +1121,9 @@ def _prepare_decode(

         padding_fn = None
         if self.use_contiguous_pa:
+            block_bucket_size = max(max(block_list) + 1, len(block_list))
             block_bucket_size = find_bucket(
-                max(block_list) + 1,
+                block_bucket_size,
                 self.bucketing_global_state.decode_block_bucket_cfg)
             indices: List[Any]
             indices = [None] * block_bucket_size
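
To illustrate what the change does to the computed size, here is a minimal sketch. It does not use the real `find_bucket`; `find_bucket_sketch` is a hypothetical stand-in that simply rounds up to a multiple of the bucket step, and the `(min, step, max)` config tuple and the sample block lists are assumptions made for the example. It also assumes, as the commit title suggests, that a profiling run can produce a block list with more entries than its largest block id plus one.

```python
from typing import List, Tuple


def find_bucket_sketch(value: int, config: Tuple[int, int, int]) -> int:
    """Hypothetical stand-in for find_bucket (not the vLLM implementation).

    Rounds value up to the next multiple of the step and never returns
    less than the configured minimum. The maximum is ignored here.
    """
    bmin, bstep, _bmax = config
    rounded = -(-value // bstep) * bstep  # ceiling to a multiple of bstep
    return max(bmin, rounded)


def bucket_size_before(block_list: List[int], cfg: Tuple[int, int, int]) -> int:
    # Old behaviour: only the largest block id determines the bucket.
    return find_bucket_sketch(max(block_list) + 1, cfg)


def bucket_size_after(block_list: List[int], cfg: Tuple[int, int, int]) -> int:
    # New behaviour: the bucket must also cover the number of entries.
    needed = max(max(block_list) + 1, len(block_list))
    return find_bucket_sketch(needed, cfg)


cfg = (128, 128, 1024)  # assumed (min, step, max) values, for illustration only

# Assumed profiling-like input: many entries, all pointing at block 0.
profiling_blocks = [0] * 300
print(bucket_size_before(profiling_blocks, cfg))  # 128
print(bucket_size_after(profiling_blocks, cfg))   # 384

# Assumed ordinary input: few entries, so both versions agree.
regular_blocks = [5, 17, 42]
print(bucket_size_before(regular_blocks, cfg))  # 128
print(bucket_size_after(regular_blocks, cfg))   # 128
```

Under these assumptions the two versions differ only when `len(block_list)` exceeds `max(block_list) + 1`, which is the case the extra `max(...)` in the patch appears to guard against.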