[change] Reduce calculated bit width of AV
- This assumes that the attention is distributed among different values rather than concentrated on one token
Xeratec committed Oct 30, 2024
1 parent 5942086 commit 93c3986
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion PyITA/ITA.py
@@ -259,7 +259,7 @@ def _initialize_quantization_parameters(self):
 elif i == 3: # QK
     max_bit_width = np.log2(self.requant_eps_mult[i, :].astype(np.uint32) * self.P * 2**8).astype(np.uint32)
 elif i == 4: # AV
-    max_bit_width = np.log2(self.requant_eps_mult[i, :].astype(np.uint32) * self.S * 2**8).astype(np.uint32)
+    max_bit_width = np.log2(self.requant_eps_mult[i, :].astype(np.uint32) * self.S * 2**5).astype(np.uint32)
 elif i == 5: # OW
     max_bit_width = np.log2(self.requant_eps_mult[i, :].astype(np.uint32) * self.E * 2**9).astype(np.uint32)
 elif i == 6: # Sum OW
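To make the numeric effect of this change concrete, here is a minimal sketch of the arithmetic only. It uses the standard library instead of NumPy, and the function name `max_bit_width`, the parameter `headroom_bits`, and the sample values for `S` and the multipliers are hypothetical, not taken from PyITA. Because `2**8 / 2**5 == 2**3`, lowering the power-of-two factor should reduce the truncated `log2` result for the AV step by 3 bits, whatever the multiplier and `S` are.

```python
import math


def max_bit_width(eps_mult: int, dim: int, headroom_bits: int) -> int:
    """Truncated log2 of eps_mult * dim * 2**headroom_bits.

    int() truncates a positive float toward zero, which mirrors
    np.log2(...).astype(np.uint32) in the diff for positive inputs.
    """
    return int(math.log2(eps_mult * dim * 2**headroom_bits))


# Hypothetical values for illustration only (not from PyITA).
S = 64
eps_mults = [1, 3, 52, 255]

old = [max_bit_width(m, S, 8) for m in eps_mults]  # factor before the commit
new = [max_bit_width(m, S, 5) for m in eps_mults]  # factor after the commit
diffs = [o - n for o, n in zip(old, new)]

for m, o, n in zip(eps_mults, old, new):
    print(f"eps_mult={m:3d}: old={o} bits, new={n} bits")
```

The sketch covers only the bit-width arithmetic. Whether the smaller bound is safe depends on the assumption in the commit message, namely that attention is spread over several values instead of resting on a single token.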
