Question regarding bit allocation #3

Open
Ali-Flt opened this issue Aug 20, 2024 · 1 comment

Ali-Flt commented Aug 20, 2024

Hi,

In your paper you mention that you allocate 2, 3, or 4 bits to each layer of the model according to a criterion. But in Fig. 1(d) (Construct LUT and Query&Add), the binary weights are shown as 8-bit, which has confused me a bit. Was the figure drawn with 8-bit weights in mind rather than weights of at most 4 bits? Or am I misunderstanding the flow?

Another way I tried to interpret Fig. 1 is that the FP16 Shift and Query&Add blocks run once for every bit of W. For instance, if 3 bits are allocated to a weight W, the ShiftAddLLM block runs 3 times, once per bit of W. In this interpretation, each bit of the 8-bit binary weights in Fig. 1(d) corresponds to one of the activation (x) values.

Could you please elaborate more on how the bit allocation maps to the ShiftAddLLM architecture?
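
To make the second interpretation concrete, here is a minimal NumPy sketch of it. This is only an illustration of the reading described above, not the authors' code: the names `GROUP`, `build_lut`, `lut_matvec`, `bit_planes`, and `alphas` are hypothetical, one scale per output row and bit-plane is an assumed simplification, and the scale is applied with an ordinary multiply instead of the shift shown in Fig. 1.

```python
import numpy as np

GROUP = 8  # assumption: 8 activations share one LUT, matching the 8-bit keys in Fig. 1(d)

def build_lut(x_group):
    """Return all 2**GROUP signed sums of x_group.

    Bit k of the table index selects the sign of x_group[k]: 1 -> +, 0 -> -.
    """
    idx = np.arange(2 ** GROUP)
    bits = (idx[:, None] >> np.arange(GROUP)) & 1   # (256, GROUP), values in {0, 1}
    signs = 2.0 * bits - 1.0                        # map {0, 1} to {-1, +1}
    return signs @ x_group                          # (256,)

def lut_matvec(bit_planes, alphas, x):
    """Compute y = W x with W represented as q binary bit-planes.

    bit_planes: int array (q, out_dim, in_dim) with values in {0, 1}
    alphas:     float array (q, out_dim), one scale per bit-plane and output row
    x:          float array (in_dim,), with in_dim divisible by GROUP
    """
    q, out_dim, in_dim = bit_planes.shape
    y = np.zeros(out_dim)
    place = 1 << np.arange(GROUP)                   # packs 8 bits into one LUT key
    for g in range(0, in_dim, GROUP):
        lut = build_lut(x[g:g + GROUP])             # one LUT per activation group
        for i in range(q):                          # one query-and-add pass per bit of W
            keys = bit_planes[i, :, g:g + GROUP] @ place
            y += alphas[i] * lut[keys]
    return y
```

Under this reading, the 8 bits in the figure would be eight weights' worth of a single bit-plane (one bit per activation in the group), and a layer allocated 3 bits would simply repeat the query-and-add pass three times against the same LUTs.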

@xlim1996

Hi,
Were you able to find the code where they use shift&add during inference?
I could only find code that unpacks the alpha and binary weights back to FP16 format, which contradicts the methodology shown in Figure 1.
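
For clarity, the kind of unpacking I mean looks roughly like the sketch below. It is a guess at the general pattern, not a copy of the repository's code: `dequantize`, `bit_planes`, and `alphas` are hypothetical names, and the per-row scale layout is an assumption.

```python
import numpy as np

def dequantize(bit_planes, alphas):
    """Rebuild a dense FP16 weight matrix from binary planes and scales.

    bit_planes: array (q, out_dim, in_dim) with values in {0, 1}
    alphas:     array (q, out_dim), one scale per bit-plane and output row (assumed layout)
    """
    signs = 2.0 * bit_planes - 1.0                   # map {0, 1} to {-1, +1}
    dense = (alphas[:, :, None] * signs).sum(axis=0) # W = sum_i alpha_i * b_i
    return dense.astype(np.float16)
```

Once the weights are rebuilt this way, inference is an ordinary FP16 matmul, so no shift or LUT lookup is exercised on that path.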

Best regards,
Lucas
