Replies: 2 comments
-
I have some ideas I want to explore, like Q8, Q3 and Q2 modes, and I expect the kernels could be optimized a bit for performance. But it's not a very high priority right now. I'll probably be focusing on weight quantization, batch performance and a new speculative mode.
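For readers unfamiliar with cache quantization modes like Q4 or Q8: the general idea is to store KV cache values at reduced bit width, with a shared scale per small group of values, and dequantize on the fly. The sketch below is a generic group-wise symmetric 4-bit scheme in numpy, purely illustrative; it is not exl2's actual kernel or storage format, and `group_size` is an assumed parameter.

```python
import numpy as np

def quantize_q4(x, group_size=32):
    # Group-wise symmetric 4-bit quantization: each group of
    # `group_size` values shares one floating-point scale.
    x = x.reshape(-1, group_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0  # map to int range [-7, 7]
    scale[scale == 0] = 1.0                             # avoid division by zero
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q, scale):
    # Reconstruct an approximation of the original values.
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
kv = rng.standard_normal(1024).astype(np.float32)  # stand-in for one cache row
q, s = quantize_q4(kv)
kv_hat = dequantize_q4(q, s)
max_err = float(np.abs(kv - kv_hat).max())
```

With a group scale, the worst-case error per value is half a quantization step, which is why Q4 can stay close to FP16 quality while using a quarter of the cache memory (plus a small overhead for the scales).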
-
Q-Cache is definitely a prime feature of exl2, and more options like Q8 or Q6 sound interesting for sure. Always good to have a use for some extra VRAM at the least. But it's understandable that weight quantization and such have a higher priority right now. Looking forward to seeing exl2 continue to improve!
-
Just noticed that it now works properly at lower context depths too, like FP16. Nice work; it seems much more on par with FP16! Are there any more updates to the Q4 Cache in the works, or do you consider it to be in a good place right now?