Answered by tavviv on May 9, 2024
Aphrodite does not support a 4-bit cache. --load-in-4bit means the model weights are automatically quantized down to 4-bit using SmoothQuant. The quantization itself is done in system RAM.
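
To make the weights-versus-cache distinction concrete, here is a rough back-of-the-envelope sketch of what 4-bit weights save. It is only arithmetic, not anything from Aphrodite itself: the 7B parameter count and the helper names are made up for illustration, and the estimate ignores quantization metadata (scales and similar overhead) and the cache, which is not covered by this flag.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory taken by the model weights alone, in bytes."""
    return num_params * bits_per_weight / 8


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical 7B-parameter model (assumption)
    for bits in (16, 4):
        size = to_gib(weight_bytes(num_params, bits))
        print(f"{bits}-bit weights: ~{size:.1f} GiB")
```

Going from 16-bit to 4-bit cuts the weight footprint to about a quarter, but whatever memory the cache uses comes on top of that and is unaffected.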
Answer selected by houmie