Replies: 1 comment 1 reply
-
Are you using the same LoRA in both cases?
-
First of all, ExLlamaV2 is a really great module. But there is one problem.
In ExLlama v1 there was a slight slowdown when using a LoRA, but it was only about 10%.
In ExLlamaV2 the LoRA support is welcome, but with a LoRA applied, token generation slows to well under half speed.
(13B on a 4090: without LoRA 80 tokens/s => with LoRA 30 tokens/s)
Does anyone know how to solve this?
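For anyone trying to reproduce this, a minimal timing harness like the one below can rule out measurement noise. This is a generic sketch, not ExLlamaV2 API: `generate` is a hypothetical callable standing in for whatever generation loop you use (with or without the LoRA loaded), and the 80/30 figures are just the numbers reported above.

```python
import time

def tokens_per_second(generate, prompt_ids, max_new_tokens):
    """Time one generation call and return throughput in tokens/s.

    `generate` is assumed to return the number of tokens it produced.
    For GPU backends, make sure the call blocks until generation is
    actually finished (e.g. synchronize the device) before timing ends.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt_ids, max_new_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Sanity-check the slowdown implied by the numbers in the report above:
baseline = 80.0    # tokens/s without LoRA (reported)
with_lora = 30.0   # tokens/s with LoRA (reported)
slowdown = baseline / with_lora
print(f"slowdown: {slowdown:.2f}x")
```

Run the same prompt and `max_new_tokens` for both configurations, and average several runs after a warm-up pass, since the first call often includes one-time allocation cost.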