Replies: 1 comment 1 reply
-
Are you using the same LoRA in both cases?
-
First of all, ExLlamaV2 is a really great module. But there is one problem.
In ExLlama v1 there was a slight slowdown when using a LoRA, but it was only about 10%.
In ExLlamaV2 the LoRA support is welcome, but with a LoRA applied, token generation slows to well under half speed.
(13B on a 4090: without LoRA 80 tokens/s => with LoRA 30 tokens/s)
Does anyone know how to solve this?
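For anyone trying to reproduce this, a minimal timing harness like the one below can rule out measurement noise. This is a generic sketch, not ExLlamaV2 API: `generate` is a hypothetical callable standing in for whatever generation loop you use (with or without the LoRA loaded), and the 80/30 figures are just the numbers reported above.

```python
import time

def tokens_per_second(generate, prompt_ids, max_new_tokens):
    """Time one generation call and return throughput in tokens/s.

    `generate` is assumed to return the number of tokens it produced.
    For GPU backends, make sure the call blocks until generation is
    actually finished (e.g. synchronize the device) before timing ends.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt_ids, max_new_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Sanity-check the slowdown implied by the numbers in the report above:
baseline = 80.0    # tokens/s without LoRA (reported)
with_lora = 30.0   # tokens/s with LoRA (reported)
slowdown = baseline / with_lora
print(f"slowdown: {slowdown:.2f}x")
```

Run the same prompt and `max_new_tokens` for both configurations, and average several runs after a warm-up pass, since the first call often includes one-time allocation cost.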