Dear Lyogavin,
Thanks for your wonderful work. I have a question: does AirLLM run faster than llama.cpp? Do you have any data on that?
As far as I know, llama.cpp uses mmap to manage memory. When computation hits a page fault, the OS automatically loads the required tensor weights from disk into memory and computation continues; it also evicts less-used pages when memory pressure is high, all managed by the OS. So llama.cpp can also run very large LLMs, similar to the capability AirLLM provides.
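For reference, this is the mmap access pattern I have in mind, sketched in Python rather than llama.cpp's C/C++; the file name, offset, and tensor shape are made up for illustration and are not llama.cpp's actual GGUF layout:

```python
import mmap
import numpy as np

# Hypothetical single-file weight layout, just to illustrate lazy paging.
WEIGHTS_PATH = "weights.bin"

with open(WEIGHTS_PATH, "rb") as f:
    # Map the whole file; no data is read from disk yet.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # View a tensor at a known (hypothetical) offset/shape without copying.
    # The OS pages the bytes in on first access (a page fault) and may
    # evict them later under memory pressure -- no explicit load/unload code.
    layer0_weights = np.frombuffer(
        mm, dtype=np.float16, count=4096 * 4096, offset=0
    ).reshape(4096, 4096)

    x = np.random.randn(4096).astype(np.float16)
    y = layer0_weights @ x  # touching the pages triggers the actual disk reads
```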
I noticed that AirLLM uses prefetching to overlap disk I/O latency with computation. Will this be faster than llama.cpp (with mmap enabled)? And how large is the improvement?
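This is roughly the prefetch/compute overlap I understand AirLLM to be doing: a double-buffering sketch where the next layer is loaded in a background thread while the current layer computes. The `load_layer`/`apply_layer` stand-ins and the layer count are hypothetical, not AirLLM's actual code:

```python
import concurrent.futures as futures
import numpy as np

NUM_LAYERS = 4  # hypothetical; real models have many more layers

def load_layer(i):
    """Stand-in for reading one layer's weights from disk (e.g. np.load)."""
    return np.random.randn(1024, 1024).astype(np.float16)

def apply_layer(weights, x):
    """Stand-in for the per-layer computation."""
    return weights @ x

x = np.random.randn(1024).astype(np.float16)

with futures.ThreadPoolExecutor(max_workers=1) as pool:
    next_weights = pool.submit(load_layer, 0)              # start loading layer 0
    for i in range(NUM_LAYERS):
        weights = next_weights.result()                     # wait for current layer's weights
        if i + 1 < NUM_LAYERS:
            next_weights = pool.submit(load_layer, i + 1)   # prefetch the next layer...
        x = apply_layer(weights, x)                         # ...while computing this one
```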