(Explanation from a random person on the internet, so take it for what it's worth.) GGML is a machine learning library; it's also a file format that some apps (like llama.cpp) use, and those apps are generally built on the GGML library. Both the GGML repo and the llama.cpp repo have examples of use. My understanding is that GGML the library (and this repo) is focused more on the general machine learning side: it moves slower than the llama.cpp repo and has fewer bleeding-edge features, but it supports more types of models, Whisper for example. The llama.cpp repo is focused more on running inference with LLaMA-based models. If you want to run LLaMA-based models, you'll probably find that the examples or the llama.cpp library give you more features and better performance most of the time.
llama.cpp has its own copy of the GGML library, but changes get synchronized back and forth from time to time. llama.cpp appears to be a testbed for new features; once they're proven stable/beneficial, they'll probably get merged back here. So I think it's reasonable to think of the two projects as "part of an overall project ecosystem", even if they aren't necessarily in sync at any given moment.
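On the point about running LLaMA-based models: here's a minimal sketch using the llama-cpp-python bindings (the package mentioned in the reply below). The model path is a placeholder; it assumes you already have a GGML-format LLaMA model converted on disk.

```python
from llama_cpp import Llama

# Placeholder path -- point this at whatever GGML-format LLaMA model you have converted.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

# Simple completion call; the prompt and max_tokens are just for illustration.
output = llm("I believe the meaning of life is", max_tokens=64)
print(output["choices"][0]["text"])
```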
---
My sense was that ggml is the converter/quantizer utility and llama.cpp is the "app" (server, Docker, etc.). But after sitting with both projects for a while, I'm not sure I pegged it right. E.g., I originally thought you could only run inference from within llama.cpp, given all the app-ecosystem stuff going on there (llama_cpp_python, the CLI, the Dockerfile, etc.). But I can run inference with ggml (e.g., from the README via `make -j && ./bin/stablelm -m ./stablelm-base-alpha-3b/ggml-model-f16.bin -p "I believe the meaning of life is" -t 8 -n 64`). If I wanted that in my web project, I could `subprocess.run(...)` it from the Python script. Granted, shelling out via subprocess is pretty crude when llama_cpp_python is right over there, but llama.cpp doesn't yet support NeoX-based models from what I gather (StableLM, RedPajama, Dolly v2), which are the best bet for commercial applications. So subprocess seems like the best solution here? Given that difference in model support, I wonder now whether these are two separate projects rather than two parts of an overall "project ecosystem"?

Edit: not trying to compare the repos on quality, just trying to wrap my mind around them.
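For concreteness, a minimal sketch of that subprocess approach, assuming the stablelm example binary has been built as in the README; the paths and the small `generate` helper are just placeholders:

```python
import subprocess

# Placeholder paths -- adjust to wherever the ggml examples were built
# and where the converted model lives.
STABLELM_BIN = "./bin/stablelm"
MODEL_PATH = "./stablelm-base-alpha-3b/ggml-model-f16.bin"

def generate(prompt: str, n_tokens: int = 64, threads: int = 8) -> str:
    """Shell out to the stablelm example binary and return its stdout."""
    result = subprocess.run(
        [STABLELM_BIN, "-m", MODEL_PATH, "-p", prompt,
         "-t", str(threads), "-n", str(n_tokens)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

print(generate("I believe the meaning of life is"))
```

(Depending on where the example binary writes its progress/logging output, you may need to filter stdout/stderr before using the generated text in a real web project.)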