(Explanation from a random person on the internet, so take it for what it's worth.) GGML is a machine learning library; it's also a file format that some apps (like llama.cpp) use, and those apps are generally built on the GGML library. Both the GGML repo and the llama.cpp repo have examples of use. My understanding is that GGML the library (and this repo) is focused more on the general machine learning side: it moves slower than the llama.cpp repo and has fewer bleeding-edge features, but it supports more types of models, Whisper for example. The llama.cpp repo is focused more on running inference with LLaMA-based models. If you want to run LLaMA-based models, you'll probably find that the examples or the llama.cpp library give you more features and better performance most of the time.
llama.cpp has its own copy of the GGML library, but changes get synchronized back and forth from time to time. llama.cpp appears to be a testbed for new features; once they're proven stable/beneficial, they'll probably get merged back here. So I think it's reasonable to think of the two projects as "part of an overall project ecosystem", even if they aren't necessarily in sync at any given moment.
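On the point about running LLaMA-based models: here's a minimal sketch using the llama-cpp-python bindings (the package mentioned in the reply below). The model path is a placeholder; it assumes you already have a GGML-format LLaMA model converted on disk.

```python
from llama_cpp import Llama

# Placeholder path -- point this at whatever GGML-format LLaMA model you have converted.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

# Simple completion call; the prompt and max_tokens are just for illustration.
output = llm("I believe the meaning of life is", max_tokens=64)
print(output["choices"][0]["text"])
```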
---
My sense was that ggml is the converter/quantizer utility and llama.cpp is the "app" (server, Docker, etc.). But after sitting with both projects for a while, I'm not sure I pegged it right. E.g., I originally thought you could only run inference from within llama.cpp, given all the app-ecosystem stuff going on there (llama_cpp_python, the CLI, the Dockerfile, etc.). But I can run inference with ggml (e.g., from the README via `make -j && ./bin/stablelm -m ./stablelm-base-alpha-3b/ggml-model-f16.bin -p "I believe the meaning of life is" -t 8 -n 64`). If I wanted that in my web project, I could `subprocess.run(...)` it from the Python script. Granted, shelling out via subprocess is pretty crude when llama_cpp_python is right over there, but llama.cpp doesn't yet support NeoX-based models from what I gather (StableLM, RedPajama, Dolly v2), which are the best bet for commercial applications. So subprocess seems like the best solution here? Given that difference in model support, I wonder now whether these are two separate projects rather than two parts of an overall "project ecosystem"?

Edit: not trying to compare the repos on quality, just trying to wrap my mind around them.
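For concreteness, a minimal sketch of that subprocess approach, assuming the stablelm example binary has been built as in the README; the paths and the small `generate` helper are just placeholders:

```python
import subprocess

# Placeholder paths -- adjust to wherever the ggml examples were built
# and where the converted model lives.
STABLELM_BIN = "./bin/stablelm"
MODEL_PATH = "./stablelm-base-alpha-3b/ggml-model-f16.bin"

def generate(prompt: str, n_tokens: int = 64, threads: int = 8) -> str:
    """Shell out to the stablelm example binary and return its stdout."""
    result = subprocess.run(
        [STABLELM_BIN, "-m", MODEL_PATH, "-p", prompt,
         "-t", str(threads), "-n", str(n_tokens)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

print(generate("I believe the meaning of life is"))
```

(Depending on where the example binary writes its progress/logging output, you may need to filter stdout/stderr before using the generated text in a real web project.)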