-
Hi everyone, could I get some help loading a llama model on GPU? I am using Kubernetes to deploy local-ai on a p2.8xlarge AWS instance for the GPU. Here is the error message I got from the log when I try to load the llama model (https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf):
From backend llama
From backend llama-cpp
Here are my configurations:
Could you please recommend which llama model would work with CUDA? Thanks, really appreciated.
Replies: 2 comments
-
The model's yaml file and the gguf file both need to be in the models folder. See this how-to on setting that up: https://localai.io/howtos/easy-model/
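For reference, a minimal model definition placed next to the gguf might look like the sketch below. The file name, model name, and gpu_layers value are illustrative assumptions; adapt them to your own setup and LocalAI version.

```yaml
# models/llama-2-7b.yaml -- illustrative example, adjust names and values
name: llama-2-7b                  # the model name you will request via the API
backend: llama                    # or llama-cpp, depending on your LocalAI build
parameters:
  model: llama-2-7b.Q4_0.gguf     # gguf file sitting in the same models folder
context_size: 4096
f16: true                         # use fp16 where supported
gpu_layers: 35                    # layers to offload to the GPU (CUDA)
```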
-
You have to use the URL that points to the raw file. In your case it is https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf and NOT https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf
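As a quick check, you could download the raw file directly into the models folder with the resolve URL. The /models path below is just an assumption for wherever your deployment mounts the models volume:

```sh
# Fetch the raw GGUF via the "resolve" URL (the "blob" URL returns an HTML page, not the model).
# /models is an assumed mount point; adjust to your LocalAI models directory.
wget -O /models/llama-2-7b.Q4_0.gguf \
  "https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf"
```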