How to run my custom model #1992
-
Hi! I swear I have read the docs, but I still cannot make LocalAI use my model. Let's suppose we are using mlabonne's fantastic NeuralHermes. I have copied the GGUF file into `models`, and it seems to be there:

{"object":"list","data":[{"id":"gpt-4","object":"model"},{"id":"gpt-4-vision-preview","object":"model"},{"id":"stablediffusion","object":"model"},{"id":"text-embedding-ada-002","object":"model"},{"id":"tts-1","object":"model"},{"id":"whisper-1","object":"model"},{"id":"MODEL_CARD","object":"model"},{"id":"llava-v1.6-7b-mmproj-f16.gguf","object":"model"},{"id":"neuralhermes-2.5-mistral-7b.Q6_K.gguf","object":"model"},{"id":"voice-en-us-amy-low.tar.gz","object":"model"}]}

However, when I call it I get an error. I have tried to set up a YAML file, but the instructions are unclear. My machine is … With the GPT-4 model, which seems to be an Hermes-2-Pro-Mistral-7B.Q6_K.gguf (a precursor of NeuralHermes), it works. What am I doing wrong?
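For reference, a minimal model-definition YAML placed next to the GGUF file in the `models` directory usually looks something like the sketch below. This is an assumption based on LocalAI's documented config schema, not taken from this thread; the file name, the chosen model name, and the ChatML template are illustrative:

```yaml
# neuralhermes.yaml — hypothetical minimal config (field names assumed to
# follow LocalAI's YAML schema; verify against your installed version)
name: neuralhermes            # the model name you use in API requests
backend: llama-cpp            # pin the backend to skip the try-every-backend loop
parameters:
  model: neuralhermes-2.5-mistral-7b.Q6_K.gguf   # file inside the models dir
context_size: 4096
template:
  chat: chatml                # assumed: NeuralHermes expects a ChatML-style prompt
```

With such a file in place, requests would target `"model": "neuralhermes"` rather than the raw GGUF file name.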
-
@DavidGOrtega can you show the LocalAI logs with debug enabled? (just set …)
-
@mudler here you are. It tries to load the model with every single backend. api_1 | 10:34PM DBG Request received: {"model":"neuralhermes-2.5-mistral-7b.Q6_K.gguf","language":"","n":0,"top_p":null,"top_k":null,"temperature":0.7,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":"A long time ago in a galaxy far, far away","instruction":"","input":null,"stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
api_1 | 10:34PM DBG `input`: &{PredictionOptions:{Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf Language: N:0 TopP:<nil> TopK:<nil> Temperature:0xc000247950 Maxtokens:<nil> Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:<nil> TypicalP:<nil> Seed:<nil> NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Context:context.Background.WithCancel Cancel:0x4ab9a0 File: ResponseFormat:{Type:} Size: Prompt:A long time ago in a galaxy far, far away Instruction: Input:<nil> Stop:<nil> Messages:[] Functions:[] FunctionCall:<nil> Tools:[] ToolsChoice:<nil> Stream:false Mode:0 Step:0 Grammar: JSONFunctionGrammarObject:<nil> Backend: ModelBaseName:}
api_1 | 10:34PM DBG Parameter Config: &{PredictionOptions:{Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf Language: N:0 TopP:0xc000247cc8 TopK:0xc000247ce0 Temperature:0xc000247950 Maxtokens:0xc000247cf0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000247d28 TypicalP:0xc000247d20 Seed:0xc000247d40 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:0xc000247cc0 Threads:0xc000247cb8 Debug:0xc000247d38 Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[A long time ago in a galaxy far, far away] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000247d08 MirostatTAU:0xc000247d00 Mirostat:0xc000247cf8 NGPULayers:0xc000247d30 MMap:0xc000247d38 MMlock:0xc000247d39 LowVRAM:0xc000247d39 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000247cb0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
api_1 | 10:34PM INF Trying to load the model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/bark/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/exllama2/run.sh
api_1 | 10:34PM INF [llama-cpp] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend llama-cpp
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: llama-cpp): {backendString:llama-cpp model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:43255'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager4277440127
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stdout Server listening on 127.0.0.1:43255
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stderr gguf_init_from_file: invalid magic characters '<!do'
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stderr llama_model_load: error loading model: llama_model_loader: failed to load model from /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stderr
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stderr llama_load_model_from_file: failed to load model
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stderr llama_init_from_gpt_params: error: failed to load model '/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf'
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:43255): stdout {"timestamp":1712874877,"level":"ERROR","function":"load_model","line":464,"message":"unable to load model","model":"/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf"}
api_1 | 10:34PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
api_1 | 10:34PM INF [llama-ggml] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend llama-ggml
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: llama-ggml): {backendString:llama-ggml model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-ggml
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:39703'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager417879904
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr 2024/04/11 22:34:37 gRPC Server listening at 127.0.0.1:39703
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr create_gpt_params: loading model /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr ggml_init_cublas: found 1 CUDA devices:
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr Device 0: NVIDIA L4, compute capability 8.9
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr llama.cpp: loading model from /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr error loading model: unknown (magic, version) combination: 6f64213c, 70797463; is this really a GGML file?
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr llama_load_model_from_file: failed to load model
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr llama_init_from_gpt_params: error: failed to load model '/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf'
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:39703): stderr load_binding_model: error: unable to load model
api_1 | 10:34PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
api_1 | 10:34PM INF [gpt4all] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend gpt4all
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: gpt4all): {backendString:gpt4all model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gpt4all
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:40211'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager1421742771
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:40211): stderr 2024/04/11 22:34:39 gRPC Server listening at 127.0.0.1:40211
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:40211): stderr load_model: error 'Model format not supported (no matching implementation found)'
api_1 | 10:34PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
api_1 | 10:34PM INF [bert-embeddings] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend bert-embeddings
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: bert-embeddings): {backendString:bert-embeddings model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:40185'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager1707657881
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:40185): stderr 2024/04/11 22:34:41 gRPC Server listening at 127.0.0.1:40185
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:40185): stderr bert_load_from_file: invalid model file '/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf' (bad magic)
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:40185): stderr bert_bootstrap: failed to load model from '/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf'
api_1 | 10:34PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
api_1 | 10:34PM INF [rwkv] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend rwkv
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: rwkv): {backendString:rwkv model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/rwkv
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:44961'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager1380818262
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr 2024/04/11 22:34:43 gRPC Server listening at 127.0.0.1:44961
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv_file_format.inc:93: header.magic == 0x67676d66
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr Invalid file header
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv_model_loading.inc:158: rwkv_fread_file_header(file.file, model.header)
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv.cpp:63: rwkv_load_model_from_file(file_path, *ctx->model)
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:44961): stderr 2024/04/11 22:34:45 InitFromFile /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf failed
api_1 | 10:34PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
api_1 | 10:34PM INF [whisper] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend whisper
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: whisper): {backendString:whisper model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:37731'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager2190494607
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:37731): stderr 2024/04/11 22:34:45 gRPC Server listening at 127.0.0.1:37731
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:37731): stderr whisper_init_from_file_with_params_no_state: loading model from '/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf'
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:37731): stderr whisper_model_load: loading model
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:37731): stderr whisper_model_load: invalid model data (bad magic)
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:37731): stderr whisper_init_with_params_no_state: failed to load model
api_1 | 10:34PM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
api_1 | 10:34PM INF [stablediffusion] Attempting to load
api_1 | 10:34PM INF Loading model 'neuralhermes-2.5-mistral-7b.Q6_K.gguf' with backend stablediffusion
api_1 | 10:34PM DBG Loading model in memory from file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf
api_1 | 10:34PM DBG Loading Model neuralhermes-2.5-mistral-7b.Q6_K.gguf with gRPC (file: /build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf) (backend: stablediffusion): {backendString:stablediffusion model:neuralhermes-2.5-mistral-7b.Q6_K.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000206000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
api_1 | 10:34PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion
api_1 | 10:34PM DBG GRPC Service for neuralhermes-2.5-mistral-7b.Q6_K.gguf will be running at: '127.0.0.1:40629'
api_1 | 10:34PM DBG GRPC Service state dir: /tmp/go-processmanager1314292932
api_1 | 10:34PM DBG GRPC Service Started
api_1 | 10:34PM DBG GRPC(neuralhermes-2.5-mistral-7b.Q6_K.gguf-127.0.0.1:40629): stderr 2024/04/11 22:34:47 gRPC Server listening at 127.0.0.1:40629
api_1 | 10:34PM DBG GRPC Service Ready
api_1 | 10:34PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:neuralhermes-2.5-mistral-7b.Q6_K.gguf ContextSize:512 Seed:1881277282 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/neuralhermes-2.5-mistral-7b.Q6_K.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
api_1 | 10:34PM INF [stablediffusion] Loads OK
api_1 | [92.119.140.132]:62353 500 - POST /v1/completions |
-
LocalAI's gpt4 model, which is a precursor of this model, works.
-
Silly me, I downloaded the wrong file: the HF raw (web page) link rather than the actual GGUF binary 🤦. The clue is clearly visible in the logs — `invalid magic characters '<!do'` is the start of an HTML `<!doctype ...>` page, not a GGUF header.
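A quick way to catch this class of mistake is to inspect the first bytes of the downloaded file before pointing LocalAI at it: a valid GGUF file begins with the ASCII magic `GGUF`, while an accidentally saved web page begins with `<!do`, exactly as the log reported. A small sketch (the file name is illustrative):

```shell
# Simulate the failure mode: an HTML page saved where a model file was expected.
printf '<!doctype html><html>...</html>' > fake-model.gguf

# Inspect the first four bytes — a real GGUF model would print "GGUF" here.
head -c 4 fake-model.gguf   # prints: <!do
echo
```

If the four bytes are anything other than `GGUF`, re-download the file using the actual file link (on Hugging Face, the "download" link, not the page URL).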
-
How does HF work with the Docker configs?
-
You can put the HF URL in the YAML and the model will be downloaded from HF, I think.
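If memory serves from the LocalAI docs, that would look roughly like the sketch below; the `huggingface://` URI scheme, its `<owner>/<repo>/<file>` form, and the repo path shown are all assumptions to verify against your LocalAI version:

```yaml
# Hypothetical config — assumes this LocalAI version resolves huggingface:// URIs
name: neuralhermes
backend: llama-cpp
parameters:
  # assumed URI form: huggingface://<owner>/<repo>/<file>
  model: huggingface://mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF/neuralhermes-2.5-mistral-7b.Q6_K.gguf
```

On first load the server would then fetch the GGUF file itself, avoiding the hand-download step (and the raw-link mistake) entirely.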
-
Thinking about ways to patch this installation method for reranker compatibility (especially for Windows): https://docs.dify.ai/tutorials/model-configuration/localai