Hi, I ran into the issue below when trying to serve GPT2 as described in the guide. Could anyone help me check whether this is a configuration-related error?
$ python examples/inference/api_server_openai/query_http_requests.py
chunk content: {"generated_text":null,"tool_calls":null,"num_input_tokens":null,"num_input_tokens_batch":null,"num_generated_tokens":null,"num_generated_tokens_batch":null,"preprocessing_time":null,"generation_time":null,"timestamp":1715708425.9959323,"finish_reason":null,"error":{"object":"error","message":"Internal Server Error","internal_message":"Internal Server Error","type":"InternalServerError","param":{},"code":500}}
Traceback (most recent call last):
File "/home/rcp_user/yongqiang/llm-on-ray/examples/inference/api_server_openai/query_http_requests.py", line 90, in <module>
raise e
File "/home/rcp_user/yongqiang/llm-on-ray/examples/inference/api_server_openai/query_http_requests.py", line 85, in <module>
choices = json.loads(chunk)["choices"]
~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'choices'
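The KeyError is only the client tripping over the error payload: the chunk above carries an "error" object and no "choices" field, so json.loads(chunk)["choices"] fails. For reference, a small guard that surfaces the server error instead of the KeyError (illustrative only, not the shipped query_http_requests.py):

import json

# Hypothetical guard; the payload below is the failing chunk from the run above,
# trimmed to the relevant fields.
chunk = '{"generated_text": null, "finish_reason": null, "error": {"object": "error", "message": "Internal Server Error", "type": "InternalServerError", "code": 500}}'
payload = json.loads(chunk)
if payload.get("error"):
    print(f"server error {payload['error']['code']}: {payload['error']['message']}")
else:
    choices = payload["choices"]

The real failure is therefore on the server side, so the next step is the replica log.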
Checking the logs with "ray logs cluster":
$ ray logs cluster worker-f07bb5a711a3c88ae7720f125167d0d2a7e64799231f33910f71ac72-01000000-1624109.err
--- Log has been truncated to last 1000 lines. Use `--tail` flag to toggle. Set to -1 for getting the entire file. ---
:job_id:01000000
:actor_name:ServeReplica:router:PredictorDeployment
/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
2024-05-14 13:39:51,629 - _logger.py - IPEX - WARNING - [NotSupported]fail to apply ipex.llm.optimize due to: Could not run 'ipex_prepack::linear_prepack' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'ipex_prepack::linear_prepack' is only available for these backends: [CPU, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
CPU: registered at /opt/workspace/ipex-cpu-dev/csrc/cpu/jit/cpu/kernels/RegisterOpContextClass.cpp:192 [kernel]
Meta: registered at ../aten/src/ATen/core/MetaFallbackKernel.cpp:23 [backend fallback]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:154 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:497 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:324 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:297 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:378 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:244 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:731 [backend fallback]
BatchedNestedTensor: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:758 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:27 [backend fallback]
Batched: registered at ../aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:202 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:162 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:493 [backend fallback]
PreDispatch: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:166 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:158 [backend fallback]
, fallback to the origin model
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
ERROR 2024-05-14 13:40:25,987 router_PredictorDeployment dh04lq3r 442f8811-5cef-45f9-916b-e4e964ef92dc /v1/chat/completions replica.py:359 - Request failed:
ray::ServeReplica:router:PredictorDeployment.handle_request_with_rejection() (pid=1624109, ip=10.97.102.172)
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/ray/serve/_private/utils.py", line 168, in wrap_to_ray_error
raise exception
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1131, in call_user_method
result = await self._handle_user_method_result(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1038, in _handle_user_method_result
async for r in result:
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/llm_on_ray/inference/predictor_deployment.py", line 444, in openai_call
yield await self.handle_non_streaming(input, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/llm_on_ray/inference/predictor_deployment.py", line 242, in handle_non_streaming
return await self.handle_dynamic_batch((input, config))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/ray/serve/batching.py", line 579, in batch_wrapper
return await enqueue_request(args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/ray/serve/batching.py", line 265, in _assign_func_results
results = await func_future
^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/llm_on_ray/inference/predictor_deployment.py", line 269, in handle_dynamic_batch
batch_results = self.predictor.generate(prompts, **config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/yongqiang/llm-on-ray/llm_on_ray/inference/predictors/transformer_predictor.py", line 123, in generate
gen_tokens = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/transformers/generation/utils.py", line 1622, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/transformers/generation/utils.py", line 2791, in _sample
outputs = self(
^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1305, in forward
transformer_outputs = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1119, in forward
outputs = block(
^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 616, in forward
hidden_states = self.ln_1(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/modules/normalization.py", line 201, in forward
return F.layer_norm(
^^^^^^^^^^^^^
File "/home/rcp_user/anaconda3/envs/ray/lib/python3.11/site-packages/torch/nn/functional.py", line 2573, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected scalar type BFloat16 but found Float
INFO 2024-05-14 13:40:25,988 router_PredictorDeployment dh04lq3r 442f8811-5cef-45f9-916b-e4e964ef92dc /v1/chat/completions replica.py:373 - OPENAI_CALL ERROR 751.0ms
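The root cause is the RuntimeError at the bottom of the trace: when generation reaches GPT-2's LayerNorm, the activations and the layer parameters are in different floating dtypes. That is consistent with the earlier warning that ipex.llm.optimize could not run on the CUDA backend and fell back to the original model, so a bf16 precision setting presumably meant for the IPEX/CPU path no longer matches the rest of the pipeline. A minimal, self-contained sketch of the same kind of mismatch (illustrative only, not the llm-on-ray code path):

import torch

# LayerNorm parameters left in float32 while the hidden states arrive in bfloat16.
# Depending on the PyTorch build and device this either raises the same
# "expected scalar type BFloat16 but found Float" RuntimeError or silently
# up-casts; the replica in the log above hits the RuntimeError.
ln = torch.nn.LayerNorm(768)                            # float32 weight and bias
hidden = torch.randn(1, 4, 768, dtype=torch.bfloat16)   # bf16 activations
try:
    ln(hidden)
    print("mixed-dtype layer_norm accepted on this build")
except RuntimeError as e:
    print(f"RuntimeError: {e}")

# Keeping everything in a single dtype avoids the mismatch:
out = ln.to(torch.bfloat16)(hidden)
print(out.dtype)                                        # torch.bfloat16

My guess is that the configured bf16 precision is still being applied even though the IPEX optimization fell back on CUDA, leaving weights and activations in different dtypes, but I am not sure which precision setting the guide intends for this hardware.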
Below is the environment:
$ nvidia-smi
Tue May 14 13:42:53 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:1B:00.0 Off | Off |
| 0% 37C P8 30W / 450W | 17514MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:3D:00.0 Off | Off |
| 0% 38C P8 20W / 450W | 14502MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
I use conda as the virtual environment; the Python version is:
GPT2 configuration: