Getting error while executing query_openai_sdk.py to test the inference #66
Comments
@yutianchen666 Could you help reproduce the issue? I am not sure if the OpenAI version is causing an API break.
I used openai==0.28, since the latest version gave an error and recommended using this version.
OK, I'll reproduce it soon.
@dkiran1 Thank you for reporting this. If you want to use the OpenAI-compatible SDK, please remove the --simple parameter. After serving, set the OPENAI_API_BASE and OPENAI_API_KEY environment variables before running query_openai_sdk.py.
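For reference, a minimal sketch of that flow with openai==0.28; the base URL and key below are placeholders (use whatever serve.py reports for your deployment), and whether the key is actually checked depends on the server:
# start the server WITHOUT --simple first, e.g.:
#   python inference/serve.py --config_file inference/models/falcon-7b.yaml
import openai

openai.api_base = "http://127.0.0.1:8000/v1"  # placeholder; point at the OpenAI-compatible route the server exposes
openai.api_key = "not-needed"                 # any non-empty string if the server does not verify keys

# same call that query_openai_sdk.py makes in the traceback below
models = openai.Model.list()
print([m["id"] for m in models["data"]])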
Hi Yan, thanks for the details. I tried the above-mentioned steps and could run the inference server with the falcon model, but running the query failed with an error ending in "... lead to undefined behavior!"
Hi @dkiran1 , we currently have limited bandwidth and hardware to test on Gaudi, and the Gaudi-related part is not up to date. I just tested in Docker:
# install llm-on-ray, assuming the repo is mounted
pip install -e .
# install the latest optimum[habana]
pip install optimum[habana]
Make sure the transformers version is 4.34.1, which is required by optimum[habana]; that mismatch is what caused your error. In addition, inference with Gaudi does not require IPEX.
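A quick way to sanity-check the versions mentioned above (just a sketch, not part of llm-on-ray):
import importlib.metadata as md

# transformers should report 4.34.1, the version pinned by optimum[habana] here
for pkg in ("transformers", "optimum"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is not installed")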
Hi Lin, thanks a lot. After doing pip install optimum[habana], the neural-chat model works fine with query_openai_sdk. I will test the other models and post the status.
I tested the falcon-7b, mpt-7b, mistral-7b, and neural-chat models and could run the inference server for all of them. I get responses for neural-chat and mistral-7b with query_openai_sdk.py, but it keeps waiting for a response with the mpt-7b and falcon models.
Hi @dkiran1 ,
* support more models in finetune
* modify dockerfile
* fix bug caused by accelerate upgrade
* add llama2
* fix error
* fix error
* test
* fix error
* update
I ran inference of the Falcon-7b and neural-chat-7b-v3-1 models on the Ray server with the below commands:
python inference/serve.py --config_file inference/models/neural-chat-7b-v3-1.yaml --simple
python inference/serve.py --config_file inference/models/falcon-7b.yaml --simple
I could run the test inference with python examples/inference/api_server_simple/query_single.py --model_endpoint http://172.17.0.2:8000/neural-chat-7b-v3-1
Then I exported:
export OPENAI_API_BASE=http://172.17.0.2:8000/falcon-7b
export OPENAI_API_KEY=
and tried to run python examples/inference/api_server_openai/query_openai_sdk.py, but I am getting the below error:
File "/root/llm-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 45, in
models = openai.Model.list()
File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/abstract/listable_api_resource.py", line 60, in list
response, _, api_key = requestor.request(
File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 298, in request
resp, got_stream = self._interpret_response(result, stream)
File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 700, in _interpret_response
self._interpret_response_line(
File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 757, in _interpret_response_line
raise error.APIError(
openai.error.APIError: HTTP code 500 from API (Unexpected error, traceback: ray::ServeReplica:falcon-7b:PredictorDeployment.handle_request_streaming() (pid=15684, ip=172.17.0.2)
File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/utils.py", line 165, in wrap_to_ray_error
raise exception
File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 994, in call_user_method
await self._call_func_or_gen(
File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 750, in _call_func_or_gen
result = await result
File "/root/llm-ray/inference/predictor_deployment.py", line 84, in call
json_request: Dict[str, Any] = await http_request.json()
File "/usr/local/lib/python3.10/dist-packages/starlette/requests.py", line 244, in json
self._json = json.loads(body)
File "/usr/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).)
I installed openai 0.28.0. Please let me know what the issue could be; am I missing any installations?
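Note that the traceback shows the --simple deployment parsing the raw request body as JSON (predictor_deployment.py line 84), so it expects a plain JSON POST like query_single.py sends rather than OpenAI-style SDK requests. A rough sketch of that kind of request follows; the payload fields are hypothetical, so check query_single.py for the real ones:
import requests

endpoint = "http://172.17.0.2:8000/falcon-7b"  # the simple-mode endpoint used above

# hypothetical payload shape; the real fields are defined by query_single.py
payload = {"text": "What is Ray Serve?"}

resp = requests.post(endpoint, json=payload, timeout=60)
print(resp.status_code, resp.text)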