
Calling /v1/chat/completions with 10 concurrent JMeter threads crashes xinference after one minute of load testing, xinference==0.11.3 #1811

Open

WangxuP opened this issue Jul 8, 2024 · 17 comments

@WangxuP

WangxuP commented Jul 8, 2024

Describe the bug

While load-testing xinference on 2× V100 GPUs, we found that calling the /v1/chat/completions endpoint with stream=True and the qwen-14b-chat model under 10 concurrent JMeter threads crashes xinference within one minute. With stream=False it works fine.
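For reference, the JMeter setup can be approximated with a small stdlib-only Python script. The base URL, port (9997), and model name are placeholders for the reporter's environment, not values taken from the issue:

```python
import json
import threading
import urllib.request

def build_chat_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """Body for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def worker(base_url: str, payload: dict) -> None:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Consume the SSE stream so the server keeps generating tokens.
    with urllib.request.urlopen(req, timeout=120) as resp:
        for _ in resp:
            pass

if __name__ == "__main__":
    payload = build_chat_payload("qwen-14b-chat", "Hello")
    # 10 concurrent clients, matching the JMeter thread group.
    threads = [
        threading.Thread(target=worker, args=("http://127.0.0.1:9997", payload))
        for _ in range(10)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```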

Error log

2024-07-08 11:34:32,621 xinference.api.restful_api 8 INFO     Disconnected from client (via refresh/close) Address(host='192.168.32.13', port=30733) during chat.
INFO 07-08 11:34:32 async_llm_engine.py:158] Aborted request fcdb2432-3cda-11ef-af98-7e88271d2e8e.
2024-07-08 11:34:32,630 xinference.api.restful_api 8 ERROR    Chat completion stream got an error: invalid state
Traceback (most recent call last):
  File "/app/xinference/xinference/api/restful_api.py", line 1554, in stream_results
    async for item in iterator:
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/api.py", line 340, in __anext__
    return await self._actor_ref.__xoscar_next__(self._uid)
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 226, in send
    result = await self._wait(future, actor_ref.address, send_message)  # type: ignore
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
    return await future
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/backends/core.py", line 88, in _listen
    future.set_result(message)
asyncio.exceptions.InvalidStateError: invalid state
(the same InvalidStateError traceback repeats for each of the remaining concurrent streaming requests)
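The InvalidStateError comes from asyncio itself: once a client disconnect cancels the future the stream is awaiting, a later future.set_result() on that same future is illegal, because a cancelled future is no longer pending. A minimal standalone reproduction (not the actual xoscar code path):

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fut.cancel()  # the client refreshing/closing the page cancels the waiter
    try:
        fut.set_result("token chunk")  # the message listener delivers anyway
    except asyncio.InvalidStateError as exc:
        return exc
    return None

error = asyncio.run(main())
print(type(error).__name__)  # InvalidStateError
```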

requirements.txt

accelerate==0.30.1
addict==2.4.0
aiobotocore==2.7.0
aiofiles==23.2.1
aiohttp==3.9.5
aioitertools==0.11.0
aioprometheus==23.12.0
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.1
aliyun-python-sdk-kms==2.16.3
altair==5.3.0
annotated-types==0.7.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
async-timeout==4.0.3
attrs==23.2.0
azure-core==1.30.1
azure-storage-blob==12.20.0
bcrypt==4.1.3
botocore==1.31.64
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.3
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.2.1
crcmod==1.7
cryptography==42.0.7
cycler==0.12.1
dataclasses-json==0.6.6
datasets==2.18.0
diffusers==0.28.2
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
ecdsa==0.19.0
einops==0.8.0
environs==9.5.0
exceptiongroup==1.2.1
fastapi==0.110.3
ffmpy==0.3.2
filelock==3.14.0
flatbuffers==24.3.25
fonttools==4.53.0
frozenlist==1.4.1
fsspec==2023.10.0
gast==0.5.4
gradio==4.26.0
gradio_client==0.15.1
greenlet==3.0.3
grpcio==1.60.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.2
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
interegular==0.3.3
isodate==0.6.1
jieba==0.42.1
Jinja2==3.1.4
jmespath==0.10.0
joblib==1.4.2
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
langchain==0.1.0
langchain-community==0.0.20
langchain-core==0.1.23
langsmith==0.0.87
lark==1.1.9
llvmlite==0.42.0
lm-format-enforcer==0.10.1
lxml==5.2.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.3
matplotlib==3.9.0
mdurl==0.1.2
minio==7.2.7
modelscope==1.14.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
onnxruntime==1.15.0
openai==1.30.5
opencv-contrib-python==4.9.0.80
orjson==3.10.3
oss2==2.18.5
outlines==0.0.34
packaging==23.2
pandas==2.2.2
passlib==1.7.4
pdfminer.six==20231228
pdfplumber==0.11.0
peft==0.11.1
pillow==10.3.0
platformdirs==4.2.2
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.20.0
protobuf==5.27.0
psutil==5.9.8
py-cpuinfo==9.0.0
pyarrow==16.1.0
pyarrow-hotfix==0.6
pyasn1==0.6.0
pycparser==2.22
pycryptodome==3.20.0
pydantic==2.7.2
pydantic_core==2.18.3
pydub==0.25.1
Pygments==2.18.0
pymilvus==2.4.0
pynvml==11.5.0
pyparsing==3.1.2
PyPDF2==3.0.1
pypdfium2==4.30.0
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-dotenv==1.0.1
python-jose==3.3.0
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
quantile-python==1.1
ray==2.23.0
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rich==13.7.1
rpds-py==0.18.1
rsa==4.9
ruff==0.4.7
s3fs==2023.10.0
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.13.1
semantic-version==2.10.0
sentence-transformers==3.0.0
sentencepiece==0.2.0
shellingham==1.5.4
simplejson==3.19.2
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
SQLAlchemy==2.0.30
sse-starlette==2.1.0
starlette==0.37.2
sympy==1.12.1
tabulate==0.9.0
tblib==3.0.0
tenacity==8.3.0
threadpoolctl==3.5.0
tiktoken==0.6.0
timm==1.0.3
tokenizers==0.19.1
tomli==2.0.1
tomlkit==0.12.0
toolz==0.12.1
torch==2.3.0
torchvision==0.18.0
tqdm==4.66.4
transformers==4.41.0
triton==2.3.0
typer==0.11.1
typing-inspect==0.9.0
typing_extensions==4.12.1
tzdata==2024.1
ujson==5.10.0
urllib3==2.0.7
uvicorn==0.30.1
uvloop==0.19.0
vllm==0.4.3
vllm-flash-attn==2.5.8.post2
vllm_nccl_cu12==2.18.1.0.3.0
watchfiles==0.22.0
websockets==11.0.3
wrapt==1.16.0
xformers==0.0.26.post1
xinference==0.11.3
xoscar==0.3.0
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.19.1


@XprobeBot XprobeBot added the gpu label Jul 8, 2024
@XprobeBot XprobeBot added this to the v0.13.1 milestone Jul 8, 2024
@WangxuP
Author

WangxuP commented Jul 8, 2024

When I call vllm's /v1/chat/completions endpoint directly, it works fine, and it is faster than xinference.

@yunfwe

yunfwe commented Jul 12, 2024

This is a bug in the xoscar library; the fix has been merged into version 0.3.2 (xorbitsai/xoscar#87).
Upgrade with `pip install xoscar==0.3.2` and rerun the load test.
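The usual defensive pattern for this class of bug is to check the waiter before resolving it. This sketch is illustrative of the general idea only, not the actual xoscar patch:

```python
import asyncio

def deliver(fut: asyncio.Future, message) -> bool:
    """Resolve the waiter only if it is still pending.

    A cancelled future (client disconnected mid-stream) also counts
    as done, so calling set_result() on it would raise InvalidStateError.
    """
    if fut.done():
        return False  # drop the message instead of crashing the listener
    fut.set_result(message)
    return True

async def demo():
    loop = asyncio.get_running_loop()
    alive = loop.create_future()
    gone = loop.create_future()
    gone.cancel()  # simulate the disconnected client
    return deliver(alive, "chunk"), deliver(gone, "chunk")

delivered, dropped = asyncio.run(demo())
```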

@XprobeBot XprobeBot modified the milestones: v0.13.1, v0.13.2 Jul 12, 2024
@Dawnfz-Lenfeng
Contributor

> This is a bug in the xoscar library; the fix has been merged into version 0.3.2 (xorbitsai/xoscar#87). Upgrade with `pip install xoscar==0.3.2` and rerun the load test.

After upgrading, the problem seems to persist and the error output is essentially identical; the bug appears to be triggered whenever stream == True.

@yunfwe

yunfwe commented Jul 18, 2024

> (quoting the exchange above about upgrading xoscar to 0.3.2)
>
> After upgrading, the problem seems to persist and the error output is essentially identical; the bug appears to be triggered whenever stream == True.

Did you restart xinference after the upgrade? Please paste the error log.

@Dawnfz-Lenfeng
Contributor

Dawnfz-Lenfeng commented Jul 19, 2024

> (quoting the exchange above about upgrading xoscar to 0.3.2)
>
> Did you restart xinference after the upgrade? Please paste the error log.

2024-07-19 15:01:38,014 transformers.models.llama.modeling_llama 63561 WARNING  We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
2024-07-19 15:01:49,005 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 3.22 tokens/s.
2024-07-19 15:01:50,378 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 20.42 tokens/s.
2024-07-19 15:01:50,871 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 15.04 tokens/s.
2024-07-19 15:02:03,688 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 25.22 tokens/s.
2024-07-19 15:02:18,889 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36816) during chat.
2024-07-19 15:02:24,799 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 0.85 tokens/s.
2024-07-19 15:02:29,739 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 1.02 tokens/s.
2024-07-19 15:02:33,928 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36870) during chat.
2024-07-19 15:02:33,939 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36872) during chat.
2024-07-19 15:02:33,951 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36874) during chat.
2024-07-19 15:02:33,955 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36876) during chat.
2024-07-19 15:02:33,963 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36864) during chat.
2024-07-19 15:02:33,978 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36884) during chat.
2024-07-19 15:02:33,983 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36888) during chat.
2024-07-19 15:02:33,987 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36886) during chat.

Versions:

xinference                              0.13.1
xoscar                                  0.3.2

@yunfwe

yunfwe commented Jul 26, 2024

> (quoting the previous comment and its log)

What about switching to the vllm engine? Previously, once InvalidStateError: invalid state occurred, the whole API went down and stopped responding to any request, even though the inference engine itself was still healthy.
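The "one bad stream poisons the whole API" symptom suggests isolating per-request failures at the streaming boundary. A minimal sketch of that pattern (illustrative only, not the actual xinference code):

```python
import asyncio

async def safe_stream(iterator):
    """Forward items from a streaming iterator, converting a failure
    into a final error event instead of letting the exception escape
    into shared server machinery.
    """
    try:
        async for item in iterator:
            yield item
    except asyncio.InvalidStateError:
        yield {"error": "stream aborted"}

async def demo():
    async def broken():
        yield 1
        raise asyncio.InvalidStateError("invalid state")

    collected = []
    async for item in safe_stream(broken()):
        collected.append(item)
    return collected

result = asyncio.run(demo())
```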

@XprobeBot XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Aug 6, 2024
@qinxuye qinxuye self-assigned this Aug 7, 2024
@vierachen

> (quoting the previous comment and its log)

With the vllm engine I hit the same problem. Is there any known solution?

@linqingxu

+1

@qinxuye
Contributor

qinxuye commented Sep 25, 2024

Upgrade to the latest version.

@linqingxu

Even with the latest version it is the same. Under load, requests start failing at 7 concurrent threads, and at 16 concurrent threads every request fails.

@qinxuye
Contributor

qinxuye commented Sep 25, 2024

Please paste the error log.

@linqingxu

2024-09-25 02:58:02,170 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38154) during chat.
2024-09-25 02:58:02,181 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38166) during chat.
2024-09-25 02:58:02,189 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38168) during chat.
2024-09-25 02:58:02,202 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38206) during chat.
2024-09-25 02:58:02,216 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38180) during chat.
2024-09-25 02:58:02,221 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38194) during chat.
2024-09-25 02:58:02,224 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38200) during chat.
2024-09-25 02:58:02,228 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38176) during chat.
2024-09-25 02:58:02,233 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38208) during chat.
2024-09-25 02:58:02,240 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38210) during chat.
2024-09-25 02:58:02,248 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38150) during chat.
2024-09-25 02:58:02,255 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38244) during chat.
2024-09-25 02:58:02,261 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38234) during chat.
2024-09-25 02:58:02,274 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38226) during chat.
2024-09-25 02:58:02,278 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38212) during chat.
2024-09-25 02:58:02,282 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38138) during chat.

@linqingxu

Error for prompt with length 5520: Traceback (most recent call last):
File "/opt/inference/benchmark/benchmark_runner.py", line 151, in send_request
data = json.loads(chunk)
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
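A JSONDecodeError at "line 1 column 1 (char 0)" is typical of feeding raw SSE lines straight into json.loads: OpenAI-style streams prefix payloads with `data: `, interleave blank keep-alive lines, and end with a `[DONE]` sentinel, none of which are valid JSON. A tolerant parser along these lines avoids it (hypothetical helper; benchmark_runner.py may structure its parsing differently):

```python
import json

def parse_sse_line(line: str):
    """Decode one line of an OpenAI-style SSE stream.

    Returns the JSON payload as a dict, or None for keep-alive blanks,
    non-data fields, and the [DONE] sentinel.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    return json.loads(payload)
```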

@linqingxu

Is there any solution?

> Please paste the error log.

@qinxuye
Contributor

qinxuye commented Sep 27, 2024

Can you reproduce it with our benchmark?

@linqingxu

> Can you reproduce it with our benchmark?

I was using exactly the benchmark/benchmark_serving.py provided by xinference, version 0.15.2.

7 participants