
Calling /v1/chat/completions with 10 concurrent JMeter threads crashes xinference after one minute of load testing, xinference==0.11.3 #1811

Open

WangxuP opened this issue Jul 8, 2024 · 17 comments

@WangxuP

WangxuP commented Jul 8, 2024

Describe the bug

While load-testing xinference on 2× V100 GPUs, we found that calling the /v1/chat/completions endpoint with stream=True and the qwen-14b-chat model under 10 concurrent JMeter threads crashes xinference within one minute. With stream=False it works fine.
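For reference, the JMeter setup can be approximated with a small stdlib-only Python script. The base URL, port (9997), and model name are placeholders for the reporter's environment, not values taken from the issue:

```python
import json
import threading
import urllib.request

def build_chat_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """Body for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def worker(base_url: str, payload: dict) -> None:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Consume the SSE stream so the server keeps generating tokens.
    with urllib.request.urlopen(req, timeout=120) as resp:
        for _ in resp:
            pass

if __name__ == "__main__":
    payload = build_chat_payload("qwen-14b-chat", "Hello")
    # 10 concurrent clients, matching the JMeter thread group.
    threads = [
        threading.Thread(target=worker, args=("http://127.0.0.1:9997", payload))
        for _ in range(10)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```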

Error log

2024-07-08 11:34:32,621 xinference.api.restful_api 8 INFO     Disconnected from client (via refresh/close) Address(host='192.168.32.13', port=30733) during chat.
INFO 07-08 11:34:32 async_llm_engine.py:158] Aborted request fcdb2432-3cda-11ef-af98-7e88271d2e8e.
2024-07-08 11:34:32,630 xinference.api.restful_api 8 ERROR    Chat completion stream got an error: invalid state
Traceback (most recent call last):
  File "/app/xinference/xinference/api/restful_api.py", line 1554, in stream_results
    async for item in iterator:
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/api.py", line 340, in __anext__
    return await self._actor_ref.__xoscar_next__(self._uid)
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 226, in send
    result = await self._wait(future, actor_ref.address, send_message)  # type: ignore
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
    return await future
  File "/opt/xinference/xinference_venv/lib/python3.10/site-packages/xoscar/backends/core.py", line 88, in _listen
    future.set_result(message)
asyncio.exceptions.InvalidStateError: invalid state
(the same InvalidStateError traceback repeats for each of the remaining concurrent streaming requests)
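The InvalidStateError comes from asyncio itself: once a client disconnect cancels the future the stream is awaiting, a later future.set_result() on that same future is illegal, because a cancelled future is no longer pending. A minimal standalone reproduction (not the actual xoscar code path):

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fut.cancel()  # the client refreshing/closing the page cancels the waiter
    try:
        fut.set_result("token chunk")  # the message listener delivers anyway
    except asyncio.InvalidStateError as exc:
        return exc
    return None

error = asyncio.run(main())
print(type(error).__name__)  # InvalidStateError
```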

requirements.txt

accelerate==0.30.1
addict==2.4.0
aiobotocore==2.7.0
aiofiles==23.2.1
aiohttp==3.9.5
aioitertools==0.11.0
aioprometheus==23.12.0
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.1
aliyun-python-sdk-kms==2.16.3
altair==5.3.0
annotated-types==0.7.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
async-timeout==4.0.3
attrs==23.2.0
azure-core==1.30.1
azure-storage-blob==12.20.0
bcrypt==4.1.3
botocore==1.31.64
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.3
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.2.1
crcmod==1.7
cryptography==42.0.7
cycler==0.12.1
dataclasses-json==0.6.6
datasets==2.18.0
diffusers==0.28.2
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
ecdsa==0.19.0
einops==0.8.0
environs==9.5.0
exceptiongroup==1.2.1
fastapi==0.110.3
ffmpy==0.3.2
filelock==3.14.0
flatbuffers==24.3.25
fonttools==4.53.0
frozenlist==1.4.1
fsspec==2023.10.0
gast==0.5.4
gradio==4.26.0
gradio_client==0.15.1
greenlet==3.0.3
grpcio==1.60.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.2
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
interegular==0.3.3
isodate==0.6.1
jieba==0.42.1
Jinja2==3.1.4
jmespath==0.10.0
joblib==1.4.2
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
langchain==0.1.0
langchain-community==0.0.20
langchain-core==0.1.23
langsmith==0.0.87
lark==1.1.9
llvmlite==0.42.0
lm-format-enforcer==0.10.1
lxml==5.2.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.3
matplotlib==3.9.0
mdurl==0.1.2
minio==7.2.7
modelscope==1.14.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
onnxruntime==1.15.0
openai==1.30.5
opencv-contrib-python==4.9.0.80
orjson==3.10.3
oss2==2.18.5
outlines==0.0.34
packaging==23.2
pandas==2.2.2
passlib==1.7.4
pdfminer.six==20231228
pdfplumber==0.11.0
peft==0.11.1
pillow==10.3.0
platformdirs==4.2.2
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.20.0
protobuf==5.27.0
psutil==5.9.8
py-cpuinfo==9.0.0
pyarrow==16.1.0
pyarrow-hotfix==0.6
pyasn1==0.6.0
pycparser==2.22
pycryptodome==3.20.0
pydantic==2.7.2
pydantic_core==2.18.3
pydub==0.25.1
Pygments==2.18.0
pymilvus==2.4.0
pynvml==11.5.0
pyparsing==3.1.2
PyPDF2==3.0.1
pypdfium2==4.30.0
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-dotenv==1.0.1
python-jose==3.3.0
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
quantile-python==1.1
ray==2.23.0
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rich==13.7.1
rpds-py==0.18.1
rsa==4.9
ruff==0.4.7
s3fs==2023.10.0
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.13.1
semantic-version==2.10.0
sentence-transformers==3.0.0
sentencepiece==0.2.0
shellingham==1.5.4
simplejson==3.19.2
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
SQLAlchemy==2.0.30
sse-starlette==2.1.0
starlette==0.37.2
sympy==1.12.1
tabulate==0.9.0
tblib==3.0.0
tenacity==8.3.0
threadpoolctl==3.5.0
tiktoken==0.6.0
timm==1.0.3
tokenizers==0.19.1
tomli==2.0.1
tomlkit==0.12.0
toolz==0.12.1
torch==2.3.0
torchvision==0.18.0
tqdm==4.66.4
transformers==4.41.0
triton==2.3.0
typer==0.11.1
typing-inspect==0.9.0
typing_extensions==4.12.1
tzdata==2024.1
ujson==5.10.0
urllib3==2.0.7
uvicorn==0.30.1
uvloop==0.19.0
vllm==0.4.3
vllm-flash-attn==2.5.8.post2
vllm_nccl_cu12==2.18.1.0.3.0
watchfiles==0.22.0
websockets==11.0.3
wrapt==1.16.0
xformers==0.0.26.post1
xinference==0.11.3
xoscar==0.3.0
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.19.1


@XprobeBot XprobeBot added the gpu label Jul 8, 2024
@XprobeBot XprobeBot added this to the v0.13.1 milestone Jul 8, 2024
@WangxuP
Author

WangxuP commented Jul 8, 2024

When I call vllm's /v1/chat/completions endpoint directly, it works fine, and it is faster than xinference.

@yunfwe

yunfwe commented Jul 12, 2024

This is a bug in the xoscar library; the fix has been merged into version 0.3.2 (xorbitsai/xoscar#87).
Upgrade with `pip install xoscar==0.3.2` and rerun the load test.
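The usual defensive pattern for this class of bug is to check the waiter before resolving it. This sketch is illustrative of the general idea only, not the actual xoscar patch:

```python
import asyncio

def deliver(fut: asyncio.Future, message) -> bool:
    """Resolve the waiter only if it is still pending.

    A cancelled future (client disconnected mid-stream) also counts
    as done, so calling set_result() on it would raise InvalidStateError.
    """
    if fut.done():
        return False  # drop the message instead of crashing the listener
    fut.set_result(message)
    return True

async def demo():
    loop = asyncio.get_running_loop()
    alive = loop.create_future()
    gone = loop.create_future()
    gone.cancel()  # simulate the disconnected client
    return deliver(alive, "chunk"), deliver(gone, "chunk")

delivered, dropped = asyncio.run(demo())
```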

@XprobeBot XprobeBot modified the milestones: v0.13.1, v0.13.2 Jul 12, 2024
@Dawnfz-Lenfeng
Contributor

> This is a bug in the xoscar library; the fix has been merged into version 0.3.2 (xorbitsai/xoscar#87). Upgrade with `pip install xoscar==0.3.2` and rerun the load test.

After upgrading, the problem seems to persist and the error output is essentially identical; the bug appears to be triggered whenever stream == True.

@yunfwe

yunfwe commented Jul 18, 2024

> (quoting the exchange above about upgrading xoscar to 0.3.2)
>
> After upgrading, the problem seems to persist and the error output is essentially identical; the bug appears to be triggered whenever stream == True.

Did you restart xinference after the upgrade? Please paste the error log.

@Dawnfz-Lenfeng
Contributor

Dawnfz-Lenfeng commented Jul 19, 2024

> (quoting the exchange above about upgrading xoscar to 0.3.2)
>
> Did you restart xinference after the upgrade? Please paste the error log.

2024-07-19 15:01:38,014 transformers.models.llama.modeling_llama 63561 WARNING  We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
2024-07-19 15:01:49,005 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 3.22 tokens/s.
2024-07-19 15:01:50,378 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 20.42 tokens/s.
2024-07-19 15:01:50,871 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 15.04 tokens/s.
2024-07-19 15:02:03,688 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 25.22 tokens/s.
2024-07-19 15:02:18,889 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36816) during chat.
2024-07-19 15:02:24,799 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 0.85 tokens/s.
2024-07-19 15:02:29,739 xinference.model.llm.pytorch.utils 63561 INFO     Average generation speed: 1.02 tokens/s.
2024-07-19 15:02:33,928 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36870) during chat.
2024-07-19 15:02:33,939 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36872) during chat.
2024-07-19 15:02:33,951 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36874) during chat.
2024-07-19 15:02:33,955 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36876) during chat.
2024-07-19 15:02:33,963 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36864) during chat.
2024-07-19 15:02:33,978 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36884) during chat.
2024-07-19 15:02:33,983 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36888) during chat.
2024-07-19 15:02:33,987 xinference.api.restful_api 63191 INFO     Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=36886) during chat.

Versions:

xinference                              0.13.1
xoscar                                  0.3.2

@yunfwe

yunfwe commented Jul 26, 2024

> (quoting the previous comment and its log)

What about switching to the vllm engine? Previously, once InvalidStateError: invalid state occurred, the whole API went down and stopped responding to any request, even though the inference engine itself was still healthy.
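The "one bad stream poisons the whole API" symptom suggests isolating per-request failures at the streaming boundary. A minimal sketch of that pattern (illustrative only, not the actual xinference code):

```python
import asyncio

async def safe_stream(iterator):
    """Forward items from a streaming iterator, converting a failure
    into a final error event instead of letting the exception escape
    into shared server machinery.
    """
    try:
        async for item in iterator:
            yield item
    except asyncio.InvalidStateError:
        yield {"error": "stream aborted"}

async def demo():
    async def broken():
        yield 1
        raise asyncio.InvalidStateError("invalid state")

    collected = []
    async for item in safe_stream(broken()):
        collected.append(item)
    return collected

result = asyncio.run(demo())
```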

@XprobeBot XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Aug 6, 2024
@qinxuye qinxuye self-assigned this Aug 7, 2024
@vierachen

> (quoting the previous comment and its log)

With the vllm engine I hit the same problem. Is there any known solution?

@linqingxu

+1

@qinxuye
Contributor

qinxuye commented Sep 25, 2024

Upgrade to the latest version.

@linqingxu

Even with the latest version it is the same. Under load, requests start failing at 7 concurrent threads, and at 16 concurrent threads every request fails.

@qinxuye
Contributor

qinxuye commented Sep 25, 2024

Please paste the error log.

@linqingxu

2024-09-25 02:58:02,170 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38154) during chat.
2024-09-25 02:58:02,181 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38166) during chat.
2024-09-25 02:58:02,189 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38168) during chat.
2024-09-25 02:58:02,202 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38206) during chat.
2024-09-25 02:58:02,216 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38180) during chat.
2024-09-25 02:58:02,221 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38194) during chat.
2024-09-25 02:58:02,224 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38200) during chat.
2024-09-25 02:58:02,228 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38176) during chat.
2024-09-25 02:58:02,233 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38208) during chat.
2024-09-25 02:58:02,240 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38210) during chat.
2024-09-25 02:58:02,248 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38150) during chat.
2024-09-25 02:58:02,255 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38244) during chat.
2024-09-25 02:58:02,261 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38234) during chat.
2024-09-25 02:58:02,274 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38226) during chat.
2024-09-25 02:58:02,278 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38212) during chat.
2024-09-25 02:58:02,282 xinference.api.restful_api 1 INFO Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=38138) during chat.

@linqingxu

Error for prompt with length 5520: Traceback (most recent call last):
File "/opt/inference/benchmark/benchmark_runner.py", line 151, in send_request
data = json.loads(chunk)
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
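A JSONDecodeError at "line 1 column 1 (char 0)" is typical of feeding raw SSE lines straight into json.loads: OpenAI-style streams prefix payloads with `data: `, interleave blank keep-alive lines, and end with a `[DONE]` sentinel, none of which are valid JSON. A tolerant parser along these lines avoids it (hypothetical helper; benchmark_runner.py may structure its parsing differently):

```python
import json

def parse_sse_line(line: str):
    """Decode one line of an OpenAI-style SSE stream.

    Returns the JSON payload as a dict, or None for keep-alive blanks,
    non-data fields, and the [DONE] sentinel.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    return json.loads(payload)
```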

@linqingxu

Is there any solution?

> Please paste the error log.

@qinxuye
Contributor

qinxuye commented Sep 27, 2024

Can you reproduce it with our benchmark?

@linqingxu

> Can you reproduce it with our benchmark?

I was using exactly the benchmark/benchmark_serving.py provided by xinference, version 0.15.2.

7 participants