
[BUG] inference ValueError #5685

Open
zxrneu opened this issue Jun 19, 2024 · 1 comment
Labels
bug (Something isn't working), inference

Comments

zxrneu commented Jun 19, 2024

Describe the bug
Launching the DeepSpeed-MII OpenAI API server for a Qwen model with tensor parallelism across 8 GPUs crashes during generation: the sampler receives a logits row containing only non-finite values, and torch's Categorical distribution raises a ValueError.

Start command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m mii.entrypoints.openai_api_server  \
    --mode="./v0.7_qwen" --port=18888 --max-length=4096 --tensor-parallel=8 --replica-num=1

Crash traceback:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 650, in __call__
    self.generate()
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/utils.py", line 31, in wrapper
    return func(self, *args, **kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 116, in generate
    next_tokens, done_tokens = self._process_logits(
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/utils.py", line 18, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 190, in _process_logits
    next_tokens = self.sampler(next_token_logits,
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/postprocess.py", line 69, in run_batch_sampler
    next_tokens = run_batch_processing(input_logits, requests, sampler_fns)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/postprocess.py", line 32, in run_batch_processing
    output_list.append(process_fn(filtered_input))
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/generation/samplers.py", line 45, in __call__
    sampler = Categorical(logits=logits)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/torch/distributions/categorical.py", line 70, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 151643)) of distribution Categorical(logits: torch.Size([1, 151643])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0')
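For context (an editorial sketch, not code from MII or torch): Categorical normalizes its logits by subtracting their logsumexp, then validates that every entry is real (non-NaN). A row that is entirely -inf (every token masked out, e.g. by top-k/top-p filtering) or a row containing a +inf (e.g. an fp16 overflow) produces NaN entries after normalization, which is exactly the constraint violation in the traceback. A minimal pure-Python `normalize` illustrating the arithmetic:

```python
import math

# Categorical normalizes logits by subtracting their logsumexp.
# All--inf row: logsumexp is -inf, and (-inf) - (-inf) = nan everywhere.
# Row with a +inf: logsumexp is +inf, and inf - inf = nan at that entry.
# Either way the nan entries fail the distribution's Real() constraint.
def normalize(row):
    m = max(row)
    if math.isinf(m):
        lse = m  # an infinite max dominates the logsumexp
    else:
        lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

masked = normalize([float("-inf")] * 4)          # every entry becomes nan
overflow = normalize([1.0, float("inf"), -1.0])  # the +inf entry becomes nan
print(any(math.isnan(x) for x in masked),
      any(math.isnan(x) for x in overflow))  # True True
```

This suggests checking whether the model's raw logits ever go non-finite (overflow) or whether the sampling filters can mask out the entire vocabulary.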

To Reproduce
Steps to reproduce the behavior:

  1. Simple inference script to reproduce
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m mii.entrypoints.openai_api_server  \
    --mode="./v0.7_qwen" --port=18888 --max-length=4096 --tensor-parallel=8 --replica-num=1
  2. Required packages and their versions (Python 3.10):

pip list
Package                  Version
------------------------ -------------------
aniso8601                9.0.1
annotated-types          0.7.0
anyio                    4.4.0
asyncio                  3.4.3
blinker                  1.8.2
certifi                  2024.6.2
charset-normalizer       3.3.2
click                    8.1.7
cmake                    3.29.5.1
cuda-python              12.5.0
deepspeed                0.14.3
deepspeed-kernels        0.0.1.dev1698255861
deepspeed-mii            0.2.3
dnspython                2.6.1
email_validator          2.1.2
exceptiongroup           1.2.1
fastapi                  0.111.0
fastapi-cli              0.0.4
fastchat                 0.1.0
filelock                 3.15.1
Flask                    3.0.3
Flask-RESTful            0.3.10
fsspec                   2024.6.0
grpcio                   1.64.1
grpcio-tools             1.64.1
h11                      0.14.0
hjson                    3.1.0
httpcore                 1.0.5
httptools                0.6.1
httpx                    0.27.0
huggingface-hub          0.23.4
idna                     3.7
itsdangerous             2.2.0
Jinja2                   3.1.4
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
networkx                 3.3
ninja                    1.11.1.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-cutlass           3.5.0.0
nvidia-ml-py             12.555.43
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
orjson                   3.10.5
packaging                24.1
pillow                   10.3.0
pip                      24.0
protobuf                 5.27.1
psutil                   5.9.8
py-cpuinfo               9.0.0
pydantic                 1.10.11
pydantic_core            2.18.4
pydot                    2.0.0
Pygments                 2.18.0
pyparsing                3.1.2
python-dotenv            1.0.1
python-multipart         0.0.9
pytz                     2024.1
PyYAML                   6.0.1
pyzmq                    26.0.3
regex                    2024.5.15
requests                 2.32.3
rich                     13.7.1
safetensors              0.4.3
scipy                    1.13.1
setuptools               69.5.1
shellingham              1.5.4
shortuuid                1.0.13
six                      1.16.0
sniffio                  1.3.1
starlette                0.37.2
sympy                    1.12.1
tokenizers               0.19.1
torch                    2.3.1
tqdm                     4.66.4
transformers             4.41.2
treelib                  1.7.0
triton                   2.3.1
typer                    0.12.3
typing_extensions        4.12.2
ujson                    5.10.0
urllib3                  2.2.2
uvicorn                  0.30.1
uvloop                   0.19.0
watchfiles               0.22.0
websockets               12.0
Werkzeug                 3.0.3
wheel                    0.43.0
zmq                      0.0.0
  3. How to run the script: run the start command shown above.

Expected behavior
The server starts and serves inference requests without crashing.

System info (please complete the following information):

  • OS: Linux localhost 5.15.0-97-generic #107-Ubuntu SMP Wed Feb 7 13:26:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • GPU count and types: 8x NVIDIA RTX 3090 per machine

  • DeepSpeed-MII version: 0.2.3 (deepspeed 0.14.3; see pip list above)

  • Hugging Face Transformers version: 4.41.2

  • Python version: 3.10.0



@zxrneu zxrneu added bug Something isn't working inference labels Jun 19, 2024
a516328765 commented:

Same issue with qwen2-7b-int.
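Until the root cause is fixed, one common mitigation for this class of crash is to guard the logits row before sampling (a hypothetical sketch, not part of MII's API; `guard_logits` is an illustrative name):

```python
import math

def guard_logits(row):
    # Hypothetical workaround: if no logit in the row is finite, sampling with
    # Categorical would raise the ValueError above; substitute a uniform row
    # (all zeros in logit space) so generation can continue instead of crashing.
    if not any(math.isfinite(x) for x in row):
        return [0.0] * len(row)
    return row

print(guard_logits([float("-inf")] * 3))  # [0.0, 0.0, 0.0]
print(guard_logits([1.5, float("-inf")]))  # unchanged: one finite logit remains
```

This only papers over the symptom; if the non-finite values come from an fp16 overflow in the model itself, the underlying numerical issue still needs investigation.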
