
[BUG] inference ValueError #5685

Open
zxrneu opened this issue Jun 19, 2024 · 1 comment
Labels
bug (Something isn't working), inference

Comments

zxrneu commented Jun 19, 2024

Describe the bug
Launching the DeepSpeed-MII OpenAI API server for a Qwen model with tensor parallelism across 8 GPUs crashes during generation: the sampler receives a logits row containing only non-finite values, and torch's Categorical distribution raises a ValueError.

Start command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m mii.entrypoints.openai_api_server  \
    --mode="./v0.7_qwen" --port=18888 --max-length=4096 --tensor-parallel=8 --replica-num=1

Crash traceback:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 650, in __call__
    self.generate()
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/utils.py", line 31, in wrapper
    return func(self, *args, **kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 116, in generate
    next_tokens, done_tokens = self._process_logits(
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/utils.py", line 18, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 190, in _process_logits
    next_tokens = self.sampler(next_token_logits,
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/postprocess.py", line 69, in run_batch_sampler
    next_tokens = run_batch_processing(input_logits, requests, sampler_fns)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/postprocess.py", line 32, in run_batch_processing
    output_list.append(process_fn(filtered_input))
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/generation/samplers.py", line 45, in __call__
    sampler = Categorical(logits=logits)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/torch/distributions/categorical.py", line 70, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 151643)) of distribution Categorical(logits: torch.Size([1, 151643])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0')
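For context (an editorial sketch, not code from MII or torch): Categorical normalizes its logits by subtracting their logsumexp, then validates that every entry is real (non-NaN). A row that is entirely -inf (every token masked out, e.g. by top-k/top-p filtering) or a row containing a +inf (e.g. an fp16 overflow) produces NaN entries after normalization, which is exactly the constraint violation in the traceback. A minimal pure-Python `normalize` illustrating the arithmetic:

```python
import math

# Categorical normalizes logits by subtracting their logsumexp.
# All--inf row: logsumexp is -inf, and (-inf) - (-inf) = nan everywhere.
# Row with a +inf: logsumexp is +inf, and inf - inf = nan at that entry.
# Either way the nan entries fail the distribution's Real() constraint.
def normalize(row):
    m = max(row)
    if math.isinf(m):
        lse = m  # an infinite max dominates the logsumexp
    else:
        lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

masked = normalize([float("-inf")] * 4)          # every entry becomes nan
overflow = normalize([1.0, float("inf"), -1.0])  # the +inf entry becomes nan
print(any(math.isnan(x) for x in masked),
      any(math.isnan(x) for x in overflow))  # True True
```

This suggests checking whether the model's raw logits ever go non-finite (overflow) or whether the sampling filters can mask out the entire vocabulary.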

To Reproduce
Steps to reproduce the behavior:

  1. Simple inference script to reproduce
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m mii.entrypoints.openai_api_server  \
    --mode="./v0.7_qwen" --port=18888 --max-length=4096 --tensor-parallel=8 --replica-num=1
  2. Required packages and their versions (Python 3.10):

pip list
Package                  Version
------------------------ -------------------
aniso8601                9.0.1
annotated-types          0.7.0
anyio                    4.4.0
asyncio                  3.4.3
blinker                  1.8.2
certifi                  2024.6.2
charset-normalizer       3.3.2
click                    8.1.7
cmake                    3.29.5.1
cuda-python              12.5.0
deepspeed                0.14.3
deepspeed-kernels        0.0.1.dev1698255861
deepspeed-mii            0.2.3
dnspython                2.6.1
email_validator          2.1.2
exceptiongroup           1.2.1
fastapi                  0.111.0
fastapi-cli              0.0.4
fastchat                 0.1.0
filelock                 3.15.1
Flask                    3.0.3
Flask-RESTful            0.3.10
fsspec                   2024.6.0
grpcio                   1.64.1
grpcio-tools             1.64.1
h11                      0.14.0
hjson                    3.1.0
httpcore                 1.0.5
httptools                0.6.1
httpx                    0.27.0
huggingface-hub          0.23.4
idna                     3.7
itsdangerous             2.2.0
Jinja2                   3.1.4
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
networkx                 3.3
ninja                    1.11.1.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-cutlass           3.5.0.0
nvidia-ml-py             12.555.43
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
orjson                   3.10.5
packaging                24.1
pillow                   10.3.0
pip                      24.0
protobuf                 5.27.1
psutil                   5.9.8
py-cpuinfo               9.0.0
pydantic                 1.10.11
pydantic_core            2.18.4
pydot                    2.0.0
Pygments                 2.18.0
pyparsing                3.1.2
python-dotenv            1.0.1
python-multipart         0.0.9
pytz                     2024.1
PyYAML                   6.0.1
pyzmq                    26.0.3
regex                    2024.5.15
requests                 2.32.3
rich                     13.7.1
safetensors              0.4.3
scipy                    1.13.1
setuptools               69.5.1
shellingham              1.5.4
shortuuid                1.0.13
six                      1.16.0
sniffio                  1.3.1
starlette                0.37.2
sympy                    1.12.1
tokenizers               0.19.1
torch                    2.3.1
tqdm                     4.66.4
transformers             4.41.2
treelib                  1.7.0
triton                   2.3.1
typer                    0.12.3
typing_extensions        4.12.2
ujson                    5.10.0
urllib3                  2.2.2
uvicorn                  0.30.1
uvloop                   0.19.0
watchfiles               0.22.0
websockets               12.0
Werkzeug                 3.0.3
wheel                    0.43.0
zmq                      0.0.0
  3. How to run the script: run the start command shown above.

Expected behavior
The server starts and serves inference requests without crashing.

System info (please complete the following information):

  • OS: Linux localhost 5.15.0-97-generic #107-Ubuntu SMP Wed Feb 7 13:26:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • GPU count and types: 8x NVIDIA RTX 3090 per machine

  • DeepSpeed-MII version: 0.2.3 (deepspeed 0.14.3; see pip list above)

  • Hugging Face Transformers version: 4.41.2

  • Python version: 3.10.0



@zxrneu zxrneu added bug Something isn't working inference labels Jun 19, 2024
a516328765 commented:

Same issue with qwen2-7b-int.
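Until the root cause is fixed, one common mitigation for this class of crash is to guard the logits row before sampling (a hypothetical sketch, not part of MII's API; `guard_logits` is an illustrative name):

```python
import math

def guard_logits(row):
    # Hypothetical workaround: if no logit in the row is finite, sampling with
    # Categorical would raise the ValueError above; substitute a uniform row
    # (all zeros in logit space) so generation can continue instead of crashing.
    if not any(math.isfinite(x) for x in row):
        return [0.0] * len(row)
    return row

print(guard_logits([float("-inf")] * 3))  # [0.0, 0.0, 0.0]
print(guard_logits([1.5, float("-inf")]))  # unchanged: one finite logit remains
```

This only papers over the symptom; if the non-finite values come from an fp16 overflow in the model itself, the underlying numerical issue still needs investigation.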
