Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 650, in __call__
    self.generate()
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/utils.py", line 31, in wrapper
    return func(self, *args, **kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 116, in generate
    next_tokens, done_tokens = self._process_logits(
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/utils.py", line 18, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 190, in _process_logits
    next_tokens = self.sampler(next_token_logits,
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/postprocess.py", line 69, in run_batch_sampler
    next_tokens = run_batch_processing(input_logits, requests, sampler_fns)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/postprocess.py", line 32, in run_batch_processing
    output_list.append(process_fn(filtered_input))
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/mii/batching/generation/samplers.py", line 45, in __call__
    sampler = Categorical(logits=logits)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/torch/distributions/categorical.py", line 70, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "/home/lanyun/.conda/envs/ds/lib/python3.10/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 151643)) of distribution Categorical(logits: torch.Size([1, 151643])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0')
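The traceback above fails while building a categorical distribution from a row of logits. As a rough illustration of how such a row can become invalid, here is a plain-Python sketch (not PyTorch or MII code) of the usual log-softmax normalization, x - logsumexp(x). The helper names `log_softmax_row` and `row_is_valid` are hypothetical, and treating only NaN as invalid is an assumption about what the constraint in the error message tests. Under those assumptions, a row with some -inf entries is fine as long as one entry is finite, while a row that is entirely -inf, or that contains +inf, normalizes to NaN.

```python
import math


def log_softmax_row(logits):
    """Normalize one row of logits to log-probabilities: x - logsumexp(x).

    Hypothetical helper for illustration; inputs are assumed to contain no NaN.
    """
    m = max(logits)
    if math.isinf(m):
        # Either every entry is -inf (m == -inf) or some entry is +inf.
        # In both cases logsumexp equals m, and x - m yields NaN for x == m.
        lse = m
    else:
        # Standard max-shifted logsumexp; the largest term contributes exp(0) = 1,
        # so the sum is at least 1 and the log is well defined.
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def row_is_valid(logits):
    """Return True if no normalized entry is NaN (assumed validity rule)."""
    return all(not math.isnan(x) for x in log_softmax_row(logits))


inf = float("inf")
partially_masked = [0.0, -inf, 1.5]   # some tokens filtered out, others finite
fully_masked = [-inf, -inf, -inf]     # every token filtered out
overflowed = [-inf, inf, -inf]        # an overflowed logit among masked ones

for name, row in [("partially_masked", partially_masked),
                  ("fully_masked", fully_masked),
                  ("overflowed", overflowed)]:
    print(name, row_is_valid(row))
```

A check of this kind, run on the raw logits before sampling, may help tell apart a row that was masked completely from one that already contained non-finite values when it left the model.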
Describe the bug
A clear and concise description of what the bug is.
Running the start command crashes with the traceback above.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
ds_report output
Please run
ds_report
to give us details about your setup.
Screenshots
If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
OS: Linux localhost 5.15.0-97-generic #107-Ubuntu SMP Wed Feb 7 13:26:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
GPU count and types: machines with 8x 3090 each
(if applicable) what DeepSpeed-MII version are you using
(if applicable) Hugging Face Transformers/Accelerate/etc. versions
Python version: 3.10.0
Any other relevant info about your setup
Docker context
Are you using a specific docker image that you can share?
Additional context
Add any other context about the problem here.