System Info
When running python openai_demo.py --model_path THUDM/cogagent-9b-20241220 --host 0.0.0.0 --port 7870 from the app directory, the following error occurs:
../aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion probability tensor contains either inf, nan or element < 0 failed.
Exception in thread Thread-2 (generate_text):
Traceback (most recent call last):
  File "/data1/liangzengyan/programs/miniconda3/envs/cognew/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/data1/liangzengyan/programs/miniconda3/envs/cognew/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/work/CogAgent/app/openai_demo.py", line 348, in generate_text
    model.generate(**model_inputs, **gen_kwargs)
  File "/data1/liangzengyan/programs/miniconda3/envs/cognew/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data1/liangzengyan/programs/miniconda3/envs/cognew/lib/python3.10/site-packages/transformers/generation/utils.py", line 2255, in generate
    result = self._sample(
  File "/data1/liangzengyan/programs/miniconda3/envs/cognew/lib/python3.10/site-packages/transformers/generation/utils.py", line 3300, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:21:04.0 Off |                    0 |
| N/A   34C    P0    37W / 250W |   2508MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                    GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Name: transformers
Version: 4.48.1
Name: torch
Version: 2.5.1
Name: torchvision
Version: 0.20.1
Who can help?
No response
Information
Reproduction
python openai_demo.py --model_path THUDM/cogagent-9b-20241220 --host 0.0.0.0 --port 7870
Expected behavior
I would like to understand what causes this error.
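For context on the assertion itself: it is raised by the check on the probability tensor passed to torch.multinomial, which the last traceback frame shows transformers calling in _sample after turning the next-token scores into probabilities. The sketch below is a minimal pure-Python illustration of that condition, not the torch or transformers code; the helper names (is_valid_probability_vector, softmax, sample_next_token) are hypothetical. It shows that a single inf or nan logit is enough to turn every softmax output into nan. Whether the logits in this report actually contain inf or nan (for example from overflow in reduced precision) is an assumption that this report does not confirm.

```python
import math
import random
from typing import Sequence


def is_valid_probability_vector(probs: Sequence[float]) -> bool:
    # Same condition the assertion reports: every entry must be finite
    # (no inf, no nan) and non-negative.
    return all(math.isfinite(p) and p >= 0.0 for p in probs)


def softmax(logits: Sequence[float]) -> list[float]:
    # Numerically stable softmax: subtract the max logit before exp().
    # If any logit is inf or nan, (inf - inf) or exp(nan) yields nan,
    # and that nan spreads to every entry through the normalizing sum.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: Sequence[float], rng: random.Random) -> int:
    # Hypothetical stand-in for the "probabilities, then multinomial" step
    # in the traceback. It validates on the host before sampling instead of
    # failing inside a CUDA kernel.
    probs = softmax(logits)
    if not is_valid_probability_vector(probs):
        raise ValueError(f"invalid probabilities before sampling: {probs}")
    # random.choices draws one index with probability proportional to weights.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_next_token([2.0, 1.0, 0.5], rng))  # a valid index: 0, 1 or 2
    print(softmax([float("inf"), 1.0]))  # [nan, nan]
```

Checking the probabilities before sampling only makes the failure explicit; it does not explain where a non-finite logit would come from in the model itself.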