RuntimeError: Triton Error [CUDA]: invalid argument by run_dnabert2.sh #36

shiro-kur opened this issue Aug 31, 2023 · 4 comments

@shiro-kur

What is the problem here, and how can it be solved? Running run_dnabert2.sh produces the output below.

The provided data_path is /home/shiro/DNABERT_2/finetune
2023-08-31 17:57:18.856636: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/cuda/lib64:
2023-08-31 17:57:18.856685: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']

- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
Num examples = 36,496
Num Epochs = 5
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 4
Total optimization steps = 5,700
Number of trainable parameters = 117,070,851
0%| | 0/5700 [00:00<?, ?it/s]Traceback (most recent call last):
File "<string>", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-1e8410f206c822547fb50e2ea86e45a6-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float32, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, False, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_modified.py", line 332, in
train()
File "train_modified.py", line 314, in train
trainer.train()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 2745, in training_step
self.scaler.scale(loss).backward()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 1041, in backward
_flash_attn_backward(do,
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 949, in _flash_attn_backward
_bwd_kernel[grid]( # type: ignore
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
0%| | 0/5700 [00:00<?, ?it/s]
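
For what it's worth, the failure happens while Triton's autotuner benchmarks launch configurations for the flash-attention backward kernel (_bwd_kernel): a CUDA "invalid argument" at launch time generally means a requested configuration (block shape, warps, or shared-memory footprint) is not supported by the GPU. The crash can be reproduced outside the Trainer with a single fp16 backward pass. A minimal sketch, assuming the hub repo exposes the sequence-classification auto class the way the finetune script loads it (num_labels=3 and the dummy sequence are placeholders):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model the same way the fine-tuning script does; trust_remote_code
# pulls in flash_attn_triton.py, whose backward kernel is where the crash occurs.
tok = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M", trust_remote_code=True, num_labels=3  # num_labels is a placeholder
).cuda()

inputs = tok("ACGTACGTACGTACGT", return_tensors="pt").to("cuda")  # dummy sequence
labels = torch.tensor([0], device="cuda")

# Mirror the Trainer's cuda_amp half-precision backend: fp16 forward, scaled backward.
scaler = torch.cuda.amp.GradScaler()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(**inputs, labels=labels).loss
scaler.scale(loss).backward()  # the Triton _bwd_kernel launches here
print("backward pass completed")

If this small script dies with the same "Triton Error [CUDA]: invalid argument", the problem lies in the model's Triton attention kernel rather than in the fine-tuning setup.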

@shiro-kur (Author)

Here are my installed packages.

(dnabert2) shiro@GTUNE:~/DNABERT_2/finetune$ pip list
Package Version
------- -------
absl-py 1.0.0
accelerate 0.22.0
anndata 0.7.6
antlr4-python3-runtime 4.9.3
appdirs 1.4.4
astor 0.8.1
astunparse 1.6.3
autograd 1.4
autograd-gamma 0.5.0
biopython 1.79
biothings-client 0.2.6
bleach 5.0.1
Brotli 1.0.9
cachetools 5.0.0
certifi 2023.7.22
charset-normalizer 2.0.12
click 8.1.2
cmake 3.27.2
coloredlogs 15.0.1
cycler 0.11.0
dash 2.0.0
dash-core-components 2.0.0
dash-dangerously-set-inner-html 0.0.2
dash-html-components 2.0.0
dash-table 5.0.0
docutils 0.19
einops 0.6.1
filelock 3.12.3
Flask 2.1.1
Flask-Compress 1.11
fonttools 4.32.0
formulaic 0.2.4
fsspec 2023.6.0
future 0.18.2
gast 0.3.3
google-auth 2.6.5
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.44.0
h5py 2.10.0
huggingface-hub 0.16.4
humanfriendly 10.0
idna 3.4
importlib-metadata 4.11.3
interface-meta 1.3.0
itsdangerous 2.1.2
Jinja2 3.1.1
joblib 1.1.0
Keras-Preprocessing 1.1.2
kiwisolver 1.4.2
lifelines 0.26.4
lit 17.0.0rc3
llvmlite 0.36.0
Markdown 3.3.6
markdown-it-py 2.1.0
MarkupSafe 2.1.1
matplotlib 3.5.1
mdurl 0.1.2
mhcflurry 2.0.5
mhcgnomes 1.7.0
mygene 3.2.2
natsort 8.1.0
np-utils 0.6.0
numba 0.53.0
numpy 1.18.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
omegaconf 2.3.0
opt-einsum 3.3.0
packaging 21.3
pandas 1.3.4
patsy 0.5.2
peft 0.3.0
Pillow 9.1.0
pip 23.2.1
pkginfo 1.9.6
plotly 5.4.0
protobuf 3.20.0
psutil 5.9.5
Pygments 2.14.0
pynndescent 0.5.6
pyparsing 3.0.8
python-dateutil 2.8.2
pytz 2022.1
PyYAML 6.0.1
readme-renderer 37.3
regex 2023.8.8
requests 2.26.0
requests-oauthlib 1.3.1
requests-toolbelt 0.10.1
rfc3986 2.0.0
rich 13.2.0
rsa 4.8
safetensors 0.3.3
scikit-learn 1.0.2
scipy 1.4.1
seaborn 0.11.2
serializable 0.2.1
setuptools 68.0.0
six 1.16.0
SNAF 0.5.2
statsmodels 0.13.1
tenacity 8.0.1
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.6.2.2
tensorflow 2.3.0
tensorflow-estimator 2.3.0
termcolor 1.1.0
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 1.13.0
torchaudio 0.13.0
torchvision 0.14.0
tqdm 4.62.3
transformers 4.29.2
triton 2.0.0.dev20221202
twine 4.0.2
typechecks 0.1.0
typing_extensions 4.7.1
umap-learn 0.5.2
urllib3 1.26.14
webencodings 0.5.1
Werkzeug 2.0.2
wheel 0.38.4
wrapt 1.14.0
xlrd 1.2.0
xmltodict 0.12.0
xmltramp2 3.1.1

@shiro-kur (Author)

Here is my GPU environment.

(dnabert2) shiro@GTUNE:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU (UUID: GPU-776edd0d-aef5-ab3a-3750-32bfa854fecf)

(dnabert2) shiro@GTUNE:~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

(dnabert2) shiro@GTUNE:~$ dpkg -l | grep cudnn
ii cudnn-local-repo-ubuntu2004-8.9.4.25 1.0-1 amd64 cudnn-local repository configuration files
ii libcudnn8 8.9.4.25-1+cuda11.8 amd64 cuDNN runtime libraries
ii libcudnn8-dev 8.9.4.25-1+cuda11.8 amd64 cuDNN development libraries and headers
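
A point worth checking: the RTX 3070 Laptop GPU is consumer Ampere (compute capability sm_86), which exposes less shared memory per block than the A100-class (sm_80) cards that Triton flash-attention kernels are typically tuned on, so an autotuner configuration that launches fine there can fail here with "invalid argument". A quick probe of the relevant properties, as a minimal sketch using the standard torch.cuda API:

import torch
import triton

# Report the device and the library versions that matter for the Triton kernel.
p = torch.cuda.get_device_properties(0)
print(f"{p.name}: sm_{p.major}{p.minor}, "
      f"{p.total_memory / 2**30:.1f} GiB, {p.multi_processor_count} SMs")
print("torch", torch.__version__, "| CUDA (torch build)", torch.version.cuda)
print("triton", triton.__version__)

On this machine it should report sm_86, which is the detail the autotuner's 128x128 block configurations can trip over.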

@jiaojiaoguan

I hit the same error: an identical KeyError from _bwd_kernel, exactly as in the traceback above.

Have you solved it? Thanks!

@shiro-kur (Author)

shiro-kur commented Feb 1, 2024

I just gave up..... Sorry.
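
For anyone landing here later: a workaround reported for this class of Triton flash-attention failures is to force the model onto its plain PyTorch attention path. The sketch below assumes (unverified) that the cached remote code guards its flash_attn_triton import with a try/except and falls back to standard attention when that import fails; uninstalling triton from the environment (pip uninstall triton) is the code-free version of the same idea:

# Workaround sketch: hide triton so the remote model code cannot import its
# flash-attention module and (assuming an import guard exists) falls back to
# standard PyTorch attention. Do this before the model is loaded.
import sys
sys.modules["triton"] = None  # makes any "import triton" raise ImportError

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M", trust_remote_code=True, num_labels=3  # num_labels is a placeholder
)

The fallback path is slower and uses more memory, but it avoids launching the kernel that fails on this GPU.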
