ERROR: Unexpected segmentation fault encountered in DataLoader workers. #3148

Open
huaibovip opened this issue May 19, 2024 · 2 comments

huaibovip commented May 19, 2024

ERROR: Unexpected segmentation fault encountered in DataLoader workers.

Please NOTE: device: 3, GPU Compute Capability: 8.9, Driver API Version: 12.2, Runtime API Version: 11.7
W0519 00:39:16.069731  4952 gpu_resources.cc:164] device: 3, cuDNN Version: 8.4.
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_max_inplace_grad_add', current_value=8, default_value=0)
FLAGS(name='FLAGS_selected_gpus', current_value='3', default_value='')
FLAGS(name='FLAGS_cudnn_batchnorm_spatial_persistent', current_value=True, default_value=False)
=======================================================================
I0519 00:39:18.472234  4952 tcp_utils.cc:107] Retry to connect to 127.0.0.1:50832 while the server is not yet listening.
I0519 00:39:21.472563  4952 tcp_utils.cc:130] Successfully connected to 127.0.0.1:50832
I0519 00:39:21.480532  4952 process_group_nccl.cc:129] ProcessGroupNCCL pg_timeout_ 1800000
ERROR: Unexpected segmentation fault encountered in DataLoader workers.

PaddlePaddle 2.6.1
CentOS 7
Python 3.10.14
cuda-version 11.7 h67201e3_3 conda-forge
cudatoolkit 11.7.1 h4bc3d14_13 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge

cuicheng01 (Collaborator) commented

Hello, are you training on multiple GPUs or a single GPU? And which version of paddle are you using?

huaibovip (Author) commented

> Hello, are you training on multiple GPUs or a single GPU? And which version of paddle are you using?

PaddlePaddle 2.6.1
CentOS 7
Python 3.10.14
cuda-version 11.7 h67201e3_3 conda-forge
cudatoolkit 11.7.1 h4bc3d14_13 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge
Multi-GPU training. Downgrading PaddlePaddle to 2.5.2 resolved the problem.
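
For anyone hitting the same error, a quick way to see whether an environment matches the one in this report is to read the installed package version before starting a multi-GPU run. The snippet below is only a sketch using the Python standard library. The helper names, the `AFFECTED` and `WORKAROUND` constants, and the assumption that the wheel is installed as `paddlepaddle-gpu` or `paddlepaddle` are illustrative and not part of PaddlePaddle. It reflects only what was observed in this thread (2.6.1 crashed, 2.5.2 did not) and says nothing about the root cause.

```python
from importlib import metadata

# Versions taken from this thread only; not an official compatibility list.
AFFECTED = "2.6.1"    # version on which the DataLoader segfault was reported
WORKAROUND = "2.5.2"  # version the author downgraded to


def installed_paddle_version(candidates=("paddlepaddle-gpu", "paddlepaddle")):
    """Return the installed version string of the first distribution found, else None.

    The distribution names are an assumption about how the wheel was installed.
    """
    for name in candidates:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def is_affected(version):
    """True if the version matches the one reported here, ignoring a '.postNNN' suffix."""
    if version is None:
        return False
    return version.split(".post")[0] == AFFECTED


if __name__ == "__main__":
    version = installed_paddle_version()
    if version is None:
        print("No PaddlePaddle distribution found in this environment.")
    elif is_affected(version):
        print(f"PaddlePaddle {version} matches the version in this report; "
              f"the author resolved the crash by downgrading to {WORKAROUND}.")
    else:
        print(f"PaddlePaddle {version} differs from the version in this report.")
```

The check reads package metadata rather than importing `paddle`, so it runs even in an environment where the import itself would fail.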
