[ERROR] [launch.py:321:sigkill_handler] exits with return code = -11 #5690

Open
shag1802 opened this issue Jun 21, 2024 · 2 comments
Assignees
Labels
bug (Something isn't working), training

Comments

@shag1802

I am trying to fine-tune an LLM by running a fine-tuning script (https://github.com/PKU-YuanGroup/Video-LLaVA/blob/main/scripts/v1_5/finetune.sh). I am using zero2_offload.json. After I start the script, it terminates on its own with return code = -11.

This is the finetune_script I am using
image

This is the error I am getting
image
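For reference, a return code of -11 can be decoded without the screenshot. This is a minimal sketch, assuming the launcher reports the child's exit status in the Python `subprocess` convention, where a negative value -N means the process was killed by signal N. The helper name `describe_return_code` is hypothetical and not part of DeepSpeed.

```python
import signal


def describe_return_code(code: int) -> str:
    """Translate a subprocess-style return code into a readable description.

    Assumption: negative codes follow the Python subprocess convention
    (-N means "terminated by signal N").
    """
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"unknown signal {-code}"
        return f"terminated by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with status {code}"


print(describe_return_code(-11))  # on Linux: terminated by SIGSEGV
```

Under that assumption, -11 corresponds to SIGSEGV (a segmentation fault) in one of the worker processes, which usually points at native code rather than at a Python exception.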

ds_report output
image

**System info**

  • OS: Ubuntu 22.04.4 LTS
  • GPU count and types: 4x Tesla T4
  • Python version: 3.10.14

Additional context
I am using AWS Cloud. I have also checked issue #4002, but the error still persists.

I am getting this output when I use df -h
image

Please help me resolve this error.

@shag1802 added the bug and training labels on Jun 21, 2024
@loadams
Contributor

loadams commented Jul 22, 2024

@shag1802 - can you share your shm size if using docker at all?
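One way to answer the shared-memory question from inside the training environment is sketched below. It assumes shared memory is mounted at `/dev/shm`, as is usual on Linux and in Docker containers. The function name `shm_usage_gib` is hypothetical.

```python
import os
import shutil


def shm_usage_gib(path: str = "/dev/shm") -> dict:
    """Return total, used, and free space of the given mount in GiB.

    Assumption: shared memory is mounted at /dev/shm (the Linux default).
    """
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
    }


if __name__ == "__main__":
    # Only report if the mount exists on this machine.
    if os.path.isdir("/dev/shm"):
        for key, value in shm_usage_gib().items():
            print(f"{key}: {value:.2f}")
```

If the job runs in Docker, the container's default `/dev/shm` is only 64 MB unless a larger value is passed with `docker run --shm-size`, so a very small total here would be worth reporting in the thread.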

@TengfeiSong000

I got the same error
