On shutdown, a single job can never finish #396

Open
tazlin opened this issue Jan 12, 2025 · 0 comments
Labels
bug Something isn't working

tazlin commented Jan 12, 2025

After a shutdown is initiated with SIGINT (Ctrl+C), a single job sometimes never finishes. When this happens, I have observed the following:

  • The job had already been stuck for a long time (>1 hr) before the shutdown was initiated
  • The model for that job was stuck in PRELOADED_MODEL for that entire time
  • Even once it was the only job left, and its process the last remaining process, it still did not start

The worker only exits if it is force-killed or once deadlock protection kicks in.

This does not happen on every shutdown, but I have observed it in at least 3 of roughly 15 shutdowns.
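
To help narrow this down, a check along the following lines could log any process that sits in PRELOADED_MODEL for too long while a shutdown is in progress. This is only a rough sketch and not the worker's actual code: `TrackedProcess`, `find_stalled`, `stall_after_seconds`, and every state name other than PRELOADED_MODEL are hypothetical names made up for illustration.

```python
import time
from dataclasses import dataclass, field
from enum import Enum, auto


class ProcessState(Enum):
    # Only PRELOADED_MODEL comes from this report; the other members are placeholders.
    PRELOADED_MODEL = auto()
    INFERENCE_STARTING = auto()
    DONE = auto()


@dataclass
class TrackedProcess:
    """Hypothetical record of a child process and when it entered its current state."""

    process_id: int
    state: ProcessState
    state_since: float = field(default_factory=time.monotonic)

    def set_state(self, new_state: ProcessState) -> None:
        # Reset the timer only on a real transition.
        if new_state is not self.state:
            self.state = new_state
            self.state_since = time.monotonic()


def find_stalled(processes, shutting_down, stall_after_seconds, now=None):
    """Return processes that have stayed in PRELOADED_MODEL too long during shutdown."""
    if not shutting_down:
        return []
    now = time.monotonic() if now is None else now
    return [
        p
        for p in processes
        if p.state is ProcessState.PRELOADED_MODEL
        and now - p.state_since >= stall_after_seconds
    ]
```

Calling `find_stalled(...)` once per iteration of the shutdown loop and logging the result would show how long the job has been waiting and which process holds it, well before deadlock protection ends the worker.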

tazlin added the bug label Jan 12, 2025