
Worker should warn (or pause) when suspected low memory conditions are present #332

Open
tazlin opened this issue Oct 28, 2024 · 1 comment
Labels: bug (Something isn't working), enhancement (New feature or request)


@tazlin
Member

tazlin commented Oct 28, 2024

From a user's perspective, many system RAM or VRAM out-of-memory (OOM) crashes are cryptic: the error shown in the main terminal window is non-specific and confusing. One or more of the following should happen instead:

  • When crashing, the worker should poll VRAM/RAM availability and, if free memory is below a threshold, tell the user that system or video memory may be a factor.
  • When impossible memory conditions exist (e.g., 100 MB of VRAM free when Flux is about to be loaded), the worker should warn that its attempt to load the model is very likely to fail.
    • Perhaps job popping could also be paused until the model has been shown to load successfully.
  • The worker should warn when VRAM-to-RAM rollover (an NVIDIA driver option, available only on Windows at the time of writing) kicks in.
    • It may be possible to detect this directly.
    • More generally, extremely low it/s (a symptom of this condition) should probably be reported explicitly, along with advice to free up VRAM/RAM.
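A minimal sketch of the first two points, assuming the worker may optionally use `psutil` and PyTorch to read free memory. `torch.cuda.mem_get_info()` (returns free and total bytes for the current device) and `psutil.virtual_memory().available` are real calls; `MemorySnapshot`, `poll_memory`, `low_memory_hints`, `preload_warning`, and the threshold defaults are hypothetical and not part of the worker's code. The model's approximate VRAM requirement is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional

MiB = 1024 * 1024
GiB = 1024 * MiB


@dataclass
class MemorySnapshot:
    """Free memory in bytes; None means the value could not be read."""
    free_ram: Optional[int]
    free_vram: Optional[int]


def poll_memory() -> MemorySnapshot:
    """Best-effort poll of free system RAM and free VRAM on the current CUDA device."""
    free_ram: Optional[int] = None
    free_vram: Optional[int] = None
    try:
        import psutil  # optional third-party dependency

        free_ram = psutil.virtual_memory().available
    except Exception:  # diagnostics must never raise, e.g. psutil not installed
        pass
    try:
        import torch

        if torch.cuda.is_available():
            free_vram, _total = torch.cuda.mem_get_info()
    except Exception:  # torch missing, or CUDA in a bad state after a crash
        pass
    return MemorySnapshot(free_ram, free_vram)


def low_memory_hints(
    snap: MemorySnapshot,
    ram_threshold: int = 2 * GiB,  # hypothetical default
    vram_threshold: int = 1 * GiB,  # hypothetical default
) -> List[str]:
    """Messages to show alongside a crash when free memory is below a threshold."""
    hints = []
    if snap.free_ram is not None and snap.free_ram < ram_threshold:
        hints.append(
            f"Only {snap.free_ram // MiB} MiB of system RAM is free; "
            "this crash may have been caused by running out of memory."
        )
    if snap.free_vram is not None and snap.free_vram < vram_threshold:
        hints.append(
            f"Only {snap.free_vram // MiB} MiB of VRAM is free; "
            "this crash may have been caused by running out of video memory."
        )
    return hints


def preload_warning(model_name: str, required_vram: int, snap: MemorySnapshot) -> Optional[str]:
    """Warn before loading when free VRAM is clearly below what the model needs.

    required_vram is a caller-supplied estimate. If free VRAM is unknown this
    returns None, so a failed poll never blocks work on its own.
    """
    if snap.free_vram is None or snap.free_vram >= required_vram:
        return None
    return (
        f"{model_name} needs roughly {required_vram // MiB} MiB of VRAM, but only "
        f"{snap.free_vram // MiB} MiB is free; loading it will very likely fail."
    )
```

For example, a crash handler could print each string from `low_memory_hints(poll_memory())` after the original traceback, and the model-loading path could hold off on popping jobs while `preload_warning(...)` returns a message. Checking free rather than total VRAM means memory held by other applications is taken into account, which is the case users are least likely to notice.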
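The it/s heuristic in the last sub-point could look roughly like this. `SlowIterationDetector`, its default thresholds, and the choice to compare against the best rate previously observed for the same model are assumptions for illustration, not existing worker behavior; it only flags a symptom and does not detect rollover directly.

```python
from collections import defaultdict
from typing import Dict, Optional


class SlowIterationDetector:
    """Hypothetical helper: flags jobs whose it/s drops far below the best rate
    seen for the same model, one possible sign that VRAM has spilled over into
    system RAM. This is a heuristic, not proof of rollover."""

    def __init__(self, slowdown_ratio: float = 0.25, min_samples: int = 3) -> None:
        self.slowdown_ratio = slowdown_ratio  # hypothetical: warn below 25% of best
        self.min_samples = min_samples  # jobs to observe before warning at all
        self._best: Dict[str, float] = {}
        self._seen: Dict[str, int] = defaultdict(int)

    def record(self, model: str, steps: int, seconds: float) -> Optional[str]:
        """Record one finished job; return a warning string if it was unusually slow."""
        if steps <= 0 or seconds <= 0:
            return None
        rate = steps / seconds
        best = self._best.get(model)
        prior = self._seen[model]
        self._seen[model] += 1

        warning = None
        if best is not None and prior >= self.min_samples and rate < best * self.slowdown_ratio:
            warning = (
                f"{model} ran at {rate:.2f} it/s, versus a best of {best:.2f} it/s. "
                "VRAM may be spilling into system RAM; consider freeing up VRAM/RAM."
            )
        if best is None or rate > best:
            self._best[model] = rate
        return warning
```

Comparing against each model's own best rate avoids needing per-GPU baselines; a real implementation would likely also key on resolution, since that changes it/s substantially.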
@tazlin added the bug and enhancement labels on Oct 28, 2024
@tazlin
Member Author

tazlin commented Nov 3, 2024

The solution to this may overlap with the solution to #337.
