runner does not work with dask adaptive scaling client #326
I think we should change the heuristic for determining how many workers we have available by checking the client configuration and scaling strategies.
You are running into this error (lines 403 to 404 in f28bab0) because you start with 0 cores.

If you change your argument from

Would this be good enough for you?
This seems to be a workaround, but I think actually detecting the configuration would be more reliable. Unfortunately I can't quite find the correct API in distributed.
I've asked whether there's a better way on Stack Overflow (AFAIR that's the preferred channel for dask): https://stackoverflow.com/q/69326568/2217463
Why would the maximal number of cores matter instead of the currently available cores?
It's a chicken and egg problem otherwise: the adaptive scaling of dask won't request new workers if there are no tasks in the queue. |
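To make that concrete: a client attached to an adaptive cluster that has not scaled up yet reports no cores at all. A rough local sketch (the cluster type and the minimum/maximum values here are only illustrative):

```python
from dask.distributed import Client, LocalCluster

# Start an adaptive cluster with no workers running yet.
cluster = LocalCluster(n_workers=0)
cluster.adapt(minimum=0, maximum=4)
client = Client(cluster)

print(client.ncores())                 # {} -- no workers have connected yet
print(sum(client.ncores().values()))   # 0, so the current heuristic sees no cores
```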
Hmm, then we would already query some points that will not be calculated yet. Why not change the following (lines 832 to 833 in a81be7a) to

    elif with_distributed and isinstance(ex, distributed.cfexecutor.ClientExecutor):
        ncores = sum(n for n in ex._client.ncores().values())
        return max(1, ncores)
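For what it's worth, `max(1, ncores)` here would mean the runner always requests at least one point even while the client still reports zero connected cores, which gives dask's adaptive scaling a queued task to react to (the chicken-and-egg problem mentioned above).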
Minimal code to reproduce the error in a local Jupyter notebook:
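Something along these lines (a minimal sketch; the learner, bounds, and goal below are placeholders rather than the exact code that was used):

```python
import adaptive
from dask.distributed import Client, LocalCluster

adaptive.notebook_extension()

# Adaptive scaling: the cluster starts with zero workers.
cluster = LocalCluster(n_workers=0)
cluster.adapt(minimum=0, maximum=2)
client = Client(cluster)

learner = adaptive.Learner1D(lambda x: x**2, bounds=(-1, 1))

# Fails because the client reports 0 cores before any worker has connected.
runner = adaptive.Runner(learner, executor=client, goal=lambda l: l.npoints > 100)
```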
Running this returns the error referenced above (lines 403 to 404 in f28bab0).
The same thing happens when running on a cluster with manual scaling without giving it enough time to connect to the workers. It seems that adaptive does not see any workers and terminates the process.
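A rough sketch of that manual-scaling variant (again illustrative; on a real multi-node cluster the gap between requesting workers and workers actually registering is much larger than with a LocalCluster):

```python
import adaptive
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=0)
client = Client(cluster)
cluster.scale(4)   # workers requested, but none has registered yet

learner = adaptive.Learner1D(lambda x: x**2, bounds=(-1, 1))

# If the runner starts before any worker has connected, the client
# still reports 0 cores and the same error may be raised.
runner = adaptive.Runner(learner, executor=client, goal=lambda l: l.npoints > 100)
```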