runner does not work with dask adaptive scaling client #326
I think we should change the heuristic for determining how many workers we have available by checking the client configuration and scaling strategies.
You are running into this error (lines 403 to 404 in f28bab0) because you start with 0 cores.

If you change your argument from

Would this be good enough for you?
This seems to be a workaround, but I think actually detecting the configuration would be more reliable. Unfortunately I can't quite find the correct API in distributed.
I've asked whether there's a better way on Stack Overflow (AFAIR that's the preferred channel for dask): https://stackoverflow.com/q/69326568/2217463
Why would the maximal number of cores matter instead of the currently available cores?
It's a chicken and egg problem otherwise: the adaptive scaling of dask won't request new workers if there are no tasks in the queue. |
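To make that concrete: a client attached to an adaptive cluster that has not scaled up yet reports no cores at all. A rough local sketch (the cluster type and the minimum/maximum values here are only illustrative):

```python
from dask.distributed import Client, LocalCluster

# Start an adaptive cluster with no workers running yet.
cluster = LocalCluster(n_workers=0)
cluster.adapt(minimum=0, maximum=4)
client = Client(cluster)

print(client.ncores())                 # {} -- no workers have connected yet
print(sum(client.ncores().values()))   # 0, so the current heuristic sees no cores
```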
Hmm, then we would already query some points that will not be calculated yet. Why not change the following (lines 832 to 833 in a81be7a) to

    elif with_distributed and isinstance(ex, distributed.cfexecutor.ClientExecutor):
        ncores = sum(n for n in ex._client.ncores().values())
        return max(1, ncores)
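For what it's worth, `max(1, ncores)` here would mean the runner always requests at least one point even while the client still reports zero connected cores, which gives dask's adaptive scaling a queued task to react to (the chicken-and-egg problem mentioned above).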
Minimal code to reproduce the error in a local Jupyter notebook:
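Something along these lines (a minimal sketch; the learner, bounds, and goal below are placeholders rather than the exact code that was used):

```python
import adaptive
from dask.distributed import Client, LocalCluster

adaptive.notebook_extension()

# Adaptive scaling: the cluster starts with zero workers.
cluster = LocalCluster(n_workers=0)
cluster.adapt(minimum=0, maximum=2)
client = Client(cluster)

learner = adaptive.Learner1D(lambda x: x**2, bounds=(-1, 1))

# Fails because the client reports 0 cores before any worker has connected.
runner = adaptive.Runner(learner, executor=client, goal=lambda l: l.npoints > 100)
```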
Running this returns the error referenced above (lines 403 to 404 in f28bab0).
The same thing happens when running on a cluster with manual scaling without giving it enough time to connect to the workers. It seems that adaptive does not see any workers and terminates the process.
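A rough sketch of that manual-scaling variant (again illustrative; on a real multi-node cluster the gap between requesting workers and workers actually registering is much larger than with a LocalCluster):

```python
import adaptive
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=0)
client = Client(cluster)
cluster.scale(4)   # workers requested, but none has registered yet

learner = adaptive.Learner1D(lambda x: x**2, bounds=(-1, 1))

# If the runner starts before any worker has connected, the client
# still reports 0 cores and the same error may be raised.
runner = adaptive.Runner(learner, executor=client, goal=lambda l: l.npoints > 100)
```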