
Improve scheduling by interleaving setup and teardown of an experiment #2560

Open
bodokaiser opened this issue Aug 28, 2024 · 9 comments

@bodokaiser

ARTIQ Feature Request

Problem this request addresses

Scheduling of experiment runs is not efficient, i.e., we "waste" about 1 second between experiment runs.

Describe the solution you'd like

Currently, the scheduler worker runs an experiment run's setup and teardown steps, e.g., exp.prepare and write_results, sequentially.

Can we have a worker thread for each stage? Could we also add a precompilation step?

@sbourdeauducq
Member

Currently, the scheduler worker runs an experiment run's setup and teardown steps, e.g., exp.prepare and write_results, sequentially.

What do you expect?

The experiment needs to be prepared before it can run.
Results need to have been obtained before they can be written.

If you have two different experiments then those steps should already run in parallel.

Precompilation is already supported, please read the manual.
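
A minimal sketch of the pattern described in the manual, assuming the Core.precompile API (class and kernel names here are illustrative):

from artiq.experiment import *

class PrecompileExample(EnvExperiment):
    def build(self):
        self.setattr_device("core")

    def prepare(self):
        # Compile the kernel ahead of time; precompile returns a callable.
        self._precompiled = self.core.precompile(self.pulse)

    @kernel
    def pulse(self):
        self.core.break_realtime()
        # ... hardware accesses go here ...

    def run(self):
        # Run the already-compiled kernel, skipping compilation in the run stage.
        self._precompiled()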

@architeuthidae
Collaborator

Currently, the scheduler worker runs an experiment run's setup and teardown steps, e.g., exp.prepare and write_results, sequentially.

It could hardly be the other way around. On the other hand, experiment stages are already pipelined (and there can be multiple pipelines at any one time). See the manual. There is also already support for precompilation.

@architeuthidae
Collaborator

Oh, race condition. Excuse me.

@sbourdeauducq
Member

i.e., we "waste" about 1 second between experiment runs.

What operating system is this?
Each experiment runs in a different worker process. How much of that second is just creating a process?
Though even then, it does not seem worthwhile to optimize, since this only affects the first experiment in a sequence; the rest gets pipelined.

@sbourdeauducq
Member

If it really matters (but you should make the case for it; everything is a trade-off against code complexity), it should be possible to create some worker processes in advance.

@bodokaiser
Author

What do you expect?

The experiment needs to be prepared before it can run. Results need to have been obtained before they can be written.

Yes, but I can prepare for the next experiment while writing the results for the previous one.

If you have two different experiments then those steps should already run in parallel.

Why not extend it to runs of the same experiment (with different argument overrides)?

Precompilation is already supported, please read the manual.

Yes, but how would I tell the scheduler to exploit precompile? Is it as simple as

class Experiment(EnvExperiment):

  def prepare(self):
    self._run = self.precompile(self.run)

  def run(self):
    self._run()

?

i.e., we "waste" about 1 second between experiment runs.

What operating system is this? Each experiment runs in a different worker process. How much of that second is just creating a process? Though even then, it does not seem worthwhile to optimize since this only affects the first experiment in a sequence, then the rest gets pipelined.

We are using Linux amd64. Yes, you are right, a starting point is to get a better understanding of the actual bottlenecks. I can provide an analysis in a few weeks (while also testing the new NAC3 compiler).

@dnadlinger
Collaborator

Yes, but I can prepare for the next experiment while writing the results for the previous one.

This should already be the case. The next-to-be-run experiment is already prepared while the currently active one is still running.

@systemofapwne

systemofapwne commented Nov 27, 2024

I just stumbled upon this topic after we recently implemented precompilation in our experiments.
On our setup, compilation of rather complex experiments takes ~4 s. Moving the compilation into the prepare stage reduced the dead time between two consecutive experiments, which was a giant improvement for us.
This makes me wonder why this is not done in the first place. I can only imagine a case where a result of a previous experiment influences how a subsequent experiment is executed.

@bodokaiser Your example code does not seem to work in its current form, but the example below should:

from artiq.experiment import *

class Experiment(EnvExperiment):
  def build(self):
    self.setattr_device("core")

  def prepare(self):
    super().prepare()
    # Compile the kernel ahead of time via the core device.
    self._run = self.core.precompile(self.my_kernel)

  @kernel
  def my_kernel(self):
    # Reset FIFO (just in case)
    self.core.reset()
    self.core.break_realtime()

    # Do some kernel stuff on your core

    # Wait until the core is ready for the next run.
    self.core.wait_until_mu(now_mu())

  def run(self):
    self._run()

The FIFO reset is just there in case there is still an experiment running; it might be dropped.
The wait_until_mu() is required so that the master's run queue waits for the core to finish. Without it, the master dispatches the compiled kernel and immediately jumps to the prepare of the next scheduled experiment; once that prepare is finished, it would dispatch a new kernel while the previous one was still running. The wait_until_mu() avoids this.

Since we realized that writing this code into our experiments over and over becomes messy quite quickly, I wrote a decorator for it.
Experiments that should be precompiled now look like this:

@precompile
class BaseExperiment(EnvExperiment):
    def build(self):
        # Whatever you need for build
        self.setattr_device("core")

    @kernel
    def run(self):
        print("Base Experiment")

@precompile
class MyExperiment(EnvExperiment):
    def build(self):
        # Whatever you need for build
        self.setattr_device("core")
        self.base = BaseExperiment(self)

    @kernel
    def run(self):
        self.base.run()
        print("Derived Experiment")

The source for the decorator code is here:

import functools

from artiq.experiment import kernel, now_mu

ALLOW_COMPILE = True
def precompile(cls):
    '''
    Hooks into the prepare method and adds precompilation plus tracking of compilation status.
    Tracking avoids sub-experiments getting compiled individually if they were marked as precompiled too.
    This saves time, since the main experiment embeds the sub-experiments and compiles the sequence as a whole, so there is no gain in precompiling sub-experiments.
    '''
    @functools.wraps(cls, updated=())
    class PrecompiledExperiment(cls):
        def prepare(self):
            global ALLOW_COMPILE                # Get global compile state...
            do_compile = ALLOW_COMPILE          # back it up...
            ALLOW_COMPILE = False               # disable compilation for imported experiments, when running their respective prepare() method
            try:
                super().prepare()               # run child experiment's prepare methods
            finally:
                ALLOW_COMPILE = do_compile      # Reset global state to what it was before..
            if do_compile: self._precompile()   # And compile (if initially, global state was True)
        
        @kernel
        def _wrapped_kernel(self):
            ''' This kernel will get precompiled. It wraps around the original kernel and adds FIFO reset and wait_until_mu'''
            # Reset FIFO
            self.core.reset()
            self.core.break_realtime()
            # Run actual kernel
            self.run()
            # Must be here or new experiments/kernels will be added to core before any previous experiment has finished!
            self.core.wait_until_mu(now_mu())


        def _precompile(self):
            ''' Allows precompilation of the run() method already in the prepare method for faster execution when multiple experiments are scheduled '''
            from time import time
            if not hasattr(self.run, '__wrapped__'): return             # If the run method is not decorated (e.g. by @kernel), skip precompilation.
            if not hasattr(self,"core"): self.setattr_device("core")    # Add core device, if not added yet

            # Precompile run() method
            start = time()
            kern = self.core.precompile(self._wrapped_kernel)
            print(f"PRECOMPILE time: {time() - start:.2f} s")

            # Replace original run method with precompiled version (aka monkey patch)
            self.run = kern
    
    return PrecompiledExperiment

@dnadlinger
Collaborator

This makes me wonder, why this is not done in the first place. I can only imagine a case where a result of a previous experiment influences how a consecutive experiment is executed.

That's exactly it. In fact, this is often the natural case, as the "previous experiment" might be a calibration experiment that updates some dataset with a new π time, frequency, etc. that the new experiment ought to pull in. One can of course opportunistically precompile experiments and then check afterwards whether any dataset/controller/… dependencies changed. It is possible to write a framework that achieves this as well, but baseline ARTIQ is a bit limited here: you can't start to execute run() (where those datasets would be read in, etc.) before the running phase, as the experiment might already start accessing hardware.
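
A hypothetical sketch of such opportunistic precompilation, assuming the dependencies are plain datasets known up front (class, kernel, and dataset names are illustrative):

from artiq.experiment import *

class OpportunisticPrecompile(EnvExperiment):
    def build(self):
        self.setattr_device("core")

    def prepare(self):
        # Snapshot the datasets the kernel depends on at precompile time
        # (dataset names are illustrative).
        self._deps = ("pi_time", "carrier_freq")
        self._snapshot = {k: self.get_dataset(k) for k in self._deps}
        self._precompiled = self.core.precompile(self.sequence)

    @kernel
    def sequence(self):
        self.core.break_realtime()
        # ... pulses parameterized by the snapshotted values ...

    def run(self):
        # If a dependency changed between prepare() and run() (e.g. a calibration
        # run updated it), fall back to a fresh compilation with the current values.
        current = {k: self.get_dataset(k) for k in self._deps}
        if current == self._snapshot:
            self._precompiled()
        else:
            self._snapshot = current
            self.sequence()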
