RFC.2
The agent configuration contains lines like this (here for xsede.stampede_ssh):
"agent_spawner" : "POPEN",
"agent_launch_method" : "SSH",
"task_launch_method" : "SSH",
"mpi_launch_method" : "MPIRUN_RSH",
which mean:
- MPI units are launched via mpirun_rsh
- other units are launched via ssh
- sub-agents are started via ssh, too
- mpirun_rsh and ssh processes are spawned via Python's popen
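The fixed structure described above can be sketched roughly as follows. This is only an illustration, not the agent's actual code: the helper `select_launch_method` and the `is_mpi` flag are hypothetical names, and only the configuration keys and values are taken from the listing above.

```python
# Configuration entries as shown above (xsede.stampede_ssh).
config = {
    "agent_spawner":       "POPEN",
    "agent_launch_method": "SSH",
    "task_launch_method":  "SSH",
    "mpi_launch_method":   "MPIRUN_RSH",
}


def select_launch_method(config, is_mpi):
    """Hypothetical helper: pick exactly one launch method per unit.

    The choice depends only on whether the unit is an MPI unit, which is
    the hard-wired MPI / non-MPI split the configuration implies.
    """
    key = "mpi_launch_method" if is_mpi else "task_launch_method"
    return config[key]


mpi_lm   = select_launch_method(config, is_mpi=True)    # MPI unit
plain_lm = select_launch_method(config, is_mpi=False)   # any other unit
```

In this sketch each unit maps to exactly one launch method, so anything that cuts across launch methods has no natural place to live.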
Now, with RFC.1 implemented, we no longer have a clear separation between MPI and non-MPI units. Specifically, we promise to set OMP_NUM_THREADS for the application to inform it about the set of cores available for each process to use. The current fixed structure would require us to either always set that env variable centrally, or to set it specifically for each launch method, i.e. to spread it over different source code files.
Change the configuration to:
"launch_methods" : ["ssh", "open_mp", "mpirun_rsh"]
and decide dynamically for each unit which launch method (LM) should be invoked:
cud.cpu_processes = 2
cud.cpu_threads = 3
cud.cpu_process_type = rp.MPI
cud.cpu_thread_type = rp.OpenMP
Each LM would register for one or more thread and process types to handle, and the agent would invoke all LMs which apply to a given unit (here MPI and OpenMP). The OpenMP LM would really only set the env variable, and could thus easily be combined with the MPI LM - but could just as easily be used with a non-MPI LM like ssh.
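A minimal sketch of how such a registration could look is given below. It is an assumption-laden illustration, not the proposed implementation: the class names, the `process_types` / `thread_types` attributes, the `apply()` signature, the use of `None` for plain (non-MPI) processes, and the simplified command lines (no host lists) are all hypothetical. Only the unit attributes `cpu_processes`, `cpu_threads`, `cpu_process_type`, `cpu_thread_type` and the OMP_NUM_THREADS variable come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

MPI    = "MPI"      # stand-in for rp.MPI
OPENMP = "OpenMP"   # stand-in for rp.OpenMP


@dataclass
class Unit:
    cpu_processes:    int = 1
    cpu_threads:      int = 1
    cpu_process_type: Optional[str] = None   # None: plain, non-MPI process
    cpu_thread_type:  Optional[str] = None


class LaunchMethod:
    process_types = ()   # process types this LM registers for
    thread_types  = ()   # thread types this LM registers for

    def apply(self, unit, env, cmd):
        return env, cmd


class Ssh(LaunchMethod):
    process_types = (None,)

    def apply(self, unit, env, cmd):
        # simplified: the target node is a placeholder
        return env, ["ssh", "localhost"] + cmd


class MpirunRsh(LaunchMethod):
    process_types = (MPI,)

    def apply(self, unit, env, cmd):
        # simplified: host list omitted
        return env, ["mpirun_rsh", "-np", str(unit.cpu_processes)] + cmd


class OpenMP(LaunchMethod):
    thread_types = (OPENMP,)

    def apply(self, unit, env, cmd):
        # the only place where OMP_NUM_THREADS is set
        return dict(env, OMP_NUM_THREADS=str(unit.cpu_threads)), cmd


def select_lms(unit, lms):
    """Return all LMs registered for the unit's process or thread type."""
    return [lm for lm in lms
            if unit.cpu_process_type in lm.process_types
            or unit.cpu_thread_type  in lm.thread_types]


def build_launch(unit, lms, executable):
    """Let every applicable LM contribute to environment and command."""
    env, cmd = {}, [executable]
    for lm in select_lms(unit, lms):
        env, cmd = lm.apply(unit, env, cmd)
    return env, cmd


# mirrors "launch_methods" : ["ssh", "open_mp", "mpirun_rsh"]
lms = [Ssh(), OpenMP(), MpirunRsh()]

mpi_unit = Unit(cpu_processes=2, cpu_threads=3,
                cpu_process_type=MPI, cpu_thread_type=OPENMP)
mpi_env, mpi_cmd = build_launch(mpi_unit, lms, "./a.out")

ssh_unit = Unit(cpu_threads=4, cpu_thread_type=OPENMP)
ssh_env, ssh_cmd = build_launch(ssh_unit, lms, "./a.out")
```

In this sketch the OpenMP LM combines with either the MPI LM or ssh without those two knowing about the env variable, which is the separation the proposal is after.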
This is technically not difficult to implement, as the core of the launch methods would basically remain unchanged. It is somewhat tedious to introduce the abstraction, as each LM needs to register what it is able to handle, and the executor needs to sort through those registrations depending on the unit description. I also don't expect this to significantly affect performance.
As always when introducing a new abstraction, anybody looking at the code needs to understand what the abstraction is about, and why it was introduced - code complexity is increasing in this respect. But, OTOH, code clarity ultimately improves, for the reasons discussed in the problem statement.