HOWTO CLUSTER SETUP.txt

This file is part of PyBroMo: a single-molecule Brownian motion diffusion 
simulator for confocal smFRET experiments:

* http://opensmfs.github.io/PyBroMo/


Introduction
============
This is a quick howto on setting up an IPython cluster.
For more info refer to the IPython documentation:

http://ipython.org/ipython-doc/dev/parallel/parallel_process.html

Requirements
============

You need to install IPython. The easiest way is to get it through
a scientific Python distribution such as Anaconda.


Parallel computing on a single machine
======================================

Method 1
--------

Launch the notebook server and, from the cluster tab, start 4 engines.

Method 2
--------
Open a terminal (cmd.exe) and type:

    ipcluster start -n 4


Parallel computing on many machines (Windows 7)
===============================================

IPython docs:
http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#starting-the-controller-and-engines-on-different-hosts

Here we configure 2 machines: one controller host that launches the simulation
and one "slave" host that performs the computation. The procedure extends to
multiple "slave" machines by repeating the same configuration on each of them.

Windows note
------------
All the commands must be pasted in a cmd.exe terminal.

Setup the controller
--------------------

We need to create an IPython profile only the first time:

    ipython profile create --parallel --profile=parallel

This command copies a new set of configuration files into
IPYTHONDIR/profile_parallel, where IPYTHONDIR is usually a folder named
.ipython in the user home folder (C:\Users\username\). These files can be
customized to change the default behaviour, if needed.
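
To check where these files ended up, the profile folder can be computed from
Python. This is a minimal sketch, not IPython's own lookup logic: it assumes
that the IPYTHONDIR environment variable, when set, overrides the default, and
that the default is a .ipython folder in the user home. The function name
profile_dir is made up for illustration.

```python
import os


def profile_dir(profile="parallel"):
    """Return the expected folder of an IPython profile (assumed layout)."""
    # Assumption: IPYTHONDIR, when set, overrides the default ~/.ipython.
    default = os.path.join(os.path.expanduser("~"), ".ipython")
    ipython_dir = os.environ.get("IPYTHONDIR", default)
    return os.path.join(ipython_dir, "profile_" + profile)


path = profile_dir()
print(path)
# The folder exists only after "ipython profile create" has been run.
print("exists: %s" % os.path.isdir(path))
```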

Now, each time we want to start a parallel computation, we begin by starting
the controller:

    ipcontroller --profile=parallel --ip=169.232.130.141
	
(replace the address with the IP address of the controller machine)
	
This command creates a file ipcontroller-engine.json that contains
the connection info that the other machines need in order to connect to the 
controller.
The file is located in IPYTHONDIR/profile_parallel/security.
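
The connection file is plain JSON, so it can be inspected from Python before
it is copied around. The sketch below only lists whatever top-level keys it
finds; the actual fields depend on the IPython version and are not assumed
here. The helper name and the content of the demo file are invented for
illustration.

```python
import json
import os
import tempfile


def connection_keys(path):
    """Return the sorted top-level keys of a JSON connection file."""
    with open(path) as f:
        info = json.load(f)
    return sorted(info)


# Demo on a stand-in file (made-up content, NOT the real file layout).
demo = os.path.join(tempfile.mkdtemp(), "ipcontroller-engine.json")
with open(demo, "w") as f:
    json.dump({"example_key": "example_value"}, f)
print(connection_keys(demo))
```

On a real setup, pass the path of the file found in
IPYTHONDIR/profile_parallel/security instead of the demo file.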

We need to copy ipcontroller-engine.json to the computation machine. 
To automate this step I like to link the IPython folder into a Dropbox folder 
so that all the configuration files are automatically copied/updated on 
the different machines. 
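
If no synced folder is available, the copy can also be scripted. This is a
minimal sketch that uses only the standard library. The function name and both
folder arguments are placeholders to adapt, and the shared folder is assumed
to be reachable from the controller machine (a network share, for instance).

```python
import os
import shutil


def publish_engine_file(security_dir, shared_dir):
    """Copy ipcontroller-engine.json from security_dir into shared_dir."""
    name = "ipcontroller-engine.json"
    src = os.path.join(security_dir, name)
    if not os.path.isfile(src):
        raise IOError("start ipcontroller first, missing: " + src)
    if not os.path.isdir(shared_dir):
        os.makedirs(shared_dir)
    # copy2 also preserves the modification time, handy to spot stale copies.
    shutil.copy2(src, shared_dir)
    return os.path.join(shared_dir, name)
```

Since the controller creates this file when it starts, rerun the copy after
each controller start.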

Setup the "slave" machine
-------------------------

On the machine that runs the computation it is also useful to create
a profile (only the first time), with the same command as before:

    ipython profile create --parallel --profile=parallel
	
A new set of configuration files is created in
IPYTHONDIR/profile_parallel.

We can start a computation engine with the ipengine command, specifying the
path of the ipcontroller-engine.json file:

    ipengine --profile=parallel --file=C:\Data\user\software\Dropbox\ipython\profile_parallel\security\ipcontroller-engine.json
	
Alternatively, we can write the file name in the configuration file so we
don't need to type it every time. To do so, edit the file ipengine_config.py
found in the previously created profile folder (IPYTHONDIR/profile_parallel).
Find the line:

    #c.IPEngineApp.url_file = u''
	
remove the leading # and write the ipcontroller-engine.json path, doubling
each backslash because the backslash is the escape character in Python
strings. In our example:

    c.IPEngineApp.url_file = u'C:\\Data\\user\\software\\Dropbox\\ipython\\profile_parallel\\security\\ipcontroller-engine.json'
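
Windows paths need care inside a Python configuration file: a backslash starts
an escape sequence in a string literal, and \u (as in \user) is rejected
outright in a unicode string. Writing each backslash twice, or letting the
standard ntpath module assemble the path, avoids the problem. The sketch below
checks that both approaches give the same string, using the example path of
this howto.

```python
import ntpath  # Windows path rules, importable on any OS

# Doubled backslashes: each "\\" is one literal backslash.
escaped = ("C:\\Data\\user\\software\\Dropbox\\ipython\\profile_parallel"
           "\\security\\ipcontroller-engine.json")

# Same path assembled piece by piece.
joined = ntpath.join("C:\\", "Data", "user", "software", "Dropbox", "ipython",
                     "profile_parallel", "security",
                     "ipcontroller-engine.json")

print(escaped)
print(joined == escaped)
```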
	
Now to launch an engine simply type:

    ipengine --profile=parallel
	
It is suggested to launch as many engines as there are CPU cores. To launch
a second engine, open a new terminal and type the command again, and so on.
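
Opening one terminal per engine can be avoided with a small launcher script.
This is a rough sketch, assuming that ipengine is on the PATH and that the
"parallel" profile already points to the connection file. The function names
are made up, and multiprocessing.cpu_count() reports logical cores, which may
be more than the physical ones.

```python
import multiprocessing
import subprocess


def engine_command(profile="parallel"):
    """Command line for one engine, as typed in the terminal."""
    return ["ipengine", "--profile=" + profile]


def launch_engines(n=None, profile="parallel"):
    """Start n engines (default: one per core); return the Popen objects."""
    if n is None:
        n = multiprocessing.cpu_count()
    return [subprocess.Popen(engine_command(profile)) for _ in range(n)]


print("%d cores -> %s" % (multiprocessing.cpu_count(),
                          " ".join(engine_command())))
```

Calling launch_engines() starts the processes; calling terminate() on each
returned object stops them.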

To add another machine for computation just repeat the previous steps.

Launching the simulation
========================

Once the cluster is started (either on a single machine or on multiple
machines) we are ready to launch a simulation.

On the controller machine start an IPython QtConsole or an 
IPython notebook using the profile "parallel":

    ipython qtconsole --profile=parallel
   
or 

    ipython notebook --profile=parallel

Then run:

    from IPython.parallel import Client
    rc = Client()
    rc.ids

The last command should print the list of engine IDs, one for each engine
that was started.
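
Engines may take a few seconds to register with the controller, so the list
can be shorter than expected right after startup. The helper below is a
hypothetical sketch that polls the ids attribute until enough engines show up.
It assumes that reading rc.ids reflects newly registered engines, and it is
demonstrated here on a stand-in object rather than on a real Client.

```python
import time


def wait_for_engines(client, expected, timeout=30.0, poll=0.5):
    """Poll client.ids until `expected` engines registered or timeout.

    Returns True on success, False if the timeout expired first.
    """
    deadline = time.time() + timeout
    while len(client.ids) < expected:
        if time.time() >= deadline:
            return False
        time.sleep(poll)
    return True


class FakeClient(object):
    """Stand-in for a real Client, used only to demo the helper."""
    ids = [0, 1, 2, 3]


print(wait_for_engines(FakeClient(), expected=4))
```

With a running cluster, pass the rc object created above instead of
FakeClient().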

Alternatively, if a qtconsole or notebook is already running without the
"parallel" profile, you can simply pass to Client the path of the file
that contains the client connection information. This file is
ipcontroller-client.json (not -engine as before!) and is located in the
security subfolder of the profile folder.

This trick is used by the PyBroMo notebooks so you don't need
to restart the notebook server after you launch the cluster.
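
The trick can be sketched as follows. This is an illustration, not the code
used by the PyBroMo notebooks: the helper names are invented, the file is
assumed to sit in the security subfolder of the profile folder, and Client is
assumed to accept the file path as its first argument.

```python
import os


def client_file(profile_dir):
    """Path of ipcontroller-client.json inside a profile folder."""
    # Assumption: the file sits in the "security" subfolder, like
    # ipcontroller-engine.json.
    return os.path.join(profile_dir, "security", "ipcontroller-client.json")


def connect(profile_dir):
    """Return a Client connected through the profile's client file."""
    # Imported here so the path helper works without IPython installed.
    from IPython.parallel import Client
    return Client(client_file(profile_dir))


# Placeholder profile folder; replace it with your own.
print(client_file(os.path.join("IPYTHONDIR", "profile_parallel")))
```

Once the controller is running, rc = connect(...) followed by rc.ids should
list the engines exactly as in the session started with the profile.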