
Adding support for new clusters

swo edited this page Sep 17, 2014 · 3 revisions

Setting up the interaction

SmileTrain interacts with the cluster in a few ways: the submit command, submit parser, status command, and status parser. A bashrc can also be added for each cluster.

The functionality for submitting jobs to a cluster is in ssub.py.

Submit command

Called submit_cmd in ssub.py, this command is the thing you would type at the terminal to submit a job. For coyote (an MIT cluster), it is qsub.

Submit parser

Called parse_job in ssub.py, this is a function that parses the output returned by the submit command to extract the job name. On coyote, the qsub command might return a line like 1234[].wiley.coyote.mit.edu from which parse_job will extract the job name 1234[].
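As a rough illustration of what a submit parser does, here is a minimal sketch that handles the coyote example above. The function name parse_job_sketch is made up for this page, and the real parse_job in ssub.py may work differently; the sketch just keeps everything before the first dot.

```python
def parse_job_sketch(submit_output):
    """Return the job name from the submit command's output.

    Hypothetical helper, not the parse_job in ssub.py. It assumes the
    first line of output looks like "<job name>.<host name>".
    """
    first_line = submit_output.strip().splitlines()[0]
    # Keep only the part before the first dot.
    return first_line.split(".", 1)[0]


print(parse_job_sketch("1234[].wiley.coyote.mit.edu\n"))  # prints 1234[]
```

On a cluster whose submit command prints something else, only the body of this function would need to change.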

Status command

Called stat_cmd in ssub.py, this command is the thing you would type at the terminal to check on the status of your job. On coyote, it is a version of qstat.

Status parser

Called parse_stats in ssub.py, this is a function that parses the output returned by the status command to get a list of running jobs. After you submit a set of jobs, SmileTrain keeps track of them by repeatedly running the status command. Once the jobs you submitted are no longer in the list of running jobs, SmileTrain performs post-operation checks (like looking for the files that should have been created by the job). If those checks all pass, the next set of commands in the pipeline is submitted. On coyote, qstat -x returns an XML-formatted list of all jobs, which parse_stats turns into a list of job names.
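The status-parsing step can be sketched like this. The XML layout used here (Job elements containing a Job_Id element) is an assumption about what qstat -x prints, and the names parse_stats_sketch and all_finished are made up for this page; check your own cluster's output before relying on any of it.

```python
import xml.etree.ElementTree as ET


def parse_stats_sketch(status_xml):
    """Return the job names found in XML status output.

    Hypothetical helper, not the parse_stats in ssub.py.
    Assumed layout: <Data><Job><Job_Id>1234[].host</Job_Id>...</Job></Data>
    """
    if not status_xml.strip():
        return []  # assumption: empty output means no jobs are listed
    root = ET.fromstring(status_xml)
    # Strip the host part so the names match what the submit parser returns.
    return [el.text.split(".", 1)[0] for el in root.iter("Job_Id") if el.text]


def all_finished(submitted, running):
    """True once none of the submitted jobs appear in the running list."""
    return not (set(submitted) & set(running))


sample = (
    "<Data><Job><Job_Id>1234[].wiley.coyote.mit.edu</Job_Id>"
    "<job_state>R</job_state></Job></Data>"
)
running = parse_stats_sketch(sample)
print(running)                            # ['1234[]']
print(all_finished(["1234[]"], running))  # False
print(all_finished(["1234[]"], []))       # True
```

The important design point is that the submit parser and the status parser must produce job names in the same form, since the two lists are compared directly.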

Bashrcs

When a job is submitted to a node, the environment may not be configured such that SmileTrain can run. For example, the wrong version of Python might be enabled or BioPython might not be loaded. In user.cfg, you can specify a bashrc whose contents are added to the top of every submitted job. On coyote, bashrcs/coyote.sh is used; this file loads the modules for python2.7 and BioPython.
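To make "added to the top of every submitted job" concrete, here is a small sketch of the idea. The function name prepend_bashrc and the module names in the example are hypothetical; this is not how ssub.py necessarily does it, and it does not show the user.cfg syntax.

```python
def prepend_bashrc(bashrc_text, job_script):
    """Return job_script with the bashrc contents placed at the top.

    Hypothetical helper for illustration only.
    """
    header = bashrc_text.rstrip("\n")
    if not header:
        return job_script  # nothing to add
    return header + "\n" + job_script


# Placeholder module names; the real bashrcs/coyote.sh will differ.
bashrc = "module load example-python\nmodule load example-biopython\n"
job = "python some_step.py\n"
print(prepend_bashrc(bashrc, job))
```

The result is a single script in which the environment setup runs before the job's own commands.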

Coding up changes

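Currently, you add support for a new cluster by adding an if block to the __init__ method of the Ssub class in ssub.py. The block sets the submit command, submit parser, status command, and status parser for that cluster.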
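The shape of such an if block might look like the sketch below. It is not a copy of ssub.py: the class name SsubSketch, the cluster name "examplecluster", the commands mysub and mystat, and their output formats are all invented for illustration, and "qstat -x" for coyote is taken from the description above rather than from the code. Only the names submit_cmd, stat_cmd, parse_job, and parse_stats come from this page.

```python
import xml.etree.ElementTree as ET


def parse_job_prefix(output):
    # "1234[].wiley.coyote.mit.edu" -> "1234[]"
    return output.strip().split(".", 1)[0]


def parse_stats_xml(output):
    # Assumed layout: <Data><Job><Job_Id>name.host</Job_Id></Job></Data>
    root = ET.fromstring(output)
    return [el.text.split(".", 1)[0] for el in root.iter("Job_Id") if el.text]


def parse_job_last_word(output):
    # Hypothetical submit output: "Submitted job 42" -> "42"
    return output.strip().split()[-1]


def parse_stats_lines(output):
    # Hypothetical status output: one job name per line
    return [line.strip() for line in output.splitlines() if line.strip()]


class SsubSketch:
    """Illustrative stand-in for the Ssub class; not the real implementation."""

    def __init__(self, cluster):
        self.cluster = cluster
        if cluster == "coyote":
            self.submit_cmd = "qsub"
            self.stat_cmd = "qstat -x"
            self.parse_job = parse_job_prefix
            self.parse_stats = parse_stats_xml
        elif cluster == "examplecluster":  # the new block you would add
            self.submit_cmd = "mysub"      # hypothetical command
            self.stat_cmd = "mystat"       # hypothetical command
            self.parse_job = parse_job_last_word
            self.parse_stats = parse_stats_lines
        else:
            raise ValueError("unsupported cluster: %s" % cluster)


s = SsubSketch("examplecluster")
print(s.submit_cmd, s.parse_job("Submitted job 42\n"))  # mysub 42
```

Keeping the two commands and two parsers together in one block makes it easy to see everything that is cluster-specific in one place.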

Putting files in the right places

When looking up Greengenes taxonomies, SmileTrain uses a pickled dictionary. The taxonomy files from Greengenes have the following format, with an ID followed by a lineage on each line:

229854  k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Legionellales; f__Legionellaceae; g__Legionella; s__
3761685 k__Bacteria; p__OD1; c__; o__; f__; g__; s__
3825327 k__Archaea; p__Crenarchaeota; c__MHVG; o__; f__; g__; s__

You can make these into a pickled dictionary with tools/setup_tools/pickle_taxonomies.py.
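If you are curious what that conversion amounts to, here is a minimal sketch. It is not the code in pickle_taxonomies.py: the function name is made up, and the choice to key the dictionary by the ID string and store the lineage string unchanged is an assumption.

```python
import pickle


def parse_taxonomy_lines(lines):
    """Map each ID to its lineage string.

    Hypothetical helper; assumes the ID and lineage are separated by
    whitespace (a tab or spaces).
    """
    taxonomy = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        otu_id, lineage = line.split(None, 1)
        taxonomy[otu_id] = lineage
    return taxonomy


lines = [
    "229854  k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Legionellales; f__Legionellaceae; g__Legionella; s__",
    "3761685 k__Bacteria; p__OD1; c__; o__; f__; g__; s__",
    "3825327 k__Archaea; p__Crenarchaeota; c__MHVG; o__; f__; g__; s__",
]
taxonomy = parse_taxonomy_lines(lines)

# Round-trip through pickle in memory; pickle.dump would write to a file.
restored = pickle.loads(pickle.dumps(taxonomy))
print(restored["3761685"])  # k__Bacteria; p__OD1; c__; o__; f__; g__; s__
```

Loading a pickled dictionary is much faster than re-parsing the full Greengenes text file on every lookup, which is presumably why the pickled form is used.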