Adding support for new clusters
SmileTrain interacts with the cluster in a few ways: the submit command, submit parser, status command, and status parser. A bashrc can also be added for each cluster.
The functionality for submitting jobs to a cluster is in ssub.py.
The submit command, called submit_cmd in ssub.py, is what you would type at the terminal to submit a job. For coyote (an MIT cluster), it is qsub.
The submit parser, called parse_job in ssub.py, is a function that parses the output of the submit command to extract the job name. On coyote, the qsub command might return a line like 1234[].wiley.coyote.mit.edu, from which parse_job will extract the job name 1234[].
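As an illustration of the submit parser, here is a minimal sketch that handles the coyote example above. The function name parse_job_sketch is hypothetical, and taking everything before the first dot is an assumption about how the job name is extracted; the real parse_job in ssub.py may work differently.

```python
def parse_job_sketch(submit_output):
    """Extract the job name from a line such as '1234[].wiley.coyote.mit.edu'."""
    # Use the first non-empty line in case the scheduler prints more than one.
    lines = [line.strip() for line in submit_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("submit command returned no output")
    # Assumption: the job name is everything before the first dot.
    return lines[0].split(".", 1)[0]


print(parse_job_sketch("1234[].wiley.coyote.mit.edu\n"))  # prints 1234[]
```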
The status command, called stat_cmd in ssub.py, is what you would type at the terminal to check on the status of your job. On coyote, it is a version of qstat.
The status parser, called parse_stats in ssub.py, is a function that parses the output of the status command to get a list of running jobs. After you submit a set of jobs, SmileTrain tracks them by repeatedly checking the job status. Once the jobs you submitted are no longer in the list of running jobs, SmileTrain performs post-operation checks (such as looking for the files the job should have created). If those checks all pass, the next set of commands in the pipeline is submitted. On coyote, qstat -x returns an XML-formatted list of all jobs, which parse_stats parses to produce a list of job names.
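The status parser and the "no longer running" check can be sketched roughly as follows. The XML layout (a Job element with a Job_Id child) is an assumption about what qstat -x prints, and the names parse_stats_sketch and finished_jobs are hypothetical; the real parse_stats may differ.

```python
import xml.etree.ElementTree as ET


def parse_stats_sketch(qstat_xml):
    """Return the job names found in XML status output."""
    if not qstat_xml.strip():
        # Assumption: empty output means no jobs are queued or running.
        return []
    root = ET.fromstring(qstat_xml)
    names = []
    # Assumption: each job is a <Job> element with a <Job_Id> child.
    for job in root.iter("Job"):
        job_id = job.findtext("Job_Id")
        if job_id:
            # Keep the part before the first dot, matching the submit parser.
            names.append(job_id.split(".", 1)[0])
    return names


def finished_jobs(submitted, running):
    """Jobs that were submitted but no longer appear in the status list."""
    running = set(running)
    return [job for job in submitted if job not in running]


example = (
    "<Data>"
    "<Job><Job_Id>1234[].wiley.coyote.mit.edu</Job_Id></Job>"
    "<Job><Job_Id>1240.wiley.coyote.mit.edu</Job_Id></Job>"
    "</Data>"
)
running = parse_stats_sketch(example)
print(finished_jobs(["1234[]", "1250"], running))  # prints ['1250']
```

Jobs returned by finished_jobs would be the ones ready for the post-operation checks described above.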
When a job is submitted to a node, the environment may not be configured so that SmileTrain can run. For example, the wrong version of Python might be enabled, or BioPython might not be loaded. In user.cfg, you can specify a bashrc that is added to the top of every submitted job. On coyote, bashrcs/coyote.sh is used; this file loads the modules for python2.7 and BioPython.
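To picture what "added to the top of every submitted job" could mean, here is a small sketch that places bashrc text ahead of a job's commands. Everything in it is illustrative: build_job_script, the shebang line, the placeholder bashrc contents, and the command my_step.py are all hypothetical, and SmileTrain may assemble its job scripts differently.

```python
def build_job_script(bashrc_text, commands):
    """Return job script text with the bashrc contents placed before the commands."""
    # Assumption: the script starts with a bash shebang.
    parts = ["#!/bin/bash", bashrc_text.rstrip("\n")]
    parts.extend(commands)
    return "\n".join(parts) + "\n"


# Placeholder bashrc contents; the real bashrcs/coyote.sh is not reproduced here.
example_bashrc = "export EXAMPLE_SETTING=1\n"
script = build_job_script(example_bashrc, ["python my_step.py input.fastq"])
print(script)
```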
Currently, support for a new cluster is added by writing a new if block in the __init__ method of the Ssub class in ssub.py.
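A new block might look something like the sketch below. SsubSketch is a hypothetical stand-in, not the real Ssub class, and the cluster name slurm_example is invented for illustration. The attribute names follow the four pieces described above, but how the real class stores them is an assumption. The example assumes a Slurm cluster, where sbatch prints "Submitted batch job <id>" and squeue -h -o %i lists job IDs with no header; the coyote branch assumes the XML layout of qstat -x.

```python
import re


class SsubSketch(object):
    """Hypothetical stand-in for the Ssub class; only cluster setup is shown."""

    def __init__(self, cluster):
        self.cluster = cluster
        if cluster == "coyote":
            self.submit_cmd = "qsub"
            self.stat_cmd = "qstat -x"
            self.parse_job = lambda out: out.strip().split(".", 1)[0]
            # Assumption: job IDs appear inside <Job_Id> tags in the XML.
            self.parse_stats = lambda out: re.findall(r"<Job_Id>([^<.]+)", out)
        elif cluster == "slurm_example":
            # New block for a hypothetical Slurm cluster.
            self.submit_cmd = "sbatch"
            self.stat_cmd = "squeue -h -o %i"
            self.parse_job = lambda out: re.search(
                r"Submitted batch job (\d+)", out
            ).group(1)
            self.parse_stats = lambda out: out.split()
        else:
            raise ValueError("unsupported cluster: %s" % cluster)


ssub = SsubSketch("slurm_example")
print(ssub.parse_job("Submitted batch job 1234\n"))  # prints 1234
```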
When looking up Greengenes taxonomies, SmileTrain uses a pickled dictionary. The taxonomy files from Greengenes have one entry per line, an ID followed by its taxonomy string:
229854 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Legionellales; f__Legionellaceae; g__Legionella; s__
3761685 k__Bacteria; p__OD1; c__; o__; f__; g__; s__
3825327 k__Archaea; p__Crenarchaeota; c__MHVG; o__; f__; g__; s__
You can make these into a pickled dictionary with tools/setup_tools/pickle_taxonomies.py.
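For a sense of what such a dictionary could contain, here is a minimal sketch that parses the three example lines and round-trips them through pickle. It assumes the dictionary maps each ID to its full taxonomy string; the actual structure produced by pickle_taxonomies.py may differ, and parse_taxonomy_lines is a hypothetical name.

```python
import pickle


def parse_taxonomy_lines(lines):
    """Map each ID to its taxonomy string (assumed dictionary layout)."""
    taxonomy = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The ID is the first whitespace-delimited field; the rest is the lineage.
        otu_id, lineage = line.split(None, 1)
        taxonomy[otu_id] = lineage
    return taxonomy


example_lines = [
    "229854 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Legionellales; f__Legionellaceae; g__Legionella; s__",
    "3761685 k__Bacteria; p__OD1; c__; o__; f__; g__; s__",
    "3825327 k__Archaea; p__Crenarchaeota; c__MHVG; o__; f__; g__; s__",
]
taxonomy = parse_taxonomy_lines(example_lines)

# Serialize to bytes and load back; a real script would write to a file instead.
restored = pickle.loads(pickle.dumps(taxonomy))
print(restored["3761685"])  # prints k__Bacteria; p__OD1; c__; o__; f__; g__; s__
```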