Each stage or task in a computational pipeline is represented by a Python function.
Each Python function can be called in parallel to run multiple jobs.

1. Import module::

        from ruffus import *

2. Annotate functions with Python decorators

   e.g.::

        from ruffus import *
        import sys

        def first_task():
            print("First task")

        @follows(first_task)
        def second_task():
            print("Second task")

        @follows(second_task)
        def final_task():
            print("Final task")

Examples of decorators:

+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
| Decorator              | Purpose                             | Example                                                                                             |
+========================+=====================================+=====================================================================================================+
|**@follows**            | - Indicate task dependency          | ``@follows(task1, "task2")``                                                                        |
|                        |                                     |                                                                                                     |
|                        | - mkdir prerequisite shorthand      | ``@follows(task1, mkdir("my/directory/for/results"))``                                              |
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
|**@files**              | - I/O parameters                    | ``@files(parameter_list)``                                                                          |
|                        |                                     |                                                                                                     |
|                        | - skips up-to-date jobs             | ``@files(parameter_generating_function)``                                                           |
|                        |                                     |                                                                                                     |
|                        |                                     | ``@files(input, output, other_params_for_a_single_job)``                                            |
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
|**@split**              | - Splits a single input into        | ``@split ( tasks_or_file_names, output_files, [extra_parameters,...] )``                            |
|                        |   multiple output                   |                                                                                                     |
|                        | - Globs in output can specify an    |                                                                                                     |
|                        |   indeterminate number of files.    |                                                                                                     |
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
|**@transform**          | - Applies the task function to      | ``@transform ( tasks_or_file_names, suffix(suffix_string), output_pattern, [extra_parameters,..] )``|
|                        |   transform input data to output.   |                                                                                                     |
|                        |                                     | ``@transform ( tasks_or_file_names, regex(regex_pattern), output_pattern, [extra_parameters,...] )``|
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
|**@merge**              | - Merges multiple input             | ``@merge (tasks_or_file_names, output, [extra_parameters,...] )``                                   |
|                        |   into a single output.             |                                                                                                     |
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
|**@collate**            | - Groups together sets of input     | ``@collate ( tasks_or_file_names, regex(matching_regex), output_pattern, [extra_parameters,...] )`` |
|                        |   into a few outputs                |                                                                                                     |
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
|**@posttask**           | - Call function after task          | ``@posttask(signal_task_completion_function)``                                                      |
|                        |                                     |                                                                                                     |
|                        | - touch file shorthand              | ``@posttask(touch_file("task1.completed"))``                                                        |
+------------------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+
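
For example, a small two-stage pipeline built from ``@transform`` and ``@merge`` might look
like the sketch below (the file names and task bodies are purely illustrative)::

        from ruffus import *

        starting_files = ["a.txt", "b.txt"]

        # create toy input files so the sketch can run on its own
        for name in starting_files:
            open(name, "w").close()

        # one job per input file: a.txt -> a.out, b.txt -> b.out
        @transform(starting_files, suffix(".txt"), ".out")
        def process(input_file, output_file):
            open(output_file, "w").close()

        # a single job that collects every output of process()
        @merge(process, "summary.all")
        def summarise(input_files, output_file):
            with open(output_file, "w") as out:
                out.write("\n".join(input_files))

        # pipeline_run([summarise]) would then execute both stages (see step 4)
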
3. Print dependency graph if necessary

   - For a graphical flowchart in ``jpg``, ``svg``, ``dot``, ``png``, ``ps``, ``gif`` formats::

        graph_printout ( open("flowchart.svg", "w"),
                         "svg",
                         list_of_target_tasks)

     This requires ``dot`` to be installed.

   - For a text printout of all jobs ::

        pipeline_printout(sys.stdout, list_of_target_tasks)
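
     For instance, to preview the jobs needed to bring ``final_task`` from the example in
     step 2 up to date, something like this should work (``verbose`` controls the amount of
     detail printed)::

        pipeline_printout(sys.stdout, [final_task], verbose=3)
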
4. Run the pipeline::

        pipeline_run(list_of_target_tasks, [list_of_tasks_forced_to_rerun, multiprocess = N_PARALLEL_JOBS])
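
   For example, to run everything needed for ``final_task`` from the example in step 2,
   forcing ``first_task`` to rerun and allowing up to four jobs in parallel, a call along
   these lines should work::

        pipeline_run([final_task], [first_task], multiprocess = 4)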