Reusable and maintained Luigi tasks to incorporate in bioinformatics pipelines
Provides Luigi tasks for tools from samtools, bcftools, STAR, RSEM, vcfanno, GATK, Ensembl VEP and much more!
Reuses the ExternalProgramTask interface from Luigi's external_program contrib module as much as possible, and extends it to work with modern schedulers such as Slurm.
Provides basic resource management for the local scheduler: all tasks are annotated with reasonable default cpus and memory parameters that can be tuned, and constrained via the [resources] configuration. For externally scheduled tasks, resource management is deferred to the external scheduler.
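To make the local resource constraint concrete, here is a minimal sketch of the admission rule, not Luigi's or bioluigi's actual code. It assumes that the resource names in [resources] match the cpus and memory parameters, that the limits shown are arbitrary example values, and that an unlisted resource defaults to a limit of 1. The helper names load_limits and can_start are hypothetical.

```python
import configparser

# Example luigi.cfg content. The resource names are assumed to match the
# cpus and memory task parameters; the values are arbitrary.
LUIGI_CFG = """
[resources]
cpus = 16
memory = 32
"""


def load_limits(cfg_text):
    """Parse the [resources] section into a {name: limit} mapping."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {name: float(value) for name, value in parser["resources"].items()}


def can_start(requested, in_use, limits):
    """Return True if the request fits in what is left of every resource."""
    for name, amount in requested.items():
        # Assumption of this sketch: an unlisted resource has a limit of 1.
        if in_use.get(name, 0) + amount > limits.get(name, 1):
            return False
    return True


limits = load_limits(LUIGI_CFG)
running = {"cpus": 12, "memory": 8}  # resources held by tasks already running

print(can_start({"cpus": 4, "memory": 8}, running, limits))  # fills cpus exactly
print(can_start({"cpus": 8, "memory": 8}, running, limits))  # would exceed cpus
```

With these limits, a task asking for 8 cpus waits until enough running tasks finish, while a task asking for 4 can start immediately.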
Provides a command-line interface for interacting more conveniently with the Luigi scheduler:
bioluigi list [--status STATUS] [--user USER] [--detailed] TASK_GLOB
bioluigi show TASK_ID
Here's a list of supported tools:
- sratoolkit with prefetch and fastq-dump
- bcftools
- FastQC
- MultiQC
Here's a list of supported schedulers:
- local
- Slurm
The most convenient way of using the pre-defined tasks is to yield them dynamically in the body of the run function. It's also possible to require them, since they inherit from luigi.Task.
import luigi
from bioluigi.tasks import bcftools

class MyTask(luigi.Task):
    def input(self):
        return luigi.LocalTarget('source.vcf.gz')

    def run(self):
        # Yielding a task inside run() schedules it as a dynamic dependency.
        # annotations_file is a placeholder for a path defined elsewhere.
        yield bcftools.Annotate(self.input().path,
                                annotations_file,
                                self.output().path,
                                ...,
                                scheduler='slurm',
                                cpus=8)

    def output(self):
        return luigi.LocalTarget('annotated.vcf.gz')
You can define your own scheduled task by subclassing ScheduledExternalProgramTask. Note that the default scheduler is local, which uses Luigi's [resources] allocation mechanism.
import datetime
from bioluigi.scheduled_external_program import ScheduledExternalProgramTask

class MyScheduledTask(ScheduledExternalProgramTask):
    scheduler = 'slurm'
    walltime = datetime.timedelta(seconds=10)
    cpus = 1
    memory = 1

    def program_args(self):
        return ['sleep', '10']
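Since walltime is a datetime.timedelta, it may help to see how such a value maps onto the days-hours:minutes:seconds form that Slurm accepts for its time limit. The sketch below is an illustration only: slurm_time is a hypothetical helper, and this is not necessarily how bioluigi performs the conversion internally.

```python
import datetime


def slurm_time(walltime):
    """Format a timedelta as D-HH:MM:SS, rounding up to whole seconds."""
    if walltime < datetime.timedelta(0):
        raise ValueError("walltime must not be negative")
    # Round any fractional second up so the limit is never shorter than asked.
    total = walltime.days * 86400 + walltime.seconds
    if walltime.microseconds:
        total += 1
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


print(slurm_time(datetime.timedelta(seconds=10)))                   # 0-00:00:10
print(slurm_time(datetime.timedelta(days=1, hours=2, minutes=30)))  # 1-02:30:00
```

The 10-second walltime in the task above corresponds to 0-00:00:10 in this format.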