From 909527a85b02349b9d879a2de2595574d631be36 Mon Sep 17 00:00:00 2001
From: Donald Campbell <125581724+donaldcampbelljr@users.noreply.github.com>
Date: Tue, 4 Jun 2024 17:30:20 -0400
Subject: [PATCH 1/6] first pass at updating Looper 1.8.0 docs

---
 docs/looper/changelog.md | 51 +-
 docs/looper/defining-a-project.md | 142 +--
 docs/looper/faq.md | 4 +-
 docs/looper/grouping-jobs.md | 4 +
 docs/looper/looper-config.md | 14 +-
 docs/looper/multiple-pipelines.md | 15 +-
 .../pipeline-interface-specification.md | 103 +-
 docs/looper/pipeline-tiers.md | 2 +-
 docs/looper/pipestat.md | 17 +-
 docs/looper/running-a-pipeline.md | 4 +-
 docs/looper/support.md | 2 +-
 docs/looper/usage.md | 948 +++++++++++-------
 docs/looper/using-geofetch.md | 9 -
 docs/looper/variable-namespaces.md | 10 +-
 docs/looper/writing-a-pipeline-interface.md | 9 +
 15 files changed, 788 insertions(+), 546 deletions(-)

diff --git a/docs/looper/changelog.md b/docs/looper/changelog.md
index 82e81927..00a7c05c 100644
--- a/docs/looper/changelog.md
+++ b/docs/looper/changelog.md
@@ -1,13 +1,58 @@
 # Changelog
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
+## [1.8.1] -- 2024-06-05
-## [1.6.0] -- 2023-09-XX
+### Fixed
+- added `-v` and `--version` to the CLI
+
+## [1.8.0] -- 2024-06-04
+
+### Added
+- looper destroy now destroys individual results when pipestat is configured: https://github.com/pepkit/looper/issues/469
+- comprehensive smoketests: https://github.com/pepkit/looper/issues/464
+- allow rerun to work on both failed or waiting flags: https://github.com/pepkit/looper/issues/463
+
+### Changed
+- Migrated `argparse` CLI definition to a pydantic basis for all commands. See: https://github.com/pepkit/looper/issues/438
+- during project load, check if PEP file path is a file first, then check if it is a registry path: https://github.com/pepkit/looper/issues/456
+- Looper now uses FutureYamlConfigManager due to the yacman refactor v0.9.3: https://github.com/pepkit/looper/issues/452
+
+### Fixed
+- inferring project name when loading PEP from csv: https://github.com/pepkit/looper/issues/484
+- fix inconsistency resolving pipeline interface paths if multiple paths are supplied: https://github.com/pepkit/looper/issues/474
+- fix bug with checking for completed flags: https://github.com/pepkit/looper/issues/470
+- fix looper destroy not properly destroying all related files: https://github.com/pepkit/looper/issues/468
+- looper rerun now only runs failed jobs as intended: https://github.com/pepkit/looper/issues/467
+- looper inspect now inspects the looper config: https://github.com/pepkit/looper/issues/462
+- Load PEP from CSV: https://github.com/pepkit/looper/issues/456
+- looper now works with sample_table_index https://github.com/pepkit/looper/issues/458
+
+## [1.7.1] -- 2024-05-28
+
+### Fixed
+- pin pipestat version to be between pipestat>=0.8.0,<0.9.0 https://github.com/pepkit/looper/issues/494
+
+
+## [1.7.0] -- 2024-01-26
+
+### Added
+- `--portable` flag to `looper report` to create a portable version of the html report
+- `--lump-j` allows grouping samples into a defined number of jobs
+
+### Changed
+- `--lumpn` is now `--lump-n`
+- `--lump` is now `--lump-s`
+
+## [1.6.0] -- 2023-09-XX
+
+### Added
+- `looper link` creates symlinks for results grouped by record_identifier. It requires pipestat to be configured. [#72](https://github.com/pepkit/looper/issues/72)
+- basic tab completion.
### Changed -- looper now works with pipestat v0.6.0 and greater -- looper table and check now use pipestat and therefore require pipestat configuration. [#390](https://github.com/pepkit/looper/issues/390) +- looper now works with pipestat v0.6.0 and greater. +- `looper table`, `check` now use pipestat and therefore require pipestat configuration. [#390](https://github.com/pepkit/looper/issues/390) - changed how looper configures pipestat [#411](https://github.com/pepkit/looper/issues/411) +- initializing pipeline interface also writes an example `output_schema.yaml` and `count_lines.sh` pipeline ## [1.5.1] -- 2023-08-14 diff --git a/docs/looper/defining-a-project.md b/docs/looper/defining-a-project.md index eb406af3..e5dbd4b3 100644 --- a/docs/looper/defining-a-project.md +++ b/docs/looper/defining-a-project.md @@ -2,128 +2,82 @@ ## 1. Start with a basic PEP -To start, you need a project defined in the [standard Portable Encapsulated Project (PEP) format](http://pep.databio.org). Start by [creating a PEP](https://pep.databio.org/en/latest/simple_example/). +To start, you need a project defined in the [standard Portable Encapsulated Project (PEP) format](http://pep.databio.org). Start by [creating a PEP](https://pep.databio.org/spec/simple-example/). -## 2. Connect the PEP to looper +## 2. Specify the Sample Annotation -### 2.1 Specify `output_dir` - -Once you have a basic PEP, you can connect it to looper. Just provide the required looper-specific piece of information -- `output-dir`, a parent folder where you want looper to store your results. You do this by adding a `looper` section to your PEP. The `output_dir` key is expected in the top level of the `looper` section of the project configuration file. Here's an example: +This information generally lives in a `project_config.yaml` file. +Simplest example: ```yaml -looper: - output_dir: "/path/to/output_dir" +pep_version: 2.0.0 +sample_table: sample_annotation.csv ``` -### 2.2 Configure pipestat - -*We recommend to read the [pipestat documentation](https://pipestat.databio.org) to learn more about the concepts described in this section* - -Additionally, you may configure pipestat, the tool used to manage pipeline results. Pipestat provides lots of flexibility, so there are multiple configuration options that you can provide in `looper.pipestat.sample` or `looper.pipestat.project`, depending on the pipeline level you intend to run. - -Please note that all the configuration options listed below *do not* specify the values passed to pipestat *per se*, but rather `Project` or `Sample` attribute names that hold these values. This way the pipestat configuration can change with pipeline submitted for every `Sample` if the PEP `sample_modifiers` are used. - -- `results_file_attribute`: name of the `Sample` or `Project` attribute that indicates the path to the YAML results file that will be used to report results into. Default value: `pipestat_results_file`, so the path will be sourced from either `Sample.pipestat_results_file` or `Project.pipestat_results_file`. If the path provided this way is not absolute, looper will make it relative to `{looper.output_dir}`. -- `namespace_attribute`: name of the `Sample` or `Project` attribute that indicates the namespace to report into. Default values: `sample_name` for sample-level pipelines `name` for project-level pipelines , so the path will be sourced from either `Sample.sample_name` or `Project.name`. 
-- `config_attribute`: name of the `Sample` or `Project` attribute that indicates the path to the pipestat configuration file. It's not needed in case the intended pipestat backend is the YAML results file mentioned above. It's required if the intended pipestat backend is a PostgreSQL database, since this is the only way to provide the database login credentials. Default value: `pipestat_config`, so the path will be sourced from either `Sample.pipestat_config` or `Project.pipestat_config`. +You can also add sample modifiers to the project file `derive` or `imply` attributes: -Non-configurable pipestat options: - -- `schema_path`: never specified here, since it's sourced from `{pipeline.output_schema}`, that is specified in the pipeline interface file -- `record_identifier`: is automatically set to `{pipeline.pipeline_name}`, that is specified in the pipeline interface file +For example: +If you have a project that contains samples of different types, then you can use an `imply` modifier in your PEP to select which pipelines you want to run on which samples, like this: ```yaml -name: "test123" -pipestat_results_file: "project_pipestat_results.yaml" -pipestat_config: "/path/to/project_pipestat_config.yaml" - sample_modifiers: - append: - pipestat_config: "/path/to/pipestat_config.yaml" - pipestat_results_file: "RESULTS_FILE_PLACEHOLDER" - derive: - attributes: ["pipestat_results_file"] - sources: - RESULTS_FILE_PLACEHOLDER: "{sample_name}/pipestat_results.yaml" - -looper: - output_dir: "/path/to/output_dir" - # pipestat configuration starts here - # the values below are defaults, so they are not needed, but configurable - pipestat: - sample: - results_file_attribute: "pipestat_results_file" - config_attribute: "pipestat_config" - namespace_attribute: "sample_name" - project: - results_file_attribute: "pipestat_results_file" - config_attribute: "pipestat_config" - namespace_attribute: "name" + imply: + - if: + protocol: "RRBS" + then: + pipeline_interfaces: "/path/to/pipeline_interface.yaml" + - if: + protocol: "ATAC" + then: + pipeline_interfaces: "/path/to/pipeline_interface2.yaml" ``` -## 3. Link a pipeline to your project - -Next, you'll need to point the PEP to the *pipeline interface* file that describes the command you want looper to run. - -### Understanding pipeline interfaces - -Looper links projects to pipelines through a file called the *pipeline interface*. Any looper-compatible pipeline must provide a pipeline interface. To link the pipeline, you simply point each sample to the pipeline interfaces for any pipelines you want to run. - -Looper pipeline interfaces can describe two types of pipeline: sample-level pipelines or project-level pipelines. Briefly, a sample-level pipeline is executed with `looper run`, which runs individually on each sample. A project-level pipeline is executed with `looper runp`, which runs a single job *per pipeline* on an entire project. Typically, you'll first be interested in the sample-level pipelines. You can read in more detail in the [pipeline tiers documentation](pipeline-tiers.md). -### Adding a sample-level pipeline interface - -Sample pipelines are linked by adding a sample attribute called `pipeline_interfaces`. 
There are 2 easy ways to do this: you can simply add a `pipeline_interfaces` column in the sample table, or you can use an *append* modifier, like this: +You can also use `derive` to derive attributes from the PEP: ```yaml sample_modifiers: - append: - pipeline_interfaces: "/path/to/pipeline_interface.yaml" -``` - -The value for the `pipeline_interfaces` key should be the *absolute* path to the pipeline interface file. The paths may also contain environment variables. Once your PEP is linked to the pipeline, you just need to make sure your project provides any sample metadata required by the pipeline. - -### Adding a project-level pipeline interface - -Project pipelines are linked in the `looper` section of the project configuration file: + derive: + attributes: [read1, read2] + sources: + # Obtain tutorial data from http://big.databio.org/pepatac/ then set + # path to your local saved files + R1: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r1.fastq.gz" + R2: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r2.fastq.gz" ``` -looper: - pipeline_interfaces: "/path/to/project_pipeline_interface.yaml" -``` -### How to link to multiple pipelines - -Looper decouples projects and pipelines, so you can have many projects using one pipeline, or many pipelines running on the same project. If you want to run more than one pipeline on a sample, you can simply add more than one pipeline interface, like this: +A more complicated example taken from [PEPATAC](https://pepatac.databio.org/en/latest/): ```yaml -sample_modifiers: - append: - pipeline_interfaces: ["/path/to/pipeline_interface.yaml", "/path/to/pipeline_interface2.yaml"] -``` - -Looper will submit jobs for both of these pipelines. +pep_version: 2.0.0 +sample_table: tutorial.csv -If you have a project that contains samples of different types, then you can use an `imply` modifier in your PEP to select which pipelines you want to run on which samples, like this: - - -```yaml sample_modifiers: + derive: + attributes: [read1, read2] + sources: + # Obtain tutorial data from http://big.databio.org/pepatac/ then set + # path to your local saved files + R1: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r1.fastq.gz" + R2: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r2.fastq.gz" imply: - - if: - protocol: "RRBS" - then: - pipeline_interfaces: "/path/to/pipeline_interface.yaml" - - if: - protocol: "ATAC" - then: - pipeline_interfaces: "/path/to/pipeline_interface2.yaml" + - if: + organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"] + then: + genome: hg38 + prealignment_names: ["rCRSd"] + deduplicator: samblaster # Default. [options: picard] + trimmer: skewer # Default. [options: pyadapt, trimmomatic] + peak_type: fixed # Default. [options: variable] + extend: "250" # Default. For fixed-width peaks, extend this distance up- and down-stream. + frip_ref_peaks: None # Default. Use an external reference set of peaks instead of the peaks called from this run ``` -## 5. Customize looper +## 3. Customize looper -That's all you need to get started linking your project to looper. But you can also customize things further. Under the `looper` section, you can provide a `cli` keyword to specify any command line (CLI) options from within the project config file. The subsections within this section direct the arguments to the respective `looper` subcommands. So, to specify, e.g. sample submission limit for a `looper run` command use: +You can also customize things further. 
Under the `looper` section, you can provide a `cli` keyword to specify any command line (CLI) options from within the project config file. The subsections within this section direct the arguments to the respective `looper` subcommands. So, to specify, e.g. sample submission limit for a `looper run` command use: ```yaml looper: diff --git a/docs/looper/faq.md b/docs/looper/faq.md index c3623ac0..39eaa62b 100644 --- a/docs/looper/faq.md +++ b/docs/looper/faq.md @@ -13,12 +13,12 @@ You can add that location to your path by appending it (`export PATH=$PATH:~/.lo ## How can I run my jobs on a cluster? -Looper uses the external package [divvy](http://code.databio.org/divvy) for cluster computing, making it flexible enough to use with any cluster resource environment. Please see the [tutorial on cluster computing with looper and divvy](running-on-a-cluster.md). +Looper uses the external package [divvy](https://pep.databio.org/divvy/) for cluster computing, making it flexible enough to use with any cluster resource environment. Please see the [tutorial on cluster computing with looper and divvy](running-on-a-cluster.md). ## What's the difference between `looper` and `pypiper`? -[`pypiper`](http://pypiper.readthedocs.io) is a more traditional workflow-building framework; it helps you build pipelines to process individual samples. [`looper`](http://looper.readthedocs.io) is completely pipeline-agnostic, and has nothing to do with individual processing steps; it operates groups of samples (as in a project), submitting the appropriate pipeline(s) to a cluster or server (or running them locally). The two projects are independent and can be used separately, but they are most powerful when combined. They complement one another, together constituting a comprehensive pipeline management system. +[`pypiper`](https://pep.databio.org/pypiper/) is a more traditional workflow-building framework; it helps you build pipelines to process individual samples. [`looper`](https://pep.databio.org/looper/) is completely pipeline-agnostic, and has nothing to do with individual processing steps; it operates groups of samples (as in a project), submitting the appropriate pipeline(s) to a cluster or server (or running them locally). The two projects are independent and can be used separately, but they are most powerful when combined. They complement one another, together constituting a comprehensive pipeline management system. ## Why isn't a sample being processed by a pipeline (`Not submitting, flag found: ['*_.flag']`)? diff --git a/docs/looper/grouping-jobs.md b/docs/looper/grouping-jobs.md index 9c247b4d..7915ae98 100644 --- a/docs/looper/grouping-jobs.md +++ b/docs/looper/grouping-jobs.md @@ -9,3 +9,7 @@ It's quite simple: if you want to run 100 samples in a single job submission scr ## Lumping jobs by input file size: `--lump` But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lumpn` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. 
Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
+
+## Lumping jobs by job count: `--lump-j`
+
+Alternatively, you can use `--lump-j` to group your samples into a defined number of jobs.
\ No newline at end of file
diff --git a/docs/looper/looper-config.md b/docs/looper/looper-config.md
index 3c2d095c..e296c72b 100644
--- a/docs/looper/looper-config.md
+++ b/docs/looper/looper-config.md
@@ -8,7 +8,7 @@ Example looper config file using local PEP:
 pep_config: $HOME/hello_looper-master/project/project_config.yaml
 output_dir: "$HOME/hello_looper-master/output"
 pipeline_interfaces:
-  sample: ["$HOME/hello_looper-master/pipeline/pipeline_interface"]
+  sample: "$HOME/hello_looper-master/pipeline/pipeline_interface"
   project: "some/project/pipeline"
 ```
@@ -19,18 +19,18 @@ environment variables used by the PEP.
 Example looper config file using PEPhub project:
 ```yaml
-pep_config: pephub::databio/looper:default
+pep_config: pepkit/hello_looper:default
 output_dir: "$HOME/hello_looper-master/output"
 pipeline_interfaces:
-  sample: ["$HOME/hello_looper-master/pipeline/pipeline_interface"]
-  project: "$HOME/hello_looper-master/project/pipeline"
+  sample: "$HOME/hello_looper-master/pipeline/pipeline_interface_sample.yaml"
+  project: "$HOME/hello_looper-master/pipeline/pipeline_interface_project.yaml"
 ```
 Where:
 - `output_dir` is pipeline output directory, where results will be saved.
-- `pep_config` is a local config file or PEPhub registry path. (registry path should be specified in one
-one of supported ways: `namespace/name`, `pephub::namespace/name`, `namespace/name:tag`, or `pephub::namespace/name:tag`)
+- `pep_config` is a local config file or PEPhub registry path. (registry path should be specified in
+one of the supported ways: `namespace/name`, `namespace/name:tag`)
 - `pipeline interfaces` is a local path to project or sample pipelines.
 To run pipeline, go to the directory of .looper.config and execute command in your terminal:
-`looper run --looper-config {looper_config_path}` or `looper runp --looper-config {looper_config_path}`.
+`looper run --looper-config {looper_config_path}` or `looper runp --looper-config {looper_config_path}` (project-level pipeline).
diff --git a/docs/looper/multiple-pipelines.md b/docs/looper/multiple-pipelines.md
index adc29600..6eb42b3f 100644
--- a/docs/looper/multiple-pipelines.md
+++ b/docs/looper/multiple-pipelines.md
@@ -1,6 +1,19 @@
 # A project with multiple pipelines
-In earlier versions of looper (v < 1.0), we used a `protocol_mappings` section to map samples with different `protocol` attributes to different pipelines. In the current pipeline interface (looper v > 1.0), we eliminated the `protocol_mappings`, because this can now be handled using sample modifiers, simplifying the pipeline interface. Now, each pipeline has exactly 1 pipeline interface. You link to the pipeline interface with a sample attribute. If you want the same pipeline to run on all samples, it's as easy as using an `append` modifier like this:
+In earlier versions of looper (v < 1.0), we used a `protocol_mappings` section to map samples with different `protocol` attributes to different pipelines. In the current pipeline interface (looper v > 1.0), we eliminated the `protocol_mappings`, because this can now be handled using sample modifiers, simplifying the pipeline interface.
+Now, each pipeline has exactly 1 pipeline interface.
+ +The preferred method is to specify pipeline interfaces in the looper config file: + +```yaml +pep_config: pephub::databio/looper:default +output_dir: "$HOME/hello_looper-master/output" +pipeline_interfaces: + sample: "$HOME/hello_looper-master/pipeline/pipeline_interface" + project: "$HOME/hello_looper-master/project/pipeline" +``` + +However, you can also link to the pipeline interface with a sample attribute. If you want the same pipeline to run on all samples, it's as easy as using an `append` modifier like this: ``` sample_modifiers: diff --git a/docs/looper/pipeline-interface-specification.md b/docs/looper/pipeline-interface-specification.md index 7472b434..49b8b395 100644 --- a/docs/looper/pipeline-interface-specification.md +++ b/docs/looper/pipeline-interface-specification.md @@ -22,7 +22,7 @@ A pipeline interface may contain the following keys: - `pipeline_name` (REQUIRED) - A string identifying the pipeline, - `pipeline_type` (REQUIRED) - A string indicating a pipeline type: "sample" (for `run`) or "project" (for `runp`), -- `command_template` (REQUIRED) - A [Jinja2](https://jinja.palletsprojects.com/en/2.11.x/) template used to construct a pipeline command command to run. +- `command_template` (REQUIRED) - A [Jinja2](https://jinja.palletsprojects.com/en/2.11.x/) template used to construct a pipeline command to run. - `linked_pipeline_interfaces` (OPTIONAL) - A collection of paths to sample pipeline interfaces related to this pipeline interface (used only in project pipeline interfaces for `looper report` purposes). - `input_schema` (RECOMMENDED) - A [PEP Schema](http://eido.databio.org) formally defining *required inputs* for the pipeline - `output_schema` (RECOMMENDED) - A schema describing the *outputs* of the pipeline @@ -101,7 +101,7 @@ The input schema formally specifies the *input processed by this pipeline*. The 2. **Description**. The input schema is also useful to describe the inputs, including both required and optional inputs, thereby providing a standard way to describe a pipeline's inputs. In the schema, the pipeline author can describe exactly what the inputs mean, making it easier for users to learn how to structure a project for the pipeline. -Details for how to write a schema in in [writing a schema](http://eido.databio.org/en/latest/writing-a-schema/). The input schema format is an extended [PEP JSON-schema validation framework](http://pep.databio.org/en/latest/howto_validate/), which adds several capabilities, including +Details for how to write a schema in [writing a schema](http://eido.databio.org/en/latest/writing-a-schema/). The input schema format is an extended [PEP JSON-schema validation framework](http://pep.databio.org/en/latest/howto_validate/), which adds several capabilities, including - `required` (optional): A list of sample attributes (columns in the sample table) that **must be defined** - `required_files` (optional): A list of sample attributes that point to **input files that must exist**. @@ -111,52 +111,67 @@ If no `input_schema` is included in the pipeline interface, looper will not be a ### output_schema -The output schema formally specifies the *output produced by this pipeline*. It is used by downstream tools to that need to be aware of the products of the pipeline for further visualization or analysis. Like the input schema, it is based on JSON-schema, but *must* follow the [pipestat schema specification](http://pipestat.databio.org/en/latest/pipestat_specification/#pipestat-schema). 
+The output schema formally specifies the *output produced by this pipeline*. It is used by downstream tools that need to be aware of the products of the pipeline for further visualization or analysis. Beginning with Looper 1.6.0 and Pipestat 0.6.0, the output schema is a JSON Schema that must follow the [pipestat schema specification](http://pipestat.databio.org/en/latest/pipestat_specification/#pipestat-schema).
 Here is an example output schema:
 ```yaml
-number_of_things:
-  type: integer
-  multipleOf: 10
-  minimum: 20
-  description: "Number of things, min 20, multiple of 10"
-smooth_bw:
-  type: file
-  value:
-    path: "aligned_{genome}/{sample_name}_smooth.bw"
-    title: "A smooth bigwig file"
-    description: "This stores a bigwig file path"
-peaks_bed:
-  type: file
-  value:
-    path: "peak_calling_{genome}/{sample_name}_peaks.bed"
-    title: "Peaks in BED format"
-    description: "This stores a BED file path"
-collection_of_things:
-  type: array
-  items:
-    type: string
-  description: "This stores collection of strings"
-output_object:
-  type: object
-  properties:
-    GC_content_plot:
-      type: image
-    genomic_regions_plot:
-      type: image
-  value:
-    GC_content_plot:
-      path: "gc_content_{sample_name}.pdf"
-      thumbnail_path: "gc_content_{sample_name}.png"
-      title: "Plot of GC content"
-    genomic_regions_plot:
-      path: "genomic_regions_{sample_name}.pdf"
-      thumbnail_path: "genomic_regions_{sample_name}.png"
-      title: "Plot of genomic regions"
-  required:
-    - GC_content
-  description: "Object output with plots, the GC content plot is required"
+title: An example output schema
+description: An example description
+type: object
+properties:
+  pipeline_name: "default_pipeline_name"
+  samples:
+    type: object
+    properties:
+      number_of_things:
+        type: integer
+        description: "Number of things"
+      percentage_of_things:
+        type: number
+        description: "Percentage of things"
+      name_of_something:
+        type: string
+        description: "Name of something"
+      switch_value:
+        type: boolean
+        description: "Is the switch on or off"
+      output_file:
+        $ref: "#/$defs/file"
+        description: "This is a path to the output file"
+      output_image:
+        $ref: "#/$defs/image"
+        description: "This is a path to the output image"
+      md5sum:
+        type: string
+        description: "MD5SUM of an object"
+        highlight: true
+$defs:
+  image:
+    type: object
+    object_type: image
+    properties:
+      path:
+        type: string
+      thumbnail_path:
+        type: string
+      title:
+        type: string
+    required:
+      - path
+      - thumbnail_path
+      - title
+  file:
+    type: object
+    object_type: file
+    properties:
+      path:
+        type: string
+      title:
+        type: string
+    required:
+      - path
+      - title
 ```
 Looper uses the output schema in its `report` function, which produces a browsable HTML report summarizing the pipeline results. The output schema provides the relative locations to sample-level and project-level outputs produced by the pipeline, which looper can then integrate into the output results. If the output schema is not included, the `looper report` will be unable to locate and integrate the files produced by the pipeline and will therefore be limited to simple statistics.
diff --git a/docs/looper/pipeline-tiers.md b/docs/looper/pipeline-tiers.md
index 13c2593b..98d7906a 100644
--- a/docs/looper/pipeline-tiers.md
+++ b/docs/looper/pipeline-tiers.md
@@ -10,7 +10,7 @@ Looper doesn't require you to use this two-stage system, but it simply makes it
 ## Sample pipelines
-The typical use case is sample-level pipelines. These are run with `looper run`.
Pipeline interface defining a sample pipeline must to include `pipeline_type: "sample"` statement. +The typical use case is sample-level pipelines. These are run with `looper run`. Pipeline interface defining a sample pipeline must include `pipeline_type: "sample"` statement. ## Project pipelines diff --git a/docs/looper/pipestat.md b/docs/looper/pipestat.md index d05f165c..3b06d44b 100644 --- a/docs/looper/pipestat.md +++ b/docs/looper/pipestat.md @@ -5,7 +5,7 @@ Starting with version 1.4.0, looper supports additional functionality for [pipes 1. monitor the status of pipeline runs 2. summarize the results of pipelines -For non-pipestat-compatible pipelines, you can still use looper to run pipelines, but you won't be able to use `looper report` or `looper status` to manage their output. +For non-pipestat-compatible pipelines, you can still use looper to run pipelines, but you won't be able to use `looper report` or `looper check` to manage their output. ## Pipestat configuration overview Starting with version 1.6.0 configuring looper to work with pipestat has changed. @@ -14,7 +14,7 @@ Now, Looper will obtain pipestat configurations data from two sources: 1. pipeline interface 2. looper_config file -Looper will combine the necessary configuration data and write a new pipestat configuration file named `looper_pipestat_config.yaml` which looper will place in its output directory. Pipestat then uses this configuration file to create the required pipestatManager objects. See [Hello_Looper](https://github.com/pepkit/hello_looper) for a specific example. +Looper will combine the necessary configuration data and write a new pipestat configuration for each pipeline interface. Briefly, the Looper config file must contain a pipestat field. A project name must be supplied if running a project level pipeline. The user must also supply a file path for a results file if using a local file backend or database credentials if using a postgresql database backend. @@ -25,17 +25,6 @@ sample_table: annotation_sheet.csv pipeline_interfaces: sample: ./pipeline_interface1_sample_pipestat.yaml project: ./pipeline_interface1_project_pipestat.yaml -looper: - all: - output_dir: output -sample_modifiers: - append: - attr: "val" - derive: - attributes: [read1, read2] - sources: - SRA_1: "{SRR}_1.fastq.gz" - SRA_2: "{SRR}_2.fastq.gz" pipestat: project_name: TEST_PROJECT_NAME results_file_path: tmp_pipestat_results.yaml @@ -53,7 +42,7 @@ And the pipeline interface must include information required by pipestat such as ```yaml pipeline_name: example_pipestat_pipeline pipeline_type: sample -schema_path: pipeline_pipestat/pipestat_output_schema.yaml +output_schema: pipeline_pipestat/pipestat_output_schema.yaml command_template: > python {looper.piface_dir}/count_lines.py {sample.file} {sample.sample_name} {pipestat.results_file} diff --git a/docs/looper/running-a-pipeline.md b/docs/looper/running-a-pipeline.md index e2370e9f..c6aad0f7 100644 --- a/docs/looper/running-a-pipeline.md +++ b/docs/looper/running-a-pipeline.md @@ -1,11 +1,11 @@ # How to run a pipeline -You first have to [define your project](defining-a-project.md). This will give you a PEP linked to a pipeline. Next, we'll run the pipeline. +You first have to [define your project](defining-a-project.md) and a [config file](looper-config.md). This will give you a PEP linked to a pipeline. Next, we'll run the pipeline. The basic command is `looper run`. 
To run your pipeline, just: ```console -looper run project_config.yaml +looper run --looper-config .your_looper_config.yaml ``` This will submit a job for each sample. That's basically all there is to it; after this, there's a lot of powerful options and tweaks you can do to control your jobs. Here we'll just mention a few of them. diff --git a/docs/looper/support.md b/docs/looper/support.md index f844c355..450a0069 100644 --- a/docs/looper/support.md +++ b/docs/looper/support.md @@ -2,4 +2,4 @@ Please use the [issue tracker at GitHub](https://github.com/pepkit/looper/issues) to file bug reports or feature requests. -Looper supports Python 2.7 and Python 3, and has been tested in Linux. If you clone this repository and then an attempt at local installation, e.g. with `pip install --upgrade ./`, fails, this may be due to an issue with `setuptools` and `six`. A `FileNotFoundError` (Python 3) or an `IOError` (Python2), with a message/traceback about a nonexistent `METADATA` file means that this is even more likely the cause. To get around this, you can first manually `pip install --upgrade six` or `pip install six==1.11.0`, as upgrading from `six` from 1.10.0 to 1.11.0 resolves this issue, then retry the `looper` installation. +Looper supports Python 3, and has been tested in Linux. If you clone this repository and then an attempt at local installation, e.g. with `pip install --upgrade ./`, fails, this may be due to an issue with `setuptools` and `six`. A `FileNotFoundError` (Python 3) or an `IOError` (Python2), with a message/traceback about a nonexistent `METADATA` file means that this is even more likely the cause. To get around this, you can first manually `pip install --upgrade six` or `pip install six==1.11.0`, as upgrading from `six` from 1.10.0 to 1.11.0 resolves this issue, then retry the `looper` installation. diff --git a/docs/looper/usage.md b/docs/looper/usage.md index 2cd6e60b..c8c58a5f 100644 --- a/docs/looper/usage.md +++ b/docs/looper/usage.md @@ -18,7 +18,7 @@ Each task is controlled by one of the following commands: `run`, `rerun`, `runp` - `looper destroy`: Deletes all output results for this project. -- `looper inspect`: Display the Prioject or Sample information +- `looper inspect`: Display the Project or Sample information - `looper init`: Initialize a looper dotfile (`.looper.yaml`) in the current directory @@ -26,16 +26,14 @@ Each task is controlled by one of the following commands: `run`, `rerun`, `runp` Here you can see the command-line usage instructions for the main looper command and for each subcommand: ## `looper --help` ```console -version: 1.5.2-dev -usage: looper [-h] [--version] [--logfile LOGFILE] [--dbg] [--silent] - [--verbosity V] [--logdev] - {run,rerun,runp,table,report,destroy,check,clean,inspect,init,init-piface} +usage: looper [-h] [-v] [--silent] [--verbosity VERBOSITY] [--logdev] + {run,rerun,runp,table,report,destroy,check,clean,init,init_piface,link,inspect} ... -looper - A project job submission engine and project manager. +Looper Pydantic Argument Parser -positional arguments: - {run,rerun,runp,table,report,destroy,check,clean,inspect,init,init-piface} +commands: + {run,rerun,runp,table,report,destroy,check,clean,init,init_piface,link,inspect} run Run or submit sample jobs. rerun Resubmit sample jobs with failed flags. runp Run or submit project jobs. @@ -44,402 +42,628 @@ positional arguments: destroy Remove output files of the project. check Check flag status of current runs. clean Run clean scripts of already processed jobs. 
- inspect Print information about a project. init Initialize looper config file. - init-piface Initialize generic pipeline interface. + init_piface Initialize generic pipeline interface. + link Create directory of symlinks for reported results. + inspect Print information about a project. -options: +optional arguments: + --silent Whether to silence logging (default: False) + --verbosity VERBOSITY + Alternate mode of expression for logging level that + better accords with intuition about how to convey + this. (default: None) + --logdev Whether to log in development mode; possibly among + other behavioral changes to logs handling, use a more + information-rich message format template. (default: + False) + +help: -h, --help show this help message and exit - --version show program's version number and exit - --logfile LOGFILE Optional output file for looper logs (default: None) - --dbg Turn on debug mode (default: False) - --silent Silence logging. Overrides verbosity. - --verbosity V Set logging level (1-5 or logging module level name) - --logdev Expand content of logging message format. - -For subcommand-specific options, type: 'looper -h' -https://github.com/pepkit/looper + -v, --version show program's version number and exit ``` ## `looper run --help` ```console -usage: looper run [-h] [-i] [-d] [-t S] [-x S] [-y S] [-f] [--divvy DIVCFG] [-p P] [-s S] - [-c K [K ...]] [-u X] [-n N] [--looper-config LOOPER_CONFIG] - [-S YAML [YAML ...]] [-P YAML [YAML ...]] [-l N] [-k N] - [--sel-attr ATTR] [--sel-excl [E ...] | --sel-incl [I ...]] - [-a A [A ...]] - [config_file] - -Run or submit sample jobs. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - -i, --ignore-flags Ignore run status flags? Default=False - -d, --dry-run Don't actually submit the jobs. Default=False - -t S, --time-delay S Time delay in seconds between job submissions - -x S, --command-extra S String to append to every command - -y S, --command-extra-override S Same as command-extra, but overrides values in PEP - -f, --skip-file-checks Do not perform input file checks - -u X, --lump X Total input file size (GB) to batch into one job - -n N, --lumpn N Number of commands to batch into one job - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - -divvy arguments: - Configure divvy to change computing settings - - --divvy DIVCFG Path to divvy configuration file. Default=$DIVCFG env - variable. Currently: not set - -p P, --package P Name of computing resource package to use - -s S, --settings S Path to a YAML settings file with compute settings - -c K [K ...], --compute K [K ...] List of key-value pairs (k1=v1) - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] 
Include only samples with these values +usage: looper run [-h] [-i] [-t TIME_DELAY] [-d] [-x COMMAND_EXTRA] + [-y COMMAND_EXTRA_OVERRIDE] [-u LUMP] [-n LUMP_N] + [-j LUMP_J] [--divvy DIVVY] [-f] [-c COMPUTE [COMPUTE ...]] + [--package PACKAGE] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] [--sel-excl SEL_EXCL] + [-l LIMIT] [-k SKIP] [--pep-config PEP_CONFIG] + [-o OUTPUT_DIR] [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + -i, --ignore-flags Ignore run status flags (default: False) + -t TIME_DELAY, --time-delay TIME_DELAY + Time delay in seconds between job submissions (min: 0, + max: 30) (default: 0) + -d, --dry-run Don't actually submit jobs (default: False) + -x COMMAND_EXTRA, --command-extra COMMAND_EXTRA + String to append to every command (default: ) + -y COMMAND_EXTRA_OVERRIDE, --command-extra-override COMMAND_EXTRA_OVERRIDE + Same as command-extra, but overrides values in PEP + (default: ) + -u LUMP, --lump LUMP Total input file size (GB) to batch into one job + (default: None) + -n LUMP_N, --lump-n LUMP_N + Number of commands to batch into one job (default: + None) + -j LUMP_J, --lump-j LUMP_J + Lump samples into number of jobs. (default: None) + --divvy DIVVY Path to divvy configuration file. Default=$DIVCFG env + variable. Currently: not set (default: None) + -f, --skip-file-checks + Do not perform input file checks (default: False) + -c COMPUTE [COMPUTE ...], --compute COMPUTE [COMPUTE ...] + List of key-value pairs (k1=v1) (default: []) + --package PACKAGE Name of computing resource package to use (default: + None) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] + Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? 
(default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper runp --help` ```console -usage: looper runp [-h] [-i] [-d] [-t S] [-x S] [-y S] [-f] [--divvy DIVCFG] [-p P] [-s S] - [-c K [K ...]] [--looper-config LOOPER_CONFIG] [-S YAML [YAML ...]] - [-P YAML [YAML ...]] [-l N] [-k N] [--sel-attr ATTR] - [--sel-excl [E ...] | --sel-incl [I ...]] [-a A [A ...]] - [config_file] - -Run or submit project jobs. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - -i, --ignore-flags Ignore run status flags? Default=False - -d, --dry-run Don't actually submit the jobs. Default=False - -t S, --time-delay S Time delay in seconds between job submissions - -x S, --command-extra S String to append to every command - -y S, --command-extra-override S Same as command-extra, but overrides values in PEP - -f, --skip-file-checks Do not perform input file checks - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - -divvy arguments: - Configure divvy to change computing settings - - --divvy DIVCFG Path to divvy configuration file. Default=$DIVCFG env - variable. Currently: not set - -p P, --package P Name of computing resource package to use - -s S, --settings S Path to a YAML settings file with compute settings - -c K [K ...], --compute K [K ...] List of key-value pairs (k1=v1) - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper runp [-h] [-i] [-t TIME_DELAY] [-d] [-x COMMAND_EXTRA] + [-y COMMAND_EXTRA_OVERRIDE] [-u LUMP] [-n LUMP_N] + [--divvy DIVVY] [-f] [-c COMPUTE [COMPUTE ...]] + [--package PACKAGE] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] [--sel-excl SEL_EXCL] + [-l LIMIT] [-k SKIP] [--pep-config PEP_CONFIG] + [-o OUTPUT_DIR] [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + -i, --ignore-flags Ignore run status flags (default: False) + -t TIME_DELAY, --time-delay TIME_DELAY + Time delay in seconds between job submissions (min: 0, + max: 30) (default: 0) + -d, --dry-run Don't actually submit jobs (default: False) + -x COMMAND_EXTRA, --command-extra COMMAND_EXTRA + String to append to every command (default: ) + -y COMMAND_EXTRA_OVERRIDE, --command-extra-override COMMAND_EXTRA_OVERRIDE + Same as command-extra, but overrides values in PEP + (default: ) + -u LUMP, --lump LUMP Total input file size (GB) to batch into one job + (default: None) + -n LUMP_N, --lump-n LUMP_N + Number of commands to batch into one job (default: + None) + --divvy DIVVY Path to divvy configuration file. Default=$DIVCFG env + variable. 
Currently: not set (default: None) + -f, --skip-file-checks + Do not perform input file checks (default: False) + -c COMPUTE [COMPUTE ...], --compute COMPUTE [COMPUTE ...] + List of key-value pairs (k1=v1) (default: []) + --package PACKAGE Name of computing resource package to use (default: + None) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] + Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper rerun --help` ```console -usage: looper rerun [-h] [-i] [-d] [-t S] [-x S] [-y S] [-f] [--divvy DIVCFG] [-p P] - [-s S] [-c K [K ...]] [-u X] [-n N] [--looper-config LOOPER_CONFIG] - [-S YAML [YAML ...]] [-P YAML [YAML ...]] [-l N] [-k N] - [--sel-attr ATTR] [--sel-excl [E ...] | --sel-incl [I ...]] - [-a A [A ...]] - [config_file] - -Resubmit sample jobs with failed flags. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - -i, --ignore-flags Ignore run status flags? Default=False - -d, --dry-run Don't actually submit the jobs. Default=False - -t S, --time-delay S Time delay in seconds between job submissions - -x S, --command-extra S String to append to every command - -y S, --command-extra-override S Same as command-extra, but overrides values in PEP - -f, --skip-file-checks Do not perform input file checks - -u X, --lump X Total input file size (GB) to batch into one job - -n N, --lumpn N Number of commands to batch into one job - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - -divvy arguments: - Configure divvy to change computing settings - - --divvy DIVCFG Path to divvy configuration file. Default=$DIVCFG env - variable. 
Currently: not set - -p P, --package P Name of computing resource package to use - -s S, --settings S Path to a YAML settings file with compute settings - -c K [K ...], --compute K [K ...] List of key-value pairs (k1=v1) - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper rerun [-h] [-i] [-t TIME_DELAY] [-d] [-x COMMAND_EXTRA] + [-y COMMAND_EXTRA_OVERRIDE] [-u LUMP] [-n LUMP_N] + [-j LUMP_J] [--divvy DIVVY] [-f] + [-c COMPUTE [COMPUTE ...]] [--package PACKAGE] + [--settings SETTINGS] [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] [--sel-excl SEL_EXCL] + [-l LIMIT] [-k SKIP] [--pep-config PEP_CONFIG] + [-o OUTPUT_DIR] [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + -i, --ignore-flags Ignore run status flags (default: False) + -t TIME_DELAY, --time-delay TIME_DELAY + Time delay in seconds between job submissions (min: 0, + max: 30) (default: 0) + -d, --dry-run Don't actually submit jobs (default: False) + -x COMMAND_EXTRA, --command-extra COMMAND_EXTRA + String to append to every command (default: ) + -y COMMAND_EXTRA_OVERRIDE, --command-extra-override COMMAND_EXTRA_OVERRIDE + Same as command-extra, but overrides values in PEP + (default: ) + -u LUMP, --lump LUMP Total input file size (GB) to batch into one job + (default: None) + -n LUMP_N, --lump-n LUMP_N + Number of commands to batch into one job (default: + None) + -j LUMP_J, --lump-j LUMP_J + Lump samples into number of jobs. (default: None) + --divvy DIVVY Path to divvy configuration file. Default=$DIVCFG env + variable. Currently: not set (default: None) + -f, --skip-file-checks + Do not perform input file checks (default: False) + -c COMPUTE [COMPUTE ...], --compute COMPUTE [COMPUTE ...] + List of key-value pairs (k1=v1) (default: []) + --package PACKAGE Name of computing resource package to use (default: + None) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] 
+ Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper report --help` ```console -usage: looper report [-h] [--looper-config LOOPER_CONFIG] [-S YAML [YAML ...]] - [-P YAML [YAML ...]] [-l N] [-k N] [--sel-attr ATTR] - [--sel-excl [E ...] | --sel-incl [I ...]] [-a A [A ...]] [--project] - [config_file] - -Create browsable HTML report of project results. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - --project Process project-level pipelines - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper report [-h] [--portable] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] + [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] + [--sel-excl SEL_EXCL] [-l LIMIT] [-k SKIP] + [--pep-config PEP_CONFIG] [-o OUTPUT_DIR] + [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + --portable Makes html report portable. (default: False) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] 
+ Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper table --help` ```console -usage: looper table [-h] [--looper-config LOOPER_CONFIG] [-S YAML [YAML ...]] - [-P YAML [YAML ...]] [-l N] [-k N] [--sel-attr ATTR] - [--sel-excl [E ...] | --sel-incl [I ...]] [-a A [A ...]] [--project] - [config_file] - -Write summary stats table for project samples. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - --project Process project-level pipelines - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper table [-h] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] [--sel-excl SEL_EXCL] + [-l LIMIT] [-k SKIP] [--pep-config PEP_CONFIG] + [-o OUTPUT_DIR] [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] 
+ Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper inspect --help` ```console -usage: looper inspect [-h] [--looper-config LOOPER_CONFIG] [-S YAML [YAML ...]] - [-P YAML [YAML ...]] [-l N] [-k N] [--sel-attr ATTR] - [--sel-excl [E ...] | --sel-incl [I ...]] [-a A [A ...]] - [--sample-names [SAMPLE_NAMES ...]] [--attr-limit ATTR_LIMIT] - [config_file] - -Print information about a project. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - --sample-names [SAMPLE_NAMES ...] Names of the samples to inspect - --attr-limit ATTR_LIMIT Number of attributes to display - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper inspect [-h] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] + [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] + [--sel-excl SEL_EXCL] [-l LIMIT] [-k SKIP] + [--pep-config PEP_CONFIG] [-o OUTPUT_DIR] + [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] 
+ Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper init --help` ```console -usage: looper init [-h] [-f] [-o DIR] [-S YAML [YAML ...]] [-P YAML [YAML ...]] [-p] - pep_config - -Initialize looper config file. - -positional arguments: - pep_config Project configuration file (PEP) - -options: - -h, --help show this help message and exit - -f, --force Force overwrite - -o DIR, --output-dir DIR - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -p, --piface Generates generic pipeline interface +usage: looper init [-h] [-f] [-o OUTPUT_DIR] [--pep-config PEP_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + +optional arguments: + -f, --force-yes Provide upfront confirmation of destruction intent, to + skip console query. Default=False (default: False) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] + Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + +help: + -h, --help show this help message and exit ``` ## `looper destroy --help` ```console -usage: looper destroy [-h] [-d] [--force-yes] [--looper-config LOOPER_CONFIG] - [-S YAML [YAML ...]] [-P YAML [YAML ...]] [-l N] [-k N] - [--sel-attr ATTR] [--sel-excl [E ...] | --sel-incl [I ...]] - [-a A [A ...]] [--project] - [config_file] - -Remove output files of the project. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - -d, --dry-run Don't actually submit the jobs. Default=False - --force-yes Provide upfront confirmation of destruction intent, - to skip console query. Default=False - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - --project Process project-level pipelines - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] 
Include only samples with these values +usage: looper destroy [-h] [-d] [-f] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] + [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] + [--sel-excl SEL_EXCL] [-l LIMIT] [-k SKIP] + [--pep-config PEP_CONFIG] [-o OUTPUT_DIR] + [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + -d, --dry-run Don't actually submit jobs (default: False) + -f, --force-yes Provide upfront confirmation of destruction intent, to + skip console query. Default=False (default: False) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] + Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper check --help` ```console -usage: looper check [-h] [--describe-codes] [--itemized] [-f [F ...]] - [--looper-config LOOPER_CONFIG] [-S YAML [YAML ...]] - [-P YAML [YAML ...]] [-l N] [-k N] [--sel-attr ATTR] - [--sel-excl [E ...] | --sel-incl [I ...]] [-a A [A ...]] [--project] - [config_file] - -Check flag status of current runs. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - --describe-codes Show status codes description - --itemized Show a detailed, by sample statuses - -f [F ...], --flags [F ...] Check on only these flags/status values - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] 
List of amendments to activate - --project Process project-level pipelines - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper check [-h] [--describe-codes] [--itemized] + [-f FLAGS [FLAGS ...]] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] [--sel-excl SEL_EXCL] + [-l LIMIT] [-k SKIP] [--pep-config PEP_CONFIG] + [-o OUTPUT_DIR] [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + --describe-codes Show status codes description. Default=False (default: + False) + --itemized Show detailed overview of sample statuses. + Default=False (default: False) + -f FLAGS [FLAGS ...], --flags FLAGS [FLAGS ...] + Only check samples based on these status flags. + (default: []) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] + Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? (default: + False) + +help: + -h, --help show this help message and exit ``` ## `looper clean --help` ```console -usage: looper clean [-h] [-d] [--force-yes] [--looper-config LOOPER_CONFIG] - [-S YAML [YAML ...]] [-P YAML [YAML ...]] [-l N] [-k N] - [--sel-attr ATTR] [--sel-excl [E ...] | --sel-incl [I ...]] - [-a A [A ...]] - [config_file] - -Run clean scripts of already processed jobs. - -positional arguments: - config_file Project configuration file (YAML) or pephub registry - path. - -options: - -h, --help show this help message and exit - -d, --dry-run Don't actually submit the jobs. 
Default=False - --force-yes Provide upfront confirmation of destruction intent, - to skip console query. Default=False - --looper-config LOOPER_CONFIG Looper configuration file (YAML) - -S YAML [YAML ...], --sample-pipeline-interfaces YAML [YAML ...] - Path to looper sample config file - -P YAML [YAML ...], --project-pipeline-interfaces YAML [YAML ...] - Path to looper project config file - -a A [A ...], --amend A [A ...] List of amendments to activate - -sample selection arguments: - Specify samples to include or exclude based on sample attribute values - - -l N, --limit N Limit to n samples - -k N, --skip N Skip samples by numerical index - --sel-attr ATTR Attribute for sample exclusion OR inclusion - --sel-excl [E ...] Exclude samples with these values - --sel-incl [I ...] Include only samples with these values +usage: looper clean [-h] [-d] [-f] [--settings SETTINGS] + [--exc-flag EXC_FLAG [EXC_FLAG ...]] + [--sel-flag SEL_FLAG [SEL_FLAG ...]] [--sel-attr SEL_ATTR] + [--sel-incl SEL_INCL [SEL_INCL ...]] [--sel-excl SEL_EXCL] + [-l LIMIT] [-k SKIP] [--pep-config PEP_CONFIG] + [-o OUTPUT_DIR] [--config-file CONFIG_FILE] + [--looper-config LOOPER_CONFIG] + [-S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...]] + [-P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...]] + [--pipestat PIPESTAT] [--amend AMEND [AMEND ...]] + [--project] + +optional arguments: + -d, --dry-run Don't actually submit jobs (default: False) + -f, --force-yes Provide upfront confirmation of destruction intent, to + skip console query. Default=False (default: False) + --settings SETTINGS Path to a YAML settings file with compute settings + (default: ) + --exc-flag EXC_FLAG [EXC_FLAG ...] + Sample exclusion flag (default: []) + --sel-flag SEL_FLAG [SEL_FLAG ...] + Sample selection flag (default: []) + --sel-attr SEL_ATTR Attribute for sample exclusion OR inclusion (default: + toggle) + --sel-incl SEL_INCL [SEL_INCL ...] + Include only samples with these values (default: []) + --sel-excl SEL_EXCL Exclude samples with these values (default: ) + -l LIMIT, --limit LIMIT + Limit to n samples (default: None) + -k SKIP, --skip SKIP Skip samples by numerical index (default: None) + --pep-config PEP_CONFIG + PEP configuration file (default: None) + -o OUTPUT_DIR, --output-dir OUTPUT_DIR + Output directory (default: None) + --config-file CONFIG_FILE + Project configuration file (default: None) + --looper-config LOOPER_CONFIG + Looper configuration file (YAML) (default: None) + -S SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...], --sample-pipeline-interfaces SAMPLE_PIPELINE_INTERFACES [SAMPLE_PIPELINE_INTERFACES ...] + Paths to looper sample pipeline interfaces (default: + []) + -P PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...], --project-pipeline-interfaces PROJECT_PIPELINE_INTERFACES [PROJECT_PIPELINE_INTERFACES ...] + Paths to looper project pipeline interfaces (default: + []) + --pipestat PIPESTAT Path to pipestat files. (default: None) + --amend AMEND [AMEND ...] + List of amendments to activate (default: []) + --project Is this command executed for project-level? 
(default: + False) + +help: + -h, --help show this help message and exit ``` diff --git a/docs/looper/using-geofetch.md b/docs/looper/using-geofetch.md index 113b1252..30491a5b 100644 --- a/docs/looper/using-geofetch.md +++ b/docs/looper/using-geofetch.md @@ -24,12 +24,3 @@ Now, you can convert the files from sra into fastq format: looper run --amend sra_convert ``` -## Run pipeline - -Add a pipeline interface to link to a project - -(Experimental) - -``` -looper mod "pipeline_interfaces: /path/to/piface.yaml" -``` diff --git a/docs/looper/variable-namespaces.md b/docs/looper/variable-namespaces.md index 40b69b58..437837b8 100644 --- a/docs/looper/variable-namespaces.md +++ b/docs/looper/variable-namespaces.md @@ -64,13 +64,11 @@ So, the compute namespace is first populated with any variables from the selecte ## 6. pipestat -The `pipestat` namespace conists of a group of variables that reflect the [pipestat](http://pipestat.databio.org) configuration for a submission. +The `pipestat` namespace consists of a group of variables that reflect the [pipestat](http://pipestat.databio.org) configuration for a submission. -1. schema (`PipestatManager.schema_path`) -2. results_file (`PipestatManager.file`) -3. record_id (`PipestatManager.record_identifier`) -4. namespace (`PipestatManager.namespace`) -5. config (`PipestatManager.config_path`) +1. results_file (`pipestat.results_file`) +2. record_identifier (`pipestat.record_identifier`) +3. config (`pipestat.config_file`) ## Mapping variables to submission templates using divvy adapters diff --git a/docs/looper/writing-a-pipeline-interface.md b/docs/looper/writing-a-pipeline-interface.md index 7a9585eb..4fcdd39a 100644 --- a/docs/looper/writing-a-pipeline-interface.md +++ b/docs/looper/writing-a-pipeline-interface.md @@ -32,3 +32,12 @@ Note: previous versions used the `path` variable instead of `var_templates: pipe Finally, populate the `command_template`. You can use the full power of Jinja2 Python templates here, but most likely you'll just need to use a few variables using curly braces. In this case, we refer to the `count_lines.sh` script with `{pipeline.var_templates.pipeline}`, which points directly to the `pipeline` variable defined above. Then, we use `{sample.file}` to refer to the `file` column in the sample table specified in the PEP. This pipeline thus takes a single positional command-line argument. You can make the command template much more complicated and refer to any sample or project attributes, as well as a bunch of [other variables made available by looper](variable-namespaces.md). Now, you have a basic functional pipeline interface. There are many more advanced features you can use to make your pipeline more powerful, such as providing a schema to specify inputs or outputs, making input-size-dependent compute settings, and more. For complete details, consult the formal [pipeline interface format specification](pipeline-interface-specification.md). 
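+
+## Example Pipeline Interface
+
+Putting the pieces above together, a minimal sample-level pipeline interface for `count_lines.sh` might look like the sketch below. It mirrors the hello_looper example shown elsewhere in these docs; the `{looper.piface_dir}` template assumes the script sits alongside the interface file, so adjust the path for your own layout.
+
+```yaml
+pipeline_name: count_lines
+pipeline_type: sample
+var_templates:
+  # resolves to the count_lines.sh script stored next to this interface file
+  pipeline: '{looper.piface_dir}/count_lines.sh'
+command_template: >
+  {pipeline.var_templates.pipeline} {sample.file}
+```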
+ +## Example Pipeline Interface Using Pipestat +```yaml +pipeline_name: example_pipestat_pipeline +pipeline_type: sample +output_schema: pipestat_output_schema.yaml +command_template: > + python {looper.piface_dir}/count_lines.py {sample.file} {sample.sample_name} {pipestat.results_file} +``` From cdb54c08a6cf398fc7cb45e081f63f1e71e5944c Mon Sep 17 00:00:00 2001 From: Donald Campbell <125581724+donaldcampbelljr@users.noreply.github.com> Date: Wed, 5 Jun 2024 09:55:57 -0400 Subject: [PATCH 2/6] update looper hello world example --- .gitignore | 4 +- docs/looper/code/hello-world.md | 381 ++++++++++++-------- docs/looper/notebooks/hello-world.ipynb | 455 +++++++++++++++--------- 3 files changed, 532 insertions(+), 308 deletions(-) diff --git a/.gitignore b/.gitignore index a16c703d..394a0266 100644 --- a/.gitignore +++ b/.gitignore @@ -2,4 +2,6 @@ site venv/ .venv/ -.DS_Store \ No newline at end of file +.DS_Store +/docs/looper/notebooks/hello_looper-master/ +/docs/looper/notebooks/master.zip diff --git a/docs/looper/code/hello-world.md b/docs/looper/code/hello-world.md index 4f31f201..7ede5acd 100644 --- a/docs/looper/code/hello-world.md +++ b/docs/looper/code/hello-world.md @@ -18,21 +18,21 @@ The [hello looper repository](http://github.com/pepkit/hello_looper) contains a !wget https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip ``` - --2023-11-08 17:27:01-- https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip - Resolving github.com (github.com)... 140.82.114.3 - Connecting to github.com (github.com)|140.82.114.3|:443... connected. + --2024-06-05 09:33:27-- https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip + Resolving github.com (github.com)... 140.82.112.3 + Connecting to github.com (github.com)|140.82.112.3|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master [following] - --2023-11-08 17:27:01-- https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master - Resolving codeload.github.com (codeload.github.com)... 140.82.113.10 - Connecting to codeload.github.com (codeload.github.com)|140.82.113.10|:443... connected. + --2024-06-05 09:33:27-- https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master + Resolving codeload.github.com (codeload.github.com)... 140.82.112.10 + Connecting to codeload.github.com (codeload.github.com)|140.82.112.10|:443... connected. HTTP request sent, awaiting response... 
200 OK Length: unspecified [application/zip] Saving to: ‘master.zip’ - master.zip [ <=> ] 13.37K --.-KB/s in 0.03s + master.zip [ <=> ] 22.79K --.-KB/s in 0.003s - 2023-11-08 17:27:01 (472 KB/s) - ‘master.zip’ saved [13693] + 2024-06-05 09:33:28 (6.66 MB/s) - ‘master.zip’ saved [23340] @@ -41,43 +41,90 @@ The [hello looper repository](http://github.com/pepkit/hello_looper) contains a !unzip master.zip ``` - Archive: master.zip - 73ef08e38d3e17fd3d4f940282c80e3ee4dbb91f - creating: hello_looper-master/ - inflating: hello_looper-master/.gitignore - inflating: hello_looper-master/.looper.yaml - inflating: hello_looper-master/.looper_pephub.yaml - inflating: hello_looper-master/.looper_pipestat.yaml - inflating: hello_looper-master/.looper_pipestat_shell.yaml - inflating: hello_looper-master/README.md - creating: hello_looper-master/data/ - inflating: hello_looper-master/data/frog1_data.txt - inflating: hello_looper-master/data/frog2_data.txt - inflating: hello_looper-master/looper_pipelines.md - creating: hello_looper-master/old_specification/ - inflating: hello_looper-master/old_specification/README.md - creating: hello_looper-master/old_specification/data/ - inflating: hello_looper-master/old_specification/data/frog1_data.txt - inflating: hello_looper-master/old_specification/data/frog2_data.txt - creating: hello_looper-master/old_specification/pipeline/ - inflating: hello_looper-master/old_specification/pipeline/count_lines.sh - inflating: hello_looper-master/old_specification/pipeline/pipeline_interface.yaml - creating: hello_looper-master/old_specification/project/ - inflating: hello_looper-master/old_specification/project/project_config.yaml - inflating: hello_looper-master/old_specification/project/sample_annotation.csv - creating: hello_looper-master/pipeline/ - inflating: hello_looper-master/pipeline/count_lines.sh - inflating: hello_looper-master/pipeline/pipeline_interface.yaml - inflating: hello_looper-master/pipeline/pipeline_interface_project.yaml - creating: hello_looper-master/pipeline_pipestat/ - inflating: hello_looper-master/pipeline_pipestat/count_lines.py - inflating: hello_looper-master/pipeline_pipestat/count_lines_pipestat.sh - inflating: hello_looper-master/pipeline_pipestat/pipeline_interface.yaml - inflating: hello_looper-master/pipeline_pipestat/pipeline_interface_shell.yaml - inflating: hello_looper-master/pipeline_pipestat/pipestat_output_schema.yaml - creating: hello_looper-master/project/ - inflating: hello_looper-master/project/project_config.yaml - inflating: hello_looper-master/project/sample_annotation.csv + Archive: master.zip + 95c790f9b17e66a24e3e579c0ebc05a955b5c1d0 + creating: hello_looper-master/ + inflating: hello_looper-master/.gitignore + inflating: hello_looper-master/.looper.yaml + inflating: hello_looper-master/README.md + creating: hello_looper-master/advanced/ + inflating: hello_looper-master/advanced/.looper.yaml + inflating: hello_looper-master/advanced/.looper_advanced_pipestat.yaml + creating: hello_looper-master/advanced/pipeline/ + inflating: hello_looper-master/advanced/pipeline/output_schema.yaml + inflating: hello_looper-master/advanced/pipeline/pipeline_interface1_project.yaml + inflating: hello_looper-master/advanced/pipeline/pipeline_interface1_sample.yaml + inflating: hello_looper-master/advanced/pipeline/pipeline_interface2_project.yaml + inflating: hello_looper-master/advanced/pipeline/pipeline_interface2_sample.yaml + inflating: hello_looper-master/advanced/pipeline/pipestat_output_schema.yaml + inflating: 
hello_looper-master/advanced/pipeline/pipestat_pipeline_interface1_sample.yaml + inflating: hello_looper-master/advanced/pipeline/pipestat_pipeline_interface2_sample.yaml + inflating: hello_looper-master/advanced/pipeline/readData.R + inflating: hello_looper-master/advanced/pipeline/resources-project.tsv + inflating: hello_looper-master/advanced/pipeline/resources-sample.tsv + creating: hello_looper-master/advanced/project/ + inflating: hello_looper-master/advanced/project/annotation_sheet.csv + inflating: hello_looper-master/advanced/project/project_config.yaml + creating: hello_looper-master/csv/ + inflating: hello_looper-master/csv/.looper.yaml + creating: hello_looper-master/csv/data/ + inflating: hello_looper-master/csv/data/frog1_data.txt + inflating: hello_looper-master/csv/data/frog2_data.txt + creating: hello_looper-master/csv/pipeline/ + inflating: hello_looper-master/csv/pipeline/count_lines.sh + inflating: hello_looper-master/csv/pipeline/pipeline_interface.yaml + inflating: hello_looper-master/csv/pipeline/pipeline_interface_project.yaml + creating: hello_looper-master/csv/project/ + inflating: hello_looper-master/csv/project/sample_annotation.csv + creating: hello_looper-master/intermediate/ + inflating: hello_looper-master/intermediate/.looper.yaml + inflating: hello_looper-master/intermediate/.looper_project.yaml + creating: hello_looper-master/intermediate/data/ + inflating: hello_looper-master/intermediate/data/frog_1.txt + inflating: hello_looper-master/intermediate/data/frog_2.txt + creating: hello_looper-master/intermediate/pipeline/ + inflating: hello_looper-master/intermediate/pipeline/count_lines.sh + inflating: hello_looper-master/intermediate/pipeline/pipeline_interface.yaml + inflating: hello_looper-master/intermediate/pipeline/pipeline_interface_project.yaml + creating: hello_looper-master/intermediate/project/ + inflating: hello_looper-master/intermediate/project/project_config.yaml + inflating: hello_looper-master/intermediate/project/sample_annotation.csv + creating: hello_looper-master/minimal/ + inflating: hello_looper-master/minimal/.looper.yaml + creating: hello_looper-master/minimal/data/ + inflating: hello_looper-master/minimal/data/frog_1.txt + inflating: hello_looper-master/minimal/data/frog_2.txt + creating: hello_looper-master/minimal/pipeline/ + inflating: hello_looper-master/minimal/pipeline/count_lines.sh + inflating: hello_looper-master/minimal/pipeline/pipeline_interface.yaml + creating: hello_looper-master/minimal/project/ + inflating: hello_looper-master/minimal/project/project_config.yaml + inflating: hello_looper-master/minimal/project/sample_annotation.csv + creating: hello_looper-master/pephub/ + inflating: hello_looper-master/pephub/.looper.yaml + creating: hello_looper-master/pephub/data/ + inflating: hello_looper-master/pephub/data/frog1_data.txt + inflating: hello_looper-master/pephub/data/frog2_data.txt + creating: hello_looper-master/pephub/pipeline/ + inflating: hello_looper-master/pephub/pipeline/count_lines.sh + inflating: hello_looper-master/pephub/pipeline/pipeline_interface.yaml + inflating: hello_looper-master/pephub/pipeline/pipeline_interface_project.yaml + creating: hello_looper-master/pipestat/ + inflating: hello_looper-master/pipestat/.looper.yaml + inflating: hello_looper-master/pipestat/.looper_pipestat_shell.yaml + creating: hello_looper-master/pipestat/data/ + inflating: hello_looper-master/pipestat/data/frog_1.txt + inflating: hello_looper-master/pipestat/data/frog_2.txt + creating: 
hello_looper-master/pipestat/pipeline_pipestat/ + inflating: hello_looper-master/pipestat/pipeline_pipestat/count_lines.py + inflating: hello_looper-master/pipestat/pipeline_pipestat/count_lines_pipestat.sh + inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface.yaml + inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface_project.yaml + inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface_shell.yaml + inflating: hello_looper-master/pipestat/pipeline_pipestat/pipestat_output_schema.yaml + creating: hello_looper-master/pipestat/project/ + inflating: hello_looper-master/pipestat/project/project_config.yaml + inflating: hello_looper-master/pipestat/project/sample_annotation.csv ## 3. Run it @@ -86,35 +133,40 @@ Run it by changing to the directory and then invoking `looper run` on the projec ```python -!looper run --looper-config hello_looper-master/.looper.yaml +cd hello_looper-master/minimal ``` - Looper version: 1.5.2-dev - Command: run - Using default divvy config. You may specify in env var: ['DIVCFG'] - Pipestat compatible: False - ## [1 of 2] sample: frog_1; pipeline: count_lines - /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog1_data.txt - Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub - Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub - Compute node: databio - Start time: 2023-11-08 17:29:45 - wc: data/frog1_data.txt: No such file or directory - Number of lines: - ## [2 of 2] sample: frog_2; pipeline: count_lines - /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog2_data.txt - Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub - Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub - Compute node: databio - Start time: 2023-11-08 17:29:45 - wc: data/frog2_data.txt: No such file or directory - Number of lines: - - Looper finished - Samples valid for job generation: 2 of 2 - Commands submitted: 2 of 2 - Jobs submitted: 2 - {'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2} + /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal + + + /home/drc/GITHUB/pepspec/.venv/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library. + self.shell.db['dhist'] = compress_dhist(dhist)[-100:] + + + +```python +!looper run --looper-config .looper.yaml +``` + + Looper version: 1.8.1a1 + Command: run + Using default divvy config. 
You may specify in env var: ['DIVCFG'] + ## [1 of 2] sample: frog_1; pipeline: count_lines + Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_1.sub + Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_1.sub + Compute node: databio + Start time: 2024-06-05 09:37:34 + Number of lines: 4 + ## [2 of 2] sample: frog_2; pipeline: count_lines + Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_2.sub + Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_2.sub + Compute node: databio + Start time: 2024-06-05 09:37:34 + Number of lines: 7 + + Looper finished + Samples valid for job generation: 2 of 2 + {'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}  Voila! You've run your very first pipeline across multiple samples using `looper`! @@ -125,20 +177,26 @@ Now, let's inspect the `hello_looper` repository you downloaded. It has 3 compon ```python -!tree hello_looper-master/*/ +!tree * ``` - hello_looper-master/data/ - ├── frog1_data.txt - └── frog2_data.txt - hello_looper-master/pipeline/ - ├── count_lines.sh - └── pipeline_interface.yaml - hello_looper-master/project/ - ├── project_config.yaml - └── sample_annotation.csv - - 0 directories, 6 files + data + ├── frog_1.txt + └── frog_2.txt + pipeline + ├── count_lines.sh + └── pipeline_interface.yaml + project + ├── project_config.yaml + └── sample_annotation.csv + results + └── submission + ├── count_lines_frog_1.log + ├── count_lines_frog_1.sub + ├── count_lines_frog_2.log + └── count_lines_frog_2.sub + + 1 directory, 4 files These are: @@ -153,25 +211,25 @@ These are: When we invoke `looper` from the command line we told it to `run project/project_config.yaml`. `looper` reads the [project/project_config.yaml](https://github.com/pepkit/hello_looper/blob/master/project/project_config.yaml) file, which points to a few things: * the [project/sample_annotation.csv](https://github.com/pepkit/hello_looper/blob/master/project/sample_annotation.csv) file, which specifies a few samples, their type, and path to data file - * the `output_dir`, which is where looper results are saved. Results will be saved in `$HOME/hello_looper_results`. + * the `output_dir`, which is where looper results are saved. * the `pipeline_interface.yaml` file, ([pipeline/pipeline_interface.yaml](https://github.com/pepkit/hello_looper/blob/master/pipeline/pipeline_interface.yaml)), which tells looper how to connect to the pipeline ([pipeline/count_lines.sh](https://github.com/pepkit/hello_looper/blob/master/pipeline/)). The 3 folders (`data`, `project`, and `pipeline`) are modular; there is no need for these to live in any predetermined folder structure. For this example, the data and pipeline are included locally, but in practice, they are usually in a separate folder; you can point to anything (so data, pipelines, and projects may reside in distinct spaces on disk). You may also include more than one pipeline interface in your `project_config.yaml`, so in a looper project, many-to-many relationships are possible. ## Looper config -The [looper config](looper-config.md) contains paths to the project config, the output_dir as well as any define pipeline interfaces. 
+The [looper config](looper-config.md) contains paths to the project config, the output_dir as well as any defined pipeline interfaces. ```python -!cat hello_looper-master/.looper.yaml +!cat .looper.yaml ``` - pep_config: project/project_config.yaml # local path to pep config - # pep_config: pepkit/hello_looper:default # you can also use a pephub registry path - output_dir: "results" - pipeline_interfaces: - sample: pipeline/pipeline_interface.yaml + pep_config: project/project_config.yaml # local path to pep config + # pep_config: pepkit/hello_looper:default # you can also use a pephub registry path + output_dir: "results" + pipeline_interfaces: + sample: pipeline/pipeline_interface.yaml @@ -183,12 +241,11 @@ The project config file contains the PEP version and sample annotation sheet. (s ```python -!cat hello_looper-master/project/project_config.yaml +!cat project/project_config.yaml ``` - pep_version: 2.0.0 - sample_table: sample_annotation.csv - + pep_version: 2.0.0 + sample_table: sample_annotation.csv ## Pipeline Interface @@ -197,15 +254,15 @@ The [pipeline interface](pipeline-interface-specification.md) shows the pipeline ```python -!cat hello_looper-master/pipeline/pipeline_interface.yaml +!cat pipeline/pipeline_interface.yaml ``` - pipeline_name: count_lines - pipeline_type: sample - var_templates: - pipeline: '{looper.piface_dir}/count_lines.sh' - command_template: > - {pipeline.var_templates.pipeline} {sample.file} + pipeline_name: count_lines + pipeline_type: sample + var_templates: + pipeline: '{looper.piface_dir}/count_lines.sh' + command_template: > + {pipeline.var_templates.pipeline} {sample.file} Alright, next let's explore what this pipeline stuck into our `output_dir`: @@ -213,27 +270,24 @@ Alright, next let's explore what this pipeline stuck into our `output_dir`: ```python -!tree $HOME/hello_looper_results +!tree results/ ``` - /home/nsheff/hello_looper_results - ├── results_pipeline - └── submission - ├── count_lines.sh_frog_1.log - ├── count_lines.sh_frog_1.sub - ├── count_lines.sh_frog_2.log - ├── count_lines.sh_frog_2.sub - ├── frog_1.yaml - └── frog_2.yaml - - 2 directories, 6 files + results/ + └── submission + ├── count_lines_frog_1.log + ├── count_lines_frog_1.sub + ├── count_lines_frog_2.log + └── count_lines_frog_2.sub + + 1 directory, 4 files -Inside of an `output_dir` there will be two directories: +Inside of an `output_dir` there will be one to two directories: -- `results_pipeline` - a directory with output of the pipeline(s), for each sample/pipeline combination (often one per sample) - `submissions` - which holds a YAML representation of each sample and a log file for each submitted job +- `results_pipeline` - a directory with output of the pipeline(s) (if applicable), for each sample/pipeline combination (often one per sample). In this minimal example, the output was simply printed to terminal instead of producing files. From here to running hundreds of samples of various sample types is virtually the same effort! @@ -244,47 +298,55 @@ Looper also supports running a PEP from [PEPHub](https://pephub.databio.org/)! ```python -!cat hello_looper-master/.looper_pephub.yaml +cd .. 
+``` + + /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master + + + +```python +cd pephub ``` - pep_config: pepkit/hello_looper:default # pephub registry path or local path - output_dir: results - pipeline_interfaces: - sample: pipeline/pipeline_interface.yaml + /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub ```python -!looper run --looper-config hello_looper-master/.looper_pephub.yaml +!cat .looper.yaml ``` - Looper version: 1.5.2-dev + pep_config: pepkit/hello_looper:default # pephub registry path or local path + output_dir: results + pipeline_interfaces: + sample: pipeline/pipeline_interface.yaml + + + +```python +!looper run --looper-config .looper.yaml +``` + + Looper version: 1.8.1a1 Command: run Using default divvy config. You may specify in env var: ['DIVCFG'] No config key in Project, or reading project from dict - Processing project from dictionary... - Pipestat compatible: False ## [1 of 2] sample: frog_1; pipeline: count_lines - /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog1_data.txt - Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub - Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub + Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_1.sub + Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_1.sub Compute node: databio - Start time: 2023-11-09 15:39:28 - wc: data/frog1_data.txt: No such file or directory - Number of lines: + Start time: 2024-06-05 09:46:04 + Number of lines: 4 ## [2 of 2] sample: frog_2; pipeline: count_lines - /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog2_data.txt - Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub - Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub + Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_2.sub + Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_2.sub Compute node: databio - Start time: 2023-11-09 15:39:28 - wc: data/frog2_data.txt: No such file or directory - Number of lines: + Start time: 2024-06-05 09:46:04 + Number of lines: 7 Looper finished Samples valid for job generation: 2 of 2 - Commands submitted: 2 of 2 - Jobs submitted: 2 {'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}  @@ -294,15 +356,33 @@ Looper can also be used in tandem with [pipestat](https://pipestat.databio.org/e ```python -!cat hello_looper-master/.looper_pipestat.yaml +cd .. 
``` - pep_config: ./project/project_config.yaml # pephub registry path or local path - output_dir: ./results - pipeline_interfaces: - sample: ./pipeline_pipestat/pipeline_interface.yaml - pipestat: + /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master + + + +```python +cd pipestat +``` + + /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pipestat + + + +```python +!cat .looper.yaml +``` + + pep_config: ./project/project_config.yaml # pephub registry path or local path + output_dir: ./results + pipeline_interfaces: + sample: ./pipeline_pipestat/pipeline_interface.yaml + project: ./pipeline_pipestat/pipeline_interface_project.yaml + pipestat: results_file_path: results.yaml + flag_file_dir: results/flags ## A few more basic looper options @@ -317,8 +397,8 @@ For `looper run`: There are also other commands: -- `looper check`: checks on the status (running, failed, completed) of your jobs -- `looper summarize`: produces an output file that summarizes your project results +- `looper check`: checks on the status (running, failed, completed) of your jobs (requires pipestat) +- `looper summarize`: produces an output file that summarizes your project results (requires pipestat) - `looper destroy`: completely erases all results so you can restart - `looper rerun`: rerun only jobs that have failed. @@ -327,3 +407,8 @@ There are also other commands: To use `looper` on your own, you will need to prepare 2 things: a **project** (metadata that define *what* you want to process), and **pipelines** (*how* to process data). To link your project to `looper`, you will need to [define a project](defining-a-project.md). You will want to either use pre-made `looper`-compatible pipelines or link your own custom-built pipelines. These docs will also show you how to connect your pipeline to your project. + + +```python + +``` diff --git a/docs/looper/notebooks/hello-world.ipynb b/docs/looper/notebooks/hello-world.ipynb index e6119f62..d4669b59 100644 --- a/docs/looper/notebooks/hello-world.ipynb +++ b/docs/looper/notebooks/hello-world.ipynb @@ -21,28 +21,28 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "--2023-11-08 17:27:01-- https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip\n", - "Resolving github.com (github.com)... 140.82.114.3\n", - "Connecting to github.com (github.com)|140.82.114.3|:443... connected.\n", + "--2024-06-05 09:33:27-- https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip\n", + "Resolving github.com (github.com)... 140.82.112.3\n", + "Connecting to github.com (github.com)|140.82.112.3|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master [following]\n", - "--2023-11-08 17:27:01-- https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master\n", - "Resolving codeload.github.com (codeload.github.com)... 140.82.113.10\n", - "Connecting to codeload.github.com (codeload.github.com)|140.82.113.10|:443... connected.\n", + "--2024-06-05 09:33:27-- https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master\n", + "Resolving codeload.github.com (codeload.github.com)... 140.82.112.10\n", + "Connecting to codeload.github.com (codeload.github.com)|140.82.112.10|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: unspecified [application/zip]\n", "Saving to: ‘master.zip’\n", "\n", - "master.zip [ <=> ] 13.37K --.-KB/s in 0.03s \n", + "master.zip [ <=> ] 22.79K --.-KB/s in 0.003s \n", "\n", - "2023-11-08 17:27:01 (472 KB/s) - ‘master.zip’ saved [13693]\n", + "2024-06-05 09:33:28 (6.66 MB/s) - ‘master.zip’ saved [23340]\n", "\n" ] } @@ -53,50 +53,97 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Archive: master.zip\r\n", - "73ef08e38d3e17fd3d4f940282c80e3ee4dbb91f\r\n", - " creating: hello_looper-master/\r\n", - " inflating: hello_looper-master/.gitignore \r\n", - " inflating: hello_looper-master/.looper.yaml \r\n", - " inflating: hello_looper-master/.looper_pephub.yaml \r\n", - " inflating: hello_looper-master/.looper_pipestat.yaml \r\n", - " inflating: hello_looper-master/.looper_pipestat_shell.yaml \r\n", - " inflating: hello_looper-master/README.md \r\n", - " creating: hello_looper-master/data/\r\n", - " inflating: hello_looper-master/data/frog1_data.txt \r\n", - " inflating: hello_looper-master/data/frog2_data.txt \r\n", - " inflating: hello_looper-master/looper_pipelines.md \r\n", - " creating: hello_looper-master/old_specification/\r\n", - " inflating: hello_looper-master/old_specification/README.md \r\n", - " creating: hello_looper-master/old_specification/data/\r\n", - " inflating: hello_looper-master/old_specification/data/frog1_data.txt \r\n", - " inflating: hello_looper-master/old_specification/data/frog2_data.txt \r\n", - " creating: hello_looper-master/old_specification/pipeline/\r\n", - " inflating: hello_looper-master/old_specification/pipeline/count_lines.sh \r\n", - " inflating: hello_looper-master/old_specification/pipeline/pipeline_interface.yaml \r\n", - " creating: hello_looper-master/old_specification/project/\r\n", - " inflating: hello_looper-master/old_specification/project/project_config.yaml \r\n", - " inflating: hello_looper-master/old_specification/project/sample_annotation.csv \r\n", - " creating: hello_looper-master/pipeline/\r\n", - " inflating: hello_looper-master/pipeline/count_lines.sh \r\n", - " inflating: hello_looper-master/pipeline/pipeline_interface.yaml \r\n", - " inflating: hello_looper-master/pipeline/pipeline_interface_project.yaml \r\n", - " creating: hello_looper-master/pipeline_pipestat/\r\n", - " inflating: hello_looper-master/pipeline_pipestat/count_lines.py \r\n", - " inflating: hello_looper-master/pipeline_pipestat/count_lines_pipestat.sh \r\n", - " inflating: hello_looper-master/pipeline_pipestat/pipeline_interface.yaml \r\n", - " inflating: hello_looper-master/pipeline_pipestat/pipeline_interface_shell.yaml \r\n", - " inflating: hello_looper-master/pipeline_pipestat/pipestat_output_schema.yaml \r\n", - " creating: hello_looper-master/project/\r\n", - " inflating: hello_looper-master/project/project_config.yaml \r\n", - " inflating: hello_looper-master/project/sample_annotation.csv \r\n" + "Archive: master.zip\n", + "95c790f9b17e66a24e3e579c0ebc05a955b5c1d0\n", + " creating: hello_looper-master/\n", + " inflating: hello_looper-master/.gitignore \n", + " inflating: hello_looper-master/.looper.yaml \n", + " inflating: hello_looper-master/README.md \n", + " creating: hello_looper-master/advanced/\n", + " inflating: hello_looper-master/advanced/.looper.yaml \n", + " inflating: hello_looper-master/advanced/.looper_advanced_pipestat.yaml \n", + " creating: hello_looper-master/advanced/pipeline/\n", + " inflating: 
hello_looper-master/advanced/pipeline/output_schema.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipeline_interface1_project.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipeline_interface1_sample.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipeline_interface2_project.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipeline_interface2_sample.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipestat_output_schema.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipestat_pipeline_interface1_sample.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/pipestat_pipeline_interface2_sample.yaml \n", + " inflating: hello_looper-master/advanced/pipeline/readData.R \n", + " inflating: hello_looper-master/advanced/pipeline/resources-project.tsv \n", + " inflating: hello_looper-master/advanced/pipeline/resources-sample.tsv \n", + " creating: hello_looper-master/advanced/project/\n", + " inflating: hello_looper-master/advanced/project/annotation_sheet.csv \n", + " inflating: hello_looper-master/advanced/project/project_config.yaml \n", + " creating: hello_looper-master/csv/\n", + " inflating: hello_looper-master/csv/.looper.yaml \n", + " creating: hello_looper-master/csv/data/\n", + " inflating: hello_looper-master/csv/data/frog1_data.txt \n", + " inflating: hello_looper-master/csv/data/frog2_data.txt \n", + " creating: hello_looper-master/csv/pipeline/\n", + " inflating: hello_looper-master/csv/pipeline/count_lines.sh \n", + " inflating: hello_looper-master/csv/pipeline/pipeline_interface.yaml \n", + " inflating: hello_looper-master/csv/pipeline/pipeline_interface_project.yaml \n", + " creating: hello_looper-master/csv/project/\n", + " inflating: hello_looper-master/csv/project/sample_annotation.csv \n", + " creating: hello_looper-master/intermediate/\n", + " inflating: hello_looper-master/intermediate/.looper.yaml \n", + " inflating: hello_looper-master/intermediate/.looper_project.yaml \n", + " creating: hello_looper-master/intermediate/data/\n", + " inflating: hello_looper-master/intermediate/data/frog_1.txt \n", + " inflating: hello_looper-master/intermediate/data/frog_2.txt \n", + " creating: hello_looper-master/intermediate/pipeline/\n", + " inflating: hello_looper-master/intermediate/pipeline/count_lines.sh \n", + " inflating: hello_looper-master/intermediate/pipeline/pipeline_interface.yaml \n", + " inflating: hello_looper-master/intermediate/pipeline/pipeline_interface_project.yaml \n", + " creating: hello_looper-master/intermediate/project/\n", + " inflating: hello_looper-master/intermediate/project/project_config.yaml \n", + " inflating: hello_looper-master/intermediate/project/sample_annotation.csv \n", + " creating: hello_looper-master/minimal/\n", + " inflating: hello_looper-master/minimal/.looper.yaml \n", + " creating: hello_looper-master/minimal/data/\n", + " inflating: hello_looper-master/minimal/data/frog_1.txt \n", + " inflating: hello_looper-master/minimal/data/frog_2.txt \n", + " creating: hello_looper-master/minimal/pipeline/\n", + " inflating: hello_looper-master/minimal/pipeline/count_lines.sh \n", + " inflating: hello_looper-master/minimal/pipeline/pipeline_interface.yaml \n", + " creating: hello_looper-master/minimal/project/\n", + " inflating: hello_looper-master/minimal/project/project_config.yaml \n", + " inflating: hello_looper-master/minimal/project/sample_annotation.csv \n", + " creating: hello_looper-master/pephub/\n", + " inflating: 
hello_looper-master/pephub/.looper.yaml \n", + " creating: hello_looper-master/pephub/data/\n", + " inflating: hello_looper-master/pephub/data/frog1_data.txt \n", + " inflating: hello_looper-master/pephub/data/frog2_data.txt \n", + " creating: hello_looper-master/pephub/pipeline/\n", + " inflating: hello_looper-master/pephub/pipeline/count_lines.sh \n", + " inflating: hello_looper-master/pephub/pipeline/pipeline_interface.yaml \n", + " inflating: hello_looper-master/pephub/pipeline/pipeline_interface_project.yaml \n", + " creating: hello_looper-master/pipestat/\n", + " inflating: hello_looper-master/pipestat/.looper.yaml \n", + " inflating: hello_looper-master/pipestat/.looper_pipestat_shell.yaml \n", + " creating: hello_looper-master/pipestat/data/\n", + " inflating: hello_looper-master/pipestat/data/frog_1.txt \n", + " inflating: hello_looper-master/pipestat/data/frog_2.txt \n", + " creating: hello_looper-master/pipestat/pipeline_pipestat/\n", + " inflating: hello_looper-master/pipestat/pipeline_pipestat/count_lines.py \n", + " inflating: hello_looper-master/pipestat/pipeline_pipestat/count_lines_pipestat.sh \n", + " inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface.yaml \n", + " inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface_project.yaml \n", + " inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface_shell.yaml \n", + " inflating: hello_looper-master/pipestat/pipeline_pipestat/pipestat_output_schema.yaml \n", + " creating: hello_looper-master/pipestat/project/\n", + " inflating: hello_looper-master/pipestat/project/project_config.yaml \n", + " inflating: hello_looper-master/pipestat/project/sample_annotation.csv \n" ] } ], @@ -115,45 +162,63 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/drc/GITHUB/pepspec/.venv/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.\n", + " self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n" + ] + } + ], + "source": [ + "cd hello_looper-master/minimal" + ] + }, + { + "cell_type": "code", + "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Looper version: 1.5.2-dev\r\n", - "Command: run\r\n", - "Using default divvy config. 
You may specify in env var: ['DIVCFG']\r\n", - "Pipestat compatible: False\r\n", - "\u001b[36m## [1 of 2] sample: frog_1; pipeline: count_lines\u001b[0m\r\n", - "/home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog1_data.txt\r\n", - "Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub\r\n", - "Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub\r\n", - "Compute node: databio\r\n", - "Start time: 2023-11-08 17:29:45\r\n", - "wc: data/frog1_data.txt: No such file or directory\r\n", - "Number of lines: \r\n", - "\u001b[36m## [2 of 2] sample: frog_2; pipeline: count_lines\u001b[0m\r\n", - "/home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog2_data.txt\r\n", - "Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub\r\n", - "Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub\r\n", - "Compute node: databio\r\n", - "Start time: 2023-11-08 17:29:45\r\n", - "wc: data/frog2_data.txt: No such file or directory\r\n", - "Number of lines: \r\n", - "\r\n", - "Looper finished\r\n", - "Samples valid for job generation: 2 of 2\r\n", - "Commands submitted: 2 of 2\r\n", - "Jobs submitted: 2\r\n", - "{'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}\r\n", + "Looper version: 1.8.1a1\n", + "Command: run\n", + "Using default divvy config. You may specify in env var: ['DIVCFG']\n", + "\u001b[36m## [1 of 2] sample: frog_1; pipeline: count_lines\u001b[0m\n", + "Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_1.sub\n", + "Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_1.sub\n", + "Compute node: databio\n", + "Start time: 2024-06-05 09:37:34\n", + "Number of lines: 4\n", + "\u001b[36m## [2 of 2] sample: frog_2; pipeline: count_lines\u001b[0m\n", + "Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_2.sub\n", + "Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_2.sub\n", + "Compute node: databio\n", + "Start time: 2024-06-05 09:37:34\n", + "Number of lines: 7\n", + "\n", + "Looper finished\n", + "Samples valid for job generation: 2 of 2\n", + "{'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}\n", "\u001b[0m" ] } ], "source": [ - "!looper run --looper-config hello_looper-master/.looper.yaml" + "!looper run --looper-config .looper.yaml" ] }, { @@ -181,22 +246,28 @@ "name": "stdout", "output_type": "stream", "text": [ - "hello_looper-master/data/\r\n", - "├── frog1_data.txt\r\n", - "└── frog2_data.txt\r\n", - "hello_looper-master/pipeline/\r\n", - "├── count_lines.sh\r\n", - "└── pipeline_interface.yaml\r\n", - "hello_looper-master/project/\r\n", - "├── project_config.yaml\r\n", - "└── sample_annotation.csv\r\n", - "\r\n", - "0 directories, 6 files\r\n" + "\u001b[01;34mdata\u001b[0m\n", + "├── frog_1.txt\n", + "└── frog_2.txt\n", + "\u001b[01;34mpipeline\u001b[0m\n", + "├── 
\u001b[01;32mcount_lines.sh\u001b[0m\n", + "└── pipeline_interface.yaml\n", + "\u001b[01;34mproject\u001b[0m\n", + "├── project_config.yaml\n", + "└── sample_annotation.csv\n", + "\u001b[01;34mresults\u001b[0m\n", + "└── \u001b[01;34msubmission\u001b[0m\n", + " ├── count_lines_frog_1.log\n", + " ├── count_lines_frog_1.sub\n", + " ├── count_lines_frog_2.log\n", + " └── count_lines_frog_2.sub\n", + "\n", + "1 directory, 4 files\n" ] } ], "source": [ - "!tree hello_looper-master/*/" + "!tree *" ] }, { @@ -219,7 +290,7 @@ "When we invoke `looper` from the command line we told it to `run project/project_config.yaml`. `looper` reads the [project/project_config.yaml](https://github.com/pepkit/hello_looper/blob/master/project/project_config.yaml) file, which points to a few things:\n", "\n", " * the [project/sample_annotation.csv](https://github.com/pepkit/hello_looper/blob/master/project/sample_annotation.csv) file, which specifies a few samples, their type, and path to data file\n", - " * the `output_dir`, which is where looper results are saved. Results will be saved in `$HOME/hello_looper_results`.\n", + " * the `output_dir`, which is where looper results are saved.\n", " * the `pipeline_interface.yaml` file, ([pipeline/pipeline_interface.yaml](https://github.com/pepkit/hello_looper/blob/master/pipeline/pipeline_interface.yaml)), which tells looper how to connect to the pipeline ([pipeline/count_lines.sh](https://github.com/pepkit/hello_looper/blob/master/pipeline/)).\n", "\n", "The 3 folders (`data`, `project`, and `pipeline`) are modular; there is no need for these to live in any predetermined folder structure. For this example, the data and pipeline are included locally, but in practice, they are usually in a separate folder; you can point to anything (so data, pipelines, and projects may reside in distinct spaces on disk). You may also include more than one pipeline interface in your `project_config.yaml`, so in a looper project, many-to-many relationships are possible." @@ -231,28 +302,28 @@ "source": [ "## Looper config\n", "\n", - "The [looper config](looper-config.md) contains paths to the project config, the output_dir as well as any dfine pipeline interfaces. " + "The [looper config](looper-config.md) contains paths to the project config, the output_dir as well as any defined pipeline interfaces. 
" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "pep_config: project/project_config.yaml # local path to pep config\r\n", - "# pep_config: pepkit/hello_looper:default # you can also use a pephub registry path\r\n", - "output_dir: \"results\"\r\n", - "pipeline_interfaces:\r\n", - " sample: pipeline/pipeline_interface.yaml\r\n" + "pep_config: project/project_config.yaml # local path to pep config\n", + "# pep_config: pepkit/hello_looper:default # you can also use a pephub registry path\n", + "output_dir: \"results\"\n", + "pipeline_interfaces:\n", + " sample: pipeline/pipeline_interface.yaml\n" ] } ], "source": [ - "!cat hello_looper-master/.looper.yaml" + "!cat .looper.yaml" ] }, { @@ -268,20 +339,20 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "pep_version: 2.0.0\r\n", - "sample_table: sample_annotation.csv\r\n" + "pep_version: 2.0.0\n", + "sample_table: sample_annotation.csv" ] } ], "source": [ - "!cat hello_looper-master/project/project_config.yaml" + "!cat project/project_config.yaml" ] }, { @@ -295,24 +366,24 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "pipeline_name: count_lines\r\n", - "pipeline_type: sample\r\n", - "var_templates:\r\n", - " pipeline: '{looper.piface_dir}/count_lines.sh'\r\n", - "command_template: >\r\n", - " {pipeline.var_templates.pipeline} {sample.file}\r\n" + "pipeline_name: count_lines\n", + "pipeline_type: sample\n", + "var_templates:\n", + " pipeline: '{looper.piface_dir}/count_lines.sh'\n", + "command_template: >\n", + " {pipeline.var_templates.pipeline} {sample.file}\n" ] } ], "source": [ - "!cat hello_looper-master/pipeline/pipeline_interface.yaml" + "!cat pipeline/pipeline_interface.yaml" ] }, { @@ -324,29 +395,26 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "/home/nsheff/hello_looper_results\r\n", - "├── results_pipeline\r\n", - "└── submission\r\n", - " ├── count_lines.sh_frog_1.log\r\n", - " ├── count_lines.sh_frog_1.sub\r\n", - " ├── count_lines.sh_frog_2.log\r\n", - " ├── count_lines.sh_frog_2.sub\r\n", - " ├── frog_1.yaml\r\n", - " └── frog_2.yaml\r\n", - "\r\n", - "2 directories, 6 files\r\n" + "\u001b[01;34mresults/\u001b[0m\n", + "└── \u001b[01;34msubmission\u001b[0m\n", + " ├── count_lines_frog_1.log\n", + " ├── count_lines_frog_1.sub\n", + " ├── count_lines_frog_2.log\n", + " └── count_lines_frog_2.sub\n", + "\n", + "1 directory, 4 files\n" ] } ], "source": [ - "!tree $HOME/hello_looper_results" + "!tree results/" ] }, { @@ -354,10 +422,10 @@ "metadata": {}, "source": [ "\n", - "Inside of an `output_dir` there will be two directories:\n", + "Inside of an `output_dir` there will be one to two directories:\n", "\n", - "- `results_pipeline` - a directory with output of the pipeline(s), for each sample/pipeline combination (often one per sample)\n", "- `submissions` - which holds a YAML representation of each sample and a log file for each submitted job\n", + "- `results_pipeline` - a directory with output of the pipeline(s) (if applicable), for each sample/pipeline combination (often one per sample). 
In this minimal example, the output was simply printed to terminal instead of producing files.\n", "\n", "From here to running hundreds of samples of various sample types is virtually the same effort!\n" ] @@ -373,67 +441,93 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "pep_config: pepkit/hello_looper:default # pephub registry path or local path\r\n", - "output_dir: results\r\n", - "pipeline_interfaces:\r\n", - " sample: pipeline/pipeline_interface.yaml\r\n" + "/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master\n" ] } ], "source": [ - "!cat hello_looper-master/.looper_pephub.yaml" + "cd .." ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Looper version: 1.5.2-dev\n", + "/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub\n" + ] + } + ], + "source": [ + "cd pephub" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "pep_config: pepkit/hello_looper:default # pephub registry path or local path\n", + "output_dir: results\n", + "pipeline_interfaces:\n", + " sample: pipeline/pipeline_interface.yaml\n" + ] + } + ], + "source": [ + "!cat .looper.yaml" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looper version: 1.8.1a1\n", "Command: run\n", "Using default divvy config. You may specify in env var: ['DIVCFG']\n", "No config key in Project, or reading project from dict\n", - "Processing project from dictionary...\n", - "Pipestat compatible: False\n", "\u001b[36m## [1 of 2] sample: frog_1; pipeline: count_lines\u001b[0m\n", - "/home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog1_data.txt\n", - "Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub\n", - "Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_1.sub\n", + "Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_1.sub\n", + "Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_1.sub\n", "Compute node: databio\n", - "Start time: 2023-11-09 15:39:28\n", - "wc: data/frog1_data.txt: No such file or directory\n", - "Number of lines: \n", + "Start time: 2024-06-05 09:46:04\n", + "Number of lines: 4\n", "\u001b[36m## [2 of 2] sample: frog_2; pipeline: count_lines\u001b[0m\n", - "/home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/pipeline/count_lines.sh data/frog2_data.txt\n", - "Writing script to /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub\n", - "Job script (n=1; 0.00Gb): /home/drc/GITHUB/looper/master/looper/docs_jupyter/hello_looper-master/results/submission/count_lines_frog_2.sub\n", + "Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_2.sub\n", + "Job script (n=1; 0.00Gb): 
/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_2.sub\n", "Compute node: databio\n", - "Start time: 2023-11-09 15:39:28\n", - "wc: data/frog2_data.txt: No such file or directory\n", - "Number of lines: \n", + "Start time: 2024-06-05 09:46:04\n", + "Number of lines: 7\n", "\n", "Looper finished\n", "Samples valid for job generation: 2 of 2\n", - "Commands submitted: 2 of 2\n", - "Jobs submitted: 2\n", "{'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}\n", "\u001b[0m" ] } ], "source": [ - "!looper run --looper-config hello_looper-master/.looper_pephub.yaml" + "!looper run --looper-config .looper.yaml" ] }, { @@ -447,24 +541,60 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master\n" + ] + } + ], + "source": [ + "cd .." + ] + }, + { + "cell_type": "code", + "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "pep_config: ./project/project_config.yaml # pephub registry path or local path\r\n", - "output_dir: ./results\r\n", - "pipeline_interfaces:\r\n", - " sample: ./pipeline_pipestat/pipeline_interface.yaml\r\n", - "pipestat:\r\n", - " results_file_path: results.yaml" + "/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pipestat\n" ] } ], "source": [ - "!cat hello_looper-master/.looper_pipestat.yaml" + "cd pipestat" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "pep_config: ./project/project_config.yaml # pephub registry path or local path\n", + "output_dir: ./results\n", + "pipeline_interfaces:\n", + " sample: ./pipeline_pipestat/pipeline_interface.yaml\n", + " project: ./pipeline_pipestat/pipeline_interface_project.yaml\n", + "pipestat:\n", + " results_file_path: results.yaml\n", + " flag_file_dir: results/flags" + ] + } + ], + "source": [ + "!cat .looper.yaml" ] }, { @@ -484,8 +614,8 @@ "\n", "There are also other commands:\n", "\n", - "- `looper check`: checks on the status (running, failed, completed) of your jobs\n", - "- `looper summarize`: produces an output file that summarizes your project results\n", + "- `looper check`: checks on the status (running, failed, completed) of your jobs (requires pipestat)\n", + "- `looper summarize`: produces an output file that summarizes your project results (requires pipestat)\n", "- `looper destroy`: completely erases all results so you can restart\n", "- `looper rerun`: rerun only jobs that have failed.\n" ] @@ -498,6 +628,13 @@ "\n", "To use `looper` on your own, you will need to prepare 2 things: a **project** (metadata that define *what* you want to process), and **pipelines** (*how* to process data). To link your project to `looper`, you will need to [define a project](defining-a-project.md). You will want to either use pre-made `looper`-compatible pipelines or link your own custom-built pipelines. 
These docs will also show you how to connect your pipeline to your project.\n" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -520,5 +657,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } From 9cc904b3a53d0dd1571e4700099f19af7a6f1c76 Mon Sep 17 00:00:00 2001 From: Donald Campbell <125581724+donaldcampbelljr@users.noreply.github.com> Date: Wed, 5 Jun 2024 10:05:17 -0400 Subject: [PATCH 3/6] Update peppy, pipestat, pypiper changelogs from recent releases --- docs/peppy/changelog.md | 9 +++++++++ docs/pipestat/changelog.md | 7 ++++++- docs/pypiper/changelog.md | 4 ++++ 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/docs/peppy/changelog.md b/docs/peppy/changelog.md index 815c280e..27149088 100644 --- a/docs/peppy/changelog.md +++ b/docs/peppy/changelog.md @@ -2,6 +2,15 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format. +## [0.40.2] -- 2024-05-28 +### Added +- added `sample_name` property to samples object. + +## [0.40.1] -- 2024-01-11 +### Fixed +- Initializing Project with `NaN` value instead of `None` in `from_pandas` method + + ## [0.40.0] -- 2023-12-18 **This version introduced backwards-incompatible changes.** diff --git a/docs/pipestat/changelog.md b/docs/pipestat/changelog.md index dabc1ef3..eb4eb621 100644 --- a/docs/pipestat/changelog.md +++ b/docs/pipestat/changelog.md @@ -2,13 +2,18 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format. +## [0.9.2] - 2024-06-24 +### Changed +- User can override pipeline name via parameter or config file, otherwise look at output_schema, then fall back on default as last resort. +- Allow pipestat to proceed without creating a results file backend IF using "{record_identifier}" in the file path, helps address [Looper #471](https://github.com/pepkit/looper/issues/471) +- Reduce overall verbosity when creating backends + ## [0.9.1] - 2024-04-24 ### Fixed - Pipestat summarize html report columns now show stats only [#148](https://github.com/pepkit/pipestat/issues/148). - When creating HTML reports from both sample and project level results (multi results), only sample-level results show in the main index table [#150](https://github.com/pepkit/pipestat/issues/150). - Add more complex schema during dependency check to mitigate false test failures regarding different output schemas [#181](https://github.com/pepkit/pipestat/issues/181). 
- ## [0.9.0] - 2024-04-19 ### Fixed - Bug with rm_record for filebackend diff --git a/docs/pypiper/changelog.md b/docs/pypiper/changelog.md index 40a05f9b..e69b068b 100644 --- a/docs/pypiper/changelog.md +++ b/docs/pypiper/changelog.md @@ -1,5 +1,9 @@ # Changelog +## [0.14.2] -- 2024-05-07 +### Changed +- Addresses [#218](https://github.com/databio/pypiper/issues/218) + ## [0.14.1] -- 2024-04-19 ### Changed - remove pipestat_project_name from PipelineManager parameters From 9a4abab5bb7e46c7b5ebb513003c1f89a353c4fa Mon Sep 17 00:00:00 2001 From: Donald Campbell <125581724+donaldcampbelljr@users.noreply.github.com> Date: Wed, 5 Jun 2024 10:11:58 -0400 Subject: [PATCH 4/6] fix typos in looper changelog for older 1.6.0 --- docs/looper/changelog.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/looper/changelog.md b/docs/looper/changelog.md index 00a7c05c..bf2ecc2e 100644 --- a/docs/looper/changelog.md +++ b/docs/looper/changelog.md @@ -44,6 +44,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm - `--lumpn` is now `--lump-n` - `--lump` is now `--lump-s` - +## [1.6.0] -- 2023-12-22 + ### Added - `looper link` creates symlinks for results grouped by record_identifier. It requires pipestat to be configured. [#72](https://github.com/pepkit/looper/issues/72) - basic tab completion. @@ -54,6 +56,9 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm - changed how looper configures pipestat [#411](https://github.com/pepkit/looper/issues/411) - initializing pipeline interface also writes an example `output_schema.yaml` and `count_lines.sh` pipeline +### Fixed +- filtering via attributes that are integers. + ## [1.5.1] -- 2023-08-14 ### Fixed From 09f57cc38ccd79d36ace4c2258a9344970122a68 Mon Sep 17 00:00:00 2001 From: Donald Campbell <125581724+donaldcampbelljr@users.noreply.github.com> Date: Wed, 5 Jun 2024 10:58:07 -0400 Subject: [PATCH 5/6] add comment about project level in looper changelog --- docs/looper/changelog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/looper/changelog.md b/docs/looper/changelog.md index bf2ecc2e..e7c18c04 100644 --- a/docs/looper/changelog.md +++ b/docs/looper/changelog.md @@ -5,6 +5,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Fixed - added `-v` and `--version` to the CLI +- fixed running project level with `--project` argument ## [1.8.0] -- 2024-06-04 From fa209014222778cc793532a59ece5f43410a032a Mon Sep 17 00:00:00 2001 From: Donald Campbell <125581724+donaldcampbelljr@users.noreply.github.com> Date: Thu, 6 Jun 2024 09:37:33 -0400 Subject: [PATCH 6/6] add clarifications on overriding compute, and regarding var_templates --- .../pipeline-interface-specification.md | 20 +++++++++++++++++++ docs/looper/running-on-a-cluster.md | 4 ++++ docs/looper/variable-namespaces.md | 2 +- 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/docs/looper/pipeline-interface-specification.md b/docs/looper/pipeline-interface-specification.md index 49b8b395..a70e3e95 100644 --- a/docs/looper/pipeline-interface-specification.md +++ b/docs/looper/pipeline-interface-specification.md @@ -217,6 +217,26 @@ This final line in the resources `tsv` must include `NaN` in the `max_file_size` This section can consist of multiple variable templates that are rendered and can be reused. The namespaces available to the templates are listed in [variable namespaces](variable-namespaces.md) section. 
Please note that the variables defined here (even if they are paths) are arbitrary and are *not* subject to be made relative. Therefore, the pipeline interface author needs to take care of making them portable (the `{looper.piface_dir}` value comes in handy!).
+
+Example using var_templates:
+```yaml
+pipeline_name: example_pipeline
+pipeline_type: sample
+output_schema: output_schema.yaml
+var_templates:
+  pipeline: "{looper.piface_dir}/pipelines/pipeline1.py"
+command_template: >
+  {pipeline.var_templates.pipeline} --sample-name {sample.sample_name} --req-attr {sample.attr}
+```
+
+Example without var_templates:
+```yaml
+pipeline_name: example_pipeline
+pipeline_type: sample
+output_schema: output_schema.yaml
+command_template: >
+  python {looper.piface_dir}/count_lines.py {sample.file} {sample.sample_name}
+```
+
 #### pre_submit
 
 This section can consist of two subsections: `python_functions` and/or `command_templates`, which specify the pre-submission tasks to be run before the main pipeline command is submitted. Please refer to the [pre-submission hooks system](pre-submission-hooks.md) section for a detailed explanation of this feature and syntax.
diff --git a/docs/looper/running-on-a-cluster.md b/docs/looper/running-on-a-cluster.md
index 76fe54ae..bd664121 100644
--- a/docs/looper/running-on-a-cluster.md
+++ b/docs/looper/running-on-a-cluster.md
@@ -13,6 +13,10 @@ divvy init -c $DIVCFG
 
 Looper will now have access to your computing configuration. You can run `divvy list` to see what compute packages are available in this file. For example, you'll start with a package called 'slurm', which you can use with looper by calling `looper --package slurm`. For many systems (SLURM, SGE, LSF, etc.), the default divvy configuration will work out of the box. If you need to tweak things, the template system is flexible and you can configure it to run in any compute environment. That's all there is to it.
 
+You can also override the computing configuration from the CLI, e.g.:
+
+``looper run --looper-config .your_config.yaml --package slurm --compute PARTITION=standard time='01-00:00:00' cores='32' mem='32000'``
+
 Complete details on how to configure divvy are described in the [divvy documentation](http://divvy.databio.org).
 
 ## Divvy config file locations
diff --git a/docs/looper/variable-namespaces.md b/docs/looper/variable-namespaces.md
index 437837b8..44a04898 100644
--- a/docs/looper/variable-namespaces.md
+++ b/docs/looper/variable-namespaces.md
@@ -55,7 +55,7 @@ The `looper.command` value is what enables the two-layer template system, whereb
 
 The `compute` namespace consists of a group of variables relevant for computing resources. The `compute` namespace has a unique behavior: it aggregates variables from several sources in a priority order, overriding values with more specific ones as priority increases. The list of variable sources in priority order is:
 
-1. Looper CLI (`--compute` or `--settings` for on-the-fly settings)
+1. Looper CLI (`--compute` or `--settings` for on-the-fly settings), e.g. `looper run --looper-config .your_config.yaml --package slurm --compute PARTITION=standard time='01-00:00:00' cores='32' mem='32000'`
 2. PEP config, `project.looper.compute` section
 3. Pipeline interface, `compute` section
 4. Activated divvy compute package (`--package` CLI argument)
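
For illustration, here is a minimal sketch of how that priority order plays out, assuming a hypothetical sample-level pipeline interface that ships its own `compute` defaults and the stock divvy `slurm` package; the exact variable names (`PARTITION`, `mem`, etc.) depend on the submission template in use.

```yaml
# pipeline_interface.yaml (hypothetical) -- source 3 in the list above:
# compute defaults shipped with the pipeline interface
pipeline_name: count_lines
pipeline_type: sample
var_templates:
  pipeline: '{looper.piface_dir}/count_lines.sh'
command_template: >
  {pipeline.var_templates.pipeline} {sample.file}
compute:
  PARTITION: standard
  mem: '8000'
```

```bash
# Source 1 (the CLI) takes precedence over the pipeline interface defaults above,
# so this run would be rendered with PARTITION=standard (from the pipeline interface)
# and mem=32000 (from --compute).
looper run --looper-config .looper.yaml --package slurm --compute mem='32000'
```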