This repository has been archived by the owner on Dec 1, 2023. It is now read-only.

Remove tutorial in order to have any changes #1

Open · wants to merge 3 commits into `master`
2 changes: 1 addition & 1 deletion README.md
@@ -2,5 +2,5 @@ This repository aims to show users how to run `nf-core` pipeline with PEP as input

The command would look something like this:
```
nextflow run main.nf -profile test_pep,docker --outdir <output_directory>
```
66 changes: 31 additions & 35 deletions developers_tutorial.md
@@ -1,13 +1,12 @@
# Tutorial for integrating `nf-core` with PEP

## Introduction and summary

This tutorial explains how to adapt `nf-core`
[pipelines](https://nf-co.re/pipelines) to accept sample metadata in PEP format.
An example implementation can be found
in the `taxprofiler` [pipeline](https://nf-co.re/taxprofiler).
A pull request with all the changes needed can be found [here](https://github.com/nf-core/taxprofiler/pull/133).
The steps to accomplish PEP-`nf-core` integration for any `nf-core` pipeline are as follows:

1. Rewrite all pipeline input checks as a [PEP schema](http://eido.databio.org/en/latest/writing-a-schema/).
2. If the script to check input does something more than input validation, then decouple the logic.
@@ -21,56 +20,53 @@ Below is a detailed explanation of these tasks, as well
as other information and additional resources that may be
useful during implementation.

## Steps to complete the integration
### 1. Rewrite all pipeline input checks
In general, `nf-core` pipelines include a `check_samplesheet.py`
(or similarly named) Python script that is responsible for validating the
`samplesheet.csv` file (e.g. checking that all mandatory columns are present,
that all required columns have data, that file extensions are correct, etc.).
The goal of this task is to create a PEP schema from scratch that exactly reflects
all the checks from the `check_samplesheet.py` script.
[Example PEP schema](https://github.com/nf-core/taxprofiler/pull/133/files#diff-abc09af6a9de56ba2e40d0fa32a4c0f8c2cd30a0299488c4d922453ad20f3100)
for `taxprofiler` pipeline is available in the pipeline code.
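For illustration, a schema mirroring typical samplesheet checks might look roughly like the following (a sketch only, not the actual `taxprofiler` schema; the `sample`/`fastq_1` property names and the extension pattern are assumptions):

```yaml
description: Illustrative PEP schema mirroring former samplesheet checks
imports:
  - http://schema.databio.org/pep/2.0.0.yaml
properties:
  samples:
    type: array
    items:
      type: object
      properties:
        sample:
          type: string
          description: Sample name; must be present and non-empty
        fastq_1:
          type: string
          description: Path to the first reads file
          pattern: "^\\S+\\.f(ast)?q\\.gz$"
      required:
        - sample
        - fastq_1
required:
  - samples
```

Validation then becomes a single `eido` call, e.g. `eido validate config.yaml -s schema.yaml`.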

### 2. Decouple in case of emergency
In some cases the previously mentioned `check_samplesheet.py` script was not only supposed to validate
the input files, but also added an extra column indicating what type of reads
a given row contains.

Since `eido` is a tool just for validation, one cannot add any columns using `eido/validate`.
The best option here is to identify (within `check_samplesheet.py`) the logic responsible for modifying
the input file and move it to a separate Python script (`bin/place_the_script_here.py` in the `taxprofiler` source code). That way one can
still remove all the validation logic and replace it with `eido`, while modifying the input
`samplesheet.csv` with the newly extracted Python script.
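As a sketch of what such an extracted script might do (hypothetical: the tutorial does not show its contents, and the `fastq_2`-based read-type rule below is an assumption for illustration):

```python
import csv
import sys


def annotate_read_type(rows):
    """Append a `single_end` column; a row with an empty or missing
    fastq_2 field is treated as single-end (assumed convention)."""
    for row in rows:
        row["single_end"] = "0" if row.get("fastq_2") else "1"
        yield row


def main(in_path, out_path):
    # Read the samplesheet, add the column, write it back out.
    # No validation happens here -- that is left entirely to eido.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames) + ["single_end"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(annotate_read_type(reader))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```
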
### 3. Update `--input` parameter
It would be good if all the pipelines shared a common interface, so that users can run PEP with all the
pipelines the same way. The developer should adjust the `--input` parameter so that it can also accept a PEP config.
In the case of the `taxprofiler` pipeline, two files had to be edited: `lib/WorkflowMain.groovy` and `workflows/taxprofiler.nf`.
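The kind of check that decides how to treat `--input` can be sketched as follows (an illustrative heuristic in Python; `taxprofiler`'s actual check lives in Groovy and may differ):

```python
def is_pep_config(input_path: str) -> bool:
    """Decide whether --input points at a PEP config rather than a CSV
    samplesheet. PEP configs are YAML files, so the decision can be made
    on the file extension (assumed rule, for illustration only)."""
    return input_path.lower().endswith((".yaml", ".yml"))
```
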

When adjusting the `--input` parameter, the developer must also update `nextflow_schema.json`
to avoid validation errors. The only change needed is to
allow passing `yaml` files in the schema.
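For instance, the `input` entry in `nextflow_schema.json` might be relaxed along these lines (an illustrative fragment; the exact field values in `taxprofiler` may differ):

```json
"input": {
    "type": "string",
    "format": "file-path",
    "mimetype": "text/csv",
    "pattern": "^\\S+\\.(csv|yaml|yml)$",
    "description": "Path to a samplesheet CSV file or a PEP config YAML file."
}
```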

### 4. Install `eido` modules
Eido is currently added as a module to `nf-core` modules, so that it can be shared across all the pipelines.
To be able to use the `EIDO_VALIDATE` and `EIDO_CONVERT` commands in a pipeline, the developer first must install the
modules for the current pipeline. There is a tutorial available on [how to install modules in a pipeline](https://nf-co.re/tools/#install-modules-in-a-pipeline).

### 5. Adjust the workflow responsible for input check
When incorporating the new modules, the workflow will change. In the case of `taxprofiler`, changes were needed in
`modules/local/samplesheet_check.nf` and `subworkflows/local/input_check.nf`.
### 6. Create test config
The developer should create a test config so that users can run the pipeline with PEP as input with minimal effort.
In order to do this, a new config profile should be added, as shown in the `taxprofiler` [pull request containing
all changes](https://github.com/nf-core/taxprofiler/pull/133/files#diff-13b96be1e48daf716d5ac39dae9f905df6a0e0d4af0232e3f5c36fd52a178862).
The config will contain the minimal setup allowing analysis to be run using PEP files.
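Such a profile could be sketched as follows (hypothetical values throughout; the real profile is in the `taxprofiler` pull request):

```
// conf/test_pep.config -- illustrative sketch only
params {
    config_profile_name        = 'Test PEP profile'
    config_profile_description = 'Minimal test dataset using a PEP config as input'

    // Keep resources small so the test runs anywhere (placeholder limits)
    max_cpus   = 2
    max_memory = '6.GB'
    max_time   = '6.h'

    // PEP config consumed via the adjusted --input parameter (placeholder path)
    input = 'path/to/pep/config.yaml'
}
```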

## Other information
### How to add the tool to biocontainers
In general, all necessary modules (`eido/validate` and `eido/convert`) are already added to `nf-core modules`,
but it may happen that the developer will need to add other tools. In order to do that, it is good to know how
this works for `nf-core`. To be able to use any container in `nf-core` pipelines, it should be hosted on `biocontainers`.
@@ -79,5 +75,5 @@ There are two ways to accomplish that:

1. Put `peppy` on `bioconda`. This is the easiest way: once `peppy` is available in `bioconda`,
`biocontainers` provides automated container creation for the tool.
2. Manually add `peppy` to biocontainers. There is a detailed
[tutorial on how to add a tool to biocontainers](https://biocontainers-edu.readthedocs.io/en/latest/contributing.html) available.
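For option 1, a bioconda recipe skeleton for `peppy` could look like this (a sketch with placeholder version and checksum, not a submitted recipe):

```yaml
# meta.yaml -- illustrative bioconda recipe skeleton for peppy
package:
  name: peppy
  version: "<version placeholder>"

source:
  url: https://pypi.io/packages/source/p/peppy/peppy-<version placeholder>.tar.gz
  sha256: "<checksum placeholder>"

build:
  noarch: python
  script: python -m pip install . --no-deps -vv

requirements:
  host:
    - python
    - pip
  run:
    - python
```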