This repository has been archived by the owner on Dec 1, 2023. It is now read-only.

Remove tutorial in order to have any changes #1

Open · wants to merge 3 commits into `master`
2 changes: 1 addition & 1 deletion README.md
@@ -2,5 +2,5 @@ This repository aims to show users how to run `nf-core` pipeline with PEP as input

The command would look something like this:
```
nextflow run main.nf -profile test_pep,docker --outdir <output_directory>
```
66 changes: 31 additions & 35 deletions developers_tutorial.md
@@ -1,13 +1,12 @@
# Tutorial for integrating `nf-core` with PEP

## Introduction and summary

This tutorial explains how to adapt `nf-core`
[pipelines](https://nf-co.re/pipelines) to accept sample metadata in PEP format.
An example implementation can be found
in the `taxprofiler` [pipeline](https://nf-co.re/taxprofiler).
A pull request with all the changes needed can be found [here](https://github.com/nf-core/taxprofiler/pull/133).
The steps to accomplish PEP-`nf-core` integration for any `nf-core` pipeline are as follows:

1. Rewrite all pipeline input checks as a [PEP schema](http://eido.databio.org/en/latest/writing-a-schema/).
2. If the script to check input does something more than input validation, then decouple the logic.
@@ -21,56 +20,53 @@ Below is a detailed explanation of these tasks, as well
as other information and additional resources that may be
useful during implementation.

## Steps to complete the integration
### 1. Rewrite all pipeline input checks
In general, `nf-core` pipelines include a `check_samplesheet.py`
(or similarly named) Python script that is responsible for validating the
`samplesheet.csv` file (e.g. checking that all mandatory columns are present,
that all required columns have data, that file extensions are correct, etc.).
The goal of this task is to create a PEP schema from scratch that exactly reflects
all the checks from the `check_samplesheet.py` script.
[Example PEP schema](https://github.com/nf-core/taxprofiler/pull/133/files#diff-abc09af6a9de56ba2e40d0fa32a4c0f8c2cd30a0299488c4d922453ad20f3100)
for `taxprofiler` pipeline is available in the pipeline code.
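For illustration, a schema mirroring typical samplesheet checks might look roughly like the following (a sketch only, not the actual `taxprofiler` schema; the `sample`/`fastq_1` property names and the extension pattern are assumptions):

```yaml
description: Illustrative PEP schema mirroring former samplesheet checks
imports:
  - http://schema.databio.org/pep/2.0.0.yaml
properties:
  samples:
    type: array
    items:
      type: object
      properties:
        sample:
          type: string
          description: Sample name; must be present and non-empty
        fastq_1:
          type: string
          description: Path to the first reads file
          pattern: "^\\S+\\.f(ast)?q\\.gz$"
      required:
        - sample
        - fastq_1
required:
  - samples
```

Validation then becomes a single `eido` call, e.g. `eido validate config.yaml -s schema.yaml`.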

### 2. Decouple in case of emergency
In some cases the previously mentioned `check_samplesheet.py` script was not only supposed to validate
the input files, but also added an extra column indicating what type of reads
a given row contains.

Since `eido` is a tool just for validation, one cannot add any columns using `eido/validate`.
The best option here is to identify (within `check_samplesheet.py`) the logic responsible for modifying
the input file and move it to a separate Python script (`bin/place_the_script_here.py` in the `taxprofiler` source code). That way one can
still remove all the validation logic and replace it with `eido`, while modifying the input
`samplesheet.csv` with the newly extracted Python script.
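As a sketch of what such an extracted script might do (hypothetical: the tutorial does not show its contents, and the `fastq_2`-based read-type rule below is an assumption for illustration):

```python
import csv
import sys


def annotate_read_type(rows):
    """Append a `single_end` column; a row with an empty or missing
    fastq_2 field is treated as single-end (assumed convention)."""
    for row in rows:
        row["single_end"] = "0" if row.get("fastq_2") else "1"
        yield row


def main(in_path, out_path):
    # Read the samplesheet, add the column, write it back out.
    # No validation happens here -- that is left entirely to eido.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames) + ["single_end"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(annotate_read_type(reader))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```
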
### 3. Update `--input` parameter
It would be good if all the pipelines shared a common interface, so that users can run PEP with all the
pipelines the same way. The developer should adjust the `--input` parameter so that it can also accept a PEP config.
In the case of the `taxprofiler` pipeline, two files had to be edited: `lib/WorkflowMain.groovy` and `workflows/taxprofiler.nf`.
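The kind of check that decides how to treat `--input` can be sketched as follows (an illustrative heuristic in Python; `taxprofiler`'s actual check lives in Groovy and may differ):

```python
def is_pep_config(input_path: str) -> bool:
    """Decide whether --input points at a PEP config rather than a CSV
    samplesheet. PEP configs are YAML files, so the decision can be made
    on the file extension (assumed rule, for illustration only)."""
    return input_path.lower().endswith((".yaml", ".yml"))
```
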

When adjusting the `--input` parameter, the developer must also update `nextflow_schema.json`
to avoid validation errors. The only change needed is to
allow passing `yaml` files in the schema.
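For instance, the `input` entry in `nextflow_schema.json` might be relaxed along these lines (an illustrative fragment; the exact field values in `taxprofiler` may differ):

```json
"input": {
    "type": "string",
    "format": "file-path",
    "mimetype": "text/csv",
    "pattern": "^\\S+\\.(csv|yaml|yml)$",
    "description": "Path to a samplesheet CSV file or a PEP config YAML file."
}
```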

### 4. Install `eido` modules
Eido is currently added as a module to `nf-core` modules, so that it can be shared across all the pipelines.
To be able to use the `EIDO_VALIDATE` and `EIDO_CONVERT` commands in a pipeline, the developer first must install the
modules for the current pipeline. There is a tutorial available on [how to install modules in a pipeline](https://nf-co.re/tools/#install-modules-in-a-pipeline).

### 5. Adjust the workflow responsible for input check
When incorporating the new modules, the workflow will change. In the case of `taxprofiler`, changes were needed in
`modules/local/samplesheet_check.nf` and `subworkflows/local/input_check.nf`.
### 6. Create test config
The developer should create a test config so that users can run the pipeline with PEP as input with minimal effort.
In order to do this, a new config profile should be added, as shown in the `taxprofiler` [pull request containing
all changes](https://github.com/nf-core/taxprofiler/pull/133/files#diff-13b96be1e48daf716d5ac39dae9f905df6a0e0d4af0232e3f5c36fd52a178862).
The config will contain the minimal setup allowing analysis to be run using PEP files.
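Such a profile could be sketched as follows (hypothetical values throughout; the real profile is in the `taxprofiler` pull request):

```
// conf/test_pep.config -- illustrative sketch only
params {
    config_profile_name        = 'Test PEP profile'
    config_profile_description = 'Minimal test dataset using a PEP config as input'

    // Keep resources small so the test runs anywhere (placeholder limits)
    max_cpus   = 2
    max_memory = '6.GB'
    max_time   = '6.h'

    // PEP config consumed via the adjusted --input parameter (placeholder path)
    input = 'path/to/pep/config.yaml'
}
```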

## Other information
### How to add the tool to biocontainers
In general, all necessary modules (`eido/validate` and `eido/convert`) are already added to `nf-core modules`,
but it may happen that the developer will need to add other tools. In order to do that, it is good to know how
this works for `nf-core`. To be able to use any container in `nf-core` pipelines, it should be hosted on `biocontainers`.
@@ -79,5 +75,5 @@ There are two ways to accomplish that:

1. Put `peppy` on `bioconda`. This is the easiest way: once `peppy` is available in `bioconda`,
`biocontainers` provides automated container creation for the tool.
2. Manually add `peppy` to biocontainers. There is a detailed
[tutorial on how to add a tool to biocontainers](https://biocontainers-edu.readthedocs.io/en/latest/contributing.html) available.
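For option 1, a bioconda recipe skeleton for `peppy` could look like this (a sketch with placeholder version and checksum, not a submitted recipe):

```yaml
# meta.yaml -- illustrative bioconda recipe skeleton for peppy
package:
  name: peppy
  version: "<version placeholder>"

source:
  url: https://pypi.io/packages/source/p/peppy/peppy-<version placeholder>.tar.gz
  sha256: "<checksum placeholder>"

build:
  noarch: python
  script: python -m pip install . --no-deps -vv

requirements:
  host:
    - python
    - pip
  run:
    - python
```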