Update tutorial, add tutorail directory to gitignore
jfy133 committed Oct 15, 2024
1 parent 7729415 commit 5368c72
Showing 2 changed files with 67 additions and 17 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ testing*
null/
.nf-test*
test.xml
tutorial/
83 changes: 66 additions & 17 deletions docs/usage/tutorials.md
@@ -7,9 +7,9 @@ This page provides a range of tutorials to help give you a more step-by-step gui
nf-core/createtaxdb can be used to generate input database sheets for pipelines such as [nf-core/taxprofiler](https://nf-co.re/taxprofiler).
This tutorial will guide you step-by-step on how to set up an nf-core/createtaxdb run so that you can run nf-core/taxprofiler almost straight away afterwards.

## Prerequisites
### Prerequisites

### Hardware
#### Hardware

The datasets provided here _should_ be small enough to run on your laptop or desktop computer.
They should not require an HPC or similar.
@@ -18,14 +18,14 @@ If you wish to use a HPC cluster or cloud, and don’t wish to use an ‘interac

You will need internet access and at least X.X GB of hard-drive space.

#### Software
##### Software

The tutorial assumes you are on a Unix-based operating system, and have already installed Nextflow as well as a software environment system such as [Conda](https://docs.conda.io/en/latest/miniconda.html), [Docker](https://www.docker.com/), or [Singularity/Apptainer](https://apptainer.org/).
The tutorial will use Docker; however, you can simply replace references to `docker` with `conda`, `singularity`, or `apptainer` accordingly.

It also assumes you've already pulled both the nf-core/createtaxdb and nf-core/taxprofiler pipelines with `nextflow pull`.

### Data
#### Data

First we will make a directory to run the whole tutorial in.

@@ -34,7 +34,7 @@ mkdir createtaxdb-tutorial
cd createtaxdb-tutorial/
```

#### nf-core/createtaxdb
##### nf-core/createtaxdb

First we will need to download some reference genome FASTAs to build into databases with nf-core/createtaxdb.
These species are present in the sequencing reads we will use for nf-core/taxprofiler.
@@ -53,7 +53,7 @@ curl -O https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/creat
curl -O https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/createtaxdb/data/tutorials/nodes_reduced.dmp
```

#### nf-core/taxprofiler
##### nf-core/taxprofiler

Then, for aligning against our database, we will use very small short-read (pre-subset) metagenomes used for testing.
We will download two sets of paired-end sequencing reads.
@@ -81,9 +81,9 @@ process {
}
```

### Generating databases
### Running nf-core/createtaxdb

#### Input Samplesheet preparation
#### Input nf-core/createtaxdb samplesheet preparation

To generate our database files from our FASTA files with nf-core/createtaxdb, we first write the createtaxdb samplesheet.
In this case we will make a file called `input.csv` in a text editor, which will contain the following:
@@ -96,11 +96,11 @@ Homo_sapiens,9606,NC_012920.1.fa.gz,

We won't generate amino acid-based databases, so we just supply our two downloaded nucleotide reference genomes and leave the `fasta_aa` column blank.
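If you prefer to stay in the terminal, the samplesheet can also be written with a heredoc. This is a sketch: the header row (`id,taxid,fasta_dna,fasta_aa`) is an assumption based on the columns described above, so check the pipeline's usage documentation for the exact header your version expects.

```bash
# Write the createtaxdb samplesheet from the shell instead of a text editor.
# NOTE: the column names below are assumed, not taken from the pipeline schema.
cat > input.csv <<'EOF'
id,taxid,fasta_dna,fasta_aa
Homo_sapiens,9606,NC_012920.1.fa.gz,
EOF
# Add one row per downloaded reference genome, leaving fasta_aa empty.
```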

#### Prepare command
#### Prepare and run nf-core/createtaxdb command

Then we can construct an nf-core/createtaxdb command.
In this example, we will create two databases: one for `kraken2`, and one for `centrifuge`.
We will supply the input files of the `input.csv`, and the three taxdmp related files: `nodes_reduced.dmp`, `names_reduced.dmp`, and `accession2taxid_reduced.dmp`.
We will supply our `input.csv`, along with the three taxdmp-related files: `nodes_reduced.dmp`, `names_reduced.dmp`, and `nucl_gb.accession2taxid`.
We will also specify the name of the database to build, and also the two tools to generate databases for.

```bash
@@ -115,8 +115,8 @@ nextflow run ../main.nf \
--namesdmp names_reduced.dmp \
--accession2taxid nucl_gb.accession2taxid \
--nucl2taxid nucl2taxid.txt \
--outdir createtaxdb_results \
--dbname tutorial \
--outdir results \
--build_kraken2 \
--build_centrifuge \
--generate_tar_archive \
@@ -125,26 +125,38 @@ nextflow run ../main.nf \
--generate_samplesheet_dbtype 'tar'
```

Once completed, we can look in the specified `--outdir` directory, for a subdirectory called `downstream_samplesheets`.
In this tutorial this is `results/downstream_samplesheets`.
After the input and output files, we specify a name for the databases to be created, select the two tools to build databases for, and tell the pipeline to generate `.tar.gz` archives of the resulting databases as well as a samplesheet for `taxprofiler`.

Generating the `.tar.gz` files is optional, as nf-core/taxprofiler can also accept directories; however, archives can be useful in cloud contexts where you cannot 'download' directories.
If we want to supply just the raw database output (i.e., no `tar` archives), we would specify `--generate_samplesheet_dbtype 'raw'`.

Currently, nf-core/createtaxdb only supports generating a samplesheet for a single pipeline.
However, in the future, when this is extended to other pipelines, you will be able to supply a comma-separated list to make samplesheets for multiple pipelines at once! For example with `--generate_pipeline_samplesheets 'taxprofiler,ampliseq'`.

Now that we have a better understanding of what the command is doing, we can execute it, and the pipeline should run!

#### nf-core/createtaxdb output

Once the run finishes, we can look in the specified `--outdir` directory for a subdirectory called `downstream_samplesheets`.
In this tutorial this is `createtaxdb_results/downstream_samplesheets`.

This samplesheet, called `taxprofiler.csv`, can be used as input to nf-core/taxprofiler!

It should look something like this:

```csv title="taxprofiler.csv"
tool,db_name,db_params,db_path
kraken2,tutorial-kraken2,,/<path>/<to>/results/kraken2/tutorial-kraken2.tar.gz
centrifuge,tutorial-centrifuge,,/<path>/<to>/results/centrifuge/tutorial-centrifuge.tar.gz
kraken2,tutorial-kraken2,,/<path>/<to>/createtaxdb_results/kraken2/tutorial-kraken2.tar.gz
centrifuge,tutorial-centrifuge,,/<path>/<to>/createtaxdb_results/centrifuge/tutorial-centrifuge.tar.gz
```

Note that paths to the databases in this `.csv` file point to the nf-core/createtaxdb results run directory, make sure not to delete them!
Note that the paths to the databases in this `.csv` file point to the nf-core/createtaxdb results directory, so make sure not to move or delete them!

### Running nf-core/taxprofiler

Now, with the read files we downloaded during the preparation section of this tutorial and our database samplesheet, we can run nf-core/taxprofiler!

#### Input Samplesheet preparation
#### Input nf-core/taxprofiler samplesheet preparation

First, we prepare an input _samplesheet_ (not a database sheet!) with our reads and their metadata.

@@ -156,3 +168,40 @@ sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
ERX5474932,ERR5766176,ILLUMINA,ERX5474932_ERR5766176_1.fastq.gz,ERX5474932_ERR5766176_2.fastq.gz,
ERX5474932,ERR5766176_B,ILLUMINA,ERX5474932_ERR5766176_B_1.fastq.gz,ERX5474932_ERR5766176_B_2.fastq.gz,
```
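Equivalently, this samplesheet can be written from the shell. The nf-core/taxprofiler command below reads it as `input.csv`; note that this overwrites the createtaxdb samplesheet of the same name, which has already been consumed by this point.

```bash
# Write the taxprofiler read samplesheet shown above.
cat > input.csv <<'EOF'
sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
ERX5474932,ERR5766176,ILLUMINA,ERX5474932_ERR5766176_1.fastq.gz,ERX5474932_ERR5766176_2.fastq.gz,
ERX5474932,ERR5766176_B,ILLUMINA,ERX5474932_ERR5766176_B_1.fastq.gz,ERX5474932_ERR5766176_B_2.fastq.gz,
EOF
```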

#### Prepare and run nf-core/taxprofiler command

Then we can set up our command for running nf-core/taxprofiler, using our samplesheet and the nf-core/createtaxdb-generated database `.csv`.

For the purpose of this tutorial, we will skip a lot of preprocessing steps and go straight to profiling.

```bash
nextflow run nf-core/taxprofiler \
-profile docker \
-c tutorial.conf \
--input input.csv \
    --databases createtaxdb_results/downstream_samplesheets/taxprofiler.csv \
--outdir taxprofiler_results/ \
--perform_shortread_qc \
--run_centrifuge \
--run_kraken2
```

Execute this command, and the pipeline should run!

### Clean Up

Once you have completed the tutorial, you can run the following command to delete all downloaded and output files.

```bash
rm -r createtaxdb-tutorial
```

:::warning
Don’t forget to change out of the directory above before trying to delete it!
:::
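The warning above can be built into the command itself. A minimal variant that steps out of the directory before deleting it (using `rm -rf` so it also succeeds quietly if the directory is already gone):

```bash
# Run from inside createtaxdb-tutorial/: step out first, then remove it.
cd ..
rm -rf createtaxdb-tutorial
```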

### Summary

In this tutorial we've shown how you can generate input sheets for downstream pipelines.
In this case, we told nf-core/createtaxdb via `--generate_downstream_samplesheets` and additional flags to create an analysis-ready database sheet that can be given to nf-core/taxprofiler's `--databases` flag.
