From 5368c72964af0bf32222dfcad89a71b4b35758be Mon Sep 17 00:00:00 2001
From: "James A. Fellows Yates"
Date: Tue, 15 Oct 2024 12:08:15 +0000
Subject: [PATCH] Update tutorial, add tutorial directory to gitignore

---
 .gitignore              |  1 +
 docs/usage/tutorials.md | 83 ++++++++++++++++++++++++++++++++---------
 2 files changed, 67 insertions(+), 17 deletions(-)

diff --git a/.gitignore b/.gitignore
index 8296684..a960b83 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,4 @@ testing*
 null/
 .nf-test*
 test.xml
+tutorial/

diff --git a/docs/usage/tutorials.md b/docs/usage/tutorials.md
index e722b3c..f679268 100644
--- a/docs/usage/tutorials.md
+++ b/docs/usage/tutorials.md
@@ -7,9 +7,9 @@ This page provides a range of tutorials to help give you a more step-by-step gui

nf-core/createtaxdb can be used to generate input database sheets for pipelines such as [nf-core/taxprofiler](https://nf-co.re/taxprofiler).
This tutorial will guide you step-by-step on how to set up an nf-core/createtaxdb run so that you can run nf-core/taxprofiler almost straight away afterwards.

-## Prerequisites
+### Prerequisites

-### Hardware
+#### Hardware

The datasets provided here _should_ be small enough to run on your laptop or desktop computer. They should not require an HPC or similar.

@@ -18,14 +18,14 @@ If you wish to use a HPC cluster or cloud, and don’t wish to use an ‘interac

You will need internet access and at least X.X GB of hard-drive space.

-#### Software
+##### Software

The tutorial assumes you are on a Unix-based operating system, and have already installed Nextflow as well as a software environment system such as [Conda](https://docs.conda.io/en/latest/miniconda.html), [Docker](https://www.docker.com/), or [Singularity/Apptainer](https://apptainer.org/).
The tutorial will use Docker; however, you can simply replace references to `docker` with `conda`, `singularity`, or `apptainer` accordingly.
It also assumes you've already pulled both the nf-core/createtaxdb and nf-core/taxprofiler pipelines with `nextflow pull`.

-### Data
+#### Data

First we will make a directory to run the whole tutorial in.

```bash
mkdir createtaxdb-tutorial
cd createtaxdb-tutorial/
```

-#### nf-core/createtaxdb
+##### nf-core/createtaxdb

First we will need to download some reference genome FASTAs to build into databases with nf-core/createtaxdb.
These species are present in the sequencing reads we will use for nf-core/taxprofiler.

@@ -53,7 +53,7 @@ curl -O https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/creat
curl -O https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/createtaxdb/data/tutorials/nodes_reduced.dmp
```

-#### nf-core/taxprofiler
+##### nf-core/taxprofiler

Then, for aligning against our database, we will use very small short-read (pre-subset) metagenomes used for testing.
We will download two sets of paired-end sequencing reads.

@@ -81,9 +81,9 @@ process {
 }
```

-### Generating databases
+### Running nf-core/createtaxdb

-#### Input Samplesheet preparation
+#### Input nf-core/createtaxdb samplesheet preparation

To generate our database files from our FASTA files with nf-core/createtaxdb, we first write the createtaxdb samplesheet.
In this case we will make a file called `input.csv` in a text editor, which will contain the following:

@@ -96,11 +96,11 @@ Homo_sapiens,9606,NC_012920.1.fa.gz,

We won't generate amino-acid based databases, so we just supply our two downloaded nucleotide reference genomes and leave the `fasta_aa` column blank.

-#### Prepare command
+#### Prepare and run nf-core/createtaxdb command

Then we can construct an nf-core/createtaxdb command.
In this example, we will create two databases: one for `kraken2`, and one for `centrifuge`.

-We will supply the input files of the `input.csv`, and the three taxdmp related files: `nodes_reduced.dmp`, `names_reduced.dmp`, and `accession2taxid_reduced.dmp`.
+We will supply the `input.csv` input file, and the three taxdmp-related files: `nodes_reduced.dmp`, `names_reduced.dmp`, and `nucl_gb.accession2taxid`.
We will also specify the name of the database to build, and the two tools to generate databases for.

```bash
nextflow run ../main.nf \
@@ -115,8 +115,8 @@ nextflow run ../main.nf \
 --namesdmp names_reduced.dmp \
 --accession2taxid nucl_gb.accession2taxid \
 --nucl2taxid nucl2taxid.txt \
+ --outdir createtaxdb_results \
 --dbname tutorial \
- --outdir results \
 --build_kraken2 \
 --build_centrifuge \
 --generate_tar_archive \
@@ -125,8 +125,20 @@ nextflow run ../main.nf \
 --generate_samplesheet_dbtype 'tar'
```

-Once completed, we can look in the specified `--outdir` directory, for a subdirectory called `downstream_samplesheets`.
-In this tutorial this is `results/downstream_samplesheets`.
+After the input and output files, we also specify a name for the databases to be created, request builds for the two tools of interest, and tell the pipeline to generate `.tar.gz` archives of the resulting databases and a samplesheet for `taxprofiler`.
+
+Generating the `.tar.gz` files is optional, as nf-core/taxprofiler can also accept directories; however, archives can be useful in cloud contexts where you cannot 'download' directories.
+If we want to supply just the raw database output (i.e., no `tar` archives), we would specify `--generate_samplesheet_dbtype 'raw'`.
+
+Currently, nf-core/createtaxdb only supports generating a single pipeline samplesheet.
+However, in the future, when this is extended to other pipelines, you will be able to supply a comma-separated list to make samplesheets for multiple pipelines at once, for example with `--generate_pipeline_samplesheets 'taxprofiler,ampliseq'`.
+
+Now that we have a better understanding of what the command is doing, we can execute it, and the pipeline should run!
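Since `--generate_tar_archive` simply packs each database directory into a `.tar.gz`, it may help to see the equivalent plain `tar` round-trip. This is an illustrative sketch only: the `tutorial-kraken2` directory and `hash.k2d` file below are stand-ins created for the example, not real createtaxdb output.

```shell
# Sketch: round-trip a database directory through a .tar.gz archive.
# 'tutorial-kraken2' and 'hash.k2d' are made-up stand-in names.
mkdir -p tutorial-kraken2
echo "dummy database file" > tutorial-kraken2/hash.k2d

# Pack the directory, as --generate_tar_archive does for each database...
tar -czf tutorial-kraken2.tar.gz tutorial-kraken2

# ...then unpack it again, e.g. if you later prefer to hand
# nf-core/taxprofiler a directory rather than an archive.
rm -r tutorial-kraken2
tar -xzf tutorial-kraken2.tar.gz
ls tutorial-kraken2
```

The same extraction step works on the real archives in the pipeline's output directory.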
+
+#### nf-core/createtaxdb output
+
+Once the run finishes, we can look in the specified `--outdir` directory for a subdirectory called `downstream_samplesheets`.
+In this tutorial, this is `createtaxdb_results/downstream_samplesheets`.

This samplesheet, called `taxprofiler.csv`, can be used as input to nf-core/taxprofiler!

@@ -134,17 +146,17 @@
It should look something like this:

```csv title="taxprofiler.csv"
tool,db_name,db_params,db_path
-kraken2,tutorial-kraken2,,///results/kraken2/tutorial-kraken2.tar.gz
-centrifuge,tutorial-centrifuge,,///results/centrifuge/tutorial-centrifuge.tar.gz
+kraken2,tutorial-kraken2,,///createtaxdb_results/kraken2/tutorial-kraken2.tar.gz
+centrifuge,tutorial-centrifuge,,///createtaxdb_results/centrifuge/tutorial-centrifuge.tar.gz
```

-Note that paths to the databases in this `.csv` file point to the nf-core/createtaxdb results run directory, make sure not to delete them!
+Note that the database paths in this `.csv` file point into the nf-core/createtaxdb results directory; make sure not to move or delete them!

### Running nf-core/taxprofiler

Now, with the reads we downloaded during the preparation section of this tutorial and our database samplesheet, we can run nf-core/taxprofiler!

-#### Input Samplesheet preparation
+#### Input nf-core/taxprofiler samplesheet preparation

First, we prepare an input _samplesheet_ (not the database sheet!) with our reads and their metadata.

@@ -156,3 +168,40 @@ sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
ERX5474932,ERR5766176,ILLUMINA,ERX5474932_ERR5766176_1.fastq.gz,ERX5474932_ERR5766176_2.fastq.gz,
ERX5474932,ERR5766176_B,ILLUMINA,ERX5474932_ERR5766176_B_1.fastq.gz,ERX5474932_ERR5766176_B_2.fastq.gz,
```
+
+#### Prepare and run nf-core/taxprofiler command
+
+Then we can set up our command for running nf-core/taxprofiler, using our reads samplesheet and the nf-core/createtaxdb-generated database `.csv`.
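As an aside, the reads samplesheet above doesn't have to be written in a text editor; it can also be created non-interactively from the shell. A minimal, self-contained sketch using a heredoc, with the same rows as shown in the tutorial:

```shell
# Sketch: write the taxprofiler reads samplesheet from the shell.
# NB: the createtaxdb samplesheet earlier in this tutorial was also
# called 'input.csv', so double-check which sheet you are overwriting.
cat > input.csv <<'EOF'
sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
ERX5474932,ERR5766176,ILLUMINA,ERX5474932_ERR5766176_1.fastq.gz,ERX5474932_ERR5766176_2.fastq.gz,
ERX5474932,ERR5766176_B,ILLUMINA,ERX5474932_ERR5766176_B_1.fastq.gz,ERX5474932_ERR5766176_B_2.fastq.gz,
EOF

# Quick sanity check: one header line plus two read sets.
wc -l < input.csv
```

The quoted `'EOF'` delimiter prevents any shell expansion inside the heredoc, so the rows are written exactly as typed.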
+
+For the purpose of this tutorial, we will skip a lot of preprocessing steps and go straight to profiling.
+
+```bash
+nextflow run nf-core/taxprofiler \
+ -profile docker \
+ -c tutorial.conf \
+ --input input.csv \
+ --databases createtaxdb_results/downstream_samplesheets/taxprofiler.csv \
+ --outdir taxprofiler_results/ \
+ --perform_shortread_qc \
+ --run_centrifuge \
+ --run_kraken2
+```
+
+Execute this command, and the pipeline should run!
+
+### Clean Up
+
+Once you have completed the tutorial, you can run the following command to delete all downloaded and output files.
+
+```bash
+rm -r createtaxdb-tutorial
+```
+
+:::warning
+Don’t forget to change out of the directory above before trying to delete it!
+:::
+
+### Summary
+
+In this tutorial, we've shown how you can generate input sheets for downstream pipelines.
+In this case, we told nf-core/createtaxdb with `--generate_downstream_samplesheets` and additional flags to create an analysis-ready database sheet that can be given to nf-core/taxprofiler's `--databases` flag.
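As a closing, hedged sketch related to the summary above: since the generated database sheet points back into the createtaxdb results directory, it can be worth checking that every `db_path` still exists on disk before launching nf-core/taxprofiler. The samplesheet and archive below are fabricated so the example is self-contained; in practice you would check the real `taxprofiler.csv`.

```shell
# Sketch: verify the db_path column of a downstream samplesheet.
# 'check_me.csv' and 'dbs/' are fabricated for this example.
mkdir -p dbs
touch dbs/tutorial-kraken2.tar.gz
cat > check_me.csv <<'EOF'
tool,db_name,db_params,db_path
kraken2,tutorial-kraken2,,dbs/tutorial-kraken2.tar.gz
EOF

# Print each db_path (4th CSV column) and report whether it exists.
awk -F, 'NR > 1 { print $4 }' check_me.csv | while read -r p; do
  if [ -e "$p" ]; then echo "OK: $p"; else echo "MISSING: $p"; fi
done
```

Any `MISSING` line means a database was moved or deleted since the createtaxdb run, which would cause the taxprofiler run to fail at staging.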