Update your first script (#5664)
Signed-off-by: Christopher Hakkaart <chris.hakkaart@seqera.io>
Co-authored-by: Ben Sherman <bentshermann@gmail.com>
christopher-hakkaart and bentsherman authored Jan 16, 2025
1 parent dc6cc41 commit 7439ce2
Changed file: `docs/your-first-script.md` (161 additions, 69 deletions)

# Your first script

This guide details fundamental skills to run a basic Nextflow pipeline. It includes:

- Running a pipeline
- Modifying and resuming a pipeline
- Configuring a pipeline parameter

<h3>Prerequisites</h3>

You will need the following to get started:

- Nextflow. See {ref}`install-page` for instructions to install or update your version of Nextflow.

## Run a pipeline

You will run a basic Nextflow pipeline that splits a string of text into two files and then converts the lowercase letters in each file to uppercase. The pipeline is shown below:

```{code-block} groovy
:class: copyable
// Default parameter input
params.str = "Hello world!"

// splitString process
process splitString {
    publishDir "results/lower"

    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '${x}' | split -b 6 - chunk_
    """
}

// convertToUpper process
process convertToUpper {
    publishDir "results/upper"
    tag "$y"

    input:
    path y

    output:
    path 'upper_*'

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]' > upper_${y}
    """
}

// Workflow block
workflow {
    ch_str = Channel.of(params.str)     // Create a channel using parameter input
    ch_chunks = splitString(ch_str)     // Split string into chunks and create a named channel
    convertToUpper(ch_chunks.flatten()) // Convert lowercase letters to uppercase letters
}
```

This script defines two processes:

- `splitString`: takes a string input, splits it into 6-character chunks, and writes the chunks to files with the prefix `chunk_`
- `convertToUpper`: takes files as input, transforms their contents to uppercase letters, and writes the uppercase strings to files with the prefix `upper_`

The `splitString` process emits all of its output files together as a single element. The `flatten` operator splits this combined element so that each file is emitted as a separate element, and `convertToUpper` runs once per file.

The outputs from both processes are published to the `lower` and `upper` subdirectories of the `results` directory.
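To see what each task script does in isolation, you can run the same commands by hand outside Nextflow. The following is a minimal sketch, assuming a POSIX shell with the standard `split` and `tr` utilities. It runs in a throwaway directory created with `mktemp -d`. It uses `printf '%s'` instead of `printf '${x}'` so that any `%` characters in the input are printed literally. It does not reproduce the separate task directories or the `publishDir` step that Nextflow adds.

```bash
# Reproduce the two task scripts by hand (illustration only, not Nextflow).
set -eu

# Work in a throwaway directory so no files are left in your project
scratch=$(mktemp -d)
cd "$scratch"

# splitString: write 6-byte chunks of the input to chunk_aa, chunk_ab, ...
printf '%s' 'Hello world!' | split -b 6 - chunk_

# convertToUpper: process each chunk file on its own, as after `flatten`
for y in chunk_*; do
  cat "$y" | tr '[a-z]' '[A-Z]' > "upper_${y}"
done

# Print each output file between brackets so trailing spaces are visible
for f in upper_chunk_*; do
  printf '%s: [%s]\n' "$f" "$(cat "$f")"
done
# upper_chunk_aa: [HELLO ]
# upper_chunk_ab: [WORLD!]
```

The first loop plays the role of the `flatten` operator. Each chunk file is handled separately, just as each file becomes a separate `convertToUpper` task in the pipeline.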

To run your pipeline:

1. Create a new file named `main.nf` in your current directory
2. Copy and save the above pipeline to your new file
3. Run your pipeline using the following command:

    ```{code-block} bash
    :class: copyable
    nextflow run main.nf
    ```

You will see output similar to the following:

```console
N E X T F L O W ~ version 24.10.3

Launching `main.nf` [big_wegener] DSL2 - revision: 13a41a8946

executor > local (3)
[82/457482] splitString (1)           | 1 of 1 ✔
[2f/056a98] convertToUpper (chunk_aa) | 2 of 2 ✔
```

:::{note}
For versions of Nextflow prior to `22.10.0`, you must explicitly enable DSL2 by adding `nextflow.enable.dsl=2` to the top of the script or by using the `-dsl2` command-line option.
:::

Nextflow creates a `work` directory to store the files used during a pipeline run. Each execution of a process runs as a separate task. The `splitString` process runs as one task, and the `convertToUpper` process runs as two tasks, one per chunk. The hexadecimal string, for example, `82/457482`, is the beginning of the task's unique hash. It identifies the task directory inside `work` where the task script was executed. To inspect a task's files, change to the `work` directory and use this prefix to find the task directory.
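If you want to jump from a hash prefix in the log to its task directory, a small shell function can expand the prefix with a glob. This is a sketch under stated assumptions. `task_dir` is a hypothetical helper, not a Nextflow command. It looks in a `work` directory under your current directory unless you pass another path as the second argument. Task directories contain files such as `.command.sh`, the script that the task ran.

```bash
# task_dir: hypothetical helper that expands a hash prefix such as 82/457482
# (as printed in the Nextflow log) to the single matching task directory.
# Usage: task_dir PREFIX [WORK_DIR]
task_dir() {
  _workdir=${2:-work}
  # Replace the function's arguments with every path matching the prefix
  set -- "$_workdir/$1"*
  # No match leaves the literal pattern; more than one match is ambiguous
  if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
    echo "task_dir: expected exactly one matching task directory" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Example, after a run: print the script that the task executed
# cat "$(task_dir 82/457482)/.command.sh"
```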

:::{tip}
Run your pipeline with `-ansi-log false` to see each task printed on a separate line:

```{code-block} bash
:class: copyable
nextflow run main.nf -ansi-log false
```

You will see output similar to the following:

```console
N E X T F L O W ~ version 24.10.3
Launching `main.nf` [peaceful_watson] DSL2 - revision: 13a41a8946
[43/f1f8b5] Submitted process > splitString (1)
[a2/5aa4b1] Submitted process > convertToUpper (chunk_ab)
[30/ba7de0] Submitted process > convertToUpper (chunk_aa)
```

The `convertToUpper` tasks run in parallel, so there is no guarantee about the order in which they execute. In the output above, the task for `chunk_ab` was submitted before the task for `chunk_aa`.
:::

(getstarted-resume)=

## Modify and resume

Nextflow tracks task executions in a task cache, a key-value store of previously executed tasks. The task cache is used together with the `work` directory to recover cached tasks. If you modify and resume your pipeline, only the processes that you changed are re-executed. Cached results are used for the tasks that didn't change.

You can enable resumability by adding the `-resume` flag when you run a pipeline. To modify and resume your pipeline:

1. Open `main.nf`
2. Replace the `convertToUpper` process with the following:

    ```{code-block} groovy
    :class: copyable
    process convertToUpper {
        publishDir "results/upper"
        tag "$y"

        input:
        path y

        output:
        path 'upper_*'

        script:
        """
        rev $y > upper_${y}
        """
    }
    ```

3. Save your changes
4. Run your updated pipeline using the following command:

    ```{code-block} bash
    :class: copyable
    nextflow run main.nf -resume
    ```

    You will see output similar to the following:

    ```console
    N E X T F L O W ~ version 24.10.3

    Launching `main.nf` [furious_curie] DSL2 - revision: 5490f13c43

    executor > local (2)
    [82/457482] splitString (1)           | 1 of 1, cached: 1 ✔
    [02/9db40b] convertToUpper (chunk_aa) | 2 of 2 ✔
    ```

Nextflow skips the execution of the `splitString` process and retrieves its results from the cache. The `convertToUpper` process is executed twice because its script changed.

:::{tip}
Task results are cached in the `$PWD/work` directory by default. Depending on your pipeline, this directory can take up a lot of disk space. Clean it periodically, as long as you know you won't need to resume any previous pipeline runs.
:::
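Why does `splitString` come from the cache while `convertToUpper` runs again? The following toy model derives a cache key from a checksum of a task's script and input. It is not how Nextflow is implemented. Nextflow's actual task hash is computed from more attributes than these two; see the {ref}`cache-resume-page` page for details. The `cache_key` function and the variable names are hypothetical. `cksum` is used only because it is available in any POSIX shell.

```bash
# Toy model of a task cache key (NOT Nextflow's implementation):
# a checksum over the task script and the task input.
cache_key() {
  # $1 = task script, $2 = task input
  printf '%s\n%s' "$1" "$2" | cksum | cut -d ' ' -f 1
}

# Task scripts as plain strings (\$ keeps the dollar sign literal)
split_script="printf '\${x}' | split -b 6 - chunk_"
upper_v1="cat \$y | tr '[a-z]' '[A-Z]' > upper_\${y}"
upper_v2="rev \$y > upper_\${y}"

# splitString: same script and same input in both runs, so the keys match
echo "splitString, run 1:    $(cache_key "$split_script" 'Hello world!')"
echo "splitString, run 2:    $(cache_key "$split_script" 'Hello world!')"

# convertToUpper: the script changed, so the keys differ
echo "convertToUpper, run 1: $(cache_key "$upper_v1" chunk_aa)"
echo "convertToUpper, run 2: $(cache_key "$upper_v2" chunk_aa)"
```

The two `splitString` keys are identical, so a cache lookup would succeed. The two `convertToUpper` keys differ even though the input is the same, which mirrors why the edited process is re-executed.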

See {ref}`cache-resume-page` for more information about Nextflow cache and resume functionality.

(getstarted-params)=

## Pipeline parameters

Parameters are used to control the inputs to a pipeline. They are declared by adding the prefix `params` and a dot to a variable name, for example, `params.str`. You can set a parameter on the command line by prefixing its name with a double dash, for example, `--paramName`. Values specified on the command line override the default values declared in the main script.

You can configure the `str` parameter in your pipeline. To override the default value of `str`:

1. Run your pipeline using the following command:

    ```{code-block} bash
    :class: copyable
    nextflow run main.nf --str 'Bonjour le monde'
    ```

    You will see output similar to the following:

    ```console
    N E X T F L O W ~ version 24.10.3

    Launching `main.nf` [distracted_kalam] DSL2 - revision: 082867d4d6

    executor > local (4)
    [55/a3a700] process > splitString (1)           [100%] 1 of 1 ✔
    [f4/af5ddd] process > convertToUpper (chunk_ac) [100%] 3 of 3 ✔
    ```
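You can predict how many chunks a given `--str` value produces, and therefore how many `convertToUpper` tasks run. The following sketch assumes the `split -b 6` command used in `main.nf`, which splits on bytes rather than characters. `chunk_count` is a hypothetical helper for illustration.

```bash
# chunk_count: hypothetical helper that predicts how many files
# `split -b 6` creates for a given string (one convertToUpper task each).
chunk_count() {
  _len=$(printf '%s' "$1" | wc -c)   # length in bytes, not characters
  echo $(( ($_len + 5) / 6 ))        # round up: a partial chunk still counts
}

chunk_count 'Hello world!'      # 12 bytes -> 2
chunk_count 'Bonjour le monde'  # 16 bytes -> 3
```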

The input string is now longer, so the `splitString` process splits it into three chunks and the `convertToUpper` process is run three times.

:::{versionchanged} 20.11.0-edge
Any `.` (dot) character in a parameter name is interpreted as the delimiter of a nested scope. For example, `--foo.bar Hello` is interpreted as `params.foo.bar`. If you want a parameter name that contains a `.` (dot) character, escape it using the backslash character, for example, `--foo\.bar Hello`.
:::
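For Nextflow to see the escape in the `--foo\.bar` example from the note on nested parameter names, the backslash must survive your shell. In a POSIX shell such as bash, an unquoted backslash is removed by the shell before the program receives its arguments. Quoting the option, for example `'--foo\.bar' Hello`, is the safer form. The following sketch uses a hypothetical `show_args` function to print the arguments a program would actually receive.

```bash
# show_args: hypothetical helper that prints each argument it receives
# on its own line, so you can see what the shell passes to a program.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

show_args --foo\.bar Hello    # unquoted: prints [--foo.bar] then [Hello]
show_args '--foo\.bar' Hello  # single-quoted: prints [--foo\.bar] then [Hello]
```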

See {ref}`cli-params` for more information about modifying pipeline parameters.

<h2>Next steps</h2>

Your first script is a brief introduction to running pipelines, modifying and resuming pipelines, and pipeline parameters. See [training.nextflow.io](https://training.nextflow.io/) for further Nextflow training modules.
