Skip to content

Commit

Permalink
Merge pull request #402 from ksahlin/release
Browse files Browse the repository at this point in the history
Version 0.13.0
  • Loading branch information
marcelm committed Mar 4, 2024
2 parents a333a29 + f956c98 commit 11aaa5c
Show file tree
Hide file tree
Showing 5 changed files with 57 additions and 33 deletions.
6 changes: 5 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,11 @@
# Strobealign Changelog

## development version
## v0.13.0 (2024-03-04)

* #394: Added option `--aemb` (abundance estimation for metagenomic binning),
which makes strobealign output a table with estimated abundance values for
each contig (instead of SAM or PAF). This was contributed by Shaojun Pan
(@psj1997).
* #386: Parallelize indexing even more by using @alugowski’s
[poolSTL](https://github.com/alugowski/) `pluggable_sort`.
Indexing a human reference (measured on CHM13) now takes only ~45 s on a
Expand Down
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.16)

project(strobealign VERSION 0.12.0)
project(strobealign VERSION 0.13.0)
include(FetchContent)

option(ENABLE_AVX "Enable AVX2 support" OFF)
Expand Down
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ If needed, run `make` with `VERBOSE=1` to get more logging output.
After CMake has been run, you can use this one-liner to compile strobealign and
run the tests:
```
make -j -C build && tests/run.sh
make -s -j -C build && tests/run.sh
```

Whenever you make changes that could potentially affect mapping results, you can
Expand Down
78 changes: 49 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,47 +77,68 @@ as a developer.

### Python bindings

Experimental and incomplete Python bindings can be installed with
Experimental Python bindings can be installed with
`pip install .`. The only documentation for the moment are the tests in
`tests/*.py`.

## Usage

To align paired-reads against a reference FASTA and produce a sorted BAM file,
using eight threads:
```
strobealign ref.fa reads.fq > output.sam # Single-end reads
strobealign ref.fa reads1.fq reads2.fq > output.sam # Paired-end reads
strobealign -x ref.fa reads.fq > output.paf # Single-end reads mapping only (PAF)
strobealign -x ref.fa reads1.fq reads2.fq > output.paf # Paired-end reads mapping only (PAF)
strobealign -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools sort -o sorted.bam
```

To use interleaved files, use the `--interleaved` flag:

For single-end reads:
```
strobealign ref.fa reads.fq --interleaved > output.sam # Single and/or paired-end reads
strobealign -t 8 ref.fa reads.fastq.gz | samtools sort -o sorted.bam
```

To report secondary alignments, set parameter `-N [INT]` for a maximum of `[INT]` secondary alignments.

The above commands are suitable for interactive use and test runs.
For normal use, avoid creating SAM files on disk as they get very large compared
to their compressed BAM counterparts. Instead, either pipe strobealign’s output
into `samtools view` to create unsorted BAM files:
For mixed reads (the input file can contain both single and paired-end reads):
```
strobealign ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools view -o mapped.bam
strobealign -t 8 ref.fa --interleaved reads.fastq.gz | samtools sort -o sorted.bam
```
Or use `samtools sort` to create a sorted BAM file:

In alignment mode, strobealign produces SAM output. By piping the output
directly into `samtools`, the above commands avoid creating potentially large
intermediate SAM files and also reduce disk I/O.

To produce unsorted BAM, use `samtools view` instead of `samtools sort`.


### Mapping-only mode

The command-line option `-x` switches strobealign into mapping-only mode,
in which it will output [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md)
files instead of SAM. For example:
```
strobealign ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools sort -o sorted.bam
strobealign -x -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | igzip > output.paf.gz
```
This is usually faster than doing the two steps separately because fewer
intermediate files are created.
`igzip` is a faster version of gzip that is part of
[ISA-L](https://github.com/intel/isa-l/).
If it is not available, replace it with `pigz` or regular `gzip` in the
command.


## Abundance estimation mode (for metagenomic binning)

The command-line option `--aemb` switches strobealign into abundance estimation
mode, intended for metagenomic binning.
In this mode, strobealign outputs a single table with abundance values in
tab-separated value format instead of SAM or PAF.

To output the estimated abundance of every contig, the format of output file is: contig_id \t abundance_value:
Paired-end example:
```
strobealign ref.fa reads.fq --aemb > abundance.txt # Single-end reads
strobealign ref.fa reads1.fq reads2.fq --aemb > abundance.txt # Paired-end reads
strobealign -t 8 --aemb ref.fa reads.1.fastq.gz reads.2.fastq.gz > abundances.tsv
```
The output table contains one row for each contig of the reference.
The first column is the reference/contig id and the second its abundance.

The abundance is the number of bases mapped to a contig, divided by the length
of the contig. Reads mapping to *n* different locations are weighted 1/*n*.

Further columns may be added to this table in future versions of strobealign.


## Command-line options

Expand All @@ -127,12 +148,11 @@ options. Some important ones are:
* `-r`: Mean read length. If given, this overrides the read length estimated
from the input file(s). This is usually only required in combination with
`--create-index`, see [index files](#index-files).
* `-t N`, `--threads=N`: Use N threads. This mainly applies to the mapping step
as the indexing step is only partially parallelized.
* `-t N`, `--threads=N`: Use N threads (both for mapping and indexing).
* `--eqx`: Emit `=` and `X` CIGAR operations instead of `M`.
* `-x`: Only map reads, do not do no base-level alignment. This switches the
output format from SAM to [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md).
* `--aemb`: Output the estimated abundance value of every contig, the format of output file is: contig_id \t abundance_value.
* `--aemb`: Output estimated abundance value of each contig, see section above.
* `--rg-id=ID`: Add RG tag to each SAM record.
* `--rg=TAG:VALUE`: Add read group metadata to the SAM header. This can be
specified multiple times. Example: `--rg-id=1 --rg=SM:mysamle --rg=LB:mylibrary`.
Expand Down Expand Up @@ -172,19 +192,19 @@ To create an index, use the `--create-index` option.
Since strobealign needs to know the read length, either provide it with
read file(s) as if you wanted to map them:

strobealign --create-index ref.fa reads.1.fastq.gz reads.2.fastq.gz
strobealign --create-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz

Or set the read length explicitly with `-r`:

strobealign --create-index ref.fa -r 150
strobealign --create-index -t 8 ref.fa -r 150

This creates a file named `ref.fa.rX.sti` containing the strobemer index,
where `X` is the canonical read length that the index is optimized for (see
above).
To use the index when mapping, provide option `--use-index` when doing the
actual mapping:

strobealign --use-index ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools ...
strobealign --use-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools ...

- Note that the `.sti` files are usually tied to a specific strobealign version.
That is, when upgrading strobealign, the `.sti` files need to be regenerated.
Expand Down
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
name="strobealign",
description="Python bindings for strobealign",
license="MIT",
version="0.12.0",
version="0.13.0",
packages=["strobealign"],
package_dir={"": "src/python"},
cmake_install_dir="src/python",
Expand Down

0 comments on commit 11aaa5c

Please sign in to comment.