Releases: esteinig/vircov
0.6.0
Major updates making applications more useful 🥳
Two short-hand command-line arguments (-i and -T) break with previous versions 💀
- Release binaries CI/CD
- Input alignment format (-i/--alignment) is inferred from the file extension (bam|sam|cram|paf) or set explicitly with --alignment-format
- Added --aligned/--group-aligned filter to supplement the filter by unique aligned reads (--reads/--group-reads)
- Pretty table output short argument is now -T (previously -t)
- Input alignment short argument is now -i (previously -A)
- Added -H argument to print machine-readable header to non-pretty table output [#13]
- Reference alignment grouping by field in header and automated reference selection:
  - Requires annotation in reference sequence header (description), e.g. taxid=9606; segment="M"
  - Whitespace around header fields or values is trimmed (start and end) internally on parsing
  - --group-by <field>: group alignments by this field
  - --group-sep <delimiter>: the delimiter that separates fields in the header
  - --group-select-split <dir>: selects a single reference per group and outputs it to a file in <dir> ({group_id}.fasta)
  - --group-select-by <coverage|reads>: selection by highest coverage or max reads
  - --group-select-order: outputs the selected references with index prefixes sorted by the select-by metric ({idx}-{group_id}.fasta)
  - Example: --group-by "taxid=" --group-sep ";" --group-select-split ref_seqs/ --group-select-by coverage
  - If segment fields are specified, each selected segment reference is output by highest coverage or reads
    - Command line: --segment-field and --segment-field-nan
    - Example: --segment-field "segment=" --segment-field-nan "segment=N/A"
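The header-based grouping and per-group selection described above can be sketched in Python. This is only an illustration of the idea, not vircov's own code: the function names `field_value` and `select_per_group`, the record dictionaries, and the decision to skip references that lack the grouping field are all assumptions made for this example.

```python
from collections import defaultdict


def field_value(description, field, sep=";"):
    """Return the trimmed value of `field` (e.g. "taxid=") from a header
    description, or None if the field is absent (hypothetical helper)."""
    for part in description.split(sep):
        part = part.strip()  # trim whitespace around the field
        if part.startswith(field):
            return part[len(field):].strip()  # trim whitespace around the value
    return None


def select_per_group(records, field="taxid=", sep=";", by="coverage"):
    """Group records by a header field and keep the record with the highest
    `by` metric ("coverage" or "reads") in each group."""
    groups = defaultdict(list)
    for rec in records:
        group_id = field_value(rec["description"], field, sep)
        if group_id is not None:  # assumption: unannotated references are skipped
            groups[group_id].append(rec)
    return {gid: max(members, key=lambda r: r[by]) for gid, members in groups.items()}


# Made-up per-reference results for illustration
records = [
    {"id": "ref1", "description": 'taxid=9606; segment="M"', "coverage": 0.42, "reads": 120},
    {"id": "ref2", "description": ' taxid= 9606 ; segment="L"', "coverage": 0.87, "reads": 95},
    {"id": "ref3", "description": "taxid=11320", "coverage": 0.15, "reads": 300},
]

by_coverage = select_per_group(records, by="coverage")
by_reads = select_per_group(records, by="reads")
```

With this toy input, selecting by coverage keeps `ref2` for group `9606`, while selecting by reads keeps `ref1`; in the tool itself the selected sequence would then be written to `{group_id}.fasta` in the output directory.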
- Grouped filtering and outputs behave differently from non-grouped filtering and outputs:
  - Non-group filters (--regions, --reads, --aligned, --coverage, --length) are applied before grouping
  - Group filters can be applied (--group-regions, --group-reads, --group-coverage, --group-aligned)
  - Grouped output fields are distinct from the non-grouped fields; they change as follows (described in --help):
    * Reference sequence identifier is the value that is grouped by, followed by the number of grouped members in brackets, e.g. 9606 (5)
    * Distinct alignment regions are summed across group members
    * Alignments are summed across group members
    * Unique reads aligned are recomputed across group members
    * Covered bases and reference lengths are set to 0
    * Coverage is selected to be the highest among the group members
- Conditional coverage filter for the --regions filter: the regions filter is applied only if coverage is below this threshold
  - This rescues high coverage sequences as these usually have few regions
  - --regions-coverage <0.0-1.0>: a sufficient value can be somewhere around 0.3 - 0.6
- Short argument for conditional coverage filter (-t) has replaced pretty table output (now -T)
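The grouped output rules and the conditional regions filter from this release can be sketched in Python as follows. This is one reading of the notes above, not the tool's implementation: the names `summarize_group` and `passes_regions_filter`, the dictionary keys, the use of a set union for recomputing unique reads, and the inclusive comparisons are assumptions made for illustration.

```python
def summarize_group(group_id, members):
    """Collapse per-reference rows into one grouped row (hypothetical helper)."""
    unique_reads = set()
    for m in members:
        unique_reads.update(m["read_ids"])  # recompute unique reads across members
    return {
        "reference": f"{group_id} ({len(members)})",  # grouped value plus member count
        "regions": sum(m["regions"] for m in members),
        "alignments": sum(m["alignments"] for m in members),
        "reads": len(unique_reads),
        "covered_bases": 0,  # not meaningful for a group, set to 0
        "length": 0,         # not meaningful for a group, set to 0
        "coverage": max(m["coverage"] for m in members),
    }


def passes_regions_filter(regions, coverage, min_regions, regions_coverage=None):
    """Apply the minimum-regions filter only when coverage is below the
    conditional threshold; without a threshold the filter always applies."""
    if regions_coverage is not None and coverage >= regions_coverage:
        return True  # high-coverage reference is rescued despite few regions
    return regions >= min_regions


# Made-up group members for illustration
members = [
    {"regions": 3, "alignments": 10, "read_ids": {"r1", "r2"}, "coverage": 0.2},
    {"regions": 1, "alignments": 4, "read_ids": {"r2", "r3"}, "coverage": 0.9},
]
summary = summarize_group("9606", members)

rescued = passes_regions_filter(regions=1, coverage=0.9, min_regions=3, regions_coverage=0.5)
dropped = passes_regions_filter(regions=1, coverage=0.2, min_regions=3, regions_coverage=0.5)
```

Here the read `r2` aligns to both members but is counted once in the grouped row, and the single-region reference at 0.9 coverage passes the regions filter only because the conditional threshold is set.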
0.5.0
Command line:
- --paf | --bam input with "-" for reading from stdin
- changed long name of --cov-reg to --regions
Main:
- added SAM/BAM/CRAM support [#3]
- rewrote interval parsing for PAF format [#8]
- fixed bug in filtering coverage plot outputs [#9]
- added table output confirmation test [#5]
- added basic BAM reader tests, including query alignment length from CIGAR [#5]
- reimplemented custom PAF parser due to variable CIGAR tags [#8]
Other:
- replaced noodles fasta parsing with rust-bio
- removed csv crate
Test coverage:
- couldn't figure out one line for file name match statement [#14]
- slight regression in coverage from reader functions
0.4.0
Operational, added features / command line options:
- input alignment now required arg: vircov test.paf [previously: --paf option]
- filter results output by:
  - --seq-len: minimum reference sequence length
  - --cov-reg: minimum number of detected coverage regions
- long help menu with --help
- pretty table output with --table
- 100% test coverage 🥳
- continuous integration for Linux and MacOS