mean_scores
mean_scores calculates either the global or the local mean of the quality SCORES in the stream. The quality SCORES are encoded Phred style in a character string.
The global (default) behaviour calculates SCORES_MEAN as the sum of all the scores divided by the length of the SCORES string.
The local means SCORES_MEAN_LOCAL are calculated using means from a sliding window, where the smallest mean is returned.
Thus, subquality records, with either a too low overall mean quality or a local dip in quality, can then be filtered using grab.
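The two calculations described above can be sketched in Python. This is only an illustration of the arithmetic, not the code mean_scores itself runs; the function names are hypothetical, and the fallback for reads shorter than the window is an assumption, since the page does not say what happens in that case.

```python
def mean_score(scores):
    """Global mean: sum of all scores divided by the number of scores."""
    return sum(scores) / len(scores)


def min_local_mean(scores, window_size=5):
    """Smallest mean over all full sliding windows of window_size scores."""
    if len(scores) < window_size:
        # Assumption: reads shorter than the window fall back to the global mean.
        return mean_score(scores)
    window_sum = sum(scores[:window_size])
    lowest = window_sum
    for i in range(window_size, len(scores)):
        # Slide the window one position: add the new score, drop the oldest.
        window_sum += scores[i] - scores[i - window_size]
        lowest = min(lowest, window_sum)
    return lowest / window_size


# Made-up scores with a dip in the middle.
scores = [40, 40, 40, 10, 10, 10, 10, 10, 40, 40]
print(mean_score(scores))      # 25.0
print(min_local_mean(scores))  # 10.0
```

The global mean of 25.0 hides the dip, while the local mean of 10.0 exposes it, which is why the local mode is useful for filtering.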
... | mean_scores [options]
[-? | --help] # Print full usage description.
[-l | --local] # Calculate local means.
[-w <uint> | --window_size=<uint>] # Window size - Default=5
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTQ entry in the file test.fastq:
@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGG
+HWI-EAS157_20FFGAAXX:2:1:888:434
abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh
The values of the scores in decimal are:
SCORES: 33;34;35;36;37;38;39;40;40;40;40;40;40;40;27;27;27;11;
11;11;11;11;40;37;37;40;40;40;40;40;40;40;40;40;40;40;
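The decimal values above can be reproduced with a short Python sketch. The offset of 64 is an assumption inferred from this example ('a' is ASCII 97 and is listed as 33); the page only says the scores are encoded Phred style, and decode_scores is a hypothetical helper name.

```python
PHRED_OFFSET = 64  # Assumption: inferred from the example, 'a' (ASCII 97) -> 33.


def decode_scores(score_string, offset=PHRED_OFFSET):
    """Convert each quality character to its decimal score."""
    return [ord(char) - offset for char in score_string]


scores = decode_scores("abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh")
print(";".join(str(score) for score in scores))
print(round(sum(scores) / len(scores), 2))  # 33.94
```

The 36 decoded scores sum to 1222, so the global mean is 1222 / 36, which rounds to 33.94.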
To read in these entries and calculate the mean for the quality scores of each entry, do:
read_fastq -i test.fastq | mean_scores
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
SEQ: TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGG
SEQ_LEN: 36
SCORES: abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh
SCORES_MEAN: 33.94
---
To calculate local means for a sliding window, do:
read_fastq -i test.fastq | mean_scores -l
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
SEQ: TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGG
SEQ_LEN: 36
SCORES: abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh
SCORES_MEAN_LOCAL: 11.0
---
This indicates that the local minimum is located at the stretch KKKKK, where (11+11+11+11+11) / 5 = 11.0
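To see where in the read the lowest window sits, the window scan can be written out so that it also reports the start position. This is a rough sketch, not part of mean_scores: lowest_window is a hypothetical helper, the offset of 64 is inferred from the example scores, and the read is assumed to be at least as long as the window.

```python
def lowest_window(score_string, window_size=5, offset=64):
    """Return (start index, mean) of the sliding window with the lowest mean."""
    # Assumption: offset 64 matches the example; the read is >= window_size long.
    scores = [ord(char) - offset for char in score_string]
    best_start, best_mean = 0, float("inf")
    for start in range(len(scores) - window_size + 1):
        mean = sum(scores[start:start + window_size]) / window_size
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


quality = "abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh"
start, mean = lowest_window(quality)
print(start, quality[start:start + 5], mean)  # 17 KKKKK 11.0
```

The start index is zero-based, so the dip begins at the 18th base of the read.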
Martin Asser Hansen - Copyright (C) - All rights reserved.
June 2010
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
mean_scores is part of the Biopieces framework.