replace_vals
#summary Replace the values of a specified key for each record in the stream.
If you need to replace the values for a given key in all records, e.g. to replace an ID with
a description, you can use replace_vals. To replace a single value, use the switches -s and -r.
To replace many different values, specify these in a table file that is given as the argument
to the -f switch. This table file is read, skipping lines starting with #, into a hash where
the search strings are used as keys and the replace strings are values. It is also possible to
change the delimiter of the columns using the -d switch.
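The table handling described above can be sketched in Python. This is an illustration only, not the Biopieces implementation: the function name parse_table and its parameters are hypothetical, and skipping blank lines and stripping surrounding whitespace are assumptions.

```python
import re


def parse_table(text, delimiter=r"\s+", search_col=1, replace_col=2):
    """Build a search -> replace lookup from table text (columns are 1-based)."""
    lookup = {}
    for line in text.splitlines():
        # Lines starting with '#' are comments; skipping blank lines is an assumption.
        if line.startswith("#") or not line.strip():
            continue
        # The delimiter is treated as a regular expression, like the '\s+' default.
        fields = re.split(delimiter, line.strip())
        lookup[fields[search_col - 1]] = fields[replace_col - 1]
    return lookup


table = "# search replace\ntest1 foo\nbar test2\n"
print(parse_table(table))  # {'test1': 'foo', 'bar': 'test2'}
```

Passing delimiter="\t" to this sketch would split on tabs only, which mirrors what the -d switch is described as doing.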
... | replace_vals -k <key> [options]
[-? | --help] # Print full usage description.
[-k <string> | --key=<string>] # Key whose values should be replaced.
[-s <string> | --search=<string>] # Search string.
[-r <string> | --replace=<string>] # Replacement string.
[-f <file!> | --file=<file!>] # File with table of search/replace columns.
[-S <uint> | --search_col=<uint>] # Column with search strings - Default=1
[-R <uint> | --replace_col=<uint>] # Column with replace strings - Default=2
[-d <string> | --delimiter=<string>] # Table delimiter - Default='\s+'
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entries in the file test.fna:
>test1
AAGTGTATGAGCCCAGTCGCCCTA
>test2
CGGGAACCTGATCAGCTGTCTACA
To replace the values for the SEQ_NAME key matching test2 with foo, do:
read_fasta -i test.fna | replace_vals -k SEQ_NAME -s test2 -r foo
SEQ_NAME: test1
SEQ: AAGTGTATGAGCCCAGTCGCCCTA
SEQ_LEN: 24
---
SEQ_NAME: foo
SEQ: CGGGAACCTGATCAGCTGTCTACA
SEQ_LEN: 24
---
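The effect of -s and -r in the example above can be mimicked on plain Python dictionaries. This is a minimal sketch, assuming the value must equal the search string exactly (the example is consistent with that, but the source does not state it); replace_value is a hypothetical helper, not part of Biopieces.

```python
def replace_value(records, key, search, replace):
    """Return copies of the records with `search` swapped for `replace` under `key`."""
    result = []
    for record in records:
        record = dict(record)  # copy so the input records stay untouched
        if record.get(key) == search:  # assumption: exact, whole-value match
            record[key] = replace
        result.append(record)
    return result


records = [
    {"SEQ_NAME": "test1", "SEQ": "AAGTGTATGAGCCCAGTCGCCCTA", "SEQ_LEN": 24},
    {"SEQ_NAME": "test2", "SEQ": "CGGGAACCTGATCAGCTGTCTACA", "SEQ_LEN": 24},
]

for record in replace_value(records, "SEQ_NAME", "test2", "foo"):
    print(record["SEQ_NAME"])  # prints test1, then foo
```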
To replace multiple different values we need to specify these in a table file. Consider the
following table in the file test.tab:
test1 foo
bar test2
By default the search strings are in the first column (search_col default is 1) and the
replace strings are in the second column (replace_col default is 2). Using the -f switch
causes the table file to be read and a hash to be built with the elements in the first column
as keys and the elements in the second column as values. Thus we can replace like this:
read_fasta -i test.fna | replace_vals -k SEQ_NAME -f test.tab
SEQ_NAME: foo
SEQ: AAGTGTATGAGCCCAGTCGCCCTA
SEQ_LEN: 24
---
SEQ_NAME: test2
SEQ: CGGGAACCTGATCAGCTGTCTACA
SEQ_LEN: 24
---
It is possible to change the search_col and replace_col with the -S and -R switches:
read_fasta -i test.fna | replace_vals -k SEQ_NAME -f test.tab -S 2 -R 1
SEQ_NAME: test1
SEQ: AAGTGTATGAGCCCAGTCGCCCTA
SEQ_LEN: 24
---
SEQ_NAME: bar
SEQ: CGGGAACCTGATCAGCTGTCTACA
SEQ_LEN: 24
---
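Why the two table runs give different results can be seen by building the lookup from the rows of test.tab with each column choice. The sketch below is hypothetical Python, not Biopieces code; replace_from_rows and its parameters are illustrative names, and exact whole-value matching is assumed.

```python
def replace_from_rows(records, key, rows, search_col=1, replace_col=2):
    """Replace values under `key` using a lookup built from table rows (1-based columns)."""
    lookup = {row[search_col - 1]: row[replace_col - 1] for row in rows}
    result = []
    for record in records:
        record = dict(record)  # copy so the input records stay untouched
        if key in record and record[key] in lookup:
            record[key] = lookup[record[key]]
        result.append(record)
    return result


rows = [("test1", "foo"), ("bar", "test2")]  # the two rows of test.tab
records = [{"SEQ_NAME": "test1"}, {"SEQ_NAME": "test2"}]

default = replace_from_rows(records, "SEQ_NAME", rows)
swapped = replace_from_rows(records, "SEQ_NAME", rows, search_col=2, replace_col=1)

print([r["SEQ_NAME"] for r in default])  # ['foo', 'test2']
print([r["SEQ_NAME"] for r in swapped])  # ['test1', 'bar']
```

With the default columns only test1 appears as a search string, so test2 is left alone; with the columns swapped, the search strings are foo and test2, so only test2 is replaced.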
Martin Asser Hansen - Copyright (C) - All rights reserved.
December 2011
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
replace_vals is part of the Biopieces framework.