Martin Asser Hansen edited this page Oct 1, 2015 · 6 revisions

#summary Read tabular data.

Biopiece: read_tab

Description

Tabular input can be read with [read_tab], which reads selected rows and selected columns (separated by a given delimiter) from a table in ASCII text format.

If no --keys are given and the table contains a comment line beginning with #, the fields on that line are used as keys.

Usage

read_tab [options] -i <table file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Read tabular data from file.
[-d <string> | --delimit=<string>]   #  Changes delimiter  -  Default='\s+'
[-c <string> | --cols=<list>]        #  Comma separated list of cols to read in that order.
[-k <string> | --keys=<list>]        #  Comma separated list of keys to use for each column.
[-s <uint>   | --skip=<uint>]        #  Skip number of initial records  -  Default=0.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file     -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file     -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following table from the file test.tab:

Organism   Sequence    Count
Human      ATACGTCAG   23524
Dog        AGCATGAC    2442
Mouse      GACTG       234
Cat        AAATGCA     2342

Reading the entire table:

read_tab -i test.tab

The above command will result in 5 records, one for each row, where the keys V0, V1, V2 are the default keys for the columns:

V0: Organism
V2: Count
V1: Sequence
---
V0: Human
V2: 23524
V1: ATACGTCAG
---
V0: Dog
V2: 2442
V1: AGCATGAC
---
V0: Mouse
V2: 234
V1: GACTG
---
V0: Cat
V2: 2342
V1: AAATGCA
---

However, if the first line is a header line, it can be skipped using the -s switch, which skips a specified number of lines before reading. So to get only the rows with data do:

read_tab -i test.tab -s 1

V0: Human
V2: 23524
V1: ATACGTCAG
---
V0: Dog
V2: 2442
V1: AGCATGAC
---
V0: Mouse
V2: 234
V1: GACTG
---
V0: Cat
V2: 2342
V1: AAATGCA
---
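
As a rough illustration of what the two commands above do, the following Python sketch models the default behaviour: each line is split on the default delimiter \s+ and the fields are keyed V0, V1, V2 by position, with -s modelled as dropping the leading lines. This is not the Biopieces implementation; the function name read_rows and its parameters are hypothetical, and the handling of blank lines is an assumption.

```python
import re


def read_rows(lines, delimit=r"\s+", skip=0, num=None):
    """Yield one dict per table row, keyed V0, V1, ... by column position.

    Hypothetical model of read_tab's defaults, not its actual code.
    """
    pattern = re.compile(delimit)
    count = 0
    for line in lines[skip:]:          # models -s: drop the first `skip` lines
        line = line.strip()
        if not line:
            continue                   # assumption: blank lines yield no record
        if num is not None and count >= num:
            break                      # models -n: stop after `num` records
        fields = pattern.split(line)
        yield {f"V{i}": field for i, field in enumerate(fields)}
        count += 1


table = """Organism   Sequence    Count
Human      ATACGTCAG   23524
Dog        AGCATGAC    2442
Mouse      GACTG       234
Cat        AAATGCA     2342""".splitlines()

records = list(read_rows(table))           # like: read_tab -i test.tab
skipped = list(read_rows(table, skip=1))   # like: read_tab -i test.tab -s 1
```

In this sketch, records holds five dictionaries (the header row included) and skipped holds only the four data rows.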

To explicitly name the columns (or the keys) use the -k switch:

read_tab -i test.tab -s 1 -k ORGANISM,SEQ,COUNT

SEQ: ATACGTCAG
ORGANISM: Human
COUNT: 23524
---
SEQ: AGCATGAC
ORGANISM: Dog
COUNT: 2442
---
SEQ: GACTG
ORGANISM: Mouse
COUNT: 234
---
SEQ: AAATGCA
ORGANISM: Cat
COUNT: 2342
---

It is possible to select a subset of columns to read by using the -c switch, which takes a comma separated list of column numbers (the first column is designated 0) as argument. So to read in only the sequence and the count, with the count before the sequence, do:

read_tab -i test.tab -s 1 -c 2,1

V0: 23524
V1: ATACGTCAG
---
V0: 2442
V1: AGCATGAC
---
V0: 234
V1: GACTG
---
V0: 2342
V1: AAATGCA
---

It is also possible to rename the columns with the -k switch:

read_tab -i test.tab -s 1 -c 2,1 -k COUNT,SEQ

SEQ: ATACGTCAG
COUNT: 23524
---
SEQ: AGCATGAC
COUNT: 2442
---
SEQ: GACTG
COUNT: 234
---
SEQ: AAATGCA
COUNT: 2342
---
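
The interplay of -c and -k shown above can be sketched in Python as follows. The sketch assumes, in line with the listings above, that columns are first picked in the order given and that keys (explicit or the default V0, V1, ...) are then assigned to the picked columns by position. The function name select_columns is hypothetical and the code is only a model, not how read_tab is written.

```python
import re


def select_columns(line, cols=None, keys=None, delimit=r"\s+"):
    """Split one table row and return a record dict.

    cols: 0-based column indices in the wanted order (models -c).
    keys: names for the selected columns (models -k); defaults to V0, V1, ...
    """
    fields = re.split(delimit, line.strip())
    if cols is not None:
        fields = [fields[c] for c in cols]   # pick and reorder columns
    if keys is None:
        keys = [f"V{i}" for i in range(len(fields))]
    return dict(zip(keys, fields))


row = "Human      ATACGTCAG   23524"

default_keys = select_columns(row, cols=[2, 1])
# → {'V0': '23524', 'V1': 'ATACGTCAG'}

named_keys = select_columns(row, cols=[2, 1], keys=["COUNT", "SEQ"])
# → {'COUNT': '23524', 'SEQ': 'ATACGTCAG'}
```

Note that in this model the keys follow the order of the -c list, not the original column positions, which is why COUNT is listed first in -k COUNT,SEQ.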

Finally, if we change the first line of the test.tab file to begin with a # like this:

#Organism  Sequence    Count
Human      ATACGTCAG   23524
Dog        AGCATGAC    2442
Mouse      GACTG       234
Cat        AAATGCA     2342

...then the fields in this line will be used as keys:

read_tab -i test.tab 

Organism: Human
Count: 23524
Sequence: ATACGTCAG
---
Organism: Dog
Count: 2442
Sequence: AGCATGAC
---
Organism: Mouse
Count: 234
Sequence: GACTG
---
Organism: Cat
Count: 2342
Sequence: AAATGCA
---
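
The header handling above can be modelled with a short Python sketch: a line starting with # supplies the keys and the remaining lines become records. The function name read_with_header is hypothetical, and treating later comment lines as skipped is an assumption of this sketch; how read_tab itself handles them is not stated here.

```python
import re


def read_with_header(lines, delimit=r"\s+"):
    """Return a list of record dicts, taking keys from a leading '#' line."""
    keys = None
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                       # assumption: ignore blank lines
        if line.startswith("#"):
            if keys is None:
                # First comment line: strip the '#' and use its fields as keys.
                keys = re.split(delimit, line.lstrip("#").strip())
            continue                       # assumption: skip other comments
        fields = re.split(delimit, line)
        names = keys if keys is not None else [f"V{i}" for i in range(len(fields))]
        records.append(dict(zip(names, fields)))
    return records


table = """#Organism  Sequence    Count
Human      ATACGTCAG   23524
Dog        AGCATGAC    2442
Mouse      GACTG       234
Cat        AAATGCA     2342""".splitlines()

records = read_with_header(table)
```

With the table above, records contains four dictionaries keyed Organism, Sequence and Count; without the # line the sketch falls back to V0, V1, V2.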

See also

[write_tab]

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

[read_tab] is part of the Biopieces framework.

http://www.biopieces.org
