codonpair
codonpair calculates the codon pair score (CPS) and codon pair bias (CPB). CPS
values are identical to those produced by the Perl script from
Dimitris Papamichail (in the cps_perl directory), which was presumably
used in the following work:
Virus attenuation by genome-scale changes in codon pair bias.
Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S.
Science. 2008 Jun 27;320(5884):1784-7. doi: 10.1126/science.1155761.
https://www.ncbi.nlm.nih.gov/pubmed/18583614
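For orientation, Coleman et al. (2008) define the score of a codon pair AB, which encodes the amino acid pair XY, as the log ratio of its observed count in a reference set to the count expected if each codon were used at its overall frequency for its amino acid, independently of its neighbour. CPB is the mean of these scores along a sequence:

```latex
\mathrm{CPS}_{AB} = \ln\left( \frac{N_{AB}}{\frac{N_A\, N_B}{N_X\, N_Y}\, N_{XY}} \right),
\qquad
\mathrm{CPB} = \frac{1}{k-1} \sum_{i=1}^{k-1} \mathrm{CPS}_{i}
```

Here each N is a count taken from the reference set and k is the number of codons in the sequence, so the sum runs over its k-1 adjacent codon pairs. The cpb value returned by this package divides by n_pair instead, which presumably differs from k-1 only when some pairs are absent from the reference.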
Either clone the repository and install it with pip:
git clone git@github.com:smsaladi/codonpair.git
pip install ./codonpair
Or let pip handle the details:
pip install git+https://github.com/smsaladi/codonpair@master#egg=codonpair
pip should check for all dependencies and install any missing ones
automatically.
Initialize a codonpair.CodonPair object from a list of reference sequences
(CodonPair.from_sequences), a named reference (CodonPair.from_named_reference),
a reference file (CodonPair.from_reference_file), or simply by passing a
pd.DataFrame of codon counts to the CodonPair constructor.
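To make "reference counts" concrete, here is a minimal pure-Python sketch of what building a CPS table from reference sequences involves, applying the log-ratio CPS definition from Coleman et al. (2008). It is not this package's implementation: count_reference and cps_table are hypothetical names, sequences are assumed to be in-frame ORFs containing only A/C/G/T, and codons are counted at every position. The Perl script or CodonPair.from_sequences may tally counts differently, so exact values can differ.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_reference(seqs):
    """Hypothetical helper: tally codons, codon pairs, amino acids and amino acid pairs.

    Assumes in-frame sequences of A/C/G/T only; an unknown codon raises KeyError.
    A trailing partial codon is ignored.
    """
    codons, codon_pairs = Counter(), Counter()
    aas, aa_pairs = Counter(), Counter()
    for seq in seqs:
        seq = seq.upper()
        cods = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        residues = [CODON_TABLE[c] for c in cods]
        codons.update(cods)
        aas.update(residues)
        codon_pairs.update(zip(cods, cods[1:]))
        aa_pairs.update(zip(residues, residues[1:]))
    return codons, codon_pairs, aas, aa_pairs


def cps_table(seqs):
    """Hypothetical helper: CPS = ln(N_AB / (N_A * N_B / (N_X * N_Y) * N_XY)) per observed pair."""
    codons, codon_pairs, aas, aa_pairs = count_reference(seqs)
    table = {}
    for (a, b), n_ab in codon_pairs.items():
        x, y = CODON_TABLE[a], CODON_TABLE[b]
        # Expected pair count if codon choice ignored the neighbouring codon.
        expected = codons[a] * codons[b] / (aas[x] * aas[y]) * aa_pairs[(x, y)]
        table[(a, b)] = math.log(n_ab / expected)
    return table
```

For example, cps_table(["ATGAAATTT"]) scores both observed pairs as 0.0: with every codon and amino acid seen exactly once, the observed and expected counts coincide.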
The following named references are recognized by and bundled with this package:
- E. coli (BL21 DE3)
- S. pneumoniae (TIGR4)
- cps_perl: the reference file provided with the Perl implementation
The default constructor, CodonPair(), uses the E. coli reference.
Then calculate the codon pair score of a given sequence with CodonPair.cpb,
which returns a dictionary containing:
- total_cps: the total codon pair score, i.e. the sum of the scores of each codon pair
- n_pair: the number of codon pairs scored, excluding codon pairs not found in the reference
- cpb: the codon pair bias, total_cps/n_pair
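The aggregation described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's code: score_sequence is a hypothetical name, the CPS table is assumed to be a plain dict keyed by (codon, codon) tuples, and returning NaN when no pair can be scored is a choice made here, not documented behaviour of CodonPair.cpb.

```python
import math


def score_sequence(seq, cps):
    """Hypothetical re-implementation of the cpb aggregation.

    seq: in-frame coding sequence; cps: dict mapping (codon, codon) -> score.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    total_cps, n_pair = 0.0, 0
    for pair in zip(codons, codons[1:]):
        if pair in cps:  # pairs absent from the reference are skipped, not counted
            total_cps += cps[pair]
            n_pair += 1
    cpb = total_cps / n_pair if n_pair else math.nan
    return {"total_cps": total_cps, "n_pair": n_pair, "cpb": cpb}


# Toy table for illustration only; real scores come from a reference set.
toy_cps = {("ATG", "AAA"): 0.5, ("AAA", "TTT"): -0.25}
print(score_sequence("ATGAAATTTGGG", toy_cps))
# {'total_cps': 0.25, 'n_pair': 2, 'cpb': 0.125}
```

The pair (TTT, GGG) is missing from the toy table, so it contributes to neither total_cps nor n_pair, mirroring the exclusion rule for n_pair.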
For one-off calculations, codonpair.calc_cpb can be called directly with the
sequence of interest (it calls the default constructor under the hood).
import codonpair
# Score against the bundled E. coli reference
cp = codonpair.CodonPair.from_named_reference('E. coli')
# Returns a dict with total_cps, n_pair and cpb
cp.cpb("ATGATCCCCTTACAACATGGACTGATCCTCGCGGCAATCTTATTCGTTCTTGGCTTAACC")
For convenience, pip also installs a cps executable on your PATH:
cps test.fasta > test.scores.txt
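If you would rather batch-score a FASTA file from Python than shell out to cps, a minimal sketch follows. It does not reproduce the cps script or its output format: read_fasta and score_fasta are hypothetical helpers, and the scoring function is left as a parameter. With the package installed you could pass codonpair.calc_cpb, assuming it takes the sequence string as its only required argument, as the text above suggests.

```python
import io
from typing import Callable, Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def score_fasta(handle: TextIO, score: Callable[[str], object]) -> List[Tuple[str, object]]:
    """Apply `score` to every record, e.g. score=codonpair.calc_cpb (assumed signature)."""
    return [(name, score(seq)) for name, seq in read_fasta(handle)]


if __name__ == "__main__":
    demo = io.StringIO(">seq1\nATGAAA\nTTT\n>seq2\nGGGCCC\n")
    # `len` stands in for a real scoring function in this demo.
    for name, result in score_fasta(demo, len):
        print(name, result, sep="\t")
```

Usage with a real file would look like: with open("test.fasta") as fh: results = score_fasta(fh, codonpair.calc_cpb).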
See CodonPair.write_reference to write the codon pair counts for a reference set
to a file of your choosing, for use in future calculations.