A codon pair score/codon pair bias calculator

smsaladi/codonpair

codonpair

codonpair calculates the codon pair score (CPS) and codon pair bias (CPB) of coding sequences. CPS values are identical to those produced by the Perl script from Dimitris Papamichail (see the cps_perl directory), which was presumably used in the following work:

Virus attenuation by genome-scale changes in codon pair bias.
Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S.
Science. 2008 Jun 27;320(5884):1784-7. doi: 10.1126/science.1155761.
https://www.ncbi.nlm.nih.gov/pubmed/18583614
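
For orientation, the definitions from the cited paper can be written as follows. This is restated from the paper, not derived from this package's code, so treat it as a reference sketch. Here N_AB is the count of codon pair AB in the reference set, N_A and N_B are the counts of the individual codons, N_X and N_Y are the counts of the amino acids they encode, N_XY is the count of the amino acid pair, and k is the number of codons in the scored sequence.

```latex
\mathrm{CPS}_{AB} = \ln\!\left( \frac{N_{AB}}{\dfrac{N_A \, N_B}{N_X \, N_Y} \, N_{XY}} \right),
\qquad
\mathrm{CPB} = \frac{1}{k-1} \sum_{i=1}^{k-1} \mathrm{CPS}_i
```

Note that this package divides by n_pair, the number of codon pairs actually found in the reference, which equals k-1 only when every pair in the sequence is present in the reference.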

Installation

Either clone the repo and install with pip:

git clone git@github.com:smsaladi/codonpair.git
pip install ./codonpair

Or have pip handle the details (GitHub no longer serves the unauthenticated git:// protocol, so use https):

pip install git+https://github.com/smsaladi/codonpair@master

All dependencies should be checked for and, if necessary, installed automatically by pip.

Usage

Initialize a codonpair.CodonPair object from a list of reference sequences (CodonPair.from_sequences), a named reference (CodonPair.from_named_reference), or a reference file (CodonPair.from_reference_file), or by passing a pd.DataFrame of codon counts directly to CodonPair.

The following named references are recognized and bundled with this package:

  • E. coli (BL21 DE3)
  • S. pneumoniae (TIGR4)
  • cps_perl - the reference file provided with the perl implementation

The default constructor CodonPair() uses the E. coli reference.
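
To make "codon counts from reference sequences" concrete, here is a minimal, standalone sketch of tallying adjacent codon pairs across a set of in-frame coding sequences. The function name count_codon_pairs is hypothetical and is not part of codonpair; how CodonPair tabulates counts internally, and the exact layout of the pd.DataFrame it accepts, are not described here.

```python
from collections import Counter


def count_codon_pairs(sequences):
    """Count adjacent codon pairs (6-mers) over in-frame coding sequences.

    Hypothetical helper for illustration only; not the package's implementation.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        # Each codon is paired with its immediate downstream neighbor.
        counts.update(first + second for first, second in zip(codons, codons[1:]))
    return counts


pair_counts = count_codon_pairs(["ATGAAAAAAAAA", "atgaaa"])
```

Counts like these, taken over a whole reference genome, are the raw material from which codon pair scores are derived.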

Then calculate the codon pair bias of a sequence with CodonPair.cpb, which returns a dictionary with:

  • total_cps - the total codon pair score, i.e. the sum of the scores of all codon pairs
  • n_pair - the number of codon pairs, excluding pairs not found in the reference
  • cpb - the codon pair bias, total_cps/n_pair
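
The relationship between these three values can be sketched in plain Python. This is an illustration under stated assumptions, not the package's code: summarize_cpb is a hypothetical name, the score table is made-up toy data, and returning NaN when no pair is found is an assumption about a case the text above does not cover.

```python
def codon_pairs(seq):
    """Yield adjacent codon pairs (6-mers) from an in-frame coding sequence."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    for first, second in zip(codons, codons[1:]):
        yield first + second


def summarize_cpb(seq, scores):
    """Return total_cps, n_pair and cpb for seq given per-pair scores.

    Pairs missing from `scores` are skipped, mirroring the rule that n_pair
    excludes codon pairs not found in the reference.
    """
    found = [pair for pair in codon_pairs(seq.upper()) if pair in scores]
    total_cps = sum(scores[pair] for pair in found)
    n_pair = len(found)
    # Assumption: report NaN rather than raising when nothing could be scored.
    cpb = total_cps / n_pair if n_pair else float("nan")
    return {"total_cps": total_cps, "n_pair": n_pair, "cpb": cpb}


# Toy scores (not real CPS values); the pair CCCTTA is deliberately absent.
toy_scores = {"ATGATC": 0.25, "ATCCCC": -0.75}
result = summarize_cpb("ATGATCCCCTTA", toy_scores)
```

In this toy case three codon pairs are present in the sequence but only two are scored, so n_pair is 2 and cpb is the mean of the two known scores.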

For one-off calculations, codonpair.calc_cpb can be called directly with the sequence of interest (it calls the default constructor under the hood).

import codonpair
cp = codonpair.CodonPair.from_named_reference('E. coli')
cp.cpb("ATGATCCCCTTACAACATGGACTGATCCTCGCGGCAATCTTATTCGTTCTTGGCTTAACC")

For convenience, pip installs the executable cps into the path:

cps test.fasta > test.scores.txt
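
If you would rather score FASTA records from Python than from the command line, the reading step can be sketched as below. read_fasta is a hypothetical helper written for this example, not part of codonpair, and the output format of the cps executable is not described here, so this only shows the general shape.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


example = [">gene_a some description", "ATGATC", "CCCTTA", "", ">gene_b", "ATGCCC"]
records = list(read_fasta(example))
```

Each sequence yielded this way could then be passed to codonpair.calc_cpb or to the cpb method of a CodonPair object, as described above. To read from disk, pass an open file handle (for example, open("test.fasta")) in place of the list.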

Use CodonPair.write_reference to write the codon pair counts of a reference set to a given file, so they can be reused in future calculations.