
Searching for genes by homology with other genes

mattb112885 edited this page Apr 17, 2013 · 8 revisions

Finding genes homologous to a specific gene

Recall that you can get the ITEP gene ID for a gene of interest with a specific function or locus tag by using the db_getGenesWithAnnotation.py script:

$ db_getGenesWithAnnotation.py "Cbei_1843"

fig|290402.1.peg.1824 1-phosphofructokinase_YP_001308970.1_Cbei_1843

Many ITEP scripts are designed so that output such as this (which is printed to stdout by default) can be used as input to another script using pipes (|). One such script is db_getBlastResultsContainingGenes.py, which identifies all genes homologous to one or more input genes. We connect the results of the previous script to this one with the following syntax:

$ db_getGenesWithAnnotation.py "Cbei_1843" | db_getBlastResultsContainingGenes.py -g 1

This produces output like the following (only one line shown here):

fig|290402.1.peg.1824 fig|290402.1.peg.1824 100.0 300 0 0 1 300 1 300 5e-173 600.0 600.0 600.0

This is the standard (-m9) tab-delimited output from BLAST with some additional information appended to each row. The columns are, in order: query gene, target gene, percent identity, alignment length, number of mismatches, number of gap openings, query start, query end, target start, target end, E-value, bit score, query self-bit score, and target self-bit score.
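To make the column order above concrete, here is a minimal Python sketch that splits one output row into named fields. It is not part of ITEP: the names BlastHit and parse_blast_line are hypothetical, and the sketch assumes each row has exactly the 14 tab-separated columns listed above.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One output row, in the column order described above (hypothetical helper)."""
    query_gene: str
    target_gene: str
    percent_id: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    target_start: int
    target_end: int
    evalue: float
    bitscore: float
    query_self_bitscore: float
    target_self_bitscore: float


# One converter per column, in the same order as the fields of BlastHit.
CONVERTERS = [str, str, float, int, int, int, int, int, int, int,
              float, float, float, float]


def parse_blast_line(line: str) -> BlastHit:
    """Split a tab-delimited row and convert each column to its type."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CONVERTERS):
        raise ValueError(
            "expected %d columns, got %d" % (len(CONVERTERS), len(fields)))
    return BlastHit(*(conv(value) for conv, value in zip(CONVERTERS, fields)))


# The example row shown above, rebuilt with explicit tab separators.
example = "\t".join([
    "fig|290402.1.peg.1824", "fig|290402.1.peg.1824", "100.0", "300",
    "0", "0", "1", "300", "1", "300", "5e-173", "600.0", "600.0", "600.0",
])
hit = parse_blast_line(example)
print(hit.target_gene, hit.evalue, hit.bitscore)
```

Naming the fields avoids having to remember column positions when working with the results later.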

By default the script gives you BLASTP results. You can also get BLASTN results in the identical format by using the -n flag:

$ db_getGenesWithAnnotation.py "Cbei_1843" | db_getBlastResultsContainingGenes.py -g 1 -n

fig|290402.1.peg.1824 fig|290402.1.peg.1824 100.0 903 0 0 1 903 1 903 0.0 1629.0 1629.0 1629.0

You can specify an E-value cutoff for results to display with the -c flag.
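If you have already saved the results, a similar filter can be applied afterwards. The Python sketch below is not how the script implements -c; it is a hypothetical helper (filter_by_evalue) that assumes the E-value sits in the 11th tab-separated column and that rows at or below the cutoff should be kept. The second row is made up for illustration.

```python
def filter_by_evalue(lines, max_evalue, evalue_column=11):
    """Keep rows whose E-value (1-indexed column) is <= max_evalue."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if float(fields[evalue_column - 1]) <= max_evalue:
            kept.append(line)
    return kept


# First row: the self-hit shown earlier on this page.
strong_hit = "\t".join([
    "fig|290402.1.peg.1824", "fig|290402.1.peg.1824", "100.0", "300",
    "0", "0", "1", "300", "1", "300", "5e-173", "600.0", "600.0", "600.0",
])
# Second row: a hypothetical weak hit, invented purely for illustration.
weak_hit = "\t".join([
    "fig|290402.1.peg.1824", "hypothetical_gene_id", "25.0", "80",
    "60", "2", "10", "89", "15", "94", "1e-3", "35.0", "600.0", "500.0",
])

filtered = filter_by_evalue([strong_hit, weak_hit], max_evalue=1e-10)
print(len(filtered))
```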

Finally, if the gene IDs you want to query are in a different column of the input file, change the value of -g to match. As described in the help text, the -g argument is optional if the gene ID is in the first column. (Use the -h flag with any Python script to get its help text.)
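The idea behind the column argument can be sketched in Python as follows. This is an illustration rather than ITEP's own code: gene_ids_from_column is a hypothetical helper, and it assumes tab-delimited input and a 1-indexed column number, matching the use of -g 1 for the first column above.

```python
def gene_ids_from_column(lines, gene_column=1):
    """Yield the value in the 1-indexed gene_column of each non-empty row."""
    if gene_column < 1:
        raise ValueError("gene_column is 1-indexed and must be >= 1")
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield line.split("\t")[gene_column - 1]


# Gene ID in the first column, as in the db_getGenesWithAnnotation.py output above.
first_col = ["fig|290402.1.peg.1824\t1-phosphofructokinase_YP_001308970.1_Cbei_1843\n"]
# Same data with the columns swapped, so the gene ID is in column 2.
second_col = ["1-phosphofructokinase_YP_001308970.1_Cbei_1843\tfig|290402.1.peg.1824\n"]

ids_default = list(gene_ids_from_column(first_col))
ids_second = list(gene_ids_from_column(second_col, gene_column=2))
print(ids_default, ids_second)
```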
