Searching for genes by homology with other genes
Recall that you can get the ITEP gene ID for a gene with a specific function or locus tag by using the db_getGenesWithAnnotation.py script:
$ db_getGenesWithAnnotation.py "Cbei_1843"
fig|290402.1.peg.1824 1-phosphofructokinase_YP_001308970.1_Cbei_1843
Many ITEP scripts are designed to take output such as this (which is printed to stdout by default) and pass it as input to another script using pipes (|). One such script is db_getBlastResultsContainingGenes.py, which identifies all genes homologous to one or more input genes. We connect the output of the previous script to this one with the following syntax:
$ db_getGenesWithAnnotation.py "Cbei_1843" | db_getBlastResultsContainingGenes.py -g 1
This produces output like the following (only one line is shown here):
fig|290402.1.peg.1824 fig|290402.1.peg.1824 100.0 300 0 0 1 300 1 300 5e-173 600.0 600.0 600.0
This is the standard (-m9) tab-delimited output from BLAST with two additional columns appended to the end. The columns are, in order: query gene, target gene, percent identity, alignment length, number of mismatches, number of gap openings, query start, query end, target start, target end, E-value, bitscore, query self-bitscore, and target self-bitscore.
By default the script gives you BLASTP results. You can get BLASTN results in the same format by adding the -n flag:
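If you want to post-process these results in Python, the 14 columns listed above can be mapped onto named fields. The following is a minimal sketch and not part of ITEP: the names BlastHit and parse_blast_line are hypothetical, and it assumes each result line is tab-delimited with exactly the columns described above.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One result line; field names are illustrative, not ITEP's own."""
    query_gene: str
    target_gene: str
    percent_id: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    target_start: int
    target_end: int
    evalue: float
    bitscore: float
    query_self_bitscore: float
    target_self_bitscore: float


# One converter per column, in the same order as the fields above.
_CONVERTERS = (str, str, float, int, int, int, int,
               int, int, int, float, float, float, float)


def parse_blast_line(line: str) -> BlastHit:
    """Parse one tab-delimited result line into a BlastHit."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(_CONVERTERS):
        raise ValueError(
            f"expected {len(_CONVERTERS)} columns, got {len(fields)}"
        )
    return BlastHit(*(conv(f) for conv, f in zip(_CONVERTERS, fields)))
```

With something like this in place, results piped in on stdin could be read with a loop such as `for line in sys.stdin: hit = parse_blast_line(line)`, after which `hit.evalue` or `hit.bitscore` can be used for filtering.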
$ db_getGenesWithAnnotation.py "Cbei_1843" | db_getBlastResultsContainingGenes.py -g 1 -n
fig|290402.1.peg.1824 fig|290402.1.peg.1824 100.0 903 0 0 1 903 1 903 0.0 1629.0 1629.0 1629.0
Finally, if the gene IDs you want to query are in a different column of the input, change the value of -g to that column number. As described in the help text, the -g argument is optional when the gene ID is in the first column. (Use the -h flag with any of the Python scripts to see its help text.)
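To make the column convention concrete, here is a small Python sketch of what selecting gene IDs by a 1-based column number looks like. It is an illustration only, not ITEP's implementation: the function name gene_ids_from_column is hypothetical, and the input is assumed to be tab-delimited.

```python
from typing import Iterable, List


def gene_ids_from_column(lines: Iterable[str], column: int = 1) -> List[str]:
    """Collect the value in the given 1-based column of each tab-delimited line.

    column=1 means the first column, mirroring the "-g 1" usage shown above.
    Lines with too few columns are skipped.
    """
    if column < 1:
        raise ValueError("column numbers start at 1")
    ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            ids.append(fields[column - 1])
    return ids
```

For the annotation output shown earlier, the gene ID is in the first column, so the default of 1 would pick it up. A file with the gene ID in the third column would need `column=3`.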