This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Searching for functions using conserved domains

mattb112885 edited this page Jun 13, 2013 · 10 revisions

WARNING: The VM distribution cannot run RPSBLAST against the full CDD because of the memory limits of a 32-bit machine. A 64-bit guest in a VM works only if your host computer supports the necessary virtualization features, which many do not. As a result, main4.sh will fail unless you modify it to search only a particular database of interest (e.g. Pfam.pn instead of Cdd.pn) rather than the entire CDD.

Finding conserved domains in the CDD by keyword search

If you are interested in conserved domains that match a particular description, you can search the domain descriptions with the db_getExternalClustersByDescription.py function. It accepts any number of search strings, matches them case-insensitively against the descriptions, and returns every CDD domain whose description matches. For example, to find domains related to biotin synthase, run the following (some descriptions have been truncated for readability):

$ db_getExternalClustersByDescription.py "biotin synthase"
30848   COG0502 BioB    Biotin synthase and related enzymes [Coenzyme metabolism]       335
32586   COG2516 COG2516 Biotin synthase-related enzyme [General function prediction only]       339
178013  PLN02389        PLN02389        biotin synthase 379
180492  PRK06256        PRK06256        biotin synthase; Validated      336
180835  PRK07094        PRK07094        biotin synthase; Provisional    323
181453  PRK08508        PRK08508        biotin synthase; Provisional    279
185063  PRK15108        PRK15108        biotin synthase; Provisional    345
129447  TIGR00347       bioD    dethiobiotin synthase. [description truncated]       166
200012  TIGR00433       bioB    biotin synthase. [description truncated]     296
100105  cd01335 Radical_SAM     Radical SAM superfamily. ... Examples are biotin synthase (BioB),...  204
198863  cl06149 BATS    Biotin and Thiamin Synthesis associated domain. Biotin synthase (BioB), ...    0
148534  pfam06968       BATS    Biotin and Thiamin Synthesis associated domain. Biotin synthase (BioB), EC:2.8.1.6, c...   93
205678  pfam13500       AAA_26  AAA domain. ... found in a number of proteins involved in cofactor biosynthesis such as dethiobiotin synthase ...       197
197846  smart00729      Elp3    Elongator protein 3, MiaB family, Radical SAM. This superfamily contains ... biotin synthase ...    216
197944  smart00876      BATS    Biotin and Thiamin Synthesis associated domain...    94
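To build intuition for this matching, the Python sketch below filters a few of the entries shown above. It is not the script's code. The CddEntry record and the match_descriptions helper are hypothetical. The sketch assumes a domain matches when any search string occurs anywhere in its description, ignoring case. That assumption is consistent with "dethiobiotin synthase" appearing in the results for "biotin synthase".

```python
from typing import Iterable, List, NamedTuple


class CddEntry(NamedTuple):
    """One CDD domain, mirroring the columns printed above (assumed layout)."""
    cdd_id: int
    accession: str
    name: str
    description: str
    length: int  # final numeric column, assumed here to be the domain length


def match_descriptions(entries: Iterable[CddEntry], queries: Iterable[str]) -> List[CddEntry]:
    """Return entries whose description contains any query string, ignoring case."""
    lowered = [q.lower() for q in queries]
    return [e for e in entries if any(q in e.description.lower() for q in lowered)]


# A few rows from the output above (descriptions shortened)
entries = [
    CddEntry(30848, "COG0502", "BioB", "Biotin synthase and related enzymes [Coenzyme metabolism]", 335),
    CddEntry(129447, "TIGR00347", "bioD", "dethiobiotin synthase.", 166),
    CddEntry(148534, "pfam06968", "BATS",
             "Biotin and Thiamin Synthesis associated domain. Biotin synthase (BioB), EC:2.8.1.6", 93),
]

print([e.accession for e in match_descriptions(entries, ["BIOTIN SYNTHASE"])])
# ['COG0502', 'TIGR00347', 'pfam06968']
print([e.accession for e in match_descriptions(entries, ["dethiobiotin", "thiamin"])])
# ['TIGR00347', 'pfam06968']
```

Passing several strings widens the search, because an entry is kept if any one of them matches.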

You can also restrict the results to a specific database, such as Pfam:

$ db_getExternalClustersByDescription.py "biotin synthase" -d pfam
148534  pfam06968       BATS    Biotin and Thiamin Synthesis associated domain. Biotin synthase (BioB), EC:2.8.1.6, c...   93
205678  pfam13500       AAA_26  AAA domain. ... found in a number of proteins involved in cofactor biosynthesis such as dethiobiotin synthase ...       197
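One plausible way to implement the -d restriction is to compare the beginning of each CDD accession (COG, PRK, TIGR, cd, cl, pfam, smart and so on) with the requested database name. The sketch below does exactly that. The in_database and restrict_to_database helpers are hypothetical, and the real script may select databases differently.

```python
from typing import List, Tuple

# (accession, name) pairs taken from the keyword-search output above
hits: List[Tuple[str, str]] = [
    ("COG0502", "BioB"),
    ("TIGR00433", "bioB"),
    ("cd01335", "Radical_SAM"),
    ("cl06149", "BATS"),
    ("pfam06968", "BATS"),
    ("pfam13500", "AAA_26"),
    ("smart00876", "BATS"),
]


def in_database(accession: str, database: str) -> bool:
    """Hypothetical check: treat the accession prefix (e.g. 'pfam') as the source database."""
    return accession.lower().startswith(database.lower())


def restrict_to_database(hits: List[Tuple[str, str]], database: str) -> List[Tuple[str, str]]:
    """Keep only hits whose accession belongs to the requested database."""
    return [h for h in hits if in_database(h[0], database)]


print(restrict_to_database(hits, "pfam"))
# [('pfam06968', 'BATS'), ('pfam13500', 'AAA_26')]
```

A prefix test is simple, but it is only safe for full database names. A one-letter prefix such as "c" would match cd, cl and COG accessions alike.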

Searching for conserved domains associated with a protein

You can search for the conserved domains associated with a protein with the db_getExternalClusterGroups.py function. It reads a list of gene IDs from standard input and returns the RPSBLAST hits of those genes to the CDD.

The function can optionally append each cluster's name (e.g. BATS) or description to the results table. It can also apply an E-value cutoff lower (more stringent) than the default of 1E-5, or limit the printed results to a single conserved domain database (e.g. COG). See the function's help text for details.
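The cutoff and database options can be pictured as a simple post-filter over hits. This is a hypothetical sketch. RpsblastHit and filter_hits are illustrative names, and the hit values are made up. Hits are assumed to pass when their E-value is at or below the cutoff. Check the function's help text for its real options and semantics.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_EVALUE_CUTOFF = 1e-5  # default cutoff mentioned in the text


@dataclass
class RpsblastHit:
    """Hypothetical record for one RPSBLAST hit of a gene to a CDD domain."""
    gene_id: str
    accession: str
    evalue: float


def filter_hits(hits: List[RpsblastHit],
                cutoff: float = DEFAULT_EVALUE_CUTOFF,
                database: Optional[str] = None) -> List[RpsblastHit]:
    """Keep hits at or below the E-value cutoff, optionally from one database only."""
    kept = []
    for hit in hits:
        if hit.evalue > cutoff:
            continue  # too weak: larger E-values are less significant
        if database is not None and not hit.accession.lower().startswith(database.lower()):
            continue  # assumed: the accession prefix identifies the database
        kept.append(hit)
    return kept


# Made-up hits for illustration only
hits = [
    RpsblastHit("gene_1", "COG0502", 1e-120),
    RpsblastHit("gene_1", "pfam06968", 3e-20),
    RpsblastHit("gene_1", "cd01335", 2e-4),
]
print([h.accession for h in filter_hits(hits)])                    # ['COG0502', 'pfam06968']
print([h.accession for h in filter_hits(hits, cutoff=1e-50)])      # ['COG0502']
print([h.accession for h in filter_hits(hits, database="pfam")])   # ['pfam06968']
```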

NOTE: If you get the following error, it indicates that you have not run main4.sh (or that it failed):

error:
Traceback (most recent call last):
  File "[directory]/src/db_getExternalClusterGroups.py", line 47, in <module>
    cur.execute(cmd, (geneid, ) )
sqlite3.OperationalError: no such table: rpsblast_results
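Before running the search scripts, you can check whether the rpsblast_results table named in this error exists. The sketch below uses Python's standard sqlite3 module to query the sqlite_master catalog. The has_rpsblast_results helper is hypothetical. This page does not give the path to your database file, so the example uses an in-memory database.

```python
import sqlite3


def has_rpsblast_results(conn: sqlite3.Connection) -> bool:
    """Return True if a table named rpsblast_results exists in this SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        ("rpsblast_results",),
    ).fetchone()
    return row is not None


# Demonstration on a throwaway in-memory database; pass sqlite3.connect()
# the path to your own database file instead.
conn = sqlite3.connect(":memory:")
print(has_rpsblast_results(conn))  # False: table missing, as when main4.sh has not run
conn.execute("CREATE TABLE rpsblast_results (placeholder TEXT)")  # placeholder schema, not the real one
print(has_rpsblast_results(conn))  # True
```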

Searching for proteins associated with a conserved domain

You can perform the reverse search (finding proteins that match a domain, such as pfam00001) with the db_getHitsToExternalClusters.py function. It takes a list of external cluster IDs as input and returns the RPSBLAST hits to those clusters, including their names and descriptions.
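Conceptually, the reverse search groups hits by cluster rather than by gene. The sketch below inverts a list of (gene ID, cluster ID) pairs into a mapping from each requested cluster to the genes that hit it. The genes_by_cluster helper is hypothetical, and the gene IDs are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def genes_by_cluster(hits: Iterable[Tuple[str, str]],
                     wanted: Iterable[str]) -> Dict[str, List[str]]:
    """Group (gene_id, cluster_id) hits by cluster, keeping only the requested clusters."""
    wanted_set = set(wanted)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for gene_id, cluster_id in hits:
        if cluster_id in wanted_set:
            grouped[cluster_id].append(gene_id)
    return dict(grouped)


# Made-up (gene, cluster) pairs for illustration only
hits = [
    ("gene_1", "pfam06968"),
    ("gene_1", "COG0502"),
    ("gene_2", "pfam06968"),
    ("gene_3", "pfam00001"),
]
print(genes_by_cluster(hits, ["pfam06968", "pfam00001"]))
# {'pfam06968': ['gene_1', 'gene_2'], 'pfam00001': ['gene_3']}
```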

Visualizing conserved domains associated with a protein

You can visualize the locations and strengths (E-values) of a protein's hits to conserved domain databases with the db_displayExternalClusterHits.py function. It takes a list of gene IDs as input and produces a PNG file showing the position and name of each sufficiently strong hit to an external domain relative to the gene. The strongest hits appear at the bottom.
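Placing the strongest hits at the bottom amounts to sorting hits by decreasing E-value from top to bottom. The sketch below computes only that row order, not the PNG itself. DomainHit and layout_rows are hypothetical names, and the hits are made up. The 1E-5 threshold for "sufficiently strong" is an assumption borrowed from the default E-value cutoff.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    """Hypothetical hit record: domain name, position on the gene, and E-value."""
    name: str
    start: int
    end: int
    evalue: float


def layout_rows(hits: List[DomainHit], cutoff: float = 1e-5) -> List[DomainHit]:
    """Order sufficiently strong hits top-to-bottom so the strongest ends up at the bottom.

    Row 0 is the top of the image; a smaller E-value means a stronger hit.
    """
    strong = [h for h in hits if h.evalue <= cutoff]
    return sorted(strong, key=lambda h: h.evalue, reverse=True)


# Made-up hits on a 335-residue protein, for illustration only
hits = [
    DomainHit("Radical_SAM", 40, 250, 1e-30),
    DomainHit("BATS", 230, 320, 1e-25),
    DomainHit("BioB", 1, 335, 1e-120),
    DomainHit("weak_hit", 10, 60, 0.5),
]
for row, hit in enumerate(layout_rows(hits)):
    print(row, hit.name, hit.start, hit.end)
# 0 BATS 230 320
# 1 Radical_SAM 40 250
# 2 BioB 1 335
```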
