This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Generating draft metabolic reconstructions from a reference

mattb112885 edited this page Apr 19, 2013 · 5 revisions

The concept of a GPR

A GPR (gene-protein-reaction relationship) is a Boolean relationship between genes that specifies which combinations of genes are necessary and/or sufficient for a reaction to be catalyzed in a cell. GPRs are commonly found in genome-scale metabolic reconstructions of specific organisms, which attempt to catalog all of the metabolic reactions in a cell. After gap filling and some mathematical transformations, these reconstructions can often be used to make quantitative predictions of cellular phenotypes (see review).

Below is an example of a GPR from the C. beijerinckii metabolic model, which we will use to illustrate how ITEP propagates reaction calls. The same set of commands can be used with any number of GPRs, so you can attempt to propagate an entire model to other organisms in ITEP.

PFK  ( Cbei_0998 or Cbei_4852 )

The first column here is the reaction name assigned in the model, which can be anything. The second column contains the gene-protein-reaction relationship as assigned in the model.
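The two-column layout described above can be illustrated with a minimal Python sketch. It is not part of ITEP: the function name parse_gpr_line is hypothetical, and the sketch assumes the columns are separated by a single tab and that gene IDs contain no whitespace or parentheses.

```python
def parse_gpr_line(line):
    """Split one tab-delimited line into (reaction_id, rule, gene_ids)."""
    # Column 1 is the reaction name; column 2 is the Boolean GPR rule.
    reaction_id, rule = line.rstrip("\n").split("\t", 1)
    rule = rule.strip()
    # Anything that is not a parenthesis or a Boolean operator is taken
    # to be a gene ID (an assumption of this sketch).
    tokens = rule.replace("(", " ").replace(")", " ").split()
    gene_ids = [t for t in tokens if t.lower() not in ("and", "or")]
    return reaction_id, rule, gene_ids


reaction_id, rule, gene_ids = parse_gpr_line("PFK\t( Cbei_0998 or Cbei_4852 )")
print(reaction_id, gene_ids)
```

Listing the gene IDs this way is a quick check that every gene in the model's GPR is a locus tag you expect to find in your GenBank files.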

Converting the gene IDs to ITEP IDs

To use this GPR in ITEP, you have to convert the gene IDs to ITEP gene IDs. Fortunately, most genome-scale metabolic reconstructions use locus tags in their GPRs, and most GenBank files include locus tags (particularly those from RefSeq or from the PubSEED).

To perform an ID conversion, first save all of the GPRs in a tab-delimited file (called "TEST_GPR" here), with the first column containing the reaction ID and the second containing the GPR (as above). Then run the following command, where $root is replaced by the root directory of your repository:

$ replaceAliasesWithGeneNames.py $root/aliases/aliases TEST_GPR > TEST_GPR_REPLACED

The first argument is the "aliases file" that is automatically created when processing the GenBank files. If all goes well, you should get output that looks like this:

PFK     ( fig|290402.1.peg.992 or fig|290402.1.peg.4768 )

It is worth checking whether any of the conversions failed. If a conversion fails, the program simply prints the GPR with the same IDs that were in the input. In these cases, you cannot use ITEP to evaluate the reaction, because the gene IDs could not be found in the GenBank files you provided.
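One way to spot failed conversions is to look for gene IDs in the output that do not look like ITEP IDs. The following Python sketch is not an ITEP script. It assumes that every ITEP gene ID follows the fig|<number>.<number>.peg.<number> pattern seen in the example output above, and the function name unconverted_genes is hypothetical.

```python
import re

# Assumed ITEP ID pattern, inferred from the example "fig|290402.1.peg.992".
ITEP_ID = re.compile(r"^fig\|\d+\.\d+\.peg\.\d+$")


def unconverted_genes(line):
    """Return (reaction_id, gene IDs on this line that are not ITEP IDs)."""
    reaction_id, rule = line.rstrip("\n").split("\t", 1)
    tokens = rule.replace("(", " ").replace(")", " ").split()
    genes = [t for t in tokens if t.lower() not in ("and", "or")]
    return reaction_id, [g for g in genes if not ITEP_ID.match(g)]


# A line in which the second locus tag was left unconverted.
print(unconverted_genes("PFK\t( fig|290402.1.peg.992 or Cbei_4852 )"))
```

Applying this function to each line of TEST_GPR_REPLACED and keeping the lines with a non-empty list gives the reactions that need manual attention.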

Evaluating the presence and absence of the reaction

To evaluate the GPR, you need to have run a clustering analysis to get presence/absence calls for each of the genes in it. Recall that you can get a list of all available cluster runs by running:

$ db_getAllClusterRuns.py
all_I_2.0_c_0.4_m_maxbit

Let's use the results of this cluster run to evaluate whether reaction PFK is present in C. novyi and A. woodii. The script to do this is db_evaluateReactionsFromGpr.py:

$ db_evaluateReactionsFromGpr.py -g TEST_GPR_REPLACED -i all_I_2.0_c_0.4_m_maxbit
orgs    Clostridium_beijerinckii_NCIMB_8052     Clostridium_novyi_NT    Acetobacterium_woodii_DSM_1030
PFK     1       1       1

Ands and Ors

A Boolean rule can connect genes with AND or OR. Connecting two genes with an AND implies that both genes must be present for the function to occur, while connecting them with an OR implies that either gene alone is sufficient. More complicated combinations are also possible, such as ( (A AND B) OR C ), which means that C alone is sufficient to perform the function, but if C is not present then both A and B must be present.
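The logic of nested rules can be made concrete with a small Python sketch that evaluates a rule against a set of present genes. This is an illustration only and not how ITEP implements its evaluation. The function name evaluate_gpr is hypothetical, and the sketch assumes well-formed rules in which "and" binds more tightly than "or" when parentheses do not say otherwise.

```python
def evaluate_gpr(rule, present):
    """Return True if the Boolean rule is satisfied by the present genes."""
    tokens = rule.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always parse, so tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()  # always parse, so tokens are consumed
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return token in present  # a gene ID: is it present?

    return parse_or()


rule = "( ( A and B ) or C )"
print(evaluate_gpr(rule, {"C"}))       # C alone is sufficient
print(evaluate_gpr(rule, {"A"}))       # A without B is not
print(evaluate_gpr(rule, {"A", "B"}))  # A and B together are sufficient
```

The three calls correspond to the three cases described above: C alone, A alone, and A together with B.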

Now, let's see what happens if you use "and" instead of "or" in the Boolean rule. Create a new file with the following GPR in it:

PFK     ( fig|290402.1.peg.992 and fig|290402.1.peg.4768 )

Save it as "TEST_GPR_AND" and run:

$ db_evaluateReactionsFromGpr.py -g TEST_GPR_AND -i all_I_2.0_c_0.4_m_maxbit
orgs    Clostridium_beijerinckii_NCIMB_8052     Clostridium_novyi_NT    Acetobacterium_woodii_DSM_1030
PFK     1       0       0

This result reflects the finding (explored in greater depth in an earlier tutorial) that only one of the PFK-6 homologs in C. beijerinckii is actually conserved in the other organisms in the database. If we require both homologs to be present, the reaction is therefore predicted to be absent in the other two organisms.

If you have GPRs with "and" in them and want to see whether ANY of the genes required for a reaction are predicted to be present, use the -o flag to change all instances of "and" to "or". The reaction will then be predicted present if any one of its genes is predicted to be present.
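The effect of relaxing a rule in this way can be sketched in a few lines of Python. This shows only what the substitution does to the rule text. It is not the script's actual implementation, and the function name relax_rule is hypothetical.

```python
import re


def relax_rule(rule):
    """Replace every standalone "and" in a GPR rule with "or"."""
    # \b keeps the substitution from touching gene IDs that merely
    # contain the letters "and" inside a longer word.
    return re.sub(r"\band\b", "or", rule, flags=re.IGNORECASE)


print(relax_rule("( fig|290402.1.peg.992 and fig|290402.1.peg.4768 )"))
```

For the example above, the relaxed rule is identical to the original "or" version of the PFK rule.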

Finally, note that the predictions are very sensitive to the clustering parameters and cutoffs. It is worth running the evaluation with several different cutoffs and then using other ITEP scripts to take a closer look at the reactions whose calls change between runs.
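To find the calls that change between two runs, you can compare their output tables. The Python sketch below is not an ITEP script, and the function names read_calls and differing_calls are hypothetical. It assumes the output of db_evaluateReactionsFromGpr.py was saved as text, with an "orgs" header row and no whitespace inside organism names. Since only one cluster run appears in this tutorial, the demonstration compares the "or" and "and" tables shown earlier in place of two real cluster runs.

```python
def read_calls(text):
    """Parse a presence/absence table into {reaction: {organism: call}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    organisms = lines[0].split()[1:]  # drop the leading "orgs" label
    calls = {}
    for line in lines[1:]:
        fields = line.split()
        calls[fields[0]] = dict(zip(organisms, fields[1:]))
    return calls


def differing_calls(first, second):
    """List (reaction, organism, call_in_first, call_in_second) mismatches."""
    diffs = []
    for reaction in sorted(set(first) & set(second)):
        for organism, call in first[reaction].items():
            other = second[reaction].get(organism)
            if other is not None and other != call:
                diffs.append((reaction, organism, call, other))
    return diffs


header = ("orgs\tClostridium_beijerinckii_NCIMB_8052\t"
          "Clostridium_novyi_NT\tAcetobacterium_woodii_DSM_1030\n")
table_or = header + "PFK\t1\t1\t1\n"   # output for TEST_GPR_REPLACED
table_and = header + "PFK\t1\t0\t0\n"  # output for TEST_GPR_AND

for diff in differing_calls(read_calls(table_or), read_calls(table_and)):
    print(diff)
```

Reactions that appear in this list are the ones worth inspecting more closely.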
