Map2Slim
Given a GO slim file and a current ontology (in one or more files), this script maps a gene association file (containing annotations to the full GO) to the terms in the GO slim.
The script can be used either to create a new gene association file, which contains the most pertinent GO slim accessions, or in count mode, in which case it gives distinct gene product counts for each slim term.
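In count mode, the number that matters is distinct gene products per slim term, not annotation lines. A minimal Python sketch of that counting step, assuming the annotations have already been mapped to slim terms. The accessions and the `distinct_counts` helper are hypothetical placeholders, not part of map2slim:

```python
from collections import defaultdict

# Hypothetical (gene product, slim term) pairs left after mapping a GAF to a slim.
# A gene product can appear more than once for a slim term, e.g. when several
# of its annotations map to that term; it must be counted only once.
mapped = [
    ("UniProtKB:P12345", "GO:0008150"),
    ("UniProtKB:P12345", "GO:0008150"),
    ("UniProtKB:Q67890", "GO:0008150"),
    ("UniProtKB:Q67890", "GO:0003674"),
]


def distinct_counts(pairs):
    """Return {slim term: number of distinct gene products annotated to it}."""
    products = defaultdict(set)
    for gene_product, slim_term in pairs:
        products[slim_term].add(gene_product)
    return {term: len(gps) for term, gps in products.items()}


print(distinct_counts(mapped))  # {'GO:0008150': 2, 'GO:0003674': 1}
```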
The association file format is described here:
http://geneontology.org/page/go-annotation-file-formats
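For orientation, a minimal Python sketch that extracts the two fields slimming works on: the gene product and the GO ID. It assumes the tab-separated GAF 2.x layout (column 1 = DB, column 2 = DB Object ID, column 5 = GO ID, header lines starting with `!`). `read_gaf_pairs` is a hypothetical helper, not part of OWLTools:

```python
import io


def read_gaf_pairs(handle):
    """Yield (gene product, GO ID) pairs from a GAF 2.x stream.

    Assumes tab-separated columns: 1 = DB, 2 = DB Object ID, 5 = GO ID.
    Header/comment lines starting with '!' and blank lines are skipped.
    """
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        yield f"{cols[0]}:{cols[1]}", cols[4]


# A one-annotation GAF with placeholder values (17 tab-separated columns).
sample = io.StringIO(
    "!gaf-version: 2.2\n"
    "UniProtKB\tP12345\tabc1\tinvolved_in\tGO:0006915\tPMID:1\tIDA\t\tP"
    "\t\t\tprotein\ttaxon:9606\t20200101\tUniProt\t\t\n"
)
pairs = list(read_gaf_pairs(sample))
print(pairs)  # [('UniProtKB:P12345', 'GO:0006915')]
```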
GO is a Directed Acyclic Graph (DAG), not a tree. This means that there is often more than one path from a GO term up to the root Gene_Ontology node, and these paths may pass through different terms in the slim ontology, so one annotation can map to multiple slim terms.
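To make the point about multiple paths concrete, here is a small Python sketch over a hypothetical DAG. Term 1 is the root, and the edges were chosen to agree with the ID table further down this page, not taken from GO. `paths_to_root` is an illustrative helper that lists every path from a term up to the root:

```python
# Hypothetical DAG: each term maps to its parents; term 1 is the root.
PARENTS = {
    1: [], 2: [1], 3: [1], 4: [3],
    5: [2, 3], 6: [3], 7: [4], 8: [6], 9: [4, 6], 10: [5],
}


def paths_to_root(term, parents):
    """Return every path from `term` up to a root, each as a list of term IDs."""
    if not parents[term]:
        return [[term]]
    return [[term] + path
            for parent in parents[term]
            for path in paths_to_root(parent, parents)]


for path in paths_to_root(9, PARENTS):
    print(" -> ".join(map(str, path)))
# 9 -> 4 -> 3 -> 1
# 9 -> 6 -> 3 -> 1
```

Term 9 reaches the root along two distinct paths; in a tree there would be exactly one.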
GO also uses multiple relations (object properties), and which of them map2slim considers for slimming depends on which GO file you use. We recommend using the go-basic version of the ontology, which contains:
- subClassOf (is a)
- part of
- regulates (+ positively and negatively regulates)
You can also use the full version of GO and filter out the relationships you do not want to be considered.
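Conceptually, restricting the full GO to the relations listed above means dropping every edge whose relation is not in the kept set. A minimal Python sketch, assuming edges are available as hypothetical (child, relation, parent) triples with OBO-style relation names such as `is_a` and `part_of`. It illustrates the idea only and is not how OWLTools performs the filtering:

```python
# Hypothetical edges as (child, relation, parent); the term IDs are placeholders.
EDGES = [
    ("GO:A", "is_a", "GO:B"),
    ("GO:A", "part_of", "GO:C"),
    ("GO:D", "positively_regulates", "GO:A"),
    ("GO:E", "has_part", "GO:F"),
]

# Relations kept for slimming, mirroring the go-basic list above.
KEEP = {"is_a", "part_of", "regulates",
        "positively_regulates", "negatively_regulates"}


def filter_edges(edges, keep):
    """Keep only the edges whose relation is in `keep`."""
    return [edge for edge in edges if edge[1] in keep]


for edge in filter_edges(EDGES, KEEP):
    print(edge)
```

Here the `has_part` edge is dropped because `has_part` is not among the relations listed above.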
In a hypothetical example, blue circles show terms in the GO slim and yellow circles show terms in the full ontology. The full ontology subsumes the slim, so the blue terms are also in the ontology.
GO ID   MAPS TO SLIM ID   ALL SLIM ANCESTORS
=====   ===============   ==================
5       2+3               2,3,1
6       3 only            3,1
7       4 only            4,3,1
8       3 only            3,1
9       4 only            4,3,1
10      2+3               2,3,1
The 2nd column shows the most pertinent slim ID(s), i.e. the direct mapping. The 3rd column shows all ancestors in the slim.
Note in particular the mapping of ID 9: although it has two paths to the root through the slim, via 3 and via 4, 3 is discarded as redundant because it is an ancestor of 4, so only the more specific term 4 is kept.
On the other hand, 10 maps to both 2 and 3 because each is the first slim ID on one of the two valid paths to the root, and neither subsumes the other.
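The 3rd column is the set of slim terms among a term's ancestors, counting the term itself. A minimal Python sketch that reproduces it, assuming a hypothetical graph whose edges were chosen to agree with the table (term 1 is the root, slim terms are 1 to 4); `ancestors` is an illustrative helper:

```python
# Hypothetical DAG (term -> parents) chosen to agree with the table; 1 is the root.
PARENTS = {
    1: [], 2: [1], 3: [1], 4: [3],
    5: [2, 3], 6: [3], 7: [4], 8: [6], 9: [4, 6], 10: [5],
}
SLIM = {1, 2, 3, 4}


def ancestors(term, parents):
    """Return `term` together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


for term in range(5, 11):
    print(term, sorted(ancestors(term, PARENTS) & SLIM))
```

The printed sets match the 3rd column, listed in ascending order.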
The algorithm used is:
- to map any one term in the full ontology, find all valid paths to the root node in the full ontology
- for each path, take the first slim term encountered on the path
- discard any redundant slim terms in this set, i.e. slim terms that subsume (are ancestors of) other slim terms in the set
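The three steps can be transcribed almost literally into Python. This is a sketch over a hypothetical graph (term 1 is the root, slim terms are 1 to 4, edges chosen to agree with the table above), not the OWLTools implementation; `map_to_slim` and the other helpers are illustrative names:

```python
# Hypothetical DAG (term -> parents) chosen to agree with the table; 1 is the root.
PARENTS = {
    1: [], 2: [1], 3: [1], 4: [3],
    5: [2, 3], 6: [3], 7: [4], 8: [6], 9: [4, 6], 10: [5],
}
SLIM = {1, 2, 3, 4}


def paths_to_root(term, parents):
    """Step 1: every path from `term` up to a root."""
    if not parents[term]:
        return [[term]]
    return [[term] + path
            for parent in parents[term]
            for path in paths_to_root(parent, parents)]


def proper_ancestors(term, parents):
    """All ancestors of `term`, excluding `term` itself."""
    seen, stack = set(), list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def map_to_slim(term, parents, slim):
    # Step 2: the first slim term met on each path (the term itself counts).
    hits = set()
    for path in paths_to_root(term, parents):
        first = next((t for t in path if t in slim), None)
        if first is not None:  # a path with no slim term contributes nothing
            hits.add(first)
    # Step 3: drop hits that are ancestors of (subsume) another hit.
    return {h for h in hits
            if not any(h in proper_ancestors(other, parents)
                       for other in hits if other != h)}


for term in range(5, 11):
    print(term, sorted(map_to_slim(term, PARENTS, SLIM)))
```

The output reproduces the 2nd column of the table. Enumerating every path can grow exponentially on a graph the size of GO, so a practical implementation would likely avoid materialising paths; the sketch keeps them only to mirror the steps as written.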
OWLTools provides a dedicated option for map2slim (--map2slim). The general workflow is as follows:
- Load the ontology
  OWLTools can load local ontology files or PURLs.
- Load the GAF
  OWLTools expects Gene Association Files (GAFs) as local files; use:
  --gaf FILE
- Select the subset
  There are two options to define the relevant subset:
  - use an existing subset:
    --subset NAME
  OR
  - use a custom set of identifiers:
    --idfile FILE
    The id file is expected to contain a single identifier per line.
- Save the modified GAF
  Set the output file for the mapped annotations using:
  --write-gaf FILE
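If a custom slim is assembled programmatically, the file for --idfile is just the identifiers, one per line. A minimal Python sketch; `format_id_file` and `read_id_file` are hypothetical helpers, the IDs are placeholders, and skipping blank lines on read is an assumption, not documented OWLTools behavior:

```python
import io


def format_id_file(term_ids):
    """Render term IDs in the id-file layout: one identifier per line."""
    return "".join(f"{term_id}\n" for term_id in term_ids)


def read_id_file(handle):
    """Read an id file back, ignoring blank lines and surrounding whitespace."""
    return [line.strip() for line in handle if line.strip()]


slim_ids = ["GO:0008150", "GO:0003674", "GO:0005575"]
text = format_id_file(slim_ids)
print(read_id_file(io.StringIO(text)))  # ['GO:0008150', 'GO:0003674', 'GO:0005575']
```

Writing `text` to a file such as `slim.terms` yields input suitable for `--idfile slim.terms`.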
Example command lines:
- using a custom slim from an id file:
owltools go.obo --gaf annotations.gaf --map2slim --idfile slim.terms --write-gaf annotations.mapped.gaf
- using an existing slim
owltools go.obo --gaf annotations.gaf --map2slim --subset goslim_pombe --write-gaf annotations.mapped.gaf
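For batch runs, these command lines can be generated from a script. A minimal Python sketch that reproduces only the flags shown above; `map2slim_argv` and `run_map2slim` are hypothetical helpers, and `owltools` is assumed to be on the PATH:

```python
import subprocess


def map2slim_argv(ontology, gaf, out, subset=None, idfile=None):
    """Build the owltools argument list for one map2slim run.

    Exactly one of `subset` (--subset NAME) or `idfile` (--idfile FILE) is required.
    """
    if (subset is None) == (idfile is None):
        raise ValueError("pass exactly one of subset or idfile")
    argv = ["owltools", ontology, "--gaf", gaf, "--map2slim"]
    argv += ["--subset", subset] if subset is not None else ["--idfile", idfile]
    argv += ["--write-gaf", out]
    return argv


def run_map2slim(**kwargs):
    """Run owltools; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(map2slim_argv(**kwargs), check=True)


print(" ".join(map2slim_argv("go.obo", "annotations.gaf",
                             "annotations.mapped.gaf", subset="goslim_pombe")))
```

Building the argument list separately from running it keeps the command easy to log or inspect before invoking OWLTools.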
General information about getting and using OWLTools can be found at https://github.com/owlcollab/owltools/wiki/Install-OWLTools