Home
- Installation
- The workflow
- Step 1 and 2: reading and processing data
- Step 3: running ontoFAST interactively
- Step 4: visualize and query annotations
- Step 5: save your data
To install and run ontoFAST
you need to have R and RStudio preinstalled.
install.packages("ontoFAST")
install.packages("igraph")
library("ontoFAST")
The development version of ontoFAST
can be installed directly from the ontoFAST repository on GitHub:
install.packages("devtools")
library("devtools")
install_github("sergeitarasov/ontoFAST")
install.packages("igraph")
library("ontoFAST")
The workflow of ontoFAST
consists of the following steps:
- Read in the required ontology and character statements:
  - The ontology has to be in the .obo file format or in any R file format (e.g., .rmd, .rda). Ontology .obo files can be downloaded from BioPortal or other repositories.
  - Character statements must be in a table file format. The tables can be imported into R in .csv format. If your character matrix is in nexus or tnt format, open it in Mesquite, copy the character statements to a spreadsheet (e.g., Excel), and save the spreadsheet as a .csv file. An example of the table file format for character statements is shown below. ontoFAST is designed to annotate the statements from one table cell only, so merge the "character statements" and "states" columns if you need both during the annotation.

    |   | CHARACTER_STATEMENTS | STATE_1 | STATE_2 |
    |---|---|---|---|
    | 1 | Ocellar corona | absent | present |
    | 2 | Supraantennal groove or depression | absent | present, accommodating scapes |
    | 3 | Notch on medial margin of eye | absent | present |

- Run automatic annotation of characters with ontology terms using ontoFAST functions. This step is optional and can be skipped if not required.
- Run ontoFAST interactively to make de novo character annotations, post-process automatic annotations, or edit previous annotations. The interactive mode visualizes the ontology as a graph, which provides a convenient way to navigate through it.
- As soon as the annotations are done, you can:
  - visualize annotations using sunburst plots to show hierarchical relationships, or Cytoscape to see the network structure.
  - query annotations using the built-in ontoFAST functions.
- Save your results.
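Merging the statement and state columns into a single cell can be done in base R before annotation. The sketch below assumes a table laid out like the example above; the column name MERGED and the separators are illustrative choices, not something ontoFAST requires.

```r
# Made-up table mirroring the example layout (statement plus two states)
chars <- data.frame(
  CHARACTER_STATEMENTS = c("Ocellar corona",
                           "Supraantennal groove or depression",
                           "Notch on medial margin of eye"),
  STATE_1 = c("absent", "absent", "absent"),
  STATE_2 = c("present", "present, accommodating scapes", "present"),
  stringsAsFactors = FALSE
)

# Combine statement and states into one string per character, so that
# everything to be annotated sits in a single cell
chars$MERGED <- paste0(chars$CHARACTER_STATEMENTS, ": ",
                       chars$STATE_1, "; ", chars$STATE_2)

print(chars$MERGED)
```

The merged column can then be used in place of the first column wherever the character statements are passed on for annotation.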
First, we need to read in the required ontology and character statements. This can be done either (a) quickly, with the built-in functions that process the ontology and characters, or (b) manually, by creating the required ontology object step by step. The latter is useful if finer control over the information stored in the ontology is needed.
Let's first read in the ontology. In this example, I use the Hymenoptera Anatomy Ontology, which is available as an embedded data set (HAO):
data(HAO)
hao_obo<-HAO
Alternatively, the ontology can be parsed directly from an .obo file using the get_OBO function from the ontologyIndex package.
# get_OBO comes from the ontologyIndex package
library("ontologyIndex")
hao_obo<-get_OBO(system.file("data_onto", "HAO.obo", package = "ontoFAST"),extract_tags="everything", propagate_relationships = c("BFO:0000050", "is_a"))
The ontology object (i.e., hao_obo) is a list with numerous sublists that contain information about ontology terms and their relationships.
Now, let's read in the character statements. Character statements stored in a table format can be imported into R using the read.csv function. Here, I use the embedded data set from the morphological phylogeny of Sharkey et al. (2011). This data set contains 392 characters. It is available both as an embedded R data set and as a .csv file.
data(Sharkey_2011)
Sharkey_characters<-Sharkey_2011
# using csv file
Sharkey_characters<-read.csv(system.file("data_onto", "Sharkey_2011.csv", package = "ontoFAST"), header=T, stringsAsFactors = F, na.strings = "")
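As a rough illustration of what these read.csv arguments do, the following base R sketch writes a small, made-up table to a temporary file and reads it back. The file contents and the toy_ names are hypothetical and only stand in for a real character table.

```r
# Write a tiny, made-up character table to a temporary .csv file
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "CHARACTER_STATEMENTS,STATE_1,STATE_2",
  "Ocellar corona,absent,present",
  "Notch on medial margin of eye,absent,"
), csv_file)

# Same arguments as in the call above: keep text as character vectors
# and treat empty cells as missing values
toy_characters <- read.csv(csv_file, header = TRUE,
                           stringsAsFactors = FALSE, na.strings = "")

# The first column holds the statements that are passed on for annotation
toy_statements <- toy_characters[, 1]

print(toy_statements)
print(is.na(toy_characters$STATE_2))
```

Reading empty cells as NA makes it easy to spot characters with missing states before they are merged or annotated.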
To process the data automatically, use the onto_process function. This function parses synonyms, character statements, and character IDs into the hao_obo object. By default, it also performs automatic annotation of character statements with ontology terms (do.annot = TRUE).
hao_obo<-onto_process(hao_obo, Sharkey_characters[,1], do.annot = TRUE)
The automatic processing of the data is done. Now you can proceed to the interactive editing of your data; see Step 3.
Manual processing assumes that all elements of the ontology object are created manually.
First, let's read in the ontology and characters:
hao_obo<-HAO
Sharkey_characters<-Sharkey_2011
Let's create IDs which will be used to call the characters. All data have to be placed in the ontology object (here hao_obo
).
id_characters<-paste("CHAR:",c(1:392), sep="")
hao_obo$id_characters<-id_characters
Now, let's add character statements to hao_obo
and associate them with the IDs.
name_characters<-Sharkey_characters[,1]
names(name_characters)<-id_characters
hao_obo$name_characters<-name_characters
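The ID vector above hard-codes the number of characters (392). For other data sets, the count can be derived from the statements themselves. The following base R sketch uses three made-up statements; the toy_ names are hypothetical and are not part of ontoFAST.

```r
# Made-up statements standing in for Sharkey_characters[, 1]
toy_statements <- c("Ocellar corona",
                    "Supraantennal groove or depression",
                    "Notch on medial margin of eye")

# Derive one ID per statement instead of hard-coding the count
toy_ids <- paste0("CHAR:", seq_along(toy_statements))

# Name the statements by their IDs, as done for name_characters above
names(toy_statements) <- toy_ids

print(toy_ids)
print(toy_statements[["CHAR:2"]])
```

Because the statements are named by ID, a single character can be looked up by its ID string rather than by its position.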
To make automatic annotation more efficient, we can use the synonyms of ontology terms. The synonyms are stored in hao_obo$synonym; they have to be pre-processed to be available for automatic annotation. To do this, we use the syn_extract() function.
hao_obo$parsed_synonyms<-syn_extract(hao_obo)
Now, we can run the automatic annotation.
hao_obo$auto_annot_characters<-annot_all_chars(hao_obo, use.synonyms=TRUE, min_set=TRUE)
The manual processing is done; proceed to Step 3 to run the interactive session.
First, we need to create a new environment to store the variable that serves as both input and output of the interactive mode. The new environment must be called ontofast (other names will not work); it makes the input variable globally available to the functions that operate during the interactive session. ontoFAST will not work correctly without this environment, and it is recommended to set the empty environment as its parent. Next, use the function make_shiny_in() to create an object in the ontofast environment; let's call it shiny_in. Note that the name of this variable is passed to runOntoFast() as the argument shiny_in="shiny_in" to launch the interactive session. Executing the runOntoFast() line starts the interactive session. It may take a few seconds until all characters are loaded.
ontofast <- new.env(parent = emptyenv())
ontofast$shiny_in <- make_shiny_in(hao_obo)
runOntoFast(is_a = c("is_a"), part_of = c("BFO:0000050"), shiny_in="shiny_in", file2save = "OntoFAST_shiny_in.RData")
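The role of the dedicated environment can be illustrated with base R alone. The sketch below uses a placeholder list instead of the real make_shiny_in() result, and retrieving the object with get() is only one plausible way a function could find it by name; how runOntoFast() does this internally is not shown here.

```r
# An environment whose parent is the empty environment: lookups in it
# never fall through to the global workspace
ontofast <- new.env(parent = emptyenv())

# Placeholder standing in for the result of make_shiny_in(hao_obo)
ontofast$shiny_in <- list(terms_selected_id = list())

# The object can be retrieved from the environment by its name as a string
obj <- get("shiny_in", envir = ontofast)

# Names defined elsewhere are not visible through this environment
x_global <- 1
found_global <- exists("x_global", envir = ontofast, inherits = TRUE)

print(names(obj))
print(found_global)
```

Keeping the object in its own environment means the interactive session reads and writes a single, well-defined location, regardless of what else is in the workspace.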
The interactive interface consists of four panels. The ontology panel shows an interactive graph where nodes are ontology classes and edges are usually part_of and is_a relationships. The customize panel at the top of the window allows you to select the relationships to display and to navigate to a required term by typing a few letters of its name. The information panel shows the ID, synonyms, and definition of the term selected in the navigation panel. The leftmost character panel shows the character statements.
There are three ways to annotate characters:
- If you ran fuzzy matching with the onto_process() function, the candidate terms are shown below the "Add" button and can be selected by checking the respective box(es).
- Click on a node in the ontology panel, move the cursor to the character panel, and click the respective "Add" button.
- Paste the term URI directly into the character panel.

Every character can be annotated with more than one term.
By default, ontoFAST displays all the characters in the data set. Using the argument nchar=N, you may restrict the visualization to N characters. You may also use ontoFAST as an ontology browser without loading characters by specifying show.chars=FALSE. Use the "Save file" button of the interactive interface (top right corner) to save data during the session.
All changes made during the interactive session are immediately saved to the ontofast$shiny_in object. The annotations are stored as lists in ontofast$shiny_in$terms_selected and ontofast$shiny_in$terms_selected_id.
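To give a feel for working with annotations stored as a list, here is a base R sketch on made-up data. It assumes the list is named by character ID with a vector of term IDs per character; that layout, the toy_ names, and the placeholder ID HAO:XXXXXXX are assumptions for illustration.

```r
# Hypothetical annotations: character IDs mapped to ontology term IDs
# (HAO:XXXXXXX is a placeholder, not a real term)
toy_terms_selected_id <- list(
  "CHAR:1" = "HAO:0000653",
  "CHAR:2" = c("HAO:0000653", "HAO:XXXXXXX"),
  "CHAR:3" = character(0)   # not annotated yet
)

# Number of terms per character, and the characters annotated so far
n_terms <- lengths(toy_terms_selected_id)
annotated <- names(n_terms)[n_terms > 0]

# Flatten the list into a two-column table, one row per annotation
edges <- data.frame(
  character = rep(names(toy_terms_selected_id), times = n_terms),
  term = unlist(toy_terms_selected_id, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(annotated)
print(edges)
```

A flat table like this is convenient for checking which characters still lack annotations before moving on.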
The interface of ontoFAST. (A) Ontology panel. (B) Customize panel. (C) Information panel. (D) Character panel.
Having the characters annotated, one can proceed to the next step - visualizing and querying the annotations.
The hierarchical, tree-like ontological dependencies among characters can be visualized using a sunburst plot. This plot shows hierarchy through a series of rings. Each ring corresponds to a level in the ontological hierarchy, with the innermost rings representing the root nodes and the outermost rings representing character statements. To do this in R, you need to have the sunburstR package installed.
# install.packages("sunburstR")
library("sunburstR")
To have an interpretable and clear visualization, I suggest using either part_of
or is_a
relationships as they have more or less a tree-like hierarchy. Using both relationships simultaneously can be messy.
Let's read in HAO ontology and propagate only part_of
relationships. The ID for the part_of
relationships in HAO is "BFO:0000050"
. To use is_a
relationships, change the argument propagate_relationships
to "is_a"
.
ontology_partof=get_OBO(system.file("data_onto", "HAO.obo", package = "ontoFAST"),
extract_tags="everything", propagate_relationships = c("BFO:0000050"))
Now, I quickly process the ontology to incorporate character statements, without automatic annotation of characters. Next, I incorporate the embedded manual annotations, stored in the Sharkey_2011_annot data object, as the list annot_characters of the ontology_partof object.
ontology_partof<-onto_process(ontology_partof, Sharkey_2011[,1], do.annot = F)
data(Sharkey_2011_annot)
ontology_partof$annot_characters<-Sharkey_2011_annot
The input for sunburst plot can be created using paths_sunburst
function. You may consider excluding some high-level terms to make visualization clearer by specifying exclude.terms = exclude_terms
.
data(exclude_terms)
tb<-paths_sunburst(ontology_partof, annotations = ontology_partof$annot_characters,
exclude.terms = exclude_terms, sep = "-")
The data are now ready for visualization. Use the sunburst function from sunburstR to visualize them. The visualization in R or a browser is interactive; hover the mouse over the plot to explore it.
sunburst(tb)
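For orientation, sunburstR can draw a two-column table in which the first column is a path whose levels are separated by "-" and the second column is a count. The sketch below builds such a table by hand from made-up terms; that the table returned by paths_sunburst has the same shape is an assumption here, and the term names are hypothetical.

```r
# Made-up paths from a root term down to a character, levels joined by "-"
toy_tb <- data.frame(
  path = c("body-head-eye-CHAR_1",
           "body-head-antenna-CHAR_2",
           "body-thorax-CHAR_3"),
  count = 1,
  stringsAsFactors = FALSE
)

# Each path element becomes one ring, counted from the centre outwards
depth <- lengths(strsplit(toy_tb$path, "-", fixed = TRUE))
print(depth)

# Build the plot only if sunburstR is available; print(p) displays it
if (requireNamespace("sunburstR", quietly = TRUE)) {
  p <- sunburstR::sunburst(toy_tb)
}
```

Because "-" separates the levels, labels that themselves contain a hyphen would be split into extra rings, which is worth checking when the separator is chosen.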
Visualization of characters annotated using ontoFAST for Hymenoptera and Scarabaeinae (dung beetles). A-B The network of ontology classes and the linked characters produced using Cytoscape. (B-D) The plots show hierarchy of characters and ontological classes, produced using sunburstR package
You may consider using Cytoscape to gain insight into the complex network of ontology terms and characters. To do so, export the annotations into Cytoscape format using the export_cytoscape function and save the exported object as a csv file.
ontology<-HAO
# processing ontology to incorporate character statements
ontology<-onto_process(ontology, Sharkey_2011[,1], do.annot = F)
# embedding manual annotations
ontology$annot_characters<-Sharkey_2011_annot
# exporting
cyto<-export_cytoscape(ontology, annotations = ontology$annot_characters, is_a = c("is_a"), part_of = c("BFO:0000050"))
write.csv(cyto, file="HAO_chars.csv")
To import the saved file to Cytoscape open Cytoscape and choose File -> Import -> Network -> "HAO_chars.csv".
- For each term, you can get the number of characters that are descendants of the term.
head(chars_per_term(ontology, annotations = ontology$annot_characters))
- Get ancestral ontology terms for a set of characters.
get_ancestors_chars(ontology, c("CHAR:1", "CHAR:2", "CHAR:3"),
annotations = ontology$annot_characters)
- Get characters which are descendants of a particular ontology term.
get_descendants_chars(ontology, annotations = ontology$annot_characters, terms="HAO:0000653")
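The idea behind these queries can be sketched in base R on a toy hierarchy: a character counts as a descendant of a term if it is annotated with that term or with any term below it. Everything in this sketch (the parents map, the annotations, the helper ancestors_of) is hypothetical and is not ontoFAST's implementation.

```r
# Toy hierarchy: each term mapped to its direct parent(s)
parents <- list(eye = "head", antenna = "head",
                head = "body", body = character(0))

# Toy annotations: character IDs mapped to terms
annot <- list("CHAR:1" = "eye", "CHAR:2" = "antenna", "CHAR:3" = "head")

# Collect all ancestors of a term by walking up the parent links
ancestors_of <- function(term) {
  out <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    out <- union(out, queue)
    queue <- unlist(parents[queue], use.names = FALSE)
  }
  out
}

# Terms covered by each character: its own terms plus their ancestors
covered <- lapply(annot, function(terms) {
  unique(c(terms, unlist(lapply(terms, ancestors_of))))
})

# Number of characters per term
counts <- vapply(names(parents), function(term) {
  sum(vapply(covered, function(x) term %in% x, logical(1)))
}, integer(1))

# Characters that are descendants of a particular term
head_chars <- names(covered)[vapply(covered, function(x) "head" %in% x,
                                    logical(1))]

print(counts)
print(head_chars)
```

In this toy example, a character annotated with "eye" is also counted under "head" and "body", which is why higher-level terms accumulate more characters.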
A convenient way to save all data is to save the ontology object in the native R format .Rdata. For example, if you made manual and automatic annotations, you can save the shiny_in object as .Rdata. The Rdata format preserves all information stored in shiny_in.
# save() expects object names, so copy the object to a variable first
shiny_in<-ontofast$shiny_in
save(shiny_in, file="shiny_in.Rdata")
# to read the file in R
# load("shiny_in.Rdata")
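In base R, save() pairs with load(), while saveRDS() pairs with readRDS(); the two formats are not interchangeable. The sketch below shows both round trips on a placeholder object written to temporary files. The placeholder list is made up and only stands in for a real shiny_in object.

```r
# Placeholder standing in for a real shiny_in object
ontofast <- new.env(parent = emptyenv())
ontofast$shiny_in <- list(terms_selected_id = list("CHAR:1" = "HAO:0000653"))

# save() takes object names, so copy the object to a variable first
shiny_in <- ontofast$shiny_in
rdata_file <- tempfile(fileext = ".Rdata")
save(shiny_in, file = rdata_file)

# load() restores the object under its original name
restored <- new.env()
load(rdata_file, envir = restored)
same_rdata <- identical(restored$shiny_in, shiny_in)

# saveRDS() stores a single object without its name; readRDS() returns it
rds_file <- tempfile(fileext = ".rds")
saveRDS(ontofast$shiny_in, file = rds_file)
same_rds <- identical(readRDS(rds_file), shiny_in)

print(c(same_rdata, same_rds))
```

Loading into a separate environment, as done here, avoids silently overwriting an object of the same name in the workspace.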
You can also export your annotations and characters to a readable csv table in several ways:
# exporting annotations using export_annotations
tb<-export_annotations(ontofast$shiny_in, annotations="manual", incl.names=TRUE,collapse="; ")
tb<-export_annotations(ontofast$shiny_in, annotations="auto", incl.names=TRUE,collapse="; ")
tb<-export_annotations(ontofast$shiny_in, annotations="auto", incl.names=TRUE,collapse=NULL)
# write.csv(tb, "annotations.csv")
# exporting annotations using list2edges
out <- list2edges(ontofast$shiny_in$terms_selected_id)
# write.csv(out, "annotations.csv")
Tune the format of the table using arguments of export_annotations
function.
annot_csv<-export_annotations(ontology, annotations = ontology$annot_characters, incl.names = T,
sep.head = ", (", sep.tail = ")", collapse = ";")
head(annot_csv)
# save file
write.csv(annot_csv, file="annot_csv.csv")