Sergei Tarasov edited this page May 11, 2021 · 40 revisions

ontoFAST: Manual


Installation

To install and run ontoFAST you need to have R and RStudio preinstalled.

Installing from CRAN

install.packages("ontoFAST")
install.packages("igraph")
library("ontoFAST")

Installing the development version

The development version of ontoFAST can be installed directly from the ontoFAST repository on GitHub:

install.packages("devtools")
library("devtools")
install_github("sergeitarasov/ontoFAST")
install.packages("igraph")
library("ontoFAST")

The workflow

The workflow of ontoFAST consists of the following steps:

  1. Read in required ontology and character statements:

    • The ontology has to be in the .obo format or in an R data format (e.g., .rds, .rda). Ontology .obo files can be downloaded from BioPortal or other repositories.
    • Character statements must be in a table file format; tables can be imported into R in .csv format. If your character matrix is in NEXUS or TNT format, open it in Mesquite, copy the character statements to a spreadsheet (e.g., Excel), and save the spreadsheet as a .csv file. An example of the table format for character statements is shown below. ontoFAST is designed to annotate statements from a single table cell only, so merge the character statements and states if you need both during annotation.
        | CHARACTER_STATEMENTS               | STATE_1 | STATE_2
    1   | Ocellar corona                     | absent  | present
    2   | Supraantennal groove or depression | absent  | present, accommodating scapes
    3   | Notch on medial margin of eye      | absent  | present
  2. Run automatic annotation of characters with ontology terms using ontoFAST functions. This step is optional and can be skipped if not required.

  3. Run ontoFAST interactively to make de novo character annotations, post-process automatic annotations or edit previous annotations. The interactive mode visualizes the ontology as a graph thus providing a convenient way to navigate through it.

  4. Once the annotations are done, you can:

    • visualize the annotations with sunburst plots, to show hierarchical relationships, or with Cytoscape, to explore the network structure.
    • query annotations using the in-built ontoFAST functions.
  5. Save your results.
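Step 1 asks you to merge the statement and state columns when you need both during annotation. Here is a minimal base R sketch of doing that in R instead of a spreadsheet, using the three example rows above typed in by hand. The column name MERGED, the separators, and the output file name are my own choices, not something ontoFAST requires. Empty cells read with na.strings = "" become NA and would be pasted as the text "NA", so clean them first in a real data set.

```r
# The three example rows from the table above, typed in by hand
chars <- data.frame(
  CHARACTER_STATEMENTS = c("Ocellar corona",
                           "Supraantennal groove or depression",
                           "Notch on medial margin of eye"),
  STATE_1 = c("absent", "absent", "absent"),
  STATE_2 = c("present", "present, accommodating scapes", "present"),
  stringsAsFactors = FALSE
)

# Join all state columns of each row, e.g. "absent | present"
state_cols <- c("STATE_1", "STATE_2")
states <- do.call(paste, c(chars[state_cols], sep = " | "))

# Put statement and states into a single cell (MERGED is an arbitrary name)
chars$MERGED <- paste0(chars$CHARACTER_STATEMENTS, ": ", states)

# write.csv(chars["MERGED"], "characters_merged.csv", row.names = FALSE)
```

chars$MERGED[1] is then "Ocellar corona: absent | present"; a vector like chars$MERGED is what you would pass to onto_process() instead of a statements-only column.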

Step 1 and 2: reading and processing data

First, we need to read in the required ontology and character statements. This can be done either (a) quickly, with the built-in functions that process the ontology and characters, or (b) manually and more slowly, by creating the required ontology object yourself. The latter can be useful if you need finer control over the information stored in the ontology object.

(a) Quick processing of data

Let's first read in the ontology. In this example, I use the Hymenoptera Anatomy Ontology (HAO), which is available as an embedded data set:

data(HAO)
hao_obo<-HAO

Alternatively, the ontology can be parsed directly from an .obo file using the get_OBO() function from the ontologyIndex package.

library("ontologyIndex")  # provides get_OBO()
hao_obo<-get_OBO(system.file("data_onto", "HAO.obo", package = "ontoFAST"),extract_tags="everything", propagate_relationships = c("BFO:0000050", "is_a"))

The ontology object (i.e., hao_obo) is a list with numerous sublists that contain information on ontology terms and their relationships.

Now, let's read in the character statements. Character statements stored in a table can be imported into R with the read.csv function. Here, I use the data set from the morphological phylogeny of Sharkey et al. (2011), which contains 392 characters. It is available both as an embedded R data set and as a .csv file.

data(Sharkey_2011)
Sharkey_characters<-Sharkey_2011

# alternatively, read the same data from the .csv file
Sharkey_characters<-read.csv(system.file("data_onto", "Sharkey_2011.csv", package = "ontoFAST"), header=TRUE, stringsAsFactors = FALSE, na.strings = "")

To process the data automatically, use the onto_process() function. It parses synonyms, character statements, and character IDs into the hao_obo object. By default, it also performs automatic annotation of the character statements with ontology terms (do.annot = TRUE).

hao_obo<-onto_process(hao_obo, Sharkey_characters[,1], do.annot = TRUE)

The automatic processing of the data is done. Now you can proceed to interactive editing of your data; see Step 3.

(b) Manual and slow processing of data

In manual processing, you create every required element of the ontology object yourself.

First, let's read in the ontology and the characters:

data(HAO)
data(Sharkey_2011)
hao_obo<-HAO
Sharkey_characters<-Sharkey_2011

Let's create the IDs that will be used to refer to the characters. All data have to be placed in the ontology object (here hao_obo).

# one ID per character (392 for this data set)
id_characters<-paste("CHAR:", seq_len(nrow(Sharkey_characters)), sep="")
hao_obo$id_characters<-id_characters

Now, let's add character statements to hao_obo and associate them with the IDs.

name_characters<-Sharkey_characters[,1]
names(name_characters)<-id_characters
hao_obo$name_characters<-name_characters
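The named vector created above is what lets a character ID such as "CHAR:2" point to its statement. A small base R sketch with three toy statements (not the full Sharkey data set) shows the idea; building the IDs with seq_along() avoids hard-coding the number of characters.

```r
# Three toy statements standing in for Sharkey_characters[, 1]
statements <- c("Ocellar corona",
                "Supraantennal groove or depression",
                "Notch on medial margin of eye")

# One ID per statement, numbered from 1
ids <- paste0("CHAR:", seq_along(statements))

# Naming the vector makes each statement retrievable by its ID
names(statements) <- ids
statements["CHAR:2"]
```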

To make automatic annotation more efficient, we can use the synonyms of ontology terms. The synonyms are stored in hao_obo$synonym; they have to be pre-processed to be available for automatic annotation. To do this, we use the syn_extract() function.

hao_obo$parsed_synonyms<-syn_extract(hao_obo)

Now, we can run the automatic annotation.

hao_obo$auto_annot_characters<-annot_all_chars(hao_obo, use.synonyms=TRUE, min_set=TRUE)
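To give a feel for why parsed synonyms help, here is a deliberately simplified sketch of term matching in base R: each statement is searched, case-insensitively, for whole-word occurrences of a term's name or synonyms. This is not the algorithm used by annot_all_chars(), and the mini-ontology with TOY: IDs is hypothetical.

```r
# Hypothetical mini-ontology: term ID -> name followed by synonyms
terms <- list(
  "TOY:0001" = c("ocellar corona"),
  "TOY:0002" = c("eye", "compound eye"),
  "TOY:0003" = c("antennal groove", "supraantennal groove")
)

statements <- c("CHAR:1" = "Ocellar corona",
                "CHAR:2" = "Supraantennal groove or depression",
                "CHAR:3" = "Notch on medial margin of eye")

# IDs of all terms whose name or any synonym occurs as a whole phrase
match_terms <- function(statement, terms) {
  hits <- vapply(terms, function(labels) {
    patterns <- paste0("\\b", labels, "\\b")  # word boundaries on both sides
    any(vapply(patterns, grepl, logical(1), x = statement, ignore.case = TRUE))
  }, logical(1))
  names(terms)[hits]
}

# Named list: character ID -> candidate term IDs
auto_annot <- lapply(statements, match_terms, terms = terms)
```

Here "Supraantennal groove or depression" is matched only through the synonym "supraantennal groove", because "antennal groove" does not start on a word boundary inside "Supraantennal".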

Manual processing is done; proceed to Step 3 to run the interactive session.

Step 3: Running ontoFAST Interactively

First, create a new environment to store the variable that serves as both input and output of the interactive session. The environment must be called ontofast (other names will not work): the functions running during the interactive session use it to access their input variable globally, and ontoFAST will not work correctly without it. It is recommended to set the empty environment as the parent of the new environment.

Next, use the make_shiny_in() function to create an object in the ontofast environment; here we call it shiny_in. The name of this object is passed to runOntoFast() as the argument shiny_in="shiny_in". Executing the runOntoFast() line starts the interactive session; it may take a few seconds until all characters are loaded.

ontofast <- new.env(parent = emptyenv())
ontofast$shiny_in <- make_shiny_in(hao_obo)
runOntoFast(is_a = c("is_a"), part_of = c("BFO:0000050"), shiny_in="shiny_in", file2save = "OntoFAST_shiny_in.RData")
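The effect of using emptyenv() as the parent can be seen in a few lines of base R. This sketch uses a separate environment called demo_env so it does not overwrite your real ontofast environment, and the list stored in it is a hypothetical placeholder, not the output of make_shiny_in().

```r
# Same construction as above, under a different name
demo_env <- new.env(parent = emptyenv())

# Objects are stored and retrieved with $ (or assign()/get())
demo_env$shiny_in <- list(terms_selected_id = list())  # placeholder contents
exists("shiny_in", envir = demo_env, inherits = FALSE)  # TRUE

# With emptyenv() as parent, lookups do not fall through to the global
# environment or attached packages, so even base::paste is not visible
exists("paste", envir = demo_env)  # FALSE

# An environment with the default parent does see the search path
exists("paste", envir = new.env())  # TRUE
```

This isolation means a name looked up in the environment can only resolve to what was explicitly stored there.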

The interface and annotations

The interactive interface consists of four panels. The ontology panel shows an interactive graph in which nodes are ontology classes and edges are, usually, part_of and is_a relationships. The customize panel at the top of the window lets you select which relationships to display and navigate to a term by typing a few letters of its name. The information panel shows the ID, synonyms, and definition of the selected term. The leftmost panel, the character panel, shows the character statements.

There are three ways to annotate characters:

  1. If you ran fuzzy matching with the onto_process() function, the candidate terms are shown below the "Add" button and can be selected by checking the respective box(es).
  2. Click a node in the ontology panel, move the cursor to the character panel, and click the respective "Add" button.
  3. Paste a term URI directly into the character panel.

Every character can be annotated with more than one term.

By default, ontoFAST displays all characters in the data set. The argument nchar=N restricts the display to N characters. You can also use ontoFAST as an ontology browser, without loading characters, by specifying show.chars=FALSE. Use the "Save file" button (top right corner of the interface) to save data during the session.

All changes made during the interactive session are immediately saved to the ontofast$shiny_in object. The annotations are stored as lists in ontofast$shiny_in$terms_selected and ontofast$shiny_in$terms_selected_id.

The interface of ontoFAST. (A) Ontology panel. (B) Customize panel. (C) Information panel. (D) Character panel.


Step 4: Visualize and Query Annotations

Having the characters annotated, one can proceed to the next step - visualizing and querying the annotations.

Sunburst plots

The hierarchical, tree-like ontological dependencies among characters can be visualized with a sunburst plot, which shows the hierarchy as a series of rings. Each ring corresponds to a level of the ontological hierarchy: the innermost circle represents the root nodes, and the outermost ring represents the character statements. To draw it in R, you need the sunburstR package installed.

# install.packages("sunburstR")
library("sunburstR")

For a clear, interpretable visualization, I suggest using either part_of or is_a relationships, because each of them forms a more or less tree-like hierarchy; using both relationships at once can be messy. Let's read in the HAO ontology and propagate only part_of relationships. The ID of the part_of relationship in HAO is "BFO:0000050". To use is_a relationships instead, set the argument propagate_relationships to "is_a".

ontology_partof=get_OBO(system.file("data_onto", "HAO.obo", package = "ontoFAST"),
              extract_tags="everything", propagate_relationships = c("BFO:0000050"))

Now, I quickly process the ontology to incorporate the character statements, without running automatic annotation. Next, I add the embedded manual annotations from the Sharkey_2011_annot data object to the ontology_partof object as the list annot_characters.

ontology_partof<-onto_process(ontology_partof, Sharkey_2011[,1], do.annot = FALSE)
data(Sharkey_2011_annot)
ontology_partof$annot_characters<-Sharkey_2011_annot

The input for sunburst plot can be created using paths_sunburst function. You may consider excluding some high-level terms to make visualization clearer by specifying exclude.terms = exclude_terms.

data(exclude_terms)
tb<-paths_sunburst(ontology_partof, annotations = ontology_partof$annot_characters, 
                   exclude.terms = exclude_terms, sep = "-")

The data are now ready for visualization. Use the sunburst() function from sunburstR to draw them. The plot, shown in R or in a browser, is interactive: hover the mouse over a segment to explore it.

sunburst(tb)
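As I understand sunburstR's input, sunburst() reads a two-column table: a path of node names joined by a separator ("-" here, matching sep = "-" above) and a count. The sketch below builds such paths by hand from a hypothetical single-parent part_of hierarchy, with each character as the outermost node. It illustrates the idea only and is not how paths_sunburst() is implemented; note that term names containing "-" would break the paths.

```r
# Hypothetical part_of hierarchy: child -> parent (NA marks the root)
parent <- c("body" = NA, "head" = "body", "eye" = "head",
            "antenna" = "head", "mesosoma" = "body")

# Walk from a term up to the root; the root ends up first in the path
path_to_root <- function(term, parent) {
  path <- term
  while (!is.na(parent[[term]])) {
    term <- parent[[term]]
    path <- c(term, path)
  }
  path
}

# Hypothetical annotations: character ID -> annotated term
annot <- c("CHAR:1" = "eye", "CHAR:2" = "antenna", "CHAR:3" = "eye")

# One "-"-separated path per character, ending with the character itself
seqs <- vapply(names(annot), function(ch) {
  paste(c(path_to_root(annot[[ch]], parent), ch), collapse = "-")
}, character(1))

tb_toy <- data.frame(sequence = unname(seqs), count = 1,
                     stringsAsFactors = FALSE)
# sunburstR::sunburst(tb_toy)
```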

Visualization of characters annotated using ontoFAST for Hymenoptera and Scarabaeinae (dung beetles). (A-B) The network of ontology classes and linked characters, produced using Cytoscape. (C-D) The hierarchy of characters and ontology classes, shown as sunburst plots produced with the sunburstR package.

Cytoscape

You may consider using Cytoscape to gain insight into the complex network of ontology terms and characters. To do this, export the annotations to the Cytoscape format with the export_cytoscape() function and save the exported object as a .csv file.

ontology<-HAO
# processing ontology to incorporate character statements
ontology<-onto_process(ontology, Sharkey_2011[,1], do.annot = F)
# embedding manual annotations
ontology$annot_characters<-Sharkey_2011_annot

# exporting
cyto<-export_cytoscape(ontology, annotations = ontology$annot_characters, is_a = c("is_a"), part_of = c("BFO:0000050"))
write.csv(cyto, file="HAO_chars.csv")

To import the saved file, open Cytoscape, choose File -> Import -> Network, and select "HAO_chars.csv".

Query linked characters

  1. For each term, get the number of characters that are its descendants:
       head(chars_per_term(ontology, annotations = ontology$annot_characters))
  2. Get the ancestral ontology terms for a set of characters:
       get_ancestors_chars(ontology, c("CHAR:1", "CHAR:2", "CHAR:3"), 
                           annotations = ontology$annot_characters)
  3. Get the characters that are descendants of a particular ontology term:
       get_descendants_chars(ontology, annotations = ontology$annot_characters, terms="HAO:0000653")
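Conceptually, both kinds of query rest on propagating each character's annotations up the hierarchy. Here is a minimal base R sketch with a hypothetical single-parent part_of hierarchy: every character is counted once under each term it falls under, either directly or through a descendant term. It illustrates the idea only; the ontoFAST functions above work on the full ontology object.

```r
# Hypothetical hierarchy (child -> parent, NA = root) and annotations
parent <- c("body" = NA, "head" = "body", "eye" = "head",
            "antenna" = "head", "mesosoma" = "body")
annot <- list("CHAR:1" = "eye", "CHAR:2" = "antenna",
              "CHAR:3" = c("eye", "mesosoma"))

# A term together with all of its ancestors
ancestors <- function(term, parent) {
  out <- term
  while (!is.na(parent[[term]])) {
    term <- parent[[term]]
    out <- c(out, term)
  }
  out
}

# For each character, every term it falls under (each term once)
char_terms <- lapply(annot, function(x) {
  unique(unlist(lapply(x, ancestors, parent = parent)))
})

# Number of characters per term (compare chars_per_term)
counts <- table(unlist(char_terms))

# Characters under a given term (compare get_descendants_chars)
chars_under <- function(term) {
  names(char_terms)[vapply(char_terms, function(x) term %in% x, logical(1))]
}
chars_under("eye")
```

unique() matters for CHAR:3: it is annotated with two terms that share the ancestor "body", but it should be counted only once under "body".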

Step 5: save your data

The most convenient way to save all your data is R's native .RData format. For example, after manual and automatic annotation you can save the shiny_in object; the .RData file keeps all information stored in shiny_in. Because save() needs the name of an object rather than an expression such as ontofast$shiny_in, copy the object to a plain variable first.

shiny_in <- ontofast$shiny_in
save(shiny_in, file="shiny_in.Rdata")
# to read the file back into R (this restores an object called shiny_in)
# load("shiny_in.Rdata")
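save()/load() and saveRDS()/readRDS() are not interchangeable: load() restores objects under their original names, while readRDS() returns a single object that you assign to a name of your choice. A self-contained round trip with saveRDS(), using a temporary file and a hypothetical stand-in for ontofast$shiny_in:

```r
# Hypothetical stand-in for ontofast$shiny_in
shiny_in_demo <- list(terms_selected_id = list("CHAR:1" = "TOY:0001"))

f <- tempfile(fileext = ".rds")
saveRDS(shiny_in_demo, f)   # stores one object, without its name
restored <- readRDS(f)      # returns the object; assign it to any name
identical(restored, shiny_in_demo)  # TRUE
```

For the real session object, the equivalent would be saveRDS(ontofast$shiny_in, "shiny_in.rds") and, after recreating the ontofast environment, ontofast$shiny_in <- readRDS("shiny_in.rds").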

You can also export your annotations and characters as a readable .csv table in several ways:

# exporting annotations using export_annotations
# manual annotations
tb<-export_annotations(ontofast$shiny_in, annotations="manual", incl.names=TRUE,collapse="; ")
# automatic annotations
tb<-export_annotations(ontofast$shiny_in, annotations="auto", incl.names=TRUE,collapse="; ")
# automatic annotations, with collapse=NULL
tb<-export_annotations(ontofast$shiny_in, annotations="auto", incl.names=TRUE,collapse=NULL)
# write.csv(tb, "annotations.csv")

# exporting annotations using list2edges
out <- list2edges(ontofast$shiny_in$terms_selected_id)
# write.csv(out, "annotations.csv")
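If you want a long table with one row per character-term pair without relying on list2edges(), base R can build one from a named list. This sketch assumes terms_selected_id maps character IDs to vectors of term IDs (the toy list below is hypothetical), and the column names character and term are my own; list2edges() may name or order its columns differently.

```r
# Hypothetical annotation list: character ID -> vector of term IDs
terms_selected_id <- list("CHAR:1" = c("TOY:0001"),
                          "CHAR:2" = c("TOY:0003", "TOY:0004"),
                          "CHAR:3" = character(0))

# Repeat each character ID once per term; unannotated characters give no rows
edges <- data.frame(
  character = rep(names(terms_selected_id), lengths(terms_selected_id)),
  term = unlist(terms_selected_id, use.names = FALSE),
  stringsAsFactors = FALSE
)
# write.csv(edges, "annotations_long.csv", row.names = FALSE)
```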

Tune the format of the table using the arguments of the export_annotations() function, for example:

annot_csv<-export_annotations(ontology, annotations = ontology$annot_characters, incl.names = T,
  sep.head = ", (", sep.tail = ")", collapse = ";") 
head(annot_csv)

# save file
write.csv(annot_csv, file="annot_csv.csv")
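To see what the sep.head, sep.tail, and collapse arguments are meant to control, here is a base R sketch that builds the same kind of string by hand: each term is written as its name followed by its ID, and the terms of one character are joined with ";". The term names and IDs are hypothetical, and the exact layout that export_annotations() produces may differ.

```r
# Hypothetical lookup table of term names
term_names <- c("TOY:0001" = "ocellar corona",
                "TOY:0003" = "supraantennal groove",
                "TOY:0004" = "antennal sclerite")
annot <- list("CHAR:1" = "TOY:0001", "CHAR:2" = c("TOY:0003", "TOY:0004"))

# name + ", (" + ID + ")" for each term, joined per character with ";"
formatted <- vapply(annot, function(ids) {
  paste0(term_names[ids], ", (", ids, ")", collapse = ";")
}, character(1))

annot_tb <- data.frame(id = names(formatted), annotations = unname(formatted),
                       stringsAsFactors = FALSE)
```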