-
Notifications
You must be signed in to change notification settings - Fork 4
/
README.Rmd
244 lines (173 loc) Β· 9.66 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
---
title: "`orthogene`: Interspecies gene mapping"
author: "`r rworkflows::use_badges(branch='main', add_bioc_release = TRUE, add_bioc_download_month = TRUE, add_bioc_download_rank = TRUE, add_bioc_download_total = TRUE)`"
date: "<h4>README updated: <i>`r format( Sys.Date(), '%b-%d-%Y')`</i></h4>"
output:
github_document
---
```{r, echo=FALSE, include=FALSE}
pkg <- read.dcf("DESCRIPTION", fields = "Package")[1]
title <- read.dcf("DESCRIPTION", fields = "Title")[1]
description <- read.dcf("DESCRIPTION", fields = "Description")[1]
URL <- read.dcf('DESCRIPTION', fields = 'URL')[1]
owner <- tolower(strsplit(URL,"/")[[1]][4])
```
# Intro
`r description`
In brief, `orthogene` lets you easily:
- [**`convert_orthologs`** between any two species.](https://neurogenomics.github.io/orthogene/articles/orthogene#convert-orthologs)
- [**`map_species`** names onto standard taxonomic ontologies.](https://neurogenomics.github.io/orthogene/articles/orthogene#map-species)
- [**`report_orthologs`** between any two species.](https://neurogenomics.github.io/orthogene/articles/orthogene#report-orthologs)
- [**`map_genes`** onto standard ontologies](https://neurogenomics.github.io/orthogene/articles/orthogene#map-genes)
- [**`aggregate_mapped_genes`** in a matrix.](https://neurogenomics.github.io/orthogene/articles/orthogene#aggregate-mapped-genes)
- [get **`all_genes`** from any species.](https://neurogenomics.github.io/orthogene/articles/orthogene#get-all-genes)
- [**`infer_species`** from gene names.](https://neurogenomics.github.io/orthogene/articles/infer_species.html)
- [**`create_background`** gene lists based one, two, or more species.](https://neurogenomics.github.io/orthogene/reference/create_background.html)
- [**`get_silhouettes`** of each species from phylopic.](https://neurogenomics.github.io/orthogene/reference/get_silhouettes.html)
- [**`prepare_tree`** with evolutionary divergence times across >147,000 species.](https://neurogenomics.github.io/orthogene/reference/prepare_tree.html)
## Citation
If you use ``r pkg``, please cite:
<!-- Modify this by editing the file: inst/CITATION -->
> `r citation(pkg)$textVersion`
## [Documentation website](https://neurogenomics.github.io/orthogene/)
## [PDF manual](https://github.com/neurogenomics/orthogene/blob/main/inst/orthogene_1.5.1.pdf)
# Installation
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# orthogene is only available on Bioconductor>=3.14
if(BiocManager::version()<"3.14") BiocManager::install(update = TRUE, ask = FALSE)
BiocManager::install("orthogene")
```
## Docker
`orthogene` can also be installed via a [Docker](https://hub.docker.com/repository/docker/neurogenomicslab/orthogene) or [Singularity](https://sylabs.io/guides/2.6/user-guide/singularity_and_docker.html)
container with Rstudio pre-installed. Further [instructions provided here](https://neurogenomics.github.io/orthogene/articles/docker).
# Methods
```{r setup}
library(orthogene)
data("exp_mouse")
# Setting to "homologene" for the purposes of quick demonstration.
# We generally recommend using method="gprofiler" (default).
method <- "homologene"
```
For most functions, `orthogene` lets users choose between different methods,
each with complementary strengths and weaknesses:
`"gprofiler"`, `"homologene"`, and `"babelgene"`
In general, we recommend you use `"gprofiler"` when possible,
as it tends to be more comprehensive.
While `"babelgene"` contains less species, it queries a wide variety
of orthology databases and can return a column "support_n" that tells
you how many databases support each ortholog gene mapping.
This can be helpful when you need a semi-quantitative
measure of mapping quality.
It's also worth noting that for smaller gene sets,
the speed difference between these methods becomes negligible.
```{r pros_cons, echo=FALSE}
pros_cons <- data.frame(
gprofiler=c("Reference organisms"="700+",
"Gene mappings"="More comprehensive",
"Updates"="Frequent",
"Orthology databases"=paste("Ensembl",
"HomoloGene",
"WormBase",sep = ", "),
"Data location"="Remote",
"Internet connection"="Required",
"Speed"="Slower"),
homologene=c("# reference organisms"="20+",
"Gene mappings"="Less comprehensive",
"Updates"="Less frequent",
"Orthology databases"="HomoloGene",
"Data location"="Local",
"Internet connection"="Not required",
"Speed"="Faster"),
babelgene=c("# reference organisms"="19 (but cannot convert between pairs of non-human species)",
"Gene mappings"="More comprehensive",
"Updates"="Less frequent",
"Orthology databases"="HGNC Comparison of Orthology Predictions (HCOP), which includes predictions from eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, TreeFam and ZFIN",
"Data location"="Local",
"Internet connection"="Not required",
"Speed"="Medium")
)
knitr::kable(pros_cons)
```
# Quick example
## Convert orthologs
[`convert_orthologs`](https://neurogenomics.github.io/orthogene/reference/convert_orthologs.html)
is very flexible with what users can supply as `gene_df`,
and can take a `data.frame`/`data.table`/`tibble`, (sparse) `matrix`,
or `list`/`vector` containing genes.
Genes, transcripts, proteins, SNPs, or genomic ranges will be recognised in
most formats (HGNC, Ensembl, RefSeq, UniProt, etc.)
and can even be a mixture of different formats.
All genes will be mapped to gene symbols, unless specified otherwise with the
`...` arguments (see `?orthogene::convert_orthologs` or [here
](https://neurogenomics.github.io/orthogene/reference/convert_orthologs.html)
for details).
### Note on non-1:1 orthologs
A key feature of
[`convert_orthologs`](https://neurogenomics.github.io/orthogene/reference/convert_orthologs.html)
is that it handles the issue of genes with many-to-many mappings across species.
This can occur due to evolutionary divergence, and the function of these genes
tend to be less conserved and less translatable.
Users can address this using different strategies via `non121_strategy=`.
```{r convert_orthologs}
gene_df <- orthogene::convert_orthologs(gene_df = exp_mouse,
gene_input = "rownames",
gene_output = "rownames",
input_species = "mouse",
output_species = "human",
non121_strategy = "drop_both_species",
method = method)
knitr::kable(as.matrix(head(gene_df)))
```
`convert_orthologs` is just one of the many useful functions in `orthogene`.
Please see the
[documentation website](https://neurogenomics.github.io/orthogene/articles/orthogene)
for the full vignette.
# Additional resources
## [Hex sticker creation](https://github.com/neurogenomics/orthogene/blob/main/inst/hex/hexSticker.Rmd)
## [Benchmarking methods](https://github.com/neurogenomics/orthogene/blob/main/inst/benchmark/benchmarks.Rmd)
# Session Info
<details>
```{r Session Info}
utils::sessionInfo()
```
</details>
# Related projects
## Tools
- [`gprofiler2`](https://cran.r-project.org/web/packages/gprofiler2/vignettes/gprofiler2.html):
`orthogene` uses this package. `gprofiler2::gorth()` pulls from
[many orthology mapping databases](https://biit.cs.ut.ee/gprofiler/page/organism-list).
- [`homologene`](https://github.com/oganm/homologene):
`orthogene` uses this package. Provides API access to NCBI
[HomoloGene](https://www.ncbi.nlm.nih.gov/homologene) database.
- [`babelgene`](https://cran.r-project.org/web/packages/babelgene/vignettes/babelgene-intro.html): `orthogene` uses this package. `babelgene::orthologs()` pulls from
[many orthology mapping databases](https://cran.r-project.org/web/packages/babelgene/vignettes/babelgene-intro.html).
- [`annotationTools`](https://www.bioconductor.org/packages/release/bioc/html/annotationTools.html):
For interspecies microarray data.
- [`orthology`](https://www.leibniz-hki.de/en/orthology-r-package.html):
R package for ortholog mapping (deprecated?).
- [`hpgltools::load_biomart_orthologs()`](https://rdrr.io/github/elsayed-lab/hpgltools/man/load_biomart_orthologs.html):
Helper function to get orthologs from biomart.
- [`JustOrthologs`](https://github.com/ridgelab/JustOrthologs/):
Ortholog inference from multi-species genomic sequences.
- [`orthologr`](https://github.com/drostlab/orthologr):
Ortholog inference from multi-species genomic sequences.
- [`OrthoFinder`](https://github.com/davidemms/OrthoFinder):
Gene duplication event inference from multi-species genomics.
## Databases
- [HomoloGene](https://www.ncbi.nlm.nih.gov/homologene):
NCBI database that the R package
[homologene](https://github.com/oganm/homologene) pulls from.
- [gProfiler](https://biit.cs.ut.ee/gprofiler):
Web server for functional enrichment analysis and conversions of gene lists.
- [OrtholoGene](http://orthologene.org/resources.html):
Compiled list of gene orthology resources.
## Contact
### [Neurogenomics Lab](https://www.neurogenomics.co.uk/)
UK Dementia Research Institute
Department of Brain Sciences
Faculty of Medicine
Imperial College London
[GitHub](https://github.com/neurogenomics)
[DockerHub](https://hub.docker.com/orgs/neurogenomicslab)
<br>