diff --git a/articles/V01CRBHitsBasicVignette.html b/articles/V01CRBHitsBasicVignette.html
index 564e4fc..70da330 100644
--- a/articles/V01CRBHitsBasicVignette.html
+++ b/articles/V01CRBHitsBasicVignette.html
@@ -198,7 +198,7 @@
The CRBH algorithm was introduced by Aubry
S, Kelly S et al. (2014) and ported to python shmlast (Scott C.
@@ -261,9 +261,9 @@ The resulting CRBHs (see #crbhalgorithm) can be further processed to
e.g. filtered for Tandem Duplicate (see CRBHits
@@ -819,9 +819,9 @@ Note: Default data.frame handling
with the dplyr package is possible on the original
@@ -1064,8 +1068,8 @@ Note: The following example can take some time and
is not calculated by the vignette building process.
Ullrich K (2023).
CRBHits: Conditional reciprocal best hits (CRBHits) in R.
-https://gitlab.gwdg.de/mpievolbio-it/crbhits,
-https://mpievolbio-it.pages.gwdg.de/crbhits/.
+R package version 0.0.5,
+https://mpievolbio-it.pages.gwdg.de/crbhits/, https://gitlab.gwdg.de/mpievolbio-it/crbhits.
The CRBH algorithm was introduced by @aubry2014deep and builds upon the traditional RBH approach to find additional orthologous sequences between two sets of sequences. As described earlier [@aubry2014deep; @scott2017shmlast], CRBH uses the sequence search results to fit an expect value (E-value) cutoff from the RBHs and subsequently adds sequence pairs to the list of bona fide orthologs when their E-value passes the cutoff for their alignment length. Unfortunately, as mentioned by @scott2017shmlast, the original implementation of CRBH (crb-blast) lacks support for improved BLAST-like search algorithms that would speed up the analysis. As a consequence, @scott2017shmlast ported CRBH to Python as shmlast, although shmlast cannot yet handle IUPAC nucleotide codes. CRBHits is a new R package that builds upon these previous implementations and ports CRBH into the R environment, which is popular among biologists. CRBHits improves CRBH with additional filter steps [@rost1999twilight] and the option to apply custom filters prior to E-value fitting. Further, the resulting CRBH pairs can be evaluated for the presence of tandem duplicated genes, gene-order-based syntenic groups and evolutionary rates. The obtained CRBHit pairs can also be used to calculate synonymous (Ks) and nonsynonymous (Ka) substitutions per hit pair using either the model from @li1993unbiased or from @yang2000estimating.
2.
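The traditional RBH step that CRBH builds upon can be illustrated with a minimal base R sketch. The hit tables, their column names and the helper `best_hit()` are hypothetical and exist only for this illustration; they are not part of the CRBHits API, and the conditional step (fitting the E-value cutoff against alignment length) is not shown.

```r
# Toy hit tables (hypothetical data): search results in each direction.
hits_ab <- data.frame(
  query   = c("a1", "a1", "a2", "a3"),
  subject = c("b1", "b2", "b2", "b1"),
  evalue  = c(1e-50, 1e-10, 1e-30, 1e-5),
  stringsAsFactors = FALSE
)
hits_ba <- data.frame(
  query   = c("b1", "b2", "b2", "b3"),
  subject = c("a1", "a2", "a1", "a3"),
  evalue  = c(1e-48, 1e-29, 1e-9, 1e-3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the lowest E-value hit per query
# (ties are resolved by first occurrence after sorting).
best_hit <- function(hits) {
  hits <- hits[order(hits$query, hits$evalue), ]
  hits[!duplicated(hits$query), c("query", "subject")]
}

best_ab <- best_hit(hits_ab)
best_ba <- best_hit(hits_ba)

# A pair (a, b) is a reciprocal best hit when b is the best hit of a
# and a is the best hit of b.
rbh <- merge(best_ab, best_ba,
             by.x = c("query", "subject"),
             by.y = c("subject", "query"))
print(rbh)
```

In this toy data, `a3` has `b1` as its best hit, but `b1` prefers `a1`, so the pair is not reciprocal. Pairs like this are the candidates that a conditional step could still recover if they pass the fitted cutoff.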
-
2.1. 1. step: sequence similarity search
@@ -798,9 +798,9 @@
2.3.1. CRBHit
3. Ka/Ks Calculations
-3. Ka/Ks Calculations
NCBI or ENSEMBL Input (see cds2genepos() function)) and supply selfblast results for both Input CDS.
3.1. Codon Alignments - cds2codonaln()
diff --git a/articles/V02KaKsVignette.html b/articles/V02KaKsVignette.html
index 7e17c2f..eff5c31 100644
--- a/articles/V02KaKsVignette.html
+++ b/articles/V02KaKsVignette.html
@@ -187,7 +187,7 @@
Table of Contents
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(gridExtra))
-suppressPackageStartupMessages(library(curl))
+suppressPackageStartupMessages(library(curl))
## compile LAST, KaKs_Calculator2.0 and DAGchainer for the vignette
vignette.paths <- make_vignette()
1.3. Calculate/Filte
## get help
#?cds2rbh
-
1.5 Assign Tandem
plotCurve=TRUE,
lastpath=vignette.paths[1])
attributes(ARATHA_selfblast_crbh)$selfblast
-
-## get selfblast CRBHit pairs for A. lyrata
ARALYR_selfblast_crbh <- cds2rbh(
@@ -657,9 +657,9 @@
1.5 Assign Tandem
plotCurve=TRUE,
lastpath=vignette.paths[1])
attributes(ARALYR_selfblast_crbh)$selfblast
-## get gene position for A. thaliana longest isoforms
ARATHA.cds.longest.genepos <- cds2genepos(
@@ -706,8 +706,8 @@
1.5 Assign Tandem
ggplot2::facet_wrap(~gene.chr) +
ggplot2::scale_colour_manual(values=
CRBHitsColors(length(table(tandem_group_size))))
1.6 Synteny with DAGchainer
## plot DAGchainer results for each chromosome combination
dim(ARATHA_ARALYR_crbh.dagchainer.bp)
plot_dagchainer(ARATHA_ARALYR_crbh.dagchainer.bp)
-
-## plot DAGchainer results selected chromosomes
g <- plot_dagchainer(
@@ -773,31 +773,34 @@
1.6 Synteny with DAGchainer
"AA2:NW_003302551.1", "AA2:NW_003302550.1",
"AA2:NW_003302549.1", "AA2:NW_003302548.1"))
g
-## change title size
g + ggplot2::theme(title=element_text(size=16))
-## change axis title size (gene1.mid; gene2.mid)
g + ggplot2::theme(axis.title.x=element_text(size=16),
axis.title.y=element_text(size=16))
-## change grid title size
g + theme(strip.text.x=element_text(size=16), strip.text.y=element_text(size=16))
-## change grid axis size and angle
g + theme(axis.text.x=element_text(size=12, angle=90))
@@ -999,9 +1002,9 @@ ## get help
#?plot_dagchainer
3. Ka/Ks Filt
## plot Ka/Ks results as histogram colored by Ka/Ks values
g <- plot_kaks(kaks=ath_aly_ncbi_kaks)
-
-## plot Ka/Ks results as histogram filter for ka.min, ka.max, ks.min, ks.max
g.min_max <- plot_kaks(
@@ -1010,9 +1013,9 @@
3. Ka/Ks Filt
ka.max=1,
ks.min=0,
ks.max=1)
-## select subset of chromosomes - needs gene position information
head(ARATHA.cds.longest.genepos)
@@ -1026,8 +1029,9 @@
3. Ka/Ks Filt
"NC_003071.7",
"NW_003302551.1",
"NW_003302554.1"))
-## plot Ka/Ks results and split by chromosome - needs gene position information
@@ -1041,8 +1045,8 @@
3. Ka/Ks Filt
"NW_003302551.1",
"NW_003302554.1"),
splitByChr=TRUE)
3. Ka/Ks Filt
## filter for Ks values < 1 on plot object and plot
g.split$g.kaks$data %>% dplyr::filter(ks<1) %>%
ggplot2::ggplot() + ggplot2::geom_histogram(binwidth=0.1, aes(x=ks))
-
4.
threads=2,
plotCurve=TRUE,
lastpath=vignette.paths[1])
-
-## get gene position for H. sapiens longest isoforms
HOMSAP.cds.longest.genepos <- cds2genepos(
@@ -1150,9 +1154,9 @@
4.
select.chr=c(
"AA1:1","AA1:2","AA1:3","AA1:4","AA1:5","AA1:14",
"AA2:1","AA2:3","AA2:4","AA2:5","AA2:14"))
-## get selfblast CRBHit pairs for H. sapiens
HOMSAP_selfblast_crbh <- cds2rbh(
@@ -1200,9 +1204,9 @@
4.
select.chr=c(
"AA1:1","AA1:2","AA1:3","AA1:4","AA1:5","AA1:14",
"AA2:1","AA2:3","AA2:4","AA2:5","AA2:14"))
4.
## plot Ka/Ks results as histogram colored by Ka/Ks values
g <- plot_kaks(hom_pan_ensembl_kaks)
-
Citation
@Manual{,
title = {CRBHits: Conditional reciprocal best hits (CRBHits) in R},
author = {Kristian K Ullrich},
year = {2023},
- note = {https://gitlab.gwdg.de/mpievolbio-it/crbhits,
+ note = {R package version 0.0.5,
https://mpievolbio-it.pages.gwdg.de/crbhits/},
+ url = {https://gitlab.gwdg.de/mpievolbio-it/crbhits},
}
Summary
Functions and Examples
isoform.source = "NCBI", plotCurve = TRUE,
threads = 8)
#get help ?cdsfile2rbh
@@ -148,8 +148,8 @@
Functions and Examples
#get help ?rbh2dagchainer
#get help ?plot.dagchainer
#get help ?plot.kaks