diff --git a/articles/V01CRBHitsBasicVignette.html b/articles/V01CRBHitsBasicVignette.html
index 564e4fc..70da330 100644
--- a/articles/V01CRBHitsBasicVignette.html
+++ b/articles/V01CRBHitsBasicVignette.html
@@ -198,7 +198,7 @@ Table of Contents
 suppressPackageStartupMessages(library(dplyr))
 suppressPackageStartupMessages(library(ggplot2))
 suppressPackageStartupMessages(library(gridExtra))
-suppressPackageStartupMessages(library(curl))
+suppressPackageStartupMessages(library(curl))
 ## compile LAST, KaKs_Calculator2.0 and DAGchainer for the vignette
 vignette.paths <- CRBHits::make_vignette()
@@ -238,9 +238,9 @@ 1. Installation2. Conditional Reciprocal Best Hits - Algorithm

-
-
-Figure: Overview of the cds2rbh()
-function
+
+Figure: Overview of the cds2rbh() function
+Figure: Overview of the cds2rbh()
+function

The CRBH algorithm was introduced by Aubry S, Kelly S et al. (2014) and ported to python shmlast (Scott C.
@@ -261,9 +261,9 @@ 2.
-
-Figure: Individual steps of the
-cds2rbh() function
+
+Figure: Individual steps of the cds2rbh() function
+Figure: Individual steps of the
+cds2rbh() function

2.1. 1. step: sequence similarity search
@@ -798,9 +798,9 @@ 2.3.1. CRBHit

3. Ka/Ks Calculations

-
-
-Figure: Overview of the rbh2kaks()
-function steps
+
+Figure: Overview of the rbh2kaks() function steps
+Figure: Overview of the rbh2kaks()
+function steps

The resulting CRBHs (see #crbhalgorithm) can be further processed to e.g. filtered for Tandem Duplicate (see CRBHits
@@ -819,9 +819,9 @@ 3. Ka/Ks Calculations
NCBI or ENSEMBL Input (see cds2genepos() function)) and supply selfblast results for both Input CDS.

-
-
-Figure: Individual steps of the
-rbh2kaks() function
+
+Figure: Individual steps of the rbh2kaks() function
+Figure: Individual steps of the
+rbh2kaks() function

3.1. Codon Alignments - cds2codonaln()
diff --git a/articles/V02KaKsVignette.html b/articles/V02KaKsVignette.html
index 7e17c2f..eff5c31 100644
--- a/articles/V02KaKsVignette.html
+++ b/articles/V02KaKsVignette.html
@@ -187,7 +187,7 @@ Table of Contents
 suppressPackageStartupMessages(library(dplyr))
 suppressPackageStartupMessages(library(ggplot2))
 suppressPackageStartupMessages(library(gridExtra))
-suppressPackageStartupMessages(library(curl))
+suppressPackageStartupMessages(library(curl))
 ## compile LAST, KaKs_Calculator2.0 and DAGchainer for the vignette
 vignette.paths <- make_vignette()

@@ -452,9 +452,9 @@ 1.3. Calculate/Filte
 ## get help
 #?cds2rbh
-
-
-Figure: CRBHit pairs between A. thaliana and A.
-lyrata
+
+Figure: CRBHit pairs between A. thaliana and A. lyrata
+Figure: CRBHit pairs between A. thaliana and A.
+lyrata
@@ -641,9 +641,9 @@ 1.5 Assign Tandem
 plotCurve=TRUE, lastpath=vignette.paths[1])
 attributes(ARATHA_selfblast_crbh)$selfblast
-
-
-Figure: selfblast CRBHit pairs between of A.
-thaliana
+
+Figure: selfblast CRBHit pairs between of A. thaliana
+Figure: selfblast CRBHit pairs between of A.
+thaliana
## get selfblast CRBHit pairs for A. lyrata
 ARALYR_selfblast_crbh <- cds2rbh(
@@ -657,9 +657,9 @@ 1.5 Assign Tandem
 plotCurve=TRUE, lastpath=vignette.paths[1])
 attributes(ARALYR_selfblast_crbh)$selfblast
-
-
-Figure: selfblast CRBHit pairs between of A.
-lyrata
+
+Figure: selfblast CRBHit pairs between of A. lyrata
+Figure: selfblast CRBHit pairs between of A.
+lyrata
## get gene position for A. thaliana longest isoforms
 ARATHA.cds.longest.genepos <- cds2genepos(
@@ -706,8 +706,8 @@ 1.5 Assign Tandem
 ggplot2::facet_wrap(~gene.chr) + ggplot2::scale_colour_manual(values= CRBHitsColors(length(table(tandem_group_size))))
-
-
-Figure: Tandem duplicates A. thaliana
+
+Figure: Tandem duplicates A. thaliana
+Figure: Tandem duplicates A. thaliana
@@ -758,9 +758,9 @@ 1.6 Synteny with DAGchainer
 ## plot DAGchainer results for each chromosome combination
 dim(ARATHA_ARALYR_crbh.dagchainer.bp)
 plot_dagchainer(ARATHA_ARALYR_crbh.dagchainer.bp)
-
-
-Figure: Dagchainer results of A. thaliana and A.
-lyrata
+
+Figure: Dagchainer results of A. thaliana and A. lyrata
+Figure: Dagchainer results of A. thaliana and A.
+lyrata
## plot DAGchainer results selected chromosomes
 g <- plot_dagchainer(
@@ -773,31 +773,34 @@ 1.6 Synteny with DAGchainer
 "AA2:NW_003302551.1", "AA2:NW_003302550.1", "AA2:NW_003302549.1", "AA2:NW_003302548.1"))
 g
-
-
-Figure: Dagchainer results selected chromosomes
+
+Figure: Dagchainer results selected chromosomes
+Figure: Dagchainer results selected
+chromosomes
## change title size
 g + ggplot2::theme(title=element_text(size=16))
-
-
-Figure: Dagchainer results altered title size
+
+Figure: Dagchainer results altered title size
+Figure: Dagchainer results altered title
+size
## change axis title size (gene1.mid; gene2.mid)
 g + ggplot2::theme(axis.title.x=element_text(size=16),
     axis.title.y=element_text(size=16))
-
-
-Figure: Dagchainer results altered axis title
-size
+
+Figure: Dagchainer results altered axis title size
+Figure: Dagchainer results altered axis title
+size
## change grid title size
 g + theme(strip.text.x=element_text(size=16), strip.text.y=element_text(size=16))
-
-
-Figure: Dagchainer results altered grid title
-size
+
+Figure: Dagchainer results altered grid title size
+Figure: Dagchainer results altered grid title
+size
## change grid axis size and angle
 g + theme(axis.text.x=element_text(size=12, angle=90))
-
-
-Figure: Dagchainer results altered grid angle
+
+Figure: Dagchainer results altered grid angle
+Figure: Dagchainer results altered grid
+angle
## get help
 #?plot_dagchainer
@@ -999,9 +1002,9 @@ 3. Ka/Ks Filt
 ## plot Ka/Ks results as histogram colored by Ka/Ks values
 g <- plot_kaks(kaks=ath_aly_ncbi_kaks)
-
-
-Figure: Ka/Ks results as histogram colored by
-Ka/ks
+
+Figure: Ka/Ks results as histogram colored by Ka/ks
+Figure: Ka/Ks results as histogram colored by
+Ka/ks
## plot Ka/Ks results as histogram filter for ka.min, ka.max, ks.min, ks.max
 g.min_max <- plot_kaks(
@@ -1010,9 +1013,9 @@ 3. Ka/Ks Filt
 ka.max=1, ks.min=0, ks.max=1)
-
-
-Figure: Ka/Ks results filtered for ka.min, ka.max,
-ks.max
+
+Figure: Ka/Ks results filtered for ka.min, ka.max, ks.max
+Figure: Ka/Ks results filtered for ka.min,
+ka.max, ks.max
## select subset of chromosomes - needs gene position information
 head(ARATHA.cds.longest.genepos)
@@ -1026,8 +1029,9 @@ 3. Ka/Ks Filt
 "NC_003071.7", "NW_003302551.1", "NW_003302554.1"))
-
-
-Figure: Ka/Ks results filtered for chromosomes
+
+Figure: Ka/Ks results filtered for chromosomes
+Figure: Ka/Ks results filtered for
+chromosomes
## plot Ka/Ks results and split by chromosome - needs gene position information
 
@@ -1041,8 +1045,8 @@ 3. Ka/Ks Filt
 "NW_003302551.1", "NW_003302554.1"), splitByChr=TRUE)
-
-
-Figure: Ka/Ks results split by chromosomes
+
+Figure: Ka/Ks results split by chromosomes
+Figure: Ka/Ks results split by chromosomes

Note: Default data.frame handling with the dplyr package is possible on the original
@@ -1064,8 +1068,8 @@ 3. Ka/Ks Filt
 ## filter for Ks values < 1 on plot object and plot
 g.split$g.kaks$data %>% dplyr::filter(ks<1) %>% ggplot2::ggplot() + ggplot2::geom_histogram(binwidth=0.1, aes(x=ks))
-
-
-Figure: Ks results as histogram
+
+Figure: Ks results as histogram
+Figure: Ks results as histogram
@@ -1120,9 +1124,9 @@ 4.
 threads=2, plotCurve=TRUE, lastpath=vignette.paths[1])
-
-
-Figure: CRBHit pairs of H. sapiens and P.
-troglodytes
+
+Figure: CRBHit pairs of H. sapiens and P. troglodytes
+Figure: CRBHit pairs of H. sapiens and P.
+troglodytes
## get gene position for H. sapiens longest isoforms
 HOMSAP.cds.longest.genepos <- cds2genepos(
@@ -1150,9 +1154,9 @@ 4.
 select.chr=c( "AA1:1","AA1:2","AA1:3","AA1:4","AA1:5","AA1:14", "AA2:1","AA2:3","AA2:4","AA2:5","AA2:14"))
-
-
-Figure: Dagchainer results of H. sapiens and P.
-troglodytes
+
+Figure: Dagchainer results of H. sapiens and P. troglodytes
+Figure: Dagchainer results of H. sapiens and P.
+troglodytes
## get selfblast CRBHit pairs for H. sapiens
 HOMSAP_selfblast_crbh <- cds2rbh(
@@ -1200,9 +1204,9 @@ 4.
 select.chr=c( "AA1:1","AA1:2","AA1:3","AA1:4","AA1:5","AA1:14", "AA2:1","AA2:3","AA2:4","AA2:5","AA2:14"))
-
-
-Figure: Dagchainer results of H. sapiens and P.
-troglodytes
+
+Figure: Dagchainer results of H. sapiens and P. troglodytes
+Figure: Dagchainer results of H. sapiens and P.
+troglodytes

Note: The following example can take some time and is not calculated by the vignette building process.

@@ -1220,9 +1224,9 @@ 4.
 ## plot Ka/Ks results as histogram colored by Ka/Ks values
 g <- plot_kaks(hom_pan_ensembl_kaks)
-
-
-Figure: Ka/Ks results of H. sapiens and P.
-troglodytes
+
+Figure: Ka/Ks results of H. sapiens and P. troglodytes
+Figure: Ka/Ks results of H. sapiens and P.
+troglodytes
diff --git a/authors.html b/authors.html
index 15ad030..55e2726 100644
--- a/authors.html
+++ b/authors.html
@@ -73,15 +73,16 @@ Citation

Ullrich K (2023). CRBHits: Conditional reciprocal best hits (CRBHits) in R.
-https://gitlab.gwdg.de/mpievolbio-it/crbhits,
-https://mpievolbio-it.pages.gwdg.de/crbhits/.
+R package version 0.0.5,
+https://mpievolbio-it.pages.gwdg.de/crbhits/, https://gitlab.gwdg.de/mpievolbio-it/crbhits.

@Manual{,
   title = {CRBHits: Conditional reciprocal best hits (CRBHits) in R},
   author = {Kristian K Ullrich},
   year = {2023},
-  note = {https://gitlab.gwdg.de/mpievolbio-it/crbhits,
+  note = {R package version 0.0.5, 
 https://mpievolbio-it.pages.gwdg.de/crbhits/},
+  url = {https://gitlab.gwdg.de/mpievolbio-it/crbhits},
 }
diff --git a/paper.html b/paper.html
index 5a76cb1..7b14276 100644
--- a/paper.html
+++ b/paper.html
@@ -64,8 +64,8 @@ Summary

The CRBH algorithm was introduced by @aubry2014deep and builds upon the traditional RBH approach to find additional orthologous sequences between two sets of sequences. As described earlier [@aubry2014deep; @scott2017shmlast], CRBH uses the sequence search results to fit an expect value (E-value) cutoff given each RBH to subsequently add sequence pairs to the list of bona-fide orthologs given their alignment length.
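The E-value fitting described above can be illustrated with a small base-R sketch. It is a simplification under stated assumptions, not the cds2rbh() code: the cutoff for an alignment length is taken to be the mean log10 E-value of the RBH pairs in the same alignment-length bin, and a non-reciprocal hit is accepted when its E-value is at or below that cutoff. The function names (crbh_fit_sketch(), crbh_accept_sketch(), safe_log10()), the column names evalue and alnlength, and bin.width are all hypothetical.

```r
## Hypothetical sketch of CRBH E-value fitting (not the cds2rbh() implementation).
## Assumption: cutoff per alignment-length bin = mean log10 E-value of the RBHs.
safe_log10 <- function(e) log10(pmax(e, 1e-300))  # avoid log10(0) = -Inf

crbh_fit_sketch <- function(rbh, bin.width = 10) {
  bins <- floor(rbh$alnlength / bin.width)
  cutoff <- tapply(safe_log10(rbh$evalue), bins, mean)
  data.frame(bin = as.numeric(names(cutoff)),
             log10.cutoff = as.numeric(cutoff))
}

crbh_accept_sketch <- function(hits, fit, bin.width = 10) {
  idx <- match(floor(hits$alnlength / bin.width), fit$bin)
  # hits falling into a length bin without any RBH get no cutoff and are rejected
  keep <- !is.na(idx) & safe_log10(hits$evalue) <= fit$log10.cutoff[idx]
  hits[keep, , drop = FALSE]
}

## usage with toy data
rbh <- data.frame(query = c("a1", "a2", "a3"), target = c("b1", "b2", "b3"),
                  evalue = c(1e-50, 1e-40, 1e-10), alnlength = c(300, 305, 120))
hits <- data.frame(query = c("a4", "a5"), target = c("b4", "b5"),
                   evalue = c(1e-60, 1e-5), alnlength = c(302, 125))
fit <- crbh_fit_sketch(rbh)
res <- crbh_accept_sketch(hits, fit)
res  # keeps only the a4/b4 pair
```

Binning by alignment length keeps the sketch short; a smoothed fit over alignment length would be a natural alternative, and the published implementations may use one.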

Unfortunately, as mentioned by @scott2017shmlast, the original implementation of CRBH (crb-blast) lag improved blast-like search algorithm to speed up the analysis. As a consequence, @scott2017shmlast ported CRBH to python shmlast, while shmlast cannot deal with IUPAC nucleotide code so far.

CRBHits constitutes a new R package, which build upon previous implementations and ports CRBH into the R environment, which is popular among biologists. CRBHits improve CRBH by additional implemented filter steps [@rost1999twilight] and the possibility to apply custom filters prior E-value fitting. Further, the resulting CRBH pairs can be evaluated for the presence of tandem duplicated genes, gene order based syntenic groups and evolutionary rates.
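The twilight-zone filter cited above [@rost1999twilight] can be sketched as a length-dependent percent-identity threshold. This is a minimal base-R sketch assuming the HSSP-curve form with no offset (100% for L <= 11, 480 * L^(-0.32 * (1 + exp(-L/1000))) up to L = 450, and 19.5% beyond). The helpers rost1999_threshold() and rost1999_filter_sketch() and the columns pident and alnlength are hypothetical, and the filter shipped with CRBHits may use different boundaries or an offset.

```r
## Hypothetical helpers, not the CRBHits filter: Rost (1999) HSSP-curve
## percent-identity threshold as a function of alignment length L.
rost1999_threshold <- function(L) {
  ifelse(L <= 11, 100,
         ifelse(L > 450, 19.5,
                480 * L^(-0.32 * (1 + exp(-L / 1000)))))
}

rost1999_filter_sketch <- function(hits) {
  # keep hits whose percent identity lies on or above the curve
  hits[hits$pident >= rost1999_threshold(hits$alnlength), , drop = FALSE]
}

## usage with toy data
hits <- data.frame(pident = c(35, 25, 20), alnlength = c(100, 100, 500))
rost1999_threshold(c(10, 100, 500))  # approx. 100, 28.98, 19.5
rost1999_filter_sketch(hits)         # keeps rows 1 and 3
```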

-
-
-Overview of the two main pipeline function and its subtasks. cds2rbh(): from CDS to CRBHit pairs; rbh2kaks(): from CRBHit pairs to Ka/Ks values.
+
+Overview of the two main pipeline function and its subtasks. cds2rbh(): from CDS to CRBHit pairs; rbh2kaks(): from CRBHit pairs to Ka/Ks values.
+Overview of the two main pipeline function and its subtasks. cds2rbh(): from CDS to CRBHit pairs; rbh2kaks(): from CRBHit pairs to Ka/Ks values.
-
-
-Accepted condition reciprocal best hits based on RBH fitting.
+
+Accepted condition reciprocal best hits based on RBH fitting.
+Accepted condition reciprocal best hits based on RBH fitting.

The obtained CRBHit pairs can also be used to calculate synonymous (Ks) and nonsynonymous (Ka) substitutions per hit pair using either the model from @li1993unbiased or from @yang2000estimating.
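Once Ka and Ks are available per hit pair, range filtering of the kind exposed by plot_kaks() (ka.min, ka.max, ks.min, ks.max) can be sketched in base R. This assumes a data.frame with numeric ka and ks columns and treats the bounds as closed intervals; filter_kaks_sketch() and the kaks.ratio column are hypothetical and not the CRBHits implementation.

```r
## Hypothetical helper, not part of CRBHits: keep hit pairs whose Ka and Ks
## lie inside closed intervals and add the Ka/Ks ratio.
## Assumption: 'kaks' has numeric columns 'ka' and 'ks'.
filter_kaks_sketch <- function(kaks, ka.min = 0, ka.max = Inf,
                               ks.min = 0, ks.max = Inf) {
  keep <- !is.na(kaks$ka) & !is.na(kaks$ks) &
    kaks$ka >= ka.min & kaks$ka <= ka.max &
    kaks$ks >= ks.min & kaks$ks <= ks.max
  out <- kaks[keep, , drop = FALSE]
  # Ka/Ks is undefined when Ks is 0, so report NA instead of Inf or NaN
  out$kaks.ratio <- ifelse(out$ks > 0, out$ka / out$ks, NA_real_)
  out
}

## usage with toy data
kaks <- data.frame(ka = c(0.01, 0.2, 0.05), ks = c(0.1, 2.5, 0))
res.kaks <- filter_kaks_sketch(kaks, ka.max = 1, ks.max = 1)
res.kaks  # drops the pair with ks = 2.5; ratio is NA where ks = 0
```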

-
-
-Selfblast CRBHit pair results for Arabidopsis thaliana. (A) DAGchainer dotplot per chromosome colored by syntenic group and (B) colored by Ks. (C) Histogram of Ks values colored by syntenic group.
+
+Selfblast CRBHit pair results for Arabidopsis thaliana. (A) DAGchainer dotplot per chromosome colored by syntenic group and (B) colored by Ks. (C) Histogram of Ks values colored by syntenic group.
+Selfblast CRBHit pair results for Arabidopsis thaliana. (A) DAGchainer dotplot per chromosome colored by syntenic group and (B) colored by Ks. (C) Histogram of Ks values colored by syntenic group.
@@ -185,8 +185,8 @@ Conclusions

Availability

CRBHits is an open source software made available under the MIT license. It can be installed from its gitlab repository using the devtools package.

-devtools::install_gitlab("mpievolbio-it/crbhits", 
- host = "https://gitlab.gwdg.de")", build_vignettes = TRUE)
+devtools::install_gitlab("mpievolbio-it/crbhits", 
+ host = "https://gitlab.gwdg.de")", build_vignettes = TRUE)

The R package website, which contain a detailed HOWTO to install the prerequisites (mentioned above) and package vignettes are availbale at https://mpievolbio-it.pages.gwdg.de/crbhits.

diff --git a/pkgdown.yml b/pkgdown.yml
index 765ee75..6d8e231 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -1,8 +1,8 @@
-pandoc: 2.19.2
+pandoc: 3.1.1
 pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   V01CRBHitsBasicVignette: V01CRBHitsBasicVignette.html
   V02KaKsVignette: V02KaKsVignette.html
-last_built: 2023-12-22T08:14Z
+last_built: 2023-12-22T08:21Z
diff --git a/reference/rbh2dagchainer.html b/reference/rbh2dagchainer.html
index 961a24d..64f6019 100644
--- a/reference/rbh2dagchainer.html
+++ b/reference/rbh2dagchainer.html
@@ -299,7 +299,7 @@ Examples
 rbhpairs=ath_selfhits_crbh, gene.position.cds1=ath.genepos, gene.position.cds2=ath.genepos)
-#> [1] "/__w/_temp/Library/CRBHits/extdata/dagchainer/run_DAG_chainer.pl -i /tmp/RtmpGsadR7/file1add171bf94d -o 0 -e -3 -g 10000 -M 50 -D 200000 -E 0.001 -A 5 -I -s"
+#> [1] "/__w/_temp/Library/CRBHits/extdata/dagchainer/run_DAG_chainer.pl -i /tmp/Rtmp0CdOH1/file1f2543d5e20b -o 0 -e -3 -g 10000 -M 50 -D 200000 -E 0.001 -A 5 -I -s"
 head(ath_selfblast_crbh.dagchainer)
 #> gene1.chr gene1.seq.id gene1.start gene1.end gene1.mid gene1.idx gene2.chr
 #> 1 AA1:1 AT1G05100.1 1469541 1470881 1470211 437 AA1:1

Performance comparison for CRBHit pair (Schizosaccharomyces pombe vs. Nematostella vectensis) and Ka/Ks calculations (Intel Xeon CPU E5-2620 v3 @ 2.40GHz; 3411 hit pairs; 2 x Threads).
Number of Threads