This repository contains the code and data used in the paper "doubletrouble: an R/Bioconductor package for the identification, classification, and analysis of gene and genome duplications".
Reproducible reports for all the analyses we performed are available as a Quarto book at https://almeidasilvaf.github.io/doubletrouble_paper/.
Gene and genome duplications are major evolutionary forces that shape the diversity and complexity of life. However, different duplication modes have distinct impacts on gene function, expression, and regulation. Existing tools for identifying and classifying duplicated genes are either outdated or not user-friendly. Here, we present doubletrouble, an R/Bioconductor package that provides a comprehensive and robust framework for analyzing duplicated genes from genomic data. doubletrouble can detect gene pairs and classify them as derived from one of six duplication modes (segmental, tandem, proximal, retrotransposon-derived, DNA transposon-derived, and dispersed duplications), calculate substitution rates, detect signatures of putative whole-genome duplication events, and visualize results as publication-ready figures. We applied doubletrouble to classify the duplicated gene repertoire in 822 eukaryotic genomes and made the results available through a user-friendly web interface (https://almeidasilvaf.github.io/doubletroubledb). doubletrouble is freely available from Bioconductor (https://bioconductor.org/packages/doubletrouble) and provides a valuable resource for studying the evolutionary consequences of gene and genome duplications.
Keywords: molecular evolution, comparative genomics, paralogous genes, polyploidy.