fastEmu is an R package for estimating changes in the abundance of microbial categories (taxa, genes, etc.) associated with covariates using high-throughput sequencing data. fastEmu is a wrapper for the package radEmu that implements a fast version of the hypothesis testing procedure implemented in radEmu.
Check out radEmu for a full list of reasons to use this method for your differential abundance analysis. Some highlights include:
- radEmu uses high-throughput sequencing data to estimate changes in the "absolute abundance" of categories
- radEmu doesn't require data transformations, a reference taxon, or pseudocounts
- radEmu is robust to differential sampling depths and differential detection of categories (taxa, genes, etc.)
- radEmu has great error control, including in small samples and data generated from different distributions
However, hypothesis testing in radEmu can be slow, especially when there is a large number of categories. This is why we created fastEmu! fastEmu uses a simplified model to test the same parameters* tested in radEmu, but much faster.
*We don't estimate exactly the same parameters as in radEmu. In radEmu, we estimate the log fold difference in abundance associated with the covariate for each category, relative to the typical log fold difference in abundance associated with the covariate across all categories. This lets us pick out the categories that are the most differentially abundant compared to the overall set of categories. In fastEmu, we cannot compare to the typical log fold difference across all categories, because we need to reduce the size of our model in order to run it quickly. Instead, we compare to the typical log fold difference in a subset of categories. The smaller the subset, the faster the test will run.
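As a rough illustration of how the choice of reference set changes the parameter, the base R sketch below centers a made-up vector of log fold differences in two ways: against all categories and against a subset. It uses the plain median as a stand-in for the "typical" log fold difference, which is an assumption made for illustration and not necessarily the summary radEmu or fastEmu use internally. The values and the names `b` and `subset_idx` are hypothetical, and the sketch ignores model fitting entirely.

```r
# Hypothetical log fold differences for 8 categories (made-up numbers).
b <- c(0.1, -0.2, 0.05, 1.5, 0.0, -0.1, 2.0, 0.15)

# Hypothetical reference subset: categories 1, 3, and 5.
subset_idx <- c(1, 3, 5)

# Assumption: the median stands in for the "typical" log fold difference.
b_all <- b - median(b)              # relative to all categories
b_sub <- b - median(b[subset_idx])  # relative to the subset only

# The two versions differ by the same constant for every category,
# so the ordering of categories is unchanged.
shift <- b_all - b_sub
print(unique(round(shift, 10)))
```

In this toy setting, changing the reference set only shifts every parameter by a constant, which is why a subset whose abundances are expected to be stable makes the shifted parameters easy to interpret.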
We suggest choosing a meaningful subset if there is one in your analysis. For example, when we run differential abundance analyses for KEGG Orthologies (KOs) that represent genes, we compare to the typical log fold difference in KOs that encode ribosomal proteins. We do this because we expect the abundance of these KOs to change very little across any covariate. If you have something similar in your analysis, you can choose that; otherwise, you can choose an arbitrary subset and still approximate the parameters from radEmu pretty well!
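One way to build such a subset might look like the sketch below, which finds the columns of a count matrix whose names appear in a list of reference KOs. Everything here is hypothetical: the KO identifiers are placeholders rather than a vetted list of ribosomal protein KOs, the counts are simulated, and the sketch assumes the subset is passed to fastEmu as integer column indices, which you should confirm against the package documentation.

```r
set.seed(1)

# Placeholder KO identifiers used as column names (not a curated list).
ko_ids <- c("K00001", "K02863", "K00845", "K02886", "K01810", "K02906")

# Simulated count matrix: 4 samples by 6 KOs.
count_data <- matrix(
  rpois(4 * length(ko_ids), lambda = 20),
  nrow = 4,
  dimnames = list(paste0("sample", 1:4), ko_ids)
)

# Hypothetical list of KOs treated as the stable reference set.
reference_kos <- c("K02863", "K02886", "K02906")

# Column indices of the reference KOs in the count matrix.
# Assumption: the constraint subset is given as integer indices.
ribosomal_subset <- which(colnames(count_data) %in% reference_kos)
print(ribosomal_subset)
```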
To install fastEmu, use the code below.
# install.packages("devtools")
devtools::install_github("statdivlab/fastEmu")
library(fastEmu)
The vignettes demonstrate example usage of the main functions. Please file an issue if you have a request for a tutorial that is not currently included.
The following code will run radEmu to estimate log fold differences in abundance for all genes, and then use fastEmu to quickly run a robust score test for the category and covariate specified in test_kj (here, j = 5 and k = 2).
emu_test <- fastEmuTest(constraint_cats = ribosomal_subset,
                        formula = ~ treatment,
                        data = sample_data,
                        Y = count_data,
                        test_kj = data.frame(k = 2, j = 5))
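To make the indices in test_kj concrete, the base R sketch below builds hypothetical inputs and looks up which covariate and category k = 2 and j = 5 would refer to. It assumes that k indexes columns of the design matrix built from the formula and that j indexes columns of the count matrix, as in radEmu; the simulated `sample_data` and `count_data` are illustrative only, and the sketch does not call fastEmu itself.

```r
set.seed(2)

# Hypothetical covariate data: 5 control samples and 5 treated samples.
sample_data <- data.frame(treatment = rep(c(0, 1), each = 5))

# Hypothetical count matrix: 10 samples by 8 genes.
count_data <- matrix(
  rpois(10 * 8, lambda = 30),
  nrow = 10,
  dimnames = list(NULL, paste0("gene", 1:8))
)

# Design matrix implied by the formula; column 1 is the intercept.
X <- model.matrix(~ treatment, data = sample_data)

test_kj <- data.frame(k = 2, j = 5)

# Assumption: k selects a design matrix column, j selects a category.
tested_covariate <- colnames(X)[test_kj$k]
tested_category <- colnames(count_data)[test_kj$j]
print(c(tested_covariate, tested_category))
```

Checking the column names this way before running a test can help avoid accidentally testing the intercept (k = 1) or the wrong category.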
We additionally have a pkgdown website that contains pre-built versions of our function documentation and our vignettes (coming soon).
If you use fastEmu for your analysis, check back in soon for a preprint!
If you encounter a bug or would like to make a change request, please file it as an issue on the GitHub repository (statdivlab/fastEmu).
If you're a developer, we would love to review your pull requests.