BEEM is an approach to infer models for microbial community dynamics based on metagenomic sequencing data (16S or shotgun-metagenomics). It is based on the commonly used generalized Lotka-Volterra modelling (gLVM) framework. BEEM uses an iterative EM algorithm to simultaneously infer scaling factors (microbial biomass) and model parameters (microbial growth rate and interaction terms) from longitudinal data and can thus work directly with the relative abundance values that are obtained with metagenomic sequencing.
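For orientation, the generalized Lotka-Volterra model is usually written in the form below. The notation is an assumption chosen for illustration and may differ from the symbols used in the BEEM manuscript: $x_i(t)$ is the absolute abundance of taxon $i$, $\mu_i$ its growth rate, $\beta_{ij}$ the effect of taxon $j$ on taxon $i$, $m(t)$ the total biomass (the scaling factor), and $\tilde{x}_i(t)$ the relative abundance observed by sequencing.

```latex
\frac{dx_i(t)}{dt} = x_i(t)\left(\mu_i + \sum_{j=1}^{p} \beta_{ij}\, x_j(t)\right),
\qquad
x_i(t) = m(t)\,\tilde{x}_i(t)
```

Because sequencing only yields $\tilde{x}_i(t)$, the biomass $m(t)$ has to be estimated together with $\mu_i$ and $\beta_{ij}$, which is the role of the scaling factors mentioned above.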
Note: BEEM stands for Biomass Estimation and model inference with an Expectation Maximization algorithm. We have now extended the BEEM framework to work with cross-sectional data; see the BEEM-static R package.
BEEM was written in R (>=3.3.1) and requires the following packages:
- foreach
- doMC: this currently only works on macOS or Linux
- lokern
- pspline
- monomvn
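Before installing, it can help to see which of these dependencies are already present. The following is a minimal sketch in base R; the variable names are illustrative, and the commented-out `install.packages` call assumes the packages are obtained from CRAN.

```r
# Dependencies listed above (names taken from the list in this README)
required <- c("foreach", "doMC", "lokern", "pspline", "monomvn")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package is not installed
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install (assumes CRAN)
} else {
  message("All listed dependencies are installed.")
}
```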
You can install BEEM as an R package using devtools:
devtools::install_github('csb5/beem')
The input files for BEEM should have the same format as described in the manual for MDSINE. The following two files are required by BEEM:
The OTU table should be a tab-delimited text file whose first row contains the sample IDs and whose first column contains the OTU IDs (or taxonomic annotations). Each row should then contain the relative abundance of one OTU across all samples, and each column should contain the relative abundances of all OTUs in that sample.
The metadata file should be a tab-delimited text file with the following columns:
sampleID isIncluded subjectID measurementID
- sampleID: sample IDs matching the first row of the OTU table
- isIncluded: whether the sample should be included in the analysis (1-include, 0-exclude)
- subjectID: indicator for which biological replicate the sample belongs to
- measurementID: time in standardized units from the start of the experiment
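The two formats described above can be sanity-checked before running BEEM. The sketch below uses base R only and toy in-memory data; `check_beem_inputs` is a hypothetical helper, not part of the BEEM package, and the sample and OTU IDs are made up. The conversion from counts to relative abundances is shown only to illustrate what a relative abundance table looks like; whether BEEM needs it done beforehand is not stated here.

```r
# Toy OTU table: rows are OTUs, columns are samples (hypothetical values).
# matrix() fills column by column, so each group of three is one sample.
counts <- matrix(
  c(10, 30, 60,
    20, 20, 60,
    50, 25, 25),
  nrow = 3,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

# Toy metadata with the four documented columns
metadata <- data.frame(
  sampleID      = c("S1", "S2", "S3"),
  isIncluded    = c(1, 1, 1),
  subjectID     = c(1, 1, 1),
  measurementID = c(0, 1, 2)
)

# Relative abundances: divide each column (sample) by its total
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# Hypothetical helper: returns a character vector of format problems
# (empty when none are found)
check_beem_inputs <- function(otu, meta) {
  needed <- c("sampleID", "isIncluded", "subjectID", "measurementID")
  absent <- setdiff(needed, colnames(meta))
  if (length(absent) > 0) {
    return(paste("missing metadata columns:", paste(absent, collapse = ", ")))
  }
  problems <- character(0)
  if (!setequal(colnames(otu), as.character(meta$sampleID))) {
    problems <- c(problems, "sample IDs in the OTU table and metadata differ")
  }
  if (!all(meta$isIncluded %in% c(0, 1))) {
    problems <- c(problems, "isIncluded must be 0 or 1")
  }
  problems
}

problems <- check_beem_inputs(rel_abund, metadata)
```

An empty `problems` vector means the toy inputs pass these basic checks; real files would first be read with `read.table` as in the usage example.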
We have provided several sample input files that were also analyzed in our manuscript.
Data from Props et al. (2016)
- OTU count table:
vignettes/props_et_al_analysis/counts.sel.txt
- Metadata:
vignettes/props_et_al_analysis/metadata.sel.txt
Data from Gibbons et al. (2017)
- OTU count table:
vignettes/gibbons_et_al_analysis/{DA,DB,M3,F4}.counts.txt
- Metadata:
vignettes/gibbons_et_al_analysis/{DA,DB,M3,F4}.metadata.txt
## Load functions
library(beem)
## Read inputs
counts <- read.table('counts.txt', head=F, row.names=1)
metadata <- read.table('metadata.txt', head=T)
## Run BEEM
res <- EM(dat=counts, meta=metadata)
## Estimate parameters
biomass <- biomassFromEM(res)
write.table(biomass, 'biomass.txt', col.names=F, row.names=F, quote=F)
gLVparameters <- paramFromEM(res, counts, metadata)
write.table(gLVparameters, 'gLVparameters.txt', col.names=T, row.names=F, sep='\t' , quote=F)
The parameters estimated by BEEM are returned as an R data.frame (a table) with the following columns, in order:
- parameter_type: growth_rate or interaction
- source_taxon: source taxon for interaction (NA if parameter_type is growth_rate)
- target_taxon: target taxon for interaction or growth rate
- value: parameter value
- significance: confidence level of the inferred interaction (only meaningful for interactions)
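A table in this long format can be reshaped into a growth-rate vector and an interaction matrix for downstream analysis. The sketch below is one way to do so in base R; the toy table, its values, and the variable names (`params`, `mu`, `beta`) are made up for illustration and only the column names come from the description above.

```r
# Toy parameter table with the documented columns (values are made up)
params <- data.frame(
  parameter_type = c("growth_rate", "growth_rate", "interaction", "interaction"),
  source_taxon   = c(NA, NA, "OTU1", "OTU2"),
  target_taxon   = c("OTU1", "OTU2", "OTU2", "OTU1"),
  value          = c(0.8, 0.5, -0.2, 0.1),
  significance   = c(NA, NA, 0.95, 0.40),
  stringsAsFactors = FALSE
)

# Split the table by parameter type
growth <- params[params$parameter_type == "growth_rate", ]
inter  <- params[params$parameter_type == "interaction", ]

taxa <- sort(unique(growth$target_taxon))

# Named vector of growth rates, one entry per taxon
mu <- setNames(growth$value, growth$target_taxon)[taxa]

# Interaction matrix: rows are target taxa, columns are source taxa;
# pairs absent from the table stay at 0
beta <- matrix(0, nrow = length(taxa), ncol = length(taxa),
               dimnames = list(taxa, taxa))
# A two-column character matrix indexes the matrix by (row name, column name)
beta[cbind(inter$target_taxon, inter$source_taxon)] <- inter$value
```

With the real output, `params` would be replaced by the `gLVparameters` object from the usage example.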
The commands for reproducing the analysis reported in the manuscript are presented as Jupyter notebooks: (1) a notebook with a demo of the gLVM simulation, (2) a notebook for Props et al. and (3) a notebook for Gibbons et al.
C Li, K R Chng, J S Kwah, T V Av-Shalom, L Tucker-Kellogg & N Nagarajan. (2019). An expectation-maximization algorithm enables accurate ecological modeling using longitudinal metagenome sequencing data. Microbiome.
Please direct any questions or feedback to Chenhao Li (cli40@mgh.harvard.edu) and Niranjan Nagarajan (nagarajann@gis.a-star.edu.sg).