Data simulation methods

We simulated data from a hypothetical system in which each maternal individual (number of mothers $n_m = 20$) has a microbiome, defined as a multivariate normally distributed vector of taxon abundances (number of taxa $n_t = 300$). Each maternal individual has $n_o = 20$ offspring. Each offspring's microbiome taxon abundances have a mean equal to its mother's microbiome abundances, with multivariate normally distributed noise added. Each offspring's trait value is a linear combination of its microbiome abundances with normally distributed noise added. Specifically, the first six taxa were simulated to have a positive effect on the trait mean and the remaining 294 taxa were simulated to have zero effect: the simulated effects were $[50, 20, 10, 5, 2, 1, 0, 0, 0, ...]$.
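The simulation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the study: the seed, the unit noise standard deviations, the zero mean, and the identity covariance are all assumptions, since the text does not state them.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)

# Sizes taken from the text: mothers, taxa, offspring per mother.
n_m, n_t, n_o = 20, 300, 20

# Noise scales are NOT given in the text; unit SDs are assumed here.
sd_mother, sd_offspring, sd_trait = 1.0, 1.0, 1.0

# True effects: six positive coefficients, all remaining taxa have zero effect.
beta = np.zeros(n_t)
beta[:6] = [50, 20, 10, 5, 2, 1]

# Maternal microbiomes: multivariate normal with zero mean and identity
# covariance (assumption), drawn as independent normals.
mothers = rng.normal(0.0, sd_mother, size=(n_m, n_t))

# Index of the mother of each offspring: 0,0,...,0,1,1,...,19.
mother_id = np.repeat(np.arange(n_m), n_o)

# Offspring microbiomes: mother's abundances plus multivariate normal noise.
offspring = mothers[mother_id] + rng.normal(0.0, sd_offspring, size=(n_m * n_o, n_t))

# Trait: linear combination of offspring abundances plus normal noise.
trait = offspring @ beta + rng.normal(0.0, sd_trait, size=n_m * n_o)
```

With these sizes, `offspring` has one row per offspring (400 rows) and one column per taxon (300 columns), and `trait` has one value per offspring.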

We fit the statistical model described below to the full dataset in which each offspring microbiome was paired with its corresponding trait value (complete data). We also fit a model to a dataset where half of the offspring from each mother were missing trait data, and the other half were missing microbiome data (missing data).
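One way to construct the missing-data version is sketched below. The text does not say which offspring fall in which half, so splitting each mother's offspring by position is an assumption, as are the placeholder data and the helper name `apply_missingness`.

```python
import numpy as np

n_m, n_t, n_o = 20, 300, 20  # sizes from the text

# Mother index and within-mother position (0..n_o-1) for each offspring.
mother_id = np.repeat(np.arange(n_m), n_o)
within = np.tile(np.arange(n_o), n_m)

# Assumption: the first half of each mother's offspring lose their trait value,
# and the second half lose their microbiome data.
trait_missing = within < n_o // 2
microbiome_missing = ~trait_missing

def apply_missingness(offspring, trait):
    """Return copies of the data with NaN marking unobserved values."""
    X = np.array(offspring, dtype=float, copy=True)
    y = np.array(trait, dtype=float, copy=True)
    X[microbiome_missing, :] = np.nan
    y[trait_missing] = np.nan
    return X, y

# Placeholder data standing in for the simulated microbiomes and traits.
rng = np.random.default_rng(2)
X_miss, y_miss = apply_missingness(
    rng.normal(size=(n_m * n_o, n_t)), rng.normal(size=n_m * n_o)
)
```

Under this split no offspring has both its microbiome and its trait observed, so the link between the two is informed only through the shared mother.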

Model fitting methods

The statistical model is a Bayesian multivariate linear mixed model with uncorrelated random intercepts for each maternal individual on each microbiome taxon and on the trait response. The trait response also has a fixed effect for each microbiome taxon; interactions between taxa and residual correlations between taxa are not modeled. The standard deviations of the random intercepts were assigned Gamma(1, 1) prior distributions, as were the residual standard deviations of each taxon abundance and of the trait response. The fixed effects (the effect of each taxon's abundance on the trait response) were assigned a regularized horseshoe prior with the following settings: 1 df for the Student t-distribution of the local shrinkage parameters, 1 df for the Student t-distribution of the global shrinkage parameter, a slab scale of 50 (indicating weak regularization), 10 df for the Student t-distribution of the slab, and a ratio of 6/294 for the expected number of non-zero coefficients to the expected number of zero coefficients. The prior was scaled using the residual standard deviation.
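Written out, the model has roughly the following form. The notation is introduced here for illustration only: $m(i)$ denotes the mother of offspring $i$, and the intercepts $\mu_k$ and $\alpha$ are assumed, since the text does not mention them.

$$
\begin{aligned}
x_{ik} &= \mu_k + u_{m(i),k} + \varepsilon_{ik}, \qquad u_{m,k} \sim \mathcal{N}(0, \sigma_{u,k}^2), \quad \varepsilon_{ik} \sim \mathcal{N}(0, \sigma_{x,k}^2), \\
y_i &= \alpha + \sum_{k=1}^{n_t} \beta_k x_{ik} + v_{m(i)} + e_i, \qquad v_m \sim \mathcal{N}(0, \sigma_v^2), \quad e_i \sim \mathcal{N}(0, \sigma_y^2), \\
\sigma_{u,k},\ \sigma_{x,k},\ \sigma_v,\ \sigma_y &\sim \mathrm{Gamma}(1, 1).
\end{aligned}
$$

Assuming the usual parameterization of the regularized horseshoe, the prior on the fixed effects would be

$$
\begin{aligned}
\beta_k \mid \lambda_k, \tau, c &\sim \mathcal{N}(0, \tau^2 \tilde{\lambda}_k^2), \qquad \tilde{\lambda}_k^2 = \frac{c^2 \lambda_k^2}{c^2 + \tau^2 \lambda_k^2}, \\
\lambda_k &\sim t^{+}_{1}(0, 1), \qquad \tau \sim t^{+}_{1}(0, \tau_0), \qquad \tau_0 = \frac{6}{294} \cdot \frac{\sigma_y}{\sqrt{n}}, \\
c^2 &\sim \text{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu s^2}{2}\right), \qquad \nu = 10, \quad s = 50,
\end{aligned}
$$

where $t^{+}_{1}$ is a half-Student t-distribution with 1 df and $n$ is the number of observations. The exact form of $\tau_0$ depends on the software used, which the text does not name, so this block should be read as a sketch of the prior's structure and not as the precise specification.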

The joint posterior distribution of the parameters was sampled using Hamiltonian Monte Carlo, with four Markov chains each run for 5000 discarded warmup iterations and 2500 sampling iterations, for a total of 10000 posterior samples. All parameter values were initialized at zero for all chains. The adaptation delta parameter (the target average acceptance probability) was set to 0.95 to reduce the number of divergent transitions.
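As a compact summary, the sampler settings can be collected in a small configuration. The text does not name the software used; the key names below follow the keyword arguments of cmdstanpy's `CmdStanModel.sample()` purely as an assumption for illustration.

```python
# Sampler settings from the text. Key names mirror cmdstanpy's
# CmdStanModel.sample() keyword arguments (an assumption: the source
# does not say which interface was used).
sampler_settings = {
    "chains": 4,            # four Markov chains
    "iter_warmup": 5000,    # discarded warmup iterations per chain
    "iter_sampling": 2500,  # retained sampling iterations per chain
    "inits": 0,             # initialize all parameters at zero
    "adapt_delta": 0.95,    # target average acceptance probability
}

# Total retained posterior draws across all chains.
total_draws = sampler_settings["chains"] * sampler_settings["iter_sampling"]
print(total_draws)  # 10000
```

Only the post-warmup iterations count toward the total, which is why four chains of 2500 sampling iterations give 10000 posterior samples.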
