Abstract
The recently completed second phase of the Human Microbiome Project has highlighted the relationship between dynamic changes in the microbiome and disease, motivating new microbiome study designs based on longitudinal sampling. Yet analysis of such data is hindered by the presence of technical noise, high dimensionality, and data sparsity. To address these challenges, we propose LUMINATE (LongitUdinal Microbiome INference And zero deTEction), a fast and accurate method for inferring relative abundances from noisy read count data. We demonstrate on synthetic data that LUMINATE is orders of magnitude faster than current approaches, with better or similar accuracy. This speedup makes it feasible to analyze data at the dimensionality required by current studies. We further show that LUMINATE can accurately distinguish biological zeros, where a taxon is absent from the community, from technical zeros, where a taxon is present but below the detection threshold. We conclude by demonstrating the utility of LUMINATE for downstream analysis: using its estimates of latent relative abundances to fit the parameters of a dynamical system leads to more accurate predictions of community dynamics.
Code availability: https://github.com/tyjo/luminate