Dynamic interaction network inference from longitudinal microbiome data
(2019) 7:54
Lugo-Martinez et al. Microbiome
https://doi.org/10.1186/s40168-019-0660-3
METHODOLOGY
Open Access
Jose Lugo-Martinez1†, Daniel Ruiz-Perez2†, Giri Narasimhan2,3* and Ziv Bar-Joseph1*
Abstract
Background: Several studies have focused on the microbiota living in environmental niches including human
body sites. In many of these studies, researchers collect longitudinal data with the goal of understanding not only
the composition of the microbiome but also the interactions between the different taxa. However, analysis of such
data is challenging and very few methods have been developed to reconstruct dynamic models from time series
microbiome data.
Results: Here, we present a computational pipeline that enables the integration of data across individuals for the
reconstruction of such models. Our pipeline starts by aligning the data collected for all individuals. The aligned
profiles are then used to learn a dynamic Bayesian network which represents causal relationships between taxa and
clinical variables. Testing our methods on three longitudinal microbiome data sets, we show that our pipeline improves
upon prior methods developed for this task. We also discuss the biological insights provided by the models which
include several known and novel interactions. The extended CGBayesNets package is freely available under the MIT
Open Source license agreement. The source code and documentation can be downloaded from
https://github.com/jlugomar/longitudinal_microbiome_analysis_public.
Conclusions: We propose a computational pipeline for analyzing longitudinal microbiome data. Our results provide
evidence that microbiome alignments coupled with dynamic Bayesian networks improve predictive performance
over previous methods and enhance our ability to infer biological relationships within the microbiome and between
taxa and clinical factors.
Keywords: Dynamic interaction network inference, Longitudinal microbiome analysis, Microbial composition
prediction, Dynamic Bayesian networks, Temporal alignment
Background
Multiple efforts have attempted to study the microbiota
living in environmental niches including human body
sites. These microbial communities can play beneficial as
well as harmful roles in their hosts and environments.
For instance, microbes living in the human gut perform
numerous vital functions for homeostasis ranging from
harvesting essential nutrients to regulating and maintaining the immune system. Alternatively, a compositional
imbalance known as dysbiosis can lead to a wide range
of human diseases [1], and is linked to environmental
problems such as harmful algal blooms [2].
† Jose Lugo-Martinez and Daniel Ruiz-Perez contributed equally to this work.
1 Computational Biology Department, School of Computer Science, Carnegie
Mellon University, 5000 Forbes Avenue, Pittsburgh 15213, Pennsylvania, USA
2 Bioinformatics Research Group (BioRG), Florida International University, 11200
SW 8th Street, Miami 33199, Florida, USA
Full list of author information is available at the end of the article
While many studies profile several different types of
microbial taxa, it is not easy in most cases to uncover the
complex interactions within the microbiome and between
taxa and clinical factors (e.g., gender, age, ethnicity).
Microbiomes are inherently dynamic; thus, in order to
fully reconstruct these interactions, we need to obtain
and analyze longitudinal data [3]. Examples include characterizing temporal variation of the gut microbial communities from pre-term infants during the first weeks of
life, and understanding responses of the vaginal microbiota to biological events such as menses. Even when such
longitudinal data is collected, the ability to extract an
accurate set of interactions from the data is still a major
challenge.
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
To address this challenge, we need computational
time-series tools that can handle data sets that may
exhibit missing or noisy data and non-uniform sampling.
Furthermore, a critical issue which naturally arises when
dealing with longitudinal biological data is that of temporal rate variations. Given longitudinal samples from
different individuals (for example, gut microbiome), we
cannot expect that the rates at which interactions take
place are exactly the same across these individuals. Factors
such as age, gender, and external exposure may lead to
faster or slower rates of change between individuals. Thus,
to analyze longitudinal data across individuals, we need to
first align the microbial data. Using the aligned profiles,
we can next employ other methods to construct a model
for the process being studied.
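To make the idea of temporal rate variation concrete, the following is a minimal sketch of aligning one individual's time series to a reference. It assumes a linear time transformation (s - b) / a, piecewise-linear interpolation, and a brute-force grid search; these choices, and all function and parameter names, are illustrative assumptions rather than the method used in this work.

```python
from itertools import product


def interpolate(times, values, t):
    """Piecewise-linear interpolation; clamps outside the sampled range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return values[i] + frac * (values[i + 1] - values[i])


def alignment_error(ref_t, ref_v, t, v, a, b):
    """Mean squared error after mapping sample time s to (s - b) / a.

    a (stretch) and b (shift) are hypothetical warp parameters.
    """
    pairs = []
    for s, value in zip(t, v):
        warped = (s - b) / a
        # Compare only where the reference was actually sampled.
        if ref_t[0] <= warped <= ref_t[-1]:
            pairs.append((interpolate(ref_t, ref_v, warped), value))
    if len(pairs) < 3:  # too little overlap to judge the fit
        return float("inf")
    return sum((r - x) ** 2 for r, x in pairs) / len(pairs)


def best_linear_warp(ref_t, ref_v, t, v, stretches, shifts):
    """Grid search for the (a, b) pair with the lowest alignment error."""
    return min(
        product(stretches, shifts),
        key=lambda ab: alignment_error(ref_t, ref_v, t, v, ab[0], ab[1]),
    )
```

For example, if a second individual shows the same abundance profile as the reference but at half the rate and starting one time unit later, calling best_linear_warp with a grid that contains a stretch of 2 and a shift of 1 should recover that pair, after which both series can be compared on the reference time scale.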
Most current approaches for analyzing longitudinal
microbiome data focus on changes in outcomes over time
[4, 5]. The main drawback of this approach is that individual microbiome entities are treated as independent
outcomes; hence, potential relationships between these
entities are ignored. An alternative approach involves
the use of dynamical systems such as the generalized Lotka-Volterra (gLV) models [6–10]. While gLV
and other dynamical systems can help in studying the
stability of temporal bacterial communities, they are
not well-suited for temporally sparse and non-uniform
high-dimensional microbiome time series data (e.g., limited frequency and number of samples), as well as
noisy data [3, 10]. Additionally, most of these methods eliminate any taxa whose relative abundance profile
exhibits a zero entry (i.e., not present in a measurable amount at one or more of the measured time
points). Finally, probabilistic graphical models (e.g., hidden
Markov models, Kalman filters, and dynamic Bayesian
networks) are machine learning tools which can effectively model dynamic processes, as well as discover causal
interactions [11].
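For reference, the gLV model mentioned above is commonly written as dx_i/dt = x_i (mu_i + sum_j a_ij x_j), where x_i is the abundance of taxon i, mu_i its intrinsic growth rate, and a_ij the effect of taxon j on taxon i. The sketch below simulates this system with an explicit Euler scheme. The symbol names, step size, and integration scheme are assumptions chosen for illustration and are not taken from the cited methods.

```python
def glv_step(x, growth, interactions, dt):
    """One explicit Euler step of dx_i/dt = x_i * (growth_i + sum_j a_ij * x_j)."""
    n = len(x)
    new_x = []
    for i in range(n):
        rate = growth[i] + sum(interactions[i][j] * x[j] for j in range(n))
        # Abundances cannot be negative, so clip at zero.
        new_x.append(max(x[i] + dt * x[i] * rate, 0.0))
    return new_x


def simulate_glv(x0, growth, interactions, dt=0.01, steps=5000):
    """Return the list of abundance vectors visited, starting from x0."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        x = glv_step(x, growth, interactions, dt)
        trajectory.append(x)
    return trajectory
```

With a single taxon, growth rate 1 and self-interaction -1, the system reduces to logistic growth and the simulated abundance approaches a carrying capacity of 1. Because each rate is multiplied by x_i, a taxon that starts at exactly zero abundance stays at zero in this model.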
In this work, we first adapt statistical spline estimation
and dynamic time-warping techniques for aligning time-series microbial data so that they can be integrated across
individuals. We use the aligned data to learn a Dynamic
Bayesian Network (DBN), where nodes represent microbial taxa, clinical conditions, or demographic factors and
edges represent causal relationships between these entities. We evaluate our model by using multiple data sets
comprised of the mi (...truncated)