Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks.
Briefings in Bioinformatics, 2022, 23(4), 1–11
https://doi.org/10.1093/bib/bbac219
Advance access publication date: 10 June 2022
Problem Solving Protocol
Polina Suter, Jack Kuipers and Niko Beerenwinkel
Corresponding author: Niko Beerenwinkel, ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058 Basel, Switzerland.
Tel: +41 61 387 31 69; E-mail:
Abstract
Dynamic Bayesian networks (DBNs) can be used for the discovery of gene regulatory networks (GRNs) from time series gene expression
data. Here, we suggest a strategy for learning DBNs from gene expression data by employing a Bayesian approach that is scalable
to large networks and is targeted at learning models with high predictive accuracy. Our framework can be used to learn DBNs for
multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models based on different
structural and parametric assumptions and select the optimal model based on the cross-validated predictive accuracy. We show
in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We
applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database,
each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize
responders and non-responders to anti-cancer therapy. In the second case, we compared normal to tumor cells of colorectal tissue. The
classification accuracy reached by the DBN-based classifier for both datasets was higher than reported previously. For the colorectal
cancer dataset, our analysis suggested that GRNs for cancer and normal tissues differ substantially, with the differences most pronounced
in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in gene networks of cancer and
normal cells may be used for the discovery of targeted therapies.
Keywords: dynamic Bayesian networks, time series, gene expression, Bayesian learning, classification, MCMC
Introduction
Learning gene regulatory networks (GRNs) from gene
expression data has been the focus of much research in
recent decades [7, 10, 38, 62]. Precise knowledge of
GRNs can help to understand the molecular mechanisms
driving diseases and facilitate the search for targeted
therapies [3, 32]. Multiple computational methods can be
used to learn GRNs from observational data, including
correlation analysis [20, 29, 34], Boolean networks [31,
36], Bayesian networks [5, 12, 59], differential equation
models [60, 61] and machine learning approaches [19]. A
recent benchmarking study [63] revealed no clear winner
among different methods for GRN reconstruction, with
different methods demonstrating advantages in different
settings.
A Bayesian network is a probabilistic graphical model
representing dependencies between random variables
via a directed acyclic graph (DAG). Due to its probabilistic nature, this model is well suited to describe noisy
biological data. However, static Bayesian networks do
not allow directed cycles, so they cannot
model feedback loops. The dynamic Bayesian
network (DBN) model overcomes this problem by including dependencies between nodes at different time points,
thereby accommodating cycles [28, 39, 50].
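To illustrate why dependencies across time points accommodate feedback, the following minimal sketch checks a two-gene feedback loop for directed cycles before and after unrolling it over time slices. The helper names `is_acyclic` and `unroll` and the gene labels are hypothetical, and the sketch considers only edges between consecutive time slices; it is not taken from any of the cited packages.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child) pairs has no directed cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents.
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Every node is removed exactly when the graph contains no directed cycle.
    return visited == len(nodes)


def unroll(edges, n_slices):
    """Map each edge parent -> child to edges from time slice t to time slice t + 1."""
    return [((parent, t), (child, t + 1))
            for t in range(n_slices - 1)
            for parent, child in edges]


# Hypothetical feedback loop: gene A regulates gene B, and B regulates A.
feedback_loop = [("A", "B"), ("B", "A")]

print(is_acyclic(feedback_loop))             # False: not a valid static Bayesian network
print(is_acyclic(unroll(feedback_loop, 3)))  # True: the unrolled graph is a DAG
```

In the unrolled graph every edge points forward in time, so the loop A → B → A becomes the chain A(t) → B(t+1) → A(t+2), which is acyclic.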
DBN models have been used to learn biological networks
[35], including GRNs [2, 6, 15, 30, 59, 64] and multiomics networks [48]. Learning DBN structures from data
is computationally challenging because the number of
possible network topologies grows super-exponentially with
the number of nodes. Some methods address this issue by
employing a greedy search [48, 59]; others restrict the
network topology by prohibiting instantaneous dependencies between genes or limiting the number of
incoming edges per node [18, 35, 57]. However, such topological restrictions may result in the discovery
of suboptimal models [41].
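To give a sense of how quickly the search space grows, the sketch below counts labeled DAGs on n nodes using Robinson's recurrence. This is a standard combinatorial result rather than part of the method described here; it counts static DAGs only, so it should be read as a rough indication of the number of instantaneous (within-time-slice) structures, and the function name `count_dags` is an illustrative choice.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, computed with Robinson's recurrence."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges:
    # each of them may point to any subset of the remaining n - k nodes.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 6):
    print(n, count_dags(n))
# 1 1
# 2 3
# 3 25
# 4 543
# 5 29281
```

Already at five nodes there are tens of thousands of candidate structures, which is why exhaustive enumeration is infeasible for networks of realistic size.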
Another limitation of most network learning methods
lies in the assumption that all samples in the dataset
represent the same GRN; however, this assumption may
be violated. For example, it has been shown experimentally that protein–protein interactions differ drastically
between tumor and normal cell lines [23]. Hence,
identifying context-specific GRNs can facilitate the discovery of targeted therapies [56]. Only limited research
has been devoted to learning DBNs from distinct but related
contexts [24, 42, 43]. However, none of these methods was
applied to networks with more than 40 nodes, and all
suggested approaches utilized limited DBN topologies
that assume no instantaneous dependencies between
genes.
Polina Suter is a biotech investment manager at Magnetic Capital. Her research interests include context-specific network learning and multi-omics data
integration. This work was primarily conducted while the author was a PhD candidate at ETH Zurich.
Jack Kuipers is a senior scientist at ETH Zurich. His research is focused on cancer evolution modelling, phylogenetic tree inference, probabilistic graphical
models and single-cell sequencing analysis.
Niko Beerenwinkel is a professor at ETH Zurich. His research is at the interface of mathematics, statistics, and computer science with biology and medicine.
Received: December 16, 2021. Revised: April 29, 2022. Accepted: May 10, 2022
© The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/
by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial
re-use, please contact
The goal of this study was to create a scalable framework for learning DBN models that provide high predictive accuracy and can be used for learning GRNs
for multiple subgroups of samples, defined, for example, by molecular, histological or clinical phenotypes. We
employed a Bayesian approach [26] for learning DBNs
that is scalable to networks with hundreds of nodes and
implemented in the R-package BiDAG [52]. BiDAG was
previously used for context-specific learning of static
gene networks [27, 51]. This package allows selecting
from a wide range of network topologies, including prior
information from public gene interaction databases and
modeling gene interactions whose strength changes over
time. In addition, the Bayesian approach to structure
learning implemented in the package is well equipped
to prevent overfitting, a known problem occurring in the
analysis of high-dimensional biological data.
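One generic way in which prior knowledge from an interaction database can enter a Bayesian structure score is as an edge-wise penalty in the log prior over graphs. The sketch below is a minimal illustration under that assumption. It is not the BiDAG interface, and the function `log_structure_prior`, the `penalty` parameter and the gene names are hypothetical.

```python
from math import log


def log_structure_prior(edges, known_interactions, penalty=2.0):
    """Unnormalized log prior of a graph (assumed form, for illustration only).

    Each edge that is absent from `known_interactions` multiplies the prior
    by 1 / penalty; edges supported by the database are not penalized.
    """
    if penalty < 1.0:
        raise ValueError("penalty must be at least 1")
    unsupported = [edge for edge in edges if edge not in known_interactions]
    return -len(unsupported) * log(penalty)


# Hypothetical database content and two candidate graphs.
known = {("geneA", "geneB")}
supported_graph = [("geneA", "geneB")]
extended_graph = [("geneA", "geneB"), ("geneB", "geneC")]

print(log_structure_prior(supported_graph, known))  # 0.0
print(log_structure_prior(extended_graph, known))   # -log(2), one unsupported edge
```

Under such a prior, an edge without database support is retained only if the data improve the likelihood by more than the penalty, which is one way a Bayesian score can discourage spurious edges.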
Apart from BiDAG, we found five R-packages for learning DBNs, namely G1DBN [28], dbnlearn [8], dbnR [44],
ebdbNet [45] and bnstruct [9] (...truncated)