Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy
Penas et al. BMC Bioinformatics (2017) 18:52
DOI 10.1186/s12859-016-1452-4
METHODOLOGY ARTICLE
Open Access
David R. Penas¹, Patricia González², Jose A. Egea³, Ramón Doallo² and Julio R. Banga¹*
Abstract
Background: The development of large-scale kinetic models is one of the current key issues in computational
systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic
models. Global optimization methods can be used to solve this type of problem, but the associated computational
cost is very large. Moreover, many of these methods require tuning several adjustable search parameters, which
demands a number of initial exploratory runs and therefore further increases computation times.
Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate
the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and
incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse- and
fine-grained parallelism, and (iii) self-tuning strategies.
Results: The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter
estimation problems, including medium- and large-scale kinetic models of the bacterium E. coli, baker's yeast S.
cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network.
The results consistently show that saCeSS is a robust and efficient method, allowing a very significant reduction of
computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases)
even when only a small number of processors is used.
Conclusions: The new parallel cooperative method presented here allows the solution of medium- and large-scale
parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the
method includes self-tuning mechanisms that facilitate its use by non-experts. We believe that this new method
can play a key role in the development of large-scale and even whole-cell dynamic models.
Keywords: Dynamic models, Parameter estimation, Global optimization, Metaheuristics, Parallelization
Background
Computational simulation and optimization are key topics in systems biology and bioinformatics, playing a central
role in mathematical approaches to the reverse engineering of biological systems [1–9] and to the handling
of uncertainty in that context [10–14]. Due to the significant computational cost associated with the simulation,
calibration and analysis of models of realistic size, several authors have considered different parallelization
strategies to accelerate those tasks [15–18].
*Correspondence:
BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, 36208 Vigo, Spain
Full list of author information is available at the end of the article
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Recent efforts have focused on scaling up the development of dynamic (kinetic) models [19–25], with the ultimate goal
of obtaining whole-cell models [26, 27]. In this context, the problem of parameter estimation in dynamic models (also
known as model calibration) has received great attention [28–30], particularly regarding the use of global optimization
metaheuristics and hybrid methods [31–35]. It should be noted that multistart local methods (i.e. repeated local
searches started from different initial guesses inside a bounded domain) also enjoy great popularity, but they have
been shown to be rather inefficient, even when exploiting high-quality gradient information [35]. Parallel global
optimization strategies have been considered in several systems biology studies, including parallel variants of
simulated annealing [36], evolution strategies [37–40], particle swarm optimization [41, 42] and differential evolution [43].
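To make the multistart idea concrete, the following Python sketch calibrates a hypothetical one-state model, dx/dt = -k*x, against noise-free synthetic data. A few random starting points inside the parameter bounds are each refined by a simple compass (pattern) search, and the best local solution is kept. All names (`simulate`, `cost`, `local_search`, `multistart`), the RK4 integrator and the compass search are assumptions chosen for brevity. They are not the local solvers or test problems used in this work, where each cost evaluation involves simulating a much larger ODE model.

```python
import math
import random


def simulate(k, x0, times, dt=0.05):
    """Integrate dx/dt = -k*x with fixed-step RK4; return x at each time in `times`."""
    rhs = lambda y: -k * y
    x, t, out = x0, 0.0, []
    for t_obs in times:
        while t < t_obs - 1e-12:  # advance the state up to the next observation time
            h = min(dt, t_obs - t)
            k1 = rhs(x)
            k2 = rhs(x + 0.5 * h * k1)
            k3 = rhs(x + 0.5 * h * k2)
            k4 = rhs(x + h * k3)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
            t += h
        out.append(x)
    return out


def cost(p, times, data):
    """Least-squares mismatch between the simulated and measured trajectories."""
    sim = simulate(p[0], p[1], times)
    return sum((s - d) ** 2 for s, d in zip(sim, data))


def local_search(f, x, lb, ub, step=0.5, tol=1e-6):
    """Compass search: try +/-step along each coordinate, halve step when nothing improves."""
    fx = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] = min(ub[i], max(lb[i], y[i] + sign * step))  # stay inside the bounds
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2.0
    return x, fx


def multistart(f, lb, ub, n_starts=5, seed=0):
    """Run local_search from random points in the box [lb, ub] and keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, math.inf
    for _ in range(n_starts):
        start = [rng.uniform(l, u) for l, u in zip(lb, ub)]
        x, fx = local_search(f, start, lb, ub)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Toy calibration problem: noise-free data generated with k = 0.5, x0 = 2.0.
times = [0.5 * i for i in range(9)]
data = simulate(0.5, 2.0, times)
p_best, f_best = multistart(lambda p: cost(p, times, data), lb=[0.01, 0.1], ub=[2.0, 5.0])
print(p_best, f_best)
```

On this small problem the starts are expected to reach the same minimum. On multimodal, expensive cost functions, many repeated local searches end in the same basins and duplicate work. This is one intuitive reason why pure multistart scales poorly.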
Scatter search is a promising metaheuristic that, in sequential implementations, has been shown to outperform other
state-of-the-art stochastic global optimization methods [35, 44–50]. Recently, a prototype of a cooperative scatter
search implementation using multiple processors was presented [51], showing good performance for the calibration of
several large-scale models. However, this prototype used a simple synchronous strategy and a small number of processors
(due to inefficient communications). Thus, although it could reduce the computation times of sequential scatter search,
it still required very significant computational effort when dealing with large-scale applications.
Here we significantly extend and improve this method
by proposing a new parallel cooperative scheme, named
self-adaptive cooperative enhanced scatter search
(saCeSS), that incorporates the following novel strategies:
• the combination of a coarse-grained
distributed-memory parallelization paradigm and an
underlying fine-grained parallelization of the
individual tasks with a shared-memory model, in
order to improve the scalability.
• an improved cooperation scheme, including an
information exchange mechanism driven by the
quality of the solutions, an asynchronous
communication protocol to handle inter-process
information exchange, and a self-adaptive procedure
to dynamically tune the settings of the parallel
searches.
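The following is a minimal, single-machine sketch of the cooperation ideas in the second item above, with Python threads standing in for distributed processes. Workers never wait for messages: they drain their inbox with `get_nowait` and send on an unbounded queue. A master accepts and broadcasts a solution only if it improves the incumbent by a relative margin `eps`. A worker that reports stagnation receives a new step size derived from the most successful worker. This is our illustration, not the saCeSS algorithm. The perturbation search, thresholds, message format and adaptation rule are all hypothetical, and the fine-grained shared-memory level of the first item is not modeled. Under CPython's global interpreter lock these threads do not execute Python code in parallel, so the sketch shows the protocol, not the speedup. A distributed implementation would replace the queues with non-blocking message passing, for example MPI_Isend and MPI_Iprobe.

```python
import queue
import random
import threading


def sphere(x):
    """Toy objective standing in for a calibration cost (minimum 0 at the origin)."""
    return sum(v * v for v in x)


def worker(wid, inbox, outbox, sigma, seed, iters=3000, report_every=50, stall_limit=300):
    """Hypothetical search process: Gaussian perturbation search with step size `sigma`."""
    rng = random.Random(seed)
    x = [rng.uniform(-5.0, 5.0) for _ in range(4)]
    fx, last_gain = sphere(x), 0
    for it in range(1, iters + 1):
        y = [v + rng.gauss(0.0, sigma) for v in x]
        fy = sphere(y)
        if fy < fx:
            x, fx, last_gain = y, fy, it
        # Asynchronous receive: drain whatever has arrived, never wait.
        while True:
            try:
                kind, payload = inbox.get_nowait()
            except queue.Empty:
                break
            if kind == "best" and payload[0] < fx:
                fx, x = payload[0], list(payload[1])
            elif kind == "sigma":
                sigma = payload
        if it % report_every == 0:
            outbox.put(("solution", wid, fx, list(x)))  # never blocks: the queue is unbounded
        if it - last_gain >= stall_limit:
            outbox.put(("stalled", wid, None, None))  # ask the master for new settings
            last_gain = it
    outbox.put(("done", wid, fx, list(x)))


def cooperative_search(sigmas, eps=0.05):
    """Master loop: quality-filtered broadcast of solutions plus step-size reassignment."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in sigmas]
    current_sigma = list(sigmas)
    credits = [0] * len(sigmas)  # accepted contributions per worker
    threads = [threading.Thread(target=worker, args=(w, inboxes[w], outbox, s, 100 + w))
               for w, s in enumerate(sigmas)]
    for t in threads:
        t.start()
    best_f, best_x, running = float("inf"), None, len(sigmas)
    while running:
        kind, wid, val, x = outbox.get()  # only the master blocks on communication
        if kind == "solution":
            # Quality-driven exchange: accept only a relative improvement of at least eps.
            if best_x is None or val < best_f - eps * abs(best_f):
                best_f, best_x = val, x
                credits[wid] += 1
                for w, box in enumerate(inboxes):
                    if w != wid:
                        box.put(("best", (val, x)))
        elif kind == "stalled":
            # Self-adaptation (illustrative rule): half the step of the most successful worker.
            top = max(range(len(credits)), key=lambda w: credits[w])
            current_sigma[wid] = current_sigma[top] / 2.0
            inboxes[wid].put(("sigma", current_sigma[wid]))
        else:  # "done": keep the final result if it is better, without broadcasting
            running -= 1
            if val < best_f:
                best_f, best_x = val, x
    for t in threads:
        t.join()
    return best_f, best_x, credits


best_f, best_x, credits = cooperative_search([1.0, 0.3, 0.1])
print(best_f, credits)
```

The threads interleave nondeterministically, so the message order and the final credits can differ between runs. This variability is inherent to asynchronous cooperation. Only the quality of the returned solution is expected to be stable.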
We present below a detailed description of saCeSS,
including the details of a high-performance implementation based on a hybrid message passing interface (MPI)
and open multi-processing (OpenMP) combination. The
excellent performance and scalability of this novel method
are illustrated on a set of very challenging parameter estimation problems in large-scale dynamic models
of biological systems. These problems involve kinetic
models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster
Ovary cells and a generic signal transduction network.
The results consistently show that saCeSS is a robust and
efficient method, allowing a very significant reduction
of computation times with respect to previous methods (from days to minutes, in several cases) even when
only a small number of processors is used. Therefore, we
believe tha (...truncated)