Parameter identifiability analysis and visualization in large-scale kinetic models of biosystems
Gábor et al. BMC Systems Biology (2017) 11:54
DOI 10.1186/s12918-017-0428-y
METHODOLOGY ARTICLE
Open Access
Attila Gábor1,2†, Alejandro F. Villaverde1† and Julio R. Banga1*
Abstract
Background: Kinetic models of biochemical systems usually consist of ordinary differential equations that have
many unknown parameters. Some of these parameters are often practically unidentifiable, that is, their values cannot
be uniquely determined from the available data. Possible causes are lack of influence on the measured outputs,
interdependence among parameters, and poor data quality. Uncorrelated parameters can be seen as the key tuning
knobs of a predictive model. Therefore, before attempting to perform parameter estimation (model calibration) it is
important to characterize the subset(s) of identifiable parameters and their interplay. Once this is achieved, it is still
necessary to perform parameter estimation, which poses additional challenges.
Methods: We present a methodology that (i) detects high-order relationships among parameters, and (ii) visualizes
the results to facilitate further analysis. We use a collinearity index to quantify the correlation between parameters in a
group in a computationally efficient way. Then we apply integer optimization to find the largest groups of
uncorrelated parameters. We also use the collinearity index to identify small groups of highly correlated parameters.
The result files can be visualized using Cytoscape, showing the identifiable and non-identifiable groups of
parameters together with the model structure in the same graph.
Results: Our contributions alleviate the difficulties that appear at different stages of the identifiability analysis and
parameter estimation process. We show how to combine global optimization and regularization techniques for
calibrating medium- and large-scale biological models with moderate computation times. Then we evaluate the
practical identifiability of the estimated parameters using the proposed methodology. The identifiability analysis
techniques are implemented as a MATLAB toolbox called VisId, which is freely available as open source from GitHub
(https://github.com/gabora/visid).
Conclusions: Our approach is geared towards scalability. It enables the practical identifiability analysis of dynamic
models of large size, and accelerates their calibration. The visualization tool allows modellers to detect parts that are
problematic and need refinement or reformulation, and provides experimentalists with information that can be
helpful in the design of new experiments.
Keywords: Parameter estimation, Dynamic models, Identifiability, Global optimization, Regularization, Overfitting
*Correspondence:
† Equal contributors
1 BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, 36208 Vigo, Spain
Full list of author information is available at the end of the article
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
The development of mechanistic (kinetic) models in
order to quantitatively describe the dynamics of biological phenomena is one of the core research themes in
systems biology. During the last decade, fostered by the
greater availability of the necessary experimental data,
the development of large (up to genome-scale) kinetic
models has become one of the main objectives in the
field, as well as in related areas such as synthetic biology, metabolic engineering, or industrial biotechnology
[1–10]. More recently, the first steps towards comprehensive whole-cell models have been taken [11]; such models
have great potential for applications, e.g. in personalized
medicine [12]. However, the development of these large-scale integrated dynamic models poses severe challenges
[13, 14]. Those associated with model building are common to the more general problem of reverse engineering
of biological systems [15]. In this context, parameter estimation (i.e. model calibration) is arguably one of the
most studied [16–19], yet most challenging, steps in model
building.
Parameter estimation in nonlinear dynamic models can
be an extremely hard problem mostly due to the following
issues [15]: lack of identifiability, ill-conditioning, multimodality and over-fitting. The latter three can be handled
via global optimization and regularization methods, as
reviewed and illustrated recently [20]. The present paper
begins by continuing the line of work in [20], addressing
these three issues. To this end we introduce a combination of a global optimization metaheuristic, eSS [21], and
an efficient local search method, the adaptive algorithm
NL2SOL [22]. By using this optimization technique jointly
with regularization it is possible to reduce the calibration
times of large dynamic models and simultaneously avoid
over-fitting. We show this for models from the recently
presented BioPreDyn benchmark collection [23]. Then we
focus on the remaining issue, that is, identifiability analysis of large dynamic models. Our aim is to develop a
methodology which (i) is able to characterize high-order
relationships among parameters, and (ii) scales up well
with model size. Thus, our objective goes beyond finding the subset of identifiable parameters: we also aim to
systematically characterize the space of non-identifiable
parameters, and to facilitate the advanced analysis of the
results with scalable visualization tools.
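The abstract names a collinearity index as the measure of correlation within a group of parameters. The following is a minimal sketch of that quantity, assuming the commonly used definition: each column of the output sensitivity matrix is scaled to unit length, and the index is the inverse of the smallest singular value of the resulting matrix. The function name `collinearity_index`, the toy model y(t) = a·b·exp(−c·t) and its parameter values are hypothetical and chosen only for illustration; this is not the VisId implementation.

```python
import numpy as np


def collinearity_index(S, subset):
    """Collinearity index of the parameters listed in `subset`.

    S is an (n_data, n_params) sensitivity matrix whose column j holds
    the derivatives of the model outputs with respect to parameter j.
    Assumed definition: 1 / (smallest singular value of the
    column-normalized sub-matrix).
    """
    cols = np.asarray(S, dtype=float)[:, list(subset)]
    norms = np.linalg.norm(cols, axis=0)
    if np.any(norms == 0.0):
        # A parameter with no influence on the outputs cannot be identified.
        return float("inf")
    s_tilde = cols / norms  # every column now has unit Euclidean length
    # Singular values are returned in descending order; take the last one.
    sigma_min = np.linalg.svd(s_tilde, compute_uv=False)[-1]
    return float("inf") if sigma_min == 0.0 else float(1.0 / sigma_min)


# Hypothetical toy model: y(t) = a * b * exp(-c * t)
t = np.linspace(0.0, 5.0, 20)
a, b, c = 2.0, 3.0, 0.5
decay = np.exp(-c * t)
S = np.column_stack([
    b * decay,           # dy/da
    a * decay,           # dy/db
    -a * b * t * decay,  # dy/dc
])

index_ab = collinearity_index(S, [0, 1])  # a and b enter only as a product
index_ac = collinearity_index(S, [0, 2])  # a and c act differently over time
print(index_ab, index_ac)
```

In this toy case the sensitivity columns for a and b are proportional, so their index is extremely large (the pair cannot be estimated separately), whereas the pair (a, c) gives a small index close to its lower bound of 1.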
Identifiability analysis aims at establishing whether it
is possible to determine the values of the unknown
model parameters [24]. It is common to distinguish
between structural and practical identifiability. Structural or a priori identifiability analysis decides whether
the model parameters are uniquely determinable based
on the model formulation, which includes the dynamic
equations, observation functions and stimuli [25]. A
parameter θ of the model is structurally identifiable if
y(θ) = y(θ′) ⇔ θ = θ′, where y denotes the model
predictions, which are observable in the experiments. A
parameter θ is structurally locally identifiable if for almost
any value θ∗ there is a neighbourhood V(θ∗) in which
the above relationship holds. It is globally identifiable if
the relationship holds over the entire range of values of the
parameter. If there is some region with non-zero measure where the relationship does no (...truncated)