Parameter identifiability analysis and visualization in large-scale kinetic models of biosystems

Gábor et al. BMC Systems Biology (2017) 11:54
DOI 10.1186/s12918-017-0428-y

METHODOLOGY ARTICLE, Open Access

Attila Gábor1,2†, Alejandro F. Villaverde1† and Julio R. Banga1*

Abstract

Background: Kinetic models of biochemical systems usually consist of ordinary differential equations that have many unknown parameters. Some of these parameters are often practically unidentifiable, that is, their values cannot be uniquely determined from the available data. Possible causes are lack of influence on the measured outputs, interdependence among parameters, and poor data quality. Uncorrelated parameters can be seen as the key tuning knobs of a predictive model. Therefore, before attempting to perform parameter estimation (model calibration) it is important to characterize the subset(s) of identifiable parameters and their interplay. Once this is achieved, it is still necessary to perform parameter estimation, which poses additional challenges.

Methods: We present a methodology that (i) detects high-order relationships among parameters, and (ii) visualizes the results to facilitate further analysis. We use a collinearity index to quantify the correlation between parameters in a group in a computationally efficient way. Then we apply integer optimization to find the largest groups of uncorrelated parameters. We also use the collinearity index to identify small groups of highly correlated parameters. The results files can be visualized using Cytoscape, showing the identifiable and non-identifiable groups of parameters together with the model structure in the same graph.

Results: Our contributions alleviate the difficulties that appear at different stages of the identifiability analysis and parameter estimation process. We show how to combine global optimization and regularization techniques for calibrating medium and large scale biological models with moderate computation times.
Then we evaluate the practical identifiability of the estimated parameters using the proposed methodology. The identifiability analysis techniques are implemented as a MATLAB toolbox called VisId, which is freely available as open source from GitHub (https://github.com/gabora/visid).

Conclusions: Our approach is geared towards scalability. It enables the practical identifiability analysis of dynamic models of large size, and accelerates their calibration. The visualization tool allows modellers to detect parts that are problematic and need refinement or reformulation, and provides experimentalists with information that can be helpful in the design of new experiments.

Keywords: Parameter estimation, Dynamic models, Identifiability, Global optimization, Regularization, Overfitting

*Correspondence:
†Equal contributors
1 BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, 36208 Vigo, Spain
Full list of author information is available at the end of the article

© The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

The development of mechanistic (kinetic) models in order to quantitatively describe the dynamics of biological phenomena is one of the core research themes in systems biology.
During the last decade, fostered by the greater availability of the necessary experimental data, the development of large (up to genome-scale) kinetic models has become one of the main objectives in the field, as well as in related areas such as synthetic biology, metabolic engineering, or industrial biotechnology [1–10]. More recently, the first steps towards comprehensive whole-cell models have been taken [11], which has great potential for applications, e.g. in personalized medicine [12].

However, the development of these large-scale integrated dynamic models poses severe challenges [13, 14]. Those associated with model building are common to the more general problem of reverse engineering of biological systems [15]. In this context, parameter estimation (i.e. model calibration) is arguably one of the most studied [16–19], yet most challenging, steps in model building.

Parameter estimation in nonlinear dynamic models can be an extremely hard problem, mostly due to the following issues [15]: lack of identifiability, ill-conditioning, multimodality and over-fitting. The latter three can be handled via global optimization and regularization methods, as reviewed and illustrated recently [20]. The present paper begins by continuing the line of work in [20], addressing these three issues. To this end we introduce a combination of a global optimization metaheuristic, eSS [21], and an efficient local search method, the adaptive algorithm NL2SOL [22]. By using this optimization technique jointly with regularization it is possible to reduce the calibration times of large dynamic models and simultaneously avoid over-fitting. We show this for models from the recently presented BioPreDyn benchmark collection [23].

Then we focus on the remaining issue, that is, identifiability analysis of large dynamic models. Our aim is to develop a methodology which (i) is able to characterize high-order relationships among parameters, and (ii) scales up well with model size.
Thus, our objective goes beyond finding the subset of identifiable parameters: we also aim to systematically characterize the space of non-identifiable parameters, and to facilitate the advanced analysis of the results with scalable visualization tools.

Identifiability analysis aims at establishing whether it is possible to determine the values of the unknown model parameters [24]. It is common to distinguish between structural and practical identifiability. Structural or a priori identifiability analysis decides whether the model parameters are uniquely determinable based on the model formulation, which includes the dynamic equations, observation functions and stimuli [25]. A parameter $\theta$ of the model is structurally identifiable if

$$y(\theta) = y(\theta') \iff \theta = \theta',$$

where $y$ denotes the model predictions, which are observable in the experiments. A parameter $\theta$ is structurally locally identifiable if for almost any value $\theta^*$ there is a neighbourhood $V(\theta^*)$ in which the above relationship holds. It is globally identifiable if the relationship holds in all the range of values of the parameter. If there is some region with non-zero measure where the relationship does no (...truncated)
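
As a minimal illustration of the structural identifiability definition given above, the following sketch compares the outputs of two toy models for two different parameter vectors. Both models and all names (`product_model`, `decay_model`, `outputs_match`) are hypothetical and are not taken from the paper or from the VisId toolbox. In the first model the two parameters enter only through their product, so distinct parameter vectors can produce identical outputs, which violates the definition.

```python
import math

def product_model(theta, times):
    # Hypothetical toy model: y(t) = k1 * k2 * exp(-t).
    # k1 and k2 enter only through their product k1 * k2.
    k1, k2 = theta
    return [k1 * k2 * math.exp(-t) for t in times]

def decay_model(theta, times):
    # Hypothetical toy model: y(t) = k1 * exp(-k2 * t).
    # k1 sets the initial value, k2 sets the decay rate.
    k1, k2 = theta
    return [k1 * math.exp(-k2 * t) for t in times]

def outputs_match(model, theta_a, theta_b, times, tol=1e-12):
    # True if both parameter vectors give the same output at every time point.
    ya = model(theta_a, times)
    yb = model(theta_b, times)
    return all(abs(a - b) <= tol for a, b in zip(ya, yb))

times = [0.0, 0.5, 1.0, 2.0]
theta = (2.0, 3.0)
theta_alt = (1.0, 6.0)  # different parameters, same product k1 * k2

# Same output for different parameters: a counterexample to identifiability.
product_unidentifiable = outputs_match(product_model, theta, theta_alt, times)

# Different outputs for this particular pair of parameter vectors.
decay_distinguishable = not outputs_match(decay_model, theta, theta_alt, times)

print(product_unidentifiable, decay_distinguishable)
```

A single pair of parameter vectors with matching outputs is enough to show that a model is not structurally identifiable. The converse does not hold: differing outputs for one pair, as in the second model, do not prove identifiability, which in general requires a symbolic or otherwise exhaustive analysis.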