Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach

RESEARCH ARTICLE

Miguel Ponce-de-Leon1*, Jorge Calle-Espinosa1, Juli Peretó2, Francisco Montero1

1 Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
2 Departament de Bioquímica i Biologia Molecular and Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, C/José Beltrán 2, Paterna 46980, Spain

OPEN ACCESS

Citation: Ponce-de-Leon M, Calle-Espinosa J, Peretó J, Montero F (2015) Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach. PLoS ONE 10(12): e0143626. doi:10.1371/journal.pone.0143626

Editor: Julio Vera, University of Erlangen-Nuremberg, GERMANY

Received: July 2, 2015
Accepted: November 6, 2015
Published: December 2, 2015

Copyright: © 2015 Ponce-de-Leon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data are all contained within the paper and/or Supporting Information files.

Funding: Financial support from the Spanish Government (grant reference: BFU2012-39816-C0201, co-financed by FEDER funds and Ministerio de Economía y Competitividad) and Generalitat Valenciana (grant reference: PROMETEOII/2014/065) is gratefully acknowledged. We also obtained support from a doctoral fellowship granted to JCE from the Obra Social Programme of La Caixa Savings Bank. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Abstract

Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites.
With the aim of detecting recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models into a single network. This metamodel was manually curated using the unconnected modules approach and then used as a reference network to perform gap-filling on each individual genome-scale model. Finally, a set of 36 models that had not been considered during the construction of the metamodel was used, as a proof of concept, to extend the metamodel with new biochemical information and to assess its impact on the gap-filling results. The analysis performed on the metamodel allowed us to conclude that: 1) the recurrent inconsistencies found in the models were already present in the metabolic database used during the reconstruction process; 2) inconsistencies present in a metabolic database can be propagated to the reconstructed models; 3) there are reactions that do not appear as blocked but are active only as a consequence of some classes of artifacts; and 4) the results of an automatic gap-filling are highly dependent on the consistency and completeness of the metamodel or metabolic database used as the reference network. In conclusion, consistency analysis should be applied to metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts and redundant information.

Introduction

Metabolic reconstruction is the computational process that aims to elucidate the biochemical network of reactions and metabolites which defines the cell metabolism of a certain organism [1,2]. Since metabolic reconstruction is tightly integrated with genomic information, it can be viewed as a detailed functional annotation of the genome [3,4].
PLOS ONE | DOI:10.1371/journal.pone.0143626 | December 2, 2015

Competing Interests: The authors have declared that no competing interests exist.

In the first stages of a reconstruction, the genome sequence and its annotation are the main source of information used to infer the biochemical pathways of an organism [5]. Furthermore, each entry annotated as an enzyme-coding gene usually contains some identifiers, such as Gene Ontology (GO) terms or Enzyme Commission (EC) numbers, which allow the construction of the gene-protein-reaction rules [6] by mapping one or more coding sequences to one or more reactions through a protein or protein complex. After this, a metabolic database is used to map the enzymatic activities to instances of biochemical reactions through their EC numbers [7]. A metabolic database typically describes collections of enzymes, reactions and biochemical pathways, which cover most of the known biochemistry [8,9]. Databases commonly used in metabolic reconstruction include the SEED [10], BiGG [11], KEGG [9] and MetaCyc [12], among others.

Although the objective of a metabolic reconstruction may be to create an organism-specific metabolic database, in many cases the final goal is to develop a genome-scale metabolic model (GSM), that is to say, an in-silico representation of a metabolic network [13]. A GSM can be used, through the computational framework known as constraint-based modeling (CBM), to generate hypotheses about the metabolic capabilities of the network, which may eventually be tested experimentally [14,15]. Genome-scale reconstruction has grown rapidly in recent years, as has its range of applications [16,17]. Moreover, CBM can be used to improve model formulation through the detection and resolution of inconsistencies. In this sense, the analysis of a GSM can be used to refine the annotation of the genome [18,19] and thus to improve model formulation.
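To make the notions of blocked reaction and gap metabolite concrete, the following minimal sketch flags reactions that cannot carry flux at steady state (S·v = 0) for purely structural reasons: a metabolite that takes part in a single reaction, or that is only produced (or only consumed) by irreversible reactions, forces all of its reactions to zero flux, and this effect is propagated through the network. This is an illustrative heuristic, not the procedure used in this work; it finds only a subset of blocked reactions, and a complete detection would require an optimization-based analysis such as flux variability analysis. The function name, the input format and the toy network are assumptions made for the example.

```python
def find_structurally_blocked(reactions):
    """Return (blocked_reactions, gap_metabolites) found by dead-end propagation.

    `reactions` maps a reaction name to (stoichiometry, reversible), where
    stoichiometry maps metabolite -> coefficient (negative = consumed,
    positive = produced). This input format is hypothetical.
    """
    active = set(reactions)      # reactions not yet shown to be blocked
    gap_metabolites = set()
    changed = True
    while changed:
        changed = False
        # Index, for every metabolite, the active reactions that involve it.
        usage = {}
        for name in active:
            stoich, reversible = reactions[name]
            for met, coef in stoich.items():
                usage.setdefault(met, []).append((name, coef, reversible))
        for met, entries in usage.items():
            # Case 1: a single reaction touches the metabolite.
            lone = len(entries) == 1
            # Case 2: only irreversible reactions, all producing or all consuming.
            one_way = all(not rev for _, _, rev in entries) and (
                all(coef > 0 for _, coef, _ in entries)
                or all(coef < 0 for _, coef, _ in entries)
            )
            if lone or one_way:
                gap_metabolites.add(met)
                for name, _, _ in entries:
                    if name in active:
                        active.discard(name)
                        changed = True  # removal may expose new dead ends
    return set(reactions) - active, gap_metabolites


# Toy network (hypothetical): A is taken up and converted to C, which is
# secreted; the branch B -> D -> E has no outlet for E.
toy_network = {
    "EX_A": ({"A": 1}, False),            # -> A
    "R1": ({"A": -1, "B": 1}, False),     # A -> B
    "R2": ({"B": -1, "C": 1}, False),     # B -> C
    "EX_C": ({"C": -1}, False),           # C ->
    "R3": ({"B": -1, "D": 1}, False),     # B -> D
    "R4": ({"D": -1, "E": 1}, False),     # D -> E
}

blocked, gaps = find_structurally_blocked(toy_network)
print(sorted(blocked), sorted(gaps))  # ['R3', 'R4'] ['D', 'E']
```

In the toy network, E is never consumed, so R4 is blocked; once R4 is removed, D is never consumed either, so R3 is blocked as well. Adding a reaction that consumes E would restore flux through the whole branch, which is the kind of candidate a gap-filling procedure looks for.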
In general, inconsistencies will appear as holes in the structure of the network. These holes might indicate either global or organism-specific gaps in biological knowledge. Global gaps are reflected in the existence of metabolites with an unknown biochemical fate [20], as well as in the large number of orphan enzymatic activities [21,22]. On the other hand, organism-specific gaps are commonly associated with genome annotation errors, reflected in the absence of enzymatic activities coded in the genome or in the inclusion of activities that are not present in the considered metabolism [23]. The resolution of inconsistencies in GSMs is known as network curation. This is a decision-making process in which wrongly annotated reactions are removed and candidate reactions are included for the purpose of solving model gaps. The prediction of candidate reactions for filling gaps is referred to as the gap-filling problem [23,24]. Several methods have been proposed for identifying and solving inconsistencies in GSMs in an automatic fashion. Some of these methods rely on the application of optimization techniques [18,24–29], w (...truncated)