Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach
RESEARCH ARTICLE
Miguel Ponce-de-Leon1*, Jorge Calle-Espinosa1, Juli Peretó2, Francisco Montero1
1 Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad
Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain, 2 Departament de Bioquímica i Biologia
Molecular and Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, C/José Beltrán
2, Paterna 46980, Spain
OPEN ACCESS
Citation: Ponce-de-Leon M, Calle-Espinosa J, Peretó J, Montero F (2015) Consistency Analysis of
Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach. PLoS ONE 10(12): e0143626.
doi:10.1371/journal.pone.0143626
Editor: Julio Vera, University of Erlangen-Nuremberg, GERMANY
Received: July 2, 2015
Accepted: November 6, 2015
Published: December 2, 2015
Copyright: © 2015 Ponce-de-Leon et al. This is an open access article distributed under the terms
of the Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: Data are all contained within the paper and/or Supporting
Information files.
Funding: Financial support from the Spanish Government (grant reference: BFU2012-39816-C02-01,
co-financed by FEDER funds and Ministerio de Economía y Competitividad) and Generalitat
Valenciana (grant reference: PROMETEOII/2014/065) is gratefully acknowledged. Also, we obtained
support from a doctoral fellowship granted to JCE from the Obra Social Programme of La Caixa
Savings Bank. The funders had no role in study design, data collection and analysis, decision to
publish, or preparation of the manuscript.
Abstract
Genome-scale metabolic models usually contain inconsistencies that manifest as blocked
reactions and gap metabolites. To detect recurrent inconsistencies in metabolic models, a
large-scale analysis was performed using a previously published dataset of 130 genome-scale
models. The results showed that a large number of reactions (~22%) are blocked in all the
models where they are present. To unravel the nature of such inconsistencies, a metamodel was
constructed by joining the 130 models into a single network. This metamodel was manually
curated using the unconnected modules approach and then used as a reference network to perform
gap-filling on each individual genome-scale model. Finally, as a proof of concept, a set of 36
models that had not been considered during the construction of the metamodel was used to extend
the metamodel with new biochemical information and to assess its impact on the gap-filling
results. The analysis performed on the metamodel led to four conclusions: 1) the recurrent
inconsistencies found in the models were already present in the metabolic database used during
the reconstruction process; 2) inconsistencies in a metabolic database can propagate to the
reconstructed models; 3) some reactions that do not appear as blocked are active as a
consequence of certain classes of artifacts; and 4) the results of automatic gap-filling are
highly dependent on the consistency and completeness of the metamodel or metabolic database
used as the reference network. In conclusion, consistency analysis should be applied to
metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts
and redundant information.
Competing Interests: The authors have declared that no competing interests exist.
Introduction
Metabolic reconstruction is the computational process that aims to elucidate the biochemical
network of reactions and metabolites that defines the cell metabolism of a given organism
[1,2]. Since metabolic reconstruction is tightly integrated with genomic information, it can be
viewed as a detailed functional annotation of the genome [3,4]. In the first stages of a
reconstruction, the genome sequence and its annotation are the main sources of information
used to infer the biochemical pathways of an organism [5]. Furthermore, each entry annotated
as an enzyme-coding gene usually contains identifiers, such as Gene Ontology (GO)
terms or Enzyme Commission (EC) numbers, which allow the construction of the
gene-protein-reaction rules [6] by mapping one or more coding sequences to one or more
reactions through a protein or protein complex. After this, a metabolic database is used to
map the enzymatic activities to instances of biochemical reactions through their EC numbers
[7]. A metabolic database typically describes collections of enzymes, reactions and
biochemical pathways, which cover most of the known biochemistry [8,9]. Databases commonly used
in metabolic reconstruction include the SEED [10], BiGG [11], KEGG [9] and MetaCyc [12], among
others.
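The gene-protein-reaction mapping described above can be illustrated with a minimal sketch. It assumes each rule is stored in disjunctive normal form, that is, as a list of alternative gene sets: all genes in one set encode the subunits of a complex, and different sets represent isozymes. The gene names, reaction identifiers and the helper `supported_reactions` are hypothetical and are not taken from the article or from any particular database.

```python
# Hypothetical GPR rules in disjunctive normal form:
# each reaction maps to a list of alternative gene sets.
gpr_rules = {
    "RXN_A": [{"geneA"}],               # single enzyme
    "RXN_B": [{"geneB1", "geneB2"}],    # complex: both subunits are required
    "RXN_C": [{"geneC1"}, {"geneC2"}],  # isozymes: either gene is sufficient
}


def supported_reactions(rules, annotated_genes):
    """Return the reactions whose GPR rule is satisfied by the annotated genes."""
    return {
        reaction
        for reaction, alternatives in rules.items()
        # A reaction is supported if at least one alternative gene set
        # is fully contained in the genome annotation.
        if any(genes <= annotated_genes for genes in alternatives)
    }


# Toy annotation: one subunit of the RXN_B complex is missing.
annotated = {"geneA", "geneB1", "geneC2"}
supported = supported_reactions(gpr_rules, annotated)
```

In this toy annotation `RXN_B` is not supported because one subunit of its complex is absent, whereas `RXN_C` is supported through one of its two isozymes.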
Although the objective of a metabolic reconstruction may be to create an organism-specific
metabolic database, in many cases the final goal is to develop a genome-scale metabolic model
(GSM), that is, an in silico representation of a metabolic network [13]. Through the
computational framework known as constraint-based modeling (CBM), a GSM can be used to
generate hypotheses about the metabolic capabilities of the network, which may eventually be
tested experimentally [14,15]. Genome-scale reconstruction has grown rapidly in recent
years, as has its range of applications [16,17]. Moreover, CBM can be used to improve model
formulation by detecting and resolving inconsistencies. In this sense, the analysis of a
GSM can also be used to refine the annotation of the genome [18,19].
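For reference, the standard steady-state formulation used in constraint-based modeling, together with the usual definition of a blocked reaction, can be written as follows. The symbols are assumptions, since the text above does not introduce notation: $S$ is the stoichiometric matrix, $v$ the flux vector, and $l$, $u$ the lower and upper flux bounds.

```latex
\[
F = \{\, v \in \mathbb{R}^{n} : S\,v = 0,\; l \le v \le u \,\},
\qquad
\text{reaction } j \text{ is blocked}
\iff
\max_{v \in F} v_j = \min_{v \in F} v_j = 0 .
\]
```

Under this formulation, a reaction is blocked when it cannot carry a non-zero flux in any steady state allowed by the constraints.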
In general, inconsistencies appear as holes in the structure of the network. These holes
might indicate either global or organism-specific gaps in biological knowledge. Global gaps are
reflected in the existence of metabolites with an unknown biochemical fate [20], as well as in
the large number of orphan enzymatic activities [21,22]. Organism-specific gaps, on the other
hand, are commonly associated with genome annotation errors, reflected either in the absence of
enzymatic activities coded in the genome or in the inclusion of activities that are not present
in the considered metabolism [23].
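One simple way to make such holes visible is to look for metabolites that are produced but never consumed, or consumed but never produced. The sketch below is a deliberately simplified connectivity check and is not the method used in the article: it only finds such "root" dead ends, whereas gaps that propagate upstream or downstream require a flux-based analysis. The toy network, the reaction names and the helper `dead_end_metabolites` are hypothetical.

```python
# Hypothetical toy network: reaction -> (stoichiometry, reversible flag).
# Negative coefficients are substrates, positive coefficients are products.
toy_network = {
    "EX_A": ({"A": 1}, False),           # uptake of A from the medium
    "R1":   ({"A": -1, "B": 1}, False),  # A -> B
    "R2":   ({"B": -1, "C": 1}, False),  # B -> C
    "R3":   ({"B": -1, "D": 1}, False),  # B -> D (D has no consumer)
    "EX_C": ({"C": -1}, False),          # secretion of C
}


def dead_end_metabolites(reactions):
    """Return metabolites that cannot be both produced and consumed."""
    producible, consumable = set(), set()
    for stoichiometry, reversible in reactions.values():
        for metabolite, coefficient in stoichiometry.items():
            if coefficient == 0:
                continue  # metabolite does not really take part
            # A reversible reaction can both produce and consume it.
            if coefficient > 0 or reversible:
                producible.add(metabolite)
            if coefficient < 0 or reversible:
                consumable.add(metabolite)
    # Symmetric difference: metabolites found in exactly one of the two sets.
    return producible ^ consumable


gaps = dead_end_metabolites(toy_network)
```

In the toy network, metabolite `D` is produced by `R3` but has no consuming reaction, so it is reported as a dead end and `R3` could not carry flux at steady state.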
The resolution of inconsistencies in GSMs is known as network curation. This is a decision-making process where wrongly annotated reactions are removed and candidate reactions are
included for the purpose of solving model gaps. The prediction of candidate reactions for filling
gaps is referred to as the gap-filling problem [23,24]. Several methods have been proposed for
identifying and solving inconsistencies in GSMs in an automatic fashion. Some of these methods rely on the application of optimization techniques [18,24–29], w (...truncated)