Integrating proteomic or transcriptomic data into metabolic models using linear bound flux balance analysis
Bioinformatics, 34(22), 2018, 3882–3888
doi: 10.1093/bioinformatics/bty445
Advance Access Publication Date: 5 June 2018
Original Paper
Systems biology
Mingyuan Tian1,2 and Jennifer L. Reed1,2,*
1Department of Chemical & Biological Engineering and 2Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53705, USA
*To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
Received on November 2, 2017; revised on April 3, 2018; editorial decision on May 28, 2018; accepted on June 1, 2018
Abstract
Motivation: Transcriptomics and proteomics data have been integrated into constraint-based models to influence flux predictions. However, it has recently been reported for Escherichia coli and
Saccharomyces cerevisiae that model predictions from parsimonious flux balance analysis
(pFBA), which does not use expression data, are as good as or better than predictions from various
algorithms that integrate transcriptomics or proteomics data into constraint-based models.
Results: In this paper, we describe a novel constraint-based method called Linear Bound Flux
Balance Analysis (LBFBA), which uses expression data (either transcriptomic or proteomic) to predict metabolic fluxes. The method uses expression data to place soft constraints on individual
fluxes, which can be violated. Parameters in the soft constraints are first estimated from a training
expression and flux dataset before being used to predict fluxes from expression data in other conditions. We applied LBFBA to E. coli and S. cerevisiae datasets and found that LBFBA predictions
were more accurate than pFBA predictions, with average normalized errors roughly half of those
from pFBA. For the first time, we demonstrate a computational method that integrates expression
data into constraint-based models and improves quantitative flux predictions over pFBA.
Availability and implementation: Code is available in the Supplementary data available at
Bioinformatics online.
Contact:
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Introduction
Constraint-based modeling (CBM) can be used to predict cell physiology (e.g. growth rate and metabolic fluxes) under different conditions and improve our understanding of cell metabolism. CBM has
been applied in metabolic engineering (Burgard et al., 2003; Cotten
and Reed, 2013; Kim et al., 2011; Tervo and Reed, 2012), metabolic
comparisons (Bosi et al., 2016; Hamilton and Reed, 2012; Nuccio
and Bäumler, 2014), drug discovery (Chavali et al., 2012; Kim
et al., 2010; Lee et al., 2009) and other health applications (Becker
and Palsson, 2008; Magnúsdóttir et al., 2017; Shlomi et al., 2008).
Recent developments in sequencing and mass spectrometry have
enabled transcriptomics and proteomics datasets to become more
widely available. These omics datasets can be used to derive
expression-based CBM constraints and/or objective functions,
which can potentially improve model predictions.
There are two fundamental ways that expression data has been
integrated into constraint-based models. The first way is to directly
integrate the expression information into the flux bounds. For example, Åkesson et al. (2004) set the fluxes to zero if
expression of their associated genes was low. E-Flux (Colijn et al.,
2009) directly models the maximum allowable flux value as a
function of measured gene expression. The second way is to divide
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail:
Table 1. A comparison between different constraint-based methods integrating gene expression data

| | Åkesson | E-Flux | GIMME | iMAT | tFBA | MADE | PROM | LBFBA |
|---|---|---|---|---|---|---|---|---|
| Directly integrated gene expression into flux bound | Yes | Yes | No | No | No | No | Yes | Yes |
| Maximized agreement or minimized violation between flux and gene expression | No | No | Yes | Yes | Yes | Yes | No | No |
| Needs flux data to parameterize constraints | No | No | No | No | No | No | No | Yes |
| Compared flux predictions to measured intracellular fluxes | Yes (4^a) | No | No | No | No | No | No | Yes (37^a) |
| Number of experimental conditions used | 1 | 1 | 1 | 1 | 9 | 4 | 907 | 28^b |

^a The number of fluxes that were compared.
^b Sensitivity analysis showed that 4 or 5 conditions in the training dataset were sufficient.
Note: These methods are Åkesson (Åkesson et al., 2004), E-Flux (Colijn et al., 2009), GIMME (Becker and Palsson, 2008), iMAT (Shlomi et al., 2008), tFBA (van Berlo et al., 2011), MADE (Jensen and Papin, 2011), PROM (Chandrasekaran and Price, 2010) and LBFBA.
2 Materials and methods
2.1 Overview of pFBA
CBM is a powerful tool to predict cellular phenotypes and flux distributions. The basic formulation of a constraint-based model involves different constraints and an objective function. Flux balance analysis (FBA) is one of the CBM methods often used to predict a flux distribution that maximizes biomass yield. pFBA uses the sum of the absolute values of the fluxes as an objective function [Equation (1)] and can be formulated as:

$$\min \sum_{j \in \text{Reaction}} |v_j| \tag{1}$$

s.t.

$$\sum_{j \in \text{Reaction}} S_{ij} v_j = 0 \quad \forall i \in \text{Metabolite} \tag{2}$$

$$LB_j \le v_j \le UB_j \quad \forall j \in \text{Reaction} \tag{3}$$

$$v_j \ge 0 \quad \forall j \in \text{Irreversible Reaction} \tag{4}$$

$$v_j = v_j^{ls} \quad \forall j \in \text{Extracellular Reaction} \tag{5}$$

$$v_{biomass} = v_{biomass}^{measured} \tag{6}$$
Equation (2) is the steady-state mass balance constraint, meaning that no metabolite accumulates in the cell. $S$ denotes the stoichiometric matrix, where $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$, and $v_j$ is the flux through reaction $j$. Equation (3) is the enzyme capacity constraint, which imposes an upper bound ($UB_j$) and a lower bound ($LB_j$) on each reaction (typically 1000 and −1000 mmol/gDW/h, respectively). Equation (4) ensures that fluxes through irreversible reactions are non-negative. Equation (5) fixes the extracellular flux values to the best estimates of the extracellular fluxes ($v_j^{ls}$) obtained from a least squares fit between the metabolic model and extracellular flux measurements (see Supplementary Methods for details). Equation (6) fixes the biomass flux (i.e. growth rate) to the measured value. Solving pFBA yields the predicted flux distribution under a specific condition.
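As a minimal sketch of Equations (1) to (6), the pFBA problem can be written as a linear program by introducing auxiliary variables $t_j \ge |v_j|$ and minimizing their sum. The example below uses SciPy's `linprog` on a hypothetical five-reaction toy network. The function name `pfba`, the toy stoichiometry and all numerical values are illustrative assumptions; they are not taken from the published models or from the supplementary code.

```python
import numpy as np
from scipy.optimize import linprog


def pfba(S, lb, ub, fixed):
    """Minimize sum_j |v_j| subject to S v = 0 and lb <= v <= ub.

    `fixed` maps reaction index -> flux value and plays the role of
    Equations (5) and (6) (extracellular and biomass fluxes).
    """
    m, n = S.shape
    lb = np.array(lb, dtype=float)
    ub = np.array(ub, dtype=float)
    for j, value in fixed.items():
        lb[j] = ub[j] = value  # pin the flux by collapsing its bounds

    # Decision vector x = [v, t]; t_j >= |v_j| linearizes Equation (1).
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_eq = np.hstack([S, np.zeros((m, n))])       # Equation (2): S v = 0
    b_eq = np.zeros(m)
    eye = np.eye(n)
    A_ub = np.vstack([np.hstack([eye, -eye]),     #  v - t <= 0
                      np.hstack([-eye, -eye])])   # -v - t <= 0
    b_ub = np.zeros(2 * n)
    # Equations (3)-(4) act through the bounds on v; t is non-negative.
    bounds = list(zip(lb, ub)) + [(0.0, np.inf)] * n

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


# Hypothetical toy network (metabolites A, B, C):
#   0: uptake  -> A      (extracellular, fixed)
#   1: A -> B            (direct route, irreversible)
#   2: A <-> C           (first step of a longer route)
#   3: C <-> B           (second step of the longer route)
#   4: B -> biomass      (fixed to the "measured" value)
S = np.array([[1, -1, -1,  0,  0],
              [0,  1,  0,  1, -1],
              [0,  0,  1, -1,  0]], dtype=float)
lb = [0, 0, -1000, -1000, 0]   # irreversible reactions have lb = 0
ub = [1000] * 5
v = pfba(S, lb, ub, fixed={0: 10.0, 4: 10.0})
print(v.round(6))
```

In this toy case the direct route and the two-step route are stoichiometrically equivalent, so the parsimony objective should route all flux through the direct reaction and leave the longer route at zero.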
2.2 Mathematical formulation of LBFBA
In LBFBA, gene or protein expression data are used to further tighten the upper and lower bounds for individual fluxes. LBFBA is formulated as the following optimization problem:
$$\min \sum_{j \in \text{Reaction}} |v_j| + \beta \sum_{j \in R_{exp}} \alpha_j \tag{7}$$

s.t.

$$\sum_{j \in \text{Reaction}} S_{ij} v_j = 0 \quad \forall i \in \text{Metabolite} \tag{8}$$
the reactions into different categories based on gene expression (e.g.
highly expressed or lowly expressed) and then maximize the agreement (reactions associated with highly expressed genes have high
flux) or minimize the disagreement (reactions associated with lowly
expressed genes should not have hig (...truncated)