Linear programming based gene expression model (LPM-GEM) predicts the carbon source for Bacillus subtilis.
Thanamit et al. BMC Bioinformatics (2022) 23:226
https://doi.org/10.1186/s12859-022-04742-7
METHODOLOGY ARTICLE
Open Access
Kulwadee Thanamit, Franziska Hoerhold, Marcus Oswald and Rainer Koenig*
*Correspondence:
Systems Biology Research
Group, Institute for Infectious
Diseases and Infection Control
(IIMK), Jena University Hospital,
Kollegiengasse 10, 07743 Jena,
Germany
Abstract
Background: Elucidating cellular metabolism has led to many breakthroughs in biotechnology, synthetic
biology, and health sciences. To date, deriving metabolic fluxes by 13C tracer experiments is the
most prominent approach for studying metabolic fluxes quantitatively, often with high accuracy and
precision. However, the technique places high demands on experimental resources. Alternatively, flux
balance analysis (FBA) has been employed to estimate metabolic fluxes without labeling experiments.
It is less informative but benefits from low costs and low experimental effort and can provide
flux estimates in experimentally difficult conditions. Methods to integrate relevant experimental
data have emerged to improve FBA flux estimations. Data from transcription profiling are often
selected since they are easy to generate at the genome scale. They are typically embedded by
discretizing the genes coding for the respective enzymes into differentially and
non-differentially expressed genes.
Results: We established the novel method Linear Programming based Gene Expression
Model (LPM-GEM). LPM-GEM linearly embeds gene expression into FBA constraints. We
implemented three strategies to reduce thermodynamically infeasible loops, which is
a necessary prerequisite for such omics-based model building. As a case study, we
built a model of B. subtilis grown on eight different carbon sources. We obtained good
flux predictions based on the respective transcription profiles when validating with
13C tracer-based metabolic flux data of the same conditions. We could predict the
specific carbon sources well. When testing the model on another, unseen dataset that was
not used during training, good prediction performance was also observed. Furthermore, LPM-GEM outperformed a well-established model-building method.
Conclusion: Employing LPM-GEM integrates gene expression data efficiently. The
method supports gene expression-based FBA models and can be applied as an alternative to estimate metabolic fluxes when tracer experiments are inappropriate.
Keywords: Flux balance analysis, Mixed-integer linear programming, Bacillus subtilis,
Carbon source, Transcriptomics, Constraint-based modeling, Thermodynamically
infeasible loops
© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Background
Gaining insight into metabolic fluxes can lead to a better understanding of how cells
maintain their metabolic state and how they metabolically adapt to their microenvironment. It has
led to astonishing discoveries such as considerably increased production yields after metabolic
engineering [1–4], improved strain performance [5–7], and a better understanding of various
patho-mechanisms together with the identification of drug targets for cancer or diabetes
[8–12]. To determine fluxes in metabolic pathways, metabolites are labeled with
the 13C isotope and traced over time employing mass spectrometry [1, 4, 6,
7, 9, 13–15], providing high accuracy and precision [16, 17]. However, these experiments
are labor-intensive and costly [16, 18–20]. As an alternative, constraint-based modeling
(CBM) [21] has been applied to predict metabolic fluxes based on flux balance analysis (FBA) [22].
FBA can be used to estimate metabolic fluxes without conducting such labeling experiments.
Together with biologically reasonable assumptions, e.g., that bacteria or cancer cells aim to
maximize biomass production, fluxes of the metabolic reactions are derived from physicochemical
constraints of their stoichiometry. FBA assumes a mass balance at steady state for each (inner)
metabolite. Additional constraints may be derived from thermodynamics, implying the directionality
of reactions, and from enzyme capacity, estimating a maximal enzymatic rate (Vmax). In this way,
FBA bypasses the need for reaction kinetic parameters, which facilitates constructing metabolic
models on a genome scale without determining these experimentally demanding parameters [21–25].
It provides an estimate of the metabolic flux of interest, leading to potential new hypotheses
and enabling the design of adapted experiments [2, 3, 10, 12, 26]. However, utilizing
only the stoichiometry of the reactions is typically insufficient to achieve good flux predictions.
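As a minimal illustration of the FBA formulation described above (steady-state mass balance, flux bounds, and maximization of a biomass flux), the following Python sketch solves a toy problem with `scipy.optimize.linprog`. The four-reaction network, its stoichiometric matrix, the bounds, and all names are hypothetical and are not taken from the B. subtilis model of this study.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (assumption, not from the article):
#   v_uptake :   -> A      (substrate uptake, capped at 10)
#   v_convert: A -> B
#   v_biomass: B ->        (stand-in for biomass production)
#   v_secrete: A ->        (by-product secretion)
REACTIONS = ["v_uptake", "v_convert", "v_biomass", "v_secrete"]

# Stoichiometric matrix S: rows are inner metabolites, columns are reactions.
S = np.array([
    [1.0, -1.0, 0.0, -1.0],  # metabolite A
    [0.0, 1.0, -1.0, 0.0],   # metabolite B
])

# Lower and upper flux bounds; all reactions are irreversible here.
BOUNDS = [(0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0), (0.0, 1000.0)]


def run_fba(stoich, bounds, objective_index):
    """Maximize one flux subject to S v = 0 and the given flux bounds."""
    cost = np.zeros(stoich.shape[1])
    cost[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    result = linprog(
        cost,
        A_eq=stoich,
        b_eq=np.zeros(stoich.shape[0]),  # steady state for every inner metabolite
        bounds=bounds,
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


fluxes = run_fba(S, BOUNDS, objective_index=REACTIONS.index("v_biomass"))
for name, value in zip(REACTIONS, fluxes):
    print(f"{name}: {value:.2f}")
```

In this toy case the biomass flux is limited only by the uptake bound of 10, which illustrates why stoichiometry alone often leaves fluxes poorly determined in larger networks.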
Hence, techniques were developed to add experimental data during model building [23,
27–33]. One of the most effective approaches was to integrate experimental omics data,
specifically transcription profiles, as they are not labor-intensive to generate at a
systems level [34–36]. Though the data are not as direct as 13C tracer-based data, they led to
considerably good flux predictions [27, 28, 31–33]. Various methods have been developed to use
gene expression data for metabolic network models. Most prominently,
the approaches qualitatively discretize reactions into expressed and non-expressed
by setting a threshold, as implemented, e.g., in the integrative Metabolic Analysis Tool
(iMAT) [31, 33], Gene Inactivity Moderated by Metabolism and Expression
(GIMME) [27], Probabilistic Regulation of Metabolism (PROM) [28], or metabolic
Context-specificity Assessed by Deterministic Reaction Evaluation (mCADRE) [32].
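The threshold-based discretization mentioned above can be sketched generically as follows. This is not the algorithm of iMAT, GIMME, PROM, or mCADRE; the gene names, expression values, and thresholds are hypothetical and only show how the expressed/non-expressed labeling depends on the chosen cutoff.

```python
def expressed_genes(expression, threshold):
    """Return the set of genes whose expression value reaches the threshold."""
    return {gene for gene, value in expression.items() if value >= threshold}


# Hypothetical log-scale expression values for three enzyme-coding genes.
expression = {"gene_a": 2.1, "gene_b": 5.0, "gene_c": 8.7}

# Two equally plausible cutoffs give different sets of "expressed" genes.
strict = expressed_genes(expression, threshold=6.0)
lenient = expressed_genes(expression, threshold=4.0)

print(sorted(strict))
print(sorted(lenient))
```

Here `gene_b` switches between non-expressed and expressed depending on the cutoff, while the quantitative difference between `gene_b` and `gene_c` is lost under the lenient threshold.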
Although these context-specific model extraction methods successfully improved flux
predictions compared to FBA not based on expression data, finding suitable thresholds
can be challenging. Moreover, employing defined thresholds disregards the fine-grained
regulation of (...truncated)