Linear programming based gene expression model (LPM-GEM) predicts the carbon source for Bacillus subtilis

Thanamit et al. BMC Bioinformatics (2022) 23:226
https://doi.org/10.1186/s12859-022-04742-7

METHODOLOGY ARTICLE, Open Access

Kulwadee Thanamit, Franziska Hoerhold, Marcus Oswald and Rainer Koenig*

*Correspondence: Systems Biology Research Group, Institute for Infectious Diseases and Infection Control (IIMK), Jena University Hospital, Kollegiengasse 10, 07743 Jena, Germany

Abstract

Background: Elucidating cellular metabolism has led to many breakthroughs in biotechnology, synthetic biology, and health sciences. To date, 13C tracer experiments are the most prominent approach for studying metabolic fluxes quantitatively, often with high accuracy and precision. However, the technique has a high demand for experimental resources. Alternatively, flux balance analysis (FBA) has been employed to estimate metabolic fluxes without labeling experiments. It is less informative but benefits from low costs and low experimental effort, and it can provide flux estimates in experimentally difficult conditions. Methods to integrate relevant experimental data have emerged to improve FBA flux estimations. Data from transcription profiling is often selected since it is easy to generate at the genome scale, and it is typically embedded by discretizing the genes coding for the respective enzymes into differentially and non-differentially expressed.

Result: We established the novel method Linear Programming based Gene Expression Model (LPM-GEM). LPM-GEM linearly embeds gene expression into FBA constraints. We implemented three strategies to reduce thermodynamically infeasible loops, which is a necessary prerequisite for such omics-based model building. As a case study, we built a model of B. subtilis grown on eight different carbon sources.
We obtained good flux predictions based on the respective transcription profiles when validating with 13C tracer-based metabolic flux data of the same conditions. We could predict the specific carbon sources well. When testing the model on another, unseen dataset that was not used during training, good prediction performance was also observed. Furthermore, LPM-GEM outperformed a well-established model-building method.

Conclusion: LPM-GEM integrates gene expression data efficiently. The method supports gene expression-based FBA models and can be applied as an alternative to estimate metabolic fluxes when tracer experiments are inappropriate.

Keywords: Flux balance analysis, Mixed-integer linear programming, Bacillus subtilis, Carbon source, Transcriptomics, Constraint-based modeling, Thermodynamically infeasible loops

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Background

Gaining insight into metabolic fluxes can lead to a better understanding of how cells maintain their metabolic state and how they metabolically adapt to their microenvironment. Such insight has led to astonishing discoveries such as considerably increased production yields after metabolic engineering [1–4], improved strain performance [5–7], and a better understanding of various patho-mechanisms together with the identification of drug targets for cancer or diabetes [8–12]. To determine fluxes in metabolic pathways, metabolites are labeled with the 13C isotope and traced over time by mass spectrometry [1, 4, 6, 7, 9, 13–15], providing high accuracy and precision [16, 17]. However, these experiments are labor-intensive and costly [16, 18–20].

Alternatively, constraint-based modeling (CBM) [21] has been applied to predict metabolic fluxes based on flux balance analysis (FBA) [22]. FBA can be used to estimate metabolic fluxes without conducting such labeling experiments. Together with biologically reasonable assumptions, e.g., that bacteria or cancer cells aim to maximize biomass production, the fluxes of the metabolic reactions are derived from the physicochemical constraints of their stoichiometry. FBA assumes a mass balance at steady state for each (inner) metabolite. Additional constraints may be derived from thermodynamics, which implies the directionality of a reaction, and from enzyme capacity, which gives an estimate of the maximal enzymatic rate (Vmax). In this way, FBA bypasses the need for reaction kinetic parameters, which facilitates constructing metabolic models on a genome scale without determining these experimentally demanding parameters [21–25]. It provides an estimate of the metabolic flux of interest, leading to potential new hypotheses and enabling the design of adapted experiments [2, 3, 10, 12, 26]. However, utilizing only the stoichiometry of the reactions is typically insufficient to achieve good flux predictions.
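The FBA constraints described above can be illustrated with a minimal sketch. The four-reaction network, its stoichiometric matrix `S`, the flux bounds, and the helper names `mass_balance` and `is_feasible` are all hypothetical and chosen only for illustration; they are not taken from the B. subtilis model of this article. The sketch checks the steady-state mass balance and the capacity bounds for candidate flux vectors; it does not solve the optimization problem itself, which would normally be handed to a linear programming solver.

```python
# Hypothetical toy network (illustration only, not the article's model):
#   R1: -> A        (carbon uptake, capacity-limited)
#   R2: A -> B
#   R3: A ->        (by-product secretion)
#   R4: B ->        (biomass production, the assumed objective)
# Rows are inner metabolites (A, B), columns are reactions R1..R4.
S = [
    [1, -1, -1, 0],   # metabolite A
    [0, 1, 0, -1],    # metabolite B
]

# Assumed lower/upper flux bounds: all reactions irreversible,
# uptake limited to 10 flux units (a stand-in for a Vmax estimate).
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]


def mass_balance(S, v):
    """Return S @ v, the net production rate of each inner metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, bounds, tol=1e-9):
    """Check steady state (S @ v == 0) and the flux bounds for vector v."""
    balanced = all(abs(rate) <= tol for rate in mass_balance(S, v))
    bounded = all(lo - tol <= v_j <= hi + tol
                  for v_j, (lo, hi) in zip(v, bounds))
    return balanced and bounded


# Candidate flux vectors (R1, R2, R3, R4).
v_all_biomass = [10, 10, 0, 10]   # all carbon goes to biomass
v_split = [10, 6, 4, 6]           # part of the carbon is secreted
v_unbalanced = [10, 4, 4, 6]      # violates the mass balance of A and B
v_over_uptake = [12, 12, 0, 12]   # balanced, but exceeds the uptake bound

for name, v in [("all_biomass", v_all_biomass), ("split", v_split),
                ("unbalanced", v_unbalanced), ("over_uptake", v_over_uptake)]:
    print(name, "feasible:", is_feasible(S, v, bounds), "biomass flux:", v[3])
```

In this toy network the steady state forces the biomass flux to equal uptake minus secretion, so no feasible vector can exceed a biomass flux of 10, and `v_all_biomass` attains that maximum. Many other flux distributions, such as `v_split`, satisfy the same constraints, which is why stoichiometry alone often leaves the flux prediction underdetermined.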
Hence, techniques were developed to add experimental data during model building [23, 27–33]. One of the most effective approaches was to integrate experimental omics data, specifically transcription profiles, as they are not labor-intensive to generate at the systems level [34–36]. Although such data are less direct than 13C tracer-based data, they have led to good flux predictions [27, 28, 31–33]. Various methods have been developed to use gene expression data for metabolic network models. Most prominently, these approaches qualitatively discretize reactions into expressed and non-expressed by setting a threshold, as implemented, e.g., in the integrative Metabolic Analysis Tool (iMAT) [31, 33], Gene Inactivity Moderated by Metabolism and Expression (GIMME) [27], Probabilistic Regulation of Metabolism (PROM) [28], or metabolic Context-specificity Assessed by Deterministic Reaction Evaluation (mCADRE) [32]. Although these context-specific model extraction methods successfully improved flux predictions compared to FBA not based on expression data, finding suitable thresholds can be challenging. Moreover, employing defined thresholds disregards the fine-grained regulation of (...truncated)