Predictive analytics of environmental adaptability in multi-omic network models
www.nature.com/scientificreports
OPEN
Received: 20 March 2015
Accepted: 14 September 2015
Published: 20 October 2015
Claudio Angione & Pietro Lió
Bacterial phenotypic traits and lifestyles in response to diverse environmental conditions depend
on changes in the internal molecular environment. However, predicting bacterial adaptability is
still difficult outside of laboratory-controlled conditions. Many molecular levels can contribute
to the adaptation to a changing environment: pathway structure, codon usage, metabolism. To
measure adaptability to changing environmental conditions and over time, we develop a multi-omic model of Escherichia coli that accounts for metabolism, gene expression and codon usage at
both transcription and translation levels. After the integration of multiple omics into the model,
we propose a multiobjective optimization algorithm to find the allowable and optimal metabolic
phenotypes through concurrent maximization or minimization of multiple metabolic markers. In the
condition space, we propose Pareto hypervolume and spectral analysis as estimators of short term
multi-omic (transcriptomic and metabolic) evolution, thus enabling comparative analysis of metabolic
conditions. We therefore compare, evaluate and cluster different experimental conditions, models
and bacterial strains according to their metabolic response in a multidimensional objective space,
rather than in the original space of microarray data. We finally validate our methods on a phenomics
dataset of growth conditions. Our framework, named METRADE, is freely available as a MATLAB
toolbox.
As biologists would agree, there is no biology except in the light of evolution1. However, much of the
uncertainty about the behavior of a microorganism is due to the lack of statistical bioinformatics methodologies for accurate measurement of adaptability to different environmental conditions and over
time2,3. Approaches involving both mathematics and bioinformatics would benefit from the study of the
molecular response to the adaptation. In turn, this would make it possible to discover the relation between the
environmental (“external”) conditions and the changes in the metabolic-phenotypic networks (the “internal” environment). At the same time, it would elucidate the genotype-phenotype relationship, which is
still an open problem in biology.
Many molecular levels can contribute to adaptability: (i) metabolism, i.e. the set of chemical reactions
taking place in a living organism; (ii) pathway structure, namely groups of biologically-related reactions
with a common goal; (iii) transcriptomics and codon usage, and in general the ability to regulate the
speed of transcription and translation of genes into proteins. For instance, a highly adaptive bacterium
ensures that the structure of its metabolism and the pathway productivity rapidly evolve over time due to
varying environmental conditions or selective pressure4. Analogously, several recent examples show the
coupling of codon usage to adaptive phenotypic variation, suggesting that the genotype functionality and
behavior can be derived from the analysis of the evolution in the codon usage5. Typically, the correlation
between gene expression and codon bias is large for environments similar to those in which the organism
evolved, and small for dissimilar environments6.
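As a minimal illustration of what a codon usage measurement involves, the following Python sketch counts in-frame codons in a coding sequence and computes the relative usage of the synonymous codons of one amino acid. It is not part of METRADE (which is a MATLAB toolbox) and is not the authors' codon usage model; the function names `codon_counts` and `relative_usage` and the example sequence are hypothetical. Only the list of leucine codons comes from the standard genetic code.

```python
from collections import Counter
from typing import Dict, List

# Standard genetic code: the six synonymous codons encoding leucine.
LEUCINE_CODONS: List[str] = ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"]


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence.

    A trailing partial codon (1 or 2 leftover bases) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(counts: Counter, synonymous: List[str]) -> Dict[str, float]:
    """Fraction of each codon among a set of synonymous codons."""
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        # The amino acid does not occur in the sequence.
        return {c: 0.0 for c in synonymous}
    return {c: counts[c] / total for c in synonymous}


if __name__ == "__main__":
    # Hypothetical toy sequence: ATG CTG CTG TTA CTG TAA
    toy_cds = "ATGCTGCTGTTACTGTAA"
    usage = relative_usage(codon_counts(toy_cds), LEUCINE_CODONS)
    for codon, fraction in usage.items():
        print(codon, round(fraction, 2))
```

Per-gene usage vectors of this kind are one possible starting point for a codon bias score that can then be compared with expression levels across conditions.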
Computer Laboratory - University of Cambridge, UK. Correspondence and requests for materials should be
addressed to C.A. (email: )
Scientific Reports | 5:15147 | DOI: 10.1038/srep15147
Measurements of gene expression level are able to generate transcriptional profiles of microorganisms across a diverse set of environmental conditions. Databases of environmental conditions have been
recently produced for several organisms, including Escherichia coli7, Clostridium8, Salmonella9, and fission yeast10. Although such resources, coupled with statistical analysis, remain key to the interpretation
of measured data, they do not provide a comprehensive understanding of the resulting cellular behavior.
For example, similar gene expression profiles may cause different phenotypic outcomes, while different environmental conditions may give rise to similar behaviors. Additionally, the
actual response to a given condition is highly dependent on the multiple cellular objectives that the
microorganism is required to meet11,12.
Here, we explore the adaptability of E. coli by investigating experimental conditions mapped to a multidimensional objective space. To obtain a phase-space of conditions, we add the gene expression and the
codon usage layers to a flux-balance analysis (FBA) framework, therefore proposing a new multi-omic
model. As a first result, we are able to optimize these layers for the overproduction of metabolites of
interest, predicting the short term bacterial evolution towards the optimum. Then, we present a new
method to map compendia of gene expression profiles to any metabolic objective space. Since each
profile is associated with a growth condition, the objective space becomes the condition phase-space,
which we investigate through principal component analysis, pseudospectra, and a spectral method for
community detection.
To optimize these multi-omic layers, we propose a genetic multiobjective optimization algorithm that
seeks the gene expression profiles such that multiple cellular functions are optimized concurrently. We
use the Pareto front as a tool to seek trade-offs between two or more tasks performed by E. coli, and
specifically to score the performance when the tasks are contending with one another. We simultaneously
optimize tasks by finding the best gene expression profile and codon usage array. Most notably, this may
make it possible to determine the best environmental condition in which a bacterium has to be grown in order to
reach specific optimal output values from a range of objective functions chosen by the researcher. As a
particular case, it is also possible to investigate the best single or multiple gene knockouts for the given
set of objectives13.
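The notions of Pareto front and Pareto hypervolume used above can be made concrete with a small Python sketch for two objectives that are both maximized. This is an illustrative assumption-laden example, not the genetic multiobjective algorithm proposed here and not METRADE code: the function names, the reference point at the origin, and the five candidate points are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly
    better in at least one (both objectives are maximized)."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated points, sorted by the first objective (descending)."""
    front = {p for p in points if not any(dominates(q, p) for q in points)}
    return sorted(front, key=lambda p: p[0], reverse=True)


def hypervolume_2d(front: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the front and bounded by the reference point.

    Assumes every point is at least as good as ref in both objectives.
    """
    area = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective; each point adds a
    # horizontal slab between the previous and the current second objective.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


if __name__ == "__main__":
    # Hypothetical (objective 1, objective 2) values for five candidate profiles.
    candidates = [(1.0, 0.2), (0.8, 0.5), (0.5, 0.9), (0.6, 0.4), (0.3, 0.3)]
    front = pareto_front(candidates)
    print("front:", front)
    print("hypervolume:", hypervolume_2d(front))
```

A larger hypervolume means the front covers more of the objective space, which is why the indicator can be used to compare fronts obtained under different conditions or at different stages of an optimization.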
The paper is organized as follows. First, we define the new multi-omic model by adding layers to FBA,
in order to build the level of information required for a meaningful understanding of the landscape of
experimental conditions. Using this augmented FBA framework, we optimize the model and we perform a temporal analysis of bacterial evolution towards an optimal configuration using the hypervolume
indicator. Then, we introduce principal component analysis, pseudospectra and community detection
methods to identify conditions mapped to close regions in the phase-space. We finally derive clusters of
isoadaptability computed not in an absolute fashion, but taking into account the cellular multi-omic. Our
approach is validated against a compendium of growth conditions including measurements of growth
r (...truncated)