A Boolean-based systems biology approach to predict novel genes associated with cancer: Application to colorectal cancer
Nagaraj and Reverter BMC Systems Biology 2011, 5:35
http://www.biomedcentral.com/1752-0509/5/35
RESEARCH ARTICLE
Open Access
Shivashankar H Nagaraj, Antonio Reverter*
Abstract
Background: Cancer has remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. We introduce a systems biology approach to studying cancer that formally integrates the available genetic, transcriptomic, epigenetic and molecular knowledge of cancer biology and, as a proof of concept, we apply it to colorectal cancer.
Results: We first classified all the genes in the human genome into cancer-associated and non-cancer-associated genes based on extensive literature mining. We then selected a set of functional attributes proven to be highly relevant to cancer biology, including protein kinases, secreted proteins, transcription factors, post-translational modifications of proteins, DNA methylation and tissue specificity. The cancer-associated genes were used to extract ‘common cancer fingerprints’ across these molecular attributes, and Boolean logic was implemented so that expression data and functional attributes could be rationally integrated, yielding a guilt-by-association algorithm to identify novel cancer-associated genes. Finally, these candidate genes were interlaced with the known cancer-related genes in a network analysis aimed at identifying highly conserved gene interactions that affect cancer outcome. We demonstrate the effectiveness of this approach using colorectal cancer as a test case and identify several novel candidate genes, which are classified according to their functional attributes. These genes include: 1) secreted proteins as potential biomarkers for the early detection of colorectal cancer (FXYD1, GUCA2B, REG3A); 2) kinases as potential drug candidates to prevent tumor growth (CDC42BPB, EPHB3, TRPM6); and 3) potential oncogenic transcription factors (CDK8, MEF2C, ZIC2).
Conclusion: We argue that this is a holistic approach that faithfully mimics cancer characteristics, efficiently
predicts novel cancer-associated genes and has universal applicability to the study and advancement of cancer
research.
Background
Cancer is a genetic disease of remarkable molecular complexity, affecting multiple genes, proteins, pathways and regulatory interconnections. Treating cancer is equally complex and depends on a number of factors, including environmental factors, early detection, chemotherapy and surgery. Cancer is increasingly recognized as a systems biology disease [1,2], as illustrated by multiple genome-wide studies that integrate molecular data and analyze networks and pathways. Such studies have advanced cancer research by providing a global view of cancer biology as molecular circuitry rather than as the dysregulation of a single gene or pathway. For instance, reverse-engineering of gene networks derived from expression profiles was used to study prostate cancer [3], and the androgen receptor (AR) emerged as the top candidate marker for detecting the aggressiveness of prostate cancers. Similarly, a breast cancer study [4] proposed subnetworks, rather than individual genes, as markers to distinguish metastatic from non-metastatic tumors. The authors argue that subnetwork markers are more reproducible than individual marker genes selected without network information and that they achieve higher accuracy in classifying metastatic versus non-metastatic tumors. Novel oncogenes in B-cell lymphomas have been predicted in silico from genome-wide dysregulated interaction data [5]. Finally, a signaling-pathway approach was used to build a map of a human cancer signaling network [6] by integrating cancer signaling pathways with cancer-associated, genetically and epigenetically altered genes.

* Correspondence:
Computational and Systems Biology, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, St. Lucia, Brisbane, Queensland 4067, Australia

© 2011 Nagaraj and Reverter; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Gene expression profiling has been widely used to investigate the molecular circuitry of cancer. In particular, DNA microarrays have been applied to nearly all major cancers and promise to change the way cancer is diagnosed, classified and treated [1]. However, expression analyses often yield hundreds of outliers, that is, genes differentially expressed between normal and cancer cells or across time points [2]. Owing to the large number of candidate genes, several different hypotheses can be generated to explain the variation in the expression patterns of a given study. In addition, the preferential expression of tissue-specific genes presents further challenges for expression data analysis. Nevertheless, recent systems approaches have attempted to prioritize differentially expressed genes by overlaying expression data with other molecular data, such as interaction data [3], metabolic data [4] and phenotypic data [5].
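As an illustration of the prioritization idea, the sketch below ranks genes by combining differential expression with interaction data. It is a minimal, hypothetical example, assuming a toy normal-versus-tumor expression table, a toy interaction list and an ad hoc score (absolute log2 fold change multiplied by one plus the number of interactions with known cancer genes); none of these names, values or the scoring rule come from the cited studies.

```python
import math
from statistics import mean

# Hypothetical toy data: linear-scale expression per gene in three normal and three tumor samples
normal = {"G1": [10, 12, 11], "G2": [5, 5, 6], "G3": [20, 22, 21], "G4": [8, 7, 9]}
tumor = {"G1": [40, 44, 42], "G2": [5, 6, 5], "G3": [5, 6, 5], "G4": [16, 14, 18]}

# Hypothetical known cancer genes and undirected interactions
known_cancer = {"KC1", "KC2"}
interactions = [("G1", "KC1"), ("G1", "KC2"), ("G3", "KC1"), ("G4", "G2")]


def log2_fold_change(gene):
    """log2 of the ratio of mean tumor to mean normal expression."""
    return math.log2(mean(tumor[gene]) / mean(normal[gene]))


def cancer_neighbours(gene):
    """Count interactions linking `gene` to a known cancer gene."""
    return sum(
        1
        for a, b in interactions
        if (a == gene and b in known_cancer) or (b == gene and a in known_cancer)
    )


def prioritize(genes, min_abs_lfc=1.0):
    """Keep genes with |log2FC| >= threshold, then rank by an ad hoc combined score."""
    scored = []
    for g in genes:
        lfc = log2_fold_change(g)
        if abs(lfc) >= min_abs_lfc:
            scored.append((g, abs(lfc) * (1 + cancer_neighbours(g))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = prioritize(normal)
print([g for g, _ in ranking])  # ['G1', 'G3', 'G4']
```

The threshold and the multiplicative weighting are arbitrary choices made for illustration; a real analysis would use a proper statistical test for differential expression, with correction for multiple testing, before any overlay.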
Human malignancies are not confined to genes and gene products; they also involve epigenetic modifications, such as DNA methylation, and chromosomal aberrations. To effectively capture the emergent properties of such a complex disease, however, we need analytical methods that provide a robust framework for formally integrating prior knowledge of biological attributes with experimental data. The simplest heuristic searches for novel genes whose profile, in terms of differential expression and/or network connectivity, resembles that of genes with a well-established disease association (see, for instance, the approaches of [7,8]).
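The guilt-by-association heuristic described above can be sketched in a few lines. This is a minimal illustration, assuming each gene is summarized by a numeric profile (here, hypothetical expression values across four samples) and that similarity is the mean Pearson correlation with the known disease genes; the profiles, gene names and choice of similarity measure are assumptions, not the specific procedures of [7,8].

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no similarity information
    return sxy / math.sqrt(sxx * syy)


def guilt_by_association(profiles, known, candidates):
    """Score each candidate by its mean correlation with the known disease genes."""
    scores = {}
    for c in candidates:
        rs = [pearson(profiles[c], profiles[k]) for k in known]
        scores[c] = sum(rs) / len(rs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical expression profiles across four samples
profiles = {
    "KNOWN1": [1.0, 2.0, 3.0, 4.0],
    "KNOWN2": [2.0, 4.0, 6.0, 8.5],
    "CAND_A": [1.1, 2.1, 2.9, 4.2],
    "CAND_B": [4.0, 3.0, 2.0, 1.0],
}

ranking = guilt_by_association(profiles, ["KNOWN1", "KNOWN2"], ["CAND_A", "CAND_B"])
for gene, score in ranking:
    print(gene, round(score, 3))
```

Here CAND_A tracks the known genes closely and ranks first, while CAND_B is anti-correlated and receives a negative score. Network connectivity could be treated the same way by replacing the expression profile with, for example, a vector of interaction indicators and swapping Pearson correlation for a set-overlap measure.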
Boolean logic has proven well suited to such tasks. Within the context of cancer, Mukherjee and Speed [9] show how a range of biological attributes, including ligands, receptors and cytosolic proteins, can be incorporated into network inference. More recently, Mukherjee and co-workers [10] introduced an approach based on sparse Boolean functions and applied it to the responsiveness of breast cancer cell lines to an anticancer agent. In addition, large-scale, literature-based Boolean models have been used to study apoptosis pathways and the pathways connected to them.
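One way to picture a Boolean integration of functional attributes and expression data is to encode each gene as a vector of presence/absence flags and ask how often each combination (a ‘fingerprint’) occurs among known cancer genes. The sketch below is hypothetical: the attribute list echoes the functional attributes named in the Abstract, but the flag names, the toy annotation and the scoring rule (the fraction of genes sharing a fingerprint that are known cancer genes) are assumptions for illustration, not the formulation of [9,10] or of this study.

```python
from collections import defaultdict

# Assumed Boolean flags: functional attributes from the Abstract plus differential expression
ATTRIBUTES = ("kinase", "secreted", "tf", "ptm", "methylated", "tissue_specific", "diff_expressed")


def fingerprint(attrs):
    """Encode a gene's attribute set as a tuple of booleans in a fixed order."""
    return tuple(a in attrs for a in ATTRIBUTES)


def fingerprint_scores(genes, cancer_genes):
    """For each fingerprint, the fraction of genes carrying it that are known cancer genes."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for name, attrs in genes.items():
        fp = fingerprint(attrs)
        total[fp] += 1
        if name in cancer_genes:
            hits[fp] += 1
    return {fp: hits[fp] / total[fp] for fp in total}


def rank_candidates(genes, cancer_genes):
    """Rank non-cancer genes by how 'cancer-like' their Boolean fingerprint is."""
    scores = fingerprint_scores(genes, cancer_genes)
    candidates = [
        (name, scores[fingerprint(attrs)])
        for name, attrs in genes.items()
        if name not in cancer_genes
    ]
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)


# Hypothetical toy annotation
genes = {
    "C1": {"kinase", "diff_expressed"},
    "C2": {"kinase", "diff_expressed"},
    "C3": {"secreted", "methylated"},
    "N1": {"kinase", "diff_expressed"},
    "N2": {"secreted", "methylated"},
    "N3": {"tf"},
    "N4": {"tf"},
}
cancer_genes = {"C1", "C2", "C3"}

for name, score in rank_candidates(genes, cancer_genes):
    print(name, round(score, 2))
```

With seven flags there are 2^7 = 128 possible fingerprints, so in practice many combinations will be rare and their scores noisy; some form of pooling or smoothing across related fingerprints would likely be needed.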
In this study, we prop (...truncated)