A Boolean-based systems biology approach to predict novel genes associated with cancer: Application to colorectal cancer


Nagaraj and Reverter BMC Systems Biology 2011, 5:35
http://www.biomedcentral.com/1752-0509/5/35

RESEARCH ARTICLE Open Access

Shivashankar H Nagaraj, Antonio Reverter*

Abstract

Background: Cancer has remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. We introduce a systems biology approach to study cancer that formally integrates the available genetic, transcriptomic, epigenetic and molecular knowledge on cancer biology and, as a proof of concept, we apply it to colorectal cancer.

Results: We first classified all the genes in the human genome into cancer-associated and non-cancer-associated genes based on extensive literature mining. We then selected a set of functional attributes proven to be highly relevant to cancer biology that includes protein kinases, secreted proteins, transcription factors, post-translational modifications of proteins, DNA methylation and tissue specificity. These cancer-associated genes were used to extract ‘common cancer fingerprints’ through these molecular attributes, and a Boolean logic was implemented in such a way that both the expression data and functional attributes could be rationally integrated, allowing for the generation of a guilt-by-association algorithm to identify novel cancer-associated genes. Finally, these candidate genes are interlaced with the known cancer-related genes in a network analysis aimed at identifying highly conserved gene interactions that impact cancer outcome. We demonstrate the effectiveness of this approach using colorectal cancer as a test case and identify several novel candidate genes that are classified according to their functional attributes.
These genes include the following: 1) secreted proteins as potential biomarkers for the early detection of colorectal cancer (FXYD1, GUCA2B, REG3A); 2) kinases as potential drug candidates to prevent tumor growth (CDC42BPB, EPHB3, TRPM6); and 3) potential oncogenic transcription factors (CDK8, MEF2C, ZIC2).

Conclusion: We argue that this is a holistic approach that faithfully mimics cancer characteristics, efficiently predicts novel cancer-associated genes and has universal applicability to the study and advancement of cancer research.

* Correspondence: Computational and Systems Biology, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, St. Lucia, Brisbane, Queensland 4067, Australia

Background

Cancer is a complex genetic disease that exhibits remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. Treating cancer is equally complex and depends on a number of factors, including environmental factors, early detection, chemotherapy and surgery. Cancer is being recognized as a systems biology disease [1,2], as illustrated by multiple studies that include molecular data integration and network and pathway analyses in a genome-wide fashion. Such studies have advanced cancer research by providing a global view of cancer biology as molecular circuitry rather than the dysregulation of a single gene or pathway. For instance, reverse-engineering of gene networks derived from expression profiles was used to study prostate cancer [3], from which the androgen receptor (AR) emerged as the top candidate marker to detect the aggressiveness of prostate cancers. Similarly, subnetworks were proposed as potential markers rather than individual genes to distinguish metastatic from non-metastatic tumors in a breast cancer study [4].
The authors of that study argue that subnetwork markers are more reproducible than individual marker genes selected without network information and that they achieve higher accuracy in the classification of metastatic versus non-metastatic tumor signaling. Using genome-wide dysregulated interaction data in B-cell lymphomas, novel oncogenes have been predicted in silico [5]. Finally, taking a signaling-pathway approach, a map of a human cancer signaling network was built [6] by integrating cancer signaling pathways with cancer-associated, genetically and epigenetically altered genes.

© 2011 Nagaraj and Reverter; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Gene expression profiling has been widely used to investigate the molecular circuitry of cancer. In particular, DNA microarrays have been used in almost all of the main cancers and promise to change the way cancer is diagnosed, classified and treated [1]. However, expression analyses often result in hundreds of outliers, or differentially expressed genes between normal and cancer cells or across time points [2]. Owing to the large number of candidate genes, several different hypotheses can be generated to explain the variation in the expression patterns for a given study. In addition, the preferential expression of some tissue-specific genes presents additional challenges in expression data analyses. Nevertheless, recent systems approaches have attempted to prioritize differentially expressed genes by overlaying expression data with molecular data, such as interaction data [3], metabolic data [4] and phenotypic data [5].
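The abstract describes combining expression data with functional attributes through Boolean logic, so that genes resembling known cancer genes can be flagged by guilt by association. A minimal Python sketch of that idea follows. It is not the authors' implementation: the attribute names, the placeholder gene labels (GENE_A to GENE_G), the toy data and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical attribute order; the names are illustrative assumptions.
ATTRIBUTES = ("diff_expressed", "kinase", "secreted", "transcription_factor")

# Invented toy data (not from the paper):
# gene label -> (known cancer-associated?, Boolean attribute tuple)
GENES = {
    "GENE_A": (True,  (True,  True,  False, False)),
    "GENE_B": (True,  (True,  True,  False, False)),
    "GENE_C": (True,  (True,  False, True,  False)),
    "GENE_D": (False, (True,  True,  False, False)),
    "GENE_E": (False, (False, False, False, False)),
    "GENE_F": (False, (False, True,  False, False)),
    "GENE_G": (False, (True,  False, False, True)),
}


def cancer_fingerprints(genes, min_support=2):
    """Attribute combinations shared by at least min_support known cancer genes."""
    counts = Counter(attrs for is_known, attrs in genes.values() if is_known)
    return {attrs for attrs, n in counts.items() if n >= min_support}


def candidate_genes(genes, fingerprints):
    """Genes not yet cancer-associated whose Boolean profile matches a fingerprint."""
    return sorted(
        name
        for name, (is_known, attrs) in genes.items()
        if not is_known and attrs in fingerprints
    )


def describe(attrs):
    """List the attribute names that are True in a Boolean profile."""
    return [name for name, flag in zip(ATTRIBUTES, attrs) if flag]


fingerprints = cancer_fingerprints(GENES)
for gene in candidate_genes(GENES, fingerprints):
    print(gene, describe(GENES[gene][1]))
```

In this toy setting a fingerprint is simply an exact Boolean profile seen in two or more known cancer genes; a real analysis would need a statistical criterion for which attribute combinations count as enriched.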
Human malignancies are not confined to genes and gene products, but also include epigenetic modifications such as DNA methylation and chromosomal aberrations. However, in order to effectively capture the properties that emerge in a complex disease, we need analytical methods that provide a robust framework to formally integrate prior knowledge of the biological attributes with the experimental data. The simplest heuristic will search for novel genes with a profile, in terms of differential expression and/or network connectivity, similar to those for which an association to disease has been well established (see, for instance, the approaches of [7,8]). Boolean logic has been found to be optimal for such tasks. Within the context of cancer, Mukherjee and Speed [9] show how a series of biological attributes, including ligands, receptors and cytosolic proteins, can be included in the network inference. More recently, Mukherjee and co-workers [10] introduced an approach based on sparse Boolean functions and applied it to the responsiveness of breast cancer cell lines to an anticancer agent. In addition, large-scale literature-based Boolean models have been used to study apoptosis pathways as well as pathways connected with them. In this study, we prop (...truncated)