Knowledge-guided gene ranking by coordinative component analysis

BMC Bioinformatics, Mar 2010

Background In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during the process of carcinogenesis. Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. Traditional statistical methods are often inept to identify biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to incorporate fully the true interplay between biological knowledge and gene expression data. Results To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA) in this paper. COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem to maximize the coordinative effort, COCA is designed to first extract the coordinative components based on a partial guidance from knowledge genes and then rank the genes according to their participation strengths. An embedded bootstrapping procedure is implemented to improve statistical robustness of the solutions. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. Finally, the COCA approach has been applied to stem cell data to identify biologically relevant genes in signaling pathways. As a result, the COCA approach uncovers novel pathway members that may shed light into the pathway deregulation in cancers. Conclusion We have developed a new integrative strategy to combine biological knowledge and microarray data for gene ranking. The method utilizes knowledge genes for a guidance to first extract coordinative components, and then rank the genes according to their contribution related to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with an improved performance in pathway member identification.

A PDF file should load here. If you do not see its contents the file may be temporarily unavailable at the journal website or you do not have a PDF plug-in installed and enabled in your browser.

Alternatively, you can download the file locally and open with any standalone PDF reader:

http://www.biomedcentral.com/content/pdf/1471-2105-11-162.pdf

Knowledge-guided gene ranking by coordinative component analysis

Chen Wang 0 Jianhua Xuan 0 Huai Li Yue Wang 0 Ming Zhan Eric P Hoffman Robert Clarke 0 Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University , Arlington, VA , USA Background: In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during the process of carcinogenesis. Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. Traditional statistical methods are often inept to identify biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to incorporate fully the true interplay between biological knowledge and gene expression data. Results: To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA) in this paper. COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem to maximize the coordinative effort, COCA is designed to first extract the coordinative components based on a partial guidance from knowledge genes and then rank the genes according to their participation strengths. An embedded bootstrapping procedure is implemented to improve statistical robustness of the solutions. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. Finally, the COCA approach has been applied to stem cell data to identify biologically relevant genes in signaling pathways. As a result, the COCA approach uncovers novel pathway members that may shed light into the pathway deregulation in cancers. Conclusion: We have developed a new integrative strategy to combine biological knowledge and microarray data for gene ranking. The method utilizes knowledge genes for a guidance to first extract coordinative components, and then rank the genes according to their contribution related to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with an improved performance in pathway member identification. - Background It is of great interest to identify genes strongly associated with the functionality of gene networks or signal transduction pathways particularly from gene expression microarray data. Two of the earliest approaches to identify such genes are fold-change and multiple t-testing; each aims to rank the genes in the order of their differential expressions under various experimental conditions. Many improvements to the original t-test method have been proposed for microarray data analysis. For example, significant analysis of microarray (SAM) [1] uses a modified t-statistic with an added estimator for gene ranking in which the false discovery rate (FDR) is estimated by a permutation procedure. A bootstrapped p-value approach was introduced in [2] to address the inherent variability in small sample studies. Prior studies have shown that fold-change is more robust than t-test with respect to the reproducibility of gene rankings [3], while other researchers argue that better reproducibility does not guarantee the accuracy of gene ranking[4]. Nonetheless, both methods are severely limited because they neglect the interaction among genes, prioritizing gene relevance only based on individual gene expression values. To address the above-mentioned problem, several gene ranking methods have been proposed to either consider the joint effect of genes or to explore the expression pattern in time-course data. For instance, Opgen-Rhein & Strimmer [5] introduced the "shrinkage t" statistic that is based on a novel and model-free shrinkage estimate of the variance vector across genes. Storey et al. [6] proposed a method (EDGE) to first fit the time-course expression pattern by splines, and then rank genes by hypothesis testing on the spline parameters. Furlanello et al. [7] proposed a classification-based feature elimination scheme to rank genes by iteratively discarding chunks of genes showing least contribution to the classifier. In contrast, other investigators have proposed incorporating biological knowledge for gene ranking. GeneRank [8] ranks genes by integrating gene expression and network structure derived from gene annotations. Ma et al. [9] proposed a strategy to combine gene expression and protein-protein interaction (PPI) knowledge, ranking genes by their association with phenotype calibrated by the PPI information. However, such data integration, while widely adopted, is usually done in a heuristic way and lacks an objective estimate of the true interplay between biological knowledge and gene expression data. In this paper, we propose a knowledge-guided gene ranking scheme, namely a coordina (...truncated)


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2105-11-162.pdf

Chen Wang, Jianhua Xuan, Huai Li, Yue Wang, Ming Zhan, Eric P Hoffman, Robert Clarke. Knowledge-guided gene ranking by coordinative component analysis, BMC Bioinformatics, 2010, pp. 162, 11, DOI: 10.1186/1471-2105-11-162