http://www.biomedcentral.com/content/pdf/1471-2105-11-162.pdf
Knowledge-guided gene ranking by coordinative component analysis
Chen Wang, Jianhua Xuan, Huai Li, Yue Wang, Ming Zhan, Eric P Hoffman, Robert Clarke
Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA
Background: In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during carcinogenesis. It is therefore important to prioritize genes that are strongly associated with the functionality of a network. Traditional statistical methods often fail to identify biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to capture fully the true interplay between biological knowledge and gene expression data.
Results: To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA). COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem that maximizes the coordinative effort, COCA first extracts the coordinative components under partial guidance from knowledge genes and then ranks the genes according to their participation strengths. An embedded bootstrapping procedure improves the statistical robustness of the solutions. COCA was first tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance compared with traditional statistical methods. Finally, COCA was applied to stem cell data to identify biologically relevant genes in signaling pathways. In this application, COCA uncovers novel pathway members that may shed light on pathway deregulation in cancers.
Conclusion: We have developed a new integrative strategy that combines biological knowledge and microarray data for gene ranking. The method uses knowledge genes as guidance to first extract coordinative components, and then ranks the genes according to their contribution to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with improved performance in pathway member identification.
Background
It is of great interest to identify genes strongly associated
with the functionality of gene networks or signal
transduction pathways, particularly from gene expression
microarray data. Two of the earliest approaches to
identifying such genes are fold-change and multiple t-testing;
each aims to rank the genes in the order of their
differential expression under various experimental conditions.
Many improvements to the original t-test method have
been proposed for microarray data analysis. For example,
significance analysis of microarrays (SAM) [1] uses a
modified t-statistic with an added estimator for gene ranking
in which the false discovery rate (FDR) is estimated by a
permutation procedure. A bootstrapped p-value
approach was introduced in [2] to address the inherent
variability in small sample studies. Prior studies have
shown that fold-change is more robust than the t-test with
respect to the reproducibility of gene rankings [3], while
other researchers argue that better reproducibility does
not guarantee the accuracy of gene ranking [4].
Nonetheless, both methods are severely limited because they
neglect the interactions among genes, prioritizing genes
based only on their individual expression
values.
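To make the per-gene nature of these two baselines concrete, the following is a minimal sketch of fold-change and t-statistic ranking on synthetic data. It assumes log2-scale expression matrices arranged as genes x samples. The function names, the optional constant `s0` (a small positive value added to the denominator, loosely in the spirit of modified t-statistics), and the simulated data are illustrative assumptions and are not taken from the cited methods.

```python
import numpy as np

def fold_change_scores(group_a, group_b):
    """Absolute difference of group means per gene (log2 fold change on log2 data)."""
    return np.abs(group_a.mean(axis=1) - group_b.mean(axis=1))

def t_scores(group_a, group_b, s0=0.0):
    """Absolute Welch-type t-statistic per gene.

    s0 is a hypothetical small constant added to the denominator; how to
    choose it is outside the scope of this sketch.
    """
    n_a, n_b = group_a.shape[1], group_b.shape[1]
    diff = group_a.mean(axis=1) - group_b.mean(axis=1)
    se = np.sqrt(group_a.var(axis=1, ddof=1) / n_a
                 + group_b.var(axis=1, ddof=1) / n_b)
    return np.abs(diff) / (se + s0)

def rank_genes(scores):
    """Gene indices ordered from highest to lowest score."""
    return np.argsort(-scores)

# Synthetic example: 200 genes, 6 samples per condition (assumed sizes).
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 6
group_a = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
group_b = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
group_b[0] += 8.0  # gene 0 is strongly up-regulated in condition B

fc_ranking = rank_genes(fold_change_scores(group_a, group_b))
t_ranking = rank_genes(t_scores(group_a, group_b))
```

Each score is computed from one gene's row alone, which is exactly the limitation noted above: no term in either statistic depends on any other gene.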
To address the above-mentioned problem, several gene
ranking methods have been proposed to either consider
the joint effect of genes or to explore the expression
pattern in time-course data. For instance, Opgen-Rhein &
Strimmer [5] introduced the "shrinkage t" statistic that is
based on a novel and model-free shrinkage estimate of
the variance vector across genes. Storey et al. [6]
proposed a method (EDGE) to first fit the time-course
expression pattern by splines, and then rank genes by
hypothesis testing on the spline parameters. Furlanello et
al. [7] proposed a classification-based feature elimination
scheme to rank genes by iteratively discarding chunks of
genes showing least contribution to the classifier.
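The chunk-wise elimination idea can be sketched as follows. This is not the procedure of [7]: a ridge-regularized least-squares classifier stands in for whatever classifier is used there, and the fixed `drop_fraction`, the `alpha` penalty, and the function names are assumptions made only for illustration.

```python
import numpy as np

def ridge_weights(X, y, alpha=1.0):
    """Weights of a ridge-regularized least-squares classifier (labels in {-1, +1})."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

def chunked_elimination_ranking(X, y, drop_fraction=0.5, alpha=1.0):
    """Rank genes by repeatedly refitting and discarding the lowest-weight chunk.

    X is samples x genes. Returns gene indices, most important first.
    """
    X_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    remaining = np.arange(X.shape[1])
    eliminated = []  # gene indices in order of removal, least important first
    while remaining.size > 0:
        w = np.abs(ridge_weights(X_centered[:, remaining], y_centered, alpha))
        order = np.argsort(w)  # ascending: smallest contribution first
        n_drop = max(1, int(remaining.size * drop_fraction))
        eliminated.extend(remaining[order[:n_drop]].tolist())
        remaining = remaining[order[n_drop:]]
    return np.array(eliminated[::-1])

# Synthetic example: 40 samples, 100 genes, two informative genes (3 and 7).
rng = np.random.default_rng(1)
y = np.repeat([-1.0, 1.0], 20)
X = rng.normal(size=(40, 100))
X[:, 3] += 3.0 * y
X[:, 7] -= 3.0 * y

ranking = chunked_elimination_ranking(X, y)
```

Because the classifier is refit after each chunk is removed, a gene's weight depends on which other genes remain, so the ranking reflects joint rather than purely individual contributions.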
In contrast, other investigators have proposed
incorporating biological knowledge for gene ranking. GeneRank
[8] ranks genes by integrating gene expression and
network structure derived from gene annotations. Ma et al.
[9] proposed a strategy to combine gene expression and
protein-protein interaction (PPI) knowledge, ranking
genes by their association with phenotype calibrated by
the PPI information. However, such data integration,
while widely adopted, is usually done in a heuristic way
and lacks an objective estimate of the true interplay
between biological knowledge and gene expression data.
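As an illustration of how expression and network structure can be combined, the sketch below computes a PageRank-style score in the spirit of GeneRank [8]: each gene's score mixes its own normalized expression change with scores propagated from its network neighbors. The damping value `d`, the normalization, the handling of isolated genes, and the toy network are assumptions for illustration and should not be read as the exact published formulation.

```python
import numpy as np

def network_smoothed_scores(adjacency, expression_change, d=0.5):
    """Solve (I - d * W * D^-1) r = (1 - d) * ex for the score vector r.

    adjacency: symmetric 0/1 matrix W (genes x genes).
    expression_change: per-gene expression change; absolute values are used.
    d: weight given to the network (d = 0 reduces to expression-only ranking).
    """
    W = np.asarray(adjacency, dtype=float)
    ex = np.abs(np.asarray(expression_change, dtype=float))
    ex = ex / ex.sum()
    degree = W.sum(axis=0)
    degree[degree == 0] = 1.0  # isolated genes: avoid division by zero
    transition = W / degree    # column j is divided by the degree of gene j
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - d * transition, (1.0 - d) * ex)

# Toy network: genes 0, 1, 2 form a triangle; genes 3 and 4 are isolated.
adjacency = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2)]:
    adjacency[i, j] = adjacency[j, i] = 1.0
expression_change = np.array([2.0, 2.0, 0.0, 0.0, 0.5])

scores = network_smoothed_scores(adjacency, expression_change, d=0.5)
ranking = np.argsort(-scores)
```

In this toy case gene 2 shows no expression change itself but is connected to two strongly changed genes, so it outranks the isolated gene 4. The weighting `d` is set by hand here, which illustrates the heuristic character of such integration.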
In this paper, we propose a knowledge-guided gene
ranking scheme, namely a coordina (...truncated)