Signature Evaluation Tool (SET): a Java-based tool to evaluate and visualize the sample discrimination abilities of gene expression signatures
Chih-Hung Jen (2), Tsun-Po Yang (1, 3), Chien-Yi Tung (1), Shu-Han Su (1), Chi-Hung Lin (1, 2, 4), Ming-Ta Hsu (0, 2), Hsei-Wei Wang (1, 2, 4)
0 Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
1 Institute of Microbiology and Immunology, National Yang-Ming University, Taipei, Taiwan
2 Microarray & Gene Expression Analysis Core Facility, VGH National Yang-Ming University Genome Research Center, Taipei, Taiwan
3 Current address: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
4 Department of Teaching and Research, Taipei City Hospital, Taipei, Taiwan
Background: The identification of gene expression signatures that distinguish sample groups is a major focus of cancer research. Although a number of tools have been developed to identify optimal gene expression signatures, the number of signature genes obtained is often too large to be applied clinically. Furthermore, experimental verification is sometimes limited by the availability of wet-lab materials such as antibodies and reagents. A tool to evaluate the discrimination power of candidate genes is therefore in high demand among clinical researchers.

Results: Signature Evaluation Tool (SET) is a Java-based tool that adopts Golub's weighted voting algorithm and incorporates a visual presentation of the prediction strength for each array sample. SET provides a flexible and easy-to-follow platform to evaluate the discrimination power of a gene signature. Here, we demonstrate the application of SET for several purposes: (1) for signatures consisting of a large number of genes, SET offers the ability to rapidly narrow down the number of genes; (2) for a given signature (from third-party analyses or user-defined), SET can re-evaluate and re-adjust its discrimination power by repeatedly selecting and de-selecting genes; (3) for multiple microarray datasets, SET can evaluate the classification capability of a signature across datasets; and (4) by providing a module to visualize the prediction strength for each sample, SET allows users to re-evaluate the discrimination power on mis-grouped or less-certain samples. Information obtained from these applications could be useful in prognostic analyses or clinical management decisions.

Conclusion: Here we present SET, a tool to evaluate and visualize the sample-discrimination ability of a given gene expression signature. This tool provides a filtration function for signature identification and lies between clinical analyses and class prediction (or feature selection) tools. The simplicity, flexibility and brevity of SET could make it an invaluable tool for marker identification in clinical research.
Background
Gene expression profiling based on microarray
technology has been widely applied to monitoring global
transcriptome changes in biological samples. In cancer
research, one of the major microarray applications is to
identify genes, or features, whose expression patterns can
discriminate samples with distinct states (usually defined
by the phenotype of samples such as primary or metastatic
tumour). These identified genes form an expression
signature that can be used to assist clinical management
decisions such as clinical trial risk assessment, treatment
selection, or cancer prognosis [1-5].
To acquire a good expression signature, supervised
methods are more appropriate than unsupervised approaches.
Basically, a supervised prediction method consists of three
common processes: 1) feature selection, 2) computation
of weights for selected features, 3) creation of a prediction
rule [6]. By using a cross-validation method such as
n-fold or leave-one-out cross-validation (LOOCV), the
discrimination capability of a signature can be evaluated.
Recently, many classification algorithms (such as SVM,
evolutionary algorithms and I-RELIEF) that combine
cross-validation and heuristic search to acquire an optimal
expression signature have been proposed [7-9].
Furthermore, those algorithms have been incorporated into
hassle-free tools to aid the acquisition of an optimal
signature. For example, M@CBETH [10] is a web-based
tool aimed at finding the best prediction among different
classification methods. Prophet [11], another web-based
tool, can automatically build classifiers using a strategy
that renders unbiased cross-validated errors. The class
prediction modules in GenePattern [12] also support several
supervised learning methods. Moreover, for improving
the efficiency and the accuracy of an acquired signature,
several feature selection tools based on statistical analysis
have been developed: RankGene is a feature selection
suite based on statistical ranking analyses [13], and HykGene
[14] and mRMR [15] are tools that minimise redundancy among
genes.
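
The three steps of a supervised prediction method and their evaluation by LOOCV can be illustrated with a minimal Python sketch. It is not SET's implementation (SET is written in Java). It assumes a two-class problem and uses signal-to-noise weights and a weighted vote in the style of Golub's algorithm. The gene names (g1 to g3), expression values and function names are invented for illustration. Feature selection is repeated inside each fold so that the left-out sample never influences the selected genes.

```python
from statistics import mean, stdev

def signal_to_noise(samples, labels, gene):
    """Return (weight, decision boundary) of one gene for classes 0 and 1."""
    a = [s[gene] for s, y in zip(samples, labels) if y == 0]
    b = [s[gene] for s, y in zip(samples, labels) if y == 1]
    mu_a, mu_b = mean(a), mean(b)
    spread = stdev(a) + stdev(b)  # needs at least two training samples per class
    weight = (mu_a - mu_b) / spread if spread > 0 else 0.0
    return weight, (mu_a + mu_b) / 2.0

def select_genes(samples, labels, k):
    """Step 1: keep the k genes with the largest absolute weight."""
    genes = list(samples[0])
    genes.sort(key=lambda g: abs(signal_to_noise(samples, labels, g)[0]), reverse=True)
    return genes[:k]

def train(samples, labels, genes):
    """Step 2: compute a weight and a boundary for each selected gene."""
    return {g: signal_to_noise(samples, labels, g) for g in genes}

def predict(model, sample):
    """Step 3: weighted vote; return (class, prediction strength in [0, 1])."""
    votes = [0.0, 0.0]
    for gene, (weight, boundary) in model.items():
        v = weight * (sample[gene] - boundary)
        votes[0 if v > 0 else 1] += abs(v)  # positive votes favour class 0
    total = votes[0] + votes[1]
    winner = 0 if votes[0] >= votes[1] else 1
    strength = (votes[winner] - votes[1 - winner]) / total if total > 0 else 0.0
    return winner, strength

def loocv(samples, labels, k):
    """Leave each sample out once, retrain on the rest, predict the left-out one."""
    results = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = select_genes(train_x, train_y, k)
        results.append(predict(train(train_x, train_y, genes), samples[i]))
    return results

# Made-up expression values: g1 is high in class 0, g2 is high in class 1,
# g3 carries little class information.
samples = [
    {"g1": 5.0, "g2": 1.0, "g3": 3.0},
    {"g1": 5.5, "g2": 1.2, "g3": 2.0},
    {"g1": 4.8, "g2": 0.9, "g3": 3.5},
    {"g1": 5.2, "g2": 1.1, "g3": 2.5},
    {"g1": 1.0, "g2": 4.0, "g3": 3.1},
    {"g1": 1.3, "g2": 4.4, "g3": 2.2},
    {"g1": 0.9, "g2": 3.8, "g3": 3.4},
    {"g1": 1.1, "g2": 4.1, "g3": 2.6},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

results = loocv(samples, labels, k=2)
for (predicted, strength), actual in zip(results, labels):
    print(f"actual={actual} predicted={predicted} strength={strength:.2f}")
```

The fraction of left-out samples predicted correctly gives a cross-validated estimate of the discrimination capability of the signature, and the per-sample prediction strength indicates how confident each call is.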
Although the aforementioned feature selection and
classification tools are quite useful for acquiring an optimal
signature, a tool assisting signature evaluation is still in high
demand. In (...truncated)