Signature Evaluation Tool (SET): a Java-based tool to evaluate and visualize the sample discrimination abilities of gene expression signatures

BMC Bioinformatics, Jan 2008

Background: The identification of specific gene expression signatures for distinguishing sample groups is a dominant field in cancer research. Although a number of tools have been developed to identify optimal gene expression signatures, the number of signature genes obtained is often too large to be applied clinically. Furthermore, experimental verification is sometimes limited by the availability of wet-lab materials such as antibodies and reagents. A tool to evaluate the discrimination power of candidate genes is therefore in high demand by clinical researchers.

Results: Signature Evaluation Tool (SET) is a Java-based tool that adopts Golub's weighted voting algorithm and incorporates a visual presentation of the prediction strength for each array sample. SET provides a flexible and easy-to-follow platform to evaluate the discrimination power of a gene signature. Here, we demonstrate the application of SET for several purposes: (1) for signatures consisting of a large number of genes, SET offers the ability to rapidly narrow down the number of genes; (2) for a given signature (from third-party analyses or user-defined), SET can re-evaluate and re-adjust its discrimination power by selecting and de-selecting genes repeatedly; (3) for multiple microarray datasets, SET can evaluate the classification capability of a signature across datasets; and (4) by providing a module to visualize the prediction strength for each sample, SET allows users to re-evaluate the discrimination power on mis-grouped or less-certain samples. Information obtained from these applications could be useful in prognostic analyses or clinical management decisions.

Conclusion: Here we present SET to evaluate and visualize the sample-discrimination ability of a given gene expression signature. This tool provides a filtration function for signature identification and lies between clinical analyses and class prediction (or feature selection) tools. The simplicity, flexibility and brevity of SET could make it an invaluable tool for marker identification in clinical research.

Chih-Hung Jen (2), Tsun-Po Yang (1, 3), Chien-Yi Tung (1), Shu-Han Su (1), Chi-Hung Lin (1, 2, 4), Ming-Ta Hsu (0, 2), Hsei-Wei Wang (1, 2, 4)

(0) Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
(1) Institute of Microbiology and Immunology, National Yang-Ming University, Taipei, Taiwan
(2) Microarray & Gene Expression Analysis Core Facility, VGH National Yang-Ming University Genome Research Center, Taipei, Taiwan
(3) Current address: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
(4) Department of Teaching and Research, Taipei City Hospital, Taipei, Taiwan

Background

Gene expression profiling based on microarray technology has been widely applied to monitor global transcriptome changes in biological samples. In cancer research, one of the major microarray applications is to identify genes, or features, whose expression patterns can discriminate samples with distinct states (usually defined by the phenotype of the samples, such as primary or metastatic tumour). These identified genes form an expression signature that can be used to assist clinical management decisions such as clinical trial risk assessment, treatment selection, or cancer prognosis [1-5]. To acquire a good expression signature, supervised methods are more appropriate than unsupervised approaches.
Basically, a supervised prediction method consists of three common processes: 1) feature selection, 2) computation of weights for the selected features, and 3) creation of a prediction rule [6]. By using a cross-validation method such as n-fold or leave-one-out cross-validation (LOOCV), the discrimination capability of a signature can be evaluated. Recently, many classification algorithms (such as SVM, evolutionary algorithm and I-RELIEF) that combine cross-validation and heuristic searching to acquire an optimal expression signature have been proposed [7-9]. Furthermore, those algorithms have been incorporated into hassle-free tools to aid the acquisition of an optimal signature. For example, M@CBETH [10] is a web-based tool aimed at finding the best prediction among different classification methods. Prophet [11], another web-based tool, can automatically build classifiers using a strategy that renders unbiased cross-validated errors. The class prediction modules in GenePattern [12] also support several supervised learning methods. Moreover, to improve the efficiency and accuracy of an acquired signature, several feature selection tools based on statistical analysis have been developed: RankGene is a feature selection suite based on statistical ranking analyses [13], while HykGene [14] and mRMR [15] are tools to minimise the redundancy of genes. Although the aforementioned feature selection and classification tools are quite useful for acquiring an optimal signature, a tool assisting signature evaluation is still in high demand. In (...truncated)
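
To make the weight computation, prediction rule and LOOCV steps described above more concrete, here is a minimal Java sketch of two-class weighted voting with a prediction strength score. It follows the commonly described formulation of Golub's method (a signal-to-noise weight per gene, votes measured from the midpoint of the two class means, and prediction strength as the normalised vote margin). It is not SET's source code: the class and method names are hypothetical, the use of the sample standard deviation is an assumption, the toy data is invented, and feature selection is skipped by treating every column as a signature gene.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative two-class weighted voting classifier; not taken from SET's source. */
final class WeightedVotingSketch {

    /** Predicted class (0 or 1) and prediction strength in [0, 1]. */
    static final class Prediction {
        final int label;
        final double strength;
        Prediction(int label, double strength) { this.label = label; this.strength = strength; }
    }

    static double mean(List<double[]> rows, int gene) {
        double sum = 0.0;
        for (double[] r : rows) sum += r[gene];
        return sum / rows.size();
    }

    // Sample standard deviation (n - 1 denominator); this choice is an assumption.
    static double sd(List<double[]> rows, int gene, double mean) {
        double ss = 0.0;
        for (double[] r : rows) ss += (r[gene] - mean) * (r[gene] - mean);
        return Math.sqrt(ss / (rows.size() - 1));
    }

    /** Classifies one sample from training rows (samples x genes) labelled 0 or 1. */
    static Prediction classify(double[][] train, int[] labels, double[] sample) {
        List<double[]> c0 = new ArrayList<>();
        List<double[]> c1 = new ArrayList<>();
        for (int i = 0; i < train.length; i++) (labels[i] == 0 ? c0 : c1).add(train[i]);
        if (c0.size() < 2 || c1.size() < 2) {
            throw new IllegalArgumentException("need at least two training samples per class");
        }
        double votes0 = 0.0, votes1 = 0.0;
        for (int g = 0; g < sample.length; g++) {
            double m0 = mean(c0, g), m1 = mean(c1, g);
            double spread = sd(c0, g, m0) + sd(c1, g, m1);
            if (spread == 0.0) continue;                 // constant gene: no usable weight
            double weight = (m0 - m1) / spread;          // signal-to-noise weight
            double vote = weight * (sample[g] - (m0 + m1) / 2.0);
            if (vote > 0) votes0 += vote; else votes1 -= vote;
        }
        double total = votes0 + votes1;
        double strength = total == 0.0 ? 0.0 : Math.abs(votes0 - votes1) / total;
        return new Prediction(votes0 >= votes1 ? 0 : 1, strength); // ties go to class 0
    }

    /** Leave-one-out cross-validation: counts held-out samples that are misclassified. */
    static int leaveOneOutErrors(double[][] data, int[] labels) {
        int errors = 0;
        for (int held = 0; held < data.length; held++) {
            double[][] train = new double[data.length - 1][];
            int[] trainLabels = new int[data.length - 1];
            for (int i = 0, k = 0; i < data.length; i++) {
                if (i == held) continue;
                train[k] = data[i];
                trainLabels[k++] = labels[i];
            }
            if (classify(train, trainLabels, data[held]).label != labels[held]) errors++;
        }
        return errors;
    }

    public static void main(String[] args) {
        // Invented toy data: six samples, two signature genes.
        double[][] data = {{1, 5}, {2, 6}, {3, 7}, {7, 1}, {8, 2}, {9, 3}};
        int[] labels = {0, 0, 0, 1, 1, 1};
        Prediction p = classify(data, labels, new double[] {4, 3});
        System.out.println("label=" + p.label + " strength=" + p.strength);
        System.out.println("LOOCV errors=" + leaveOneOutErrors(data, labels));
    }
}
```

In this sketch a prediction strength near 0 marks a sample whose genes vote almost evenly for both classes, which is the kind of less-certain sample the abstract says SET lets users re-examine. Removing or restoring columns of `data` mimics selecting and de-selecting signature genes before re-running the evaluation.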


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2105-9-58.pdf

Chih-Hung Jen, Tsun-Po Yang, Chien-Yi Tung, Shu-Han Su, Chi-Hung Lin, Ming-Ta Hsu, Hsei-Wei Wang. Signature Evaluation Tool (SET): a Java-based tool to evaluate and visualize the sample discrimination abilities of gene expression signatures. BMC Bioinformatics, 2008, 9:58. DOI: 10.1186/1471-2105-9-58