SIGNATURE: A workbench for gene expression signature analysis

BMC Bioinformatics, Dec 2011

Background: The biological phenotype of a cell, such as a characteristic visual image or behavior, reflects activities derived from the expression of collections of genes. As such, an ability to measure the expression of these genes provides an opportunity to develop more precise and varied sets of phenotypes. However, this approach requires computational methods that are difficult to implement and apply, and thus there is a critical need for intelligent software tools that can reduce the technical burden of the analysis. Tools for gene expression analyses are unusually difficult to implement in a user-friendly way because their application requires a combination of biological data curation, statistical computational methods, and database expertise.

Results: We have developed SIGNATURE, a web-based resource that simplifies gene expression signature analysis by providing software, data, and protocols to perform the analysis successfully. This resource uses Bayesian methods for processing gene expression data coupled with a curated database of gene expression signatures, all carried out within a GenePattern web interface for easy use and access.

Conclusions: SIGNATURE is available for public use at http://genepattern.genome.duke.edu/signature/.


Jeffrey T Chang (0), Michael L Gatza (2), Joseph E Lucas (2), William T Barry (1, 2), Peyton Vaughn (2), Joseph R Nevins (2, 3)

0 Department of Integrative Biology and Pharmacology, University of Texas Health Science Center at Houston, Houston, TX, USA
1 Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA
2 Institute for Genome Sciences and Policy, Duke University and Duke University Medical Center, Durham, NC, USA
3 Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
Background

Gene expression signatures are powerful tools that can reveal a range of biologically and clinically important characteristics of biological samples. In recent years, signatures have been developed that can differentiate distinct subtypes of tumors, identify important cellular responses to their environment (such as hypoxia), predict clinical outcomes in cancer, and model the activation of signaling pathways [1]. The power of gene expression signatures derives from their ability to connect an in vitro experimental state with an in vivo one in a quantitative manner.

Commonly, the term gene expression signature has been used in two ways. In the first, the signature consists of a set of genes that share a common pattern of expression. Sometimes this is reported as genes that increase or decrease in expression, but the basic characteristic of the signature is the identity of the genes. Because of this, these signatures are often called gene sets. Gene sets have been curated from the literature and collected into databases such as MSigDB and GeneSigDB [2,3]. Tools have been developed that can analyze gene sets by looking for shared function or characteristics such as Gene Ontology terms [4] or drug sensitivity [5]. Another tool, single-sample GSEA, has previously been applied to predict the co-regulation of gene sets from MSigDB on gene expression samples [6]. Evidence of co-regulation is then used to infer the activation of the phenotype embodied by the gene set.

The second type of signature relates the magnitude of increase or decrease in gene expression, in the form of weighted values, to a biological phenotype using a quantitative predictive model [6-16]. These signatures are often developed from experimental conditions that precisely control the phenotype of interest, for instance the activation of a cell signaling pathway or the response of cells to a defined stimulus.
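To make the distinction between the two kinds of signature concrete, the following is a minimal Python sketch. It is not the BinReg algorithm or any method from the cited work: the gene names, weights, expression values, and the simple linear-logistic score are all hypothetical and chosen only to show how a gene set (identity of genes) differs from a weighted signature (magnitude and direction of expression).

```python
import math

# Type 1: a gene set is defined only by the identity of its genes.
# Gene names are hypothetical placeholders.
gene_set = {"GENE_A", "GENE_B", "GENE_C"}

# Type 2: a weighted signature attaches a signed weight to each gene.
# Weights are hypothetical, not taken from any published signature.
weighted_signature = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.5}


def gene_set_overlap(measured_genes, genes):
    """Fraction of the gene set present among a sample's measured genes.

    A crude stand-in for gene-set analysis: it uses gene identity only.
    """
    return len(genes & set(measured_genes)) / len(genes)


def signature_score(expression, weights, intercept=0.0):
    """Toy linear-logistic score between 0 and 1.

    Genes missing from the sample contribute an expression value of 0.
    """
    linear = intercept + sum(w * expression.get(g, 0.0) for g, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical normalized expression values for one sample.
sample = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_D": 3.0}

overlap = gene_set_overlap(sample.keys(), gene_set)
score = signature_score(sample, weighted_signature)
print(f"gene-set overlap: {overlap:.2f}, weighted score: {score:.2f}")
```

In this toy setup the overlap depends only on which genes appear, whereas the weighted score changes with how strongly each gene is expressed, which is the property the text attributes to the second type of signature.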
Since the signature is composed of a quantitative measure of the expression levels of genes that define the phenotype, it allows a direct measurement of the phenotype, rather than an indirect inference through co-regulation of genes in a gene set. A limitation of this approach, however, is the complexity of the methods used to derive and analyze the signatures, making it difficult to apply without significant multidisciplinary expertise [17].

Three major obstacles hinder the broad use of signatures. First, gene expression signature analysis requires the rigorous application of complex statistical methodologies to gene expression data. Second, it requires the acquisition and validation of data that properly capture the biological state of interest. Third, it requires a computational infrastructure that makes the statistical software and data available through an easy-to-use interface. In sum, gene expression signature analysis requires a confluence of expertise across a range of disciplines, including statistics, biology, and computer science. While others have previously made use of our approach [16], it does require a level of expertise and computational infrastructure not always available in biological laboratories. This bioinformatic investigation, requiring the proper selection and application of statistical algorithms, as well as biological curation and validation of the signatures, can be daunting.

Therefore, a challenge is how to develop software tools that enable such analyses for the general user. While it has long been recognized that software can target different types of users, a set of principles for biologist-friendly software was recently described [18]. In short, the recommendations are that the software 1) requires no knowledge of programming, 2) allows application of advanced methods, 3) can be used on different operating systems, and 4) provides a natural language description of the results.
While such software has been developed for biological sequence alignment [19], sequence annotation [20], phylogenetic analysis [21], and comparison of prokaryotic genomes [22], no such platform exists for gene expression signature analysis. Because of this, and also because of the technical difficulty in performing gene expression analysis, we believe there is a need for a platform that captures a carefully refined analysis workflow, coupling algorithms and data, and enables a researcher to predict gene expression signatures on their samples.

Implementation

To address the critical need for a platform for gene expression signature analysis, we have developed a collection of tools over the course of several years. First, we have developed BinReg, a statistical algorithm to predict the activation of a gene expression signature on a data set [23,24]. Second, we have curated a database of signatures that predict the activation of oncogenic pathways [25]. Now, we report on the development of a computational platform that combines these in a biologist-friendly interface, using the principles previously established. Here we describe the three components of a novel gene expression signature analysis platform, which we colle (...truncated)


Article home page: http://www.biomedcentral.com/1471-2105/12/443

Jeffrey T Chang, Joseph E Lucas, Joseph R Nevins, Michael L Gatza, Peyton Vaughn, William T Barry. SIGNATURE: A workbench for gene expression signature analysis. BMC Bioinformatics, 2011.