ProbFAST: Probabilistic Functional Analysis System Tool

http://www.biomedcentral.com/content/pdf/1471-2105-11-161.pdf

Israel T Silva (0,1,2), Ricardo ZN Vêncio (0,2), Thiago YK Oliveira (0,1,2), Greice A Molfetta (0,1,2), Wilson A Silva Jr (0,1,2)

0 Department of Genetics, Faculty of Medicine, University of São Paulo, Ribeirão Preto, Brazil
1 National Institute of Science and Technology in Stem Cell and Cell Therapy, Center for Cell Therapy and Regional Blood Center, Ribeirão Preto, Brazil
2 Department of Genetics, Faculty of Medicine, University of São Paulo, Ribeirão Preto, Brazil

Background: The post-genomic era has brought new challenges in understanding the organization and function of the human genome. Many of these challenges center on the meaning of differential gene regulation under distinct biological conditions and can be addressed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently, MDE analyses are limited to the usual methods of differential expression, which were initially designed for paired analysis.

Results: We propose a web platform named ProbFAST for MDE analysis, which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulation study showed that our method performs better than other approaches, and when applied to public expression data it proved flexible in retrieving relevant genes biologically associated with normal and abnormal biological processes.

Conclusions: ProbFAST is a freely accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for MDE analysis of a set of genes that are turned on and off, in relation to functional information, during the evolution of a tumor or tissue differentiation. The ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast.
Background

Transcriptome analysis of a tissue or cell type has been widely used since the development of methodological approaches for the large-scale study of gene expression, such as SAGE [1], MPSS [2] and microarrays [3]. Next-generation sequencing technology has been adapted to transcriptome analysis, and its ability to accurately measure mRNA signals should have an unprecedented impact on gene expression analysis [4,5]. Thus, it is accepted that high-throughput data represent the starting point for furthering our understanding of the molecular disorders associated with the physiopathology of a given phenotype.

The most classical application of gene expression analysis focuses on the identification of genes differentially expressed between two biological conditions. At this stage, a large number of statistical tests are used to identify candidate genes precisely [6,7]. The network of biological processes involved in the evolution of a tumor or in tissue differentiation is extremely complex and requires the development of mathematical models for the simultaneous analysis of a set of genes in two or more biological conditions. Analyses of this nature are currently performed using standard methods designed for paired analyses. Thus, methods for analyzing the multiple expression of a gene are needed. We define the approach used in the current study as Multiple Differential Expression (MDE).

The MDE approach may be illustrated by the following question: which genes show an increasing level of expression across three libraries (A, B and C) representing the stages (evolution) of a tumor? To answer this question, the usual procedure analyzes pairs of libraries separately and forms conjunctions or disjunctions of the relations found, e.g. A > B AND B > C. This analysis is traditionally used to select any gene g with an expression profile such as A_g > B_g > C_g.
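The usual pairwise procedure described above can be sketched in a few lines. This is only an illustration: the text does not say which pairwise test is used, so a one-sided two-proportion z-test is assumed here, and the function names, the 0.05 threshold, the gene names and the tag counts are all hypothetical.

```python
import math

def one_sided_p(count_x, total_x, count_y, total_y):
    """One-sided two-proportion z-test p-value for H1: rate_x > rate_y.

    The choice of test is an assumption made for illustration only.
    """
    p_x = count_x / total_x
    p_y = count_y / total_y
    pooled = (count_x + count_y) / (total_x + total_y)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_x + 1 / total_y))
    if se == 0:
        return 1.0
    z = (p_x - p_y) / se
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

def pairwise_conjunction(counts, totals, alpha=0.05):
    """Select genes for which A > B AND B > C are both significant."""
    selected = []
    for gene, (a, b, c) in counts.items():
        p_ab = one_sided_p(a, totals[0], b, totals[1])
        p_bc = one_sided_p(b, totals[1], c, totals[2])
        if p_ab < alpha and p_bc < alpha:
            selected.append(gene)
    return selected

# Hypothetical tag counts per gene in libraries A, B and C.
counts = {"gene1": (120, 60, 20), "gene2": (50, 48, 47), "gene3": (30, 90, 10)}
totals = (50000, 50000, 50000)  # hypothetical library sizes

result = pairwise_conjunction(counts, totals)
print(result)
```

Each gene receives only a yes/no answer built from two separate tests, so the selected genes are not ranked by how well they match the full profile.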
In this type of paired analysis, the main problem is the sensitivity and specificity of the statistical tests used to detect which genes are differentially expressed [8]. These statistical measures are closely related to the concepts of type I and type II errors, and the errors are compounded when more than two biological conditions are analyzed simultaneously. To address this shortfall, we introduce a Bayesian model that generalizes the pairwise comparisons in order to perform MDE analysis. It is a new probabilistic method for targeted gene selection across two or more classes through an intuitive approach involving the formulation of a question and a probability linked to it. In summary, all genes consistent with the formulated question are ranked by the probability that the question is true for them.

We present a web-based system named Probabilistic Functional Analysis System Tool (ProbFAST) that permits suitable MDE analysis on a global scale. This tool differs from others [8-11] by allowing the investigator to analyze global gene expression in different biological conditions using private and/or public data, and by integrating the results with functional information from Gene Ontology [12], KEGG [13] and BioCarta [14]. Within this context, the tool is useful for uncovering genes related to biological processes that are active during cell differentiation and growth, as well as during organogenesis. ProbFAST is designed primarily for sequencing-based data, including data from next-generation sequencing technology.

Implementation

Design functionality

ProbFAST uses a client-server architecture [Additional file 1: Supplemental Figure 1]. The backend consists of a set of MySQL [15] relational tables that store functional information extracted from the KEGG, BioCarta and Gene Ontology repositories.
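The probabilistic ranking described in the Background, where genes are ordered by the probability that a formulated question is true, can be illustrated with a minimal sketch. It is not the ProbFAST model, whose details are not given in this section. The sketch assumes that each gene's relative abundance in each library has an independent Beta posterior under a uniform prior, and it estimates the probability of the question A > B > C by Monte Carlo sampling. All names, counts and library sizes are hypothetical.

```python
import random

def prob_ordering(counts, totals, n_draws=5000, seed=0):
    """Estimate P(rate_1 > rate_2 > ... > rate_k) for one gene.

    Assumption (illustrative, not taken from the paper): the abundance in
    each library has an independent Beta(x + 1, N - x + 1) posterior,
    where x is the gene's tag count and N is the library size.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draws = [rng.betavariate(x + 1, n - x + 1)
                 for x, n in zip(counts, totals)]
        # The question is true for this draw if the sampled rates
        # are strictly decreasing from the first library to the last.
        if all(draws[i] > draws[i + 1] for i in range(len(draws) - 1)):
            hits += 1
    return hits / n_draws

def rank_genes(table, totals, **kwargs):
    """Rank genes by the estimated probability that the question holds."""
    scored = [(gene, prob_ordering(c, totals, **kwargs))
              for gene, c in table.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical tag counts per gene in libraries A, B and C.
table = {"gene1": (120, 60, 20), "gene2": (50, 48, 47), "gene3": (30, 90, 10)}
totals = (50000, 50000, 50000)  # hypothetical library sizes

ranking = rank_genes(table, totals)
for gene, prob in ranking:
    print(f"{gene}\tP(A > B > C) ~ {prob:.3f}")
```

Unlike a conjunction of pairwise tests, this gives every gene a single probability for the whole question, so the genes can be ranked directly.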
The backend also stores all Gene Expression Omnibus (GEO) [16] expression data generated by counting techniques, including 1,800 SAGE and MPSS libraries from approximately 40 species. All databases are updated monthly, ensuring access to the most recent information. The server side is composed of three main interfaces that enable remote use, with convenient data uploading and result visualization features.

The analysis starts with a user-friendly interface for entering the project name and the parameters for preprocessing and uploading libraries (Figure 1). Two upload options are available: 1) importing data from GEO, where a search interface displays a list of expression profile experiments filtered by organism and keywords; and 2) uploading a new experiment that is not included in the GEO database. For the second option, the user submits a file in a predefined format (detailed information on the file format is available on the help page). The file may be uploaded compressed in gz, zip, or rar format. The gene identifiers supported by ProbFAST include NCBI ID, gene symbol, tag sequence and UniGene accession. After submission, users must formulate question(s) using a comprehensive frame box and define the parameters for enrich (...truncated)