Functional assessment of time course microarray data

Article PDF: http://www.biomedcentral.com/content/pdf/1471-2105-10-S6-S9.pdf

María José Nueda (2), Patricia Sebastián (1), Sonia Tarazona (0), Francisco García-García (1), Joaquín Dopazo (1, 3, 4), Alberto Ferrer (0), Ana Conesa (1)

(0) Department of Applied Statistics and Operations Research, Universidad Politecnica of Valencia, Cno. Vera s/n, Edificio 7A, 46022 Valencia, Spain
(1) Bioinformatics and Genomics Department, Centro de Investigaciones Principe Felipe, Avda. Autopista Saler 16, 46012 Valencia, Spain
(2) Department of Statistics and Operation Research, University of Alicante, Ctra. San Vicente del Raspeig, S/N 03690 Alicante, Spain
(3) CIBER de Enfermedades Raras (CIBERER), ISCIII, Spain
(4) Functional Genomics Node (INB), Centro de Investigacion Principe Felipe (CIPF), Valencia, Spain

Motivation: Time-course microarray experiments study the progress of gene expression over time across one or several experimental conditions. Most existing analysis methods focus on the clustering or the differential expression analysis of genes and do not integrate functional information. Assessing the functional aspects of time-course transcriptomics data requires approaches that exploit the activation dynamics of the functional categories to which genes are annotated.

Methods: We present three novel methodologies for the functional assessment of time-course microarray data.
i) maSigFun derives from the maSigPro method, a regression-based strategy to model time-dependent expression patterns and identify genes with differences across series. maSigFun fits a regression model for groups of genes labeled by a functional class and selects those categories which have a significant model.
ii) PCA-maSigFun fits a PCA model of each functional class-defined expression matrix to extract orthogonal patterns of expression change, which are then assessed for their fit to a time-dependent regression model.
iii) ASCA-functional uses the ASCA model to rank genes according to their correlation with the principal time expression patterns and assess functional enrichment in a GSA fashion.
We used simulated and experimental datasets to study these novel approaches. Results were compared to alternative methodologies.

Results: Synthetic and experimental data showed that the different methods are able to capture different, biologically meaningful aspects of the relationship between genes, functions and co-expression. The methods should not be regarded as competing; rather, they provide different insights into the molecular and functional dynamic events taking place within the biological system under study.

From the European Molecular Biology Network (EMBnet) Conference 2008: 20th Anniversary Celebration, Martina Franca, Italy, 18-20 September 2008

Background

Microarray time-course experiments have gained popularity in recent years for studying biological phenomena in which the dynamics of gene expression are relevant. In contrast to classical control-case studies, where basically two conditions are compared, time series experiments encompass investigations of diverse nature and complexity. Studies may relate to developmental processes with a large number of sampling points (e.g. [1]), or to stimuli-response experiments where transcriptome changes are assessed in a short time span and may include multiple treatments (e.g. [2]), or may try to capture cyclic variations of gene expression (e.g. [3]). Moreover, samples might be destroyed by the sampling process or be taken from the same individuals along the time component. As a result, microarray time-course data are classified as either short (up to 5-6 time points) or long (from 6-7 time points) series, single (one treatment or tissue) or multiple (more than one treatment or tissue) series, and longitudinal vs. independent, depending on whether samples are blocked by an individual effect or are unrelated.
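
The three-way classification above can be written down as a small data structure. The following Python sketch is only illustrative: the class name, the field names and the `short_max=6` cut-off are assumptions made here (the text gives the short/long boundary only approximately), not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeCourseDesign:
    """Hypothetical summary of a time-course experiment."""
    n_timepoints: int       # number of sampling points per series
    n_series: int           # number of treatments or tissues
    same_individuals: bool  # True if the same individuals are re-sampled over time


def describe_design(design, short_max=6):
    """Return (length, series, sampling) labels for a design.

    short_max is an assumed cut-off; the text places the boundary
    between short and long series at roughly 5-7 time points.
    """
    length = "short" if design.n_timepoints <= short_max else "long"
    series = "single" if design.n_series == 1 else "multiple"
    sampling = "longitudinal" if design.same_individuals else "independent"
    return (length, series, sampling)
```

For instance, a design with 4 time points, 3 treatments and samples destroyed at collection would be labeled short, multiple and independent under these assumptions.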
As microarray time-course experiments have expanded, a significant number of statistical methods have been published to address the analysis of this type of data. Many of the developed algorithms consider the clustering of serial data. Proposed strategies include the use of Gaussian mixed models [4], Bayesian models [5], Hidden Markov Models [6], B-splines [7,8], and Fourier Series [9] to model and cluster long series data, while more ad-hoc algorithms have been developed for short series [10,11]. Another important block of methodologies comprises those that pursue the identification of time-associated differentially expressed genes (d.e.g.'s). In this category we again find the B-spline approach [7,12], a multivariate adaptation of the empirical Bayes test [13] to specifically treat longitudinal data [14], and some ANOVA and regression-based models for short series [15-18]. Finally, Conesa and co-workers presented two methods well suited to independent, multiple series data, based either on step-wise regression or on simultaneous component analysis [19,20].

In all of these approaches the statistical analysis focuses on modeling gene expression and identifying those genes with a relevant variation pattern. This orientation, though valid and useful, addresses only one (frequently the first) of the requirements for understanding transcriptomic changes in any kind of microarray experiment. In most cases, the analysis proceeds through the identification of cellular processes and functions which are represented by the gene selection, i.e. genes are identified by their functional role and the question is then which functional modifications can be derived from the observed gene regulation. Functional information is normally incorporated into data analysis through functional annotation databases that define and assign function labels to known genes.
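
Methods that work at the level of functional classes need the inverse of this gene-to-label assignment: for each label, the set of genes that carry it. The Python sketch below shows one way to build that mapping and pull out the expression profiles of one class. The function names, gene identifiers and labels are hypothetical placeholders, not taken from any annotation database or from the authors' software.

```python
from collections import defaultdict


def genes_by_category(annotations):
    """Invert a gene -> labels mapping into label -> set of genes."""
    categories = defaultdict(set)
    for gene, labels in annotations.items():
        for label in labels:
            categories[label].add(gene)
    return dict(categories)


def expression_block(categories, expression, label):
    """Collect the expression profiles of the genes annotated to one label.

    Genes without measured expression are skipped.
    """
    return {
        gene: expression[gene]
        for gene in sorted(categories[label])
        if gene in expression
    }


# Hypothetical annotations and expression profiles (one value per time point).
annotations = {
    "geneA": ["CAT_1", "CAT_2"],
    "geneB": ["CAT_1"],
    "geneC": ["CAT_2"],
}
expression = {
    "geneA": [0.1, 0.8, 1.5],
    "geneB": [0.0, 0.7, 1.4],
}

categories = genes_by_category(annotations)
block = expression_block(categories, expression, "CAT_1")
```

A gene annotated to several labels appears in several blocks, so category-level results obtained this way are not independent of one another.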
The most widely used functional annotation scheme is the Gene Ontology (GO) [21], which characterizes genes by their molecular functions (MF), cellular locations (CC) and the biological processes (BP) in which they are involved, but others such as the KEGG metabolic pathways [22], transcription factor targets [23] or Interpro functional motifs [24] can also be employed for specific biological questions. This functional assessment is traditionally handled in microarray data analysis via so-called enrichment analysis: the list of significant genes is interrogated for over- (and/or under-) abundance of the considered functional categories, as compared to the entire genome represented on the array. In time-course microarray data, this strategy could similarly be followed for the set of time-dependent differentially expressed genes (for example, as provided in the time course module of the GEPAS suite [25]), or for the distinct clusters into which this gene selection can be divided (available in the STEM package [26]). As a matter of fact, gene enrichment analysis is very often used to validate the results of a gene selection or a clustering strategy [27,28].

This strategy for the functional evaluation of differential gene expression has a number of limitations [29]. Firstly, the functional enrich (...truncated)