Identification of gene expression patterns using planned linear contrasts

BMC Bioinformatics, May 2006

Background: In gene networks, the timing of significant changes in the expression level of each gene may be the most critical information in time course expression profiles. With the same timing of the initial change, genes which share similar patterns of expression for any number of sampling intervals from the beginning should be considered co-expressed at certain level(s) in the gene networks. In addition, multiple testing problems are complicated in experiments with multi-level treatments when thousands of genes are involved.

Results: To address these issues, we first performed an ANOVA F test to identify significantly regulated genes. The Benjamini and Hochberg (BH) procedure of controlling false discovery rate (FDR) at 5% was applied to the P values of the F test. We then categorized the genes with a significant F test into 4 classes based on the timing of their initial responses by sequentially testing a complete set of orthogonal contrasts, the reverse Helmert series. For genes within each class, specific sequences of contrasts were performed to characterize their general 'fluctuation' shapes of expression along the subsequent sampling time points. To be consistent with the BH procedure, each contrast was examined using a stepwise Studentized Maximum Modulus test to control the gene based maximum family-wise error rate (MFWER) at the level of αnew determined by the BH procedure. We demonstrated our method on the analysis of microarray data from murine olfactory sensory epithelia at five different time points after target ablation.

Conclusion: In this manuscript, we used planned linear contrasts to analyze time-course microarray experiments. This analysis allowed us to characterize gene expression patterns based on the temporal order in the data, the timing of a gene's initial response, and the general shapes of gene expression patterns along the subsequent sampling time points. Our method is particularly suitable for analysis of microarray experiments in which it is often difficult to take sufficiently frequent measurements and/or the sampling intervals are non-uniform.
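
The first two steps named in the abstract, the BH step-up procedure and the reverse Helmert contrasts, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `bh_reject` and `reverse_helmert`, the contrast scaling, and the toy time-point means are assumptions made for illustration, and the Studentized Maximum Modulus test applied to each contrast in the paper is not implemented here.

```python
import numpy as np

def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Returns a boolean rejection mask (in the input order) and the
    BH threshold of the largest rejected rank (0.0 if none rejected).
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ranks p-values ascending
    thresholds = q * np.arange(1, m + 1) / m   # k/m * q for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if not below.any():
        return reject, 0.0
    k = np.nonzero(below)[0].max()             # largest rank meeting its threshold
    reject[order[:k + 1]] = True               # step-up: reject all smaller ranks too
    return reject, float(thresholds[k])

def reverse_helmert(n_levels):
    """Reverse Helmert contrasts (assumed scaling).

    Row j-1 compares level j with the mean of all earlier levels,
    so each row sums to zero and the rows are mutually orthogonal.
    """
    c = np.zeros((n_levels - 1, n_levels))
    for j in range(1, n_levels):
        c[j - 1, :j] = -1.0 / j
        c[j - 1, j] = 1.0
    return c

# Hypothetical gene whose mean expression first changes at the 4th time point.
toy_means = np.array([1.0, 1.0, 1.0, 3.0, 3.0])
contrasts = reverse_helmert(5)
estimates = contrasts @ toy_means  # contrast estimates only, no significance test
```

For five time points the matrix has four rows, matching the four classes described in the abstract: the first contrast with a nonzero (in practice, significant) estimate indicates when the gene first departs from its earlier mean level.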


Hao Li (2), Constance L Wood (2), Yushu Liu (2), Thomas V Getchell (1, 3), Marilyn L Getchell (0, 3), Arnold J Stromberg (2)

0 Department of Anatomy and Neurobiology, College of Medicine, Lexington, KY 40536-0298, USA
1 Department of Physiology, College of Medicine, Lexington, KY 40536-0298, USA
2 Department of Statistics, University of Kentucky, 817 Patterson Office Tower, Lexington, KY 40536-0027, USA
3 309 Sanders-Brown Center on Aging, University of Kentucky Medical Center, Lexington, KY 40536-0230, USA

Background

Recent advances in DNA microarray technologies have made it possible to investigate the transcriptional portion of gene networks in a variety of organisms. When microarray experiments are performed to monitor gene expression over time, researchers can address questions concerning the detection of the cellular processes underlying the observed regulatory effects, inference of regulatory networks and, ultimately, assignment of functions to the genes analyzed in the time courses. There is a natural connection between gene function and gene expression. Based on our understanding of cellular processes, genes that are contained in a particular pathway, or respond to a common internal or external stimulus, should be co-regulated and consequently, should show similar patterns of expression. Therefore, identifying patterns of gene expression and grouping genes into expression classes may provide much greater insight into their biological functions. A large group of statistical methods, generally referred to as "cluster analysis", have been developed to identify genes that behave similarly across a range of experimental conditions, including time courses.
These statistical algorithms can be divided into two classes, depending on whether or not they are based on 'similarity' measures. Methods based on 'similarity' measures rely on defining a distance (or 'dissimilarity') between gene expression vectors; Euclidean distance and the Pearson correlation coefficient are the two most commonly used measures. Examples of similarity-measure-based methods are hierarchical clustering [1], k-means [2], self-organizing maps (SOM) [3,4], and support vector machines (SVM) [5]. These methods do not consider the temporal structure of the data when used to analyze time-course experiments. In addition, some of these methods can produce confused clusters, because the actual expression patterns of the genes themselves become less relevant as clusters grow in size [6]. The clustering methods in the second class are based on statistical models, without defining a 'similarity' measure. Using statistical models to represent clusters changes the question from how close two data points are to how likely a given data point is under the model. Such clustering methods are more commonly used to analyze time-course microarray experiments. Examples of such methods ar (...truncated)
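
The point made in the Background about similarity measures ignoring temporal structure can be checked on a toy example. The sketch below is illustrative only: the helper names, the two hypothetical expression profiles, and the permutation are assumptions, not data or code from the paper. It computes the Euclidean distance and a correlation-based distance (1 minus the Pearson correlation), then applies the same reordering of time points to both genes.

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two expression vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def correlation_distance(x, y):
    """Dissimilarity defined as 1 - Pearson correlation coefficient."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)

# Hypothetical profiles over five time points: same shape, gene_b at twice the amplitude.
gene_a = np.array([1.0, 2.0, 4.0, 3.0, 1.0])
gene_b = 2.0 * gene_a

# The same arbitrary shuffle of the time axis applied to both genes.
perm = np.array([3, 0, 4, 1, 2])

d_euc = euclidean_distance(gene_a, gene_b)
d_cor = correlation_distance(gene_a, gene_b)
d_euc_shuffled = euclidean_distance(gene_a[perm], gene_b[perm])
d_cor_shuffled = correlation_distance(gene_a[perm], gene_b[perm])
```

Both distances are unchanged by the shuffle, which is why clustering built only on such measures cannot distinguish the true time order from any reordering of it. The two measures also disagree on scale: the correlation distance treats the two profiles as identical in shape, while the Euclidean distance does not.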


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2105-7-245.pdf

Hao Li, Constance L Wood, Yushu Liu, Thomas V Getchell, Marilyn L Getchell, Arnold J Stromberg. Identification of gene expression patterns using planned linear contrasts. BMC Bioinformatics, 2006, 7:245. DOI: 10.1186/1471-2105-7-245