Key Issues in Conducting a Meta-Analysis of Gene Expression Microarray Datasets

PLoS Medicine, Sep 2008

Adaikalavan Ramasamy and colleagues outline seven key issues and suggest a stepwise approach to conducting a meta-analysis of microarray datasets.

Citation: Ramasamy A, Mondry A, Holmes CC, Altman DG (2008) Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med 5(9): e184. doi:10.1371/journal.pmed.0050184

Authors: Adaikalavan Ramasamy, Adrian Mondry, Chris C. Holmes, Douglas G. Altman

Affiliations: Adaikalavan Ramasamy and Douglas G. Altman are with the Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom. Adaikalavan Ramasamy and Chris C. Holmes are with the Department of Statistics, University of Oxford, Oxford, United Kingdom. Adrian Mondry is with Imperial College Healthcare NHS Trust, London, United Kingdom.

Funding: AR and DGA are funded by Cancer Research UK. AM is supported by Imperial College Healthcare NHS Trust. CCH is partly supported by the UK Medical Research Council and the University of Oxford. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Summary Points
... have led to the generation of many highly complex datasets that often try to address similar biological questions.
... from independent but related studies, is a relatively inexpensive option that has the potential to increase both the statistical power and generalizability of single-study analysis.
... general, is desirable, and is much enhanced when raw data are available.
... in conducting meta-analysis of microarray datasets: (1) Identify suitable microarray studies; (2) Extract the data from studies; (3) Prepare the individual datasets; (4) Annotate the individual datasets; (5) Resolve the many-to-many relationship between probes and genes; (6) Combine the study-specific estimates; (7) Analyze, present, and interpret results.
... reviewing such a meta-analysis.
... of high-throughput biological data analysis.

Microarray technology measures the mRNA levels of tens of thousands of genes in tissue samples simultaneously in a high-throughput and cost-effective manner.
Since its introduction over a decade ago [1], it has found widespread use in the fields of molecular genetics and functional genomics. It has been applied to understand underlying biological mechanisms [2], to discover novel subgroups of diseases [3–5], to examine drug response [6,7], to classify patients into disease groups [3], and to predict disease outcomes [8–10]. Some molecular signatures discovered with microarray technology are now being evaluated in prospective randomized clinical trials [11,12].

Despite their great promise, microarray-based studies may report findings that are not reproducible [13] or not robust to even mild data perturbations [14,15]. Common causes include improper analysis or validation, insufficient control of false positives, and inadequate reporting of methods [16,17]. The situation is exacerbated by small sample sizes relative to the large number of potential predictors: typically tens of thousands of probes are investigated in only tens or hundreds of biological samples. Generalizability across studies [18] also needs to be assessed before considering widespread practical application. For example, the findings of a study using historical controls from a particular geographical region may not be applicable to newer cohorts of patients or to different regions.

Combining information from multiple existing studies can increase the reliability and generalizability of results. The use of statistical techniques to combine results from independent but related studies is called meta-analysis. However, the term meta-analysis is also widely used, as we do here, to describe the whole study process rather than just the statistical techniques; an alternative term for the whole process is systematic review. Through meta-analysis, we can increase the statistical power to obtain a more precise estimate of gene expression differentials, and assess the heterogeneity of the overall estimate.
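To make the idea of a more precise pooled estimate concrete, here is a minimal sketch of one standard way to combine study-specific estimates: an inverse-variance weighted (fixed-effect) average, together with Cochran's Q as a simple heterogeneity statistic. This technique is chosen here purely for illustration and is not necessarily the one the authors recommend; the function name and the example numbers (log2 fold changes and standard errors for a single gene in three studies) are hypothetical.

```python
import math


def combine_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate.

    Returns (pooled_estimate, pooled_standard_error, cochran_q).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need exactly one standard error per estimate")
    # Each study is weighted by the inverse of its variance,
    # so more precise studies contribute more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled value.
    cochran_q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
    return pooled, pooled_se, cochran_q


# Hypothetical per-study log2 fold changes for one gene (illustrative values only).
log_fold_changes = [1.0, 2.0, 3.0]
standard_errors = [1.0, 1.0, 1.0]

pooled, pooled_se, cochran_q = combine_fixed_effect(log_fold_changes, standard_errors)
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, Q = {cochran_q:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining studies increases precision. A large Q relative to the number of studies minus one would suggest heterogeneity that a fixed-effect model does not capture.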
Meta-analysis is relatively inexpensive, since it makes comprehensive use of already available data. Indeed, the advantages of meta-analysis of gene expression microarray datasets have not gone unnoticed by researchers in various fields [19–28]. Several meta-analysis techniques have been proposed in the context of microarrays [19,22,29–40]. However, no comprehensive framework exists on how to carry out a meta-analysis of microarray datasets. There is a considerable literature to guide the whole review process, including statistical methods for clinical trials and epidemiological studies [41–43]. As yet, however, there is little guidance for conducting a meta-analysis of microarray [...] curation.

We discuss the sixth issue, choosing a meta-analysis technique, using the two-class comparison as an example. The seventh issue, analyzing, presenting, and interpreting data, is discussed briefly using an illustrative meta-analysis of 25 datasets. We provide a practical checklist, shown in Table 1, that should enable the reader to make informed decisions on how to conduct a meta-analysis, and to understand better the underlying concepts that make this approach so attractive for analysis of microarray data.

Issue 1: Identify Suitable Microarray Datasets

The first step in any research project is to clearly define the objectives (Step 1). Meta-analysis could be used to identify genes expressed differentially between two groups [19,22,29,30,32,33,35,37,38,40], to make cross-platform classification more robust [34], to identify overlaps between samples from heterologous datasets [30], to identify co-expressed genes, or to reconstruct gene networks [31,36,39]. Having a detailed review protocol can further help to clarify the research objectives and methods and to minimize bias from unplanned data-driven analysis. We suggest developing the review protocol by outlining the solutions to the steps in the checklist shown in Table 1.
For example, Step 7 (Check the selected study against inclusion-exclusion criteria) might be expanded in the review protocol as follows: "Two reviewers will check the eligibility of the identified studies, with disagreements resolved by a third reviewer. A log of excluded studies, with reasons for exclusions, will be maintained." The protocol can be turned into a useful project management tool by incorporating timelines and division of labor.

The inclusion-exclusion criteria (Step 2) are eligibility criteria for studies that will help achieve the stated objectives. These criteria could be biological (e.g., specific disease, type of outcome, type of tissues) or technical (e.g., density of array, minimum number of arrays). The retrieved articles must then be evaluated to determine whether they meet the inclusion criteria.

Once the inclusion-exclusion criteria have been defined, one needs to perform a comprehensive literature search (Step 3) to identify suitable studies, usually based on ...

Table 1 (excerpt)
Identify suitable microarray studies (Issue 1)
1 Formulate objectives and a review protocol.
2 Define inclusion-exclusion criteria and suitable keywords.
(...truncated)
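The eligibility check described for Step 7 (two reviewers, a third reviewer to resolve disagreements, and a log of excluded studies with reasons) could be sketched roughly as follows. This is only an illustration of the bookkeeping: the `Decision` class, the `resolve` and `screen` functions, the study identifiers, and the exclusion reasons are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """One reviewer's eligibility decision for one study (hypothetical structure)."""
    include: bool
    reason: str = ""  # reason for exclusion; left empty when the study is included


def resolve(first: Decision, second: Decision, third: Optional[Decision] = None) -> Decision:
    """Return the agreed decision, deferring to a third reviewer on disagreement."""
    if first.include == second.include:
        return first
    if third is None:
        raise ValueError("reviewers disagree; a third reviewer's decision is required")
    return third


def screen(decisions):
    """Split studies into an included list and a log of exclusions with reasons.

    `decisions` maps a study identifier to a (first, second, third) tuple,
    where `third` may be None if the first two reviewers agree.
    """
    included, exclusion_log = [], []
    for study_id, (first, second, third) in decisions.items():
        final = resolve(first, second, third)
        if final.include:
            included.append(study_id)
        else:
            exclusion_log.append((study_id, final.reason))
    return included, exclusion_log


# Hypothetical studies and reasons, for illustration only.
decisions = {
    "study_A": (Decision(True), Decision(True), None),
    "study_B": (Decision(False, "fewer than 10 arrays"),
                Decision(False, "fewer than 10 arrays"), None),
    "study_C": (Decision(True),
                Decision(False, "wrong tissue type"),
                Decision(False, "wrong tissue type")),
}

included, exclusion_log = screen(decisions)
print("Included:", included)
for study_id, reason in exclusion_log:
    print(f"Excluded {study_id}: {reason}")
```

Keeping the exclusion log as explicit (study, reason) pairs makes it straightforward to report why each study was dropped, which is the purpose of the log in the protocol example.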


Article home page: http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0050184

Adaikalavan Ramasamy, Adrian Mondry, Chris C Holmes, Douglas G Altman. Key Issues in Conducting a Meta-Analysis of Gene Expression Microarray Datasets, PLoS Medicine, 2008, Volume 5, Issue 9, DOI: 10.1371/journal.pmed.0050184