Key Issues in Conducting a Meta-Analysis of Gene Expression Microarray Datasets
Citation: Ramasamy A, Mondry A, Holmes CC, Altman DG (2008) Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med 5(9): e184. doi:10.1371/journal.pmed.0050184
Adaikalavan Ramasamy, Adrian Mondry, Chris C. Holmes, Douglas G. Altman
Affiliations: Adaikalavan Ramasamy and Douglas G. Altman are with the Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom. Adaikalavan Ramasamy and Chris C. Holmes are with the Department of Statistics, University of Oxford, Oxford, United Kingdom. Adrian Mondry is with Imperial College Healthcare NHS Trust, London, United Kingdom.
Funding: AR and DGA are funded by Cancer Research UK. AM is supported by Imperial College Healthcare NHS Trust. CCH is partly supported by the UK Medical Research Council and the University of Oxford. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Summary Points
[...] have led to the generation of many highly complex datasets that often try to address similar biological questions.
[...] from independent but related studies, is a relatively inexpensive option that has the potential to increase both the statistical power and generalizability of single-study analysis.
[...] general, is desirable, and is much enhanced when raw data are available.
[...] in conducting meta-analysis of microarray datasets: (1) Identify suitable microarray studies; (2) Extract the data from studies; (3) Prepare the individual datasets; (4) Annotate the individual datasets; (5) Resolve the many-to-many relationship between probes and genes; (6) Combine the study-specific estimates; (7) Analyze, present, and interpret results.
[...] reviewing such a meta-analysis.
[...] of high-throughput biological data analysis.
Microarray technology measures the mRNA levels
of tens of thousands of genes in tissue samples
simultaneously in a high-throughput and
cost-effective manner. Since its introduction over a decade ago [1],
it has found widespread use in the fields of molecular genetics
and functional genomics. It has been applied in order to
understand underlying biological mechanisms [2], to discover
novel subgroups of diseases [3–5], to examine drug response
[6,7], to classify patients into disease groups [3], and to
predict disease outcomes [8–10]. Some molecular signatures
discovered with microarray technology are now being
evaluated in prospective randomized clinical trials [11,12].
Despite their great promise, microarray-based studies may
report findings that are not reproducible [13] or not robust
to the mildest of data perturbations [14,15]. Common causes
include improper analysis or validation, insufficient control of
false positives, and inadequate reporting of methods [16,17].
The situation is exacerbated by the small sample sizes relative
to large numbers of potential predictors; typically tens of
thousands of probes are investigated in only tens or hundreds
of biological samples.
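To make the scale of the false-positive problem concrete, the arithmetic can be sketched as follows. The probe count and significance levels are illustrative assumptions, not figures from any particular study, and the calculation assumes that no probe is truly differentially expressed.

```python
# Illustrative arithmetic only: n_probes and the alpha values are assumed,
# and every probe is taken to be truly null (no real expression difference).
n_probes = 20_000

def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of null tests that pass an unadjusted threshold alpha."""
    return n_tests * alpha

for alpha in (0.05, 0.01, 0.001):
    count = expected_false_positives(n_probes, alpha)
    print(f"alpha={alpha}: about {count:.0f} probes expected to pass by chance")
```

Even at a strict per-probe threshold, a list of "significant" probes of this size can arise by chance alone, which is why explicit control of false positives matters when probes far outnumber samples.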
Generalizability across studies [18] also needs to be
assessed before considering widespread practical application.
For example, the findings of a study using historical controls
from a particular geographical region may not be applicable
to newer cohorts of patients or different regions.
Combining information from multiple existing studies can
increase the reliability and generalizability of results. The use
of statistical techniques to combine results from independent
but related studies is called meta-analysis. However,
the term is also widely used, as we do here, to describe the
whole study process rather than just the statistical
techniques; an alternative term for the whole process is
systematic review. Through meta-analysis, we can increase the statistical
power to obtain a more precise estimate of gene expression
differentials, and assess the heterogeneity of the overall
estimate. Meta-analysis is relatively inexpensive, since it makes
comprehensive use of already available data.
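As a minimal sketch of how pooling can sharpen an estimate, the example below combines per-study estimates for a single gene by inverse-variance (fixed-effect) weighting and computes Cochran's Q as a simple heterogeneity statistic. This is one common technique, shown only for illustration and not necessarily the method used in this article; the function name and the three study values are hypothetical.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study estimates for one gene.

    Returns the pooled estimate, its standard error, and Cochran's Q.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled value.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled_se, q

# Hypothetical log2 fold changes and standard errors from three studies.
estimates = [0.8, 1.1, 0.6]
std_errors = [0.40, 0.50, 0.30]

pooled, pooled_se, q = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, SE = {pooled_se:.3f}, Q = {q:.3f}")
```

Under this weighting the pooled standard error is smaller than that of any single study, which is the sense in which combining studies yields a more precise estimate. A large Q relative to its degrees of freedom (number of studies minus one) would suggest that the studies disagree more than chance alone explains.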
Indeed, the advantages of meta-analysis of gene expression
microarray datasets have not gone unnoticed by researchers
in various fields [19–28]. Several meta-analysis techniques
have been proposed in the context of microarrays
[19,22,29–40]. However, no comprehensive framework exists
on how to carry out a meta-analysis of microarray datasets.
There is a considerable literature to guide the whole review
process, including statistical methods for clinical trials and
epidemiological studies [41–43]. As yet, however, there is
little guidance for conducting a meta-analysis of microarray
[...] curation. We discuss the sixth issue, choosing a meta-analysis
technique, using the two-class comparison as an example.
The seventh issue of analyzing, presenting, and interpreting
data is discussed briefly using an illustrative meta-analysis of
25 datasets. We provide a practical checklist, shown in Table
1, that should enable the reader to make informed decisions
on how to conduct a meta-analysis, and to understand better
the underlying concepts that make this approach so attractive
for analysis of microarray data.
Issue 1: Identify Suitable Microarray Datasets
The first step in any research project is to clearly define the
objectives (Step 1). Meta-analysis could be used to identify
genes expressed differentially between two groups
[19,22,29,30,32,33,35,37,38,40], to robustify cross-platform classification
[34], to identify overlaps between samples from heterologous
datasets [30], to identify co-expressed genes, or to reconstruct
gene networks [31,36,39].
Having a detailed review protocol can further help to
clarify the research objectives and methods and to minimize
bias from unplanned data-driven analysis. We suggest
developing the review protocol by outlining the solutions
to the steps in the checklist shown in Table 1. For example,
Step 7 (Check the selected study against inclusion-exclusion
criteria) might be expanded in the review protocol as follows:
"Two reviewers will check the eligibility of the identified
studies, with disagreements resolved by a third reviewer. A
log of excluded studies, with reasons for exclusions, will be
maintained." The protocol can be turned into a useful project
management tool by incorporating timelines and division of
labor.
The inclusion-exclusion criteria (Step 2) are eligibility
criteria for studies that will help achieve the stated objectives.
These criteria could be biological (e.g., specific disease, type
of outcome, type of tissues) or technical (e.g., density of array,
minimum number of arrays). The retrieved articles must be
evaluated as to whether they meet the inclusion criteria.
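A screening step of this kind, together with the log of excluded studies described for the review protocol, might be sketched as follows. The record fields, the criteria values, and the three example studies are all hypothetical and serve only to illustrate how biological and technical criteria can be applied and how the reasons for exclusion can be recorded.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Hypothetical minimal record of a candidate microarray study."""
    study_id: str
    disease: str
    tissue: str
    n_arrays: int

def screen(studies, disease, tissues, min_arrays):
    """Apply inclusion-exclusion criteria; return included studies and an exclusion log."""
    included, excluded_log = [], []
    for s in studies:
        reasons = []
        # Biological criteria: disease and tissue type.
        if s.disease != disease:
            reasons.append(f"disease is {s.disease!r}, not {disease!r}")
        if s.tissue not in tissues:
            reasons.append(f"tissue {s.tissue!r} is not eligible")
        # Technical criterion: minimum number of arrays.
        if s.n_arrays < min_arrays:
            reasons.append(f"only {s.n_arrays} arrays (minimum {min_arrays})")
        if reasons:
            excluded_log.append((s.study_id, reasons))
        else:
            included.append(s)
    return included, excluded_log

# Hypothetical candidate studies returned by a literature search.
candidates = [
    Study("S1", "breast cancer", "tumor", 60),
    Study("S2", "breast cancer", "cell line", 40),
    Study("S3", "lung cancer", "tumor", 8),
]
included, excluded_log = screen(
    candidates, disease="breast cancer", tissues={"tumor"}, min_arrays=10
)
for study_id, reasons in excluded_log:
    print(study_id, "excluded:", "; ".join(reasons))
```

Recording every failed criterion, rather than stopping at the first, keeps the exclusion log complete enough for a second reviewer to check each decision.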
Once the inclusion-exclusion criteria have been defined,
one needs to perform a comprehensive literature search
(Step 3) to identify suitable studies, usually based on
Identify suitable microarray studies (Issue 1)
1 Formulate objectives and a review protocol.
2 Define inclusion-exclusion criteria and suitable keywords. (...truncated)