Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification
Richard Simon, Michael D. Radmacher, Kevin Dobbin, Lisa M. McShane
Journal of the National Cancer Institute, Vol. 95, No. 1, January 1, 2003
Oxford University Press
Affiliations of authors: R. Simon, K. Dobbin, L. M. McShane, Biometric Research Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD; M. D. Radmacher, Departments of Biology and Mathematics, Kenyon College, Gambier, OH. Rockville Pike, MSC 7434, Bethesda, MD 20892-7434 (
DNA microarrays have made it possible to estimate the level of expression of thousands of genes for a sample of cells. Although biomedical investigators have been quick to adopt this powerful new research tool, accurate analysis and interpretation of the data have posed unique challenges. Indeed, many investigators are not experienced in the analytical steps needed to convert tens of thousands of noisy data points into reliable and interpretable biologic information. Although some investigators recognize the importance of collaborating with experienced biostatisticians to analyze microarray data, too few experienced biostatisticians are available. Consequently, investigators are using available software to analyze their data, many seemingly without knowledge of potential pitfalls. Because of serious problems associated with the analysis and reporting of some DNA microarray studies, there is great interest in guidance on valid and effective methods for analysis of DNA microarray data.
The design and analysis strategy for a DNA microarray experiment should be determined in light of the overall objectives of the study. Because DNA microarrays are used for a wide variety of objectives, it is not feasible to address the entire range of design and analysis issues in this commentary. Here, we address statistical issues that arise from the use of DNA microarrays for an important group of objectives that has been called class prediction (1). Class prediction includes derivation of predictors of prognosis, response to therapy, or any phenotype or genotype defined independently of the gene expression profile.
EXPERIMENTAL OBJECTIVES DRIVE DESIGN AND ANALYSIS
Good DNA microarray experiments, although not based on
gene-specific mechanistic hypotheses, should be planned and
conducted with clear objectives. Three commonly encountered
types of study objectives are class comparison, class
prediction, and class discovery (1).
Class comparison is the comparison of gene expression in
different groups of specimens. The major characteristic of class
comparison studies is that the classes being compared are
defined independently of the expression profiles. The specific
objectives of such a study are to determine whether the expression
profiles are different between the classes and, if so, to identify
the differentially expressed genes. One example of a class
comparison study is the comparison of gene expression profiles of
stage I breast cancer patients who are long-term survivors with
the gene expression profiles of those who have recurrent disease.
Another example is the comparison between gene expression
profiles in breast cancer patients with and without germline
BRCA1 mutations (2).
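The gene-by-gene part of a class comparison can be sketched with a two-sample statistic that scales the difference in class means by the within-class variability. The Python sketch below uses NumPy and made-up log2 expression values for two hypothetical genes. The function name `welch_t` and all data values are illustrative assumptions, not taken from any study cited here.

```python
import numpy as np

def welch_t(a, b):
    """Welch two-sample t statistic: mean difference divided by its standard error."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (b.mean() - a.mean()) / se

# Hypothetical log2 expression values, four specimens per class (assumed data).
gene1_a, gene1_b = [0.9, 1.0, 1.1, 1.0], [1.9, 2.0, 2.1, 2.0]    # low within-class spread
gene2_a, gene2_b = [-1.0, 3.0, 0.0, 2.0], [0.0, 4.0, 1.0, 3.0]   # high within-class spread

# Both genes show the same mean difference of one log2 unit (a twofold change).
diff1 = np.mean(gene1_b) - np.mean(gene1_a)
diff2 = np.mean(gene2_b) - np.mean(gene2_a)

# The t statistics differ greatly because the within-class variability differs.
t1 = welch_t(gene1_a, gene1_b)
t2 = welch_t(gene2_a, gene2_b)
print(f"gene 1: diff={diff1:.2f}, t={t1:.2f}")
print(f"gene 2: diff={diff2:.2f}, t={t2:.2f}")
```

In a real study the statistic would be computed for every gene on the array, and the resulting P values would need to be adjusted for the number of genes tested.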
Class prediction studies are similar to class comparison
studies in that the classes are predefined. In class prediction studies,
however, the emphasis is on developing a gene expression-based
multivariate function (referred to as the predictor) that
accurately predicts the class membership of a new sample on the
basis of the expression levels of key genes. Such predictors can
be used for many types of clinical management decisions,
including risk assessment, diagnostic testing, prognostic
stratification, and treatment selection. Many studies include both class
comparison and class prediction objectives.
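To make the notion of a predictor concrete, the following minimal Python sketch builds a function that maps the expression levels of a few selected genes to a class label. A nearest-centroid rule is used purely for illustration and is not put forward as the method of any study cited here. The function names, the gene indices, and the tiny data set are all hypothetical.

```python
import numpy as np

def fit_centroids(X, y, gene_idx):
    """Compute the mean profile of each class over the selected genes."""
    classes = np.unique(y)
    centroids = np.array([X[y == c][:, gene_idx].mean(axis=0) for c in classes])
    return classes, centroids

def predict(x_new, classes, centroids, gene_idx):
    """Assign a new sample to the class with the nearest centroid (Euclidean distance)."""
    distances = np.linalg.norm(centroids - x_new[gene_idx], axis=1)
    return classes[np.argmin(distances)]

# Hypothetical training set: rows are samples, columns are genes (assumed values).
X_train = np.array([[1.0, 5.0, 1.0],
                    [2.0, 4.0, 2.0],
                    [8.0, 5.0, 9.0],
                    [9.0, 4.0, 8.0]])
y_train = np.array(["A", "A", "B", "B"])
key_genes = [0, 2]  # assumed to have been selected as informative

classes, centroids = fit_centroids(X_train, y_train, key_genes)
pred1 = predict(np.array([2.0, 9.0, 1.0]), classes, centroids, key_genes)
pred2 = predict(np.array([7.0, 0.0, 9.0]), classes, centroids, key_genes)
print(pred1, pred2)
```

The predictor ignores the second gene entirely, so the new samples are classified only by the genes treated as informative. How those genes are chosen, and how the accuracy of the resulting predictor is estimated, are separate questions that the sketch does not address.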
Class discovery is fundamentally different from class
comparison or class prediction in that no classes are predefined.
Usually the purpose of class discovery in cancer studies is to
determine whether discrete subsets of a disease entity can be
defined on the basis of gene expression profiles. This purpose is
different from determining whether the gene expression profiles
correlate with some already known diagnostic classification.
Examples of class discovery are the studies by Bittner et al. (3)
that examined gene expression profiles for advanced melanomas
and by Alizadeh et al. (4) that examined the gene expression
profiles of patients with diffuse large B-cell lymphoma. Often
the purpose of class discovery is to identify clues regarding the
heterogeneity of disease pathogenesis.
LIMITATIONS OF CLUSTER ANALYSIS FOR CLASS PREDICTION
One of the most common errors in the analysis of DNA
microarray data is the use of cluster analysis and simple fold
change statistics for problems of class comparison and class
prediction. Although cluster analysis is appropriate for class
discovery, it is often not effective for class comparison or class
prediction. Cluster analysis refers to an extensive set of methods
for partitioning samples into groups on the basis of the
similarities and differences (referred to as distances) among their gene
expression profiles. Because there are many ways of measuring
distances among gene expression profiles involving thousands
of genes and because there are many algorithms for partitioning,
cluster analysis is a very subjective analysis strategy.
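The dependence on the choice of distance can be seen in a small, constructed example. In the Python sketch below, three hypothetical expression profiles are compared using Euclidean distance and a correlation-based distance (one minus the Pearson correlation). The profiles and function names are assumptions chosen only to make the contrast visible.

```python
import numpy as np

# Three hypothetical expression profiles over four genes (assumed values).
profiles = {
    "a": np.array([1.0, 2.0, 3.0, 4.0]),
    "b": np.array([10.0, 20.0, 30.0, 40.0]),  # same pattern as "a", larger magnitude
    "c": np.array([4.0, 3.0, 2.0, 1.0]),      # similar magnitude, opposite pattern
}

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return float(np.linalg.norm(x - y))

def correlation_distance(x, y):
    """One minus the Pearson correlation between two profiles."""
    return float(1.0 - np.corrcoef(x, y)[0, 1])

def nearest(name, metric):
    """Return the name of the profile closest to `name` under `metric`."""
    others = [k for k in profiles if k != name]
    return min(others, key=lambda k: metric(profiles[name], profiles[k]))

nearest_euclidean = nearest("a", euclidean)
nearest_correlation = nearest("a", correlation_distance)
print("Euclidean nearest neighbor of a:", nearest_euclidean)
print("Correlation nearest neighbor of a:", nearest_correlation)
```

Under Euclidean distance profile a is closest to c, whereas under correlation distance it is closest to b, so a clustering algorithm would group the same samples differently depending on the distance chosen.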
Cluster analysis is considered an unsupervised method of
analysis because no information about sample grouping is used.
The distance measures are generally computed with regard to the
complete set of genes represented on the array that are measured
with sufficiently high signals, or with regard to all the genes that
show meaningful variation across the sample set. Because
relatively few genes may distinguish any particular class, the
distances used in cluster analysis will often not reflect the influence
of these relevant genes. This feature accounts for the poor results
often obtained in attempting to use cluster analysis for class
prediction studies.
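The dilution effect described above can be illustrated with a small simulation. The Python sketch below generates hypothetical data in which only 10 of 2000 genes differ between two classes, then compares the average between-class distance with the average within-class distance. All parameter values (sample sizes, number of genes, size of the shift, random seed) are assumptions chosen for illustration.

```python
import numpy as np

def distance_ratio(X, labels):
    """Mean between-class Euclidean distance divided by mean within-class distance."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude each sample's distance to itself
    within = dist[same & off_diag].mean()
    between = dist[~same].mean()
    return between / within

# Simulated data: standard normal noise for every gene (assumed setup).
rng = np.random.default_rng(0)
n_per_class, n_genes, n_informative, shift = 10, 2000, 10, 2.0
labels = np.repeat([0, 1], n_per_class)
X = rng.normal(size=(2 * n_per_class, n_genes))
# Only the first few genes differ between the classes.
X[labels == 1, :n_informative] += shift

ratio_all = distance_ratio(X, labels)
ratio_informative = distance_ratio(X[:, :n_informative], labels)
print(f"all genes:         between/within = {ratio_all:.3f}")
print(f"informative genes: between/within = {ratio_informative:.3f}")
```

A ratio close to 1 means that samples from different classes are, on average, no farther apart than samples from the same class. In this simulation the ratio computed over all genes stays near 1, while the ratio restricted to the informative genes is clearly larger, which is why an unsupervised distance over the full gene set can fail to recover the classes.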
Cluster analysis also does not provide statistically valid
quantitative information about which genes are differentially
expressed between classes. Investigators often use simple average
fold change measures or visual inspection of a cluster image
display to identify differentially expressed genes. However,
average fold change indices do not account for variability in gene
expression across samples within the same class; some twofold
average effects represent statistically significant differences and
some do not. Neither fold change indices nor visual inspection of
cluster image displays enable the investigator to deal with
multiple comparison issues in a statistically valid manner. For
example, in examining expression levels of thousands of randomly
varying genes, there may be many genes that spuriously appear
to be differentially expressed between two classes on the basi (...truncated)