Genomic DNA as a cohybridization standard for mammalian microarray measurements
Published online June 9, 2004
Nucleic Acids Research, 2004, Vol. 32, No. 10 e81
DOI: 10.1093/nar/gnh078
Brian A. Williams, Richele M. Gwirtz and Barbara J. Wold*
Division of Biology, MC 156-29, California Institute of Technology, Pasadena, CA 91125, USA
Received March 7, 2004; Revised and Accepted May 4, 2004
*To whom correspondence should be addressed. Tel: +1 626 395 4916; Fax: +1 626 449 0756; Email:
Nucleic Acids Research, Vol. 32 No. 10 © Oxford University Press 2004; all rights reserved
ABSTRACT
A persistent design problem for ratiometric microarray
studies is selecting the `denominator' RNA cohybridization
standard. The ideal standard should be readily available,
inexpensive, invariant over time and from laboratory to
laboratory, and should represent all genes with a uniform
signal. RNA references (both commercial `universal' and
experiment-specific types), fall short of these goals. We show
here that mouse genomic DNA is a reliable microarray
cohybridization standard which can meet these criteria.
Genomic DNA was superior in universality of coverage (>98%
of genes from a 16 000 feature mouse 70mer microarray) to the
Stratagene Universal Mouse Reference RNA standard. Ratios
for genes in very low abundance in the Stratagene standard
were more unstable with the Stratagene standard than with
genomic DNA. Genes with mid-range, and therefore presumably
optimal RNA denominator values, showed comparable
reproducibility with both standards. Inferred ratios made
between two different experimental RNAs using a genomic DNA
standard were found to correlate well with companion,
directly measured ratios (Spearman correlation
coefficient = 0.98). The advantage in array feature coverage
of genomic DNA will likely increase as newer generation
microarrays include genes which are expressed exclusively in
minor tissue or developmental domains that are not
represented in mixed tissue RNA standards.
INTRODUCTION
DNA microarrays have quickly become an indispensable tool
for transcriptome analysis (1). Mechanical spotting of DNA on
glass slides has emerged as a widely used microarray platform
because it affords flexibility of array design and relative
economy. However, this technology also has some significant
shortcomings. Because feature geometry and the amount of
DNA per feature vary within a gene chip, and also from one
chip to another, measurements must be made as internal
ratiometric comparisons of one RNA sample with a reference
(or `denominator') RNA (2,3). This is done by simultaneous
hybridization of experimental and reference samples, where
each RNA population is transcribed into cDNA with a
different fluorophore (typically Cy3 for one and Cy5 for the
other). While this is very effective for direct comparisons of
just two RNA samples, the full power of large-scale expression
analysis comes from comparisons of many (tens to hundreds
or even thousands of) different RNA samples. To do
this using spotted microarrays, the ratio observed for each
feature on the array is compared across all gene chips in a
study, each of which has used the same denominator RNA
sample (converted to labeled cDNA or cRNA).
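The logic of the common-denominator design can be illustrated with a minimal sketch. It assumes that each feature's intensity is the product of a chip-specific feature gain and the abundance of the labeled target; all numbers, variable names and function names below are hypothetical and are not taken from this study. Under that assumption the feature gain cancels within each chip, and the reference cancels between chips:

```python
import math

def log2_ratios(sample, reference):
    """Per-feature log2(sample / reference) for one two-color hybridization."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]

def inferred_log2_ratios(a_vs_ref, b_vs_ref):
    """log2(A/B) inferred from two chips that share the same reference."""
    return [a - b for a, b in zip(a_vs_ref, b_vs_ref)]

# Hypothetical abundances for four features (arbitrary units).
abundance_ref = [4.0, 8.0, 2.0, 16.0]
abundance_a = [8.0, 8.0, 1.0, 16.0]
abundance_b = [2.0, 16.0, 1.0, 4.0]

# Hypothetical per-feature gains (spot size, DNA amount); they differ by chip.
gain_chip1 = [100.0, 50.0, 200.0, 100.0]
gain_chip2 = [50.0, 100.0, 100.0, 200.0]

def intensities(gain, abundance):
    """Assumed measurement model: intensity = feature gain * abundance."""
    return [g * x for g, x in zip(gain, abundance)]

# Chip 1 carries sample A plus reference; chip 2 carries sample B plus reference.
a_vs_ref = log2_ratios(intensities(gain_chip1, abundance_a),
                       intensities(gain_chip1, abundance_ref))
b_vs_ref = log2_ratios(intensities(gain_chip2, abundance_b),
                       intensities(gain_chip2, abundance_ref))

# Inferred A/B ratios, compared with the ratios of the underlying abundances.
a_vs_b_inferred = inferred_log2_ratios(a_vs_ref, b_vs_ref)
a_vs_b_true = log2_ratios(abundance_a, abundance_b)
```

In this idealized, noise-free setting the inferred and true log ratios agree exactly even though the two chips have different feature gains; real measurements add noise to both channels.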
Although this design has proved very successful, the
requirement for internal ratiometric measurement presents a
thorny set of problems that come from properties of the
reference hybridization standard. For example, instability and
error are expected for RNAs not represented in the reference or,
alternatively, for RNAs so prevalent in the reference that they
saturate their corresponding features (detectors). Moreover,
the reference RNA sample composition is not standard from
one study to another, usually having been selected based on
different criteria for each study. Once a standard is selected,
the vagaries of biology make it difficult to reproduce precisely
from one preparation to another. This means that global
comparisons between studies done in the same laboratory over
a long time or between different laboratories are compromised. These issues have so far been dealt with using strategies
that range from selecting a single tissue standard, such as
whole spleen RNA for a study of B cells done by the Alliance
for Cell Signaling (AFCS) (http://www.signaling-gateway.org)
to making a denominator mixture of RNAs by pooling
aliquots from each sample in a given study (4,5), to attempting
to make a `general mixture' of RNA from diverse cell lines
(e.g. the Stratagene Universal Reference RNA standards)
(4,6).
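Why a weakly represented reference destabilizes a ratio can be sketched with first-order error propagation. The noise model here (independent additive noise of equal standard deviation in both channels) and all numerical values are illustrative assumptions, not measurements from this study, and the approximation itself degrades once the noise approaches the size of the signal:

```python
import math

def ratio_relative_error(numerator, denominator, noise_sd):
    """First-order relative SD of numerator/denominator.

    Assumes independent additive noise with the same SD in both channels.
    """
    return math.sqrt((noise_sd / numerator) ** 2 + (noise_sd / denominator) ** 2)

noise_sd = 20.0      # hypothetical per-channel noise, in intensity units
numerator = 1000.0   # hypothetical experimental-channel signal

# The relative error grows as the reference (denominator) signal falls.
for denominator in (25.0, 100.0, 1000.0):
    err = ratio_relative_error(numerator, denominator, noise_sd)
    print(f"reference signal {denominator:7.1f}: relative error {err:.3f}")
```

The sketch covers only the low-signal case; saturation of a feature by a highly prevalent reference RNA is a separate failure mode that it does not model.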
Genomic DNA should, in principle, be a more general,
invariant and inexpensive solution (1,7). Major virtues of the
genome as a cohybridization `standard' include complete
sequence representation, sequence stability over time and
from one preparation to another, uniform prevalence for most
genes and very low cost. These features mean that it is also
applicable to any array, independent of which subset of genes
is arrayed or which strand, in the case of oligonucleotides, is
represented.
It is also clear that genomic DNA presents problems and
challenges of its own. In the large vertebrate genomes that are
our principal interest, mRNA coding sequences are highly
diluted by non-coding DNA. This is expected to adversely
MATERIALS AND METHODS
Oligonucleotide arrays
70mer oligonucleotides representing 13 443 expressed sequences from the mouse genome (Operon Array Ready Oligo Set
version 1.0) were printed on SurModics 3-D Link glass slides
using a robotic printing apparatus assembled according to
instructions from the Pat Brown Laboratory website (http://
cmgm.stanford.edu/pbrown/mguide/index.html). The Operon
70mers were resuspended in SurModics print buffer at a
concentration of 20 pmol/ml. Samples of xenotypic DNA (408
features) and sequences informatically determined to be absent
from the mouse genome (320 features) served as negative
controls. An additional 1436 print buffer features served as
blanks for carryover control, and a select group of positive
control genes was included for quantitative comparisons and
statistical analysis, bringing the final array size to 16 192
features (herein referred to as the 16K array). A 32-pin print
head outfitted with MicroQuill 2000 print pins (Majer
Precision Engineering) was used to array the features in 32
sectors, each 23 × 22 features in dimension. Slides were
post-processed according to the manufacturer's protocol.
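The feature counts above can be cross-checked against the print layout with a few lines of arithmetic. The counts are those stated in the text; the remainder computed at the end is an inference (presumably the positive control features, whose number is not given), not a reported figure:

```python
# Feature counts stated in the text.
expressed_oligos = 13443
xenotypic_controls = 408
absent_sequence_controls = 320
buffer_blanks = 1436
total_features = 16192

# Print layout stated in the text: 32 sectors, each 23 by 22 features.
sectors, rows, cols = 32, 23, 22
layout_capacity = sectors * rows * cols

# Sum of the categories whose sizes are given explicitly.
itemized = (expressed_oligos + xenotypic_controls
            + absent_sequence_controls + buffer_blanks)

# Inferred remainder: features not itemized in the text.
remaining = total_features - itemized

print(layout_capacity, itemized, remaining)
```

The layout capacity matches the stated array size, which suggests that every position in the 32 sectors was filled.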
Hybridizations were carried out in 5× SSC, 50% formamide
and 100 ng/ml yeast tRNA, at 46°C for 72 h. Coverslips were
removed in 4× SSC, 0.1% SDS; the slides were then washed
twice in 1× SSC, 0.1% SDS at 67°C for 5 min, then in 0.2×
SSC at room temperature for 1 min, and again in 0.1× SSC for
1 min at room temperature, before spin drying at 900 r.p.m. for
3 min in an IEC Centra GP8 centrifuge using a 216 rotor.
Hybridized arrays were scanned on an Axon 4000 duallaser scanning instrument (Axon In (...truncated)