Intensity‐based analysis of two‐colour microarrays enables efficient and flexible hybridization designs
Published online February 24, 2004
Nucleic Acids Research, 2004, Vol. 32, No. 4 e41
DOI: 10.1093/nar/gnh038
Peter A. C. 't Hoen1,*, Rolf Turk1, Judith M. Boer1, Ellen Sterrenburg1,
Renée X. de Menezes3, Gert-Jan B. van Ommen1 and Johan T. den Dunnen1,2
1Center for Human and Clinical Genetics, 2Leiden Genome Technology Center and 3Department of Medical
Statistics, Leiden University Medical Center, Wassenaarseweg 72, 2333 AL Leiden, The Netherlands
Received December 17, 2003; Revised January 30, 2004; Accepted January 31, 2004
*To whom correspondence should be addressed. Tel: +31 71 527 6611; Fax: +31 71 527 6075; Email:
Nucleic Acids Research, Vol. 32 No. 4 © Oxford University Press 2004; all rights reserved
ABSTRACT
In two-colour microarrays, the ratio of signal intensities of two co-hybridized samples is used as a
relative measure of gene expression. Ratio-based analysis becomes complicated and inefficient in
multi-class comparisons. We therefore investigated the validity of an intensity-based analysis
procedure. To this end, two different cRNA targets were hybridized together, separately, with a
common reference and in a self–self fashion on spotted 65mer oligonucleotide microarrays. We found
that the signal intensity of the cRNA targets was not influenced by the presence of a target labelled
in the opposite colour. This indicates that targets do not compete for binding sites on the array,
which is essential for intensity-based analysis. It is demonstrated that, for good-quality arrays,
the correlation of signal intensity measurements between the different hybridization designs is high
(R > 0.9). Furthermore, ratio calculations from ratio- and intensity-based analyses correlated well
(R > 0.8). Based on these results, we advocate the use of separate intensities rather than ratios in
the analysis of two-colour long-oligonucleotide microarrays. Intensity-based analysis makes
microarray experiments more efficient and more flexible: it allows for direct comparisons between
all hybridized samples, while circumventing the need for a reference sample that occupies half of
the hybridization capacity.
INTRODUCTION
DNA microarrays are widely used to measure genome-wide
changes in mRNA expression levels across conditions such as
developmental stages, disease states, drug treatment and gene
disruption (1–5). Affymetrix GeneChips, prepared by photolithography,
and spotted cDNA and 50–70mer oligonucleotide
microarrays are currently the most frequently used platforms.
The GeneChip is a one-colour system based on the immunofluorescent
detection of biotinylated nucleic acids. The
difference in perfect and mismatch probe intensities is used
for gene expression measurements (6). Spotted microarrays
are commonly hybridized with two samples labelled with two
different fluorophores. For these arrays, the ratio of the signal
intensities in the two channels is a relative measure of gene
expression.
Normalization is essential to remove systematic biases in
microarray data. For two-colour arrays, normalization
algorithms can be applied to (log-transformed) ratios (7)
(e.g. using a LOWESS algorithm). Alternatively, ANOVA
models that account for array, dye and spot effects can be
applied to the individual signal intensities on all the arrays (8).
In both cases, after normalization, the ratio of the co-hybridized
samples is usually calculated to minimize the
influence of spatial variation in spot morphology and
hybridization efficiency on the experimental outcome.
Furthermore, some suggest that ratio-based analysis is
important because of possible competitive hybridization of
the two targets due to saturation of binding sites on the array
(9).
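The ratio-based normalization described above can be illustrated with a minimal sketch. It computes, per spot, the log ratio M and the average log intensity A of the two channels, and then applies a global median centring of M. This is deliberately simpler than the LOWESS procedure mentioned in the text, which fits an intensity-dependent curve of M against A; the function names and the example intensities are hypothetical.

```python
import math
from statistics import median

def ma_values(cy3, cy5):
    """Return per-spot (M, A) pairs.

    M is the natural-log ratio ln(Cy5/Cy3); A is the mean of the two
    log intensities. Inputs are background-subtracted, positive values.
    """
    pairs = []
    for g, r in zip(cy3, cy5):
        log_g, log_r = math.log(g), math.log(r)
        pairs.append((log_r - log_g, 0.5 * (log_r + log_g)))
    return pairs

def median_centre(log_ratios):
    """Global normalization: subtract the array-wide median log ratio.

    Assumption: this stands in for LOWESS, which would instead subtract
    a smooth, intensity-dependent estimate of the dye bias.
    """
    centre = median(log_ratios)
    return [m - centre for m in log_ratios]

# Hypothetical array in which Cy5 is uniformly twice as bright as Cy3.
cy3 = [100.0, 200.0, 400.0]
cy5 = [200.0, 400.0, 800.0]
m_raw = [m for m, _ in ma_values(cy3, cy5)]
m_norm = median_centre(m_raw)
```

In this toy case the dye bias is constant across intensities, so median centring removes it completely; with an intensity-dependent bias a curve fit such as LOWESS would be needed.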
Ratio-based analysis can be applied to experiments with a
reference or loop design (10,11). A disadvantage of the
reference design is that half of the acquired data represent only
one sample that is often not biologically relevant, thereby
doubling the number of arrays required (10,11). A loop design
has other disadvantages (11). The calculated ratios have
variable levels of precision since some samples are more
directly related than others, and the set of hybridizations
cannot be extended. This has important implications for
studies in which not all samples become available at the same
time; new samples can only be included in the experiment
by forming new subloops, and only if biological material from
the earlier samples is still available.
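The efficiency argument against the reference design can be made concrete with a small counting sketch. It assumes one hybridization per sample, no dye swaps and no technical replicates, and compares a reference design with a design in which both channels carry experimental samples; the function names are hypothetical.

```python
import math

def arrays_reference_design(n_samples):
    """Each array carries one experimental sample plus the common reference."""
    return n_samples

def arrays_two_sample_design(n_samples):
    """Each array carries two experimental samples, one per channel."""
    return math.ceil(n_samples / 2)

# Hypothetical study with 12 samples.
n = 12
reference_arrays = arrays_reference_design(n)
two_sample_arrays = arrays_two_sample_design(n)
```

Under these simplifying assumptions the reference design needs twice as many arrays for the same number of experimental samples.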
An intensity-based analysis in which the signal intensities in
the two channels are kept separate, also after normalization,
would allow for hybridization designs that are more efficient
than the reference design and more flexible than the loop
design. We designed a set of experiments to determine
whether an intensity-based analysis would be justified for our
spotted long-oligonucleotide microarrays. Our aims are twofold:
first, to investigate whether hybridization patterns are
sufficiently uniform across arrays; secondly, to verify whether there
is evidence for competition between targets for binding sites
on the array. We run two parallel statistical analyses, one
ratio-based and the other intensity-based, and compare
their results.
Table 1. Overview of hybridization designs and ratio calculations

Hyb design   Array   Cy3   Cy5   Ratio
CoHyb        1       A     B     R1
             2       B     A     R2
ComRef       3       A     REF   R3
             4       REF   A     R4
             5       B     REF   R5
             6       REF   B     R6
OneColour    7       A     –
             8       –     B
             9       B     –
             10      –     A
SelfSelf     11      A     A
             12      B     B

R1–R8 are calculated from scaled LN-transformed background-subtracted
intensities; R7 and R8 are the ratios for the OneColour design.
Average ratios are then calculated according to:
LN(RatioCoHyb): 0.5*[LN(R1) - LN(R2)]
LN(RatioComRef): 0.5*[LN(R5) - LN(R6)] - 0.5*[LN(R3) - LN(R4)]
LN(RatioOneColour): 0.5*[LN(R8) - LN(R7)].

MATERIALS AND METHODS
Microarray and target preparation
Feature extraction and data analysis
Feature extraction was performed with GenePix 3.0 software
(Axon Instruments Inc.). Spots with intensities lower than
background or aberrant spot shape were flagged by the
software and checked manually. Only spots that were not
flagged on any of the analysed arrays were taken into account
in further analyses, leaving 2224 data points per array. Local
background-subtracted median signal intensities were used as
intensity measures. Scaled gene expression ratios in samples A
and B were calculated after transformation (natural logarithm)
of the background-corrected intensities and subtraction of the
average of the LN-transformed intensities (linear scaling).
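The linear scaling step and the averaging formulas of Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the inputs are assumed to be positive background-subtracted intensities, and the LN(R) arguments are assumed to be already computed per spot.

```python
import math

def scale_ln(intensities):
    """LN-transform intensities and subtract the mean of the LN values."""
    logs = [math.log(x) for x in intensities]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def ln_ratio_cohyb(ln_r1, ln_r2):
    """LN(RatioCoHyb) = 0.5*[LN(R1) - LN(R2)]."""
    return 0.5 * (ln_r1 - ln_r2)

def ln_ratio_comref(ln_r3, ln_r4, ln_r5, ln_r6):
    """LN(RatioComRef) = 0.5*[LN(R5) - LN(R6)] - 0.5*[LN(R3) - LN(R4)]."""
    return 0.5 * (ln_r5 - ln_r6) - 0.5 * (ln_r3 - ln_r4)

def ln_ratio_onecolour(ln_r7, ln_r8):
    """LN(RatioOneColour) = 0.5*[LN(R8) - LN(R7)]."""
    return 0.5 * (ln_r8 - ln_r7)

# Hypothetical intensities for three spots on one channel of one array.
scaled = scale_ln([100.0, 1000.0, 10000.0])

# Hypothetical dye-swap pair: R1 = 2 and R2 = 0.5 describe the same change.
cohyb = ln_ratio_cohyb(math.log(2.0), math.log(0.5))
```

After scaling, the LN-transformed intensities of an array have mean zero, which is what makes intensities from different arrays and channels comparable in this scheme. Averaging the two dye-swapped measurements cancels a dye bias that is constant for a given spot.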
each individual target (A and B) separately. F-statistics and
corresponding p values are based upon the F2 statistic
available in the MAANOVA package, which is a shrunk
version of the classic F-statistic. To avoid distributional
assumptions, the package offers the possibility of computing
p values for hypothesis tests via permutation methods. We
have chosen to perfo (...truncated)