Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm
Christina Öhrmalm (2), Magnus Jobs (1,2), Ronnie Eriksson (2), Sultan Golbob (2), Amal Elfaitouri (2), Farid Benachenhou (2), Maria Strømme (0), Jonas Blomberg (2)
0 Nanotechnology and Functional Materials, The A
1 School of Health and Social Sciences, Högskolan Dalarna, Falun
2 Clinical Virology, Department of Medical Sciences, Uppsala University and Academic Hospital, 751 85 Uppsala
One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes.
Microbial genomes can be highly variable because of high
mutation rates. Because of this extreme variability, it is
often difficult to identify regions within a specific virus
genome that are sufficiently evolutionarily conserved to
serve as targets for specific detection primers and
probes. RNA viruses are especially variable. The influenza
virus, a negative sense, single-stranded RNA (ssRNA)
virus with a highly variable RNA genome, for example,
has been known to cause the diagnostic problem that is at
the basis of this article, because of a high rate of mutation
and genetic drift. In such situations, optimal detection
primers and probes would be broadly targeted yet
specific, and remain functional even if the genome
sequence changed because of genetic drift.
Diagnostic nucleic acid hybridization probes are
constructed from the most conserved portions of
genes from viruses commonly causing infection. Long
probes have a large inherent tolerance to microbial
variation. The introduction into
the probe design of a base that can hybridize with all
four normal bases (a universal base), or of multiple
nucleotides (degenerations; wobbles) in a single position,
can induce tolerance to natural viral variation
(mismatch).
The naturally occurring (1–4) nucleotide (nt)
deoxyribose-Inosine (dInosine) is one of many more or
less generally hybridizing nt (5,6) known as universal
bases. All four normally occurring DNA bases can
hybridize to dInosine. The general trend in decreasing
hybridization stability is I:C > I:A > I:T ≈ I:G > I:I when using 1 M
NaCl, 10 mM sodium cacodylate and 0.5 mM EDTA pH 7
(7,8). However, dInosine is readily available and can be
recognized as a G by polymerases (5,9,10). Alternative
universal bases, e.g. 5-nitroindole, also exist.
3 M tetramethylammonium chloride buffer (TMAC) is
a hybridization buffer that selectively raises the stability of
A:T base pairs to approximately that of G:C base pairs
(11–14). It was used in these studies to reduce the effect of
sequence composition when comparing different probes of
the same length. The term 'nucleation site' is used in this
article to indicate a stretch of contiguous perfectly
matching nt, capable of initiating hybridization (15).
The aim of these studies was to improve understanding
of the design of probes to be used in a TMAC buffer
system, by investigating variability and the inclusion
of dInosines, other universal bases and wobbles.
Specifically, we examined (i) variation (i.e. mismatch)
tolerance, (ii) sensitivity to different mismatch
distributions, (iii) utilization of dInosine as an nt analog, and
(iv) specificity; we also present a new algorithm for
prediction of hybridization results. In additional experiments,
the question of the use of degeneracy versus a universal
base was addressed. Furthermore, we investigated the use
of the derived design criteria for detection of rotavirus
RNA in clinical samples.
MATERIAL AND METHODS
A 70-mer nucleic acid hybridization probe, named the
InflA probe, was constructed from the most conserved
portion of the matrix gene in segment 7 of the
Influenza A H3N2 virus. Properties important for the
design of variation (mismatch)-tolerant yet specific
probes were investigated by studying the interaction
between a set of virus-derived probes and complementary
targets with different degrees and distributions of
mismatch.
The 70-mer DNA probes were coupled to color-coded
microspheres, hybridized with biotinylated target nucleic
acids, incubated with streptavidin-phycoerythrin and
analyzed in a Luminex 200™ system. The hybridization
reaction was performed at a standard non-saturating
concentration (0.2 nM target) in 3 M TMAC buffer.
In further experiments, in an attempt to make a probe
with an extended mismatch tolerance, a series of
dInosine-containing probes were synthesized and analyzed for
hybridization in the 3 M TMAC buffer system. A
limited number of experiments comparing dInosine with
an alternative universal base (5-nitroindole) or with
wobbles (degenerations) were also conducted.
Many viruses harbor synonymous mutations (sm):
the third base in a codon can often vary (wobble)
without changing the encoded amino acid. Targets
with regions where every third base was mutated, to
resemble the common phenomenon of sm (i.e. the nt
sequence varies while the encoded amino acid
sequence is unchanged), were of special interest, since
this is often the cause of the variation in coding viral
sequences.
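The positional pattern of such targets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `wobble_every_third`, the `frame` parameter and the use of transitions (A↔G, C↔T) as the substituted base are all hypothetical, and the sketch does not check that a given substitution is actually synonymous for its codon. The targets used in this study are listed in the Supplementary Tables.

```python
# Hypothetical helper: place a substitution at every third-codon position of a region.
# Assumption: transitions (A<->G, C<->T) stand in for the mutated base; synonymy is NOT checked.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def wobble_every_third(seq, start, end, frame=0):
    """Return seq with each third-codon-position base in [start, end) substituted.

    frame is the index of the first base of a codon in seq.
    """
    out = list(seq)
    for i in range(start, end):
        if (i - frame) % 3 == 2:  # third position of a codon
            out[i] = TRANSITION[out[i]]
    return "".join(out)


def count_mismatches(a, b):
    """Number of positions at which two equally long sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    original = "ATGGCCAAA"
    mutated = wobble_every_third(original, 0, len(original))
    print(mutated, count_mismatches(original, mutated))
```

In this toy input the first codon changes from ATG to ATA, which is not synonymous; a realistic design would consult the codon table before choosing each substitution.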
Rotavirus is a dsRNA virus that causes gastroenteritis.
Clinical fecal samples, previously confirmed to contain
rotavirus, were used to test the design strategies of this
report. An asymmetric PCR, with biotinylated rev-primer
in excess, was set up for the VP6 segment. The primers
were designed using knowledge gained from this report,
i.e. both degeneration and dInosine were used to create
moderately degenerated probes with uninterrupted
matching stretches as long as possible, in the most
conserved regions. The biotinylated reverse (rev-) primer
was made 77-nt long to be as tolerant to variation as
possible, while the shorter forward (fw-) primer (23 nt)
contained four locked nucleic acids (LNA) to increase
the hybridization strength (represented by melting
temperature, Tm) to match that of the rev-primer.
Single stranded 70-mer oligonucleotides with a C12
aminolink at the 5′ end, with or without dInosines/
5-nitroindole/N wobbles, were obtained from
Biomers.net (Ulm, Germany). The design of the probes
was based on the programs BLASTn (16), ClustalX (17)
and ConSort (J.Blomberg et al., unpublished results).
Briefly, matching viral sequences were retrieved from the
GenBank database at NCBI, NIH, using the viral
sequence of interest as a query, and alignments were
performed by a BLASTn search. BLASTn and ClustalX
alignments were analyzed in ConSort to define the
most suitable probe sequence. ConSort provides the
frequency of variation and the variation of nt composition in
each base position, the number of aligned sequences, and a
majority consensus sequence. The proposed probe
sequence was further analyzed for its predicted Tm and
probable homodimer and hairpin interactions using
Mfold at the IDT OligoAnalyser site (http://eu.idtdna.com/analyzer/Applications/OligoAnalyzer/)
[http://mfold.bioinfo.rpi.edu/ (18)] and Visual OMP™ 7.0
(DNA Software). Visual OMP™ 7.0, which uses the
Nearest Neighbor (NN) algorithm [DNA Software (19)],
was used to estimate the change in Gibbs free energy
associated with hybridization of the two strands (ΔG)
for the interaction between probes and targets at 45
and 55 °C in 3 M TMAC. IDT OligoAnalyser was used
to estimate ΔG for the interaction between probes
and targets using 50 mM Na+ and 2 mM Mg2+. Each
combination was tested with the heterodimer-formation
function (http://eu.idtdna.com/analyzer/Applications/OligoAnalyzer/).
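The kind of per-position summary described above (frequency of variation in each base position and a majority consensus) can be illustrated with a short sketch. ConSort itself is unpublished, so the following is not its implementation; the function names `column_profile` and `majority_consensus` are hypothetical, and the input is assumed to be a list of equally long, already aligned sequences.

```python
from collections import Counter


def column_profile(aligned):
    """For each alignment column, return (majority base, fraction of sequences carrying it)."""
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        base, n = counts.most_common(1)[0]
        profile.append((base, n / len(column)))
    return profile


def majority_consensus(aligned):
    """Majority consensus sequence of an alignment (ties are resolved arbitrarily)."""
    return "".join(base for base, _ in column_profile(aligned))


if __name__ == "__main__":
    alignment = ["ACGT", "ACGA", "ATGT"]  # toy alignment, not real viral data
    print(majority_consensus(alignment))
    for position, (base, fraction) in enumerate(column_profile(alignment), start=1):
        print(position, base, f"{100 * fraction:.0f}% conserved")
```

Columns with a low conserved fraction are the natural candidates for dInosine or wobble positions in a probe.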
To create the 70-nt InflA probes, nt 725–794 of the
matrix protein 2 gene segment 7 of Influenza A H3N2,
accession nr CY023083, was used as query in BLASTn
(GenBank database at NCBI, NIH). The Norovirus
probe sequence comprises nt 646–715 of the capsid gene
of the Norwalk-like virus, accession nr AY274264. The
probes for detection of rotavirus were made using
sequence EU372725 of the Human rotavirus A strain
CMH171/01 inner capsid protein (VP6) gene as query.
The probe region chosen, nt 112–178, was analyzed
using the haplotype function of ConSort. Two differently
degenerated probes containing dInosines were designed
from each of the two major haplotypes, which potentially
covered all rotavirus group A variations recorded in
GenBank.
Synthetic targets, complementary to the consensus
sequence of the InflA, Norovirus and Rotavirus probes,
with various numbers of mismatches, were purchased as
70-mer oligonucleotides with biotin attached to a
2-aminoethoxy-ethoxyethanol linker at the 5′ end
(Biomers.net, Ulm, Germany).
Probe coupling to microspheres
Specific synthetic 5′ amine-C12 modified 70-mer probes
for influenza A were designed and solid-phase coupled
to xMAP carboxylated color-coded microspheres
(Luminex Corp., Austin TX, USA), according to the
protocol of the Luminex corporation (Austin TX, USA).
Briefly, 2.5 × 10^6 stock microspheres were collected by
centrifugation and resuspended in 25 µl of 0.1 M MES,
pH 4.5 [2-(N-morpholino)ethanesulfonic acid, Sigma].
Subsequently, 0.2 nmol of the probe and 2 µl of freshly made
10 mg/ml EDC [1-ethyl-3-(3-dimethylaminopropyl)
carbodiimide (water-soluble carbodiimide; Pierce; sold
by Nordic Biolabs AB, Sweden)] in H2O were added to
the microspheres and the suspension was incubated in the
dark for 30 min at room temperature. Care was taken to
store the EDC in a dry condition, in aliquots. After
addition of another 2 µl (10 mg/ml) EDC in H2O and
repeated incubation, the microspheres were washed with
0.5 ml of 0.02% Tween-20. The coupled microspheres
were pelleted by centrifugation at 8000 × g for 2 min and
resuspended in 0.5 ml of 0.1% SDS. After a second spin at
8000 × g for 2 min, the final pellet was resuspended in 50 µl
of TE, pH 8.0.
Hybridization of biotinylated target DNA to probe-linked
microspheres
An amount of 5 µl of 2.0 nM synthetic biotin-labeled
target was mixed with hybridization buffer consisting of
33 µl 3 M TMAC buffer (3 M tetramethylammonium
chloride, 0.1% Sarkosyl, 50 mM Tris-HCl, pH 8.0,
4 mM EDTA, pH 8.0; Sigma) and 12 µl 1× TE-buffer
pH 8.0 and 0.05 µl (approximately 2500 microspheres) of each
probe-coupled Luminex microsphere type. The mixture was
heated at 95 °C for 2 min to denature the DNA targets
and probes, followed by hybridization at 45 or 55 °C
for 30 min while shaking on the Thermostar (BMG
LabTech; Offenburg, Germany) microplate incubator.
An amount of 2 µl (0.05 mg/ml) of
streptavidin-R-phycoerythrin (QIAGEN, Hilden, Germany) was
added to the mixture, which was further incubated at
45 or 55 °C for 15 min before analysis for internal
microsphere and R-phycoerythrin reporter fluorescence on the
Luminex 200™ system (Luminex corporation, Austin,
TX). The amount of biotinylated target that hybridizes
to the microsphere-bound probes is directly proportional
to the Median Fluorescence Intensity (MFI) reported by
the instrument (in all experiments, fluorescence was
measured from a minimum of 100 beads of each
microsphere type). The term 'total MFI' describes
the hybridization signal from a perfectly matching
probe-target duplex, while the percentage (%) of the total MFI
describes the ratio between the hybridization signal of a
mismatching probe-target duplex and the total MFI of
that particular probe. Titration experiments established
that 60% of the maximum hybridization capacity of
the microsphere-bound probes was reached using the
conditions under which total MFI was measured
(i.e. microsphere-bound probes were not saturated). An
MFI of 100 was used as the lower limit of detection
(LLOD).
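The quantities defined above reduce to simple arithmetic, sketched below. The function names are hypothetical, the volumes are treated as unit-agnostic parts (5 + 33 + 12), the negligible bead volume is ignored, and it is an assumption of this sketch that an MFI exactly equal to the LLOD counts as detected. The MFI values in the usage lines are those reported for the 9-pm target in the Results.

```python
LLOD_MFI = 100  # lower limit of detection stated in the text


def final_concentration(stock_conc, sample_volume, total_volume):
    """Concentration after dilution; volumes may be in any unit as long as they agree."""
    return stock_conc * sample_volume / total_volume


def percent_of_total(mfi, total_mfi):
    """Signal of a mismatching duplex as a percentage of the perfectly matching duplex."""
    return 100.0 * mfi / total_mfi


def is_detected(mfi, llod=LLOD_MFI):
    """Assumption: a signal equal to the LLOD is treated as detected."""
    return mfi >= llod


if __name__ == "__main__":
    # 5 parts of 2.0 nM target in 5 + 33 + 12 = 50 parts gives the 0.2 nM working concentration.
    print(final_concentration(2.0, 5, 5 + 33 + 12))
    # 9-pm target against the InflA probe at 45 and 55 degrees C (values from the Results).
    print(round(percent_of_total(4496, 6604)), round(percent_of_total(4001, 5657)))
```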
Clinical samples containing Rotavirus
Fecal samples were obtained from children with
gastroenteritis, from the Children's Hospital ward at Uppsala
Academic Hospital. Samples were handled anonymously
according to the rules of the ethical committee at the
Academic Hospital. The study used samples that were
positive in a rotavirus antigen detection test; 100 µl of
the sample was diluted in 900 µl 1× TE buffer. After
centrifugation, 400 µl was added to a lysis buffer and total
nucleic acid was extracted as described by the
manufacturer (easyMag, bioMérieux). The samples were eluted in
110 µl buffer and stored at −70 °C.
Asymmetric RT-PCR and sequencing of Rotavirus
dsRNA in clinical samples
A reverse transcriptase (RT)-PCR was set up to amplify
and biotin-label the nucleic acid of human rotavirus A
from these clinical samples.
The 545 sequences obtained from the rotavirus query in
BLASTn were analyzed in the ConSort program to
construct fw- and rev-primers. The fw-primer (nt 1–23),
5′-GGCTTTW+AAA+CGAA+GTC+TTCR-3′ (+A, +C,
+G, +T are LNA residues) and the biotinylated
rev-primer (nt 502–426),
5′-TATGGAAATATATTAGGTTTATGAAAAACAAATCCIGTACGTTGTCTTCTITTITGIARRTTCCAITTITCIATRTA-3′,
resulted in a
PCR product of 502 nt. After nucleic acid extraction in
the easyMag, the extracts were heated at 97 °C for
5 min, followed by snap cooling on ice for 2 min to
obtain ssRNA from the rotaviral dsRNA. The PCR
reaction contained 5 µl of nucleic acid extract, 25 µl 2×
RT-PCR iScript buffer, 200 nM fw-primer, 600 nM
biotinylated rev-primer, nuclease-free water and 1 µl
iScript RT enzyme (total volume 50 µl). The samples
were run at 50 °C for 30 min, 94 °C for 10 min, 50 cycles
at 94 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s, ending
with 72 °C for 7 min. An amount of 5 µl of the PCR
product was used in the hybridization experiment with
the microsphere-coupled Rotavirus probes (sequences
shown in Supplementary Table S4A and B). The PCR
products were sequenced with a 3130 Genetic Analyzer
(Applied Biosystems) by utilizing the same fw- and
rev-primers described above.
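How strongly a primer is degenerated can be checked with a small helper. The sketch below uses the standard IUPAC ambiguity codes; the function name `degeneracy` is hypothetical, dInosine ('I') is treated as a single universal residue that does not multiply the oligonucleotide pool, and the '+' marks that flag LNA residues are simply ignored. The two sequences are the primers given above.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

FW_PRIMER = "GGCTTTW+AAA+CGAA+GTC+TTCR"
REV_PRIMER = ("TATGGAAATATATTAGGTTTATGAAAAACAAATCCIGTACGTTGTCTTCT"
              "ITTITGIARRTTCCAITTITCIATRTA")


def degeneracy(oligo):
    """Number of distinct sequences in the synthesized pool.

    Assumptions: 'I' (dInosine) is one residue, not a mixture; '+' (LNA mark) is skipped.
    """
    fold = 1
    for ch in oligo.replace("+", "").upper():
        if ch != "I":
            fold *= IUPAC_FOLD[ch]
    return fold


if __name__ == "__main__":
    for name, seq in (("fw", FW_PRIMER), ("rev", REV_PRIMER)):
        print(name, "fold:", degeneracy(seq), "dInosines:", seq.count("I"))
```

Keeping the fold low while covering variation with dInosine is in line with the "moderately degenerated" design goal stated in the Methods.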
Development of a new algorithm
The discovery of the importance of long uninterrupted
perfect matches and the long-range effects of mismatch
(see Results section), which are not embodied in
current NN hybridization theories (19), led us to
formulate a simple new descriptive theory. The new algorithm
includes aspects of NN theory (8,20–22) but extends this
to longer hybridizing segments and includes the effects
of dInosine in the long oligonucleotides. Visual OMP™
7.0 predicted that some 70-mer combinations would not
hybridize, yet they did hybridize. In order to better predict
hybridization, we investigated predictive strategies that
took into account the matching nt, its neighbors, the
length of the matching region, and the cooperativity of
neighboring matching regions. We finally settled for an
algorithm which attempts to model the nucleation and
zipping stages during the hybridization process. The new
model was termed NucZip (Figure 5B). The results
obtained using this model were then correlated with those
obtained using Visual OMP™ 7.0 and both were
compared with experimental data.
How NucZip works
NucZip simulates the hybridization process, starting with
potential nucleation sites and then proceeding to zip in
both directions. The NucZip algorithm (written by JB in
Visual FoxPro) was implemented as a module (procedure)
in the ConSort sequence analysis program. The
procedure is relatively simplistic, and does not contain the
thermodynamic and secondary structure analyses
provided by more sophisticated programs such as Visual
OMP™ 7.0.
The algorithm starts by searching for perfectly matching
hexa-, hepta-, octa- and nonamers, as potential nucleation
sites (the Nuc part). Every matching oligomer is given the
number of matching nt as its NucScore. If the oligomer
contains dInosine, the number of dInosines is subtracted
from the score but the segment is still counted as
uninterrupted, regardless of the dInosines. The NucScore is then
used to select the two highest scoring potential nucleation
sites which will undergo zipping, up and downstream.
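The Nuc step can be sketched as follows. This is not the authors' Visual FoxPro code (which is in the Supplementary Data) but a minimal illustration under stated assumptions: the probe is read 5′ to 3′ and the target string is written 3′ to 5′ so that index i of one pairs with index i of the other; 'I' denotes dInosine; and the two selected sites are required not to overlap, a detail the description above does not specify. All names are hypothetical.

```python
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}  # Watson-Crick partners


def pairs(probe_base, target_base):
    """A position is uninterrupted if it is a Watson-Crick pair or the probe carries dInosine."""
    return probe_base == "I" or PAIR[probe_base] == target_base


def nucleation_sites(probe, target, sizes=(6, 7, 8, 9), top=2):
    """Return up to `top` candidate sites as (NucScore, start, length), best first."""
    candidates = []
    for n in sizes:
        for start in range(len(probe) - n + 1):
            if all(pairs(probe[i], target[i]) for i in range(start, start + n)):
                # NucScore: matching nt in the window minus the dInosines it contains.
                score = n - probe[start:start + n].count("I")
                candidates.append((score, start, n))
    candidates.sort(key=lambda c: (-c[0], c[1]))
    chosen = []
    for score, start, n in candidates:
        # Assumption of this sketch: selected sites must not overlap.
        if all(start >= s + m or s >= start + n for _, s, m in chosen):
            chosen.append((score, start, n))
        if len(chosen) == top:
            break
    return chosen
```

For a 20-nt toy probe whose target has a single mismatch at index 10, the function returns one nonamer on each side of the mismatch; replacing one probe base with 'I' lowers the score of the windows that contain it.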
The score for the Zip portion of the model is
obtained from the number of consecutive matching
trimers, tetramers etc., up to pentadecamers, each counted
with equal weight, within a contiguous matching segment.
Thus,

$$\mathrm{ZipScore}_{\mathrm{mode}\,x} = \sum_{k=1}^{k_{\max}} \left( \sum_{n=3}^{15} S_n \right),$$

where $k_{\max}$ is the number of uninterrupted matching
segments, including the chosen nucleation site, and $S_n$
is the number of successive segments of length n (varying
from 3 to 15), i.e. the number of full length trimers, full length
tetramers etc. up to full length pentadecamers, which
fit into the matching segment. The same Zip scoring
system was applied in two modes, counting dInosines
either as matching (mode 1) or as mismatching (mode 2).
In the second mode, dInosines will shorten the length
of matching segments, decreasing the score. The final
ZipScore was calculated as a weighted mean of
ZipScore(mode1) and ZipScore(mode2), where the weighting
factor was based on the empirical data presented in
the current report (Figures 2–4). Since dInosine
hybridizes more strongly to C than to the other nts,
the algorithm adds a contribution based on the number
of dI:C pairs weighted by a dInoCfactor. The upstream
and downstream ZipScores were obtained as:

$$\mathrm{ZipScore}_{\mathrm{downstream}} = \mathrm{ZipScore}_{\mathrm{mode1}} - \mathrm{dInosinefactor} \times (\mathrm{ZipScore}_{\mathrm{mode1}} - \mathrm{ZipScore}_{\mathrm{mode2}}) + (\mathrm{dInoCnr} \times \mathrm{dInoCfactor})$$

and

$$\mathrm{ZipScore}_{\mathrm{upstream}} = \mathrm{ZipScore}_{\mathrm{mode1}} - \mathrm{dInosinefactor} \times (\mathrm{ZipScore}_{\mathrm{mode1}} - \mathrm{ZipScore}_{\mathrm{mode2}}) + (\mathrm{dInoCnr} \times \mathrm{dInoCfactor}),$$

each evaluated over the segments on the respective side.
The ZipScores from up- and downstream zipping were
then added; the final NucZipScore = ZipScore(downstream)
+ ZipScore(upstream).
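A minimal sketch of the Zip scoring for one direction is given below. It rests on several assumptions that go beyond the description above: the number of n-mers that "fit into" a segment of length L is taken as the number of overlapping windows, L − n + 1; the probe is read 5′ to 3′ and the target is written 3′ to 5′ so that equal indices pair; and the values passed for `dinosine_factor` and `dinoc_factor` are placeholders, since the empirically fitted weights are not stated in the text. How the segments are divided between the upstream and downstream directions is left to the caller.

```python
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}  # Watson-Crick partners


def run_lengths(flags):
    """Lengths of maximal runs of True values."""
    runs, current = [], 0
    for flag in flags:
        if flag:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def mode_score(segment_lengths, n_min=3, n_max=15):
    """Sum over segments of the number of n-mers (n_min..n_max) that fit into each segment.

    Assumption: overlapping windows are counted, i.e. L - n + 1 per segment and n.
    """
    return sum(max(0, length - n + 1)
               for length in segment_lengths
               for n in range(n_min, n_max + 1))


def zip_score(probe, target, dinosine_factor, dinoc_factor):
    """Directional ZipScore: mode1 - f * (mode1 - mode2) + dInoCnr * dInoCfactor."""
    paired = list(zip(probe, target))
    mode1 = mode_score(run_lengths([p == "I" or PAIR[p] == t for p, t in paired]))
    mode2 = mode_score(run_lengths([p != "I" and PAIR[p] == t for p, t in paired]))
    dinoc_nr = sum(1 for p, t in paired if p == "I" and t == "C")
    return mode1 - dinosine_factor * (mode1 - mode2) + dinoc_nr * dinoc_factor
```

With a placeholder `dinosine_factor` of 0.5, a 10-nt toy duplex carrying one dI:C pair in the middle scores between the value of an uninterrupted 10-mer (mode 1) and that of two separate 4- and 5-nt segments (mode 2), plus the dI:C bonus.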
In this way, the contribution from longer matching
segments was factored in with the contribution from the
nearest neighbors (approximated by the trimer part of the
algorithm). Figure 8 summarizes the principle of this
computational work.
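Why long segments dominate the score can be seen from a short worked calculation. It assumes, as an interpretation of the description above rather than a statement from it, that the number of n-mers fitting into a segment of length L is the number of overlapping windows, L − n + 1.

```latex
% Contribution of one uninterrupted matching segment of length L (3 \le L \le 15),
% assuming S_n = L - n + 1 overlapping n-mers:
\sum_{n=3}^{L} (L - n + 1) \;=\; \sum_{j=1}^{L-2} j \;=\; \frac{(L-2)(L-1)}{2}

% Examples:
%   L = 5  : (3)(4)/2   = 6
%   L = 7  : (5)(6)/2   = 15
%   L = 15 : (13)(14)/2 = 91
% Three separate 5-nt segments (15 matching nt in total) give 3 \times 6 = 18,
% whereas one contiguous 15-nt segment gives 91.
```

Under this reading the score grows roughly quadratically with segment length, which is consistent with the experimental observation that a few long matching stretches hybridize better than the same number of matching nt split into short stretches.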
The probability of a match in a degenerated nt position
is approximately predicted from the ConSort analyses of
target sequences. The behavior of the relatively few
oligonucleotides with degeneration that were tested is
approximately in line with NucZip reasoning, which places a
premium on long uninterrupted matching stretches.
However, a larger number of degenerated probes need
to be analyzed before the contribution of probabilities of
contiguous stretches extending through degenerated
positions can be estimated and included into NucZip. The
programming code is included in Supplementary Data.
The distribution of mismatches has a profound effect on
hybridization
ConSort was used to demonstrate the variation in the nt
sequence of segment 7 from 7333 genomes of Influenza A
(GenBank database at NCBI, NIH). All H and N
influenza A types, HxNy, are represented in the alignment
(Figure 1). In comparison with the InflA probe designed
to match a 70-nt region of the Influenza A H3N2 virus, the
H5N1 virus differed in 5 nt positions, and the H1N1 virus
in three other nt positions (Table 1). Thus, if a detection
probe could tolerate mismatches in nine positions,
including the variant nt positions of the H1N1 and the
H5N1 viruses, it would fully cover 67 different H and N
combinations of Influenza A, i.e. nearly all recorded
variants in the chosen region, as demonstrated in the
BLASTn search.
The InflA probe was tested against 70-nt target
molecules with 3, 5, 7, 9, 11, 12, 13, 14, 15, 16 and 21 point
mutations (pm) and 21 grouped mutations (gm)
(Figure 2A–D; nt sequences of probes and targets can be
found in Supplementary Tables S1A and B). The positions
of the pm were based on the variations found by
comparing H3N2, H5N1 and H1N1 viruses. Two targets
had the same number of mutations, but different
distributions: the 21-pm target had 21 evenly distributed pm, and
the 21-gm target had seven groups of three mutations
interspersed by 5 to 7 conserved nt. The InflA probe,
coupled to Luminex microspheres, was allowed to
hybridize with one of the biotinylated ssDNA targets in
3 M TMAC at two different temperatures: 45 °C
(Figure 2A) and 55 °C (Figure 2B). The hybridization,
given as MFI, was analyzed in the Luminex flow meter.
Introducing an increasing number of evenly distributed
pm in the target had a negative effect on hybridization, as
reflected in decreasing MFI; see Figure 2A (45 °C) and B
(55 °C). A few mismatches (3 pm and 5 pm) in the 70-mer
target did not affect hybridization with the probe to any
great extent. A target with 9 pm (containing one 15-nt, one
12-nt, one 7-nt and four 5-nt matching regions) still
hybridized at 68% (MFI 4496, 45 °C, Figure 2A) and
71% (MFI 4001, 55 °C, Figure 2B) of the total MFI (i.e.
of the MFI of the perfectly matching InflA target/InflA
probe: MFI 6604, 45 °C, Figure 2A; and MFI 5657, 55 °C,
Figure 2B) while the 15-pm target (containing one 6-nt,
four 5-nt, and two 4-nt matching regions) hybridized at
13% (MFI 854, 45 °C) and 1% (MFI 54, 55 °C) of the
total MFI. The InflA probe failed to hybridize with the
16-pm target (containing one 6-nt, three 5-nt, and two 4-nt
matching regions) at either temperature, providing MFI
values that were 6% (MFI 403, 45 °C) and 0.5% (MFI 29,
55 °C) of the total MFI.

Figure 1. Alignment of 7333 genomes of Influenza segment 7 (y-axis: Avg % conserved nt/position).

Table 1. Sequence 5′–3′ (70 nt). The grey nucleotides represent the differences between the sequences. The H5N1 virus subtype differed in 5 nt and the H1N1 virus in three additional
nt positions from the H3N2 virus. The sequence having 9 pm is present in 64 different subtypes (HxNy*). The results of analysis of 7333 genomes of
Influenza A using BLASTn (16) and ConSort (Blomberg,J., unpublished) are summarized. The H3N2 sequence was used as the InflA probe
sequence in this article.
The longest perfectly matching sequence between the
mismatches in the InflA probe/21-pm target combination
was 3 nt in length. No hybridization was detected at either
temperature. In contrast, for the InflA probe/21-gm target
combination, where the distribution of the 21 mutations
created seven stretches of 5–7 perfectly matching
nt between the mismatches, hybridization with the
InflA probe was restored (Figure 2A–D). The MFI
signal increased to 20 and 30% (Figure 2A and C, 45 °C)
and 7 and 8% (Figure 2B and C, 55 °C) of the total MFI,
when the InflA probe hybridized with the 21-gm target
instead of the 21-pm target.

Figure 2A–D. y-axis: MFI; targets shown include 11pm, 12apm, 12bpm, 13pm, InflA, 21pm, 21gm and NTC; panel D: probes.
In conclusion, a target with one to nine evenly
distributed mismatches, preserving multiple contiguous
matching stretches of at least 5 nt, reduces
hybridization with a 70-mer probe only slightly, while a
target with >14 evenly distributed pm hybridizes
inefficiently or not at all. Hybridization can be improved by
utilizing a less stringent (lower) hybridization temperature.
Furthermore, grouped mismatches allow stronger
hybridization than the same number of evenly distributed mismatches.
Consequently, for hybridization between 70-mer strands
with 10–20 mismatches, the distribution of the
mismatching nt affects the hybridization more than the number of
mismatches, indicating that the length and number of
perfectly matching stretches are of greatest importance.
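The difference between the two distributions can be made concrete by listing the lengths of the contiguous matching stretches. The sketch below uses hypothetical mismatch layouts that merely mimic the descriptions above (21 roughly evenly spaced pm, and seven groups of three mutations separated by 6 to 7 matching nt); the actual target sequences are in Supplementary Table S1, and the function name is an assumption.

```python
def matching_stretches(length, mismatch_positions):
    """Lengths of the contiguous matching stretches in a duplex of the given length."""
    bad = set(mismatch_positions)
    runs, current = [], 0
    for i in range(length):
        if i in bad:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


# Hypothetical layout 1: 21 point mutations spread evenly over 70 nt.
EVEN_PM = [3 + (63 * k) // 20 for k in range(21)]

# Hypothetical layout 2: seven groups of three adjacent mutations.
GROUPED_GM = []
position = 0
for _ in range(7):
    position += 6                      # six matching nt before each group
    GROUPED_GM += [position, position + 1, position + 2]
    position += 3

if __name__ == "__main__":
    print("even:   ", matching_stretches(70, EVEN_PM))
    print("grouped:", matching_stretches(70, GROUPED_GM))
```

Both layouts leave 49 matching nt, but the evenly spread layout never exceeds 3 contiguous matches, whereas the grouped layout keeps stretches of 6 to 7 nt.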
Introducing dInosines into the probes at sites of variation
restores hybridization with highly variable targets
A panel of five 70-mer probes, Ino3–21, containing 3, 5, 7,
9 or 21 dInosines, was designed based on the InflA probe
sequence. The dInosines were placed to match the
positions of the pm in the above pm targets (Figure 2D).
Introduction of 3–9 dInosines in the probe resulted in
only a small reduction in MFI when binding to the InflA
target; i.e. the MFI signal decreased in the order InflA
probe ≈ Ino3 > Ino5 > Ino7 > Ino9. The Ino9 probe
hybridized with the InflA target by as much as 76%
(MFI 4970, 45 °C) and 59% (MFI 3340, 55 °C) of the
total MFI, comparable with the hybridization of the
InflA probe to the 9-pm target. In fact, all the Ino3–9
probes hybridized as efficiently with targets that had up
to the same number of pm as the number of dInosines,
including two mismatches not covered by dInosines, as
they did with the InflA target. When the Ino probes
hybridized with targets containing more than 12 pm, the
probes with many dInosines worked better than the InflA
probe; e.g. the Ino7 and Ino9 probes resulted in a signal
1.2–1.9 times higher than that for the InflA probe for
targets with 13–16 pm at 45 °C.
Interestingly, even the Ino21 probe hybridized
quite strongly with the InflA target, at 26 and 39%
(Figure 2A and C, 45 °C) and 11 and 13% (Figure 2B
and C, 55 °C) of the total MFI. Importantly, the Ino21
probe was able to restore hybridization with the 21-pm
target (11 and 37% of total MFI) to which the InflA
probe had totally failed to bind (Figure 2A and C,
45 °C). The Ino21 probe hybridized with all the matching
3–21 pm targets with almost the same efficiency (mean
29.5% of the total MFI, 1946 MFI, at 45 °C, Figure 2A;
and 8% of the total MFI, 435 MFI, at 55 °C, Figure 2B),
which is in the same range as the hybridization of the
InflA probe with the 21-gm target (30.4% of the total
MFI, 2005 MFI, 45 °C, Figure 2A; and 7.8% of the total
MFI, 444 MFI, 55 °C, Figure 2B). The Ino21 probe and
21-gm target combination had 13 mismatches outside the
dInosine positions and failed to hybridize (Figure 2A, B
and C). In conclusion, the presence of dInosine in the
probe decreased hybridization with a perfectly matching
target, but a dInosine-containing probe bound more
strongly than a dInosine-free probe when the target had
many mismatches opposite the dInosine residues.
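The bookkeeping behind this comparison (which mismatches are covered by a dInosine and which are not) can be sketched as follows. The helper names are hypothetical, 'I' denotes dInosine, and for simplicity the target is written in the probe sense (i.e. as its reverse complement) so that a match is an identical letter at the same index. The sequences are toy examples, not the probes of this study.

```python
def with_dinosine(probe, positions):
    """Return a copy of the probe with dInosine ('I') at the given 0-based positions."""
    out = list(probe)
    for i in positions:
        out[i] = "I"
    return "".join(out)


def uncovered_mismatches(probe, target_probe_sense):
    """Positions where probe and target differ and the probe does not carry dInosine."""
    return [i for i, (p, t) in enumerate(zip(probe, target_probe_sense))
            if p != "I" and p != t]


if __name__ == "__main__":
    probe = "ACGTACGTAC"
    variant = "ACTTACGAAC"               # differs from the probe at indices 2 and 7
    ino_probe = with_dinosine(probe, [2])
    print(uncovered_mismatches(probe, variant))
    print(uncovered_mismatches(ino_probe, variant))
```

Applied to the combinations above, such a count is what distinguishes the Ino21/21-pm pair (all mismatches covered) from the Ino21/21-gm pair (13 mismatches outside the dInosine positions).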
A minimum length of perfectly matching nt sequences is
required for hybridization
The importance of uninterrupted matching regions was
analyzed by making targets with different lengths of
perfectly matching sequences at the 5′ end or at both the 5′
and 3′ ends, in combination with a long region of 26, 33 or
74% randomly distributed mutations (Figure 3A, B and
D, nt sequences in Supplementary Table S2A and B).
As expected, hybridization of the InflA probe to targets
with 26% random mutations in a region between two
flanking regions of 5 (26%5F), 7 (26%7F), 9 (26%9F)
or 15 (26%15F) perfectly matching nt showed that
shorter perfectly matching flanking regions reduced the
MFI. All the 26% targets hybridized with the InflA
probe at 45 °C but, at 55 °C, the 26%5F target failed to
hybridize (5.6% of the total MFI, 271 MFI). Like the
33%12F and the 16-pm target, the 26%5F target
contained 16 mutations (Figures 2A, B, 3A and B). The
33%12F target (45% of the total MFI, Figure 3A), with
its two regions of 12 uninterrupted nt, hybridized much
more strongly than the 26%5F (27% of the total MFI) or
16-pm (6.1% of the total MFI) targets, which both
contained several shorter matching regions, of 6, 5, 4 and 3 nt
(Table 2). The same effect, i.e. that a few long perfectly
matching regions result in better hybridization than
several shorter regions, was also seen with the three
targets that had 14 mismatches: the 26%9F, the
33%15F and the 14-pm targets at both temperatures
(Table 2). The 74%9F target, with only 12 matching nt
dispersed in the central region, did not hybridize to the
InflA probe (1.4% of the total MFI, 86 MFI, 45 °C; and
0.6% of the total MFI, 30 MFI, 55 °C), while the 26%9F
target, with 38 matching nt (three regions of 5 nt and one
region of 6 nt), did hybridize (44% of the total MFI, 2583
MFI, 45 °C; and 24% of the total MFI, 1125 MFI, 55 °C).
This shows that the two flanking regions of nine matching
nt did not cause hybridization alone. Furthermore, the
74%12F, with 11 matching nt dispersed in the central
region, hybridized at 45 °C (43.1% of the total MFI, 2548
MFI) but not at 55 °C (3.2% of the total MFI, 165 MFI),
while the 74%15F target, with nine matches dispersed in
the central region, hybridized at both temperatures
(77.7% of the total MFI, 4597 MFI, 45 °C, and 55% of
the total MFI, 2768 MFI, 55 °C). Thus, the perfectly
matching 5′ and 3′ flanking regions were long enough to
explain nearly all the hybridization in 74%12F and
74%15F. At 45 °C, even a single contiguous stretch of
15 nt (the 74%15nt target, 17.4% of the total MFI,
1029 MFI) was enough to give a hybridization signal
while, at 55 °C, a signal was obtained when the long
perfectly matching segment consisted of 18 nt (74%18nt
target, 31% of the total MFI, 2712 MFI). Thus, the
perfectly matching regions have to reach a minimum
length to overcome a high degree of
variation in the remaining sequence.

Figure 3. y-axis: MFI; targets shown include 12nt, 15nt, NTC and InflA.

Table 2, footnote a: % MFI (InflA probe against target/InflA probe against InflA target).
These experiments, taken in conjunction with the InflA
probe hybridizing to the 21-gm but not to the 21-pm
target, confirm that both the number of mismatches and
their distribution are important. It is reasonable to assume
that perfectly matching sequences of a minimum length
function as nucleation sites (15) which initiate
hybridization between the probe and the target.
Long non-matching ends destabilize duplex formation
Hybridization of the InflA probe with targets containing
74% mismatching 70-mers with one perfectly matching
end of various lengths (74%xnt) was compared with
hybridization of the InflA probe with short, perfectly
matching targets (12–22 nt_free; Figure 3A, B and D; nt
sequences in Supplementary Table S2B) to analyze the
effect of long mismatching ends. Utilizing the InflA
probe/InflA target as reference, the 12nt_free target
did not hybridize at 45 °C (1.8% of the total MFI, 106
MFI) but the 15-, 18- and 22-nt_free targets gave
successively stronger MFI signals (62, 98 and 123% of the total
MFI; 3669, 5770 and 7269 MFI, respectively) than the
74%15nt, 74%18nt and 74%22nt targets (17.4, 55 and
62% of the total MFI; 1029, 3222 and 3637 MFI,
respectively). Thus, the MFI is higher when the short, perfectly
matching targets (xnt_free) hybridize with the InflA probe
with only one long end protruding from the hybridized
portion of the probe. Previous reports have demonstrated
that 1–5 nt single dangling ends tend to stabilize duplex
formation (23–26). This study shows that two long
mismatching ends destabilize the hybridization of the
matching part of the duplex. The long free mismatching
ends could form intramolecular secondary structures that
could have an effect on the hybridizing duplex.
Alternatively, the Brownian movements of the two long
non-hybridized sections could mechanically stress the
remaining base pairs.
Analyses of probe hybridization with targets containing
regions of sm of various lengths
The tolerance of the probe against sm was tested using
targets (70 nt) with every third nt harboring an sm
(referring to the reading frame of the matrix 2 protein of
the H3N2 Influenza A) in regions of different length. The
33%9F, 33%12F and 33%15F targets have a region
containing 18, 16, or 14 sm between two flanking
regions of 9, 12, or 15 perfectly matching nt, respectively.
The 33%9nt, 33%12nt and 33%15nt targets have a
region of 20, 19 or 18 sm in combination with one
region of 9, 12 and 15 perfectly matching nt at the
5′ end (Figure 3D; nt sequences in Supplementary
Table S2A and B).
As demonstrated in Figure 3A and B (and confirmed in
Figure 3C), the targets 33%9nt and 33%9F failed to
hybridize with the InflA probe, while the 33%15F target
hybridized at both hybridization temperatures (60% of
the total MFI, 3517 MFI, at 45 °C; 39% of the total
MFI, 1934 MFI, at 55 °C). The 33%12F target with its
two 12 nt flanking regions only hybridized at 45 °C (45%
of the total MFI, 2635 MFI). The 33%15nt, with one
contiguous region of 15 perfectly matching nt, also hybridized
only at 45 °C (23% of the total MFI, 1352 MFI; MFI
result taken from Figure 3A and B). Thus, for a 70-mer
probe to hybridize with a target containing a relatively
long stretch of sm, it must have (i) one uninterrupted
perfectly matching region of at least 15 nt at 45 °C, and longer
than 15 nt at 55 °C, or (ii) two uninterrupted matching
regions of at least 12 nt at 45 °C or 15 nt at 55 °C.
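The thresholds summarized above can be checked mechanically for any aligned probe/target pair. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, both sequences are assumed to be written in the same orientation so that identical characters count as a match, and the cut-offs are simply the values reported in the preceding paragraph for 3 M TMAC.

```python
from typing import List


def matching_stretches(probe: str, target: str) -> List[int]:
    """Return the lengths of contiguous runs of identical positions."""
    if len(probe) != len(target):
        raise ValueError("probe and target must be aligned and of equal length")
    runs: List[int] = []
    current = 0
    for p, t in zip(probe.upper(), target.upper()):
        if p == t:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def meets_reported_thresholds(runs: List[int], temp_c: int) -> bool:
    """Apply the empirical rule of thumb stated in the text (assumption: no other factors)."""
    ranked = sorted(runs, reverse=True) + [0, 0]  # pad so indexing is always safe
    best, second = ranked[0], ranked[1]
    if temp_c == 45:
        # one stretch of at least 15 nt, or two stretches of at least 12 nt
        return best >= 15 or second >= 12
    if temp_c == 55:
        # one stretch longer than 15 nt, or two stretches of at least 15 nt
        return best > 15 or second >= 15
    raise ValueError("thresholds were only reported for 45 and 55 degrees C")
```

The rule ignores dInosines, wobbles and the position of the stretches, so it is only a first screen for targets of the kind tested here.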
dInosine-containing probes restore hybridization to
targets containing sm
The targets containing sm were further tested against a
set of dInosine-containing probes: one probe (the Ino18
probe) contained 18 dInosines matching the sm in
the 33%9F target, two probes (21Ino_9nt5′ and
21Ino_9nt3′) contained 21 dInosines positioned as sm
leaving a region of matching 9 nt at either the 5′ or
3′ end; and one probe (Ino24) contained 24 dInosines at
every third base throughout the whole 70-mer probe.
Importantly, when dInosines matching the positions of
the sm in the targets were included in the probes, the
Ino18 probe hybridized with all the sm-containing
targets, although with slightly varying MFI (Figure 3A,
B and confirmed in C). The 33%9F target, which did not
hybridize with the InflA probe, was able to hybridize with
the Ino18 probe (66% of the total MFI, 3894 MFI at
45 °C, Figure 3A; 55% of the total MFI, 2808 MFI
at 55 °C, Figure 3B). Even the 33%9–15nt targets, with
two mismatching nt at the 3′ end not covered by dInosines,
hybridized well with the Ino18 probe (48–51% of the total
MFI, 2818–2994 MFI at 45 °C, Figure 3A; and 16–32% of
the total MFI, 778–1463 MFI at 55 °C, Figure 3B). Thus,
as shown in the previous series of pm matched to
dInosine, when the sequence is interrupted too frequently
by mismatches, leaving no suitable regions of perfect
match, a probe with dInosines at the positions of variation
will restore the hybridization by effectively creating the
required longer matching region.
Introducing a dInosine in every third position
throughout the Ino24 probe decreased the hybridization
dramatically for all the 33% targets (6–16% of the total MFI,
361–920 MFI at 45 °C; 1% of the total MFI, 53–79 MFI at
55 °C). When utilizing the 21Ino_9nt5′ probe with the
33%xnt targets, the MFI decreased because of the
mismatches created by the two sm at the 3′ end of the
target (4–10% of the total MFI, 232–415 MFI at 45 °C).
In comparison, the 21Ino_9nt5′ probe and the 33%xF
targets, with no mismatch, hybridized at 26–43% of the
total MFI (1692–2898 MFI at 45 °C). Furthermore, the
21Ino_9nt3′, which did cover the two sm at the 3′ end of
33%xnt, hybridized more strongly (10–21% of the total
MFI, 450–695 MFI at 45 °C) than the 21Ino_9nt5′ probe
(4–10% of the total MFI, 232–415 MFI at 45 °C). Both
these results demonstrate that using the less stringent
temperature (45 °C) permits hybridization even when a
large number of dInosines is present (21 dInosines in a
70-mer probe), as long as a matching region of at least
9 nt is formed in the hybrid.
The hybridization of a long probe containing dInosines is
comparable with that of a long degenerated probe with the
same number of N wobbles, under lower stringency
conditions
The effects of probes with dInosine or wobbles in the same
positions were also investigated in 3M TMAC. The
presence of a dInosine in a specific position instead of a
wobble would theoretically decrease the degeneration of
the probe and subsequently increase the concentration of
the particular probe variant. A probe with 21 N wobbles,
wobbN_21, at the same positions as the dInosines in the
Ino21 probe, was tested. The surprising result was that the
probe containing N wobbles hybridized very well with the
InflA target (29% of the total MFI, 2220 MFI, Figure 2C)
and the 21-pm target (29% of the total MFI, 2190 MFI,
Figure 2C). This is in the same range as hybridization of
the Ino21 probe with the InflA (26% of the total MFI,
1976 MFI, Figure 2C) and 21-pm (11% of the total MFI,
794 MFI, Figure 2C) targets (39 and 37% of the total MFI
versus the InflA and the 21-pm targets, see Figure 2A).
These results also demonstrate that the wobbN_21 probe
is not affected to the same extent as the Ino21 probe by
increasing the hybridization temperature from 45 to 55 °C
(Figure 2C). The test was repeated by comparing an Ino18
with a wobbN_18 probe (Figure 3C and D). At 45 °C, the
Ino18 probe hybridized at least as well as the wobbN_18
probe while, at 55 °C, the wobbN_18 probe hybridized
better than the Ino18 probe. Interestingly, a probe
containing 24 wobbles still hybridized better with all 33%xnt
and 33%xF targets (30–45% of the total MFI at 45 °C;
13–22% of the total MFI at 55 °C; Figure 3C) compared with
Ino24 (6–16% of the total MFI at 45 °C; 1% of the total
MFI at 55 °C; Figure 3A and B). Evidently, the 70-mer
probes can accommodate multiple degenerate positions
and still hybridize because the majority of probe molecules
will contain several long perfectly matching stretches
created by chance. This is discussed further in the
Discussion section.
The hybridization of a long probe containing dInosines
is stronger than that of a long probe containing
the same amount of 5-Nitroindole, at either high
or low temperatures
5-Nitroindole is a second-generation universal base nt
analog that was chosen for comparison with the
first-generation dInosine with respect to hybridization
properties in 3M TMAC. According to Loakes and
Brown (1994), 5-nitroindole is less destabilizing than its
4- and 6-isomers (27) and than 3-nitropyrrole (9). A probe
with 18 5-nitroindole residues (5-NitroInd_18) was
designed; the nt analogs were distributed to match the
pattern of the dInosines in the Ino18 probe (Figure 3D).
The probes were allowed to hybridize with the InflA target
and the set of targets with sm, 33%_xF and 33%_xnt
(Figure 3C). At 45 °C, hybridization of 5-NitroInd_18
with the InflA (44% of the total MFI), 33%_xF
(23–37% of the total MFI), and 33%_xnt (4–7% of the
total MFI) targets resulted in hybridization signals that
were much lower than those seen with the Ino18/InflA
(73% of the total MFI), Ino18/33%_xF (54–78% of the
total MFI), and Ino18/33%_xnt (47–59% of the total
MFI) combinations (Figure 3C, 45 °C). Increasing the
temperature to 55 °C destabilized the 5-NitroInd_18 probe even
more, resulting in hybridization of only 1–8% of the
total MFI. In conclusion, dInosine functions much
better than 5-nitroindole as a universal nt analog, under
3 M TMAC buffer conditions.
Hybridization of a probe containing dInosine is
sensitive to mismatches neighboring the dInosine
position, aiding specificity
It has been shown above that when the dInosine and the
mismatch have the same distribution pattern, i.e. the
dInosine masks the mismatch, hybridization can be
restored (Figures 2A, B, 3A and B). We analyzed how
many mismatches outside the rescuing position of
dInosine (mismatch outside dInosine; mmoi) a probe can
tolerate. Norovirus, Ino18 and InflA probes were tested
against a set of targets whose sequences were designed to
range successively from a Norovirus sequence to the InflA
sequence, allowing different amounts of mismatch and
mmoi to be analyzed. Norovirus is a highly variable,
positive-sense RNA virus belonging to the Caliciviridae,
which causes winter vomiting disease. The Norovirus
sequence chosen (the capsid gene of the Norwalk-like
virus, accession nr AY274264), after Blastn with the
InflA probe sequence, has a short region of 8 nt that
perfectly matches the end region of the InflA probe and has
10 dispersed matching nt (Figure 4A). The Norovirus
target (70) 0_36_52 (0.8) (this code is explained below,
and in the legend to Figure 4) was gradually changed to
resemble the InflA target (0.8) 52_0_0 (70) by altering the
central nt sequence. A set of targets was also created
where the nine nt at the 5′ end were changed into an
InflA sequence and the central region was gradually
changed from the Norovirus to the Influenza sequence,
starting with (0.61) 9_27_42 (9.8). The targets were
named according to the number of matching and
mismatching nt in comparison with the three probes: (nt
matching those of the Norovirus probe at 5′ and 3′)
mismatching nt versus the Norovirus probe_Ino18
probe_InflA probe (nt matching those of the InflA probe
at 5′ and 3′) (see Figure 4A; nt sequences in
Supplementary Table S3A and B).
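Because the target names encode five quantities, they can be unpacked programmatically when tabulating results. The sketch below is illustrative only: the class and function names are hypothetical, the period inside each bracket is assumed to separate the 5′ count from the 3′ count, and the "masked" property (mismatches versus the InflA probe minus mismatches versus the Ino18 probe) is inferred from the way the codes are used in the text.

```python
import re
from typing import NamedTuple, Tuple

# (flank matches vs Noro) mmNoro_mmIno18_mmInflA (flank matches vs InflA)
_CODE = re.compile(
    r"\(\s*([\d.]+)\s*\)\s*(\d+)\s*_\s*(\d+)\s*_\s*(\d+)\s*\(\s*([\d.]+)\s*\)"
)


class TargetCode(NamedTuple):
    noro_flank_matches: Tuple[int, ...]   # (5', 3') or a single full-length value
    mm_vs_noro: int
    mm_vs_ino18: int
    mm_vs_infla: int
    infla_flank_matches: Tuple[int, ...]

    @property
    def masked_by_dinosine(self) -> int:
        """Mismatches versus InflA that coincide with dInosine positions (inferred)."""
        return self.mm_vs_infla - self.mm_vs_ino18


def _flank(text: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def parse_target_code(code: str) -> TargetCode:
    """Parse a name such as '(10.10) 26_10_26 (0.8)'."""
    match = _CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized target code: {code!r}")
    left, noro, ino18, infla, right = match.groups()
    return TargetCode(_flank(left), int(noro), int(ino18), int(infla), _flank(right))
```

For example, parsing "(0.11) 26_10_26 (9.8)" gives 26 mismatches versus the InflA probe, 10 of which remain versus the Ino18 probe, so 16 are masked by dInosine.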
Two targets had 26 mismatches, with different
dispersion patterns, after hybridization with the Norovirus or
InflA probes; the distributions are shown in the
(10.10) 26_10_26 (0.8) and (0.11) 26_10_26 (9.8) targets in
the upper and lower panels of Figure 4A. The InflA probe
did not hybridize with either of them, while the Noro
probe hybridized weakly with both: 13% of the total
MFI for the (0.11) 26_10_26 (9.8) target and 20% of the
total MFI for the (10.10) 26_10_26 (0.8) target. The %
MFI for the Noro probe was calculated by comparing
the MFI with that of the Noro probe/Noro target
hybridization. The stronger hybridization to the Noro probe
than to the InflA probe can be explained by the
distribution and length of the matching sequences between the 26
mismatches; the perfectly matching regions of 8 and 9 nt
were not long enough to induce hybridization to the InflA
probe, while one region of 11 nt (together with a 5-nt and
a 6-nt region) or two longer flanking regions of 10 nt in the
two targets was enough to induce hybridization with the
Noro probe.
When the Ino18 probe (middle panel of Figure 4A) was
used with a target with 1 mmoi [i.e. (0.10) 35_1_17 (9.8)] or
even 10 mmoi [(10.10) 26_10_26 (0.8)] located at the 5′ and
3′ ends of the target, outside the central region containing
dInosines, there was no or little inhibition of hybridization
at 45 °C (71% of the total MFI, 4741 MFI, and 41% of the
total MFI, 2773 MFI, respectively). Interestingly, when
the 10 mmoi were evenly distributed within the region of
18 dInosines, as in the (0.11) 26_10_26 (9.8) target,
hybridization was lost (0.9% of the total MFI, 59 MFI). The
sensitivity to mmoi adjacent to dInosine was therefore
investigated further. Targets with increasing numbers (2,
4 or 5) of mmoi neighboring the positions of dInosines
successively reduced the hybridization signals: 2319 MFI
(35% of the total) for the (10.10) 24_12_28 (0.8) target,
927 MFI (13%) for the (10.10) 22_14_30 (0.8) target, and
375 MFI (5.4%) for the (10.10) 21_15_31 (0.8) target, all
at 45 °C. Thus, dInosine is sensitive to neighboring
mismatches. A comparison of the sensitivity of a
dInosine-free probe (InflA) and a probe containing
dInosine (Ino18) to neighboring mismatches showed that
the dInosine-free InflA probe can hybridize to a target
with 17 evenly distributed mismatches between two
perfectly matching flanking sections of 9 and 8 nt, respectively
[the (0.10) 35_1_17 (9.8) target, 24% of the total MFI,
1614 MFI at 45 °C]. However, 7 mmoi adjacent to the
dInosines completely destroyed hybridization between
the Ino18 probe and the (10.10) 19_17_33 (0.8) target:
0.5% of the total MFI, 34 MFI, at 45 °C. Results
for 55 °C are shown in Supplementary Figure S4B
and Table 3. The Ino18 probe failed to hybridize
when 2 mmoi were placed next to the dInosines
[(10.10) 24_12_28 (0.8); 3.1% of the total MFI, 192 MFI at
55 °C]. Thus, the hybridization capacity of a
dInosine-containing sequence is severely reduced when
the mismatch is adjacent to the dInosine.
[Figure 4A. Target mismatch versus the Noro probe, versus the Ino18 probe and versus the InflA probe (one panel per probe). Targets are coded as (matching nt at 5′ and 3′ of Noro) mm versus Noro probe_mm versus Ino18 probe_mm versus InflA probe (matching nt at 5′ and 3′ of InflA). Target series in each panel: (70) 0_36_52 (0.8); (14.11) 7_29_45 (0.8); (11.11) 17_19_35 (0.8); (10.10) 19_17_33 (0.8); (10.10) 21_15_31 (0.8); (10.10) 22_14_30 (0.8); (10.10) 24_12_28 (0.8); (10.10) 26_10_26 (0.8); (0.61) 9_27_42 (9.8); (0.11) 16_20_36 (9.8); (0.11) 26_10_26 (9.8); (0.10) 35_1_17 (9.8); (0.8) 52_0_0 (70). Figure 4B. Number of nt of InflA or Noro origin in the respective target, using the Ino18 probe.]
Figure 4A also shows how introduction of many
dInosines affects the specificity. There was no
hybridization between the influenza probe Ino18 with its 18
dInosines and the Norovirus target (70) 0_36_52 (0.8).
Furthermore, although the dInosines mask 16 mismatches
in the (0.11) 26_10_26 (9.8) target, the 10 mmoi that are in
close proximity to the dInosines abolish hybridization
(0.9% of total MFI, 59 MFI). In contrast, the other
target containing 26 mismatching nt and 10 distant
mmoi, (10.10) 26_10_26 (0.8), did hybridize to the Ino18
probe (41% of total MFI, 2773 MFI). Thus, the cross
hybridization of a foreign (unrelated) dInosine-containing
probe is dependent to a certain extent on the amount of
mismatch, but is even more dependent on the distribution
of mmoi.
Figure 4B and Supplementary Table S6 show the origin
of the nt that are not covered by the dInosines when using
the Ino18 probe (MFI values from Figure 4A). They
demonstrate the number of InflA-matching nt outside the
dInosine position (moi) needed for hybridization and the
number of Norovirus moi causing cross hybridization. At
least 37–38 InflA moi were needed to induce hybridization
with the Ino18 probe but, as mentioned above, the
distribution is at least as important as the actual number
of matching and mismatching nt. Of the fewer than
30–31 nt that were of Norovirus origin in a target that
hybridized with the Ino18 probe, 16 nt were common to
both Norovirus and Influenza virus. If more than 31 nt
were of Norovirus origin, hybridization to the Influenza
Ino18 probe failed.
Detection of rotavirus from clinical samples using
dInosine- and degeneration-containing probes
The region chosen for the 77-nt rotavirus probe, positions
112–178 in the alignment, was analyzed using the
haplotype function of ConSort, which decomposes highly
variable stretches into a small number of less variable
stretches (haplotypes). This resulted in two major
haplotypes and probes which potentially covered all rotavirus
group A variations recorded in GenBank. The two
haplotype probes contained 14 and 8 dInosines, respectively, in
combination with four degenerations. They were called
Ino14_w4 and Ino8_w4 (Figure 5C, Inosine as yellow
and wobbles as light grey boxes). Two additional probes
with fewer dInosines and more degenerations were also
created: Ino11_w7 and Ino5_w7 (Figure 5C). The
consensus sequence and the pattern of variation of the region
chosen for the probe are shown in Figure 5A. The
sequences of the four probes are shown in Supplementary
Table S4A and B and (schematically) in Figure 5C.
The degenerated LNA-containing fw-primer and the
long degenerated dInosine-containing biotinylated
rev-primer generated a single band of the correct size,
502 nt, when analyzed by electrophoresis using
EtBr-stained agarose gel in all five clinical samples (data
not shown). All four microsphere-bound probes detected
the consensus synthetic rotavirus target as well as the
amplified rotavirus nucleic acid from all five clinical
samples. Interestingly, the four probes hybridized almost
equally well within each sample. It was found that an
asymmetric PCR of the clinical samples was necessary in
order to obtain an MFI of reasonable strength from the
probes (data not shown). This was probably because the
complementary strand of the PCR product outcompeted
the probe due to an affinity between the two strands that
was higher than that between a dInosine-containing
degenerated probe and the target strand. The data in
Figure 5B are from one of the experiments using
samples run in duplicate. The PCR products were
sequenced (Supplementary Table S4), revealing that the
Ino14_w4 and Ino11_w7 probes, which belonged to the
same haplotype, covered the variations in all positions in
all samples. However, the other pair of probes (Ino8_w4
and Ino5_w7) had one mismatch against clinical samples 1
and 2 and two mismatches against clinical samples 3,
4 and 5, as well as the consensus synthetic rotavirus
target (magenta colored boxes in targets in Figure 5C,
Supplementary Table S4). In conclusion, the long
dInosine-containing degenerated probes worked well as
variation-tolerant probes, covering variations, accepting
a few mismatches, and still remaining specific (none of
the rotavirus probes hybridized with the InflA target).
Development of scores for local and distant cooperative
interactions
Once we had these experimental data, we tried to develop
a unifying view of them. The ΔG predicted by the Visual
OMP™ 7.0 software was compared with the percentage of
the total MFI for each probe and target combination,
including dInosine-containing probes (Figure 2A, 3A
and 4A). For the sake of simplicity, data from
5-nitroindole and N-wobble-containing probes were
omitted. The results demonstrated that some probe/
target combinations that hybridized well in practice
had very low predicted ΔG values (Figure 6B), e.g. the
InflA probe hybridizing to the 74%12F (43% of the
total MFI, ΔG = 20.62), 74%15F (77% of the total
MFI, ΔG = 26.7), 74%18nt (55% of the total MFI,
ΔG = 29.24), and 74%22nt (61% of the total MFI,
ΔG = 34.55) targets.
Table 3. The number of mmoi affected the outcome of hybridization between the Ino18 probe and the respective target at 45 °C and 55 °C
Target                     mmoi within 18-sm region   %MFI(a), 45 °C   %MFI(a), 55 °C
(70) 0_36_52 (0.8)         26                         0.4              0.7
(14.11) 7_29_45 (0.8)      19                         0.5              0.5
(11.11) 17_19_35 (0.8)     9                          0.4              0.6
(10.10) 19_17_33 (0.8)     7                          0.5              0.4
(10.10) 21_15_31 (0.8)     5                          5.4              0.5
(10.10) 22_14_30 (0.8)     4                          13               0.6
(10.10) 24_12_28 (0.8)     2                          35               3.1
(10.10) 26_10_26 (0.8)     0                          41               18.2
(0.61) 9_27_42 (9.8)       26                         0.4              0.6
(0.11) 16_20_36 (9.8)      19                         0.6              0.6
(0.11) 26_10_26 (9.8)      9                          0.9              0.5
(0.10) 35_1_17 (9.8)       0                          71               40
(0.8) 52_0_0 (70)          0                          86               59
(a) %MFI = MFI of Ino18 probe and respective target / MFI of InflA probe and InflA target.
When the scores from the new NucZip scoring system
were plotted against the % MFI in Figure 6A, which
shows all the target and probe combinations plotted in
Figure 6B, it was found that they were more highly
correlated with the experimental data than the predicted
ΔG. To investigate these differences, each outlier in
Figure 6B was connected to its plot position in
Figure 6A; see Figure 7A and B. Figure 7 shows that
probe–target combinations containing many mismatches
and dInosines were the main causes of the lower
correlation between predicted and observed hybridization in
Figure 6B. However, hybridizations between a long
probe and a short target were not included in Figures 6
and 7. Nor were data from probes containing
5-nitroindole or N-wobbles, because a full investigation
such as this would require many more observations and
would be out of the scope of this article. The NucZip
results are further discussed in the Discussion section.
Thus, when the actual degree of hybridization for the
entire data set (265 probe–target combinations) was
matched with the predicted ΔG in Visual OMP™ 7.0,
an only moderately precise correlation was obtained.
The hybridization of combinations involving many
mismatches and many dInosines was poorly predicted.
However, when the NucZip algorithm was used, a
higher degree of correlation was observed. The adjusted
determination coefficient (Ra²) was 0.8636, indicating that
86% of the variation was explained by the NucZip
algorithm, while the best fit of MFI% to the Visual OMP™
7.0 predictions gave a determination coefficient of 0.7505,
indicating that 75% of the variation was explained by NN
theory (as embodied in Visual OMP™ 7.0). NN theory
was thus insufficient for predicting hybridization under
the hybridization conditions of our study. A high
number of mismatches and dInosines gave hybridization
predictions in Visual OMP™ 7.0 that were too low
(Figure 6B). The NucZip algorithm, which takes into
account the length of matching segments and
cooperativity effects within and between matching
oligonucleotide segments, increased the accuracy of
hybridization prediction. dInosines were scored intermediate
between matches and mismatches.
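The qualitative idea, that a few long uninterrupted segments outweigh the same number of matches split into short segments and that dInosine counts as intermediate, can be illustrated with a toy score. The sketch below is emphatically not the NucZip algorithm: the function name, the weight of 0.5 for dInosine and the squared segment score are arbitrary assumptions chosen only to show the behavior described in the text.

```python
def toy_segment_score(probe: str, target: str,
                      inosine_weight: float = 0.5,
                      exponent: float = 2.0) -> float:
    """Toy cooperative score (hypothetical, NOT NucZip).

    Matches add 1 to the current segment, a dInosine ('I' in the probe)
    adds inosine_weight and keeps the segment open, a mismatch closes it.
    Each closed segment contributes (segment weight) ** exponent, so long
    segments are rewarded more than proportionally.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must be aligned and of equal length")
    total = 0.0
    segment = 0.0
    for p, t in zip(probe.upper(), target.upper()):
        if p == "I":
            segment += inosine_weight      # intermediate between match and mismatch
        elif p == t:
            segment += 1.0
        else:
            total += segment ** exponent   # mismatch ends the segment
            segment = 0.0
    return total + segment ** exponent
```

With an 18-nt example, a perfect match scores 324, the same target with two mismatches scores 86, and placing dInosines at the two mismatch positions raises the score to 289, which mirrors the ordering observed experimentally without reproducing any fitted values.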
Other hybridization prediction algorithms are available
on the Internet. However, when we compared the ΔG
predictions obtained from IDT Oligo Analyzer
(http://eu.idtdna.com/analyzer/Applications/OligoAnalyzer/),
which uses a proprietary algorithm, with our experimental
data, the correlation was poor (Supplementary Figure 2).
Although the exact experimental conditions (3 M TMAC
and 45 °C) were not represented, this is not likely to have
caused the low correlation.
[Figure 5. Labels recovered from the figure: 214 aligned rotaviruses; (C) rotavirus haplotype-based probes, 5′ to 3′; target mismatch in clinical samples 1 and 2 against probe; synthetic InflA and Rota targets.]
Nucleic acid hybridization is fundamental to many
molecular biology applications, and is expected to grow
in significance as nanomedicine joins molecular medicine
at the cutting edge of research (28). In particular,
biomedical applications of hybridization such as detection of
variable viral target sequences are highly dependent on a
precise understanding of the process involved. A probe
that has a broad detection spectrum should be as
specific as current narrower probes while retaining the
ability to cover the biologically or clinically relevant
sequence variants of specific microbes. The design of
long mismatch-tolerant probes demands knowledge
about hybridization in the presence of mismatches,
degeneracy and nt analogs.
In pursuit of this level of understanding, and in order
to obtain reliable hybridization data, we chose to use the
Luminex suspension array system in our studies. The
inherent ability of the system to report the median of a
high number of measurements (i.e. measuring hybridization
signals from 100 different beads) provides highly
reliable data. Moreover, hybridization equilibrium is
reached more rapidly using the suspension array system
(taking around 15 min) than by solid phase hybridizations
such as microarrays (often overnight). A probe length of
70 nt was selected for our studies because of the elevated
mismatch tolerance of this length compared with shorter
probes (29). However, the advantage of the extended
length of the probe could possibly be countered by a
loss of specificity. Hybridization studies using long (50
or more nt) probes in a 3 M TMAC buffer system have
not been reported previously; in reports using other
hybridization systems, however, it has been suggested that
50-mer or 70-mer probes should contain no more than 15–20
contiguous nt complementary to non-targets (29,30).
[Figure 6A (y-axis: % MFI), regression function: $f = y_0$ for $x \le x_0 - b(\ln 2)^{1/c}$ and $f = y_0 + a\{1 - \exp[-(|x - x_0 + b(\ln 2)^{1/c}|/b)^{c}]\}$ otherwise, where a = 274.1515, b = 1288.3093, c = 0.4614, x0 = 639.0365 and y0 = 4.7970; R = 0.9306 and Ra² = 0.8636.]
[Figure 6B. Predicted ΔG versus observed hybridization (% MFI). The percentage MFI from all hybridizations (at 45 °C) of the 70-mer probes and 70-mer targets from the three panels presented in Figure 2A, 3A and 4A were plotted against the predicted ΔG calculated in Visual OMP™ 7.0 (DNA Software). The predicted ΔG was obtained for the interaction between probes and targets in 3 M TMAC buffer at a hybridization temperature of 45 °C. The percentage MFI is the MFI signal of a probe hybridized with a study target divided by the MFI signal of the same probe hybridized with its perfectly matching target (e.g. InflA probe against InflA target) at the same temperature. Regression lines were calculated using the SigmaPlot dynamic curve fitting system. A five parameter sigmoidal function gave the highest correlation, $f = y_0 + a/[1 + \exp(-(x - x_0)/b)]^{c}$, where a = 122.5959, b = 9.5493, c = 0.2793, x0 = 78.0478, y0 = 14.5630; R = 0.8689, and Ra² = 0.7505.]
Previously, the hybridization properties of long probes
have been analyzed using microarrays, with overnight
hybridization in buffers containing formamide/SSC
(sodium chloride/sodium citrate) at 42–53 °C (30–34).
Letowski et al. found that, under microarray conditions,
mismatches grouped at the 5′ or 3′ end of a 50-mer probe
affected the binding to a target less than if the mismatches
were distributed throughout or centered in the probe.
Furthermore, the 50-mer probes with mismatches
distributed along the whole probe were more destabilized
than the probes that had mismatches centered in the
duplex (34). When Deng et al. studied mismatches in
50-mer microarray probes with 17 pm in different
distribution patterns, they concluded that the signal intensity
was decreased more by evenly distributed than by
randomly distributed pm (32). When 60-mer
oligonucleotides were hybridized in a microarray,
mismatches located near the middle of the probe
resulted in a greater reduction of signal intensity than
those located at the ends (33). Additionally, microarray
experiments with short oligonucleotides (16–40 nt) and
one mismatch or a nt insertion in all positions show that
the hybridization signal decreases when the mismatching
portion is centered (35,36). The matching segments are
shorter when mismatches are centered than when they
are located peripherally. Our results, using
microsphere-bound 70-mer probes and Luminex
technology in a buffer containing 3 M TMAC, confirm that the
distribution of the mismatches is of great importance and
that hybridization is stronger when there are a few longer
uninterrupted sequences than when there are many short
sequences. These effects are formulated in the NucZip
algorithm.
Furthermore, it has previously been shown that two
different oligonucleotides of 18 nt, complementary to the
inner and outer portions of a 25 nt probe, could hybridize
in solution with equal efficiency but, when the probe was
coupled to a solid phase via a C6 linker, the 18-nt target
complementary to the outer part of the 25-nt probe bound
more efficiently than the 18-nt target complementary to
the inner part, close to the solid phase (37–39), cf. (33). On
our probes, the 5′ end was coupled to the microspheres via
an amino C12 linker. We did not observe any significant
differences between matching stretches close to the bead
surface and far from it and conjecture that perhaps the
long linker allowed for greater accessibility.
It is possible to create a probe against a target with high
nt variation, such as an RNA virus, by using degenerated
bases at variable positions but the degeneracy of the
probe is dramatically increased by each wobbling base,
thus decreasing the effective probe concentration. For
instance, a sequence with a two-base wobble at each of
nine positions of variation would give a degeneracy of
512 (2^9) unique sequence combinations, while a target with 14
variations including A, T, C or G would demand a set of
268 × 10^6 (4^14) unique probes (degeneracy 268 × 10^6). Honore
et al. successfully used 18–23 nt probes with a degeneracy
of up to 512 in 3 M TMAC buffer (40). In the more
stringent PCR buffers, the usage of probes with a degeneracy
greater than 10 is not often reported (41).
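The degeneracy figures quoted above follow from multiplying the number of alternatives at each position. A minimal sketch using the standard IUPAC nucleotide codes is given below; the function name is hypothetical, and dInosine (written here as "I") is assumed to count as a single species because it does not multiply the number of oligonucleotides synthesized.

```python
from math import prod

# Number of alternative bases represented by each IUPAC code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
    "I": 1,  # assumption: dInosine is one chemical species, not a mixture
}


def degeneracy(probe: str) -> int:
    """Number of distinct oligonucleotides in a degenerate probe pool."""
    return prod(IUPAC_FOLD[base] for base in probe.upper())


def fraction_per_variant(probe: str) -> float:
    """Share of the pool made up by any one sequence variant (equimolar synthesis assumed)."""
    return 1.0 / degeneracy(probe)
```

Nine two-fold positions give 512 variants and fourteen N positions give 268,435,456, so each variant of the latter pool makes up less than four billionths of the probe concentration.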
Degenerated primers have the property of being
forgiving (41–43). This is because the amplimer from a
previously successful primer is a target for the same pool
of primers in the next round of amplification, leading to
an accumulation of amplifiable targets. However, the
situation for a probe is different. A degenerate probe will
always face the same target variation. Therefore, universal
bases like dInosine may be more useful than degenerated
sites for probes, as long as the hybridization strength
(represented by Tm or ΔG) is good enough. The introduction
into a probe of a universal base like dInosine instead of a
wobbling base reduces the complexity of the
oligonucleotide mixture and increases the actual number of
hybridizing oligonucleotides.
Previously, Honore et al. introduced up to three
dInosines in radioactively labelled short oligonucleotide
probes, 18–23 nt, in dot-blot hybridization, using a
buffer containing 3 M TMAC (40). The aim was to
reduce the degeneracy in probes used for screening
cDNA libraries. They found that dInosines had a
slightly destabilizing effect on hybridization, especially
when hybridizing against A, G and T, but that this
could be minimized by reducing the hybridization
temperature. However, the behavior of dInosine in long
probes in 3 M TMAC has, to our knowledge, not
previously been systematically explored. The 3 M TMAC is
known to increase the binding contribution of the A:T
base pairs, resulting in a similar contribution to Tm to
that from the G:C base pairs. The high ionic strength
makes this an environment of relatively low hybridization
stringency. The general trend shown in our study (e.g.
Figure 2A and B), that dInosine in the probe decreases
hybridization in 3 M TMAC, indicates that TMAC did
not enhance the binding strength of dInosine base pairs
as much as that of A:T base pairs; cf (8). In a segment with
dInosines at every third base, such as in the Ino18 probe,
every matching nt neighbors a dInosine, i.e. it will not
bind as strongly as a probe containing neighboring
matches. Clearly, dInosine matches cause less
destabilization than mismatches, and allow hybridization of probe/
target combinations with many short matching segments,
like the InflA probe / 21-pm target combination. It is
reasonable to assume that when a probe fails to hybridize
due to a high number of mismatches in the target,
dInosines at these positions will restore hybridization,
since dInosine appears to bridge adjacent matching
stretches, increasing their ability to nucleate.
Our results confirm the findings of Honore et al., despite
differences in (i) methods of detection, (ii) hybridization
time, and (iii) length of probes. The experience gathered
in this work indicates that dInosine base pairing can be
considered intermediate between a match and a mismatch,
when carried out in 3 M TMAC. Furthermore, our results
indicate that dInosine causes less destabilization when
hybridized with a C and an A, than when hybridized
with a G and a T (7,8,40), in 3 M TMAC; e.g. the Ino18
probe hybridized more strongly with the 33%9nt
and 33%15nt targets than with the 33%12nt target
(Figure 3C and D and Supplementary Table S5). To
lessen this effect and to be able to use the same
hybridization temperature for a panel of probes containing no or
different amounts of dInosine, it is preferable for the
probes to be long, like the 70-mer probes investigated
here.
The effects of the universal base dInosine were also
compared with those of N wobbles. Thus, at the lower
temperature, a dInosine-containing probe hybridized
more strongly and, at the higher temperature, the N
wobble probe hybridized more strongly. To understand
how the highly degenerated probes, wobbN_21,
wobbN_18 and wobbN_24, hybridized so well, we
calculated the probability of randomly achieving an
extension of the matching regions at the 5′ and 3′ ends of the
wobbN_18 probe (Table 4). The probability that the
closest N wobble to either the 5′ or 3′ end would be a
perfect match is 0.5. Thus, 50% of the pool of degenerated
probes have a 3 nt longer perfect match (12+9 or 9+12
matching nt at the 5′ and 3′ flanking regions) which,
according to our results with non-degenerated probe/target
combinations, should lead to rather good hybridization of
the wobbN_18. In fact, wobbN_18 hybridization was
similar in strength to that of the InflA probe to the
33%12F or 33%15nt targets. Furthermore, the
probability of having several additional 5-nt matching regions in
the central region is also high, probably giving rise to
many more combinations in the same pool that matched
even better. By restricting the wobbles to 3 (e.g. a D or a
B) or 2 (e.g. a Y or a T) nt, the probability of a match
becomes even greater. Thus, in a highly degenerated probe
with at least one continuous region of perfectly matching
nt, a large part of the pool will extend this region and
contribute to nucleation, zipping and hybridization. The
behavior of the highly degenerated probes is encouraging
and in accordance with the NucZip model, which predicts
that the high likelihood of several matching stretches of
5 nt or longer will result in significant hybridization.

Table 4. Probability of increasing the length of the matching 5′ and 3′ flanking regions of the wobbN_18 probe, calculated for a few examples
Central region of the probe: (Nxx)12 (a)
Probability of additional matching nt in the flanking regions:
0.25 × 2 = 0.5
(0.25^2) × 2 = 0.125
(0.25^3) × 2 = 0.03125
0.25^2 = 0.0625
The sequences of the wobbN_18 probe are displayed with x as the nt matching the target sequence and N as the wobbling nt A, C, G or T.
(a) There is a high probability of several additional matching regions of 5 nt in the central region, which will contribute to hybridization.
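The arithmetic behind these pool fractions can be sketched in a few lines of Python. This is only an illustration, assuming that each degenerate position is an equimolar mixture, so that a pool member matches the target at that position with probability 1/degeneracy. The function name and its parameters are hypothetical and are not part of the study; the factor for the two ends follows the simple additive estimate used in the text.

```python
def flank_extension_fraction(n_wobbles, degeneracy=4, n_ends=2):
    """Fraction of a degenerate probe pool in which `n_wobbles` consecutive
    wobble positions adjacent to a matching flank all match the target.

    Assumption: equimolar incorporation, so a single wobble position
    matches with probability 1/degeneracy. Multiplying by `n_ends`
    reproduces the additive estimate in the text (either the 5' or the
    3' flank is extended).
    """
    per_position = 1.0 / degeneracy
    return n_ends * per_position ** n_wobbles


# Values corresponding to the examples calculated for wobbN_18 (N = 4-fold wobble)
print(flank_extension_fraction(1))            # 0.5
print(flank_extension_fraction(2))            # 0.125
print(flank_extension_fraction(3))            # 0.03125
print(flank_extension_fraction(2, n_ends=1))  # 0.0625

# Per-position match probability for 4-, 3- and 2-fold wobbles
for fold in (4, 3, 2):
    print(fold, flank_extension_fraction(1, degeneracy=fold, n_ends=1))
```

The last loop shows why restricting a wobble to three or two nucleotides raises the chance of a match at that position, from 1/4 to 1/3 or 1/2.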
One of the aims of this study was to investigate the
binding capacity of dInosine in 3 M TMAC.
5-NitroIndole was chosen as a comparative universal
base. The results shown in Figure 3C demonstrate that
5-NitroIndole in the 5-NitroInd_18 probe had a much
greater destabilizing effect than dInosine in the Ino_18
probe, without the same capacity to rescue hybridization
with a target containing many sm. Furthermore, the
5-NitroInd_18 probe was more affected than the Ino_18
probe when the hybridization temperature was raised
from 45 to 55 °C. It is concluded that, under 3 M TMAC
buffer conditions, dInosine is a better choice than
5-NitroIndole when designing a variation-tolerant probe
with as little degeneration as possible.
Recently, Majlessi et al. (15) studied the nucleation
process during double helix formation of short probes,
18–28 nt, with RNA or DNA targets of varying lengths
and number of mismatches, in a buffer containing lithium
succinate and lithium lauryl sulfate at pH 5.1.
Hybridization is initiated by random collisions, but
occasionally the complex is stable enough to nucleate the
hybridization process. After investigating their model, they
suggested that one nucleation region of 9 nt is not enough
for further zipping and formation of a double helix.
Instead, the first nucleation site needs a second nucleation
site so that they can then cooperatively induce the zipping
mechanism. They reported that inactivation of one of the
9-nt sites reduced hybridization >2-fold. Interestingly, one
complete turn of a dsDNA molecule consists of 10.4 nt
(44). It is conceivable that the first (often temporary)
contact between two single nucleic acid strands
(nucleation) should not exceed one turn of the dsDNA helix in
length, to avoid torsional disturbance and faulty
interlocking of the strands. From this point of view, the nucleation
site should be long enough to minimize false contacts, and
short enough to have minimal steric effects on the strands.
Nucleation sites of 6–9 nt fulfil these criteria. The chance
of two random single strands matching at a
hexanucleotide is 1/16 394, and at a nonanucleotide is 1/
1 048 576. A matching nonanucleotide will thus reduce the
ratio of random successful to unsuccessful nucleations, i.e.
those which do not lead to further hybridization in the
subsequent zipping phase, a million-fold. The subject is
far beyond the scope of this article; however, our data,
using 70-mer probes in 3 M TMAC buffer, reveal that a
target with two separate regions of 9 nt was enough for
efficient hybridization when several shorter regions of
5–6 nt were available between mismatches during the
hybridization process (InflA probe/26%9F, 2583 MFI).
Increasing the number of mismatches, i.e. shortening the
matching regions between the 9 nt flanking regions, caused
failure of hybridization (InflA probe/33%9F, 133 MFI
and InflA probe/74%9F target, 86 MFI). In contrast,
the Ino18 probe, with dInosines covering the mismatches
and nine matching nt at the 5′ and 3′ ends, hybridized
strongly (Ino18/33%9F, 3894 MFI). Furthermore, the
InflA probe/74%15nt or InflA probe/74%18nt
combinations showed that one region of 15–18 nt was enough
to induce and sustain hybridization reproducibly, even if
the rest of the 70-mer probe contained 74% mismatches.
Having two perfectly matching regions of 15 nt (15F)
compared to one region of 15 nt (15 nt) gave 2.6-fold
higher MFI for 33%15F compared to 33%15 nt and
4.5-fold higher MFI for 74%15F compared to 74%15 nt,
at 45 °C. The Ino24 probe, with no dInosine-free matching
trimers, hybridized inefficiently with all targets (361–920
MFI at 45 °C), showing that dInosine is relatively
inefficient in creating nucleating regions. Thus, two nucleation
sites of 9 nt are enough to cause hybridization in 3 M
TMAC when they are placed next to each other to form
a longer region or when there are enough shorter
matching regions of 5–6 nt between them. Alternatively,
dInosines could bridge the mismatches between the 9-nt
regions.
Earlier work has indicated that a region of 15 nt in a
50-mer probe or 20 nt in a 70-mer probe could cause
significant cross hybridization in microarray hybridization
experiments (30,31) and our data agree with this. A
70-mer probe is able to hybridize with a region of 15 nt
in an otherwise highly mismatching target in 3 M TMAC
(74%15 nt, Figure 3A). To summarize, the current study
shows that a dInosine-free probe of 70 nt needs (i) at least
three regions of at least six perfectly matching nt, (ii) two
stretches of 12–15 perfectly matching nt, or (iii) one
stretch of 15–18 perfectly matching nt to result in
measurable hybridization. Probes with a high number of
dInosines positioned at sites of variation need shorter
matching regions than dInosine-free probes, probably
because dInosine participates
in nucleation and zipping during the hybridization
process. Thus, a probe with 18 dInosines that cover
mismatches in the target needs either (i) two regions of
9 nt if the hybridization temperature is 55 °C, or (ii) one
region of nine perfectly matching nt at 45 °C.
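The three summary criteria for dInosine-free 70-mer probes can be written as a simple check. The sketch below is a literal reading of those thresholds, not the NucZip algorithm; it assumes that the lower bound of each stated range (6, 12 and 15 nt) is the cut-off, and the function name is hypothetical.

```python
def meets_reported_minimum(run_lengths):
    """Check the three summary criteria for a dInosine-free 70-mer probe.

    `run_lengths` holds the lengths (in nt) of the contiguous, perfectly
    matching stretches between probe and target.
    Assumption: the lower bound of each range given in the text is used
    as the threshold (6, 12 and 15 nt).
    """
    runs = list(run_lengths)
    three_short = sum(r >= 6 for r in runs) >= 3    # (i) at least three regions of >= 6 nt
    two_medium = sum(r >= 12 for r in runs) >= 2    # (ii) two stretches of 12-15 nt
    one_long = any(r >= 15 for r in runs)           # (iii) one stretch of 15-18 nt
    return three_short or two_medium or one_long


print(meets_reported_minimum([9, 9]))           # False: two isolated 9-nt regions
print(meets_reported_minimum([9, 6, 5, 6, 9]))  # True: 9-nt flanks plus shorter regions
print(meets_reported_minimum([15]))             # True: one 15-nt stretch
```

Such a rule only flags whether measurable hybridization would be expected under the 3 M TMAC conditions described; it says nothing about signal strength or about probes containing dInosine.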
As also shown in the study, the risk of cross
hybridization when using an nt analog like dInosine is minimal,
since dInosine is sensitive to a mismatch in the position
next to it and >5 mmoi will reduce hybridization. On the
other hand, one should be aware that if an unintended
target has many mismatches covered by dInosine and
only a limited number of mmoi (<5 mmoi), this could
lead to cross hybridization and false positivity.
Furthermore, the assumption that sm all differ at
the third codon base is an oversimplification. Some
synonymous codons also differ at the first and second bases.
Thus, even if the Ino18 probe could hybridize when 11 of
17 trimers were intact, with perfect matches at bases 1 and
2, the tolerance to mmoi of a highly dInosine-substituted
probe like Ino18 is limited. Around half of its six surplus
trimers must be reserved for sm occurring at codon
positions 1 and 2. This leaves three trimers available for
non-synonymous mismatches. However, a long probe
is more likely than a short probe to have matches not
neighboring mismatches.
The NN theory was developed for hybridization of
short oligonucleotides in solution (45). Its application to
surface-bound oligonucleotides has not been precisely
studied. Hooybergs et al. studied hybridization of
30-mer surface-bound oligonucleotides with a 20-mer
linker to 30-mer targets in solution, with no, one or two
evenly spaced mismatches (46). Although NN theory was
approximately corroborated, NN factors had to be
recalculated to give an approximate fit to experimental
data. Moreover, the adsorptive (Langmuir) behavior
deviated from expectation at high target concentrations.
Thus, many unresolved questions regarding hybridization
behavior remain.
The concept of NucZip, with both local and distal
cooperativity contributions, is an attempt to predict the
hybridization behavior of most 70-mer probe–target
combinations under the given conditions (Figures 6A and 8).
The NucZip algorithm is now under revision to include
probes containing 1, 2, 3 or 4 nt wobbles, as well as taking
into account the results with the universal base
5-nitroindole. The algorithms, schematically described in
Figure 8, (i) were based on our experimental data
(described above) and (ii) included highly matching
nucleation sites extending beyond the neighboring nt. Thus,
unlike the NN theory, NucZip takes the effects of
matches at longer distances into account.
[Figure 8 (text fragment): contiguous tri- to pentadecamers. Total ZipScore (mode 2): 25+12+5+1 = 43. ZipScore (downstream): 49 - dInosinefactor*(49-43) + (dInoCnr*dInoCfactor)]
The unique property of 3 M TMAC to provide a roughly equal
contribution to hybridization by A:T and G:C pairs justifies
simple computational approaches. The well known
additivity of binding contributions per nt inherent in
ΔG calculations according to the NN theory (22,45)
indicates that hybridization can be treated in a relatively
simplistic way. Our concepts were based on the finding that
the longer the sequence of uninterrupted matching nt, the
more stable is the hybridization. Our calculations were
thus focused more on binding than on destabilization,
i.e. the use of positive rather than negative contributions.
One of the weaknesses of the NN theory is that it adds all
contributions, positive or negative, to a grand sum. The
negative contributions are subtracted for the whole
molecule, instead of in the local context where they
belong. In our approach, the binding is first assessed
locally, mimicking nucleation, and then extended
cooperatively to the whole molecule, mimicking the zipping
process. It is binding that keeps the hybrid together, and
it is thus logical to focus on binding.
NN theory does, to some degree, predict that the
distribution of the mismatches is an important factor insofar as
it affects the nearest neighbors. However, the
concentration of this theory on the nearest neighbor disregards the
importance of longer matching stretches. The importance
of mismatch distribution and uninterrupted matches is
exemplified by the stronger signal for hybridization of
the InflA probe with the 21-gm target (30% of the total
MFI, 2005 MFI at 45 °C; 8% of the total MFI, 443 MFI at
55 °C) compared to the abolished signal when using a
21-pm target. This is demonstrated in Table 2 by the
InflA probe hybridizations with targets containing 14 or
16 mismatches in different distribution patterns, where
long uninterrupted perfectly matching sequences
favoured hybridization. Cooperativity beyond the
nearest neighbor was also evident: two or more matching
trimers neighboring each other strengthened hybridization
more than the same number of matching trimers separated
by mismatches. In Figure 3A (45 °C), the InflA probe
hybridized better with 74%12nt (10% MFI), 74%15nt
(26% MFI), or 74%18nt (57% MFI) targets than with
the 74%9F target, with its 9 nt + 9 nt of perfect match
(3% MFI). Furthermore, at the higher temperature
(Figure 3B, 55 °C), the 74%18nt (37% MFI) and
74%22nt (66% MFI) targets resulted in good
hybridization while the 74%12F target, with its 12 nt + 12 nt
perfectly matching regions, failed (3% MFI).
Three matching trimers account for approximately one
turn [10.4 nt (44)] of the helix of a dsDNA molecule. It is
thus likely that two long DNA strands become entangled
or conformationally committed when at least one turn of
the helix has been completed. Nucleation has to proceed
rapidly, without too much torsion and entanglement, to
allow many encounters in a short time. We envision that
contact between two strands extending to 9 nt allows rapid
comparison between strands with only local torsional
disturbance. If the brief contact does not achieve binding
strength over a certain threshold (the nucleation
threshold), the strands separate and new comparisons are made.
If the nucleation threshold is exceeded at initial contact,
binding extends further up and downstream, i.e. zipping
occurs. This proceeds as long as the binding strength
remains sufficient. The strands are held together
chemically by base–base interactions and topologically by the
multiple turns of intertwinement. It was a challenge to
model this process. In the NucZip model, the program
first tests for possible nucleation sites and selects the two
highest scoring sites for further evaluation. Zipping is then
performed up and downstream from each suggested
nucleation site. The zipping algorithm symbolizes the
successive cooperativity of binding by adding the number of
successive tri- to pentadecamers for each matching
segment, each of which ends either in a mismatch or at
the end of one or two of the oligonucleotides. The
contribution of added dInosine molecules is counted less than
those of proper matches. The highest scoring nucleation
point is chosen as the result. The results using degenerated
probes were in line with this NucZip theory. A long
matching segment created by a wobble position increased
hybridization strength beyond the contribution of the
additional single match.
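The idea of scoring a matching segment by the number of contiguous tri- to pentadecamers it contains can be illustrated with a toy calculation. The sketch below is not the NucZip implementation: it omits nucleation-site selection, the dInosine weighting and the factors shown in Figure 8, and simply sums, over all maximal matching runs, the number of k-mers (k = 3 to 15) that fit inside each run. All function names are hypothetical, and the target is assumed to be supplied as its reverse complement so that probe and target can be compared position by position.

```python
def matching_runs(probe, target_rc):
    """Lengths of maximal runs of identical positions in two aligned,
    equal-length sequences (target given as its reverse complement)."""
    if len(probe) != len(target_rc):
        raise ValueError("sequences must have equal length")
    runs, current = [], 0
    for a, b in zip(probe, target_rc):
        if a == b:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def kmer_count(run_length, k_min=3, k_max=15):
    """Number of contiguous k-mers (k_min..k_max) contained in one run."""
    return sum(run_length - k + 1
               for k in range(k_min, min(run_length, k_max) + 1))


def toy_zip_score(probe, target_rc):
    """Sum of k-mer counts over all matching runs (illustrative score only)."""
    return sum(kmer_count(r) for r in matching_runs(probe, target_rc))


# One uninterrupted 18-nt match versus two 9-nt matches split by a mismatch
print(toy_zip_score("A" * 18, "A" * 18))                        # 130
print(toy_zip_score("A" * 9 + "C" + "A" * 9,
                    "A" * 9 + "G" + "A" * 9))                   # 56
```

In this toy score, 18 matching nt in one stretch count far more than the same 18 nt divided into two stretches, which mirrors the observation that contiguous matches contribute more than the same number of matches separated by mismatches.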
At present, the NucZip model is intended for equally
long nt segments. Exceptions to the model found in the
experimental section of this work can be illustrated by the
remarkable difference in hybridization between the short
perfectly matching targets of 15–22 nt (15 nt_free,
18 nt_free and 22 nt_free) and the longer 70-mer targets
(74%_15 nt, 74%_18 nt and 74%_22 nt) (Figure 3A, B
and D). Several groups have analyzed the effect of short
so-called dangling ends of 1–5 nt (23–26), and report that
they appear to stabilize hybridization of the duplex.
Doctycz et al. observed that the dangling nt closest to
the duplex contributed most to the stabilizing effect (47).
The lengths of our long mismatching ends ranged from 55
to 48 nt with a 74%-nt mismatch and no matching trimers.
We speculate that the destabilization seen in a duplex with
two long mismatching ends, one from the probe and one
from the target, compared with a duplex between a long
and a short oligonucleotide, with only one long protruding
oligonucleotide, is due to shearing stress on the matching
nt, or to competition from intra-strand secondary
structures at the separate ends.
Although it can predict many aspects of long
oligonucleotide hybridization in 3 M TMAC at 45 °C, the
NucZip concept has to be amended to include other
temperatures, oligonucleotide concentrations and buffers in
order to be generally useful. The NN model, which was
developed with great precision by SantaLucia et al.
(8,20–22) and is used in the Visual OMP™ 7.0
computer program, is much more sophisticated than our
procedure. It was considered out of the scope of this
article to evaluate the Visual OMP™ 7.0 program in
detail; however, importantly, its hybridization conditions
can be adjusted. When Visual OMP™ 7.0 was used to
calculate ΔG (Figure 6), the buffer and temperature
conditions were set at 3 M TMAC and 45 °C. Comparison of
the NucZip and Visual OMP™ 7.0 models showed that
the distribution of the targets with the long dangling ends
(74%_Xnt) was corrected to a certain extent by NucZip;
however, neither NucZip nor Visual OMP™ 7.0 accurately
predicted the hybridization behavior of the short perfectly
matching probe–target combinations of the InflA probe
with the 15nt_free, 18nt_free and 22nt_free targets.
In conclusion, we have demonstrated that the
distribution of mismatches greatly affects probe hybridization. A
minimum number of continuous, perfectly matching
stretches (nucleation sites) is needed to initiate
hybridization. Thus, if the target contains many variations and no
long uninterrupted matching segments, use of a
dInosine-containing probe, which partially overcomes
the obstacles caused by mismatches, will be beneficial.
With respect to hybridization prediction algorithms, a
simple statement of percentage mismatch, as used in
many such algorithms, does not adequately reflect the
hybridization properties of a long duplex. The insertion of
dInosines in the positions of variation could detect target
nucleic acid that a consensus probe would fail to catch.
Hence, while high dInosine content in a probe decreases
the probe's binding capacity, it also covers mismatches,
as required when creating mismatch-tolerant,
broad-detection probes against highly variable target nucleic
acid sequences such as those seen in RNA viruses. The
dInosine probes can be made even more forgiving by
using a lower hybridization temperature. Furthermore,
the probability of cross hybridization is low because the
mmoi neighboring the dInosines have a destabilizing effect
on hybridization. While the aim of this study was to
improve our understanding of variability and the effects
of dInosines, other universal bases and wobbles in the
design of probes to be used in a 3 M TMAC buffer
system, with more exploratory work the results may also
be relevant to other hybridization systems and could aid
the development of hybridization-based diagnostic tools,
including nanotechnological applications such as the
volume-amplified magnetic nanobead detection assay
(48,49).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Knut and Alice Wallenberg foundation (grant #
KAW2007.0130); Academic Hospital, Uppsala, and the
Research Council of the Uppland/Västmanland/Örebro/
Dalarna region (grant # RFR-86541). Funding for open
access charge: Knut and Alice Wallenberg foundation.
Conflict of interest statement. None declared.
REFERENCES