Tigerfish designs oligonucleotide-based in situ hybridization probes targeting intervals of highly repetitive DNA at the scale of genomes
Article
https://doi.org/10.1038/s41467-024-45385-x
Received: 16 March 2023
Accepted: 22 January 2024
Robin Aguilar1, Conor K. Camplisson1, Qiaoyi Lin1, Karen H. Miga2,3,
William S. Noble1,4 & Brian J. Beliveau1,5,6
Fluorescent in situ hybridization (FISH) is a powerful method for the targeted
visualization of nucleic acids in their native contexts. Recent technological
advances have leveraged computationally designed oligonucleotide (oligo)
probes to interrogate >100 distinct targets in the same sample, pushing the
boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions
presents significant technical challenges. Consequently, many open questions
remain about the organization and function of highly repetitive sequences.
Here, we introduce Tigerfish, a software tool for the genome-scale design of
oligo probes against repetitive DNA intervals. We showcase Tigerfish by
designing a panel of 24 interval-specific repeat probes, one for each of the
24 human chromosomes, and imaging this panel on metaphase spreads and in
interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to
highly repetitive DNA.
Fluorescent in situ hybridization (FISH) is a powerful technique that
can reveal the spatial positioning and abundance of DNA and RNA
molecules in fixed samples with subcellular resolution. Since their
introduction in 1969 (ref. 1), ISH and later FISH2–4 methods have been refined
to improve their detection efficiency and sensitivity5. One important
technical development has been the introduction of synthetic DNA
oligonucleotides (oligos) as a source of probe material6. Oligo-based
probes offer important advantages over more traditional probes
deriving from isolated genomic material, as oligo probes can be
designed to have specific thermodynamic properties and programmed to contain stretches of exogenous sequences that can serve
as ‘readout’ domains via the ‘secondary’ hybridization of a labeled,
complementary oligo. These advantages have led to the introduction
of a growing set of ‘spatial genomics’ and ‘spatial transcriptomics’
methods that use complex ‘probe sets’ of many distinct oligo
species7–10 in combination with iterative rounds of secondary hybridization to visualize dozens or more genomic regions11–14 and thousands
or more RNA species15–17, respectively, in the same cell or tissue sample.
The rapid adoption of oligo probes as a source of FISH probe
material has also catalyzed the parallel development of computational
tools for oligo probe design. These tools—which include OligoArray18,
PROBER19, Chorus20, mathFISH21, OligoMiner22, iFISH23, ProbeDealer24,
Chorus225, and PaintSHOP26—aim to identify short windows of genomic
sequence that have suitable thermodynamic and sequence properties
1Department of Genome Sciences, University of Washington, Seattle, WA, USA.
2Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
3UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
4Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
5Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
6Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA.
Nature Communications | (2024)15:1027
to serve as FISH probes. Once identified, ‘candidate’ probes are next
screened for specificity to predict whether they will have off-target
sites in addition to their intended target. This specificity screening
typically relies on using alignment programs such as BLAST27 or
Bowtie228 to search for regions with high sequence similarity to the
candidate probes, the use of k-mer counting programs such as
Jellyfish29 to assess whether the candidate probes contain k-mers (i.e.,
substrings) with high abundance in the genome of interest, or a
combination of both approaches. After this specificity screening,
candidate probes with predicted off-target binding are filtered and a
final set of target-specific oligo probes is returned.
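The candidate-then-screen workflow described above can be illustrated with a short Python sketch. This is a toy illustration under stated assumptions, not the implementation of any tool cited here: an in-memory `Counter` stands in for a dedicated k-mer counter such as Jellyfish, GC fraction serves as a crude proxy for the thermodynamic criteria, and every function name, parameter, and threshold (`probe_len`, `k`, `gc_min`, `gc_max`, `max_count`) is a hypothetical choice made for the example.

```python
from collections import Counter


def count_kmers(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer in the genome (toy stand-in for a k-mer counter)."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def mine_candidates(genome: str, probe_len: int, gc_min: float, gc_max: float):
    """Yield (start, sequence) for each window whose GC fraction lies in [gc_min, gc_max]."""
    for start in range(len(genome) - probe_len + 1):
        window = genome[start:start + probe_len].upper()
        if gc_min <= gc_fraction(window) <= gc_max:
            yield start, window


def max_kmer_count(probe: str, kmer_counts: Counter, k: int) -> int:
    """Highest genome-wide count among the k-mers contained in the probe."""
    return max(kmer_counts[probe[i:i + k]] for i in range(len(probe) - k + 1))


def screen_candidates(genome, probe_len=8, k=4, gc_min=0.25, gc_max=0.75, max_count=1):
    """Keep candidates in which no k-mer occurs more than max_count times in the genome."""
    kmer_counts = count_kmers(genome, k)
    return [
        (start, probe)
        for start, probe in mine_candidates(genome, probe_len, gc_min, gc_max)
        if max_kmer_count(probe, kmer_counts, k) <= max_count
    ]
```

Calling `screen_candidates` on a short sequence that contains a tandem repeat discards every window holding a k-mer that occurs more than once and keeps only windows drawn from unique sequence. This is the behavior that makes conventional pipelines skip repetitive intervals.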
A key advantage of oligo probes is that they can be designed
specifically to avoid targeting repetitive sequences. Repetitive
sequences are frequent sources of unwanted background when performing in situ hybridization experiments due to their high copy
number, and a set of “suppressive hybridization” methods using
unlabeled repetitive DNA from the C0t-1 fraction30 as a blocking agent
have been introduced to abrogate this background when using probes
derived directly from genomic material31–33. Such blocking agents are
generally not needed when using oligo probes, however, as computational oligo probe design methods either avoid discovering candidate probes in sequence annotated as being repetitive by tools like
RepeatMasker18–20,22,26,34 or purposefully filter candidate probes that
align many times to the genome18,20–26 or contain highly abundant k-mers22,23,25,26. As a result, while computational oligo probe design tools
are able to operate at the scale of whole plant and mammalian genomes to produce repositories of tens of millions of oligo probes23,26, a
substantial fraction of large and complex genomes remains intentionally uncovered due to the presence of repetitive sequences.
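A minimal sketch of the first avoidance strategy follows. It assumes the input is a soft-masked sequence in which annotated repeats appear in lowercase, a common convention for repeat-masked genome files. The function names and the `min_len` and `probe_len` parameters are hypothetical and are not taken from any of the tools cited above.

```python
import re


def unmasked_intervals(seq: str, min_len: int):
    """Return (start, end) for each run of uppercase A/C/G/T at least min_len bases long."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[ACGT]+", seq)
        if m.end() - m.start() >= min_len
    ]


def candidate_starts(seq: str, probe_len: int):
    """Start positions of all probe_len windows that lie entirely in unmasked sequence."""
    starts = []
    for begin, end in unmasked_intervals(seq, probe_len):
        starts.extend(range(begin, end - probe_len + 1))
    return starts
```

Because candidate windows are only ever drawn from the unmasked runs, any interval annotated as repetitive is excluded before specificity screening begins, which is why such intervals end up without probe coverage.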
Repetitive DNA accounts for ~50% of the human and mouse genomes and often even higher percentages in the genomes of
plants30,35,36. Broadly, repetitive DNA falls into two categories: (1)
interspersed repeats such as SINE, LINE, and Alu elements that often
occur as short, spatially isolated intervals within larger blocks of nonrepetitive sequence35; and (2) long tandem repeat arrays such as alpha
satellite, human satellites 1–3, and the 45S ribosomal DNA at which a
single monomer is repeated many times to form multi-megabase
intervals of repetitive sequence that are frequently located in pericentromeric regions and on the short arms of acrocentric
chromosomes36,37. Collectively, repetitive DNA sequences are central
to a set of diverse and essential cellular and organismal functions,
including the recruitment of the chromosome segregation machinery
during mitosis, the encoding of essential information such as the 47S
rRNA38 and the replication-dependent histone genes39, and the protection of chromosome ends40. Moreover, repetitive sequences are an
important source of novel genic and regulatory sequences41 and are
hypothesized to be actively involved in potent evolutionary processes
such as meiotic drive and speciation42. Thus, more detailed studies of
highly repetitive DNA regions and their transcription products
through low-cost targeted assays such as FISH may hel (...truncated)