Tigerfish designs oligonucleotide-based in situ hybridization probes targeting intervals of highly repetitive DNA at the scale of genomes

https://www.nature.com/articles/s41467-024-45385-x.pdf

Article https://doi.org/10.1038/s41467-024-45385-x

Received: 16 March 2023
Accepted: 22 January 2024

Robin Aguilar1, Conor K. Camplisson1, Qiaoyi Lin1, Karen H. Miga2,3, William S. Noble1,4 & Brian J. Beliveau1,5,6

1Department of Genome Sciences, University of Washington, Seattle, WA, USA. 2Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA. 3UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA. 4Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. 5Brotman Baty Institute for Precision Medicine, Seattle, WA, USA. 6Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA.

Nature Communications | (2024) 15:1027

Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate >100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges. Consequently, many open questions remain about the organization and function of highly repetitive sequences. Here, we introduce Tigerfish, a software tool for the genome-scale design of oligo probes against repetitive DNA intervals. We showcase Tigerfish by designing a panel of 24 interval-specific repeat probes specific to each of the 24 human chromosomes and imaging this panel on metaphase spreads and in interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to highly repetitive DNA.

Fluorescent in situ hybridization (FISH) is a powerful technique that can reveal the spatial positioning and abundance of DNA and RNA molecules in fixed samples with subcellular resolution. Since their introduction in 1969 (ref. 1), ISH and later FISH2–4 methods have been refined to improve their detection efficiency and sensitivity5. One important technical development has been the introduction of synthetic DNA oligonucleotides (oligos) as a source of probe material6. Oligo-based probes offer important advantages over more traditional probes deriving from isolated genomic material, as oligo probes can be designed to have specific thermodynamic properties and programmed to contain stretches of exogenous sequences that can serve as ‘readout’ domains via the ‘secondary’ hybridization of a labeled, complementary oligo. These advantages have led to the introduction of a growing set of ‘spatial genomics’ and ‘spatial transcriptomics’ methods that use complex ‘probe sets’ of many distinct oligo species7–10 in combination with iterative rounds of secondary hybridization to visualize dozens or more genomic regions11–14 and thousands or more RNA species15–17, respectively, in the same cell or tissue sample.

The rapid adoption of oligo probes as a source of FISH probe material has also catalyzed the parallel development of computational tools for oligo probe design. These tools (which include OligoArray18, PROBER19, Chorus20, mathFISH21, OligoMiner22, iFISH23, ProbeDealer24, Chorus2 (ref. 25), and PaintSHOP26) aim to identify short windows of genomic sequence that have suitable thermodynamic and sequence properties to serve as FISH probes. Once identified, ‘candidate’ probes are next screened for specificity to predict whether they will have off-target sites in addition to their intended target.
This specificity screening typically relies on using alignment programs such as BLAST27 or Bowtie2 (ref. 28) to search for regions with high sequence similarity to the candidate probes, the use of k-mer counting programs such as Jellyfish29 to assess whether the candidate probes contain k-mers (i.e., substrings) with high abundance in the genome of interest, or a combination of both approaches. After this specificity screening, candidate probes with predicted off-target binding are filtered and a final set of target-specific oligo probes is returned.

A key advantage of oligo probes is that they can be designed specifically to avoid targeting repetitive sequences. Repetitive sequences are frequent sources of unwanted background when performing in situ hybridization experiments due to their high copy number, and a set of “suppressive hybridization” methods using unlabeled repetitive DNA from the C0t-1 fraction (ref. 30) as a blocking agent have been introduced to abrogate this background when using probes derived directly from genomic material31–33. Such blocking agents are generally not needed when using oligo probes, however, as computational oligo probe design methods either avoid discovering candidate probes in sequence annotated as being repetitive by tools like RepeatMasker18–20,22,26,34 or purposefully filter candidate probes that align many times to the genome18,20–26 or contain highly abundant k-mers22,23,25,26. As a result, while computational oligo probe design tools are able to operate at the scale of whole plant and mammalian genomes to produce repositories of tens of millions of oligo probes23,26, a substantial fraction of large and complex genomes remains intentionally uncovered due to the presence of repetitive sequences.

Repetitive DNA accounts for ~50% of the human and mouse genomes and often even higher percentages in the genomes of plants30,35,36.
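As a rough illustration of the k-mer-based specificity screen described above, the following Python sketch counts k-mers in a toy sequence and discards candidate probes that contain any highly abundant k-mer. It is a simplified stand-in, not the method of any tool named in the text: the function names, the toy genome, and the `k` and `max_count` values are all assumptions made for illustration, an in-memory `Counter` replaces a dedicated k-mer counter such as Jellyfish, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(genome: str, k: int) -> Counter:
    """Count every k-mer (length-k substring) on the forward strand."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def max_kmer_count(probe: str, counts: Counter, k: int) -> int:
    """Return the genome-wide count of the most abundant k-mer in the probe."""
    probe = probe.upper()
    kmers = (probe[i:i + k] for i in range(len(probe) - k + 1))
    # default=0 covers probes shorter than k, which contain no k-mers.
    return max((counts[kmer] for kmer in kmers), default=0)


def screen_probes(probes, genome, k=4, max_count=1):
    """Keep only probes whose most abundant k-mer occurs <= max_count times."""
    counts = count_kmers(genome, k)
    return [p for p in probes if max_kmer_count(p, counts, k) <= max_count]


if __name__ == "__main__":
    # Hypothetical toy "genome": a unique flank, a short tandem repeat,
    # and a second unique flank.
    toy_genome = "ACGTTGCA" + "GATTACA" * 3 + "CCTAGGTC"
    candidates = ["ACGTTGCA", "GATTACAG"]
    print(screen_probes(candidates, toy_genome))
```

A filter of this kind removes repeat-derived candidates by design, which is why conventional pipelines leave repetitive intervals uncovered.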
Broadly, repetitive DNA falls into two categories: 1) interspersed repeats such as SINE, LINE, and ALU elements that often occur as short, spatially isolated intervals within larger blocks of nonrepetitive sequence35; 2) long tandem repeat arrays such as alpha satellite, human satellites 1–3, and the 45S ribosomal DNA at which a single monomer is repeated many times to form multi-megabase intervals of repetitive sequence that are frequently located in pericentromeric regions and on the short arms of acrocentric chromosomes36,37. Collectively, repetitive DNA sequences are central to a set of diverse and essential cellular and organismal functions, including the recruitment of the chromosome segregation machinery during mitosis, the encoding of essential information such as the 47S rRNA38 and the replication-dependent histone genes39, and the protection of chromosome ends40. Moreover, repetitive sequences are an important source of novel genic and regulatory sequences41 and are hypothesized to be actively involved in potent evolutionary processes such as meiotic drive and speciation42. Thus, more detailed studies of highly repetitive DNA regions and their transcription products through low-cost targeted assays such as FISH may hel (...truncated)