Transposable element insertions in 1000 Swedish individuals

PLOS ONE, Jul 2023

The majority of rare diseases are genetic, and despite advanced high-throughput genomics-based investigations, 60% of patients remain undiagnosed. A major factor limiting our ability to identify disease-causing alterations is a poor understanding of the morbid and normal human genome. One major genomic component whose function and distribution remain largely unstudied is the transposable elements (TEs), which constitute 50% of our genome. Here we aim to resolve this knowledge gap and increase the diagnostic yield of rare disease patients investigated with clinical genome sequencing. To this end, we characterized TE insertions in 1000 Swedish individuals from the SweGen dataset and 2504 individuals from the 1000 Genomes Project (1KGP), creating seven population-specific TE insertion databases. Of note, 66% of TE insertions in SweGen were present at >1% in the 1KGP databases, indicating that most insertions are common across populations. Focusing on the rare TE insertions, we show that even though ~0.7% of those insertions affect protein-coding genes, they rarely affect known disease-causing genes (<0.1%). Finally, we applied a TE insertion identification workflow to two clinical cases in which disease-causing TE insertions were suspected and verified the presence of pathogenic TE insertions in both. Altogether, we demonstrate the importance of TE insertion detection and highlight possible clinical implications in rare disease diagnostics.

RESEARCH ARTICLE

Kristine Bilgrav Saether (1,2), Daniel Nilsson (1,2,3), Håkan Thonberg (1,3), Emma Tham (1,3), Adam Ameur (4), Jesper Eisfeldt (1,2,3)*, Anna Lindstrand (1,3)

1 Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
2 Science for Life Laboratory, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
3 Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
4 Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden

OPEN ACCESS

Citation: Bilgrav Saether K, Nilsson D, Thonberg H, Tham E, Ameur A, Eisfeldt J, et al. (2023) Transposable element insertions in 1000 Swedish individuals. PLoS ONE 18(7): e0289346. https://doi.org/10.1371/journal.pone.0289346

Editor: Ruslan Kalendar, University of Helsinki: Helsingin Yliopisto, FINLAND

Received: March 2, 2023
Accepted: July 9, 2023
Published: July 28, 2023

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0289346

Copyright: © 2023 Bilgrav Saether et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The SweGen transposable element (TE) database is available at the SweFreq project website https://swefreq.nbis.se/. The 1000 Genomes Project TE database is available at Zenodo, via doi: 10.5281/zenodo.7875363. All raw WGS data is publicly available. The SweGen dataset is available at https://swefreq.nbis.se/ upon signing a data access agreement. The 1000 Genomes Project data is openly available at https://www.internationalgenome.org/.

Funding: The work was supported by the Swedish Research Council (2017-02936, 2019-02079), Karolinska Institutet and the Stockholm Region (FoUI-961630, FoUI-954569). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In recent years, high-throughput genomics-based approaches such as whole genome sequencing (WGS) have been a great success, with overall diagnostic yields of between 20 and 55% for mixed rare disease patient groups [1]. Despite this success, an average of 60% of rare disease patients investigated with clinical WGS still do not receive a molecular diagnosis [2]. The reason could be that the specific disease-causing gene has not yet been described or that the underlying cause is not (mono)genetic. Equally important, however, is a lack of knowledge regarding the composition and structure of the human genome [3], such as noncoding variants, as well as epigenetic modifications [4, 5]. To correctly interpret individual phenotype-genotype interactions, we first need to study the normal genome architecture.

The human genome contains an abundance of genetic variation, including single nucleotide variants (SNVs), small insertions and deletions (INDELs), and structural variation (SV). SVs, generally defined as genetic variants larger than 50 base pairs (bp), contribute the most to the genetic difference between individuals, as each SV comprises many bases [6]. There are multiple SV subtypes, such as copy number variants, insertions, inversions and translocations [7], and one important but understudied SV subgroup is the transposable element (TE) insertions. TEs are DNA sequences that can change their position in the genome; they compose around 50% of the human genome [8].
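The size-based variant classes named above (SNVs, small INDELs, and SVs larger than 50 bp) can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function name `classify_variant`, the constant `SV_MIN_LENGTH_BP`, and the "substitution" label are hypothetical, and the sketch only handles sequence-resolved REF/ALT alleles, so SV subtypes such as inversions and translocations are out of scope.

```python
# Threshold taken from the text: variants larger than 50 bp count as SVs.
SV_MIN_LENGTH_BP = 50  # hypothetical constant name


def classify_variant(ref: str, alt: str) -> str:
    """Classify a sequence-resolved variant by its REF/ALT length change.

    Assumption: alleles are given as explicit sequences (VCF-style REF/ALT).
    """
    if len(ref) == len(alt):
        # Same length: a single-base change is an SNV; longer equal-length
        # changes get a generic label that is not used in the article.
        return "SNV" if len(ref) == 1 else "substitution"
    size = abs(len(alt) - len(ref))
    # "Larger than 50 bp" is read strictly, so exactly 50 bp is still an INDEL.
    return "SV" if size > SV_MIN_LENGTH_BP else "INDEL"


if __name__ == "__main__":
    print(classify_variant("A", "G"))              # single-base change
    print(classify_variant("A", "ATG"))            # 2 bp insertion
    print(classify_variant("A", "A" + "T" * 300))  # Alu-sized insertion
```

Under this reading, a roughly 300 bp Alu insertion falls well inside the SV class, which is why TE insertions are treated as an SV subgroup.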
Two main classes are described: DNA transposons (Class II TEs), which move through a cut-and-paste mechanism and make up less than 2% of the human genome, and the more common retrotransposons (RTs, Class I TEs), which move through RNA intermediates. Retrotransposons are further subdivided into long terminal repeat (LTR) and non-LTR retrotransposons, characterized by the presence or absence of LTRs [9]. The main LTR-RT in humans is the 9.5 kb human endogenous retrovirus (HERV), which contains the four genes gag, pro, pol and env flanked by LTRs. HERVs are mostly found in heterochromatin and are often silenced by epigenetic signaling. Even so, HERV activity and transcription have been described in human autoimmune disorders as well as in cancer [3]. The three main non-LTR RTs in humans are (I) the 6 kb long interspersed nuclear element 1 (L1), (II) the 300 bp short interspersed nuclear element (SINE) Alu and (III) the 2 kb SINE-VNTR-Alu (SVA). Of these, only the L1 elements are autonomous with functional transposon activity [10]. They contain an RNA polymerase II promoter and encode proteins with both endonuclease and reverse transcriptase activity, enabling reverse transcription into cDNA and integration into the genome, a process called target-primed reverse transcription (TPRT). The L1 proteins are cis-preferent and p (...truncated)
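The TE families described above can be summarized as a small lookup table. The following Python sketch is only an illustration of that taxonomy: the `TEFamily` type, its field names, and the helper `autonomous_non_ltr` are hypothetical, and the values are limited to what the text states (HERV autonomy is not stated, so it is left as `None`).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TEFamily:
    """One TE family as described in the text (hypothetical record type)."""
    name: str
    te_class: str                # "I" = retrotransposon, "II" = DNA transposon
    has_ltr: bool                # LTR vs non-LTR retrotransposon
    approx_length_bp: int        # approximate full-length element size
    autonomous: Optional[bool]   # None where the text does not say


# Values transcribed from the passage above; nothing here is measured data.
FAMILIES: Dict[str, TEFamily] = {
    "HERV": TEFamily("HERV", "I", True, 9500, None),
    "L1": TEFamily("L1", "I", False, 6000, True),
    "Alu": TEFamily("Alu", "I", False, 300, False),
    "SVA": TEFamily("SVA", "I", False, 2000, False),
}


def autonomous_non_ltr(families: Dict[str, TEFamily]) -> List[str]:
    """Return the names of non-LTR families marked as autonomous."""
    return sorted(
        name for name, fam in families.items()
        if not fam.has_ltr and fam.autonomous
    )


if __name__ == "__main__":
    print(autonomous_non_ltr(FAMILIES))
```

Keeping `autonomous` optional avoids asserting anything about HERV that the passage does not claim.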


Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0289346

Kristine Bilgrav Saether, Daniel Nilsson, Håkan Thonberg, Emma Tham, Adam Ameur, Jesper Eisfeldt, Anna Lindstrand. Transposable element insertions in 1000 Swedish individuals, PLOS ONE, 2023, Volume 18, Issue 7, DOI: 10.1371/journal.pone.0289346