Transposable element insertions in 1000 Swedish individuals
PLOS ONE
RESEARCH ARTICLE
Kristine Bilgrav Saether1,2, Daniel Nilsson1,2,3, Håkan Thonberg1,3, Emma Tham1,3,
Adam Ameur4, Jesper Eisfeldt1,2,3*, Anna Lindstrand1,3
1 Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden, 2 Science for
Life Laboratory, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden,
3 Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden, 4 Science for Life
Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
OPEN ACCESS
Citation: Bilgrav Saether K, Nilsson D, Thonberg H,
Tham E, Ameur A, Eisfeldt J, et al. (2023)
Transposable element insertions in 1000 Swedish
individuals. PLoS ONE 18(7): e0289346.
https://doi.org/10.1371/journal.pone.0289346
Editor: Ruslan Kalendar, University of Helsinki:
Helsingin Yliopisto, FINLAND
Received: March 2, 2023
Accepted: July 9, 2023
Published: July 28, 2023
Peer Review History: PLOS recognizes the
benefits of transparency in the peer review
process; therefore, we enable the publication of
all of the content of peer review and author
responses alongside final, published articles. The
editorial history of this article is available here:
https://doi.org/10.1371/journal.pone.0289346
Copyright: © 2023 Bilgrav Saether et al. This is an
open access article distributed under the terms of
the Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: The SweGen
transposable element (TE) database is available at
the SweFreq project website https://swefreq.nbis.se/.
The 1000 Genomes Project TE database is
available at Zenodo, via doi: 10.5281/zenodo.7875363.
All raw WGS data is publicly available.
The SweGen dataset is available at https://swefreq.nbis.se/
upon signing a data access agreement.
The 1000 Genomes Project data is openly available
at https://www.internationalgenome.org/.
Funding: The work was supported by the Swedish
Research Council (2017-02936, 2019- 395 02079),
Karolinska Institutet and the Stockholm Region
(FoUI-961630, FoUI-954569). The funders had no
role in study design, data collection and analysis,
decision to publish, or preparation of the
manuscript.
Competing interests: The authors have declared
that no competing interests exist.
Abstract
The majority of rare diseases are genetic, and despite advanced high-throughput
genomics-based investigations, 60% of patients remain undiagnosed. A major factor limiting
our ability to identify disease-causing alterations is a poor understanding of the morbid and
normal human genome. One major genomic component whose function and distribution
remain largely unstudied is the transposable elements (TEs), which constitute 50% of our
genome. Here we aim to resolve this knowledge gap and increase the diagnostic yield for
rare disease patients investigated with clinical genome sequencing. To this end we characterized TE insertions in 1000 Swedish individuals from the SweGen dataset and 2504 individuals from the 1000 Genomes Project (1KGP), creating seven population-specific TE
insertion databases. Of note, 66% of TE insertions in SweGen were present at >1% in the
1KGP databases, showing that most insertions are common across populations. Focusing
on the rare TE insertions, we show that even though ~0.7% of those insertions affect protein
coding genes, they rarely affect known disease-causing genes (<0.1%). Finally, we applied a
TE insertion identification workflow to two clinical cases where disease-causing TE insertions were suspected and could verify the presence of pathogenic TE insertions in both.
Altogether we demonstrate the importance of TE insertion detection and highlight possible
clinical implications in rare disease diagnostics.
Introduction
In recent years, high-throughput genomics-based approaches such as whole genome sequencing (WGS) have been a great success, with overall diagnostic yields of between 20 and 55% for
mixed rare disease patient groups [1]. Although highly successful, an average of 60% of rare
disease patients investigated with clinical WGS still do not receive a molecular diagnosis [2].
This could be because the specific disease-causing gene has not been described or because
the underlying cause is not (mono)genetic. However, equally important is a lack of knowledge
regarding the human genome composition and structure [3], such as noncoding variants, as
well as epigenetic modifications [4, 5]. To correctly interpret individual phenotype-genotype
interactions we first need to study the normal genome architecture.
The human genome contains an abundance of genetic variation, including single nucleotide variants (SNVs), small insertions and deletions (INDELs) as well as structural variation
(SV). SVs, generally defined as genetic variants larger than 50 base pairs (bp), contribute the
most to the genetic difference between individuals, as each SV comprises many bases [6]. There
are multiple SV subtypes, such as copy number variants, insertions, inversions and translocations
[7], and one important but understudied SV subgroup is the transposable element (TE)
insertions. TEs compose around 50% of the human genome [8].
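As a minimal sketch of the size-based definitions above, the following Python snippet labels a variant from the lengths of its reference and alternate alleles. The function name `classify_variant`, the `OTHER` label and the treatment of exactly 50 bp as an indel are assumptions made for illustration only; the text states only that SVs are generally defined as larger than 50 bp, and real SV callers rely on read-level evidence rather than allele length alone.

```python
SV_MIN_SIZE_BP = 50  # from the text: SVs are generally defined as larger than 50 bp


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant as SNV, INDEL, SV or OTHER from its allele lengths.

    Hypothetical helper for illustration; not part of any published workflow.
    """
    if len(ref) == 1 and len(alt) == 1:
        # Single-base substitution; identical alleles are not a variant.
        return "SNV" if ref != alt else "OTHER"
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same-length, multi-base change (e.g. a multi-nucleotide substitution).
        return "OTHER"
    return "SV" if size > SV_MIN_SIZE_BP else "INDEL"


# Example: a 300 bp insertion exceeds the SV size threshold.
label = classify_variant("A", "A" + "C" * 300)
```

Under this convention an insertion of a few bases is reported as an INDEL, while an insertion of several hundred bases is reported as an SV.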
TEs are DNA sequences that can change their position in the genome. Two main classes are
described: DNA transposons (Class II TEs), which move through a cut-and-paste mechanism and
make up less than 2% of the human genome, and the more common retrotransposons (RTs,
Class I TEs), which move through RNA intermediates. Retrotransposons are further subdivided into long terminal repeat (LTR) and non-LTR retrotransposons, characterized by the presence or absence of
LTRs [9].
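The two-level classification above can be written down as a small data structure. This is only an illustrative sketch: the names `TEClass`, `MECHANISM` and `te_subclass` are hypothetical and chosen for this example, and the snippet encodes nothing beyond the classes and mechanisms named in the text.

```python
from enum import Enum
from typing import Optional


class TEClass(Enum):
    RETROTRANSPOSON = "Class I"
    DNA_TRANSPOSON = "Class II"


# Mobilization mechanism of each class, as described in the text.
MECHANISM = {
    TEClass.RETROTRANSPOSON: "RNA intermediate",
    TEClass.DNA_TRANSPOSON: "cut and paste",
}


def te_subclass(te_class: TEClass, has_ltr: Optional[bool] = None) -> str:
    """Return the subclass name; only retrotransposons are split by LTR status."""
    if te_class is TEClass.DNA_TRANSPOSON:
        return "DNA transposon"
    if has_ltr is None:
        raise ValueError("has_ltr must be given for retrotransposons")
    return "LTR retrotransposon" if has_ltr else "non-LTR retrotransposon"


# Example: a retrotransposon without long terminal repeats.
example = te_subclass(TEClass.RETROTRANSPOSON, has_ltr=False)
```

Keeping the LTR flag separate from the class reflects that the LTR/non-LTR split applies only within Class I.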
The main LTR-RT in humans is the 9.5 kb human endogenous retrovirus (HERV), which
contains the four genes gag, pro, pol and env flanked by LTRs. HERVs are mostly found in heterochromatin and are often silenced by epigenetic signaling. Even so, HERV activity and transcription have been described in human autoimmune disorders as well as in cancer [3]. The
three main non-LTR RTs in humans are (I) the 6 kb long-interspersed nuclear element 1 (L1),
(II) the 300 bp short-interspersed elements, SINEs (Alu) and (III) the 2 kb SINE-VNTR-Alu
(SVA). Of these, only the L1 elements are autonomous with functional transposon activity
[10]. They contain an RNA polymerase II promoter and encode proteins with both endonuclease and
reverse transcriptase activity, enabling reverse transcription into cDNA and integration into the
genome, a process called target-primed reverse transcription (TPRT). The L1 proteins are cis-preferent and p (...truncated)