Mapping and phasing of structural variation in patient genomes using nanopore sequencing
ARTICLE
DOI: 10.1038/s41467-017-01343-4
OPEN
Mircea Cretu Stancu1, Markus J. van Roosmalen1, Ivo Renkens1, Marleen M. Nieboer1, Sjors Middelkamp1,
Joep de Ligt 1, Giulia Pregno2, Daniela Giachino 2, Giorgia Mandrile2, Jose Espejo Valle-Inclan1,
Jerome Korzelius1, Ewart de Bruijn1, Edwin Cuppen3, Michael E. Talkowski4,5,6, Tobias Marschall 7,8,
Jeroen de Ridder1 & Wigard P. Kloosterman1
Despite improvements in genomics technology, the detection of structural variants (SVs)
from short-read sequencing still poses challenges, particularly for complex variation. Here we
analyse the genomes of two patients with congenital abnormalities using the MinION
nanopore sequencer and a novel computational pipeline—NanoSV. We demonstrate that
nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations,
which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a
large proportion of which are retrotransposon insertions. We provide a first exploration of
patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.
1 Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht University, 3584 CG Utrecht, The Netherlands.
2 Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano 10043, Italy. 3 Department of Genetics and Cancer
Genomics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht University, 3584 CG Utrecht, The Netherlands. 4 Center for Genomic
Medicine, Massachusetts General Hospital, Boston, MA 02114, USA. 5 Department of Neurology, Harvard Medical School, Boston, MA 02115, USA.
6 Program in Population and Medical Genetics and Stanley Center for Psychiatric Research, The Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142,
USA. 7 Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany. 8 Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.
Mircea Cretu Stancu and Markus J. van Roosmalen contributed equally to this work. Correspondence and requests for materials should be addressed to
W.P.K. (email: )
NATURE COMMUNICATIONS | 8: 1326
| DOI: 10.1038/s41467-017-01343-4 | www.nature.com/naturecommunications
Second-generation DNA sequencing has become an essential
technology for research and diagnosis of human genetic
disease. Sequencing of human exomes has resulted in dramatic increases in novel gene discovery for Mendelian disorders1,
while whole-genome sequencing has revealed that a myriad of
diseases are caused by genetic changes that can occur both within
genes and in the noncoding genome2. As a result, genome
sequencing has seen rapid adoption in clinical decision making,
as the complete picture of a patient’s unique mutation profile
enables personalization of treatment strategies3,4.
Robust methods to detect structural variants (SVs) in human
genomes are essential, as SVs represent an important class of
genetic variation that accounts for a far greater number of variable bases than single nucleotide variations (SNVs)5. Moreover,
SVs have been implicated in a wide range of genetic disorders6.
A particularly revolutionary development in genome sequencing is the use of protein nanopores to measure DNA sequence
directly and in real time7. The first successful implementation of
this principle in a consumer device was achieved in 2014 by
Oxford Nanopore Technologies (ONT) with the introduction of
the MinION8. The MinION can sequence stretches of DNA of up
to hundreds of kilobases in length, which has already resulted in the
sequencing of the genomes of several organisms9,10. Because
MinION-based sequencing requires almost no capital investment
and current devices have a very small footprint, mainstream
adoption of these sequencers has the potential to fundamentally
change the current paradigm of sequencing in centralized centers.
An important and natural application of the long reads produced by nanopore sequencing is identifying SVs. Long-read
sequencing is breaking ground for the discovery of SVs at an
unprecedented scale and depth11. The first success has been
achieved using the Pacific BioSciences SMRT long-read sequencing platform12,13, and alternative methods to capture long-range
information have been introduced, such as BioNano optical
mapping14 and 10× Genomics linked-read technology15. While
short-read next-generation sequencing data rely on multiple
(often) indirect sources of information in order to accurately
identify SVs, structural changes can be directly reflected in long-read data.
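To illustrate how a structural change can be read directly from a single long read, the following minimal Python sketch infers a candidate deletion from two aligned segments of the same read. The `Segment` record, the `candidate_deletion` function, and both thresholds are hypothetical names and values chosen for illustration; this is a simplified sketch restricted to forward-strand segments, not the NanoSV algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a long read (hypothetical record).

    Coordinates are 0-based and half-open on both reference and read.
    """
    chrom: str
    ref_start: int
    ref_end: int
    read_start: int
    read_end: int
    strand: str  # "+" or "-"


def candidate_deletion(a: Segment, b: Segment,
                       min_size: int = 50,
                       max_read_gap: int = 100) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of a candidate deletion, or None.

    Two segments that are adjacent on the read but far apart on the
    reference suggest that the intervening reference bases are missing
    from the sequenced molecule. Thresholds are illustrative assumptions.
    """
    # Order the segments by their position along the read.
    first, second = sorted((a, b), key=lambda s: s.read_start)
    if first.chrom != second.chrom or first.strand != second.strand:
        return None  # would indicate a translocation or inversion instead
    if first.strand != "+":
        return None  # kept simple: only forward-strand segments are handled
    read_gap = second.read_start - first.read_end
    ref_gap = second.ref_start - first.ref_end
    if abs(read_gap) > max_read_gap:
        return None  # segments are not adjacent on the read
    if ref_gap - read_gap < min_size:
        return None  # reference gap is explained by the read itself
    return (first.chrom, first.ref_end, second.ref_start)
```

With short reads, the same event would have to be inferred indirectly, for example from discordant read pairs or read-depth changes, whereas here both breakpoint flanks are observed in one molecule.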
In this work, we demonstrate sequencing of the whole diploid
human genomes of two patients on the MinION sequencer at
11–16× depth of coverage. The two patients suffer from congenital disease resulting from complex chromothripsis. We
employ a novel computational pipeline to demonstrate the feasibility of using MinION reads to detect de novo complex SV
breakpoints at high sensitivity. The long reads from the MinION
allow efficient phasing of genetic variations (SNVs as well as SVs)
and enable us to resolve the long-range structure of the chromothripsis in the patients. Moreover, we identify a significant
proportion of SVs that are not detected in short-read Illumina
sequencing data of the same patient genomes.
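The idea behind phasing an SV with long reads can be sketched as follows: a read that spans a breakpoint often also covers nearby heterozygous SNVs whose parental haplotype is known, so the breakpoint can be assigned to a haplotype by a vote over those alleles. The function `assign_haplotype`, its input layout, and the `min_votes` threshold are hypothetical and serve only as an illustration under these assumptions; this is not the phasing procedure used in this work.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def assign_haplotype(read_alleles: Iterable[Dict[int, str]],
                     phased_snvs: Dict[int, Dict[str, str]],
                     min_votes: int = 2) -> Optional[str]:
    """Assign a breakpoint to a haplotype by majority vote.

    read_alleles: one dict per breakpoint-supporting read, mapping an
        SNV position to the base observed on that read.
    phased_snvs: maps an SNV position to {base: haplotype label}.
    min_votes: minimum number of votes for the winning haplotype
        (an illustrative threshold, useful because reads are error-prone).
    """
    votes: Counter = Counter()
    for alleles in read_alleles:
        for pos, base in alleles.items():
            hap = phased_snvs.get(pos, {}).get(base)
            if hap is not None:  # ignore unphased sites and unexpected bases
                votes[hap] += 1
    if not votes:
        return None
    ranked = votes.most_common()
    best_hap, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Refuse to call a haplotype on weak or tied evidence.
    if best < min_votes or best == runner_up:
        return None
    return best_hap
```

Voting across several reads and several SNVs, rather than trusting a single base, is a simple way to tolerate individual base-calling errors.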
Results
Sequencing of patient genomes with nanopore MinION. As a
first step toward real-time clinical genome sequencing, we evaluated the use of the MinION device to sequence the genomes of
two patients with multiple congenital abnormalities16, henceforth
denoted Patient 1 and Patient 2.
We extracted DNA from patient cells and sequenced it on
the MinION. For Patient 1, we used R7, R9, and R9.4 pore
chemistries (Supplementary Table 1) generating a total of 8.2M
template sequencing reads from 122 sequencing runs. For Patient
2, we exclusively used R9.4 runs and performed only 13 runs
(1.89M reads), which required ~5 days of sequencing on seven
parallel MinION instruments at a cost of around $7000
(Supplementary Fig. 1), and produced a coverage of 11×. We
observed that 82.1% (Patient 1) and 98.9% (Patient 2) of these
reads could be mapped to the human reference genome and were
useful for further analyses. Read lengths were highly variable for
Patient 1, as a result of differences in library prep methods, with a
mean of 6.9 kb for template reads, while for Patient 2 we reached
an average of 16.2 kb with consistent (...truncated)