Illumina mate-paired DNA sequencing-library preparation using Cre-Lox recombination

Nucleic Acids Research, Feb 2012

Standard Illumina mate-paired libraries are constructed from 3- to 5-kb DNA fragments by blunt-end circularization. Sequencing reads that pass through the junction of the two joined ends of a 3–5-kb DNA fragment are not easy to identify and pose problems during mapping and de novo assembly. Longer read lengths increase the probability that a read will cross the junction. To solve this problem, we developed a mate-paired protocol for Illumina sequencing technology that uses Cre-Lox recombination instead of blunt-end circularization. In this method, a LoxP sequence is incorporated at the junction site. This sequence allows reads to be screened for junctions without a reference genome, and junction reads can then be trimmed or split at the junction. Moreover, the location of the LoxP sequence in the reads distinguishes mate-paired reads from spurious paired-end reads. We tested this new method by preparing and sequencing a mate-paired library with an insert size of 3 kb from Saccharomyces cerevisiae. We present an analysis of the library quality statistics and a new bioinformatics tool called DeLoxer that can be used to analyze an Illumina Cre-Lox mate-paired data set. We also demonstrate how the resulting data significantly improve a de novo assembly of the S. cerevisiae genome.

Article PDF: https://nar.oxfordjournals.org/content/40/3/e24.full.pdf

Filip Van Nieuwerburgh (2), Ryan C. Thompson (1), Jessica Ledesma (0), Dieter Deforce (2), Terry Gaasterland (1), Phillip Ordoukhanian (0), Steven R. Head (0)

(0) Next Generation Sequencing Core, The Scripps Research Institute, La Jolla, CA 92037, USA
(1) Laboratory of Computational Genomics, Marine Biology Research Division, Scripps Institution of Oceanography, UCSD
(2) Laboratory of Pharmaceutical Biotechnology, Ghent University, Harelbekestraat 72, 9000 Ghent, Belgium

Paired-end and mate-paired sequencing libraries are both methodologies that, in addition to sequence information, provide information about the physical distance between the two reads in the reference genome.
The ability to map reads to a reference using distance information is useful for resolving larger structural rearrangements (insertions, deletions, inversions). Distance information also has a major impact on the overall success of de novo assembly with short reads, because it helps to assemble across repetitive regions: if one read cannot be mapped because it falls in a highly repetitive region but its paired read is unique, the distance information can be used to place both reads. When the two reads of a pair map to two different contiguous sequences (contigs) of an assembly, they specify the contigs' order, orientation and approximate distance in the genome. This greatly facilitates de novo genome assembly of complex organisms. The two terms differ mainly in insert size: "mate-paired" usually denotes a longer insert than "paired-end", typically between 2 and 20 kb.

Illumina mate-paired libraries

Illumina mate-paired libraries are constructed from 3- to 5-kb DNA fragments by blunt-end circularization followed by a second fragmentation step (1). A biotin molecule at the circularization junction is used to enrich for fragments that contain the junction. Still, a typical Illumina mate-paired library contains fragments that lack the junction; these map as paired-end reads with short inserts. When sequencing a mate-paired library, Illumina recommends a read length of no more than 36 bases. Short reads are not ideal for de novo assembly of genomes with a high repeat content or for detecting structural variation, but the 36-bp limit reduces the chance that a read passes through the junction of the two joined ends of the 3- to 5-kb fragment. Standard mapping software, such as the Illumina pipeline, discards such junction reads because they do not align to the reference sequence.
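A back-of-the-envelope calculation shows why both read length and fragment size matter. The sketch below assumes, as a simplification that is not taken from the paper, that the circularization junction is equally likely to sit anywhere along a sheared, junction-containing library fragment. Read 1 then crosses the junction only if it falls within the first `read_len` bases, and read 2 only if it falls within the last `read_len` bases. The function name `junction_read_fraction` is hypothetical.

```python
def junction_read_fraction(read_len: int, fragment_len: int) -> float:
    """Expected fraction of junction-containing fragments in which at least
    one read runs through the junction.

    Simplifying assumption (not from the paper): the junction position is
    uniformly distributed along the fragment, so each read crosses it with
    probability read_len / fragment_len. The two events are disjoint while
    2 * read_len <= fragment_len; beyond that every pair is affected.
    """
    if read_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    return min(1.0, 2 * read_len / fragment_len)


# Compare the 36-bp recommendation with longer reads, for a typical
# paired-end size (250 bp) and a larger mate-pair size selection (500 bp).
for read_len in (36, 100):
    for fragment_len in (250, 500):
        frac = junction_read_fraction(read_len, fragment_len)
        print(f"{read_len:>3}-bp reads, {fragment_len}-bp fragments: {frac:.3f}")
# prints:
#  36-bp reads, 250-bp fragments: 0.288
#  36-bp reads, 500-bp fragments: 0.144
# 100-bp reads, 250-bp fragments: 0.800
# 100-bp reads, 500-bp fragments: 0.400
```

Under this model, doubling the fragment size halves the fraction of junction reads, while moving from 36- to 100-bp reads nearly triples it. Real libraries have a size distribution rather than a single length, so the numbers are only indicative.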
To map junction reads, specially adapted software such as the Novoalign mate-paired algorithm can be used to detect junction reads and split them at the junction. Junction reads are also problematic for de novo assembly software, where they can reduce assembly performance. To further reduce the number of junction reads, Illumina recommends a final library size selection of 400–600 bp, larger than the 200–300 bp of a typical paired-end library. Increasing the size range of the library in the mate-paired protocol minimizes the number of sequence reads that pass through a junction.

Roche GS-FLX paired-end libraries

In Roche GS-FLX library preparation, 3- to 20-kb DNA fragments are circularized by a Cre recombinase-mediated recombination event between LoxP sites, which are added to both ends of the fragment by ligating circularization adapters (2). The resulting circular DNA molecules carry one recombined, biotinylated circularization adapter sequence at the junction site. This LoxP sequence makes it possible to detect the junction computationally in paired reads and to split the reads at the junction without mapping them to a reference sequence.

Illumina mate-paired libraries using Cre-Lox

To sequence Illumina mate-paired libraries with reads longer than 36 bp without producing a high percentage of unusable junction reads, we adapted the Illumina mate-paired protocol to use Cre-Lox recombination instead of blunt-end circularization. In this way, a LoxP sequence is incorporated between the two joined ends at the junction site. This sequence allows junction reads to be identified and then trimmed or split at the junction. We tested this new method by preparing and sequencing a mate-paired library with an insert size of 3 kb from Saccharomyces cerevisiae DNA.
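The reference-free junction screen can be sketched as a search for the LoxP site in each read. This is not DeLoxer's implementation. It assumes the canonical 34-bp loxP sequence (the exact adapter sequence used in the paper is not given in this excerpt), tolerates a few substitutions by Hamming distance while ignoring indels, accepts a partial site that runs off the 3' end of a read, and keeps only the bases before the site. The names `find_loxp` and `classify_pair` are hypothetical. The paper's tool also reports a paired-end category; this sketch leaves it out because the excerpt does not describe the rule used to call it.

```python
from typing import Optional

# ASSUMPTION: canonical 34-bp loxP site (13-bp inverted repeats flanking an
# 8-bp spacer). The adapter used in the paper may add flanking bases.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


# The spacer is not palindromic, so the site must be searched on both strands.
LOXP_RC = revcomp(LOXP)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_loxp(read: str, max_mismatches: int = 2,
              min_partial: int = 12) -> Optional[int]:
    """Return the 0-based start of the leftmost loxP hit in `read`, or None.

    A hit is a full-length window on either strand with at most
    `max_mismatches` substitutions, or, at the 3' end of the read, an exact
    match to the first `min_partial` or more bases of the site that runs off
    the end of the read. Indels are not handled.
    """
    for start in range(len(read)):
        window = read[start:start + len(LOXP)]
        for site in (LOXP, LOXP_RC):
            if len(window) == len(site):
                if hamming(window, site) <= max_mismatches:
                    return start
            elif len(window) >= min_partial and site.startswith(window):
                return start
    return None


def classify_pair(read1: str, read2: str) -> tuple[str, str, str]:
    """Hypothetical, simplified categorisation of one read pair.

    A junction-containing fragment reads as [end A][loxP][end B], so the
    bases before the site in read 1 (end A) and in read 2 (end B, reverse
    complemented) come from opposite ends of the original long fragment.
    Each read is therefore trimmed at the site and the pair kept as a mate
    pair. With no site in either read, the junction may lie between the
    reads (a mate pair) or be absent (a short-insert paired-end fragment),
    so the pair is only labelled "loxp_negative".
    """
    hit1, hit2 = find_loxp(read1), find_loxp(read2)
    if hit1 is None and hit2 is None:
        return "loxp_negative", read1, read2
    trimmed1 = read1 if hit1 is None else read1[:hit1]
    trimmed2 = read2 if hit2 is None else read2[:hit2]
    return "mate_pair", trimmed1, trimmed2
```

Hamming distance keeps the scan simple and fast; a production tool would more likely use an aligner that tolerates indels and would also discard trimmed reads that become too short to map.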
We present an analysis of the library quality statistics: the ratio of mate-paired to paired-end reads, the number of junction reads, fragment size statistics, the yield of usable mate-paired bases and library diversity. We show that all read pairs identified as mate pairs map to the reference genome with a mean distance of 3 kb. We also present a bioinformatics tool that analyzes an Illumina Cre-Lox mate-paired data set and produces FASTQ files of categorized mate-paired, paired-end and LoxP-negative reads, which are split or trimmed at the junction site to remove LoxP adapter sequences. Finally, we show how the sequencing data from the library improve a de novo assembly of the S. cerevisiae genome.

The Illumina Cre-Lox mate-paired library preparation protocol presented here is similar to the Illumina Mate Pair Library v2 Sample Preparation Guide for 2–5 kb libraries (1). The first part of this protocol was modified to allow Cre-Lox recombination instead of blunt-end circularization. The protocol was also changed to achieve a higher yield of DNA for the PCR library amplification step. This allows fewer PCR cycles to be used, which increases library diversity and reduces PCR bias. Instead of nebulization o (...truncated)
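As the introduction notes, a pair whose reads land on two different contigs gives their order, orientation and approximate distance. A minimal sketch of the distance estimate follows, assuming the reads have already been normalized to an inward-facing orientation with contig A placed before contig B (mate-pair reads are not necessarily in that orientation as sequenced), and that the library's nominal insert size (3 kb here) is known. The insert spans the end of A, the unknown gap and the start of B, so each linking pair yields gap = insert size − bases on A − bases on B. `LinkingPair` and its fields are hypothetical; real scaffolders also weigh the insert-size distribution and reject outlier pairs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LinkingPair:
    """A read pair whose two reads map to two different contigs.

    ASSUMPTION: reads are already in inward-facing orientation, with read 1
    on contig A pointing toward A's right end and read 2 on contig B pointing
    toward B's left end, i.e. contig A is placed before contig B.
    """
    contig_a_len: int  # length of contig A in bp
    read1_outer: int   # 0-based position of read 1's outermost base on A
    read2_outer: int   # 0-based position of read 2's outermost base on B


def gap_estimate(pair: LinkingPair, insert_size: int) -> int:
    """Gap between contigs A and B implied by a single pair.

    insert_size = bases on A (read1_outer to A's end)
                + gap
                + bases on B (B's start to read2_outer, inclusive)
    """
    bases_on_a = pair.contig_a_len - pair.read1_outer
    bases_on_b = pair.read2_outer + 1
    return insert_size - bases_on_a - bases_on_b


def consensus_gap(pairs: list[LinkingPair], insert_size: int) -> float:
    """Mean of the single-pair estimates; a negative value suggests overlap."""
    return mean(gap_estimate(p, insert_size) for p in pairs)


pairs = [LinkingPair(10_000, 8_500, 999), LinkingPair(10_000, 9_000, 1_399)]
print([gap_estimate(p, insert_size=3_000) for p in pairs])  # [500, 600]
print(consensus_gap(pairs, insert_size=3_000))
```

Averaging many linking pairs reduces the scatter that comes from the spread of fragment sizes around the nominal insert size.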


Article home page: http://nar.oxfordjournals.org/content/40/3/e24.abstract

Filip Van Nieuwerburgh, Ryan C. Thompson, Jessica Ledesma, Dieter Deforce, Terry Gaasterland, Phillip Ordoukhanian, Steven R. Head. Illumina mate-paired DNA sequencing-library preparation using Cre-Lox recombination. Nucleic Acids Research, 2012, 40(3): e24. DOI: 10.1093/nar/gkr1000