Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations

Scientific Reports, Aug 2016

To assess the performance of the Oxford Nanopore Technologies (ONT) MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts of known concentration. cDNA libraries were generated using a template-switching protocol to allow direct comparison between sequencing platforms. The MinION was assessed for its ability to sequence the cDNAs directly and accurately, both in terms of abundance and full-length coverage. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC-content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from the human HEK-293 cell line was sequenced on the Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. The measured cDNA abundance agreed well between PacBio RS II and ONT MinION (Pearson r = 0.82, isoforms longer than 700 bp) and between Illumina HiSeq 2500 and ONT MinION (Pearson r = 0.75). This indicates that the ONT MinION can quantitatively sequence both long and short full-length cDNA molecules.
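The cross-platform agreement quoted above is a Pearson correlation between per-isoform abundance estimates from two platforms. The following Python sketch shows one way to compute it; it is an illustration, not the authors' pipeline. It assumes abundances are already available as dictionaries keyed by isoform ID, that the correlation is taken on log10(abundance + 1) values, and that only isoforms quantified on both platforms are compared; the log transform, the pseudocount and the function names (`pearson`, `platform_agreement`) are assumptions, while the 700 bp cutoff mirrors the filter stated for the PacBio RS II comparison.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def platform_agreement(
    abund_a: Dict[str, float],
    abund_b: Dict[str, float],
    lengths: Dict[str, int],
    min_length: int = 700,
    pseudocount: float = 1.0,
) -> Tuple[float, int]:
    """Pearson r between two platforms' isoform abundances (log10 scale).

    Assumption: only isoforms present in both inputs and strictly longer
    than min_length (bp) are compared. Returns (r, isoforms compared).
    """
    shared = sorted(
        i for i in abund_a
        if i in abund_b and lengths.get(i, 0) > min_length
    )
    # log10 with a pseudocount keeps zero counts finite and stops the few
    # most abundant isoforms from dominating the correlation
    x = [math.log10(abund_a[i] + pseudocount) for i in shared]
    y = [math.log10(abund_b[i] + pseudocount) for i in shared]
    return pearson(x, y), len(shared)
```

Passing `min_length=0` disables the length filter, so the same function covers both filtered and unfiltered comparisons.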

https://www.nature.com/articles/srep31602.pdf

Introduction

Transcriptome sequencing using short-read technologies (Illumina HiSeq 2500 [1], Ion Proton [2]) provides valuable information on transcript abundance, rare transcripts and variable transcription start or end sites. Nevertheless, inferring the alternatively spliced isoforms of a gene from short-read data, by statistically assigning the most probable combination of exons, is still computationally challenging and not very accurate [3]. Uneven read coverage [4], complex splicing [5] and potential sequencing bias [6] complicate the task further.
The PacBio RS II [7] zero-mode waveguide (ZMW) long-read sequencing technology has proven capable of characterizing the transcriptome in its native, full-length form, revealing novel gene isoforms not previously observed in RNA-seq experiments [8]. Recently, another long-read DNA sequencing technology based on nanopore sequencing (MinION) was introduced by Oxford Nanopore Technologies Ltd (ONT) [9]. Like the PacBio RS II, the ONT MinION has been shown to resolve the exon structure of mRNA molecules transcribed from genes with a large variety of isoforms [10]. Here, we assessed how accurately the ONT MinION sequences cDNAs in terms of cDNA abundance, sequence identity and full-length coverage. To evaluate the ONT MinION platform, the cDNA of a commercially available, defined set of 92 polyadenylated transcripts that mimic mRNA species (the ERCC RNA Spike-In mix) was sequenced on the Illumina HiSeq 2500 or MiSeq instruments, the PacBio RS II platform and the ONT MinION. Additionally, a complex cDNA population from a HEK-293 cell line was sequenced on the same platforms, and the agreement in the abundance of the different transcript isoforms across the three platforms was assessed. The results indicate that the ONT MinION can sequence cDNA molecules quantitatively, comparably to the Illumina and PacBio RS II platforms, paving the way for full-length sequencing of cDNA molecules with nanopores.

Results

Sequencing the ERCC cDNA molecules on the ONT MinION platform

We sequenced between 9,525 and 197,014 ERCC cDNA molecules in four different ONT MinION flow cells (Table 1). We used two versions of the ONT MinION flow cell: an older version, r7, and a newer version, r7.3. The different types of reads produced by the ONT MinION platform have been described in detail in other studies [11].
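For the ERCC spike-ins, agreement between measured abundance and known concentration, and the absence of a length or GC-content bias, can be checked with a few correlations. The Python sketch below is illustrative only and is not the authors' analysis: it assumes per-spike-in read counts, the nominal concentrations of the mix and the spike-in sequences have already been loaded into dictionaries, compares counts and concentrations on a log10 scale with a pseudocount of 1, and treats a trend of the log observed/expected ratio against length or GC content as a bias indicator. The function names (`gc_fraction`, `ercc_checks`) are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def ercc_checks(
    counts: Dict[str, int],
    expected: Dict[str, float],
    seqs: Dict[str, str],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Abundance and bias correlations for ERCC spike-ins (hypothetical helper).

    counts:   reads assigned to each spike-in (a missing ID counts as 0 reads)
    expected: nominal concentration of each spike-in in the mix
    seqs:     spike-in sequences, used for length and GC content
    """
    ids = sorted(expected)
    log_obs = [math.log10(counts.get(i, 0) + pseudocount) for i in ids]
    log_exp = [math.log10(expected[i]) for i in ids]
    # log10((observed + pseudocount) / expected); a length or GC bias would
    # appear as a trend in this ratio across spike-ins
    log_ratio = [o - e for o, e in zip(log_obs, log_exp)]
    return {
        "r_abundance": pearson(log_exp, log_obs),
        "r_length": pearson([len(seqs[i]) for i in ids], log_ratio),
        "r_gc": pearson([gc_fraction(seqs[i]) for i in ids], log_ratio),
    }
```

A log scale is used because the spike-in concentrations span several orders of magnitude; on raw values the correlation would be driven almost entirely by the most concentrated transcripts.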
Briefly, the ONT MinION platform can sequence both strands of the same DNA molecule in one read-through, owing to the presence of a hairpin adaptor. The strand that is sequenced first is called the “template read”; the strand that is sequenced second is called the “complement read”. If both strands of the same molecule are sequenced, a consensus sequence (the “2D read”) is produced from the template read and the complement read. The template, complement and 2D read groups are referred to as “read types” in this manuscript. After sequencing, the ONT analysis pipeline separates the reads into two groups, “pass” and “fail”, which are referred to as the “high” and “low” quality read categories, respectively, in this manuscript. The “pass” group contains high-quality 2D reads (average base quality score of the 2D read ≥ 9) along with their template and complement reads. Based on the data from this manuscript, the median identity of the high-quality template, complement and 2D reads is 67.9–70.7%, 68.7–69.7% and 83.5–87.5%, respectively (Supplementary Fig. S1). The “fail” group contains the rest of the reads. The median identity (...truncated)
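The pass/fail split and the per-read-type identity summary can be sketched as follows. This is a hedged illustration, not the ONT pipeline or the authors' code. It assumes the 2D read's base qualities are available as a Phred+33 FASTQ quality string and that "average base quality" means the arithmetic mean of the Phred scores. Identity is computed from an alignment's extended CIGAR string (`=`/`X` operations) as matches divided by matches + mismatches + insertions + deletions; this is one common definition and not necessarily the one behind Supplementary Fig. S1. All function names are hypothetical.

```python
import re
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

PASS_MIN_MEAN_Q = 9  # threshold on the 2D read's average base quality (from the text)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores in a FASTQ quality string (assumed Phred+33)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def quality_category(twod_qual: Optional[str]) -> str:
    """'high' (pass) or 'low' (fail) for a molecule, judged on its 2D read.

    A molecule without a 2D read cannot pass. Its template and complement
    reads inherit the category of the molecule.
    """
    if twod_qual and mean_phred(twod_qual) >= PASS_MIN_MEAN_Q:
        return "high"
    return "low"


def identity(cigar: str) -> float:
    """Matches / (matches + mismatches + insertions + deletions); clips ignored."""
    counts: Dict[str, int] = defaultdict(int)
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] += int(length)
    if counts["M"]:
        raise ValueError("identity needs =/X CIGAR operations, not M")
    aligned = counts["="] + counts["X"] + counts["I"] + counts["D"]
    if aligned == 0:
        raise ValueError("CIGAR contains no aligned columns")
    return counts["="] / aligned


def median_identity(
    records: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], float]:
    """records: (category, read_type, cigar); median identity per (category, read_type)."""
    groups = defaultdict(list)
    for category, read_type, cigar in records:
        groups[(category, read_type)].append(identity(cigar))
    return {key: statistics.median(vals) for key, vals in groups.items()}
```

Grouping by both category and read type mirrors how the text reports separate median identities for high-quality template, complement and 2D reads.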


Spyros Oikonomopoulos, Yu Chang Wang, Haig Djambazian, Dunarel Badescu, Jiannis Ragoussis. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations, Scientific Reports, 2016, Issue: 6, DOI: 10.1038/srep31602