Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data

PLOS ONE, September 2016

A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput methods such as next-generation sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole genome sequences can be generated, anchored and integrated using the same NGS data set, independent of restriction digestion. A method model was created and its parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when a ~4.8X genome-coverage BAC library of 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected into physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. The method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence, selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb of sequence were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences.
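The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below computes the reconstruction rates for the rice experiment from the reported counts, and shows how a library coverage of roughly 4.8X could arise from 4,096 clones. The genome size and mean insert size used for the coverage calculation are illustrative assumptions, not values taken from the paper.

```python
def library_coverage(n_clones, mean_insert_bp, genome_bp):
    """Expected genome coverage (in X) of a clone library."""
    return n_clones * mean_insert_bp / genome_bp


# Counts reported in the abstract for the rice verification experiment.
clones_selected = 4064
clones_in_contigs = 3364
seq_selected_mb = 115
seq_integrated_mb = 98

clone_rate = clones_in_contigs / clones_selected
seq_rate = seq_integrated_mb / seq_selected_mb
print(f"clones reconstructed into contigs: {clone_rate:.1%}")  # 82.8%
print(f"sequence integrated into chromosomes: {seq_rate:.1%}")  # 85.2%

# ASSUMPTIONS for illustration only: a ~120 Mb genome and a ~140 kb
# mean BAC insert. Neither number is stated in the abstract.
assumed_genome_bp = 120_000_000
assumed_insert_bp = 140_000
cov = library_coverage(4096, assumed_insert_bp, assumed_genome_bp)
print(f"library coverage under these assumptions: {cov:.2f}X")  # 4.78X
```

Under these assumed sizes the 4,096-clone library gives about 4.8X coverage, consistent with the figure in the abstract; other combinations of genome and insert size would do so as well.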

Yonglong Pan, Xiaoming Wang, Lin Liu, Hao Wang, Meizhong Luo

National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China

Editor: Frank Alexander Feltus, Clemson University, United States

Data Availability Statement: All relevant data are within the paper and its Supporting Information files. The main result data can be accessed publicly at http://gresource.hzau.edu.cn/fgm. The raw data were uploaded to the European Nucleotide Archive at EMBL-EBI (https://www.ebi.ac.uk/ena/) [study accession number: PRJEB12942].

Introduction

Since 2005, the number of registered genome sequencing projects has doubled every two years, reaching 11,472 as of September 2011 [1]. Recent projects have expended a tremendous amount of effort to sequence more complex genomes [2]. Many projects aimed to generate reference genome sequences for the genus or species of interest. A reference genome sequence is an important tool to explore genome structure and function, identify genomic variations, infer information about species evolution, and guide the genome assembly of closely related species [3–8]. However, in all cases, the high quality of a reference genome sequence is critical to ensure reliable outcomes [9].

Two approaches, clone-by-clone (CBC) and whole genome shotgun (WGS), were developed for whole genome sequencing [10–13]. WGS has been widely used along with high-throughput sequencing such as next-generation sequencing (NGS) technologies [14]. Due to the high-throughput and cost-effective nature of this approach, many genomes have been sequenced using WGS/NGS. However, it suffers from the key problem that NGS reads are too short to reliably locate and order scaffolds on chromosomes and complete chromosome assemblies, especially when a genome is large and contains an abundance of repetitive sequences, large gene families, and extensive segmental duplications [5]. With the development of single-molecule (third-generation) sequencing technology, longer sequencing reads and more continuous contigs can be obtained [15–17]. However, this technology alone still has difficulty completing the sequences of complex genomes at present.
CBC does not suffer from these problems and is considered a “gold standard” for genome sequencing [18, 19]. In the CBC approach, a physical map is first constructed using large-insert clones, mainly bacterial artificial chromosomes (BACs) [20], and used as a framework for the allocation of assembled sequences to chromosomes [10, 12, 21]. Physical clone maps are also important tools for locating genes for map-based cloning [22, 23], assembling genomic repeats [24] and filling gaps [25].

Fingerprinting technology has been widely used for physical clone mapping [26–28]. In this technology, large-insert clones such as BACs are fingerprinted with restriction enzyme(s), and the shared restriction bands are used to identify overlaps between clones [29]. This technology has been implemented in automated and high-throughput systems [26]. However, it is costly and has a limited resolution for large genome mapping [30]. Optical mapping [31], nanochannel genome mapping [32] and whole genome profiling (WGP) [30] methods have been developed as alternatives to con (...truncated)
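The fingerprinting idea described in the Introduction, where clones that share restriction bands are taken as candidate overlaps, can be illustrated with a minimal sketch. This is not the FGM method and not the algorithm of any particular fingerprinting package. The clone names, band sizes, size tolerance and shared-band threshold are all hypothetical, and a fixed count threshold stands in for the statistical cutoffs that real fingerprint-assembly software would normally apply.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tolerance=2):
    """Count bands in fp_a matching a distinct band in fp_b within tolerance.

    Fingerprints are lists of band sizes; a greedy two-pointer scan over
    the sorted lists pairs each band with at most one partner.
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def candidate_overlaps(fingerprints, min_shared=3, tolerance=2):
    """Return (clone, clone, shared_count) for pairs above the threshold."""
    pairs = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(sorted(fingerprints.items()), 2):
        n = shared_bands(fp_a, fp_b, tolerance)
        if n >= min_shared:
            pairs.append((name_a, name_b, n))
    return pairs


# Hypothetical band sizes for three clones (illustrative data only).
clones = {
    "clone_A": [120, 340, 560, 780, 910],
    "clone_B": [341, 559, 781, 1020, 1500],
    "clone_C": [2000, 2300, 2750],
}
overlaps = candidate_overlaps(clones)
print(overlaps)  # [('clone_A', 'clone_B', 3)]
```

In this toy data set, clone_A and clone_B share three bands within the size tolerance and are reported as a candidate overlap, while clone_C shares none with either. Because band sizes carry measurement error and unrelated fragments can coincide in size, such a scheme has limited resolution as the number of clones grows, which is the limitation the Introduction points to.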


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0161583&type=printable

Yonglong Pan, Xiaoming Wang, Lin Liu, Hao Wang, Meizhong Luo. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data, PLOS ONE, 2016, Volume 11, Issue 9, DOI: 10.1371/journal.pone.0161583