A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

Genome Biology, Jan 2015

Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which a mapping population can be constructed.
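
For scale, here is a quick worked check of the proportions implied by the figures above. It treats the 16 Gbp genome size as approximate, so the ratios are rounded estimates, not values reported by the authors. They show the share of the genome represented in the assembly, the share of the assembly anchored to chromosomes, and the share of the whole genome that is anchored.

```latex
\frac{9.1\ \text{Gbp}}{16\ \text{Gbp}} \approx 0.57, \qquad
\frac{7.1\ \text{Gbp}}{9.1\ \text{Gbp}} \approx 0.78, \qquad
\frac{7.1\ \text{Gbp}}{16\ \text{Gbp}} \approx 0.44
```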

Full text (PDF): http://genomebiology.com/content/pdf/s13059-015-0582-8.pdf

Jarrod A Chapman¹, Martin Mascher, Aydın Buluç, Kerrie Barry¹, Evangelos Georganas, Adam Session², Veronika Strnadova, Jerry Jenkins¹, Sunish Sehgal, Leonid Oliker, Jeremy Schmutz¹, Katherine A Yelick, Uwe Scholz, Robbie Waugh, Jesse A Poland, Gary J Muehlbauer, Nils Stein, Daniel S Rokhsar¹ ²

¹ Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
² Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA

Background

The feasibility of whole-genome shotgun (WGS) assembly of large and complex eukaryotic genomes was once a much-debated question [1,2]. The advent of next-generation sequencing, and the comparative ease and speed with which WGS assemblies can be constructed for mammalian and many other genomes, allowed sequencing projects to move beyond these concerns and accept high-quality draft genomes with nearly complete gene spaces. Some genomes, however, are larger and more complex than the typical mammalian genome, including those of salamanders (>20 gigabase pairs, Gbp) [3], hexaploid wheat (16 Gbp) [4,5], and conifers (20 Gbp) [6].
To mitigate some of the computational challenges of assembling these more complex genomes from short next-generation sequencing reads, various divide-and-conquer strategies have been developed. These include chromosome sorting and capture [5], large-insert-clone pooling [6,7], and large-clone tiling paths [5,8]. Each approach reduces the sequence assembly problem to a set of smaller, more tractable problems, but each requires substantial resource development in advance of sequencing. Many of the arguments against whole-genome shotgun assembly [2] remain valid today. WGS assemblies are often rough drafts consisting of numerous small contigs separated by gaps of unknown size. Abundant transposable elements, which often form nested structures, are prone to collapse in WGS assembly, resulting in underrepresentation and misassembly of repetitive sequences in the final assembly [9]. Experience from sequencing large and highly repetitive plant genomes has made it clear that, while WGS assemblies can typically deliver a rough draft of the nonrepetitive portion of a genome, true reference sequences with high contiguity and near-complete genome representation are only accessible through clone-by-clone sequencing [10]. Despite their shortcomings, WGS approaches for large genomes [11] have important advantages, including (1) simplicity of library preparation and (2) uniformity of coverage. However, for very large (>10 Gbp), complex, or polyploid genomes, substantial computational resources may be required simply to manage the volume of data and to resolve near-identical genomic sequences that are longer than the scale set by read length and pairing information.
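The paragraph above argues that repeats longer than the read length collapse, or remain unresolved, in a WGS assembly. The toy sketch that follows illustrates that reasoning under explicit assumptions. The genome, the repeat, and the helper names `placements` and `repeated_kmers` are hypothetical. Exact string matching stands in for read alignment, and k-mer counting stands in for a k-mer (de Bruijn graph) style assembler. An exact repeat is used as the limiting case of "near-identical" sequence. This is not the assembly method of this paper.

```python
from __future__ import annotations

from collections import Counter


def placements(genome: str, read: str) -> list[int]:
    """Return every start offset where `read` matches `genome` exactly
    (overlapping matches included). Stand-in for read alignment."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def repeated_kmers(genome: str, k: int) -> set[str]:
    """Return the k-mers seen more than once. An assembler that keeps one
    node per distinct k-mer merges all genomic copies of these k-mers."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


# Hypothetical toy genome: one 8-bp "repeat" present in two copies,
# separated by distinct unique flanks (GGGGG / TTTTT / CCCCC).
REPEAT = "ACGTTGCA"
genome = "GGGGG" + REPEAT + "TTTTT" + REPEAT + "CCCCC"

# A read no longer than the repeat matches both copies: its origin is ambiguous.
print(placements(genome, REPEAT))          # [5, 18]

# A read that extends into unique flanking sequence has a single placement.
spanning = "GG" + REPEAT
print(placements(genome, spanning))        # [3]

# At k = 8 the repeat's k-mer is shared by both copies, so the copies collapse;
# the longer 10-mer that includes unique flank occurs only once.
print(REPEAT in repeated_kmers(genome, 8))       # True
print(spanning in repeated_kmers(genome, 10))    # False
```

Only a read or k-mer long enough to reach into unique flanking sequence tells the two copies apart. In practice, read pairs whose combined span covers a repeat play the same role, which is why read length and pairing information set the scale of repeats that can be resolved.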
While the human WGS assembly [12] and other chromosome-scale mammalian assemblies (for example, mouse [13]) are computational tours de force, they ultimately rely on non-sequence data, such as physical maps, to assemble the chromosomes. The largest WGS assemblies attempted to date (Norway spruce [6], white spruce [14], and loblolly pine [15], all approximately 20 Gbp) remain highly fragmented and are not yet organized into chromosomes. Importantly, whole-genome assemblies of polyploid genomes have not yet been attempted. Instead, researchers have sequenced artificial diploids in the case of autopolyploids such as potato [16], or the progenitor species in the case of allopolyploids such as wheat [17,18] and rapeseed [19]. Hexaploid bread wheat (Triticum aestivum L., 1C = 16 Gbp, 2n = 6x = 42) is, along with rice and maize, one of the most important agricultural crops. It is widely believed, however, that the hexaploid wheat genome is recalcitrant to WGS assembly and genome-wide physical mapping because of its high repeat content and the potential difficulty of separating homeologous loci in the different subgenomes, problems that do not arise for the diploid rice [20] and maize [21] genomes. An early attempt at a WGS assembly resulted in a highly fragmented and genetically unanchored assembly (...truncated)

Jarrod A Chapman, Martin Mascher, Aydın Buluç, Kerrie Barry, Evangelos Georganas, Adam Session, Veronika Strnadova, Jerry Jenkins, Sunish Sehgal, Leonid Oliker, Jeremy Schmutz, Katherine A Yelick, Uwe Scholz, Robbie Waugh, Jesse A Poland, Gary J Muehlbauer, Nils Stein, Daniel S Rokhsar. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biology, 2015, 16:26. DOI: 10.1186/s13059-015-0582-8