Transcriptome-based exon capture enables highly cost-effective comparative genomic data collection at moderate evolutionary scales

http://www.biomedcentral.com/content/pdf/1471-2164-13-403.pdf

Ke Bi (0), Dan Vanderpool (1), Sonal Singhal (0, 2), Tyler Linderoth (0, 2), Craig Moritz (0, 2), Jeffrey M Good (1)
0 Museum of Vertebrate Zoology, University of California, Berkeley, 3101 Valley Life Sciences Building, Berkeley, CA 94720-3160, USA
1 Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
2 Department of Integrative Biology, University of California, Berkeley, 1005 Valley Life Sciences Building, Berkeley, CA 94720-3140, USA
Background: To date, exon capture has largely been restricted to species with fully sequenced genomes, which has precluded its application to lineages that lack high quality genomic resources. We developed a novel strategy for designing array-based exon capture in chipmunks (Tamias) based on de novo transcriptome assemblies. We evaluated the performance of our approach across specimens from four chipmunk species.
Results: We selectively targeted 11,975 exons (~4 Mb) on custom capture arrays, and enriched over 99% of the targets in all libraries. The percentage of aligned reads was highly consistent (24.4-29.1%) across all specimens, including when up to 20 barcoded individuals were multiplexed on a single array. Base coverage among specimens and within targets in each species library was uniform, and the performance of targets among independent exon captures was highly reproducible. There was no decrease in coverage among chipmunk species, which showed up to 1.5% sequence divergence in coding regions. We did observe a decline in capture performance for a subset of targets designed from a much more divergent ground squirrel genome (30 My); however, over 90% of these targets were still recovered. Final assemblies yielded over ten thousand orthologous loci (~3.6 Mb), in which we identified thousands of fixed and polymorphic SNPs among species.
Conclusions: Our study demonstrates the potential of a transcriptome-enabled, multiplexed, exon capture method to create thousands of informative markers for population genomic and phylogenetic studies in non-model species across the tree of life.
Background
High-throughput, next generation sequencing (NGS) technologies and associated bioinformatics tools have fundamentally changed the scale at which DNA sequence data can be gathered and analyzed [1]. NGS allows a massive amount of sequence data to be obtained quickly and affordably. In principle, these approaches can be implemented without prior genomic knowledge of the focal species, thus offering tremendous potential for addressing novel and longstanding evolutionary questions previously hampered by technology and cost [2]. NGS allows researchers to investigate genome-wide molecular, structural, and regulatory mechanisms underlying adaptation, diversification, and speciation [3]. NGS also enables comparative genome scans for polymorphism, which can then be used to infer demography and selection [4]. Molecular phylogenetics also benefits from the increasing accessibility of NGS. Large-scale, multilocus data (i.e., hundreds to thousands of loci), combined with improved analytical tools for inferring gene trees, provide unprecedented opportunities for resolving species phylogenies [5].
Toward this end, a core challenge of population genomic and phylogenetic studies is obtaining a reliable set of orthologous loci from a sufficient number of individuals across populations or species spanning a range of divergences [6]. Even though the cost of NGS continues to fall, most evolutionary labs cannot sequence whole genomes or a large portion of genomic regions from samples spanning divergent clades. Moreover, whole genome data are simply not necessary to answer many research questions.
In this context, genome partitioning and targeted re-sequencing of a consistent subset of genomic regions will remain the most cost-effective and analytically straightforward approach for most evolutionary applications. Genome partitioning with targeted DNA capture allows for the selective NGS of thousands of genomic regions [7], facilitating rapid assays of genetic variation. Compared to partitioning methods that search for anonymous markers (i.e., restriction site associated DNA tags, or RADtags [8]), DNA capture is expected to be more efficient for finding orthologous markers among divergent genomes [6,9,10]. When applied to exonic regions, DNA capture can also provide information on gene function and evolution.
Exon capture involves the hybridization of genomic libraries to short oligonucleotide baits complementary to complete or partial exomes printed on a microarray [7] or attached to magnetic beads in solution [11]. The captured exon-containing DNA fragments of individual or pooled genomic libraries are then eluted from the array, and the target-enriched eluate is sequenced using an NGS platform. To date, the design of exon capture has relied heavily on existing high quality genomic resources (e.g., [12]). However, the genomes of most organisms of ecological and evolutionary interest have yet to be sequenced, which has largely impeded the expansion of DNA capture across the tree of life.
In this study, we propose a series of methods (Figure 1) aimed at adapting exon-capture-based NGS to organisms without pre-existing reference genomes. Here we focused on array-based capture but note that the same general principles should directly extend to an in-solution approach. We focused on North American chipmunks of the genus Tamias to test our methods. Tamias are the focus of a comprehensive set of studies that aim to understand their evolutionary history, patterns of hybridization, and gene introgression (e.g., [13,14]).
Figure 1. An overall workflow of this study. The Tamias phylogenetic tree is modified from [13] by replacing the outgroup species with T. striatus. The Tamias species that were not under investigation in the present study are not shown.
There is no reference genome currently available for this group; at the onset of our study the most closely related genomic resource was a low-coverage (2X) draft genome of the thirteen-lined ground squirrel (Ictidomys tridecemlineatus), which is around 30 million years (My) divergent from Tamias. The house mouse (Mus musculus) and rat (Rattus norvegicus) are the closest high-quality reference genomes, but last shared a common ancestor with chipmunks around 70 My ago. In this context, we developed genomic resources by first sequencing multi-tissue transcriptomes from one chipmunk species (the alpine chipmunk, Tamias alpinus), and then designed arrays by targeting a subset of exons from the annotated transcripts. Furthermore, to test how increased divergence affects capture efficiency, we included anonymous genomic targets from the thirteen-lined ground squirrel on this array. We then tested the feasibility of this approach by using these arrays to capture sequence from four chipmunk species, spanning the range of genetic divergence in this genus. Up to 20 individually indexed genomic libraries from each species wer (...truncated)