Phylogenetic Properties of 50 Nuclear Loci in Medicago (Leguminosae) Generated Using Multiplexed Sequence Capture and Next-Generation Sequencing

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0109704&type=printable

Filipe de Sousa, Yann J. K. Bertrand, Stephan Nylinder, Bengt Oxelman, Jonna S. Eriksson, Bernard E. Pfeil

1 Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
2 Department of Botany, Swedish Museum of Natural History, Stockholm, Sweden

Data Availability Statement: All relevant data are within the paper and its Supporting Information files, and all sequence files are available at the European Nucleotide Archive under accession numbers ERS511665, ERS511666, ERS511667, ERS511668 and ERS511669.

Academic Editor: Sven Buerki, Royal Botanic Gardens, Kew, United Kingdom

Next-generation sequencing technology has increased the capacity to generate molecular data for plant biological research, including phylogenetics, and can potentially contribute to resolving complex phylogenetic problems. The evolutionary history of Medicago L. (Leguminosae: Trifolieae) remains unresolved due to incongruence between published phylogenies. Identifying the processes that cause this genealogical incongruence is essential for inferring a correct species phylogeny of the genus, and requires that more molecular data, preferably from low-copy nuclear (LCN) genes, are obtained across different species. Here we report the development of 50 novel LCN markers in Medicago and assess the phylogenetic properties of each marker. We used the genomic resources available for Medicago truncatula Gaertn., hybridisation-based gene enrichment (sequence capture) techniques and next-generation sequencing to generate sequences. This approach proves to be a cost-effective alternative to amplicon sequencing in phylogenetic studies at the genus or tribe level, and allows for an increase in the number and size of targeted loci.
Substitution rate estimates for each of the 50 loci are provided, and an overview of the variation in substitution rates among a large number of low-copy nuclear genes in plants is presented for the first time. Aligned sequences of major species lineages of Medicago and its sister genus are made available and can be used in further probe development for sequence capture of the same markers.

Funding: This work was supported by grants from the Swedish Research Council, the Royal Swedish Academy of Sciences (grant 2009-5206), Lars Hiertas Minne fund, The Royal Physiographic Society in Lund, Helge Ax:son Johnsons fund, and the ...

Competing Interests: The authors have declared that no competing interests exist.

The development and rapidly growing capacity of next-generation sequencing (NGS) have greatly increased the amount of data generated for research in plant biology. Large datasets of molecular sequences are now being collected across various model and non-model organisms by sequencing whole genomes or transcriptomes, or through enrichment of multiple genes at either specific or anonymous loci [1]. Systematic biology is also set to benefit from these developments, with several projects having already used NGS to obtain data [2–5]. However, the application of NGS in phylogenetics is still in its infancy and far from routine, partly because there has been no consensus on the choice of sampling strategy [6]. Whole genome sequencing has been used to explore individual variation at the genomic level in plants [7–8] but, due to its high price, is not expected to be widely applied in plant phylogenetic research in the near future. Anonymous locus approaches, such as restriction-site-associated DNA (RAD) tags [9], have been successfully used to resolve species relationships [3], [10], but do not always result in good overlap among samples, which may compromise the overall cost efficiency of these methods.
Furthermore, anonymous loci are likely to have higher levels of paralogy and a short phylogenetic span [11]. Genome skimming approaches [12] can be used to sequence the high-copy fraction of plant genomes (cpDNA, mtDNA, rDNA) and, to some extent, to identify nuclear loci, but in the latter case the amount of information obtained is limited and highly dependent on sequencing depth and genome size.

Hybridisation-based enrichment (or sequence capture), on the other hand, appears to have great potential to solve these challenges. Because loci of interest, or loci with suitable parameters for analysis, are selected a priori, it can generate larger and more informative data sets than other genomic sampling strategies [13]. Sequence capture has already been used in phylogenetics and phylogeography, in both plants and animals [2], [4–5], [14–17], and is likely to replace PCR as the main target enrichment method in plant sciences [1], [18]. One or more genomes or transcriptomes are necessary for probe design prior to sequence capture, but for groups where a close reference is lacking, protocol modifications can be made to capture targets that are not phylogenetically close to the reference [19]. Hybridisation-based enrichment can also overcome the problem of degraded genomic DNA, which is often encountered in herbarium and museum material [20–21].

Multiplexing of indexed DNA libraries for sequence capture significantly reduces the amount of work and time required to obtain the same data via PCR amplification of the target, while also reducing sequencing costs when combined with NGS platforms such as Illumina [22]. Multiplexing requires that the size of the target is not excessive, otherwise the read depth (the number of reads covering a particular site) might be insufficient for proper contig assembly and variant calling.
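The trade-off between target size, number of multiplexed samples and read depth can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the study: the function name, the parameter names and all numbers are hypothetical, and it assumes that samples are pooled in equal amounts and that on-target reads are spread evenly across the target.

```python
def expected_mean_depth(total_reads, read_length, on_target_fraction,
                        target_size_bp, n_samples):
    """Rough expected mean read depth per site, per sample.

    Assumes (hypothetically) equal pooling of samples and uniform
    coverage of the target; real capture experiments deviate from both.
    """
    if target_size_bp <= 0 or n_samples <= 0:
        raise ValueError("target_size_bp and n_samples must be positive")
    # Total number of sequenced bases that fall on the target, all samples.
    on_target_bases = total_reads * read_length * on_target_fraction
    # Spread those bases over every target site in every sample.
    return on_target_bases / (target_size_bp * n_samples)


# Hypothetical run: 2 million reads of 100 bp, half of them on target,
# 20 multiplexed samples, 100 kb of total target sequence.
depth = expected_mean_depth(2_000_000, 100, 0.5, 100_000, 20)
print(depth)  # 50.0

# Doubling the target size halves the expected depth.
print(expected_mean_depth(2_000_000, 100, 0.5, 200_000, 20))  # 25.0
```

Under these assumptions, depth falls in proportion to both target size and the number of pooled samples, which is why an excessive target can leave too few reads per site for assembly and variant calling.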
Furthermore, keeping the targets to moderate sizes while generating longer sequences (long loci rather than SNP or RAD-tag data) produces more informative data per locus. Generating large alignments may involve a significant amount of manual work, but it enables the inference of better resolved and more robust gene trees and, consequently, the correct assessment of gene tree incongruence, for which SNP and RAD-tag data are severely limited.

The cost per base of sequence is vastly lower in NGS than in Sanger sequencing [23], but the overall investment, especially for sample preparation, is still considerable. Therefore, instead of relying solely on exploratory sampling of new loci, it is also worth sampling characterised markers that have already been tested for both ease of recovery with sequence capture methods and suitable sequence variability. Targeting previously employed loci is especially important in phylogenetics and phylogeography, which require homologous molecular data from multiple individuals [6], because newly produced sequences can easily be incorporated into pre-existing phylogenies. As more researchers use the same loci across many taxa, large phylogenies can be inferred using data sets with much lower proportions of missing data than is typically the case at presen (...truncated)