A pipeline for high throughput detection and mapping of SNPs from EST databases

Molecular Breeding, Jan 2010

Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic variation that can be used as molecular markers. The SNPs that are hidden in sequence databases can be unlocked using bioinformatic tools. For efficient application of these SNPs, the sequence set should be as error-free as possible, target single loci, and be suitable for the SNP scoring platform of choice. We have developed a pipeline to effectively mine SNPs from public EST databases with or without quality information using QualitySNP software, select reliable SNPs, and prepare the loci for analysis on the Illumina GoldenGate genotyping platform. The applicability of the pipeline was demonstrated using publicly available potato EST data, genotyping individuals from two diploid mapping populations and subsequently mapping the SNP markers (putative genes) in both populations. Over 7000 reliable SNPs were identified that met the criteria for genotyping on the GoldenGate platform. Of the 384 SNPs on the SNP array, approximately 12% dropped out. For the two potato mapping populations, 165 and 185 segregating SNP loci could be mapped on the respective genetic maps, illustrating the effectiveness of our pipeline for SNP selection and validation.

A. M. Anithakumari, Jifeng Tang, Herman J. van Eck, Richard G. F. Visser, Jack A. M. Leunissen, Ben Vosman, C. Gerard van der Linden

H. J. van Eck, R. G. F. Visser, B. Vosman, C. G. van der Linden (corresponding author): Wageningen UR Plant Breeding, Wageningen University & Research Centre, PO Box 386, 6700 AJ Wageningen, The Netherlands
A. M. Anithakumari: Graduate School Experimental Plant Sciences, Wageningen UR Plant Breeding, PO Box 386, 6700 AJ Wageningen, The Netherlands
J. A. M. Leunissen: Wageningen UR Laboratory of Bioinformatics, Wageningen University & Research Centre, Wageningen, The Netherlands
J. Tang: Keygene N.V., Wageningen, The Netherlands; formerly at Wageningen UR Laboratory of Bioinformatics, Wageningen, The Netherlands

Genetic variation is the basis for the biodiversity of life (Schlotterer 2004). Variations in the DNA sequence of genes and their regulatory regions underlie most of the phenotypic variation that has been exploited in modern crops (Bryan et al. 2000; Masouleh et al. 2009). Breeding strategies aiming to improve the agronomic performance of crops have gained momentum in the last few decades through the use of molecular marker technologies that visualize DNA polymorphisms (Collard et al. 2005). Molecular markers have proven to be extremely useful in breeding, in genome-wide screens for variation, in genotype identification and fingerprinting, and in evolutionary and ecological studies.

Marker-assisted breeding programs that aim to transfer genes or alleles within or between species proceed in several steps. The first step is the identification of one or more markers closely linked to the traits to be introgressed, or located within the underlying genes. For this, a high-density map of markers on the genome, markers in genes that are likely to be involved in the trait of interest, or both can be an invaluable tool. SNPs are very well suited for this purpose. Their astonishing abundance has been reported in several discovery projects in many species, including humans (Sachidanandam et al. 2001), model species such as Arabidopsis thaliana (Jander et al. 2002) and Drosophila melanogaster (Hoskins et al. 2001), and crop plants such as barley (Rostoks et al. 2005), maize (Ching et al. 2002), rice (Shen et al. 2004; McNally et al. 2006), soybean (Zhu et al. 2003) and wheat (Ablett et al. 2006).
Recent technological advancements in discovery and detection platforms have made SNP markers attractive for high-throughput use not only in model species, but also in crop plants (Rafalski 2002). In species for which no genome sequence is available, large-scale SNP discovery has generally relied on sequence variation found in libraries of expressed sequence tags (ESTs) (Somers et al. 2003) or on resequencing (Choi et al. 2007). Several software tools are available for SNP discovery from nucleotide databases, including PolyBayes, AutoSNP, and QualitySNP (Marth 1999; Barker et al. 2003; Tang et al. 2006). QualitySNP is especially useful for extracting reliable SNPs from EST sequence databases that lack quality information, and in many cases it can effectively distinguish paralogs from allelic sequences (Tang et al. 2006). Alongside the tools for mining large numbers of SNPs from nucleotide databases, new SNP genotyping platforms were developed that can analyze many SNPs in parallel in a large set of individuals (Syvanen 2005). An increasing number of reports indicate that the GoldenGate system of Illumina is a reliable and cost-effective SNP genotyping platform. It is capable of multiplexing from 96 to 1536 SNPs in a single reaction (Fan et al. 2003).

In this paper we describe a bioinformatics pipeline that runs from SNP discovery in ESTs to genotyping with the Illumina GoldenGate assay. Following SNP discovery, the SNP loci are further screened for their suitability for analysis on the Illumina GoldenGate genotyping platform. We demonstrate the applicability of this pipeline for potato, which is the third most important food crop in the world. Potato is a heterozygous crop, and commercial varieties are generally tetraploid. For potato, approximately 200,000 ESTs, mainly from three cultivars, are publicly available.
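The screening of discovered SNP loci for an array platform can be pictured with a small sketch. The criteria used here are illustrative assumptions, not the settings of the pipeline described in this paper: each candidate must be biallelic, must have a fixed length of consensus sequence on both sides (the 60-base default is a placeholder), and must have no other candidate polymorphism inside those flanks. The names `CandidateSnp`, `has_clean_flanks` and `screen_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSnp:
    contig: str      # identifier of the EST contig (hypothetical naming)
    position: int    # 0-based position in the contig consensus
    alleles: tuple   # observed alleles, e.g. ("A", "G")


def has_clean_flanks(snp, contig_length, positions_on_contig, flank=60):
    """True if `flank` bases exist on both sides of the SNP and no other
    candidate polymorphism lies inside those flanks (assumed criterion)."""
    start = snp.position - flank
    end = snp.position + flank
    if start < 0 or end >= contig_length:
        return False  # not enough consensus sequence for assay design
    return not any(
        start <= p <= end and p != snp.position for p in positions_on_contig
    )


def screen_candidates(snps, contig_lengths, flank=60):
    """Keep biallelic SNPs whose flanks are long enough and polymorphism-free."""
    positions = {}
    for snp in snps:
        positions.setdefault(snp.contig, []).append(snp.position)
    return [
        snp
        for snp in snps
        if len(snp.alleles) == 2
        and has_clean_flanks(
            snp, contig_lengths[snp.contig], positions[snp.contig], flank
        )
    ]
```

In this sketch two SNPs that lie within each other's flanks are both rejected, which is a deliberately conservative choice; a real design step might instead keep one of them or rely on the platform vendor's own design scoring.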
We show here that SNPs identified by QualitySNP from the publicly available potato ESTs can effectively be turned into markers that can be mapped in different diploid potato mapping populations, showing the versatility of the pipeline and of the SNP markers it produces. Our results indicate that the pipeline produces a large number of SNP markers, and that the selection of SNPs for genotyping on the Illumina GoldenGate genotyping platform yields a high number of reliable, functional, co-dominant markers that can be easily placed on a genetic map.

Materials and methods

Mapping populations

(a) SH × RH: A cross between two diploid heterozygous potato clones, SH83-92-488 and RH89-039-16 (SH × RH), resulted in an F1 mapping population of 135 individuals (van Os et al. 2006). Using a Selective Mapping strategy (Vision et al. 2000), the 57 individuals that captured the highest number of recombination events were selected.

(b) C × E: This diploid backcross population of 250 genotypes was obtained from the cross between clones C (USW5337.3; Hanneman 1967) and E (originally named 77.2102.37; Jacobsen 1980). Clone C is a hybrid between S. phureja PI225696.1 and S. tuberosum dihaploid USW42. Clone E is the result of a cross between clone C and the S. vernei-S. tuberosum backcross clone VH3-4211 (Jacobsen 1978). A set of 94 randomly selected individuals was used for this study, along with the parents of the cross.

DNA extracti (...truncated)
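Selecting a subset of individuals that captures many recombination events, as described for the first mapping population above, can be illustrated with a simple greedy sketch. This is a simplified stand-in and not the Selective Mapping algorithm of Vision et al. (2000). It assumes that each individual is represented by a string of parental phase codes along ordered framework markers with no missing data, and it counts a recombination event wherever adjacent markers differ. The function names are hypothetical.

```python
def breakpoints(genotype):
    """Return the marker intervals (i, i + 1), indexed by i, where the
    phase code changes, i.e. where a recombination event is inferred."""
    return {
        i for i in range(len(genotype) - 1) if genotype[i] != genotype[i + 1]
    }


def select_individuals(genotypes, n_select):
    """Greedily pick individuals that add the most not-yet-covered
    breakpoint intervals; ties go to the individual with more breakpoints
    overall, then to the alphabetically first name."""
    remaining = {name: breakpoints(g) for name, g in genotypes.items()}
    covered = set()
    chosen = []
    while remaining and len(chosen) < n_select:
        best = max(
            sorted(remaining),  # sorted so that ties resolve by name
            key=lambda name: (
                len(remaining[name] - covered),
                len(remaining[name]),
            ),
        )
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen
```

The greedy choice favours individuals whose breakpoints fall in intervals not yet represented, so that a small sample still resolves marker order across as much of the map as possible. Handling of missing genotype calls and of multiple linkage groups is left out of this sketch.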


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs11032-009-9377-5.pdf
Article home page: http://link.springer.com/article/10.1007/s11032-009-9377-5

A. M. Anithakumari, Jifeng Tang, Herman J. van Eck, Richard G. F. Visser, Jack A. M. Leunissen, Ben Vosman, C. Gerard van der Linden. A pipeline for high throughput detection and mapping of SNPs from EST databases, Molecular Breeding, 2010, pp. 65-75, Volume 26, Issue 1, DOI: 10.1007/s11032-009-9377-5