Deducing genotypes for loci of interest from SNP array data via haplotype sharing, demonstrated for apple and cherry

PLOS ONE, Feb 2023

Breeders, collection curators, and other germplasm users require genetic information, both genome-wide and locus-specific, to effectively manage their genetically diverse plant material. SNP arrays have become the preferred platform to provide genome-wide genetic profiles for elite germplasm and could also provide locus-specific genotypic information. However, genotypic information for loci of interest such as those within PCR-based DNA fingerprinting panels and trait-predictive DNA tests is not readily extracted from SNP array data, thus creating a disconnect between historic and new data sets. This study aimed to establish a method for deducing genotypes at loci of interest from their associated SNP haplotypes, demonstrated for two fruit crops and three locus types: quantitative trait loci Ma and Ma3 for acidity in apple, apple fingerprinting microsatellite marker GD12, and Mendelian trait locus Rf for sweet cherry fruit color. Using phased data from an apple 8K SNP array and sweet cherry 6K SNP array, unique haplotypes spanning each target locus were associated with alleles of important breeding parents. These haplotypes were compared via identity-by-descent (IBD) or identity-by-state (IBS) to haplotypes present in germplasm important to U.S. apple and cherry breeding programs to deduce target locus alleles in this germplasm. While IBD segments were confidently tracked through pedigrees, confidence in allele identity among IBS segments used a shared length threshold. At least one allele per locus was deduced for 64–93% of the 181 individuals. Successful validation compared deduced Rf and GD12 genotypes with reported and newly obtained genotypes. Our approach can efficiently merge and expand genotypic data sets, deducing missing data and identifying errors, and is appropriate for any crop with SNP array data and historic genotypic data sets, especially where linkage disequilibrium is high. 
Locus-specific genotypic information extracted from genome-wide SNP data is expected to enhance confidence in management of genetic resources.
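
The IBS step described in the abstract, in which an allele is deduced only when the shared haplotype segment around the target locus is long enough, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the single-letter SNP coding, the map positions, the allele labels, and the threshold value are all hypothetical, and the rule of declining to call an allele when references carrying different alleles both pass the threshold is an assumption made here for illustration.

```python
from typing import Optional, Sequence, Tuple


def shared_span(query: str, reference: str,
                positions: Sequence[float], locus_index: int) -> float:
    """Length, in the units of `positions`, of the contiguous run of identical
    SNP calls that includes the SNP at `locus_index` (0.0 if that SNP differs)."""
    if query[locus_index] != reference[locus_index]:
        return 0.0
    left = locus_index
    while left > 0 and query[left - 1] == reference[left - 1]:
        left -= 1
    right = locus_index
    while right < len(query) - 1 and query[right + 1] == reference[right + 1]:
        right += 1
    return positions[right] - positions[left]


def deduce_allele(query: str, references: Sequence[Tuple[str, str]],
                  positions: Sequence[float], locus_index: int,
                  min_shared: float) -> Optional[str]:
    """Return the allele label whose reference haplotype shares a segment of at
    least `min_shared` with the query; None if no reference passes or if
    references with different alleles pass (assumed rule)."""
    passing = set()
    for ref_haplotype, allele in references:
        if shared_span(query, ref_haplotype, positions, locus_index) >= min_shared:
            passing.add(allele)
    return passing.pop() if len(passing) == 1 else None


# Hypothetical data: six SNPs flanking the target locus (SNP at index 2).
positions = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]   # made-up map positions
references = [("ABBABA", "allele_1"),        # haplotypes of known parents
              ("BABBAB", "allele_2")]

long_match = deduce_allele("ABBABB", references, positions, 2, min_shared=4.0)
short_match = deduce_allele("AABBBA", references, positions, 2, min_shared=4.0)
print(long_match, short_match)
```

In this toy data the first query shares five consecutive SNPs with the first reference and is assigned its allele, while the second query shares only a short stretch with either reference and is left undeduced. IBD segments tracked through a known pedigree would not need such a threshold.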

PLOS ONE RESEARCH ARTICLE

Alexander Schaller¤a, Stijn Vanderzande¤b, Cameron Peace
Department of Horticulture, Washington State University, Pullman, WA, United States of America

¤a Current address: Department of Environmental Horticulture, University of Florida, Gainesville, FL, United States of America
¤b Current address: Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands

OPEN ACCESS

Citation: Schaller A, Vanderzande S, Peace C (2023) Deducing genotypes for loci of interest from SNP array data via haplotype sharing, demonstrated for apple and cherry. PLoS ONE 18(2): e0272888. https://doi.org/10.1371/journal.pone.0272888

Editor: Evangelia V. Avramidou, Institute of Mediterranean Forest Ecosystems of Athens, GREECE

Received: July 27, 2022
Accepted: January 24, 2023
Published: February 7, 2023

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0272888

Copyright: © 2023 Schaller et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.
Funding: This study was supported by USDA’s National Institute of Food and Agriculture-Specialty Crop Research Initiative project “RosBREED: Combining disease resistance and horticultural quality in new rosaceous cultivars” (2014-51181-22378) and the USDA National Institute of Food and Agriculture Hatch project 1014919, Crop Improvement and Sustainable Production Systems (WSU reference 00011) for CP, SV, and AS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Accurate genotypic information on identity, parentage, ancestry, breeding value, and performance potential informs effective germplasm management and use [1]. Historically, fruit breeders and collection curators have relied on meticulous passport and crossing records to be confident about identity, parentage, and ancestry and relied on phenotypic data to estimate genetic potential. Increasingly, locus-specific DNA tests for key traits, often based on simple PCR markers, have been used to determine the genotypes (i.e., allelic combinations) at trait loci of interest for cultivars and selections (e.g., [2–5]). In addition, small panels of neutral genetic markers have routinely been employed by germplasm managers to identify duplicates, infer pedigree relationships among germplasm individuals (mostly parent-child relationships), and calculate overall relatedness among germplasm individuals (e.g., [6–10]). Single nucleotide polymorphisms (SNPs) have rapidly become the genetic marker of choice and are replacing previously developed marker types for a given organism.
SNP arrays characterizing thousands of loci across the genome have been developed for fruit crops to provide desired genotypic information genome-wide [1, 11–22]. SNP arrays have been used to determine general relatedness among individuals as well as identify specific pedigree relationships [23–27]. SNP arrays have also been used to make genome-wide predictions for apple, cherry, and peach, in which breeding value and performance potential were based on cumulative information from small-effect alleles across the genome and a few large-effect alleles of quantitative trait loci (QTLs) [28–31]. In the RosBREED project [22, 32, 33], SNP arrays were developed and used in apple, cherry, and peach on large breeding germplasm sets that were pedigree-connected and included many important breeding parents and their ancestors [34] to identify and dissect loci influencing fruit quality and disease resistance traits and identify favorable and unfavorable alleles and their associated SNPs [35–45]. The data obtained from these SNP arrays were curated, which included combining SNPs into haploblocks delimited by historic recombination events and establishing the set of observed multi-SNP haplotypes at each haploblock for all genotyped germplasm individuals [46]. SNP arrays are useful new tools but for their routine use in germpl (...truncated)



Alexander Schaller, Stijn Vanderzande, Cameron Peace. Deducing genotypes for loci of interest from SNP array data via haplotype sharing, demonstrated for apple and cherry, PLOS ONE, 2023, Volume 18, Issue 2, DOI: 10.1371/journal.pone.0272888