Deducing genotypes for loci of interest from SNP array data via haplotype sharing, demonstrated for apple and cherry
PLOS ONE
RESEARCH ARTICLE
Alexander Schaller¤a, Stijn Vanderzande¤b, Cameron Peace*
Department of Horticulture, Washington State University, Pullman, WA, United States of America
OPEN ACCESS
Citation: Schaller A, Vanderzande S, Peace C (2023) Deducing genotypes for loci of interest from SNP array data via haplotype sharing, demonstrated for apple and cherry. PLoS ONE 18(2): e0272888. https://doi.org/10.1371/journal.pone.0272888
Editor: Evangelia V. Avramidou, Institute of Mediterranean Forest Ecosystems of Athens, GREECE
Received: July 27, 2022
Accepted: January 24, 2023
Published: February 7, 2023
Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0272888
Copyright: © 2023 Schaller et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
¤a Current address: Department of Environmental Horticulture, University of Florida, Gainesville, FL, United States of America
¤b Current address: Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
*
Abstract
Breeders, collection curators, and other germplasm users require genetic information, both genome-wide and locus-specific, to effectively manage their genetically diverse plant material. SNP arrays have become the preferred platform to provide genome-wide genetic profiles for elite germplasm and could also provide locus-specific genotypic information. However, genotypic information for loci of interest, such as those within PCR-based DNA fingerprinting panels and trait-predictive DNA tests, is not readily extracted from SNP array data, creating a disconnect between historic and new data sets. This study aimed to establish a method for deducing genotypes at loci of interest from their associated SNP haplotypes, demonstrated for two fruit crops and three locus types: quantitative trait loci Ma and Ma3 for acidity in apple, apple fingerprinting microsatellite marker GD12, and Mendelian trait locus Rf for sweet cherry fruit color. Using phased data from an apple 8K SNP array and a sweet cherry 6K SNP array, unique haplotypes spanning each target locus were associated with alleles of important breeding parents. These haplotypes were compared via identity-by-descent (IBD) or identity-by-state (IBS) to haplotypes present in germplasm important to U.S. apple and cherry breeding programs to deduce target locus alleles in this germplasm. While IBD segments were confidently tracked through pedigrees, confidence in allele identity among IBS segments relied on a threshold for shared segment length. At least one allele per locus was deduced for 64–93% of the 181 individuals. Deduced Rf and GD12 genotypes were successfully validated against reported and newly obtained genotypes. Our approach can efficiently merge and expand genotypic data sets, deducing missing data and identifying errors, and is appropriate for any crop with SNP array data and historic genotypic data sets, especially where linkage disequilibrium is high. Locus-specific genotypic information extracted from genome-wide SNP data is expected to enhance confidence in the management of genetic resources.
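The abstract states that allele identity among IBS segments was accepted only above a shared-length threshold. The following minimal sketch illustrates that idea and is not the authors' implementation. It assumes that each phased haplotype is a string of SNP alleles ordered by map position, that the target locus is represented by its nearest SNP (a hypothetical simplification), and that reference haplotypes carry known allele labels. The function names, the example data and the threshold value are all hypothetical, and pedigree-based IBD tracking is not modeled.

```python
from typing import Optional, Sequence


def ibs_run_length(hap_a: str, hap_b: str, positions: Sequence[float],
                   locus_index: int) -> float:
    """Span (in the units of `positions`, e.g. cM or Mb) of the unbroken run
    of identical alleles shared by two phased haplotypes around the SNP at
    `locus_index`. Returns 0.0 if the haplotypes differ at that SNP."""
    if hap_a[locus_index] != hap_b[locus_index]:
        return 0.0
    left = right = locus_index
    # Extend leftward while alleles keep matching.
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    # Extend rightward while alleles keep matching.
    while right < len(hap_a) - 1 and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return positions[right] - positions[left]


def deduce_alleles(query_haps: Sequence[str],
                   references: Sequence[tuple[str, str]],
                   positions: Sequence[float],
                   locus_index: int,
                   min_length: float) -> list[Optional[str]]:
    """For each query haplotype, return the allele label of the reference
    haplotype sharing the longest IBS run at the locus. Return None (allele
    left undeduced) if that run is shorter than the hypothetical `min_length`."""
    deduced: list[Optional[str]] = []
    for hap in query_haps:
        best_len, best_allele = 0.0, None
        for ref_hap, allele in references:
            run = ibs_run_length(hap, ref_hap, positions, locus_index)
            if run > best_len:
                best_len, best_allele = run, allele
        deduced.append(best_allele if best_len >= min_length else None)
    return deduced


positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
references = [("AABBAB", "Ma"), ("BBAABA", "ma")]  # hypothetical labels
print(deduce_alleles(("AABBAA", "BAAABB"), references, positions, 2, 3.0))
```

In this toy example, the first query haplotype shares a 4-unit run with the "Ma" reference, so its allele is deduced. The second query haplotype shares only a 2-unit run with the "ma" reference, which falls below the 3-unit threshold, so its allele stays undeduced. A stricter threshold trades the number of deduced alleles for confidence in each deduction.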
Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.
PLOS ONE | https://doi.org/10.1371/journal.pone.0272888 February 7, 2023
Funding: This study was supported by USDA’s National Institute of Food and Agriculture-Specialty Crop Research Initiative project “RosBREED: Combining disease resistance and horticultural quality in new rosaceous cultivars” (2014-5118122378) and the USDA National Institute of Food and Agriculture Hatch project 1014919, Crop Improvement and Sustainable Production Systems (WSU reference 00011) for CP, SV, and AS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Deducing locus genotypes from SNP array data
Introduction
Accurate genotypic information on identity, parentage, ancestry, breeding value, and performance potential informs effective germplasm management and use [1]. Historically, fruit breeders and collection curators have relied on meticulous passport and crossing records to be confident about identity, parentage, and ancestry, and on phenotypic data to estimate genetic potential. Increasingly, locus-specific DNA tests for key traits, often based on simple PCR markers, have been used to determine the genotypes (i.e., allelic combinations) of cultivars and selections at trait loci of interest (e.g., [2–5]). In addition, germplasm managers have routinely employed small panels of neutral genetic markers to identify duplicates, infer pedigree relationships among germplasm individuals (mostly parent-child relationships), and calculate overall relatedness among germplasm individuals (e.g., [6–10]).
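Marker panels support duplicate detection and parent-child inference through simple genotype comparisons. The sketch below shows a generic exclusion-based approach for codominant, multi-allelic markers such as microsatellites. It is not taken from the cited studies. Each genotype is assumed to be a pair of allele sizes per locus, and loci missing from either individual are skipped. The locus names and allele sizes are hypothetical.

```python
Genotypes = dict[str, tuple[int, int]]  # locus name -> (allele size, allele size)


def count_exclusions(parent: Genotypes, child: Genotypes) -> int:
    """Count loci at which a candidate parent shares no allele with the child.
    Each such locus is a Mendelian inconsistency that argues against the
    parent-child relationship (or points to a genotyping error)."""
    scored_in_both = parent.keys() & child.keys()
    return sum(1 for locus in scored_in_both
               if not set(parent[locus]) & set(child[locus]))


def is_duplicate(a: Genotypes, b: Genotypes) -> bool:
    """True if two individuals have identical genotypes (allele order ignored)
    at every locus scored in both, and at least one such locus exists."""
    scored_in_both = a.keys() & b.keys()
    return bool(scored_in_both) and all(
        sorted(a[locus]) == sorted(b[locus]) for locus in scored_in_both)


# Hypothetical two-locus profiles.
parent = {"locus1": (153, 157), "locus2": (200, 210)}
child = {"locus1": (157, 161), "locus2": (204, 212)}
print(count_exclusions(parent, child))  # locus2 shares no allele
print(is_duplicate(parent, {"locus1": (157, 153), "locus2": (210, 200)}))
```

In practice, a small number of exclusions is often tolerated to allow for genotyping errors and null alleles. The tolerated number would be a user choice and is not specified here.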
Single nucleotide polymorphisms (SNPs) have rapidly become the genetic marker of choice and are replacing marker types previously developed for a given organism. SNP arrays characterizing thousands of loci across the genome have been developed for fruit crops to provide desired genotypic information genome-wide [1, 11–22]. SNP arrays have been used to determine general relatedness among individuals as well as to identify specific pedigree relationships [23–27]. SNP arrays have also been used to make genome-wide predictions for apple, cherry, and peach, in which breeding value and performance potential were based on cumulative information from small-effect alleles across the genome and a few large-effect alleles of quantitative trait loci (QTLs) [28–31]. In the RosBREED project [22, 32, 33], SNP arrays were developed for apple, cherry, and peach and used on large, pedigree-connected breeding germplasm sets that included many important breeding parents and their ancestors [34]. These arrays were used to identify and dissect loci influencing fruit quality and disease resistance traits, and to identify favorable and unfavorable alleles and their associated SNPs [35–45]. Curation of the resulting SNP array data included combining SNPs into haploblocks delimited by historic recombination events and establishing the set of observed multi-SNP haplotypes at each haploblock for all genotyped germplasm individuals [46].
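The curation step that establishes the observed multi-SNP haplotypes at each haploblock can be pictured as slicing phased SNP strings at block boundaries and cataloguing the distinct allele strings. The sketch below is a minimal illustration under stated assumptions and does not reproduce the curation pipeline of [46]. It assumes that haploblock boundaries are already known and given as half-open SNP index ranges; detecting them from historic recombination is not modeled. It also assumes that haplotypes are labeled "H1", "H2", … in order of first appearance, which is a hypothetical naming scheme rather than the published convention.

```python
Phased = dict[str, tuple[str, str]]  # individual -> (haplotype 1, haplotype 2)


def call_block_haplotypes(phased: Phased, blocks: list[tuple[int, int]]):
    """Return (catalog, calls).

    catalog[b] maps each distinct allele string seen in block b to a
    haplotype ID; calls[individual][b] holds that individual's pair of
    haplotype IDs in block b. `blocks` holds (start, end) SNP indices,
    end exclusive."""
    catalog: list[dict[str, str]] = [{} for _ in blocks]
    calls: dict[str, list[tuple[str, str]]] = {}
    for individual, (hap1, hap2) in phased.items():
        per_block = []
        for b, (start, end) in enumerate(blocks):
            ids = []
            for hap in (hap1, hap2):
                seq = hap[start:end]
                # Register a new haplotype the first time its allele string appears.
                if seq not in catalog[b]:
                    catalog[b][seq] = f"H{len(catalog[b]) + 1}"
                ids.append(catalog[b][seq])
            per_block.append((ids[0], ids[1]))
        calls[individual] = per_block
    return catalog, calls


catalog, calls = call_block_haplotypes(
    {"P1": ("AABB", "ABBA"), "P2": ("AABA", "BBBB")},  # hypothetical data
    [(0, 2), (2, 4)])
print(catalog)
print(calls)
```

Representing each individual by haplotype IDs per block, rather than by individual SNPs, is what allows a haplotype spanning a target locus to be matched across germplasm with a simple equality test.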
SNP arrays are useful new tools but for their routine use in germpl (...truncated)