Haplotype-based genome-wide association study identifies loci and candidate genes for milk yield in Holsteins
RESEARCH ARTICLE
Zhenliang Chen1,2, Yunqiu Yao1, Peipei Ma1,2, Qishan Wang1,2*, Yuchun Pan1,2*
1 Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, PR China
2 Shanghai Key Laboratory of Veterinary Biotechnology, Shanghai, PR China
OPEN ACCESS
Citation: Chen Z, Yao Y, Ma P, Wang Q, Pan Y (2018) Haplotype-based genome-wide association study identifies loci and candidate genes for milk yield in Holsteins. PLoS ONE 13(2): e0192695. https://doi.org/10.1371/journal.pone.0192695
Editor: Qin Zhang, China Agricultural University, CHINA
Received: August 24, 2017
Accepted: January 29, 2018
Published: February 15, 2018
Copyright: © 2018 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The SNP and phenotype data are freely available from the public repository Dryad (https://doi.org/10.5061/dryad.cs133).
Funding: This work was supported by the National Natural Science Foundation of China (31370043, 31672386) to Qishan Wang. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
* Corresponding authors: Yuchun Pan (YP); Qishan Wang (QW)
Abstract
Because milk yield is an economically important trait in dairy cattle, genome-wide association studies (GWASs) are vital for explaining the genetic architecture underlying milk yield and for performing marker-assisted selection (MAS). In this study, we adopted a haplotype-based empirical Bayesian GWAS to identify loci and candidate genes for milk yield. A total of 1 092 Holstein cows were genotyped using the genotyping by genome reducing and sequencing (GGRS) method. After filtering, 164 312 high-confidence SNPs and 13 476 haplotype blocks were retained for the GWAS. The results indicated that 17 blocks were significantly associated with milk yield. We then identified the nearest gene to each haplotype block and annotated these genes using milk-associated quantitative trait locus (QTL) intervals and ingenuity pathway analysis (IPA) networks. Our analysis showed that four genes, DLGAP1, AP2B1, ITPR2 and THBS4, are related to milk yield, whereas another three, ARHGEF4, TDRD1 and KIF19, were inferred to be potentially related. Additionally, an IPA-derived network containing one inferred gene (ARHGEF4) and all four confirmed genes likely regulates milk yield. Our findings advance the identification of causal genes underlying milk production traits and could guide follow-up studies to further confirm the associated genes, pathways and biological networks.
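The annotation step mentioned above, locating the nearest gene to each haplotype block, can be sketched in a few lines. This is a minimal illustration, not the study's annotation pipeline: it assumes block and gene coordinates come from the same genome assembly with 1-based inclusive positions, and the function name `nearest_gene` and the placeholder genes in the usage check are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Block = Tuple[str, int, int]           # (chromosome, start, end)
Gene = Tuple[str, str, int, int]       # (name, chromosome, start, end)


def nearest_gene(block: Block, genes: Iterable[Gene]) -> Optional[Tuple[str, int]]:
    """Return (gene_name, distance_bp) for the gene closest to a haplotype block.

    Distance is 0 when the block overlaps the gene body; otherwise it is the
    gap between the nearest edges. Returns None if no gene lies on the
    block's chromosome. Ties keep the first gene encountered.
    """
    chrom, b_start, b_end = block
    best: Optional[Tuple[str, int]] = None
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue  # only genes on the same chromosome are candidates
        if g_end < b_start:
            dist = b_start - g_end      # gene lies upstream of the block
        elif g_start > b_end:
            dist = g_start - b_end      # gene lies downstream of the block
        else:
            dist = 0                    # block overlaps the gene body
        if best is None or dist < best[1]:
            best = (name, dist)
    return best
```

For example, with hypothetical genes `("GENE_A", "1", 100, 200)` and `("GENE_B", "1", 1000, 2000)`, a block at `("1", 300, 400)` maps to `("GENE_A", 100)`. A linear scan is enough for a handful of significant blocks; a genome-wide annotation would more typically use a sorted or interval-tree index.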
Introduction
Milk yield is a key breeding trait and directly determines the profitability of dairy farming, since higher milk yield brings greater economic returns. Thanks to major advances in marker technology, it is now possible to dissect heritable quantitative traits such as milk production by mapping the underlying genomic regions, or quantitative trait loci (QTLs). To date, 2 437 QTL intervals associated with milk yield have been reported in the cattle Animal QTLdb (http://www.animalgenome.org, Release 32, Apr 27, 2017). However, traditional QTL mapping relies on linkage analysis, which yields overly large intervals that make it difficult to identify the underlying mutations and to use marker information in breeding [1].
PLOS ONE | https://doi.org/10.1371/journal.pone.0192695 February 15, 2018
Haplotype-based GWAS on milk yield in Holsteins
With the advent of high-throughput single-nucleotide polymorphism (SNP) genotyping, genome-wide SNP panels allow genome-wide association studies (GWASs) to explore the genes associated with complex traits of interest. Compared with traditional QTL mapping methods, the advantage of GWAS lies in its more precise mapping intervals. Therefore, GWAS has become a widely accepted approach for exploring associations between markers and traits. Several GWASs have used single-point analysis to identify key genes for milk yield [2, 3]. For example, Jiang et al. performed a GWAS for milk production traits in a Chinese Holstein population and identified 20 genome-wide significant SNPs for milk yield [2]. However, although GWASs almost always rely on single-point analysis, constructing haplotype blocks and identifying tag SNPs are highly informative for marker identification [4]. A haplotype analysis of GWAS data showed that haplotypes substantially increased the proportion of phenotypic variance explained compared with single SNPs from a given region of the genome [5]. Although often neglected as a tool, haplotype-based GWAS may extract more information from a dataset and could help reduce the missing heritability problem.
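Haplotype-based analysis replaces single-SNP genotypes with haplotype alleles defined over a short block of adjacent SNPs. The sketch below shows one way to encode such a block, assuming the genotypes have already been phased and the block boundaries are given as SNP indices; the function `haplotype_design` is hypothetical and is not the authors' pipeline. Because each animal carries two haplotypes, each row of the resulting matrix holds 0, 1 or 2 copies of every distinct haplotype allele observed in the block.

```python
from typing import List, Sequence, Tuple

Phased = Tuple[Sequence[int], Sequence[int]]  # (paternal, maternal) 0/1 alleles


def haplotype_design(phased: List[Phased], block: Sequence[int]) -> Tuple[List[str], List[List[int]]]:
    """Encode one haplotype block as per-animal haplotype copy counts.

    phased: one (hap1, hap2) pair per animal, each a sequence of 0/1 SNP alleles
    block:  indices of the SNPs forming the block (e.g. 2, 3 or 4 adjacent SNPs)
    Returns (alleles, Z): alleles lists the distinct haplotype strings in sorted
    order, and Z[i][j] counts the copies of alleles[j] carried by animal i.
    """
    def allele(hap: Sequence[int]) -> str:
        # Concatenate the block's SNP alleles into a haplotype label, e.g. "01".
        return "".join(str(hap[s]) for s in block)

    alleles = sorted({allele(h) for pair in phased for h in pair})
    index = {a: j for j, a in enumerate(alleles)}
    Z: List[List[int]] = []
    for hap1, hap2 in phased:
        row = [0] * len(alleles)
        row[index[allele(hap1)]] += 1
        row[index[allele(hap2)]] += 1
        Z.append(row)
    return alleles, Z
```

Regressing the phenotype on the columns of `Z`, rather than on one SNP's 0/1/2 dosage, is what lets a haplotype model capture joint information from several linked SNPs.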
Additionally, the most common and efficient model implemented in GWAS is the linear mixed model with a random polygenic effect and fixed effects including the marker and population-structure factors such as region and age. However, such models face two issues: genomic background noise, and the stringency of Bonferroni correction, which leads to a high false-negative rate. Therefore, we adopted a linear mixed model recently developed in our laboratory, in which the haplotype effect is assumed to be random and normally distributed [6]. Under the empirical Bayesian (EB) method, the prior variance is estimated from the same dataset, and the posterior mean is the best linear unbiased prediction (BLUP) of the marker effect. The present study conducted a haplotype-based GWAS with the EB method for milk yield in Shanghai Holsteins. We analyzed blocks containing 2, 3 and 4 SNPs, identified the significant blocks, and identified the associated genes, pathways and networks important for milk production, with the aim of guiding the improvement of dairy cattle breeding.
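The empirical Bayesian step described above can be illustrated with a minimal sketch for a single block. It assumes a simplified model, y = 1μ + Zh + e with h ~ N(0, σ_h² I) and e ~ N(0, σ_e² I), where Z holds haplotype copy counts; it omits the polygenic effect and other fixed effects, and it estimates the variance ratio λ = σ_h²/σ_e² by maximum likelihood over a grid. These choices, and the function name `eb_haplotype_blup`, are assumptions for illustration and are not necessarily how the model in [6] is implemented. Once the variances are estimated, the posterior mean of h is its BLUP, σ_h² Z′V⁻¹(y − 1μ̂) with V = σ_h² ZZ′ + σ_e² I.

```python
import numpy as np


def eb_haplotype_blup(y, Z, grid=None):
    """Empirical Bayes estimate of random haplotype effects for one block.

    Hypothetical simplified model: y = 1*mu + Z @ h + e,
    h ~ N(0, s2h * I), e ~ N(0, s2e * I); Z holds haplotype copy counts.
    Returns (mu, s2h, s2e, h_hat), where h_hat is the posterior mean (BLUP).
    """
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = y.shape[0]
    if grid is None:
        grid = np.logspace(-3, 3, 61)  # candidate ratios lam = s2h / s2e
    one = np.ones(n)
    best = None
    for lam in grid:
        H = np.eye(n) + lam * (Z @ Z.T)               # V / s2e
        Hinv = np.linalg.inv(H)
        mu = (one @ Hinv @ y) / (one @ Hinv @ one)    # GLS estimate of mu
        r = y - mu
        s2e = (r @ Hinv @ r) / n                      # ML estimate of s2e given lam
        _, logdet = np.linalg.slogdet(H)
        loglik = -0.5 * (n * np.log(s2e) + logdet + n)  # profile log-likelihood, constants dropped
        if best is None or loglik > best[0]:
            best = (loglik, lam, mu, s2e, Hinv @ r)
    _, lam, mu, s2e, Hinv_r = best
    s2h = lam * s2e                                   # prior variance estimated from the data
    h_hat = lam * (Z.T @ Hinv_r)                      # s2h * Z' V^-1 (y - mu)
    return float(mu), float(s2h), float(s2e), h_hat
```

Plugging the estimated σ_h² back in as the prior variance is what makes the procedure empirical Bayes: the same data supply both the prior and the likelihood, and the resulting shrinkage toward zero tempers noisy effect estimates. For a sense of the Bonferroni stringency mentioned above, spreading α = 0.05 over the 13 476 blocks reported in the Abstract would give a per-block threshold of 0.05 / 13 476 ≈ 3.7 × 10⁻⁶ (an illustrative calculation only).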
Materials and methods
Population and phenotypes
All experimental procedures involving animals in the present study were approved by the Institutional Animal Care and Use Committee of Shanghai Jiao Tong University (contract no. 2015-07-0136). A total of 1 092 cows were selected from 24 farms of Shanghai Bright Holstan Co., Ltd., according to the following criteria: 1) primiparous cows born between 2001 and 2012 with regular, standard Dairy Herd Improvement (DHI) records (milk yield, fat percentage, protein percentage and somatic cell count); 2) age at first calving between 24 and 36 months; and 3) test days between 5 and 335 days in milk (DIM). The blood sa (...truncated)