A high-resolution integrated map of copy number polymorphisms within and between breeds of the modern domesticated dog
BMC Genomics
Thomas J Nicholas 0
Carl Baker 0
Evan E Eichler 0 1
Joshua M Akey 0
0 Department of Genome Sciences, University of Washington , 1705 NE Pacific, Seattle, WA. 98195 , USA
1 Howard Hughes Medical Institute , Seattle, WA , USA
Background: Structural variation contributes to the rich genetic and phenotypic diversity of the modern domestic dog, Canis lupus familiaris, yet catalogs of canine copy number variants (CNVs) remain poorly defined compared to those of other organisms. To address this gap, we developed a customized high-density tiling array across the canine genome and used it to discover CNVs in nine genetically diverse dogs and a gray wolf.
Results: In total, we identified 403 CNV regions that overlap 401 genes, which are enriched for defense/immunity, oxidoreductase, protease, receptor, signaling molecule and transporter genes. Furthermore, we performed detailed comparisons between CNVs located within versus outside of segmental duplications (SDs) and found that CNVs in SDs are enriched for gene content and complexity. Finally, we compiled all known dog CNV regions and genotyped them with a custom aCGH chip in 61 dogs from 12 diverse breeds. These data allowed us to perform the first population genetics analysis of canine structural variation and identify CNVs that potentially contribute to breed-specific traits.
Conclusions: Our comprehensive analysis of canine CNVs will be an important resource in genetically dissecting canine phenotypic and behavioral variation.
Background
The domestication of the modern dog from its wolf
ancestors has resulted in an extraordinary amount of
diversity in canine form and function. As such, dogs are
poised to provide unique insights into the genetic
architecture of phenotypic variation and the mechanistic
basis of strong artificial selection. A number of canine
genomics resources have been developed to facilitate
genotype-phenotype inferences, including a high-quality
whole genome sequence and a dense catalog of SNPs
discovered in a wide variety of breeds [1-3]. These
genomics resources have been successfully used to identify
an increasing number of genes that influence hallmark
breed characteristics such as size, coat texture, and skin
wrinkling [4-6]. Additionally, SNP data has been used to
investigate patterns of genetic variation within and
between breeds, establish timing and geography of
domestication, examine relatedness among breeds, and
identify signatures of artificial selection [4,7-9].
In addition to SNPs, it is important to characterize
additional components of canine genomic variation in
order to comprehensively assess the genetic basis of
phenotypic diversity. For example, structural variation in
general, and copy number variants (CNVs) in particular,
has emerged as an important source of genetic variation
in a wide range of organisms including dogs [10-18].
Duplications and deletions of genomic sequence can
have significant impacts on a wide range of phenotypes
including breed-defining traits. For example, a
duplication of a set of FGF genes in Rhodesian and Thai
Ridgebacks leads to the breeds' characteristic dorsal hair ridge
[19].
Although the FGF duplication provides a vivid
example of the phenotypic consequences of structural
variation in dogs, it remains unknown whether CNVs are an
appreciable source of variation in morphological,
behavioral, and physiological traits within and between
breeds. Comprehensive discovery of structural variation
in a diverse panel of breeds is an important first step in
more systematically delimiting the contribution of CNVs
to canine phenotypic variation. Previously, we used a
customized aCGH chip to identify nearly 700 CNV
regions located in segmental duplications (SDs) [17].
However, SDs only cover approximately 5% of the dog
genome and thus a large fraction of total genomic space
was unexplored. An additional study using a
genome-wide tiling array from NimbleGen identified
approximately 60 CNV regions outside of SDs [10]. However,
the low probe density (~1 probe every 5 kb) limited the
number and size of CNVs that could be identified.
In an effort to more comprehensively interrogate the
canine genome for CNVs, we used a high-density (~1
probe every 1 kb) genome-wide tiling array to discover
additional CNVs in a panel of nine genetically and
phenotypically diverse dogs. In total, we discovered over 400
new CNV regions. Moreover, we designed a custom
aCGH chip to genotype all known canine CNVs in 61
dogs from 12 diverse breeds, allowing the first
population genetics analysis of structural variation in dogs to
be performed. The comprehensive CNV resources that
we have developed will be important tools in genetically
dissecting canine phenotypic variation.
Results and Discussion
Genome-wide identification of CNVs using a high-density aCGH chip
We performed aCGH using a high-density tiling array in
nine breeds (Table 1), a gray wolf, and a self-self
hybridization. These nine breeds and the gray wolf were
previously studied using a custom array that exclusively
targeted regions containing SDs [17]. In all of the aCGH
hybridizations we used the same reference sample (a
female Boxer distinct from Tasha, the Boxer used for
generating the canine reference sequence), which was
also the reference in our prior SD experiments [17]. The
aCGH chip consists of over 2.1 million probes
distributed across the genome (not including the
uncharacterized chromosome, chrUn) with an average probe
spacing of 1 kb. CNVs were identified using a circular binary
segmentation algorithm implemented in the program
segMNT, part of NimbleGen's NimbleScan software
package. These calls were filtered by log2 values and
number of probes using an adaptive threshold algorithm
in which the specific filtering criteria were a function of
the size of the CNV (see Methods).
Table 1 Summary of CNVs identified with the genome-wide aCGH chip
Columns: Number of CNVs (Total, Gain, Loss); Average Size (kb); Genes
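The size-dependent filtering described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the actual criteria are given in the Methods, and the tier boundaries, log2 cutoffs, and probe counts below are hypothetical placeholders chosen only to show the idea. The `Segment` class, `TIERS` table, and `call_cnv` function are likewise illustrative names, and the sketch assumes log2 ratios are oriented as test over reference, so that positive values indicate a gain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One segment reported by a segmentation algorithm (illustrative type)."""
    chrom: str
    start: int         # bp, inclusive
    end: int           # bp, exclusive
    mean_log2: float   # mean log2(test / reference) over the segment
    n_probes: int      # number of array probes inside the segment

    @property
    def size(self) -> int:
        return self.end - self.start


# HYPOTHETICAL tiers, not the published criteria:
# (maximum segment size in bp, minimum |log2|, minimum probe count)
TIERS = [
    (10_000, 0.50, 5),
    (100_000, 0.35, 10),
    (float("inf"), 0.25, 20),
]


def thresholds_for(size):
    """Return (min |log2|, min probes) for a segment of the given size."""
    for max_size, min_abs_log2, min_probes in TIERS:
        if size <= max_size:
            return min_abs_log2, min_probes
    raise ValueError("no tier matched; the last tier should be unbounded")


def call_cnv(seg):
    """Return 'gain', 'loss', or None if the segment fails its size tier."""
    min_abs_log2, min_probes = thresholds_for(seg.size)
    if seg.n_probes < min_probes or abs(seg.mean_log2) < min_abs_log2:
        return None
    return "gain" if seg.mean_log2 > 0 else "loss"
```

In this sketch, short segments must show a stronger signal because they are supported by fewer probes, while long segments may pass with a weaker mean log2 ratio provided enough probes agree. Whether the published thresholds follow the same direction is an assumption here.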
We identified 1,008 CNVs in 403 unique CNV regions
spanning 30.5 Mb of genomic sequence (Table 1). In the
self-self hybridization, no CNVs were called using the
same analysis and filters. The average number of CNVs
per individual was 101, ranging from 86 (Shetland
Sheepdog and Siberian Husky) to 136 (Gray Wolf). The average
CNV size was approximately 81 kb (Table 1), and the
largest CNV region is located on CFA 34 and spans 3.9
Mb. In total, these 403 CNV regions overlap or contain
401 protein-coding genes. After assigning all genes
PANTHER Molecular Function terms, we found that the
most enriched gene classes are similar to those identified
in SDs, namely defense/immunity and receptor genes,
but also include oxidoreductase, protease, signaling
molecule, and transporter genes (Additional file 1).
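The step from per-sample CNV calls (1,008 across the ten samples) to non-redundant CNV regions (403), and from regions to overlapped genes, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in this study: it assumes that calls sharing at least one base pair are merged into a single region and that a gene counts as soon as it overlaps a region by one base pair. The function names and the toy coordinates are hypothetical.

```python
from collections import defaultdict


def merge_into_regions(calls):
    """Merge overlapping (chrom, start, end) half-open calls into regions."""
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:            # shares at least one bp: extend
                cur_end = max(cur_end, end)
            else:                          # gap: close the current region
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


def total_span(regions):
    """Total number of base pairs covered by the merged regions."""
    return sum(end - start for _, start, end in regions)


def genes_overlapping(regions, genes):
    """Names of genes (name, chrom, start, end) overlapping any region."""
    hits = set()
    for name, g_chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end in regions:
            if g_chrom == r_chrom and g_start < r_end and r_start < g_end:
                hits.add(name)
                break
    return hits


# Toy data (hypothetical coordinates, for illustration only).
calls = [("chr1", 100, 200), ("chr1", 150, 300),
         ("chr1", 500, 600), ("chr2", 10, 20)]
genes = [("geneA", "chr1", 250, 400),
         ("geneB", "chr1", 300, 450),
         ("geneC", "chr2", 0, 11)]

regions = merge_into_regions(calls)
print(regions)                                    # three merged regions
print(total_span(regions))                        # 310
print(sorted(genes_overlapping(regions, genes)))  # ['geneA', 'geneC']
```

The nested loop in `genes_overlapping` is adequate for a few hundred regions; for much larger call sets, an interval tree or a sorted sweep would be a more efficient choice.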
Figure 1 summarizes the location and characteristics
of all known dog CNVs derived from this and previous
studies (...truncated)