Copy Number Variation in Thai Population
Citation: Suktitipat B, Naktang C, Mhuantong W, Tularak T, Artiwet P, et al. (
Bhoom Suktitipat
Chaiwat Naktang
Wuttichai Mhuantong
Thitima Tularak
Paramita Artiwet
Ekawat Pasomsap
Wallaya Jongjaroenprasert
Suthat Fuchareon
Surakameth Mahasirimongkol
Wasan Chantratita
Boonsit Yimwadsana
Varodom Charoensawan
Natini Jinawath
Jeong-Sun Seo, Seoul National University College of Medicine, Republic Of Korea
Copy number variation (CNV) is a major genetic polymorphism contributing to genetic diversity and human evolution. Clinical application of CNVs for diagnostic purposes largely depends on sufficient population CNV data for accurate interpretation. CNVs from the general population in currently available databases help classify CNVs of uncertain clinical significance and identify benign CNVs. Earlier studies of CNV distribution in several populations worldwide showed that a significant fraction of CNVs are population-specific. In this study, we characterized and analyzed CNVs in 3,017 unrelated Thai individuals genotyped with the Illumina Human610, Illumina HumanOmniexpress, or Illumina HapMap550v3 platform. We employed hidden Markov model and circular binary segmentation methods to identify CNVs, extracted 23,458 CNVs consistently identified by both algorithms, and cataloged these high-confidence CNVs in our publicly available Thai CNV database. Analysis of CNVs in the Thai population identified a median of eight autosomal CNVs per individual. Most CNVs (96.73%) did not overlap with any known chromosomal imbalance syndromes documented in the DECIPHER database. When compared with CNVs in the 11 HapMap3 populations, CNVs found in the Thai population shared several characteristics with CNVs characterized in HapMap3. Common CNVs in Thais had similar frequencies to those in the HapMap3 populations, and all high-frequency CNVs (>20%) found in Thai individuals could also be identified in HapMap3. The majority of CNVs discovered in the Thai population, however, were of low frequency or uniquely identified in Thais. Hierarchical clustering based on CNV frequencies grouped the populations into Africans, Europeans, and Asians, in line with clustering based on single nucleotide polymorphism (SNP) data.
As CNV data are specific to the population of origin, our population-specific reference database will serve as a valuable addition to the existing resources for the investigation of clinical significance of CNVs in Thais and related ethnicities.
Funding: The current project was supported by the Thailand Research Fund (http://www.trf.or.th), the Commission on Higher Education, and Mahidol University
(TRF-CHE-MU grant number MRG 5480183) to NJ. BS is supported by Chalermphrakiat grant, Faculty of Medicine Siriraj Hospital, Mahidol University. The funders
had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Copy number variation (CNV) is one of the major types of genetic
variation observed among the genomes of individuals. CNVs
encompass more total nucleotides than single nucleotide
polymorphisms (SNPs), account for almost 12% of the human genome,
and are important for genetic diversity as well as human
evolution [1]. At present, several conditions with genetic etiologies,
such as autism spectrum disorder, developmental delay, and
nonsyndromic multiple congenital anomalies, are well documented to
have CNVs among the causative variants [2]. For this reason,
array-based technology, which is commonly used for CNV
identification, has been recommended as a first-tier diagnostic
tool for these particular disorders [3]. To make an accurate clinical
interpretation of CNVs, databases containing reference
CNVs from both genetic disease patients and normal controls are
required. Large databases consisting of CNVs and clinical
information of patients with chromosomal disorders such as
DECIPHER [4] and the International Collaboration for Clinical
Genomics (ICCG; http://www.iccg.org/) are actively curated by
working groups. However, most patients in these databases are of
European descent because clinical CNV testing is readily available
and accessible in North America and Europe. Apart from these, there are
currently a few other large public CNV databases containing CNV
information of control subjects from certain ethnic groups, such as
Caucasian, African-American, and Asian American [5,6]. These
general population databases greatly help with clinical
interpretation of CNVs, which can be divided into three main categories:
pathogenic, uncertain clinical significance, or benign [7].
Recent publications focusing on CNVs in specific ethnicities, such as
Koreans [8], Europeans [9], and Chinese [10], emphasize that a
significant number of CNVs are population-specific. So
far, the number of Thai individuals represented in the existing
general-population CNV databases is very limited [11], and
thus these databases are by no means ideal references for CNV
interpretation in Thais. The International Haplotype Map Project
phase III (HapMap3) has made SNP genotyping and CNV data
publicly accessible for more than a thousand subjects from 11
different ethnic groups, including those of European, African, and
East Asian ancestry [12]. The HapMap3 dataset provides an opportunity to
compare genetic variations across populations. Hence, CNVs in a
larger sample of Thai individuals can be characterized and
distinguished from those of East Asian and other populations.
In this study, we combined genomic data generated from
multiple genome-wide association studies (GWAS) comprising
3,017 unrelated Thai subjects with no undiagnosed genetic
disorders. We carried out CNV discovery on these datasets
using two commonly used CNV calling algorithms, PennCNV
[13] and CNV Workshop [14], to identify the most accurate set of
CNVs, and assembled the first large reference CNV database for
Thais. Furthermore, we compared copy number variation region
(CNVR) frequencies between Thais
and the 11 HapMap3 populations, and identified CNVRs unique to
Thais as well as CNVs overlapping with genes associated with
the Thai population. Genetic similarity between populations was
also explored using hierarchical clustering analysis (HCA) based
on the CNV frequencies. The Thai CNV database should
contribute to a more accurate clinical interpretation of CNVs in
Thai patients and serve as the starting point for future population
genetics and genetic epidemiology studies.
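As an illustration of the two-algorithm strategy described above, the following minimal Python sketch keeps only those calls from one caller that are confirmed by a second caller in the same sample. The 50% reciprocal-overlap criterion, the `CNV` record, the function names, and the toy coordinates are assumptions made for illustration only; they are not taken from PennCNV, CNV Workshop, or the procedure used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """Hypothetical record for one CNV call (illustrative only)."""
    sample: str
    chrom: str
    start: int  # first base of the call, inclusive
    end: int    # last base of the call, inclusive
    kind: str   # "gain" or "loss"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(shared / len_a, shared / len_b)


def consensus_calls(calls_a, calls_b, min_overlap=0.5):
    """Keep calls from caller A that caller B confirms.

    A call is confirmed when caller B reports a call of the same type in
    the same sample with reciprocal overlap >= min_overlap (an assumed
    threshold, not one stated in the text).
    """
    kept = []
    for a in calls_a:
        for b in calls_b:
            if (a.sample == b.sample and a.kind == b.kind
                    and reciprocal_overlap(a, b) >= min_overlap):
                kept.append(a)
                break
    return kept


# Toy calls standing in for the output of an HMM-based caller ...
hmm_calls = [
    CNV("S1", "chr1", 1000, 2000, "loss"),
    CNV("S1", "chr2", 5000, 6000, "gain"),
    CNV("S2", "chr1", 1000, 2000, "loss"),
]
# ... and of a segmentation-based caller.
cbs_calls = [
    CNV("S1", "chr1", 1200, 2100, "loss"),  # largely overlaps the first call
    CNV("S1", "chr2", 5900, 9000, "gain"),  # overlaps the second only slightly
]

consensus = consensus_calls(hmm_calls, cbs_calls)
```

With these toy inputs, only the chromosome 1 loss in sample S1 is retained: the chromosome 2 gain falls below the assumed overlap threshold, and the call in sample S2 has no counterpart from the second caller.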
Materials and Methods
Study populations
The study population was compiled from previously published
genome-wide association studies (GWAS) in Thai individuals
[15,16,17,18,19], which were generated under collaborations
between the Ministry of Public Health, Thailand, Thailand
Center of Excellence for Life Sciences (TCELS), and the RIKEN
Center for Genomic Medicin (...truncated)