Whole Genome Distribution and Ethnic Differentiation of Copy Number Variation in Caucasian and Asian Populations

PLOS ONE, Nov 2009

Although copy number variation (CNV) has recently received much attention as a form of structural variation within the human genome, knowledge of fundamental CNV characteristics such as occurrence rate, genomic distribution and ethnic differentiation is still inadequate. In the present study, we used the Affymetrix GeneChip® Mapping 500K Array to discover and characterize CNVs in the human genome and to study ethnic differences in CNVs between Caucasians and Asians. A total of 3,019 CNVs, including 2,381 CNVs on autosomes and 638 CNVs on the X chromosome, were identified in 985 Caucasian and 692 Asian individuals, with a mean length of 296 kb. Among these CNVs, 190 had frequencies greater than 1% in at least one ethnic group, and 109 showed significant ethnic differences in frequency (p<0.01). After merging overlapping CNVs, 1,135 copy number variation regions (CNVRs), covering approximately 439 Mb (14.3%) of the human genome, were obtained. Our findings of ethnic differentiation of CNVs, along with the newly constructed CNV genomic map, extend our knowledge of structural variation in the human genome and may furnish a basis for understanding the genomic differentiation of complex traits across ethnic groups.


Citation: Li J, Yang T, Wang L, Yan H, Zhang Y, et al. (2009) Whole Genome Distribution and Ethnic Differentiation of Copy Number Variation in Caucasian and Asian Populations. PLoS ONE 4(11): e7958. doi:10.1371/journal.pone.0007958

Jian Li, Tielin Yang, Liang Wang, Han Yan, Yinping Zhang, Yan Guo, Feng Pan, Zhixin Zhang, Yumei Peng, Qi Zhou, Lina He, Xuezhen Zhu, Hongyi Deng, Shawn Levy, Christopher J. Papasian, Betty M. Drees, James J. Hamilton, Robert R. Recker, Jing Cheng, Hong-Wen Deng

1 School of Medicine, University of Missouri Kansas City, Kansas City, Missouri, United States of America
2 The Key Laboratory of Biomedical Information Engineering of Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shanxi, People's Republic of China
3 Vanderbilt Microarray Shared Resource, Vanderbilt University, Nashville, Tennessee, United States of America
4 Osteoporosis Research Center, Creighton University, Omaha, Nebraska, United States of America
5 National Engineering Research Center for Beijing Biochip Technology, Changping District, Beijing, People's Republic of China
6 Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan, People's Republic of China

Editor: Florian Kronenberg, Innsbruck Medical University, Austria

Funding: Investigators of this work were partially supported by grants from NIH (R01 AR050496-01, R21 AG027110, R01 AG026564, and P50 AR055081). The study also benefited from grants from National Science Foundation of China, Huo Ying Dong Education Foundation, Xi'an Jiaotong University, and the Ministry of Education of China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Variation within the human genome can take many different forms. One form of structural variation is copy number variation (CNV), in which a DNA segment, ranging from 1 kb to several megabases, is present at a variable copy number in comparison to a reference genome [1]. CNVs are widespread in the human genome and vary across populations with respect to rate of occurrence [2–7]. CNVs have been shown to account for nearly 18% of variation in gene expression and, consequently, may play an important role in determining complex traits [8].
CNVs have been associated with certain complex human diseases, such as susceptibility to HIV infection, selected autoimmune diseases, tumors and psychiatric disorders such as mental retardation and autism [9–14]. Although several studies have been performed to characterize genomic CNVs, comparing their results has been hindered by small sample sizes and by differences in study design and analytical methods. Consequently, it has been difficult to combine results from different studies to produce an accurate description of genomic CNV characteristics such as the total number, genomic position, gene content, and frequency distribution [7]. It is even more difficult to robustly detect CNV differentiation across ethnic groups, and this has limited the utility of CNVs for association studies and human evolution research.

One approach that can minimize the problems listed above is to use large samples composed of subjects from comparatively homogeneous ethnic backgrounds for each study population [15]. Recent technological developments, such as the availability of high-density SNP microarrays, have also been helpful in providing an efficient and affordable tool for CNV discovery in the human genome.

In this study, we utilized the Affymetrix GeneChip® Mapping 500K Array, which places one SNP approximately every 5.8 kb along the human genome, to identify CNVs in both a US Caucasian population and a Chinese Han population. CNVs were identified and characterized based on probe intensities and SNP genotypes, and their ethnic differences were studied. The results extend our understanding of structural variation in the human genome and may furnish a basis for understanding the genomic differentiation of complex traits across ethnic groups.
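This excerpt does not say which statistical test was used to compare CNV frequencies between the two populations. As a minimal sketch, assuming a two-sided Fisher exact test on carrier counts, the comparison for a single CNV could look like the following. The function name and the carrier counts in the usage note are hypothetical; only the sample sizes (985 and 692) and the p<0.01 threshold come from the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are carriers / non-carriers of one CNV.
    Assumption: this test stands in for whatever test the study used.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x carriers in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Sum all tables that are no more probable than the observed one.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return min(p_value, 1.0)
```

For example, with hypothetical counts of 30 carriers among 985 individuals in one group and 5 among 692 in the other, `fisher_exact_two_sided(30, 955, 5, 687)` falls below the 0.01 threshold, so that CNV would be flagged as differing in frequency between groups. A study testing thousands of CNVs would normally also consider a multiple-testing correction, which this sketch leaves out.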
Brief summaries of CNV and CNVR (copy number variation region, a region covered by overlapping CNVs) characteristics in each ethnic group are shown in Table 1; detailed summaries are presented in the corresponding supplementary tables.

Characteristics of CNVs

There were 2,381 autosomal CNVs identified in the 1,677 subjects (Table S1), with a median length of 198 kb and a mean length of 298 kb. Although the Chinese sample (CHI) was smaller, the numbers of CNVs identified in the two ethnic groups were similar: 1,352 CNVs in the Caucasian sample (CAU) versus 1,395 CNVs in CHI. Other CNV characteristics that were similar in the two populations include the average number of CNVs per individual (~9 CNVs per individual, ranging from 1 to 32, in CAU versus ~10 CNVs per individual, ranging from 2 to 44, in CHI) (Figure 1A), the median size of CNVs (195 kb in CAU vs. 196 kb in CHI), and the mean size of CNVs (295 kb in CAU vs. 303 kb in CHI) (Figure 1B). Although most CNVs were singletons, 27.6% were present more than once in our samples. Specifically, 168 (7%) of the 2,381 CNVs were common CNVs, defined as CNVs with a frequency of 1% or greater in at least one ethnic group (Table S2).

There were 638 CNVs identified on the X chromosome in our subjects (Table S1), with a median length of 206 kb and a mean length of 288 kb, similar to those of autosomal CNVs. For these 638 CNVs, 183 (29%) were detec (...truncated)
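The CNVR definition given above (a region covered by overlapping CNVs) amounts to merging overlapping intervals chromosome by chromosome. The following is a minimal sketch under stated assumptions: the `(chromosome, start, end)` tuple layout, the half-open coordinates, the rule that two CNVs overlap when they share at least one base, and the function names are all illustrative choices, not the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def merge_cnvs(cnvs: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping CNVs on each chromosome into CNVRs."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions: List[Interval] = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start < cur_end:
                # Shares at least one base with the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


def covered_fraction(regions: Iterable[Interval], genome_length: int) -> float:
    """Fraction of a genome of the given length covered by the regions."""
    return sum(end - start for _, start, end in regions) / genome_length


# Toy input, not data from the study.
toy = [("chr1", 100, 500), ("chr1", 400, 900), ("chr1", 2000, 2500), ("chrX", 50, 150)]
toy_regions = merge_cnvs(toy)
```

On the toy input, the first two `chr1` CNVs collapse into one region and the other two stay separate. Applied to the reported totals, 439 Mb at 14.3% coverage corresponds to a reference length of roughly 3,070 Mb.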


Article home page: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0007958

Jian Li, Tielin Yang, Liang Wang, Han Yan, Yinping Zhang, Yan Guo, Feng Pan, Zhixin Zhang, Yumei Peng, Qi Zhou, Lina He, Xuezhen Zhu, Hongyi Deng, Shawn Levy, Christopher J. Papasian, Betty M. Drees, James J. Hamilton, Robert R. Recker, Jing Cheng, Hong-Wen Deng. Whole Genome Distribution and Ethnic Differentiation of Copy Number Variation in Caucasian and Asian Populations, PLOS ONE, 2009, Volume 4, Issue 11, DOI: 10.1371/journal.pone.0007958