Impact of human population history on distributions of individual-level genetic distance (pdf)

http://www.humgenomics.com/content/pdf/1479-7364-2-1-4.pdf

Joanna L. Mountain (1,2), Uma Ramakrishnan (0)

(0) National Centre for Biological Sciences, GKVK Campus, Bellary Road, Bangalore 560065, India
(1) Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
(2) Department of Anthropological Sciences, Stanford University, Stanford, CA 94305-2117, USA

Summaries of human genomic variation shed light on human evolution and provide a framework for biomedical research. Variation is often summarised in terms of one or a few statistics (eg FST and gene diversity). Now that multilocus genotypes for hundreds of autosomal loci are available for thousands of individuals, new approaches are applicable. Recently, trees of individuals and other clustering approaches have demonstrated the power of an individual-focused analysis. We propose analysing the distributions of genetic distances between individuals. Each distribution, or common ancestry profile (CAP), is unique to an individual, and does not require a priori assignment of individuals to populations. Here, we consider a range of models of population history and, using coalescent simulation, reveal the potential insights gained from a set of CAPs. Information lies in the shapes of individual profiles (sometimes captured by the variance of individual CAPs) and in the variation across profiles. Analysis of short tandem repeat genotype data for over 1,000 individuals from 52 populations is consistent with dramatic differences in population histories across human groups.

Keywords: human population genetic structure; genetic similarity; short tandem repeats (STRs); multilocus genotypes

The collective human gene pool, consisting of the genomes of all living people, has much to reveal regarding human population history.
Until recently, surveys of human genetic variation have been sparse, in that hundreds or thousands of individuals have been studied for a small number of genetic regions (eg blood groups, human leukocyte antigens (HLA), mitochondrial DNA, Y chromosome [1-3]) and a few individuals have been studied for a large fraction of the genome (eg through the Human Genome Project). In the past few years, however, larger sets of individuals have been studied for hundreds of genetic regions [4] and, concomitantly, new data analysis tools have been developed [5]. With new data and new tools, we are rapidly gaining a more precise understanding of how genetically similar individuals are, and of how that similarity corresponds to other dimensions of human variation.

Summaries of human genetic variation

Most differences between genomes take the form of single nucleotide polymorphisms (SNPs) rather than DNA insertions, deletions or multiplications [6]. For the autosomes, two DNA sequences chosen at random appear to differ at an average of about one per 1,000-1,500 nucleotide sites [7-9]. This level of diversity corresponds to between 2 and 3.2 million nucleotide differences between individual genomes and is about one order of magnitude lower than the diversity detected within Drosophila (fruitfly) populations [7]. Numerous studies have indicated that the number of differences between human genomes varies greatly depending on the pair of genomes considered. The most striking and consistent pattern is the higher level of genetic diversity in Africa than in other regions and the relatively low levels of diversity in the Americas.
Zhao and colleagues, in examining a 10 kilobase (kb) non-coding region, found an average of 8.5 differences between African samples and an average of 8.2 differences between non-African samples [8]. Yu and colleagues found a somewhat lower level of nucleotide diversity (π) of 0.076 per cent among Africans and 0.047 per cent among non-Africans [9]. As indicated in the summary of short tandem repeat (STR) data by Rosenberg et al., diversity within African groups (average heterozygosity 0.774) tends to be slightly higher than diversity within Middle Eastern (0.756), European (0.751) and Central and South Asian (0.752) populations [4]. Those groups are, in turn, somewhat more diverse than are the East Asian populations (heterozygosity 0.723), which, in their turn, are more diverse than the Oceanic (0.683) and Native American (0.599) populations [4]. All differences in heterozygosity for pairs of continents are significant at p < 0.00001, except for Europe versus the Middle East (p = 0.0058), Europe versus Central/South Asia (p = 0.7182) and the Middle East versus Central/South Asia (p = 0.0554) (Noah Rosenberg, personal communication).

Human genetic variation is often summarised in terms of hierarchical population genetic structure. In 1972, Lewontin estimated, using blood group and protein polymorphism data, that about 6.3 per cent of genetic variation was explained by differences among seven groups that he termed races [10]. Differences between members of the same population accounted for 85.4 per cent of the total genetic variation.
The remaining 8.3 per cent was accounted for by the variation between populations, within each of the seven races [10]. In recent years, geneticists have replicated Lewontin's finding using independent regions of the genome: most estimates of FST (between-group variation) have ranged from 0.05 to 0.15 [4,11-14]. These estimates indicate that two individuals affiliated with different racially or ethnically identified groups are only slightly more likely to differ at a given neutrally evolving locus than are two individuals affiliated with the same group. A large proportion of human genetic variation is found within racially, ethnically or linguistically identified groups. Notable exceptions, reflecting smaller effective population sizes, include the mitochondrial genome and Y chromosome SNPs, with recent estimates of between-group variation ranging from 0.3 to 0.4 [11,15].

Although human genetic variation has often been summarised using single statistics such as FST, such single statistics are an inadequate and potentially misleading summary of our species' diversity [16,17]. FST is most straightforwardly interpreted if the underlying population history is of a single population that instantaneously divides into a number of equally sized, panmictic subpopulations, each of which remains at the same size throughout the subsequent time. Human history is far from fitting such a model. Genetic distances, often represented in the form of population trees [18], provide a more detailed representation of structure [1]. Recently, Long and Kittles used a sequential model-fitting approach to infer structure, generating a tree relating a set of human populations [16]. The latter study highlights the hierarchical and uneven structure of human genetic variation (see their Figure 2D).
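
To make the FST summary discussed above concrete, the following is a minimal sketch of one common textbook definition, FST = (HT - HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. It is not the estimator used in any of the cited studies: it assumes a single locus, known allele frequencies, equally weighted subpopulations and no sample-size correction. The function names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpop_freqs: Sequence[Sequence[float]]) -> float:
    """F_ST = (H_T - H_S) / H_T for one locus.

    Assumption: every subpopulation gets equal weight and allele
    frequencies are treated as known (no sampling correction).
    """
    n_subpops = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])

    # H_S: average expected heterozygosity within subpopulations.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n_subpops

    # Pooled allele frequencies, averaging across subpopulations.
    pooled = [
        sum(f[a] for f in subpop_freqs) / n_subpops for a in range(n_alleles)
    ]

    # H_T: expected heterozygosity of the pooled population.
    h_t = expected_heterozygosity(pooled)

    # A monomorphic locus carries no information; report 0 by convention.
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    # Two hypothetical subpopulations with modestly different frequencies.
    print(round(fst([[0.6, 0.4], [0.4, 0.6]]), 3))  # approximately 0.04
```

In this toy case, a 20 percentage point difference in allele frequency between two subpopulations yields an FST of only about 0.04, which illustrates why values in the 0.05 to 0.15 range are compatible with most variation lying within groups.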
A focus on the individual

Although a combination of heterozygosity and genetic distance estimates for a set of populations may provide a fairly accurate summary of genetic variation, these statistics describe variation within (...truncated)