Impact of human population history on distributions of individual-level genetic distance
Joanna L. Mountain 1 2
Uma Ramakrishnan 0
0 National Centre for Biological Sciences, GKVK Campus, Bellary Road, Bangalore 560065, India
1 Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
2 Department of Anthropological Sciences, Stanford University, Stanford, CA 94305-2117, USA
Summaries of human genomic variation shed light on human evolution and provide a framework for biomedical research. Variation is often summarised in terms of one or a few statistics (eg FST and gene diversity). Now that multilocus genotypes for hundreds of autosomal loci are available for thousands of individuals, new approaches become possible. Recently, trees of individuals and other clustering approaches have demonstrated the power of an individual-focused analysis. We propose analysing the distribution of genetic distances between each individual and all others. Each such distribution, or common ancestry profile (CAP), is unique to an individual and does not require a priori assignment of individuals to populations. Here, we consider a range of models of population history and use coalescent simulation to reveal the insights a set of CAPs can provide. Information lies both in the shapes of individual profiles (sometimes captured by the variance of an individual's CAP) and in the variation across profiles. Analysis of short tandem repeat genotype data for over 1,000 individuals from 52 populations is consistent with dramatic differences in population history across human groups.
human population genetic structure; genetic similarity; short tandem repeats (STRs); multilocus genotypes
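The abstract defines an individual's CAP as the distribution of genetic distances between that individual and everyone else in the sample. The minimal Python sketch below illustrates the idea under assumptions that are not taken from this text: genotypes are lists of STR allele pairs with no missing data, and distance is the allele-sharing distance (one minus half the number of shared alleles, averaged over loci). That distance measure, the function names `allele_sharing_distance` and `common_ancestry_profile`, and the toy genotypes are all hypothetical and are not necessarily what the study itself uses.

```python
from statistics import mean, pvariance

# Hypothetical representation: one (allele, allele) pair per locus,
# e.g. STR repeat counts. Missing data are not handled.
Genotype = list[tuple[int, int]]


def allele_sharing_distance(a: Genotype, b: Genotype) -> float:
    """Mean over loci of 1 - (shared alleles / 2); 0 = identical, 1 = nothing shared."""
    if len(a) != len(b):
        raise ValueError("genotypes must cover the same loci")
    total = 0.0
    for (a1, a2), pair_b in zip(a, b):
        remaining = list(pair_b)
        shared = 0
        for allele in (a1, a2):
            # Each allele in b can be matched at most once.
            if allele in remaining:
                remaining.remove(allele)
                shared += 1
        total += 1 - shared / 2
    return total / len(a)


def common_ancestry_profile(i: int, genotypes: list[Genotype]) -> list[float]:
    """Distances from individual i to every other individual in the sample."""
    return [
        allele_sharing_distance(genotypes[i], g)
        for j, g in enumerate(genotypes)
        if j != i
    ]


# Toy sample: three individuals typed at two loci.
genotypes = [
    [(10, 12), (5, 5)],
    [(10, 12), (5, 6)],
    [(14, 15), (7, 8)],
]
for i in range(len(genotypes)):
    cap = common_ancestry_profile(i, genotypes)
    print(i, cap, mean(cap), pvariance(cap))
```

In this toy sample, individual 2 shares no alleles with anyone and so has a flat profile (variance 0). Individuals 0 and 1 each have one close and one distant neighbour, which gives their profiles a non-zero variance. Differences in profile shape of this kind are what the variance of a CAP summarises.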
The collective human gene pool, consisting of the genomes of
all living people, has much to reveal regarding human
population history. Until recently, surveys of human genetic
variation had been sparse, in that hundreds or thousands of
individuals had been studied for a small number of genetic
regions (eg blood groups, Human Leukocyte Antigens
(HLA), mitochondrial DNA, Y chromosome1-3) and a few
individuals had been studied for a large fraction of the
genome (eg through the Human Genome Project). In the past
few years, however, larger sets of individuals have been studied
for hundreds of genetic regions4 and, concomitantly, new data
analysis tools have been developed.5 With new data and new
tools, we are rapidly gaining a more precise understanding of
how genetically similar individuals are, and of how that
similarity corresponds to other dimensions of human variation.
Summaries of human genetic variation
Most differences between genomes take the form of single
nucleotide polymorphisms (SNPs) rather than DNA
insertions, deletions or multiplications.6 For the autosomes, two
DNA sequences chosen at random appear to differ at an
average of about one in every 1,000-1,500 nucleotide sites.7-9
This level of diversity corresponds to between 2 and 3.2
million nucleotide differences between individual genomes
and is about one order of magnitude lower than the diversity
detected within Drosophila (fruitfly) populations.7
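The per-site figure above is the nucleotide diversity, that is, the average number of differences per site between two randomly chosen sequences. The minimal Python sketch below computes that definition from aligned sequences and scales a per-site value to a whole genome. The function names and toy sequences are hypothetical. The genome length of 3.2 billion base pairs is an assumption, chosen because it is the value at which one difference per 1,000 sites corresponds to 3.2 million differences.

```python
from itertools import combinations

# Assumed haploid genome length in base pairs (not given in the text).
GENOME_BP = 3.2e9


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of differences per site over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(x != y for s, t in pairs for x, y in zip(s, t))
    return differences / (len(pairs) * length)


def differences_per_genome(pi: float, genome_bp: float = GENOME_BP) -> float:
    """Expected number of differences between two genomes at diversity pi."""
    return pi * genome_bp


toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAA"]
print(nucleotide_diversity(toy))  # 4 differences over 3 pairs x 10 sites
for sites_per_difference in (1_000, 1_500):
    print(round(differences_per_genome(1 / sites_per_difference)))
```

Under the assumed genome length, one difference per 1,000-1,500 sites gives roughly 2.1 to 3.2 million differences, consistent with the range quoted above.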
Numerous studies have indicated that the number of
differences between human genomes varies greatly depending
on the pair of genomes considered. The most striking and
consistent pattern is the higher level of genetic diversity in
Africa than in other regions and the relatively low levels of
diversity in the Americas. Zhao and colleagues, in examining a
10 kilobase (kb) non-coding region, found an average of 8.5
differences between African samples and an average of 8.2
differences between non-African samples.8 Yu and colleagues
found a somewhat lower level of nucleotide diversity (π) of
0.076 per cent among Africans and 0.047 per cent among
non-Africans.9 As indicated in the summary of short tandem
repeat (STR) data by Rosenberg et al., diversity within African
groups (average heterozygosity 0.774) tends to be slightly
higher than diversity within Middle Eastern (0.756), European
(0.751) and Central and South Asian (0.752) populations.4
Those groups are, in turn, somewhat more diverse than are the
East Asian populations (heterozygosity 0.723), which, in
their turn, are more diverse than the Oceanic (0.683) and
Native American (0.599) populations.4 All differences in
heterozygosity for pairs of continents are significant at
p < 0.00001, except for Europe versus the Middle East
(p = 0.0058), Europe versus Central/South Asia (p = 0.7182)
and the Middle East versus Central/South Asia (p = 0.0554)
(Noah Rosenberg, personal communication).
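At each locus, the heterozygosity values above summarise the probability that two gene copies drawn at random from a population carry different alleles. These per-locus values are then averaged over loci. The minimal sketch below assumes the common unbiased estimator n/(n - 1) x (1 - sum of squared allele frequencies), where n is the number of sampled gene copies. The estimator and sample handling in the cited study may differ, and the function names and toy data are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles: list[int]) -> float:
    """Unbiased gene diversity at one locus from a list of sampled gene copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    # Equals the fraction of distinct pairs of gene copies that differ.
    return n / (n - 1) * (1 - homozygosity)


def mean_heterozygosity(loci: list[list[int]]) -> float:
    """Average expected heterozygosity across loci for one population sample."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)


# Toy sample of four gene copies (two diploid individuals) at two STR loci.
sample = [[10, 10, 12, 12], [5, 5, 5, 5]]
print(mean_heterozygosity(sample))
```

The pairwise reading in the comment can be checked by hand. At the first locus, 4 of the 6 pairs of gene copies differ, giving 2/3. The second locus is monomorphic, so the mean over the two loci is 1/3.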
Human genetic variation is often summarised in terms of
hierarchical population genetic structure. In 1972, Lewontin
estimated, using blood group and protein polymorphism data,
that about 6.3 per cent of genetic variation was explained by
differences among seven groups that he termed races.10
Differences between members of the same population
accounted for 85.4 per cent of the total genetic variation. The
remaining 8.3 per cent was accounted for by the variation
between populations, within each of the seven races.10 In
recent years, geneticists have replicated Lewontin's finding
using independent regions of the genome: most estimates of
FST (between-group variation) have ranged from 0.05 to
0.15.4,11-14 These estimates indicate that two individuals
affiliated with different racially or ethnically identified groups
are only slightly more likely to differ at a given neutrally
evolving locus than are two individuals affiliated with the same
group. A large proportion of human genetic variation is found
within racially, ethnically or linguistically identified groups.
Notable exceptions, reflecting smaller effective population
sizes, include the mitochondrial genome and Y chromosome
SNPs, with recent estimates of between-group variation
ranging from 0.3 to 0.4.11,15
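FST can be read as the proportional excess of total diversity over within-group diversity. For a single locus, the sketch below computes it as (HT - HS)/HT. Here HS is the mean within-group gene diversity and HT is the diversity of the pooled, equally weighted groups. This simplified definition is chosen for illustration. It is not the exact method of Lewontin or of the studies cited above, and the allele frequencies and function names are hypothetical. The sketch also computes the probability that two alleles drawn from different groups differ, which makes concrete the point that such a pair is only slightly more likely to differ than a pair drawn from the same group.

```python
def mismatch(p: list[float], q: list[float]) -> float:
    """Probability that an allele drawn from frequencies p differs from one drawn from q."""
    return 1 - sum(pi * qi for pi, qi in zip(p, q))


def fst(groups: list[list[float]]) -> float:
    """Single-locus FST = (HT - HS) / HT for equally weighted groups."""
    k = len(groups)
    hs = sum(mismatch(p, p) for p in groups) / k
    pooled = [sum(freqs) / k for freqs in zip(*groups)]
    ht = mismatch(pooled, pooled)
    return (ht - hs) / ht


# Hypothetical biallelic locus in two groups.
group_a = [0.6, 0.4]
group_b = [0.4, 0.6]
print("within A:", round(mismatch(group_a, group_a), 4))
print("between:", round(mismatch(group_a, group_b), 4))
print("FST:", round(fst([group_a, group_b]), 4))
```

In this example, a between-group pair differs with probability about 0.52 and a within-group pair with probability about 0.48, yet FST is only about 0.04. With two equally weighted groups, HT is simply the average of the within-group and between-group values.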
Although human genetic variation has often been
summarised using single statistics such as FST, any single
statistic is an inadequate and potentially misleading summary
of our species' diversity.16,17 FST is most straightforwardly
interpreted if the underlying population history is that of a
single population that instantaneously divides into a number
of equally sized, panmictic subpopulations, each of which
remains at the same size thereafter. Human history is far
from fitting such a model. Genetic distances, often represented
in the form of population trees,18 provide a more detailed
representation of structure.1 Recently, Long and Kittles used a
sequential model-fitting approach to infer structure,
generating a tree relating a set of human populations.16 That
study highlights the hierarchical and uneven structure of
human genetic variation (see their Figure 2D).
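Under the idealised split model described above, assume drift is the only force (no mutation, migration or selection) and each subpopulation stays at N diploid individuals. The standard Wright-Fisher expectation is then FST(t) = 1 - (1 - 1/(2N))^t after t generations. The minimal sketch below implements that formula and its inverse. The function names and the values of N are hypothetical. It illustrates why a single FST value does not pin down history: even under this simplest model, the split time implied by a given FST scales with the assumed population size.

```python
import math


def expected_fst(t: float, n: int) -> float:
    """Drift-only expected FST after t generations in subpopulations of n diploids."""
    return 1 - (1 - 1 / (2 * n)) ** t


def divergence_time(f: float, n: int) -> float:
    """Generations of pure drift needed to reach FST = f (inverse of expected_fst)."""
    return math.log(1 - f) / math.log(1 - 1 / (2 * n))


# The same FST implies very different split times for different assumed sizes.
for n in (1_000, 10_000):
    print(n, round(divergence_time(0.1, n)))
```

With FST = 0.1, this gives roughly 211 generations for N = 1,000 but roughly 2,107 for N = 10,000. The same FST is therefore consistent with very different histories.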
A focus on the individual
Although a combination of heterozygosity and genetic
distance estimates for a set of populations may provide a fairly
accurate summary of genetic variation, these statistics describe
variation within (...truncated)