A general linear model-based approach for inferring selection to climate
Raj et al. BMC Genetics 2013, 14:87
http://www.biomedcentral.com/1471-2156/14/87
METHODOLOGY ARTICLE
Open Access
Srilakshmi M Raj1,2†, Luca Pagani1†, Irene Gallego Romero3, Toomas Kivisild1 and William Amos4*†
Abstract
Background: Many efforts have been made to detect signatures of positive selection in the human genome,
especially those associated with expansion from Africa and subsequent colonization of all other continents.
However, most approaches have not directly probed the relationship between the environment and patterns of
variation among humans. We have designed a method to identify regions of the genome under selection based
on Mantel tests conducted within a general linear model framework, which we call MAntel-GLM to Infer Clinal
Selection (MAGICS). MAGICS explicitly incorporates population-specific and genome-wide patterns of background
variation as well as information from environmental values to provide an improved picture of selection and its
underlying causes in human populations.
Results: Our results significantly overlap with those obtained by other published methodologies, but MAGICS has
several advantages. These include improvements that: limit false positives by reducing the number of independent
tests conducted and by correcting for geographic distance, which we found to be a major contributor to selection
signals; yield absolute rather than relative estimates of significance; identify specific geographic regions linked most
strongly to particular signals of selection; and detect recent balancing as well as directional selection.
Conclusions: We find evidence of selection associated with climate (P < 10⁻⁵) in 354 genes, and among these
observe a highly significant enrichment for directional positive selection. Two of our strongest ‘hits’, however,
ADRA2A and ADRA2C, implicated in vasoconstriction in response to cold and pain stimuli, show evidence of
balancing selection. Our results clearly demonstrate evidence of climate-related signals of directional and balancing
selection.
Keywords: Climate, Adaptation, Human evolution, Natural selection, Environmental adaptation, Population genetics
Background
Within the last 100,000 years humans dispersed from Africa to occupy most of the habitable space in the world.
During this process our species has successfully combined
cultural buffering, biological plasticity and adaptation to
cope with the wide range of new ecosystems, pathogens
and climates it encountered [1-3]. Climate, in particular, comprises many diverse elements such as temperature,
humidity, precipitation and solar radiation, so it would be
surprising if many different genes had not been influenced
by natural selection. Indeed, many physiological traits exhibit geographic trends that correlate with climate [4-8].
However, without an explicit link to global patterns of
genetic variation, the extent to which these trends reflect
adaptation through natural selection remains unclear.
* Correspondence:
† Equal contributors
4 Department of Zoology, University of Cambridge, Cambridge, UK
Full list of author information is available at the end of the article
Many genetic studies on humans have attempted to
identify genes and genomic regions associated with regional adaptation by looking for signatures of selection
[2,9-15]. These studies have relied on a diverse range of
approaches that mostly identify outliers in the empirical
genome-wide data, including searches for markers exhibiting
unusually high levels of geographic differentiation [2,9], for
genomic regions with high linkage disequilibrium and derived allele frequency [10], and for markers where the loss of
genetic variability that occurred when humans migrated out
of Africa has been particularly high or low [11-14]. These approaches suggest that a substantial proportion of the human
genome contains candidates for positive selection [15].
However, it can be difficult to ascribe environmental
or biological factors to any particular signal. Furthermore,
wherever signatures of selection are sought by considering
patterns of genetic variation in isolation, i.e. without reference to a specific hypothesis, it can become difficult to
separate genuine signals from those that arise from
other sources including genotyping errors and other
artifacts.
© 2013 Raj et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
One way to increase statistical power when searching
for signatures of selection is to study patterns of genomic
variation across populations in relation to particular environmental characteristics. For example, physiological adaptations to temperature and solar radiation, as well as
several other traits, have been shown to vary along a latitudinal cline [16-18], suggesting selection by climate. Even
modest regional allele frequency differences can provide
evidence of selection if they correlate strongly with one or
more environmental variables, provided the environmental variables are accurately measured and also approximate
the selective pressure over the time of evolution. Explored
earlier by Prugnolle et al. (2005) [19], this approach has
been pioneered by Hancock et al. [20-23], who use a
Bayesian algorithm [24] to search for markers at which
variations in allele frequency correlate more strongly than the genomic average with global variation in one or more climatic
variables. In this approach, absolute significance is not determined. Instead, markers are ranked in terms of their
degree of association. On the one hand this makes the approach sensibly conservative, but on the other it precludes
a meaningful estimate of the proportion of the genome actually influenced by selection.
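The distinction between ranking markers and assigning them absolute significance can be illustrated with a minimal sketch. It is not the Bayesian algorithm of [24]: it assumes a plain Pearson correlation as the measure of association, and the marker names, allele frequencies and climate values are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def empirical_ranks(freqs_by_marker, climate_values):
    """For each marker, the fraction of markers whose |r| is at least as large.

    This is a relative score: the best marker always receives 1/n,
    whether or not any marker is truly under selection.
    """
    scores = {m: abs(pearson(f, climate_values)) for m, f in freqs_by_marker.items()}
    n = len(scores)
    return {m: sum(1 for s in scores.values() if s >= scores[m]) / n for m in scores}


# Hypothetical data: one climate value per population (e.g. mean temperature)
climate = [0.0, 10.0, 20.0, 30.0, 40.0]
# Hypothetical allele frequencies per marker, one value per population
freqs = {
    "clinal": [0.10, 0.20, 0.30, 0.40, 0.50],
    "snp_a": [0.30, 0.50, 0.20, 0.40, 0.30],
    "snp_b": [0.60, 0.55, 0.62, 0.58, 0.61],
    "snp_c": [0.45, 0.40, 0.50, 0.42, 0.47],
}

ranks = empirical_ranks(freqs, climate)
for marker, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(marker, rank)
```

With four markers the top-ranked marker scores 1/4 by construction, so a rank of this kind says which markers are the most extreme but not what fraction of them reflect selection.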
Here we present a new approach for detecting signatures of selection based on the use of general linear
models to analyze similarity matrices. This framework offers three important advantages. First, data from neighboring markers can be combined into a single genetic
window, thereby greatly reducing the number of independent tests that need to be performed. Second, the
method is flexible, allowing incorporation of possible
cofactors such as geographic distance between populations and interactions between variables. In particular,
by fitting genome-wide genetic relatedness we can
control for variation in the level of shared ancestry between different pairs of individuals or populations.
Third, statistical significance is determined through a
form of Mantel test, based on repeated randomization
(scrambling) of the data for one predictor variable, allowing
absolute estimates of significance rather than empirical (...truncated)
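The randomization idea behind a Mantel test can be sketched as follows. This is a simplified illustration rather than the MAGICS procedure itself: it assumes a simple two-matrix Mantel test using Pearson correlation, with no cofactors such as geographic distance or genome-wide relatedness. The function names, matrices and temperature values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(response, predictor, n_perm=999, seed=0):
    """One-sided simple Mantel test: scramble population labels of the predictor."""
    rng = random.Random(seed)
    identity = list(range(len(response)))
    resp = upper_triangle(response, identity)
    observed = pearson(resp, upper_triangle(predictor, identity))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns together
        if pearson(resp, upper_triangle(predictor, perm)) >= observed:
            hits += 1
    # Counting the observed arrangement itself gives a p-value of at least 1/(n_perm + 1)
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: six populations with one temperature value each
temps = [0.0, 4.0, 11.0, 18.0, 27.0, 30.0]
n = len(temps)
climate_dist = [[abs(a - b) for b in temps] for a in temps]
# Toy genetic distances that track climate distance plus a small symmetric perturbation
genetic_dist = [
    [0.01 * climate_dist[i][j] + 0.001 * ((i * j) % 3) for j in range(n)]
    for i in range(n)
]

observed_r, p_value = mantel_test(genetic_dist, climate_dist)
print(observed_r, p_value)
```

Because rows and columns are permuted together, the dependence among pairwise entries that share a population is preserved under the null. The resulting p-value is an absolute statement about the observed association, not a position in a genome-wide ranking.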