Gene expression drives the evolution of dominance
ARTICLE
DOI: 10.1038/s41467-018-05281-7
OPEN
Christian D. Huber1, Arun Durvasula2, Angela M. Hancock3 & Kirk E. Lohmueller1,2,4
Dominance is a fundamental concept in molecular genetics and has implications for understanding patterns of genetic variation, evolution, and complex traits. However, despite its
importance, the degree of dominance in natural populations is poorly quantified. Here, we
leverage multiple mating systems in natural populations of Arabidopsis to co-estimate the
distribution of fitness effects and dominance coefficients of new amino acid changing
mutations. We find that more deleterious mutations are more likely to be recessive than less
deleterious mutations. Further, this pattern holds across gene categories, but varies with the
connectivity and expression patterns of genes. Our work argues that dominance arises as a
consequence of the functional importance of genes and their optimal expression levels.
1 Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA. 2 Department of Human Genetics, David Geffen
School of Medicine, University of California, Los Angeles, CA 90095, USA. 3 Department of Plant Developmental Biology, Max Planck Institute for Plant
Breeding Research, 50829 Cologne, Germany. 4 Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA. These
authors contributed equally: Christian D. Huber, Arun Durvasula. Correspondence and requests for materials should be addressed to
C.D.H. (email: ) or to K.E.L. (email: )
NATURE COMMUNICATIONS | (2018)9:2750 | DOI: 10.1038/s41467-018-05281-7 | www.nature.com/naturecommunications
The relationship between the fitness effects of heterozygous
and homozygous genotypes at a locus, termed dominance,
is a major factor that determines the fate of new alleles in a
population, and has far reaching implications for genetic diseases
and evolutionary genetics1–4. Several models have been proposed
for the mechanism of dominance, starting with R.A. Fisher’s
model, which suggests that dominance arises via modifier
mutations at other loci and that these loci are subject to selection5. In response, S. Wright argued that selection would not be
strong enough to maintain these modifier mutations. He proposed a different model (termed the “metabolic theory”), later
extended by Kacser and Burns, predicting that most mutations in
enzymes will be recessive because the overall flux through a
metabolic network is fairly robust to decreasing the amount
of one of the enzymes of the pathway by one-half6,7. Consequently, loss-of-function mutations have a more severe effect
when homozygous than when heterozygous. An alternative
model, posited by Haldane and further developed by Hurst and
Randerson, suggested that recessivity is a consequence of selection for higher amounts of enzyme product because enzymes
expressed at higher levels are able to tolerate environmental
fluctuations and loss of function (LoF) mutations8,9.
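The robustness argument of the metabolic theory can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the model of Wright or of Kacser and Burns: it assumes a hypothetical linear pathway of ten enzymes whose steady-state flux scales as 1/Σ(1/a_i), where a_i is the activity of enzyme i, and it assumes that fitness is proportional to relative flux. The names `pathway_flux` and `N_ENZYMES` are illustrative only.

```python
def pathway_flux(activities):
    """Steady-state flux through a toy linear pathway.

    Assumption (for illustration only): each enzyme behaves like a
    resistance in series, so flux ~ 1 / sum(1 / a_i). A null enzyme
    (activity 0) blocks the pathway completely.
    """
    if any(a == 0 for a in activities):
        return 0.0
    return 1.0 / sum(1.0 / a for a in activities)


N_ENZYMES = 10  # hypothetical pathway length
wild_type = [1.0] * N_ENZYMES
het_null = [0.5] + [1.0] * (N_ENZYMES - 1)  # one copy of enzyme 1 lost
hom_null = [0.0] + [1.0] * (N_ENZYMES - 1)  # both copies of enzyme 1 lost

j_wt = pathway_flux(wild_type)
# Fitness is assumed to be proportional to flux relative to wild type.
s = 1 - pathway_flux(hom_null) / j_wt   # homozygote fitness is 1 - s
hs = 1 - pathway_flux(het_null) / j_wt  # heterozygote fitness is 1 - hs
h = hs / s
print(f"s = {s:.3f}, hs = {hs:.3f}, h = {h:.3f}")
```

Under these assumptions, halving one enzyme lowers flux by only about 9%, while losing it entirely abolishes flux, so the script prints `s = 1.000, hs = 0.091, h = 0.091`. In this toy pathway a null allele is nearly recessive, with h = 1/(n + 1) for a pathway of n enzymes.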
The Wright and Haldane models predict that there is a negative
relationship between the dominance coefficient (h) and the selection coefficient (s), such that more deleterious mutations will tend
to be recessive, while Fisher’s model makes no such prediction10.
Drosophila mutation accumulation lines showed evidence of this
negative relationship, providing the first empirical evidence that
Fisher’s theory may not hold10–12. While the predictions of the
Wright and Haldane models may be applicable to enzymes, they
fail to explain the mechanism of dominance in noncatalytic
gene products13. Further, the extent to which these estimates
apply to the majority of mutations occurring in natural
populations remains to be tested. While population genetic
approaches to estimate the degree of dominance from segregating
genetic variation exist14,15, they have not been widely applied
to empirical data.
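Within the same kind of toy pathway (again assuming flux proportional to 1/Σ(1/a_i) and fitness proportional to flux), a negative relationship between h and s appears without any modifier loci. The hypothetical helper `h_and_s` below considers an allele that removes a fraction `d` (with d > 0) of one enzyme's activity per gene copy, so heterozygotes retain 1 − d/2 and mutant homozygotes 1 − d of wild-type activity.

```python
def pathway_flux(activities):
    # Toy assumption: flux ~ 1 / sum(1 / a_i); a null enzyme blocks flux.
    if any(a == 0 for a in activities):
        return 0.0
    return 1.0 / sum(1.0 / a for a in activities)


def h_and_s(d, n_enzymes=10):
    """Return (h, s) for an allele removing a fraction d of enzyme 1's activity per copy.

    Hypothetical illustration; requires 0 < d <= 1 so that s > 0.
    """
    rest = [1.0] * (n_enzymes - 1)
    j_wt = pathway_flux([1.0] + rest)
    s = 1 - pathway_flux([1.0 - d] + rest) / j_wt       # mutant homozygote
    hs = 1 - pathway_flux([1.0 - d / 2] + rest) / j_wt  # heterozygote
    return hs / s, s


for d in (0.1, 0.5, 0.9, 1.0):
    h, s = h_and_s(d)
    print(f"d = {d:.1f}: s = {s:.3f}, h = {h:.3f}")
```

As d grows, s rises and h falls from roughly 0.48 toward 1/11 ≈ 0.09: in this toy model, mild alleles are close to additive and severe ones close to recessive, which is the pattern the Wright and Haldane models predict.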
A major challenge to studying dominance in natural populations
is that h is inherently confounded with the distribution of fitness
effects (DFE), such that different values of h and DFEs can yield
similar patterns in the genetic variation data in a single outcrossing
population. Here, we circumvent this challenge by developing a
novel composite likelihood approach that leverages genetic variation data from outcrossing and selfing species to co-estimate s and
h. Since selection acts immediately on recessive homozygotes in
self-fertilizing organisms, the genetic variation data from a selfing
species allows us to discriminate between different values of h.
Application of our approach to amino acid changing mutations in
Arabidopsis suggests that most mutations are recessive and
that more deleterious mutations tend to be more recessive than
less deleterious mutations. We then explore which mechanistic
models of dominance can explain key biological properties in
our data. We find that neither Fisher’s model nor the
metabolic theory is consistent with all of the empirical patterns we
observe. Rather, our new model, which predicts that dominance
can arise as the inevitable consequence of genes being expressed at
their optimal levels, can match many of the salient features of
the data.
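Why selfing data help separate h from s can be checked with a standard one-locus selection recursion. As a sketch, assume viability selection with genotype fitnesses 1, 1 − hs and 1 − s for the wild-type homozygote, the heterozygote and the derived homozygote, and summarize the mating system by an inbreeding coefficient F (F = 0 for random mating; a value near 1 is assumed here to stand in for a highly selfing population). With p = 1 − q and mean fitness w̄, the per-generation change in derived-allele frequency is Δq = −s p q [F + (1 − F)(h + (1 − 2h) q)] / w̄, so for a rare allele it is about −hsq under random mating but about −sq under complete inbreeding. The function names and parameter values below are hypothetical.

```python
def genotype_freqs(q, F):
    """Frequencies of (AA, Aa, aa) with inbreeding coefficient F; a is the derived allele."""
    p = 1.0 - q
    return (p * p * (1 - F) + p * F,
            2 * p * q * (1 - F),
            q * q * (1 - F) + q * F)


def delta_q_exact(q, s, h, F):
    """Change in derived-allele frequency after one round of viability selection."""
    f_AA, f_Aa, f_aa = genotype_freqs(q, F)
    w_Aa, w_aa = 1.0 - h * s, 1.0 - s  # wild-type homozygote has fitness 1
    w_bar = f_AA + f_Aa * w_Aa + f_aa * w_aa
    q_next = (f_aa * w_aa + 0.5 * f_Aa * w_Aa) / w_bar
    return q_next - q


def delta_q_closed(q, s, h, F):
    """Closed form: -s p q [F + (1 - F)(h + (1 - 2h) q)] / w_bar."""
    p = 1.0 - q
    w_bar = 1 - s * (2 * p * q * (1 - F) * h + q * q * (1 - F) + q * F)
    return -s * p * q * (F + (1 - F) * (h + (1 - 2 * h) * q)) / w_bar


q = 0.01  # a rare deleterious derived allele
for F in (0.0, 0.95):
    a = delta_q_closed(q, s=0.02, h=0.5, F=F)  # mildly deleterious, additive
    b = delta_q_closed(q, s=0.10, h=0.1, F=F)  # more deleterious, mostly recessive
    print(f"F = {F:.2f}: dq(h=0.5, s=0.02) = {a:.2e}, dq(h=0.1, s=0.10) = {b:.2e}")
```

With F = 0 the two parameter pairs give values of Δq within about 10% of each other, mirroring the confounding of h with the DFE in a single outcrossing population; with F = 0.95 the more deleterious, mostly recessive allele is purged roughly five times faster. This recursion only illustrates the intuition; it is not the SFS-based composite likelihood used in the article.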
Results
Inference of dominance using inbred and outbred populations.
We propose to increase power for estimating dominance by
combining data from an outcrossing species with data from a
selfing species. We use the distribution of allele frequencies in a
sample, or site frequency spectrum (SFS), as a summary of genetic
variation in a population. In an outcrossing species, the main
factor determining the SFS is the difference in fitness between the
homozygous wild-type and the heterozygous genotype, having
fitnesses 1 and 1−hs, respectively (Fig. 1a). This is because random mating rarely produces homozygous-derived genotypes,
since deleterious mutations typically segregate at low frequencies.
On the other hand, for a strongly selfing species, genotypes are
predominantly in a homozygous state due to the high level of
inbreeding. Thus, the main factor determining the SFS in the
selfing species is the difference in fitness between the two
homozygous genotypes, having fitnesses 1 and 1−s, respectively
(Fig. 1b). Therefore, data from the outcrossing species provide
information about the product of h and s, while data from the
selfing species provide information about s independent of h.
Combining information from both species therefore allows us to
estimate dominance with higher accuracy than when considering
either species alone. Here, we leverage this fact by developing a
composite likelihood approach, which uses the SFS of the outcrossing Arabidopsis lyrata and the selfing Arabidopsis thaliana
(Fig. 1c) to co-estimate the DFE and the relation between h and s
for new nonsynonymous mutations on recently published datasets from both species (Methods)16,17.
(...truncated)