Gene expression drives the evolution of dominance

https://www.nature.com/articles/s41467-018-05281-7.pdf

ARTICLE | OPEN | DOI: 10.1038/s41467-018-05281-7

Christian D. Huber1, Arun Durvasula2, Angela M. Hancock3 & Kirk E. Lohmueller1,2,4

Dominance is a fundamental concept in molecular genetics and has implications for understanding patterns of genetic variation, evolution, and complex traits. However, despite its importance, the degree of dominance in natural populations is poorly quantified. Here, we leverage multiple mating systems in natural populations of Arabidopsis to co-estimate the distribution of fitness effects and dominance coefficients of new amino acid changing mutations. We find that more deleterious mutations are more likely to be recessive than less deleterious mutations. Further, this pattern holds across gene categories, but varies with the connectivity and expression patterns of genes. Our work argues that dominance arises as a consequence of the functional importance of genes and their optimal expression levels.

1 Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA.
2 Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA.
3 Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany.
4 Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA.
These authors contributed equally: Christian D. Huber, Arun Durvasula. Correspondence and requests for materials should be addressed to C.D.H. (email: ) or to K.E.L. (email: )

NATURE COMMUNICATIONS | (2018) 9:2750 | DOI: 10.1038/s41467-018-05281-7 | www.nature.com/naturecommunications

The relationship between the fitness effects of heterozygous and homozygous genotypes at a locus, termed dominance, is a major factor that determines the fate of new alleles in a population, and has far-reaching implications for genetic diseases and evolutionary genetics1–4. Several models have been proposed for the mechanism of dominance, starting with R.A. Fisher’s model, which suggests that dominance arises via modifier mutations at other loci and that these loci are subject to selection5. In response, S. Wright argued that selection would not be strong enough to maintain these modifier mutations. He proposed a different model (termed the “metabolic theory”), later extended by Kacser and Burns, predicting that most mutations in enzymes will be recessive because the overall flux through a metabolic network is fairly robust to decreasing the amount of one of the enzymes of the pathway by one-half6,7. Consequently, loss-of-function (LoF) mutations have a more severe effect when homozygous than when heterozygous. An alternative model, posited by Haldane and further developed by Hurst and Randerson, suggested that recessivity is a consequence of selection for higher amounts of enzyme product because enzymes expressed at higher levels are able to tolerate environmental fluctuations and LoF mutations8,9. The Wright and Haldane models predict that there is a negative relationship between the dominance coefficient (h) and the selection coefficient (s), such that more deleterious mutations will tend to be recessive, while Fisher’s model makes no such prediction10. Drosophila mutation accumulation lines showed evidence of this negative relationship, providing the first empirical evidence that Fisher’s theory may not hold10–12.
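The robustness argument behind the metabolic theory can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a linear, unsaturated pathway whose flux is proportional to 1 / Σ(1/E_i), a common simplified form of the Kacser and Burns result. It further assumes that heterozygotes have the average enzyme activity of the two homozygotes and that fitness is proportional to flux. The function names and the choice of ten enzymes are hypothetical.

```python
def pathway_flux(activities):
    """Flux of a linear, unsaturated pathway (simplified, assumed form):
    proportional to 1 / sum(1 / E_i) over the enzyme activities E_i."""
    return 1.0 / sum(1.0 / e for e in activities)


def dominance_from_flux(hom_activity, n_enzymes=10):
    """Return (h, s) for a mutation in the first enzyme of the pathway.

    hom_activity: activity of the mutated enzyme in mutant homozygotes,
                  relative to a wild-type activity of 1.0 (must be > 0).
    Assumptions (not from the source): heterozygote activity is the mean
    of the two homozygotes, and fitness is proportional to flux.
    """
    others = [1.0] * (n_enzymes - 1)
    j_wt = pathway_flux([1.0] + others)
    j_het = pathway_flux([(1.0 + hom_activity) / 2.0] + others)
    j_hom = pathway_flux([hom_activity] + others)
    s = 1.0 - j_hom / j_wt    # fitness loss of the mutant homozygote
    hs = 1.0 - j_het / j_wt   # fitness loss of the heterozygote
    return hs / s, s


if __name__ == "__main__":
    # From a mild reduction in activity to a near-complete loss of function.
    for activity in (0.9, 0.5, 0.1, 0.01):
        h, s = dominance_from_flux(activity)
        print(f"homozygote activity {activity:4.2f}: s = {s:.3f}, h = {h:.3f}")
```

Because flux is a concave function of enzyme activity in this toy model, h stays below 0.5 and decreases as s increases. This is the negative relationship between h and s that the Wright and Haldane models predict.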
While the predictions of the Wright and Haldane models may be applicable to enzymes, they fail to explain the mechanism of dominance in noncatalytic gene products13. Further, the extent to which these estimates apply to the majority of mutations occurring in natural populations remains to be tested. While population genetic approaches to estimate the degree of dominance from segregating genetic variation exist14,15, they have not been widely applied to empirical data. A major challenge to studying dominance in natural populations is that h is inherently confounded with the distribution of fitness effects (DFE), such that different values of h and DFEs can yield similar patterns in the genetic variation data in a single outcrossing population. Here, we circumvent this challenge by developing a novel composite likelihood approach that leverages genetic variation data from outcrossing and selfing species to co-estimate s and h. Since selection acts immediately on recessive homozygotes in self-fertilizing organisms, the genetic variation data from a selfing species allows us to discriminate between different values of h. Application of our approach to amino acid changing mutations in Arabidopsis suggests that most mutations are recessive and that more deleterious mutations tend to be more recessive than less deleterious mutations.

We then explore which mechanistic models of dominance can explain key biological properties in our data. We find that neither Fisher’s model nor the metabolic theory is consistent with all of the empirical patterns we observe. Rather, our new model, which predicts that dominance can arise as the inevitable consequence of genes being expressed at their optimal levels, can match many of the salient features of the data.

Results

Inference of dominance using inbred and outbred populations. We propose to increase power for estimating dominance by combining data from an outcrossing species with data from a selfing species. We use the distribution of allele frequencies in a sample, or site frequency spectrum (SFS), as a summary of genetic variation in a population. In an outcrossing species, the main factor determining the SFS is the difference in fitness between the homozygous wild-type and the heterozygous genotype, having fitnesses 1 and 1−hs, respectively (Fig. 1a). This is because random mating rarely produces homozygous-derived genotypes, since deleterious mutations typically segregate at low frequencies. On the other hand, for a strongly selfing species, genotypes are predominantly in a homozygous state due to the high level of inbreeding. Thus, the main factor determining the SFS in the selfing species is the difference in fitness between the two homozygous genotypes, having fitnesses 1 and 1−s, respectively (Fig. 1b). Therefore, data from the outcrossing species provide information about the product of h and s, while data from the selfing species provide information about s independent of h. Combining information from both species therefore allows us to estimate dominance with higher accuracy than when considering either species alone. Here, we leverage this fact by developing a composite likelihood approach, which uses the SFS of the outcrossing Arabidopsis lyrata and the selfing Arabidopsis thaliana (Fig. 1c) to co-estimate the DFE and the relation between h and s for new nonsynonymous mutations on recently published datasets from both species (Methods)16,17.

(...truncated)
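The contrast between outcrossing and selfing species, with genotype fitnesses 1, 1−hs and 1−s, can be checked with a small deterministic calculation. The sketch below is an illustration only and is not the composite likelihood method of the paper. It assumes a standard single-locus viability selection model in which an inbreeding coefficient F (0 for random mating, 1 for complete selfing) sets the genotype frequencies. The function names and parameter values are hypothetical.

```python
def next_freq(q, s, h, F):
    """Frequency of a deleterious allele after one generation of selection.

    q: current frequency of the derived allele
    s: selection coefficient against the derived homozygote
    h: dominance coefficient
    F: inbreeding coefficient (0 = random mating, 1 = complete selfing)
    """
    p = 1.0 - q
    # Genotype frequencies before selection, with inbreeding.
    f_wt = p * p + F * p * q
    f_het = 2.0 * p * q * (1.0 - F)
    f_hom = q * q + F * p * q
    # Genotype fitnesses are 1, 1 - h*s and 1 - s.
    w_bar = f_wt + f_het * (1.0 - h * s) + f_hom * (1.0 - s)
    return (f_hom * (1.0 - s) + 0.5 * f_het * (1.0 - h * s)) / w_bar


def effective_selection(s, h, F, q=1e-4):
    """Proportional per-generation decline of a rare allele, (q - q') / q."""
    return (q - next_freq(q, s, h, F)) / q


if __name__ == "__main__":
    # Two mutations with the same product h*s but different s.
    for s, h in ((0.01, 0.1), (0.002, 0.5)):
        out = effective_selection(s, h, F=0.0)
        selfed = effective_selection(s, h, F=1.0)
        print(f"s={s}, h={h}: outcrossing {out:.6f}, selfing {selfed:.6f}")
```

In this toy model a rare allele declines at a rate of roughly hs per generation under random mating and roughly s under complete selfing. Two mutations with the same product hs are therefore nearly indistinguishable in the outcrosser but clearly different in the selfer, which is the intuition behind combining the two species.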