Genotype–phenotype associations: substitution models to detect evolutionary associations between phenotypic variables and genotypic evolutionary rate

Bioinformatics, Jun 2009

Motivation: Mapping between genotype and phenotype is one of the primary goals of evolutionary genetics, but one that has received little attention at the interspecies level. Recent developments in phylogenetics and statistical modelling have typically been used to examine molecular and phenotypic evolution separately. We have built on this background to develop phylogenetic substitution models that test for associations between phenotype and genotypic evolutionary rate. We do this by creating hybrid rate matrices between genotype and phenotype.

Results: Simulation results show our models to be accurate in detecting genotype–phenotype associations and robust to various factors that typically affect maximum likelihood methods, such as number of taxa, level of relevant signal, proportion of sites affected and length of evolutionary divergence. Further, simulations show that our method is robust to homogeneity assumptions. We apply the models to datasets of male reproductive system genes in relation to the mating systems of primates. We show that evolution of semenogelin II is significantly associated with mating systems, whereas two negative control genes (cytochrome b and peptidase inhibitor 3) show no significant association. To our knowledge, this is the first hybrid substitution model to test directly for an association between genotype and phenotype within a phylogenetic framework.

Availability: Perl and HYPHY scripts are available upon request from the authors.

Contact: to252{at}cam.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
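The idea of a hybrid rate matrix can be illustrated with a toy construction. This is a minimal sketch under our own assumptions, not the authors' HYPHY model: the phenotype is binary, the genotype is a single nucleotide evolving under a Jukes–Cantor-style model, and the function `hybrid_rate_matrix` and its parameters `q01`, `q10`, `r0`, `r1` are hypothetical. The joint state is (phenotype, nucleotide); the nucleotide substitution rate depends on the current phenotype, so an association between phenotype and genotypic rate corresponds to `r0 != r1`.

```python
from itertools import product


def hybrid_rate_matrix(q01, q10, r0, r1):
    """Rate matrix Q over joint (phenotype, nucleotide) states.

    Hypothetical toy model: a binary phenotype switches 0->1 at rate q01
    and 1->0 at rate q10; the nucleotide evolves under a Jukes-Cantor-style
    model whose total substitution rate is r0 or r1, depending on the
    current phenotype. Phenotype and nucleotide never change simultaneously.
    """
    states = list(product((0, 1), "ACGT"))   # (0,'A'), (0,'C'), ..., (1,'T')
    rate = {0: r0, 1: r1}
    switch = {(0, 1): q01, (1, 0): q10}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    for i, (p, b) in enumerate(states):
        for j, (p2, b2) in enumerate(states):
            if i == j:
                continue
            if p == p2:
                # nucleotide substitution; its rate depends on phenotype p
                Q[i][j] = rate[p] / 3.0
            elif b == b2:
                # phenotype switch with the nucleotide unchanged
                Q[i][j] = switch[(p, p2)]
            # otherwise both would change at once: rate stays 0
        Q[i][i] = -sum(Q[i])   # diagonal makes each row sum to zero
    return states, Q


# Genotype evolves three times faster while the phenotype is in state 1.
states, Q = hybrid_rate_matrix(q01=0.1, q10=0.1, r0=1.0, r1=3.0)
```

Transition probabilities along a branch of length t would then follow from the matrix exponential P(t) = exp(Qt), for example via `scipy.linalg.expm`. Setting `r0 == r1` gives a null model in which the nucleotide rate ignores the phenotype; comparing the fitted likelihoods of the constrained and unconstrained versions is one way such an association could be tested.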

https://bioinformatics.oxfordjournals.org/content/25/12/i94.full.pdf

Timothy D. O'Connor and Nicholas I. Mundy
Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK

1 INTRODUCTION

One of the major issues in evolutionary genetics research is the relationship between genotype and phenotype. Natural selection acts on phenotypes and indirectly leaves a signal at the molecular level.
The connection between the two levels is important because it ties together the effects of natural selection: selection for a phenotype can change the genetic variation at specific genes or genomic regions. Within the field of molecular evolution, the study of adaptation has focused on methods for detecting selection in coding sequences, with any inferences about phenotypic evolution being indirect. At the forefront of this enquiry, Yang, Nei, Goldman and others (Goldman and Yang, 1994; Nei and Gojobori, 1986; Yang, 2007) developed computational models of molecular evolution to distinguish between neutral mutation and selection. These codon models focus on the ratio (dN/dS) of the rate of non-synonymous (protein-altering) changes to the rate of synonymous (silent) changes, the latter being assumed to estimate the neutral rate of evolution (Goldman and Yang, 1994; Muse and Gaut, 1994).

At the intraspecies level, and occasionally at the closely related interspecies level, quantitative trait locus (QTL) analyses have been designed to detect specific regions of the genome associated with a given trait (Slate, 2005). These methods typically use pedigree information or known population structure to make specific crosses for particular phenotypes (Lynch and Walsh, 1998). The crosses are then genotyped using SNPs or other markers across the whole genome, and statistical associations between marker genotype and phenotype, arising from linkage disequilibrium, are identified. Other studies use association mapping to identify genomic regions involved in phenotypic differences, or perform candidate gene association studies, e.g. MC1R in relation to colouration differences (Nachman et al., 2003; Theron et al., 2001).

A few studies have looked for associations at the interspecies level using phylogenetics. The two main approaches are regression analysis between evolutionary rate and phenotypic variation, and codon branch-site models with phenotypes assigned to branches.
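To make the dN/dS ratio (often written ω) concrete, the final step of a counting estimate in the style of Nei and Gojobori (1986) can be sketched as below. This is a counting method, not the likelihood-based codon models of Goldman and Yang (1994) or Muse and Gaut (1994), and it assumes the numbers of synonymous and non-synonymous sites and differences have already been counted; the function names are hypothetical. Values of ω below 1 suggest purifying selection, values near 1 neutral evolution, and values above 1 positive selection.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_sites, s_sites, n_diffs, s_diffs):
    """omega = dN/dS from pre-counted sites and differences.

    n_sites / s_sites: numbers of non-synonymous and synonymous sites;
    n_diffs / s_diffs: observed non-synonymous and synonymous differences.
    Raises ZeroDivisionError if no synonymous differences were observed.
    """
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    return d_n / d_s


# Hypothetical counts for one pair of sequences
print(dn_ds(n_sites=300, s_sites=100, n_diffs=6, s_diffs=10))
```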
In the regression analyses published to date, dN/dS ratios are calculated for each branch in the tree using the free-ratios model (Yang, 1998), and a regression is performed by (i) pairing the dN/dS ratio for each terminal branch with the phenotype value of its terminal node or (ii) pairing the dN/dS ratio for every branch with the reconstructed phenotype on that branch. Using the first approach in primates, Dorus et al. (2004) found a positive correlation between levels of sperm competition (mean number of partners in a periovulatory period) and the dN/dS ratio of semenogelin II (SEMG2), a gene encoding a protein component of primate semen. Later, Hurle et al. (2007) added taxa and performed a similar analysis but found no significant trend. Using a similar approach, Herlyn and Zischler (2007) found a negative correlation between the dN/dS ratio of the sperm ligand zonadhesin (ZAN) and primate body weight dimorphism. In birds, Nadeau et al. (2007) employed this method to study correlations between pigmentation genes and sexually dimorphic colour variation in galliforms. They also used the second method, correlating dN/dS ratios for internal and terminal branches with ancestral reconstructions of sexual dimorphism in colouration over the phylogenetic tree. Both methods showed a correlation between dimorphic colouration and MC1R, but not other pigmentation genes (Nadeau et al., 2007).

The second approach uses branch-site codon tests, which test for changes in selection pressure on particular branches associated with phenotypes of interest. This method tests for positive selection by comparing a null model of neutral evolution with a model of positive selection on those branches (Zhang et al., 2005). Ramm et al. (2008) reanalysed SEMG2, as well as SEMG1, in primates using these codon models. They found that branches leading to species with high levels of sperm competition (multimale mating systems) show significant evidence of positive selection in SEMG2 but not SEMG1.
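The two statistical steps described in this paragraph can be sketched as follows; both function names are hypothetical. `regress` pairs per-branch dN/dS estimates with phenotype values, as in approach (i), and returns the least-squares slope and Pearson correlation. It treats the pairs as independent observations and ignores phylogenetic non-independence, and the cited studies may have used other statistics. `lrt_pvalue` converts two maximized log-likelihoods into a p-value for a likelihood ratio test with one degree of freedom, the kind of null-versus-alternative comparison made in branch-site tests. Whether to refer the statistic to χ²₁ or to a 50:50 mixture of a point mass at 0 and χ²₁ is a modelling choice; both options are shown.

```python
import math


def regress(xs, ys):
    """Least-squares slope and Pearson r of ys on xs.

    E.g. xs = phenotype value at each tip, ys = dN/dS of the matching
    terminal branch. Treats the pairs as independent observations.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def lrt_pvalue(lnl_null, lnl_alt, mixture=False):
    """P-value of a likelihood ratio test with one degree of freedom.

    The statistic 2 * (lnL_alt - lnL_null) is referred to chi-square(1),
    whose survival function is erfc(sqrt(x / 2)). With mixture=True it is
    referred to a 50:50 mixture of a point mass at 0 and chi-square(1).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p = math.erfc(math.sqrt(stat / 2.0))
    if mixture and stat > 0:
        p *= 0.5   # the point mass at 0 contributes nothing above 0
    return p
```

With χ²₁, a statistic of about 3.84 corresponds to p = 0.05; under the mixture the same statistic gives p = 0.025.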
Branches leading to species with low levels of sperm competition show no evidence of positive selection at either locus. In addition, Ramm et al. tested seven rodent semen proteins and found that Svs2, the rodent orthologue of SEMG2, showed significant evidence of positive selection on branches leading to taxa with high relative testis size.

All of these tests can be criticized on theoretical grounds. For tests using phenotypic states derived from terminal taxa, the phenotypic state is applied to a whole branch without regard to its evolution. This is problematic because it ignores when along the branch the phenotype was gained or lost, so part of the branch is potentially assigned the wrong phenotype. For tests relying on ancestral reconstruction of the phenotypic character at internal nodes, reconstruction error is not taken into account in downstream analyses. One way around these difficulties is the maximum likelihood approach, which assigns characters to terminal nodes and probability distributions for those characters to internal nodes (Felsenstein, 1981). Thus, it estimates th (...truncated)
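The maximum likelihood treatment of a character described above can be sketched with Felsenstein's pruning algorithm for a binary phenotype. This is a minimal illustration under assumed conditions: a two-state continuous-time Markov model with known rates `q01` and `q10` (in practice these would be estimated), a tree stored as nested dictionaries, and a root prior equal to the stationary frequencies; all names and the tip coding are hypothetical. It returns the likelihood of the tip states and the marginal probability of each state at the root, rather than a single reconstructed state. Other internal nodes would need an additional preorder pass, which is not shown.

```python
import math


def two_state_P(t, q01, q10):
    """Transition probabilities P[i][j] for a binary character over time t."""
    total = q01 + q10
    pi0, pi1 = q10 / total, q01 / total
    e = math.exp(-total * t)
    return [[pi0 + pi1 * e, pi1 * (1 - e)],
            [pi0 * (1 - e), pi1 + pi0 * e]]


def conditional_likelihoods(node, q01, q10):
    """Pruning step: L[s] = P(tip states below node | node in state s)."""
    if "state" in node:                        # observed tip
        return [1.0 if s == node["state"] else 0.0 for s in (0, 1)]
    L = [1.0, 1.0]
    for child in node["children"]:
        P = two_state_P(child["t"], q01, q10)
        Lc = conditional_likelihoods(child, q01, q10)
        for s in (0, 1):
            L[s] *= sum(P[s][x] * Lc[x] for x in (0, 1))
    return L


def root_posterior(tree, q01, q10):
    """Likelihood of the tip states and marginal probability of each root state."""
    L = conditional_likelihoods(tree, q01, q10)
    total = q01 + q10
    prior = [q10 / total, q01 / total]         # stationary frequencies
    joint = [prior[s] * L[s] for s in (0, 1)]
    lik = sum(joint)
    return lik, [j / lik for j in joint]


# Hypothetical coding: 1 = multimale mating system, 0 = other.
tree = {"children": [
    {"t": 0.2, "children": [{"state": 1, "t": 0.1}, {"state": 1, "t": 0.1}]},
    {"state": 0, "t": 0.3},
]}
lik, post = root_posterior(tree, q01=0.5, q10=0.5)
print(lik, post)
```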

Article home page: http://bioinformatics.oxfordjournals.org/content/25/12/i94.abstract

Timothy D. O'Connor, Nicholas I. Mundy. Genotype–phenotype associations: substitution models to detect evolutionary associations between phenotypic variables and genotypic evolutionary rate. Bioinformatics, 2009, 25(12), pp. i94-i100. DOI: 10.1093/bioinformatics/btp231