Pervasive Variation of Transcription Factor Orthologs Contributes to Regulatory Network Evolution

PLoS Genetics, Mar 2015

Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.


Shilpa Nadimpalli, Anton V. Persikov, Mona Singh

1 Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America, 2 Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America

Editor: Jason D. Lieb, University of Chicago, United States

Funding: This work was funded in part by a National Institutes of Health (http://www.nih.gov/) grant R01-GM076275 (to MS), a National Science Foundation (http://www.nsf.gov/) grant ABI-1062371 (to MS), and a National Science Foundation graduate research fellowship (http://www.nsfgrfp.org/) grant DGE-1148900 (to SN). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

[...] well as via binding specificity variation in transcription factors (TFs), a less studied phenomenon that has been observed primarily in multi-gene families. Though large-scale experimental studies ascertaining the extent to which TFs contribute to regulatory network variation across organisms are lacking and would be time-consuming, computational methods can begin to address this challenge. Here, we present a systematic, large-scale analysis of DNA-binding specificity evolution in TF orthologs by computationally leveraging specific features of Cys2-His2 zinc finger proteins, the largest class of TFs in animals and major components of their regulatory programs. We find not only that divergence of DNA-binding residues in 1-to-1 orthologous C2H2-ZFs is pervasive but also that these changes show evidence of functional constraint and occur in a gradual, evolutionarily viable manner. We conclude that the diversification of orthologous TFs has most likely played a major and largely unstudied role in gene regulatory network evolution in metazoans.

Differences in regulatory networks have been proposed to be one of the major determinants of the phenotypic variations observed across organisms [1]. There are two ways by which regulatory networks evolve: changes in cis or trans. The predominant view is that regulatory evolution results mainly from the gain and loss of binding sites in cis-regulatory regions because incremental, evolutionarily viable steps can occur [2–5]. Mutations in transcription factors (TFs), on the other hand, can affect the expression of multiple genes and are therefore thought to be more likely to have detrimental consequences [6–9]. Nevertheless, case studies of specific biological systems have revealed instances of regulatory divergence stemming from TF variation. These variations include gene loss as well as gene duplication where the subsequent paralogs exhibit gain and loss of effector domains, changes in interactions with other regulatory proteins, or novel TF binding potential [10–15]. Specific cases of variations in non-duplicated TFs are also known; an example of 1-to-1 orthologous plant TFs with differing binding specificities was recently discovered [16], along with a homeodomain TF in animals where the addition of a functionally important transcriptional repressor domain is found in insect orthologs [17, 18]. However, a large-scale experimental study ascertaining the extent to which TF variation may contribute to overall regulatory network evolution is still lacking; it would require determining DNA-binding specificities or genomic occupancies for numerous TFs across a diverse set of organisms.

Computational methods can begin to address this challenge by leveraging specific features of TFs. TFs come in distinct structural classes based upon their incorporation of various DNA-binding domains. For many of these domains, the amino acids conferring DNA-binding specificity are known. This provides a platform to assess TF variation via comparative sequence analysis. The Cys2-His2 zinc finger (C2H2-ZF) TFs in particular are an excellent system to probe for variation, as C2H2-ZF domains have a conserved modular structure with binding specificity conferred largely by four DNA-contacting residues within the domain's alpha-helix [19]. Further, they constitute the largest group of TFs in higher metazoans [20], making up nearly half of all annotated TFs in human, and are major participants in regulatory programs. A C2H2-ZF domain can specify a wide range of three or four base pair targets, and tandem arrays of these domains bind contiguous DNA sequences, giving C2H2-ZF genes the ability to recognize an incredibly diverse set of motifs [21]. These features of C2H2-ZFs allow us to make binding specificity predictions of reasonably high quality for this TF family [22–26].

Previous evolutionary analyses of C2H2-ZF genes revealed a dichotomy in conservation patterns of this family. Tandemly-duplicated C2H2-ZF paralogs exhibit differences in their C2H2-ZF and effector domain counts and can be highly dynamic across short evolutionary dista (...truncated)
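The comparative sequence analysis described in the introduction, reading off the four DNA-contacting residues of each C2H2-ZF domain and checking whether they differ between orthologs, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the commonly used canonical model in which the contacting residues sit at alpha-helix positions -1, 2, 3, and 6, and it finds domains with a simple regular expression for the C-X(2-4)-C-X(12)-H-X(3-5)-H pattern rather than a profile-based domain search. The function names, the test sequence (modeled on the well-studied Zif268 zinc-finger region), and the single-substitution "ortholog" are hypothetical and serve only as sample input.

```python
import re

# Canonical C2H2-ZF pattern: C-X(2-4)-C-X(12)-H-X(3-5)-H (an assumption of
# this sketch; real analyses typically use profile HMMs to find domains).
# The 12 residues preceding the first His are captured.
ZF_PATTERN = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

# Offsets of helix positions -1, 2, 3 and 6 inside the captured block.
# The first His (helix position 7) immediately follows the block, so
# position 6 is offset 11, position 3 is offset 8, and so on.
CONTACT_OFFSETS = (5, 7, 8, 11)


def contact_residues(protein):
    """Return one 4-letter string of DNA-contacting residues per matched domain."""
    return [
        "".join(match.group(1)[i] for i in CONTACT_OFFSETS)
        for match in ZF_PATTERN.finditer(protein)
    ]


def diverging_domains(reference, ortholog):
    """Return indices of domains whose contacting residues differ.

    Simplifying assumption: domains are paired by their order in the
    protein, so both sequences must contain the same number of domains.
    """
    ref_contacts = contact_residues(reference)
    orth_contacts = contact_residues(ortholog)
    if len(ref_contacts) != len(orth_contacts):
        raise ValueError("domain counts differ; align domains before comparing")
    return [
        i for i, (a, b) in enumerate(zip(ref_contacts, orth_contacts)) if a != b
    ]


# Sample input only: three tandem fingers modeled on the Zif268 region.
REFERENCE = (
    "MERPYACPVESCDRRFSRSDELTRHIRIHTGQKP"  # finger 1
    "FQCRICMRNFSRSDHLTTHIRTHTGEKP"        # finger 2
    "FACDICGRKFARSDERKRHTKIHLRQKD"        # finger 3
)
# Hypothetical ortholog: one substitution (H to N) at helix position 3 of finger 2.
ORTHOLOG = REFERENCE.replace("RSDHLTT", "RSDNLTT")

print(contact_residues(REFERENCE))             # ['RDER', 'RDHT', 'RDER']
print(contact_residues(ORTHOLOG))              # ['RDER', 'RDNT', 'RDER']
print(diverging_domains(REFERENCE, ORTHOLOG))  # [1]
```

Pairing domains by order is the weakest assumption here: because orthologs can lose or gain C2H2-ZF domains, a realistic comparison would first align the proteins and match domains by alignment position.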



Shilpa Nadimpalli, Anton V. Persikov, Mona Singh. Pervasive Variation of Transcription Factor Orthologs Contributes to Regulatory Network Evolution, PLoS Genetics, 2015, Volume 11, Issue 3, DOI: 10.1371/journal.pgen.1005011