On the importance of evolutionary constraint for regulatory sequence identification

Briefings in Functional Genomics, Nov 2021

Regulation of gene expression relies on the activity of specialized genomic elements, enhancers or silencers, sometimes located at large distances from their target gene promoters. A significant part of vertebrate genomes consists of such regulatory elements, but their identification, and that of their target genes, remains challenging because they lack a clear signature at the nucleotide level. For many years, the main hallmark used for identifying functional elements has been their sequence conservation between genomes of distant species, indicative of purifying selection. More recently, genome-wide biochemical assays have opened new avenues for detecting regulatory regions, shifting attention away from evolutionary constraints. Here, we review the respective contributions of comparative genomics and biochemical assays to the definition of regulatory elements and their targets, and advocate that both sequence conservation and preserved synteny, taken as signatures of functional constraint, remain essential tools in this task.

Briefings in Functional Genomics, 20(6), 2021, 361–369
https://doi.org/10.1093/bfgp/elab015
Advance Access Publication Date: 23 March 2021
Review Paper

Corresponding author: Hugues Roest Crollius.

Key words: comparative genomics; gene regulation; enhancer; vertebrate evolution

Introduction

Understanding how the genetic information concealed in DNA sequence is turned into biological function in living cells, organisms or ecosystems remains one of the main goals driving current biological research. In this endeavour, it is of prime importance to properly identify the functional elements that orchestrate the expression of genomes into phenotypes.
Several types of genomic elements affect gene expression: proximal promoters, which define transcription start sites; distal enhancers and silencers, which bring regulatory complexes to the promoters; and insulators, which define the broad genomic domains where regulatory interactions take place. Here we will use the term ‘regulatory elements’ exclusively to mean distal cis-regulatory elements such as enhancers and silencers. It should be noted that behind the apparent simplicity of their conceptual definition, the operational definition of enhancers/silencers, i.e. the type of experimental evidence required for their validation, has varied across research fields, periods and even individual researchers. For example, the original description of enhancers in cells transfected with episomal vectors demanded that they function independently of their orientation and distance to the target [1, 2], requirements that are seldom tested nowadays, at least in genomic contexts. The difficulties related to the definition of enhancers are discussed in detail in a recent review [3].

Indeed, while it is now relatively straightforward to identify the coding sequences of genes, recognizing the regulatory elements that control their expression remains a much harder task. Two main reasons account for this contrast. First, coding sequences reside in exons that need to be transcribed and processed in characteristic ways (e.g. polyadenylation), which allows for their specific identification by biochemical isolation (e.g. polyA+ RNA sequencing). In addition, their nucleotide sequence obeys the universal genetic code, which imposes characteristic constraints on nucleotide arrangements at small (codon triplets) as well as large (Open Reading Frame syntax) scales. In contrast, regulatory elements are more difficult to isolate biochemically and do not seem to obey a recognizable code at the sequence level.
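To make the contrast concrete, the ‘Open Reading Frame syntax’ of coding sequences can be checked with a few lines of code, whereas no comparably simple rule is known for enhancers or silencers. The following is a minimal sketch only, not a gene-prediction method: it assumes the standard start codon ATG and the three standard stop codons, scans the forward strand only, and uses a hypothetical function name (find_orfs) and an arbitrary minimum length.

```python
# Minimal sketch: scan one strand of a DNA string for open reading frames.
# Assumptions (not from the article): standard start codon ATG, standard stop
# codons, forward strand only, and an arbitrary minimum length in codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of ORFs; end is exclusive and
    includes the stop codon. Length is counted in codons, stop included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


example = "CCATGAAATTTTAGGG"
print(find_orfs(example))  # prints [(2, 14)]
```

In this toy sequence the only ORF is ATGAAATTTTAG, found in the third reading frame. Real gene finders add many further signals (splice sites, codon usage, both strands), which this sketch deliberately leaves out.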
However, it is generally admitted that the function of regulatory elements is dependent on their nucleotide sequence,

François Giudicelli is an INSERM researcher at the Institut de Biologie de l’École Normale Supérieure (IBENS). His research interests concern the evolution of gene regulation in vertebrates.
Hugues Roest Crollius is a CNRS researcher and group leader at the Institut de Biologie de l’École Normale Supérieure (IBENS). His lab studies the evolution of genome organization and function in vertebrates.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved.

Phylogenetic footprints define regulatory elements

In the pre-genomic era, the quest for phylogenetic footprints was necessarily restricted to small regions around genes of interest, yielding only scarce and spatially biased knowledge. For example, a considerable number of studies were dedicated to the globin loci, paradigmatic cases for the cis-regulatory control of gene expression by conserved elements [5–7]. With longer spans of genomes being sequenced in multiple species, the use of phylogenetic footprinting allowed the identification of far-acting enhancers located up to 1 Mb from their target [8–11]. When the first complete sequences of vertebrate genomes became available two decades ago, the field of comparative genomics rapidly expanded its discoveries. Comparing the human and mouse genomes with the first fully sequenced fish genome, that of the pufferfish Fugu rubripes, led to the identification of several thousand highly conserved non-coding elements, named CNEs, which survived 450 M years of divergent evolution [12]. Similar findings were reported upon comparison of the human and elephant shark genomes [13], albeit without experimental confirmation in this case.
In the human and elephant shark comparison, evolutionary constraint could be traced back to the last common ancestor of humans and elephant sharks, at the base of Gnathostomata, which lived 530 M years ago. The initial focus was indeed placed on conservation over such extreme evolutionary distances as the most reliable indicator of regulatory function. This focus on ancient conservation helped establish the zebrafish as one of the key species used in vertebrate enhancer studies, mainly because of its status as a widespread experimental model for developmental biology. Not only could it be used to infer regulatory regions, but it also allowed experimental validation of these inferences using a wide range of transgenesis techniques [14]. It should be noted however that teleost fishes like zebrafish, fugu or medaka, which make up one of the largest vertebrate groups with ∼25 000 species, may not be the most appropriate basal group to study the gene regulatory networks underlying the developmental programme of vertebrates. According to [15], the accelerated evolution of their genomes that followed a whole-genome duplication event at the root of teleosts led to the loss of many ancient regulatory elements, which may explain why Venkatesh et al. [13] found more CNEs conserved between human and elephant shark (4782) than between human and the less distant fugu (2107) or zebrafish (2938) with the same sequence similarity thresholds. The reason w (...truncated)
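The idea of calling conserved elements with a fixed sequence similarity threshold, as in the comparisons discussed above, can be sketched as a sliding-window identity scan over a pairwise alignment. This is an illustration only and not the procedure of the cited studies: it assumes the two sequences are already aligned (gaps written as "-"), the function name (conserved_windows), window size and identity threshold are hypothetical choices, and coding exons are not masked, which an actual CNE search would require.

```python
# Minimal sketch: flag conserved windows in a pre-computed pairwise alignment.
# Assumptions (not from the article): the two sequences are already aligned
# with "-" for gaps, and the window size and identity threshold are arbitrary
# illustrative values, not those used in the cited studies.


def conserved_windows(aln_a, aln_b, window=10, min_identity=0.9):
    """Return merged (start, end) alignment-column intervals, end exclusive."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base, 0 for mismatches and gaps.
    matches = [
        int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())
    ]
    intervals = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Overlapping or adjacent to the previous hit: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


# Toy alignment: 12 identical columns followed by 8 mismatched columns.
species_a = "ACGTACGTACGT" + "TTTTTTTT"
species_b = "ACGTACGTACGT" + "AAAAAAAA"
print(conserved_windows(species_a, species_b, window=10, min_identity=1.0))
# prints [(0, 12)]
```

Applying the same window and threshold to several species pairs and counting the resulting intervals mirrors, in a very simplified way, how element counts such as those quoted above can be compared across evolutionary distances.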


This is a preview of a remote PDF: https://academic.oup.com/bfg/article-pdf/20/6/361/41187647/elab015.pdf
Article home page: https://academic.oup.com/bfg/article/20/6/361/6180111

Giudicelli, François, Roest Crollius, Hugues. On the importance of evolutionary constraint for regulatory sequence identification, Briefings in Functional Genomics, 2021, pp. 361-369, Volume 20, Issue 6, DOI: 10.1093/bfgp/elab015