Regulatory RNAs in the light of Drosophila genomics

Briefings in Functional Genomics, Sep 2012

Many aspects of gene regulation are mediated by RNA molecules. However, regulatory RNAs have remained elusive until very recently. At least three types of small regulatory RNAs have been characterized in Drosophila: microRNAs (miRNAs), piwi-interacting RNAs and endogenous siRNAs. A fourth class of regulatory RNAs includes known long non-coding RNAs such as roX1 or bxd. The initial sequencing of the Drosophila melanogaster genome has served as a scaffold to study the transcriptional profile of an animal, revealing the complexities of the function and biogenesis of regulatory RNAs. The comparative analysis of 12 Drosophila genomes has been crucial for the study of microRNA evolution. However, comparative genomics of other RNA regulators is confounded by technical problems: genomic loci are poorly conserved and frequently encoded in the heterochromatin. Future developments in genome sequencing and population genomics in Drosophila will continue to shed light on the conservation, evolution and function of regulatory RNAs.

Antonio Marco

REGULATORY RNAs

Early models of gene expression envisioned a system of transcriptional regulation mediated by RNA molecules [1, 2]. This regulatory role of RNA molecules was largely abandoned as transcription factors were characterized, leading to a transcription-factor-centered view of gene regulation [3, 4]. After the discovery of RNA interference (RNAi) in eukaryotes (reviewed earlier [5]), the idea of regulatory RNAs was resurrected in a different form: some RNA molecules may down-regulate other RNA molecules by sequence complementarity. This type of antisense RNA-mediated regulation had already been described in prokaryotes [6]. When microRNAs (miRNAs) were first observed in the roundworm Caenorhabditis elegans, a mechanism of gene down-regulation by RNA-RNA complementarity in eukaryotes became apparent [7, 8].
We currently know that multiple types of RNAs have important regulatory functions in the cell, and that they are widespread in animal genomes. Current models of gene regulation integrate the RNA component, providing a much more complex picture than we had two decades ago.

Drosophila melanogaster has dominated the field of genetics for over a century. Not surprisingly, genes regulating animal development were first discovered in this species [9]. Early investigations by Ed Lewis showed that multiple loci controlling fly body patterning were closely linked in a single genomic region, the bithorax complex (BX-C, see [10] and references therein). These loci are located in the genome in the same order as they are spatially expressed in the fly, and they were named after the anatomical region affected in their mutants (Figure 1). Lewis initially characterized eight genes in the BX-C, but only three of them coded for proteins: Ubx, abd-A and Abd-B [11]. Transcripts from the other loci were identified much later [12]. We currently know that three of these transcripts are regulatory RNAs: one long non-coding RNA, bxd, and two miRNAs, iab-4 and iab-8 (Figure 1). The pioneering work by Ed Lewis on the BX-C in Drosophila, therefore, represented the first functional analysis of regulatory RNAs in animals.

[Table 1: Non-protein-coding RNAs annotated to the Drosophila melanogaster genome. Rows: transfer RNAs, small nuclear RNAs, small nucleolar RNAs, ribosomal RNAs, microRNAs, other non-coding RNAs. Column: number of loci (annotated in FlyBase, 2 March 2012).]

The D. melanogaster genome sequence has been particularly useful for studying regulatory sequences [13]. FlyBase [14] catalogues about 1500 non-protein-coding loci (Table 1). miRNAs are the only class of regulatory RNAs indexed in FlyBase. Other known and putative regulatory RNAs are included in the long non-coding RNA category.
Genetic loci encoding other short regulatory RNAs such as piwi-interacting RNAs (piRNAs) or endogenous small interfering RNAs (siRNAs) are currently not even catalogued. This review focuses on how Drosophila genomics has contributed to the analysis of regulatory RNAs, and how future developments will provide a better understanding of their function and evolution.

microRNAs

miRNAs are key regulators of gene expression at the post-transcriptional level. They bind to target transcripts by sequence complementarity, inducing either degradation or translational repression [15, 16]. miRNA biogenesis is well understood [Figure 2 (top-left)]. A miRNA locus is transcribed into a primary miRNA, which is processed by the RNase complex Drosha/Pasha, producing a precursor hairpin [16]. Precursor hairpins are further cleaved in the cytoplasm by DCR-1 and LOQS (Table 2), the products of the genes Dicer-1 and loquacious [17]. The result is a double-stranded RNA molecule (miRNA duplex in Figure 2) with an approximate length of 21 nt. One of the arms of the miRNA duplex typically becomes the mature sequence. Partial complementarity between the mature miRNA and its target mediates translational repression in association with Argonaute 1 (AGO1). When the complementarity between the miRNA and the target is perfect, the miRNA enters the RNAi pathway, and the targeted transcript is instead degraded by Argonaute 2 (AGO2) [18].

The first miRNA ever characterized was lin-4 in C. elegans [7, 8]. For several years lin-4 remained a unique type of regulator, until a second miRNA, let-7, was characterized. Like lin-4, let-7 was first identified in C. elegans [19]. However, by that time, the genome of Drosophila was already available [20], and let-7 was identified by sequence similarity in this species, as well as in other animals with ongoing genome projects [21]. Since both lin-4 and let-7 control developmental timing, they were classified as small temporal RNAs (stRNAs).
In a collective effort, three groups cloned multiple stRNAs from D. melanogaster, C. elegans and humans [22–24], and introduced the term microRNA. The initial cloning of miRNAs from 22 Drosophila loci [22] showed early on that miRNAs are often clustered in the genome. The comparative analysis of miRNAs in Drosophila was crucial in establishing the basis for the computational prediction of small RNAs [25]. By first screening the genome for potential miRNA loci, the cloning experiments became more specific (i.e. less expensive). Likewise, the prediction of miRNA targets was first modelled in D. melanogaster using this initial set [26, 27]. The prediction of both miRNA loci and targets relied on conservation in a second available Drosophila genome sequence: D. pseudoobscura. Because of the small size of miRNAs and their target sites, the proper study of miRNAs required a more extensive collection of closely related genomes. This opportunity came with the sequencing, assembly and comparative analysis of the 12 Drosophila genomes [28]. Additionally, the breakthrough of high-throughput sequencing allowed the characterization of small RNAs without the need for cloning. The combination of computational prediction of miRNAs based on comparative genomics and the fast validation of candidates by deep sequencing resulted in a dramatic expansion in the number of known miRNAs in Drosophila (Figure 3, [29–31]). These analyses revealed additional miRNA features: (i) as suspected, the mature functional sequence of a miRNA is more conserved than the precursor hairpin [28]; (ii) some miRNAs (mirtrons) bypass the action of Drosha during their biogenesis, being processed as introns by the splicing machinery [32, 33]; (iii) the comparison of closely related species improves the identification of functional miRNA target sites [34].
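Point (iii) above, that comparing closely related species helps to pick out functional target sites, can be illustrated with a small sketch. The code below is not the method of any of the cited studies. It assumes a simplified target model in which a site is a perfect match to nucleotides 2-8 of the mature miRNA, and it assumes gap-free aligned 3'UTRs so that positions are comparable between species. The miRNA and UTR sequences, the species labels and the function names are all hypothetical.

```python
# Toy sketch: seed-match target-site search with a cross-species conservation filter.
# All sequences are invented for illustration; they are not real Drosophila data.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` that pair with miRNA positions 2-8."""
    seed = mirna[1:8]                # nucleotides 2-8 of the mature miRNA
    site = reverse_complement(seed)  # sequence the target must contain
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


def conserved_sites(mirna: str, aligned_utrs: dict[str, str]) -> list[int]:
    """Return positions where every species in the alignment carries a seed match.

    Assumption: the aligned UTRs are gap-free, so a position means the same
    thing in every species.
    """
    per_species = [set(seed_sites(mirna, utr)) for utr in aligned_utrs.values()]
    return sorted(set.intersection(*per_species))


if __name__ == "__main__":
    mirna = "UACGGAUCCAGUUAGCAUGCA"  # hypothetical mature miRNA
    utrs = {
        "species_A": "AAAGAUCCGUAAAGGGGAUCCGUAA",  # two seed matches
        "species_B": "AAAGAUCCGUAAAGGGGAUGCGUAA",  # second site mutated
    }
    print(seed_sites(mirna, utrs["species_A"]))  # [3, 16]
    print(conserved_sites(mirna, utrs))          # [3]
```

In this toy alignment only the first site survives the conservation filter, which is the sense in which additional genomes reduce the number of candidate sites.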
More recently, as a part of the modENCODE project [35], the profile of small RNAs has been thoroughly investigated in multiple tissues and developmental stages, permitting the discovery of additional miRNAs [36]. miRBase [37], the repository for all miRNA sequences, currently catalogues 240 loci encoding miRNAs in D. melanogaster.

The systematic characterization of miRNAs in multiple Drosophila genomes has provided an excellent opportunity to study the evolutionary dynamics of these tiny regulators [38, 39]. Within the Drosophila lineage, miRNAs appear to have high turnover rates [38, 40]. Comparison with other species also shows that only a few miRNAs are conserved among animals [41, 42]. However, a number of striking observations have been made from the deep sequencing of miRNAs from multiple species: (i) highly conserved miRNAs can change their function during evolution by modifying their Dicer/Drosha cleavage sites [42, 43]; (ii) functional changes can also occur by changing the arm of the precursor that produces the mature miRNA [43–45]. Specifically, in D. melanogaster, 20% of the conserved miRNAs produce a different mature sequence than their Tribolium castaneum orthologue [43]; (iii) clusters of co-transcribed miRNAs change dynamically during evolution [43, 46]. All these changes are likely to affect miRNA function. Undoubtedly, the analysis of more arthropods will provide a clearer picture of miRNA functional evolution.

ENDOGENOUS siRNAs

The injection of double-stranded RNAs to induce targeted gene silencing has been used extensively in the genetic analysis of plants and animals [5]. This mechanism, called RNAi, is now well understood [5, 47]. Long exogenous double-stranded RNAs are cleaved in the cell into double-stranded RNA molecules of about 21 nt, known as siRNAs. In Drosophila, this cleavage is mediated by the Dicer family member DCR-2 [Figure 2 (bottom-left)].
siRNAs bind to fully complementary sequences within their targets, inducing their degradation. In Drosophila, this degradation is mediated by AGO2. The first endogenous (endo-) siRNAs (i.e. encoded in the genomic sequence) in animals were found in C. elegans [48], followed 2 years later by their discovery in Drosophila [49–51]. Strikingly, experiments in Drosophila revealed the existence of two independent genomic sources of endo-siRNAs [Figure 2 (bottom-left)]. Some siRNAs are generated from long double-stranded RNA molecules (endo-dsRNAs) and are processed by the same enzymes known to cleave exogenous siRNAs: DCR-2 and R2D2 [49, 50]. Endo-dsRNAs are mainly composed of transposon-derived sequences. Other siRNAs are derived from long RNA hairpins (hpRNAs), and instead of R2D2, the processing is mediated by LOQS, the partner of DCR-1 in the miRNA pathway [49, 51]. The miRNA and siRNA pathways are thus intertwined, sharing at least two proteins: AGO2 and LOQS [Table 2; Figure 2 (left)].

Unlike miRNAs, endo-siRNAs are mostly derived from repetitive regions. Their detection therefore requires the mapping of short sequenced reads to highly repetitive genomic regions. Perhaps for that reason, the characterization of Drosophila endogenous siRNAs (and piRNAs, see below) occurred after the annotation of the heterochromatic regions of the genome, which are largely composed of nested transposable elements [52]. Heterochromatin sequences of other Drosophila species have been identified, but an assembled heterochromatic genome exists, so far, only for D. melanogaster [52, 53]. This imposes a limit on the study of small RNAs other than miRNAs. Consequently, the identification of orthologous siRNA loci among drosophilids has not been very successful. A couple of exceptions can be found for siRNA loci overlapping conserved protein-coding genes, such as cis-NAT (antisense to tkv) [54] and hp-CG4068 (antisense to CG4068) [51]. The conservation is, however, limited to closely related species.
The study of neighbouring genes to detect orthologous siRNA loci and a better characterization of heterochromatin across the 12 Drosophila genomes will tell us more about the origin of these small regulators.

piRNAs

Both mature miRNAs and siRNAs have a size of about 21 nt. During a systematic cloning of small RNAs in Drosophila, a class of RNAs slightly longer than miRNAs was noticed [55]. These sequences were described as repeat-associated small interfering RNAs (rasiRNAs). rasiRNAs were soon identified as a particular type of piwi-interacting RNAs (piRNAs) [56]. piRNAs are transcribed from genomic piRNA-clusters, mostly composed of inactive transposable elements (TEs, reviewed in [57]). Before the discovery of piRNAs, co-suppression of TEs had already been described [58, 59], and a role of ancestral transposon insertions in silencing novel TEs from the same family had also been proposed [59, 60]. Moreover, it has been suggested that clusters of TEs may form a co-suppression network that down-regulates the expression of other TEs [61]. Indeed, piRNAs are now known to be an important defensive mechanism against transposons [62–65]. This suggests that, most likely, a complex TE co-suppression network based on piRNAs does exist.

The discovery of the piRNA pathway has revealed the nature of two previously known phenomena in Drosophila. First, the maternal effect locus flamenco induces the silencing of gypsy transposons [66]. The flamenco locus is actually a piRNA-cluster [63]. Second, hybrid dysgenesis in Drosophila is produced by the massive mobilization of P-elements in the germ line [67]. A piRNA-mediated response is behind this classic phenomenon [68]. There may be, however, two independent piRNA pathways [64, 69, 70] [Figure 2 (right)]. A somatic pathway operates in the nucleus of the follicle cells of the ovary. In somatic cells, large piRNA-clusters are transcribed and then processed by PIWI into small piRNAs [Figure 2 (top-right)].
PIWI/piRNA complexes directly target transposable elements. The flamenco locus is of this kind. An independent germ-line pathway occurs mainly in the cytoplasm of the nurse cells [Figure 2 (bottom-right)]. In this case, genomic piRNA transcripts and targeted (transposon-derived) sequences induce the degradation of each other through the proteins AUB and AGO3. A feedback loop, called ping-pong, is established, generating a characteristic pattern of sense/antisense piRNAs overlapping each other by 10 nucleotides. Functional analyses showed that PIWI is also involved in this second pathway, but its role is not yet clear [64].

The piRNA pathway is conserved in animals (reviewed previously [47]). Consequently, other Drosophila species should code for piRNAs. However, the comparative analysis of conserved piRNAs between drosophilids is, as in the case of endo-siRNAs, problematic. The flamenco locus is conserved between D. melanogaster, D. yakuba and D. erecta, and it encodes antisense piRNAs that target transposons of the gypsy family across the Drosophila lineage, although the specific transposons that are targeted vary from species to species [69]. Clusters of TEs, from which piRNAs derive, tend to be located in the heterochromatin [57]. As in the case of endogenous siRNAs, the analysis of the heterochromatic part of the genome is crucial to further investigate the origin and evolutionary dynamics of piRNAs.

Although the primary function of piRNAs is the defence against TEs, a role in chromatin regulation has also been proposed [56, 71]. Interestingly, piRNAs are likely to regulate Drosophila telomeric chromatin, which is mostly composed of retrotransposons (reviewed in [72]). In agreement with these observations, specific piRNA targets in the Drosophila telomeric retrotransposon HeT-A have been identified [73]. These targets are conserved in other Drosophila species [73]. Other instances of piRNA-mediated chromatin regulation involving TEs are still unknown.
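The ping-pong signature described in this section (sense and antisense piRNAs whose 5' ends overlap by 10 nucleotides) lends itself to a simple computational check. The sketch below is an illustration under stated assumptions, not the procedure used in the cited studies. It assumes reads are already mapped to a single reference and given as hypothetical tuples of (strand, start, end) with 0-based inclusive coordinates, and that every read is at least `max_overlap` nt long. The function name and the coordinates are invented.

```python
# Toy sketch: histogram of 5'-to-5' overlaps between sense and antisense reads.
from collections import Counter


def five_prime_overlaps(reads, max_overlap=20):
    """Count sense/antisense read pairs by the length of their 5' overlap.

    reads: iterable of (strand, start, end), 0-based inclusive coordinates.
    The 5' end of a '+' read is its start; the 5' end of a '-' read is its end.
    A '+' read at start p and a '-' read ending at q (q >= p) overlap by
    q - p + 1 nucleotides at their 5' ends.
    Assumption: all reads are at least `max_overlap` nt long.
    """
    plus_5p = Counter(start for strand, start, _ in reads if strand == "+")
    minus_5p = Counter(end for strand, _, end in reads if strand == "-")
    histogram = Counter()
    for p, n_plus in plus_5p.items():
        for overlap in range(1, max_overlap + 1):
            q = p + overlap - 1
            if q in minus_5p:
                histogram[overlap] += n_plus * minus_5p[q]
    return histogram


if __name__ == "__main__":
    toy_reads = [
        ("+", 100, 124), ("-", 85, 109),   # 5' ends overlap by 10 nt
        ("+", 200, 225), ("-", 184, 209),  # 5' ends overlap by 10 nt
        ("+", 300, 324), ("-", 280, 304),  # 5' ends overlap by 5 nt
    ]
    hist = five_prime_overlaps(toy_reads)
    print(sorted(hist.items()))  # [(5, 1), (10, 2)]
```

In real data a ping-pong-active cluster would be expected to show an excess of pairs at an overlap of 10 relative to neighbouring values. How that excess is scored statistically is left open here.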
LONG NON-CODING RNAs

Long non-coding RNAs (lncRNAs) are often defined as transcripts longer than 200 nt with little or no protein-coding capacity [74]. In practice, an RNA molecule is considered to be a lncRNA if it cannot be ascribed to any other class of non-protein-coding RNAs (Table 1). As discussed in the previous sections, small regulatory RNAs show signatures of enzymatic processing in their mature products (mainly conserved size and RNase cleavage sites), facilitating their identification in the genome. However, lncRNAs have no recognizable signatures and their characterization has been based, almost exclusively, on transcriptional analyses.

[Table 3. Functions listed: Ubx regulation; heat-shock stress response; germ cell transcriptional inhibition; dosage compensation (two entries); courtship behaviour; sleep behaviour. Note: length refers to the longest RNA transcript.]

The first characterized lncRNA in Drosophila was bxd (Figure 1), a non-protein-coding transcript that regulates the expression of Ubx. Paradoxically, when bxd transcripts were first characterized, it was proposed that they encode small regulatory proteins rather than acting as long regulatory RNAs [12]. Soon after, other lncRNAs were identified in Drosophila (Table 3). The successful differentiation of female germline cells in the ovary requires the RNA product of the gene pgc [75]. Dosage compensation in males is also mediated by two RNA molecules: roX1 and roX2 [76]. Even the heat-shock stress response is regulated by non-protein-coding RNAs from the Hsrω gene [77]. After the sequencing of the Drosophila genome, the first systematic screenings of Drosophila lncRNAs detected 52 putative loci [80, 81]. Whole-genome tiling arrays have facilitated the detection of potential lncRNAs [82], although their validation requires further experimental confirmation. More recently, it has been estimated that around 5000 loci may encode non-protein-coding transcripts in Drosophila [83].
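The operational definition given above (longer than 200 nt, little or no protein-coding capacity) can be turned into a crude first-pass filter. The sketch below is only an illustration. The 200 nt threshold comes from the text, whereas the cutoff of 100 codons for the longest open reading frame is an assumption made here, not a value from the source. Only the forward strand is scanned, and the function names are hypothetical.

```python
# Toy sketch: flag transcripts that are long but lack a long open reading frame.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length, in codons, of the longest ATG-initiated ORF.

    Scans the three forward reading frames. The start codon is counted and
    the stop codon is not. An ORF still open at the end of the transcript
    is counted as well.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        current = 0  # codons in the currently open ORF; 0 means none is open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current == 0:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = 0
            else:
                current += 1
        best = max(best, current)
    return best


def is_candidate_lncrna(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Apply the length rule from the text plus an assumed ORF-length cutoff."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


if __name__ == "__main__":
    short_orf = "ATG" + "GCC" * 5 + "TAA" + "C" * 300   # 321 nt, 6-codon ORF
    long_orf = "ATG" + "GCC" * 150 + "TAA"              # 456 nt, 151-codon ORF
    print(is_candidate_lncrna(short_orf))  # True
    print(is_candidate_lncrna(long_orf))   # False
```

A filter of this kind only narrows the candidates. As the bxd example shows, deciding between a small protein and a regulatory RNA ultimately requires experimental evidence.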
However, the number of functional regulatory lncRNAs is still to be determined. In some cases, the detection of orthologous lncRNAs among drosophilids requires analysis of their secondary structures, in addition to their primary sequences. For instance, roX1 and roX2 sequences diverged so fast that their detection in the 12 Drosophila genomes was based on the conservation of structural features of their RNA products [28]. According to FlyBase [14], bxd, Hsrω and pgc are also conserved across drosophilids. However, both sequence and structural conservation are often restricted to a small part of the RNA molecule [74], making the detection of homologous lncRNAs difficult even between closely related species. Consequently, a comprehensive evolutionary analysis of Drosophila lncRNAs is still missing. It is expected that the sequencing of complete genomes from different populations of Drosophila will help us to understand the evolutionary origin of these enigmatic sequences.

FUTURE PROSPECTS

Comparative genomics has been particularly useful for the detection of non-protein-coding RNAs [84]. However, prediction of small regulatory RNAs is based on the structure of their precursors, owing to the small size and unstructured nature of the mature sequences. Recently, identification of novel mature small RNAs has proceeded almost exclusively by transcriptional profiling of small RNAs. The combination of deep sequencing and comparative genomics in Drosophila has permitted the identification and evolutionary analysis of miRNAs, but the study of piRNAs and endo-siRNAs raises additional issues. First, both piRNAs and endo-siRNAs are likely to vary with the transposable element content of the host genome, so that comparison between even closely related species is difficult. Second, siRNA and piRNA loci are often located in heterochromatic regions [57], which have been extensively studied in D. melanogaster, but not as much in other fly species.
The sequencing and assembly of heterochromatic DNA from the other 11 Drosophila genomes will create an opportunity to study the conservation of these RNAs within the Drosophila genus. The identification of lncRNAs is also a challenge, particularly as we do not know of any features shared by all lncRNAs. The comparative analysis of lncRNAs could be improved by using indirect strategies to identify homologues. For instance, the study of syntenic blocks has been very helpful in annotating orthologous transfer RNAs in Drosophila [85]. Similar approaches may be successfully applied to lncRNAs (and other non-protein-coding sequences).

Population genetics is particularly useful for studying the evolutionary dynamics of fast-evolving genes (e.g. [86]). The study of regulatory RNAs in populations has been mostly restricted to miRNAs [87, 88], although piRNAs have recently captured the attention of population geneticists [65]. With the development of deep sequencing, the characterization of entire genomes from hundreds of different populations is becoming a reality. Population genomics of non-coding RNAs shows particular promise for the near future.

Do any large classes of regulatory RNA remain unidentified? Are there genomic signatures that would allow us to detect non-coding RNAs without having transcriptional information? Do small RNAs have other, yet unknown, biological functions? These are some of the most important questions in the RNA biology field. The D. melanogaster genome and its close relatives will have a lot to say.

The sequencing and analysis of Drosophila genomes have had a big impact on the study of regulatory RNAs. The comparison of 12 Drosophila genomes revealed important aspects of the evolutionary dynamics of miRNA sequences. Comparative genomics analysis of regulatory RNAs other than miRNAs has been, so far, less successful.
Population genomics and heterochromatin sequencing in other Drosophila species are promising areas to investigate the nature of regulatory RNAs.

Acknowledgements

I thank Sam Griffiths-Jones and Maria Ninova for critical reading of the manuscript and two anonymous reviewers for constructive comments. I also thank Matthew Ronshaugen for extensive discussion on the history of non-protein coding RNAs and Casey Bergman for helpful insights on Drosophila transposable elements.

FUNDING

This work was supported by the Wellcome Trust (097820/Z/11/Z) and a grant from the Biotechnology and Biological Sciences Research Council (BB/G011346/1).

References

48. Ruby JG, Jan C, Player C, et al. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 2006;127:1193–207.

Antonio Marco. Regulatory RNAs in the light of Drosophila genomics, Briefings in Functional Genomics, 2012, 356-365, DOI: 10.1093/bfgp/els033