Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs

Genome Biology, Nov 2013

Background: Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers.

Results: We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5′ capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis.

Conclusions: We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.

Ana C Marques (0, 1), Jim Hughes (2), Bryony Graham (2), Monika S Kowalczyk (2, 3), Doug R Higgs (2), Chris P Ponting (0, 1)

0 Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK
1 MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK
2 MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, UK
3 Current address: The Broad Institute of MIT and Harvard, Cambridge MA 02142, Massachusetts, USA

Background

Eukaryotic genomes are pervasively transcribed [1,2] with evidence for up to three-quarters of nucleotides in the human genome being expressed in at least one cell type during development [2]. Transcripts lacking an apparent open reading frame are often classified simply based on their length, the absence of protein-coding potential and their location in the genome relative to protein-coding genes [3,4].

An intriguing class of noncoding transcripts are those exceeding 200 nucleotides in length and transcribed from loci that are intergenic relative to protein-coding genes (intergenic long noncoding RNAs (lncRNAs)). At least 50,000 lncRNAs are expressed from intergenic regions of the human genome, more than twice the number of protein-coding genes [5]. Compared to protein-coding transcripts, intergenic lncRNAs are generally less abundant and their expression is more spatially and temporally restricted [4,6]. Genome-wide analysis of mammalian intergenic lncRNA sequence [7,8] and transcription [9,10] has revealed that, in general, these loci have been conserved during evolution, albeit at substantially lower levels than protein-coding genes, suggesting that at least some intergenic lncRNAs may have conserved biological roles.
Biological functions attributed to the handful of well-characterized intergenic lncRNAs are diverse, ranging from transcriptional control to post-transcriptional modulation of gene expression (for recent reviews see [11-13]). In this study, for simplicity, we refer to intergenic lncRNAs as those that are transcribed by RNA polymerase II, 5′ end capped and polyadenylated.

Here we address two important, and incompletely answered, questions concerning the origins (transcriptional initiation regions (TIRs)) and classification of intergenic lncRNAs. First, what is the relative prevalence of promoter- and enhancer-associated transcripts within sets of transcripts that are annotated simply as being intergenic lncRNAs? Second, do differences in the chromatin status at intergenic lncRNA TIRs reflect their potential function?

Histone modifications allow the distinction between different types of regulatory elements [14,15]. Promoters of transcribed protein-coding genes, for example, are enriched in trimethylation of lysine 4 of histone H3 (H3K4me3) [14,15]. Some intergenic lncRNA loci have been defined previously using chromatin signatures that are similar to those often found at protein-coding genes, namely H3K4me3-marked promoters and trimethylation of lysine 36 of histone H3 (H3K36me3) across transcribed regions [16]. These findings demonstrate that some intergenic lncRNAs are transcribed from promoter-like elements. A second class of transcripts could be prevalent in current catalogues of intergenic lncRNAs, namely enhancer-associated noncoding RNAs (eRNAs) [17]. Transcription is a common feature of active mammalian enhancers and can give rise to both non-polyadenylated, bidirectional, unstable transcripts [17] as well as unidirectionally transcribed, polyadenylated, relatively stable and sometimes spliced eRNAs [18,19].
We have previously shown that activation of enhancers located within protein-coding genes promotes transcription of long noncoding RNAs that utilize splicing and polyadenylation signals from their protein-coding hosts to produce stable unidirectional eRNAs [20]. On the other hand, the expression of intergenic lncRNA loci has been associated with enhanced levels of their neighboring protein-coding genes, both through genome-wide [10,21,22] and locus-specific analyses [22,23], suggesting that a large, yet undetermined, fraction of transcripts within lncRNA catalogues are unidirectional eRNAs, as previously proposed by Natoli and Andrau [24].

These observations motivated us to expand on our earlier observations [10,20] to determine to what extent intergenic lncRNAs might originate from active intergenic enhancers. To address this question we generated new genome-wide maps of mono- and trimethylation of lysine 4 of histone H3 (H3K4me1 and H3K4me3, respectively), deep poly(A)+ RNA sequencing and nanoCAGE [25,26] data from purified mouse erythroblasts. Using these data, we annotated a stringent set of intergenic lncRNAs expressed in these cells and accurately defined their transcriptional start sites using these newly acquired nanoCAGE data. We used the relative abund (...truncated)
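The abstract states that intergenic lncRNAs were subclassified by the relative levels of H3K4 mono- and trimethylation at their transcriptional initiation regions. The sketch below illustrates one way such a ratio-based rule could look. It is not the authors' pipeline: the `TIR` record, the `classify_tir` function, the log2 cutoff of 0 and the pseudocount are all hypothetical choices made for illustration, and the signal values are invented toy numbers rather than data from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class TIR:
    """Hypothetical record for one transcriptional initiation region."""
    name: str
    h3k4me1: float  # assumed normalized H3K4me1 signal at the TIR
    h3k4me3: float  # assumed normalized H3K4me3 signal at the TIR


def classify_tir(tir: TIR, log2_cutoff: float = 0.0, pseudocount: float = 1.0) -> str:
    """Label a TIR by its log2(H3K4me1 / H3K4me3) ratio.

    The cutoff and pseudocount are illustrative assumptions, not values
    taken from the paper. Ratios above the cutoff are called
    enhancer-associated (elncRNA); all others promoter-associated (plncRNA).
    """
    ratio = math.log2((tir.h3k4me1 + pseudocount) / (tir.h3k4me3 + pseudocount))
    return "elncRNA" if ratio > log2_cutoff else "plncRNA"


# Toy input: invented signal values for two hypothetical loci.
tirs = [
    TIR("lnc_A", h3k4me1=40.0, h3k4me3=5.0),
    TIR("lnc_B", h3k4me1=3.0, h3k4me3=60.0),
]
labels = {t.name: classify_tir(t) for t in tirs}
print(labels)
```

With these toy values, the locus dominated by H3K4me1 is labelled enhancer-associated and the one dominated by H3K4me3 promoter-associated. A real analysis would need to justify the cutoff from the observed distribution of ratios rather than fix it in advance.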


This is a preview of a remote PDF: http://genomebiology.com/content/pdf/gb-2013-14-11-r131.pdf
Article home page: http://genomebiology.com/2013/14/11/R131

Ana C Marques, Jim Hughes, Bryony Graham, Monika S Kowalczyk, Doug R Higgs, Chris P Ponting. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biology 2013, 14:R131. DOI: 10.1186/gb-2013-14-11-r131