Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs
Marques et al. Genome Biology
Ana C Marques 0 1
Jim Hughes 2
Bryony Graham 2
Monika S Kowalczyk 2 3
Doug R Higgs 2
Chris P Ponting 0 1
0 Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK
1 MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK
2 MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, UK
3 Current address: The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Background: Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers.
Results: We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5′ capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis.
Conclusions: We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.
Background
Eukaryotic genomes are pervasively transcribed [1,2] with
evidence for up to three-quarters of nucleotides in the
human genome being expressed in at least one cell type
during development [2]. Transcripts lacking an apparent
open reading frame are often classified simply based on
their length, the absence of protein-coding potential and
their location in the genome relative to protein-coding
genes [3,4]. An intriguing class of noncoding transcripts
comprises those exceeding 200 nucleotides in length and transcribed
from loci that are intergenic relative to protein-coding
genes (intergenic long noncoding RNAs (lncRNAs)). At
least 50,000 lncRNAs are expressed from intergenic regions
of the human genome, more than twice the number of
protein-coding genes [5]. Compared to protein-coding
transcripts, intergenic lncRNAs are generally less abundant
and their expression is more spatially and temporally
restricted [4,6]. Genome-wide analysis of mammalian
intergenic lncRNA sequence [7,8] and transcription [9,10]
has revealed that, in general, these loci have been conserved
during evolution, albeit at substantially lower levels
than protein-coding genes, suggesting that at least some
intergenic lncRNAs may have conserved biological roles.
Biological functions attributed to the handful of
well-characterized intergenic lncRNAs are diverse, ranging from
transcriptional control to post-transcriptional modulation
of gene expression (for recent reviews see [11-13]).
In this study, for simplicity, we refer to intergenic
lncRNAs as those that are transcribed by RNA polymerase
II, 5′ end capped and polyadenylated. Here we address
two important, and incompletely answered, questions
concerning the origins (transcriptional initiation regions
(TIRs)) and classification of intergenic lncRNAs. First,
what is the relative prevalence of promoter- and
enhancer-associated transcripts within sets of transcripts that are
annotated simply as being intergenic lncRNAs? Second,
do differences in the chromatin status at intergenic lncRNA
TIRs reflect their potential function?
Histone modifications allow the distinction between
different types of regulatory elements [14,15]. Promoters
of transcribed protein-coding genes, for example, are
enriched in trimethylation of lysine 4 of histone H3
(H3K4me3) [14,15]. Some intergenic lncRNA loci have
been defined previously using chromatin signatures that
are similar to those often found at protein-coding genes,
namely H3K4me3 marked promoters and trimethylation
of lysine 36 of histone H3 (H3K36me3) across transcribed
regions [16]. These findings demonstrate that some
intergenic lncRNAs are transcribed from promoter-like
elements.
A second class of transcripts could be prevalent
in current catalogues of intergenic lncRNAs, namely
enhancer-associated noncoding RNAs (eRNAs) [17].
Transcription is a common feature of active mammalian
enhancers and can give rise both to non-polyadenylated,
bidirectional, unstable transcripts [17] and to unidirectionally
transcribed, polyadenylated, relatively stable and
sometimes spliced eRNAs [18,19]. We have previously
shown that activation of enhancers located within
protein-coding genes promotes transcription of long noncoding
RNAs that utilize splicing and polyadenylation signals from
their protein-coding hosts to produce stable unidirectional
eRNAs [20]. On the other hand, the expression of
intergenic lncRNA loci has been associated with enhanced levels
of their neighboring protein-coding genes, both through
genome-wide [10,21,22] and locus-specific analyses [22,23],
suggesting that a large, yet undetermined, fraction of
transcripts within lncRNA catalogues are unidirectional
eRNAs, as previously proposed by Natoli and Andrau
[24]. These observations motivated us to expand on our
earlier observations [10,20] to determine to what extent
intergenic lncRNAs might originate from active intergenic
enhancers.
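The distinction at issue here, between enhancer-associated and promoter-associated initiation regions, rests on the relative levels of H3K4me1 and H3K4me3 at each TIR. A minimal sketch of such a comparison is given below. It is an illustration under stated assumptions, not the procedure used in this study: the function name, the pseudocount, the log2 cutoff of zero, the tie-breaking rule and the example read counts are all hypothetical.

```python
import math


def classify_tir(h3k4me1, h3k4me3, pseudocount=1.0, log2_cutoff=0.0):
    """Label a transcriptional initiation region (TIR) by its chromatin mark ratio.

    h3k4me1, h3k4me3: normalized signal (for example, read counts) at the TIR.
    pseudocount and log2_cutoff are illustrative assumptions, not values
    taken from the paper.
    """
    ratio = math.log2((h3k4me1 + pseudocount) / (h3k4me3 + pseudocount))
    # Monomethylation dominant -> enhancer-like; otherwise promoter-like.
    # A ratio exactly at the cutoff is assigned to plncRNA here, arbitrarily.
    return "elncRNA" if ratio > log2_cutoff else "plncRNA"


# Hypothetical signal values for two TIRs: (H3K4me1, H3K4me3).
tirs = {"tir_a": (120, 15), "tir_b": (10, 300)}
labels = {name: classify_tir(me1, me3) for name, (me1, me3) in tirs.items()}
```

In practice the cutoff would have to be calibrated against reference sets of known promoters and enhancers rather than fixed at zero.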
To address this question we generated new genome-wide
maps of mono- and trimethylation of lysine 4 of
histone H3 (H3K4me1 and H3K4me3, respectively), deep
poly(A)+ RNA sequencing and nanoCAGE [25,26] data
from purified mouse erythroblasts. Using these data, we
annotated a stringent set of intergenic lncRNAs expressed
in these cells and accurately defined their transcriptional
start sites from the nanoCAGE data. We
used the relative abund (...truncated)