High-throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families
BMC Genomics
Gyorgy Szittya 2
Simon Moxon 1
Dulce M Santos 0
Runchun Jing 2
Manuel PS Fevereiro 0
Vincent Moulton 1
Tamas Dalmay 2
0 Laboratory of Plant Cell Biotechnology, ITQB/IBET - Apt 127 , 2781-901 Oeiras , Portugal
1 School of Computing Science, University of East Anglia , Norwich, NR4 7TJ , UK
2 School of Biological Sciences, University of East Anglia , Norwich, NR4 7TJ , UK
Background: High-throughput sequencing technology can identify novel short RNAs in plant species. We used Solexa sequencing to find new microRNAs in one of the model legume species, barrel medic (Medicago truncatula). Results: 3,948,871 reads were obtained from two separate short RNA libraries generated from total RNA extracted from M. truncatula leaves, representing 1,563,959 distinct sequences. 2,168,937 reads, corresponding to 619,175 distinct sequences, were mapped to the available M. truncatula genome. 174,504 reads representing 25 conserved miRNA families showed perfect matches to known miRNAs. We also identified 26 novel miRNA candidates that were potentially generated from 32 loci. Nine of these loci produced eight distinct sequences, for which the miRNA* sequences were also sequenced. These sequences have not been described in other plant species, and accumulation of these eight novel miRNAs was confirmed by Northern blot analysis. Potential target genes were predicted for most conserved and novel miRNAs. Conclusion: Deep sequencing of short RNAs from M. truncatula leaves identified eight new miRNAs, indicating that specific miRNAs exist in legume species.
Background
Gene expression is regulated at several layers in plants to
ensure optimal temporal and spatial accumulation of
proteins. One of the most recently discovered regulatory layers
involves short RNA (sRNA) molecules 21-24 nucleotides
in length that act post-transcriptionally [1]. There are
surprisingly many different sRNAs in plant cells, indicating an
extensive role for these molecules [2]. Plant sRNAs are
produced from double stranded RNA (dsRNA) by one of
the four Dicer-like proteins (DCL1-4). The different DCL
proteins process dsRNAs generated by diverse pathways
[1]. MicroRNAs (miRNAs) are produced from partially
complementary dsRNA precursor molecules (pre-miRNA)
[3]. Pre-miRNAs are originally single-stranded RNAs with
hairpin structures and are recognized by DCL1 [4]. The other
large class of plant sRNAs is small interfering RNAs
(siRNAs). siRNAs are processed from dsRNAs usually
generated by one of the RNA Dependent RNA Polymerases
(RDRs). Trans-acting siRNA (ta-siRNA) precursors are
generated by RDR6 [5,6] and heterochromatin siRNA
precursors are made by RDR2 [7]. Another group of siRNAs,
the natural antisense siRNAs (nat-siRNAs), are processed
from dsRNA produced by overlapping antisense mRNAs
[8].
MiRNAs are the best characterized sRNAs in plants [9].
The primary transcript (pri-miRNA) is transcribed by RNA
polymerase II and contains an imperfect hairpin structure.
DCL1 trims this hairpin structure, producing the
pre-miRNA, and then a second cleavage by DCL1 produces the
miRNA/miRNA* duplex [10]. This duplex has a
two-nucleotide 3' overhang at each end and
contains a few mismatches [9]. One of the strands of the
miRNA/miRNA* duplex is integrated into RISC (RNA
induced silencing complex). This strand is called the mature
miRNA; the partially complementary miRNA* strand
is degraded, although in most cases the miRNA* strand
also accumulates at a lower level [9]. RISC finds specific
mRNAs because the incorporated mature miRNA can
anneal to partially complementary target sites [3]. Target
sites show near perfect matches to plant miRNA sequences
and initially it was thought that all target mRNAs are
cleaved by RISC [3]. Recently it was shown that the
translation of plant mRNAs can also be suppressed without
cleavage [11].
Most plant miRNA families have been identified by the
traditional Sanger sequencing method in model species with
known genome sequences (Arabidopsis, rice and poplar),
and most miRNAs are conserved across plant families
[12]. However, some miRNAs are species/family specific
and Allen et al. [13] suggested that these "young" miRNAs
have evolved recently, in contrast to the conserved
miRNAs ("old" miRNAs). Since non-conserved miRNAs
often accumulate at a lower level than conserved
miRNAs, traditional small-scale sequencing primarily reveals
conserved miRNAs. Establishment of high-throughput
technologies has allowed the identification of several
non-conserved or lowly expressed miRNAs through deep
sequencing, e.g. in Arabidopsis, wheat and tomato
[14-17]. Here we describe the deep sequencing of short RNAs
extracted from M. truncatula leaves and the experimental
validation of eight novel miRNAs.
Results
Deep sequencing of M. truncatula short RNAs
Two separate cDNA libraries of short RNAs were
generated from Medicago truncatula leaves and both libraries
were sequenced by Solexa (Illumina). The first PCR
product was quantified by nanodrop and 5 pM was loaded onto
the Solexa, which yielded 5942 clusters and 872,048
sequence reads. Because the capacity of Solexa is higher
than this, the second sample was additionally quantified on
a polyacrylamide gel. Since the nanodrop gave an
approximately 50 times higher concentration than quantification
on the gel, we concluded that the nanodrop overestimates
the concentration of the PCR product. To obtain more
sequences, we loaded four times the amount that we
would have loaded based only on the nanodrop reading,
and this yielded 23301 clusters and 3,076,823 sequence
reads. It is worth mentioning that loading based only on
the gel quantification would have overloaded the
Solexa and would have produced many but unreliable
sequences.
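The effect of the loading adjustment can be checked against the reported yields. The short Python sketch below uses only the cluster and read counts given above; the variable names are illustrative, and it assumes that the first run corresponds to the nanodrop-based load and the second run to four times that amount.

```python
# Yields reported in the text for the two sequencing runs.
run1 = {"clusters": 5942, "reads": 872_048}      # assumed: nanodrop-based load
run2 = {"clusters": 23301, "reads": 3_076_823}   # four times the nanodrop-based load

LOAD_FACTOR = 4  # input of run 2 relative to run 1, as stated in the text

# Fold change in yield between the two runs.
cluster_ratio = run2["clusters"] / run1["clusters"]
read_ratio = run2["reads"] / run1["reads"]

print(f"clusters increased {cluster_ratio:.2f}-fold for a {LOAD_FACTOR}-fold load")
print(f"reads increased {read_ratio:.2f}-fold for a {LOAD_FACTOR}-fold load")
```

Both ratios come out somewhat below four, so the yield scaled roughly in proportion to the amount loaded.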
The two sets of reads were combined and analysed
together (Table 1) using the miRCat pipeline that we
developed earlier [18]. The size distribution of sequence
reads showed that the 24 nt class was the most abundant
group of sRNAs followed by the 21 nt sequences and then
the 23, 22 and 20 nt reads (Figure 1). The almost four
million reads represented 1,563,959 distinct sequences,
suggesting that the library is still not saturated. Of the
almost four million reads, 2,168,937 matched the genome
without any mismatches, representing 619,175 distinct
sequences.
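As a rough illustration of how these summary figures relate to one another, the Python sketch below collapses a list of reads into distinct sequences, tabulates a read-weighted size distribution, and recomputes the totals from the counts quoted in the text. The toy reads and the function names `collapse_reads` and `size_distribution` are hypothetical; this is not the miRCat implementation.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse redundant reads into distinct sequences with their read counts."""
    return Counter(reads)


def size_distribution(counts):
    """Sum read counts per sequence length (abundance-weighted)."""
    dist = Counter()
    for seq, n in counts.items():
        dist[len(seq)] += n
    return dist


# Toy input (hypothetical, not real library data): two 21 nt and four 24 nt reads.
toy_reads = ["A" * 21] * 2 + ["C" * 24] * 3 + ["G" * 24]
toy_counts = collapse_reads(toy_reads)
toy_sizes = size_distribution(toy_counts)
print(len(toy_counts), "distinct sequences from", sum(toy_counts.values()), "reads")
print(dict(toy_sizes))

# Counts quoted in the text.
LIBRARY_1_READS = 872_048
LIBRARY_2_READS = 3_076_823
MAPPED_READS = 2_168_937

total_reads = LIBRARY_1_READS + LIBRARY_2_READS
mapped_fraction = MAPPED_READS / total_reads
print(f"total reads: {total_reads:,}")
print(f"fraction of reads matching the genome perfectly: {mapped_fraction:.3f}")
```

The two library totals sum to the 3,948,871 reads reported for the combined data set, of which a little over half matched the genome without mismatches.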
Conserved miRNAs
First we looked for known miRNAs by comparing our
library to known miRNAs from other plant species.
174,504 reads corresponding to 25 conserved miRNA
families showed perfect matches to known miRNAs. Among
the conserved miRNAs, miR156, miR159 and miR166
were represented most frequently
in the library (Table 2). Allowing one or two mismatches
between sequences in our library and sequences in
miRBase increased the number of conserved miRNA families
in M. truncatula to 31 (Table 2). Next we predicted target
genes and putative targets were identified for 27 out (...truncated)