High-throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families

BMC Genomics

Gyorgy Szittya (2), Simon Moxon (1), Dulce M Santos (0), Runchun Jing (2), Manuel PS Fevereiro (0), Vincent Moulton (1), Tamas Dalmay (2)

(0) Laboratory of Plant Cell Biotechnology, ITQB/IBET - Apt 127, 2781-901 Oeiras, Portugal
(1) School of Computing Science, University of East Anglia, Norwich, NR4 7TJ, UK
(2) School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK

Background: High-throughput sequencing technology is capable of identifying novel short RNAs in plant species. We used Solexa sequencing to find new microRNAs in one of the model legume species, barrel medic (Medicago truncatula).

Results: 3,948,871 reads were obtained from two separate short RNA libraries generated from total RNA extracted from M. truncatula leaves, representing 1,563,959 distinct sequences. 2,168,937 reads, corresponding to 619,175 distinct sequences, were mapped to the available M. truncatula genome. 174,504 reads representing 25 conserved miRNA families showed perfect matches to known miRNAs. We also identified 26 novel miRNA candidates that were potentially generated from 32 loci. Nine of these loci produced eight distinct sequences, for which the miRNA* sequences were also sequenced. These sequences have not been described in other plant species, and accumulation of these eight novel miRNAs was confirmed by Northern blot analysis. Potential target genes were predicted for most conserved and novel miRNAs.

Conclusion: Deep sequencing of short RNAs from M. truncatula leaves identified eight new miRNAs, indicating that specific miRNAs exist in legume species.

Background

Gene expression is regulated at several layers in plants to ensure optimal temporal and spatial accumulation of proteins. One of the most recently discovered regulatory layers involves short RNA (sRNA) molecules 21-24 nucleotides in length that act post-transcriptionally [1].
There are surprisingly many different sRNAs in plant cells, indicating an extensive role for these molecules [2]. Plant sRNAs are produced from double stranded RNA (dsRNA) by one of the four Dicer-like proteins (DCL1-4). The different DCL proteins process dsRNAs generated by diverse pathways [1]. MicroRNAs (miRNAs) are produced from partially complementary dsRNA precursor molecules (pre-miRNA) [3]. Pre-miRNAs are originally single stranded RNAs with hairpin structures and are recognized by DCL1 [4]. The other large class of plant sRNAs is small interfering RNAs (siRNAs). siRNAs are processed from dsRNAs usually generated by one of the RNA Dependent RNA Polymerases (RDRs). Trans-acting siRNA (ta-siRNA) precursors are generated by RDR6 [5,6] and heterochromatin siRNA precursors are made by RDR2 [7]. Another group of siRNAs, the natural antisense siRNAs (nat-siRNAs), are processed from dsRNA produced by overlapping antisense mRNAs [8].

MiRNAs are the best characterized sRNAs in plants [9]. The primary transcript (pri-miRNA) is transcribed by RNA polymerase II and contains an imperfect hairpin structure. DCL1 trims this hairpin structure, producing the pre-miRNA, and then a second cleavage by DCL1 produces the miRNA/miRNA* duplex [10]. This duplex has a two nucleotide 3' overhang at each end and contains a few mismatches [9]. One of the strands of the miRNA/miRNA* duplex is integrated into RISC (RNA induced silencing complex). This strand is called the mature miRNA; the partially complementary miRNA* strand is degraded, although in most cases the miRNA* strand also accumulates at a lower level [9]. RISC finds specific mRNAs because the incorporated mature miRNA can anneal to partially complementary target sites [3]. Target sites show near perfect matches to plant miRNA sequences, and initially it was thought that all target mRNAs are cleaved by RISC [3]. Recently it was shown that the translation of plant mRNAs can also be suppressed without cleavage [11].
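To make the idea of a "near perfect match" between a miRNA and its target site concrete, the following minimal sketch counts the positions at which a candidate target site departs from the exact Watson-Crick complement of a miRNA. The sequences are invented toy examples, not real M. truncatula miRNAs or targets. The function names are hypothetical. G:U wobble pairs are simply counted as mismatches here, whereas real target-prediction rules are more nuanced.

```python
# Illustrative sketch only: toy sequences and hypothetical helper names.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, U, G, C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_mismatches(mirna, target_site):
    """Count positions where target_site differs from the perfect
    Watson-Crick target of mirna (G:U wobble is counted as a mismatch)."""
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have the same length")
    perfect_site = reverse_complement(mirna)
    return sum(1 for a, b in zip(perfect_site, target_site) if a != b)


# Toy 21 nt "miRNA" (made up for this example).
toy_mirna = "AUGGCAUCGAUUCGGAUACCA"
perfect_site = reverse_complement(toy_mirna)

# Introduce a single substitution at the first position of the site.
first = "A" if perfect_site[0] != "A" else "C"
near_perfect_site = first + perfect_site[1:]

print(count_mismatches(toy_mirna, perfect_site))
print(count_mismatches(toy_mirna, near_perfect_site))
```

A site with zero or very few mismatches under such a count would be the kind of candidate that the text describes as a near perfect match.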
Most plant miRNA families have been identified by the traditional Sanger sequencing method in model species with known genome sequences (Arabidopsis, rice and poplar), and most miRNAs are conserved across plant families [12]. However, some miRNAs are species/family specific, and Allen et al. [13] suggested that these "young" miRNAs have evolved recently, in contrast to the conserved miRNAs ("old" miRNAs). Since non-conserved miRNAs often accumulate at a lower level than conserved miRNAs, traditional small-scale sequencing primarily reveals conserved miRNAs. The establishment of high-throughput technologies has allowed the identification of several non-conserved or weakly expressed miRNAs through deep sequencing, e.g. in Arabidopsis, wheat and tomato [14-17]. Here we describe the deep sequencing of short RNAs extracted from M. truncatula leaves and the experimental validation of eight novel miRNAs.

Results

Deep sequencing of M. truncatula short RNAs

Two separate cDNA libraries of short RNAs were generated from Medicago truncatula leaves and both libraries were sequenced by Solexa (Illumina). The first PCR product was quantified by nanodrop and 5 pM was loaded onto the Solexa, which yielded 5942 clusters and 872,048 sequence reads. Because the capacity of the Solexa is higher than this yield, the second sample was also quantified on a polyacrylamide gel. Since the nanodrop gave an approximately 50 times higher concentration than quantification on a gel, we concluded that the nanodrop overestimates the concentration of the PCR product. To obtain more sequences, we loaded four times the amount that we would have loaded based only on the nanodrop reading, and this yielded 23301 clusters and 3,076,823 sequence reads. It is worth mentioning that loading based only on quantification on a gel would have led to overloading the Solexa and would have produced many, but unreliable, sequences.
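As a simple check on the read numbers, and as an illustration of how redundant reads are usually reduced to distinct sequences and tabulated by length, a minimal sketch is given below. The two run totals are taken from the text above; the short read list is invented toy data, and the helper names are hypothetical. This is a generic illustration and not the miRCat implementation.

```python
from collections import Counter

# Read totals for the two Solexa runs, as reported in the text.
RUN1_READS = 872_048
RUN2_READS = 3_076_823
total_reads = RUN1_READS + RUN2_READS


def collapse_reads(reads):
    """Collapse redundant reads into distinct sequences with their counts."""
    return Counter(reads)


def length_distribution(distinct_counts):
    """Sum read counts per sequence length, sorted by length."""
    by_length = Counter()
    for seq, n in distinct_counts.items():
        by_length[len(seq)] += n
    return dict(sorted(by_length.items()))


# Toy reads (made up): two copies of a 21 nt read and one 24 nt read.
toy_reads = [
    "AUGGCAUCGAUUCGGAUACCA",
    "AUGGCAUCGAUUCGGAUACCA",
    "GGCAUUCGAUACCGAUUAGCAUCG",
]
distinct = collapse_reads(toy_reads)

print(total_reads)
print(len(toy_reads), "reads,", len(distinct), "distinct sequences")
print(length_distribution(distinct))
```

The sum of the two run yields reproduces the 3,948,871 reads stated in the abstract; the gap between total reads and distinct sequences in a real library reflects how often each sequence was read.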
The two sets of reads were combined and analysed together (Table 1) using the miRCat pipeline that we developed earlier [18]. The size distribution of sequence reads showed that the 24 nt class was the most abundant group of sRNAs, followed by the 21 nt sequences and then the 23, 22 and 20 nt reads (Figure 1). The almost four million reads represented 1,563,959 distinct sequences, suggesting that the library is still not saturated. Of the almost four million reads, 2,168,937 matched the genome without any mismatches, representing 619,175 distinct sequences.

Conserved miRNAs

First we looked for known miRNAs by comparing our library to known miRNAs from other plant species. 174,504 reads corresponding to 25 conserved miRNA families showed perfect matches to known miRNAs. We analysed the number of reads for conserved miRNAs; miR156, miR159 and miR166 were represented most frequently in the library (Table 2). Allowing one or two mismatches between sequences in our library and sequences in miRBase increased the number of conserved miRNA families in M. truncatula to 31 (Table 2). Next we predicted target genes and putative targets were identified for 27 out (...truncated)