Evolution of an X-Linked Primate-Specific Micro RNA Cluster

Molecular Biology and Evolution, Mar 2010

Micro RNAs (miRNAs) are a class of small regulatory RNAs that posttranscriptionally repress protein production from their target messenger RNAs (mRNAs). Accumulating evidence suggests that lineage-specific miRNAs have contributed to lineage-specific characteristics. However, the birth and death of these miRNAs, particularly in primates, remain largely unexplored. Here we characterize the evolutionary history of a newly discovered miRNA cluster on the primate X chromosome, spanning a ∼33-kb region in human Xq27.3. The cluster consists of six distinct miRNAs, four of which are compactly organized in a 3-kb region and belong to a phylogenetic group distinct from the other two miRNAs. By comparing the genomic structure of this cluster in human with four other primates (chimpanzee, orangutan, rhesus macaque, and marmoset), we identified several previously uncharacterized miRNAs in these primates that share orthology with the human miRNAs. We found that the entire miRNA cluster was well conserved among primate species but unidentifiable in other mammalian species (including mouse, rat, cat, dog, horse, cow, opossum, and platypus), suggesting that the cluster formed after the primate–rodent split but before the emergence of New World monkeys (represented by marmoset). Our analysis further revealed complex evolutionary dynamics at this locus, characterized by extensive duplication events. Phylogenetic analysis revealed birth and death of the miRNAs within this region, accompanied by rapid evolution, which highlights their functional importance. These miRNAs are primarily expressed in the primate epididymis, part of the male reproductive system. Our analysis showed that their predicted target mRNAs are significantly enriched for several functional classes relevant to epididymal physiology, such as morphogenesis of epithelium and tube development. Furthermore, several genes controlling sperm maturation and male fertility are confidently predicted to be their targets.
Collectively, we argue these miRNAs might play an important role in epididymal morphogenesis and sperm maturation and in establishing primate-specific epididymal characteristics.

https://mbe.oxfordjournals.org/content/27/3/671.full.pdf


Jingjing Li (0, 1, 2), Yu Liu (0, 1, 2), Dong Dong (0, 1), Zhaolei Zhang (0, 1, 2)

0 Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada
1 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
2 Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada

Introduction

MicroRNAs (miRNAs) are a class of small noncoding RNAs that have important regulatory roles in animals and plants (Bartel 2004, 2009; Bartel and Chen 2004). Derived from hairpin-structured precursor sequences, miRNAs are matured by Dicer-mediated cleavage into a single-stranded form ∼22 nt in length (Bartel and Chen 2004; Bartel 2009). In animals, mature miRNAs typically repress protein production of their target genes through partially complementary binding to specific sequences in the 3′ untranslated regions of the targets. This process is typically mediated by the miRNA seed region, between the second and eighth nucleotides from the 5′ end of the mature miRNA sequence (Lewis et al. 2003, 2005; Bartel 2004, 2009; Bartel and Chen 2004; Krek et al. 2005; Kertesz et al. 2007). To date, many human genes of diverse functions are known to be under miRNA regulation (Krek et al. 2005; Lewis et al. 2005; Kertesz et al. 2007; Baek et al. 2008; Selbach et al. 2008). Dysregulation of miRNAs, and thus of their targets, has been implicated in human diseases (Calin et al. 2004; Calin and Croce 2006; Zhao et al. 2007). In contrast to the intense research on miRNA target identification, the emergence, evolution, and degeneration of miRNAs themselves, as a class of regulators, have remained largely uncharacterized in animals.
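To make the seed definition above concrete, the following minimal Python sketch extracts nucleotides 2-8 of a mature miRNA and scans a 3′ UTR for perfect matches to the reverse complement of that seed. The function names and the example sequences are hypothetical and chosen only for illustration; real target predictors such as TargetScanS apply additional criteria that are not modeled here.

```python
def seed(mature: str) -> str:
    """Return nucleotides 2-8 (1-based, counted from the 5' end) of a mature miRNA."""
    rna = mature.upper().replace("T", "U")
    return rna[1:8]


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def seed_match_sites(mature: str, utr: str) -> list[int]:
    """0-based start offsets in `utr` of perfect 7-mer matches to the seed."""
    site = reverse_complement_rna(seed(mature))
    target = utr.upper().replace("T", "U")
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i : i + len(site)] == site
    ]


# Hypothetical sequences, not taken from miRBase or any real transcript.
example_mirna = "UAGCAUCGGAUCCAUGCAAGUC"
example_utr = "AAACGAUGCUAAA"
print(seed(example_mirna), seed_match_sites(example_mirna, example_utr))
```

Treating a site as a simple string match keeps the sketch short; it ignores G:U wobble pairing and any flanking-context scoring.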
Recent work based on parallel sequencing has revealed that the emergence of miRNAs and their associated small RNA processing machinery can be traced back to sponges, suggesting an early origin of miRNAs in the animal phyla (Grimson et al. 2008). Furthermore, Heimberg et al. (2008) showed that expansion of the miRNA repertoire is correlated with increased vertebrate morphological complexity, suggesting that miRNAs contribute significantly to lineage-specific characteristics. As many miRNAs with lineage-specific characteristics were recently identified either by sequencing or by computational prediction from fully sequenced genomes, further exploration of miRNA evolution using comparative genomic approaches is becoming feasible. By comparing several Drosophila species, Lu, Fu, et al. (2008) and Lu, Shen, et al. (2008) have shown a high birth-and-death rate of miRNAs in the fly lineage, which presumably have undergone adaptive evolution. Beyond Drosophila, the evolution of an imprinted miRNA cluster specifically present in placental mammals has been investigated (Glazov et al. 2008), whereas Zhang et al. (2007, 2008) studied the functional diversification and rapid evolution of two miRNA clusters in the primate lineage. Drawing on ∼250 recently sequenced small RNA libraries covering 26 different organs and cell types of human, mouse, and rat (Landgraf et al. 2007), here we characterize a novel X-linked miRNA cluster in human encompassing miRNAs with primate-specific characteristics.

FIG. 1. Phylogenetic relationship and genomic structure of the miRNA cluster on human X-chromosome. The scale at the bottom is in units of nucleotide substitution per site.

The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
Using a comparative genomics approach, we identified several new miRNAs in other primate species (chimpanzee, orangutan, rhesus, and marmoset), which have not been characterized in miRBase (Griffiths-Jones et al. 2008). Our sequence analysis showed that this miRNA cluster can be phylogenetically dissected into two separate regions, each shaped by different evolutionary events. Our evolutionary study further revealed complex evolutionary dynamics on this miRNA cluster, characterized by extensive segmental duplications and rapid evolution on the miRNA loci. Functional analysis, combined with their predominant expression in human epididymis, suggests that these miRNAs might have been highly adapted in establishing primate-specific epididymal physiology.

Materials and Methods

Data Compilation

Available mammalian genomes and annotations were downloaded from the University of California Santa Cruz (UCSC) Genome Browser (Kent 2002; Kent et al. 2002; Kuhn et al. 2009). Precursor, mature, and seed regions of the six human miRNA sequences, with their genomic coordinates, were retrieved from miRBase (Griffiths-Jones et al. 2008). Clone counts for the six miRNAs across diverse tissue and cell types were taken from a previously published study (Landgraf et al. 2007).

Orthology Detection in Primate Genomes

As the six human miRNAs share sequence similarity, directly searching closely related organisms using individual human miRNAs as queries (typically ∼80 nt in length) may yield multiple hits. To give unambiguous orthology assignments, we divided the entire cluster into two subregions, one encompassing hsa-mir-890/888/892a/892b (spanning ∼3 kb) and the other covering the remaining part, encompassing hsa-mir-891b/891a (see fig. 1). When performing a Blast search of the ∼3-kb region against the other four primates, we found that its syntenic structure is well conserved, with >71% of the region alignable and with sequence identity >88%.
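The two summary statistics quoted above (fraction of the query region that is alignable, and sequence identity over the aligned portion) can be computed from tabular Blast output roughly as follows. This is a minimal sketch, not the authors' procedure: it assumes the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the function name and example hits are hypothetical.

```python
def summarize_hits(lines, query_length):
    """Return (alignable fraction of the query, length-weighted identity).

    `lines` are rows of 12-column tabular Blast output; overlapping hits on
    the query are merged before the covered length is counted.
    """
    intervals = []
    matched = 0.0  # estimated number of identical positions across all hits
    aligned = 0    # total alignment length across all hits
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pident, length = float(fields[2]), int(fields[3])
        qstart, qend = int(fields[6]), int(fields[7])
        intervals.append((min(qstart, qend), max(qstart, qend)))
        matched += pident / 100.0 * length
        aligned += length

    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1

    coverage = covered / query_length
    identity = matched / aligned if aligned else 0.0
    return coverage, identity


# Hypothetical hits for a 3,000-nt query; values are illustrative only.
example_hits = [
    "query\tsubject\t90.0\t1000\t100\t0\t1\t1000\t5001\t6000\t0.0\t1500",
    "query\tsubject\t80.0\t1000\t200\t0\t501\t1500\t7001\t8000\t0.0\t1200",
]
print(summarize_hits(example_hits, 3000))
```

Merging overlapping query intervals avoids counting the same stretch twice when several hits cover it, which matters for duplicated regions such as this cluster.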
We subsequently confined our search for orthologs of hsa-mir-890/888/892a/892b to these syntenic regions, using the gene order of the four miRNAs in humans as a reference. Similar procedures were also applied to hsa-mir-891b/891a (see fig. 8). The mapped synteny of the entire miRNA cluster encompassing the six orthologous miRNAs is shown in supplementary figure S1, Supplementary Material online.

FIG. 2. Sequence alignment of the six miRNAs in the X-linked cluster. Clearly the miRNAs can be separated into two groups based on sequence similarity.

Sequence Analysis

Sequence alignments were performed using ClustalW (Thompson et al. 1994) as implemented in BioEdit (Hall 1999). Sequence searches were based on Blast (Altschul et al. 1990) and BLAT (Kent 2002) using default parameters. Specifically, we used BLAT when searching primate and rodent genomes and then used Blast for confirmation, because BLAT is optimized to detect highly similar sequences very efficiently (Kent 2002). However, for other mammalian genomes that are more divergent from the primates, for example, horse, we used only Blast (run as both BlastN and discontiguous megaBlast, which is designed for detecting remote homologs in cross-species comparisons) because BLAT is not well suited to detecting divergent or short sequences. All sequence searches and alignments were followed by manual inspection and correction. The phylogenetic gene trees were constructed using PAUP (Wilgenbusch and Swofford 2003) with a branch-and-bound search algorithm (Hendy and Penny 1982) and visualized with TreeView (Page 1996, 2002). Mauve (Darling et al. 2004) was used to detect sequence duplication and rearrangement events between the cluster region and its immediate 3′ downstream region. RNA secondary structures were predicted using RNAfold with default settings (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) and were also confirmed using RNAshape (Steffen et al. 2006).
miRNA Target Prediction and Test for Function Enrichment

Predicted miRNA targets were compiled using TargetScanS (Lewis et al. 2003, 2005; Friedman et al. 2009) (http://www.targetscan.org/). Because the six miRNAs are not fully conserved, the predicted targets are not necessarily conserved either; therefore, we did not use conservation criteria to filter the predicted targets. The test of functional enrichment was performed with DAVID (April, 2009) (http://david.abcc.ncifcrf.gov/home.jsp) (Huang da et al. 2009), and the enrichment test was run on Biological Process terms at all levels of the Gene Ontology hierarchy (Barrell et al. 2009).

Results

Genomic Structure of an X-Linked miRNA Cluster

A recent sequencing study based on small RNA libraries from human, mouse, and rat uncovered several previously unknown miRNAs (Landgraf et al. 2007), among which we focused our analysis on six novel miRNAs physically clustered on the minus strand of the human X chromosome (Xq27.3). These miRNAs are hsa-mir-890, hsa-mir-888, hsa-mir-892a, hsa-mir-892b, hsa-mir-891b, and hsa-mir-891a, and altogether they span a ∼33-kb region. Their relative positions on the human X chromosome are shown in figure 1. The first four miRNA genes (hsa-mir-890, hsa-mir-888, hsa-mir-892a, and hsa-mir-892b) are tightly clustered within a 3-kb genomic region. This, together with the high sequence similarity among them (fig. 2), suggests that they resulted from tandem duplication events. Located 5′ to this cluster (see fig. 1), hsa-mir-891b is ∼3.8 kb away from hsa-mir-892b, whereas hsa-mir-891a lies farther away, approximately 26 kb 5′ upstream of hsa-mir-891b. mir-891b and mir-891a share sequence similarity with each other (shown in fig. 2) but not with mir-890/888/892a/892b.
The lack of sequence similarity suggests that mir-891b and mir-891a are unlikely to be derived from mir-890/888/892a/892b and that different evolutionary events physically positioned them at this miRNA-rich locus within the same transcription unit (Landgraf et al. 2007). We used sequence homology search combined with synteny mapping (see Materials and Methods) to unambiguously identify the orthologous regions of this miRNA cluster in related primates: human (Homo sapiens, hg18), chimpanzee (Pan troglodytes, panTro2), orangutan (Pongo pygmaeus abelii, ponAbe2), rhesus macaque (Macaca mulatta, rheMac2), and marmoset (Callithrix jacchus, calJac1). The phylogenetic relationship of these species is shown in figure 1 (the upper panel) (Steiper and Young 2006). As the two subclusters encompassing mir-890/888/892a/892b and mir-891b/891a belong to two distinct phylogenetic classes (fig. 1), we performed synteny mapping separately for these two subclusters. Briefly, for the subcluster of mir-890/888/892a/892b, we searched the ∼3-kb region (fig. 1) against the genomes of chimpanzee, orangutan, rhesus macaque, and marmoset and found this region to be well conserved across the five primates, with >71% of the region alignable and with sequence identity >88%. The subsequent sequence homology search for these miRNAs was then confined to these syntenic regions. Similar procedures were also performed to search for the orthologs of hsa-mir-891b/891a. The mapped syntenic regions of the entire miRNA cluster are shown in supplementary figure S1, Supplementary Material online, in which the identified orthologous miRNA genes in each species are preserved in exactly the same order as in the human genome.

Table 1. Annotation status of the miRNAs in human, chimpanzee, orangutan, macaque, and marmoset: present in miRBase, or absent from miRBase and newly identified.
We noted that among the identified orthologous miRNAs in other primates, many had not been previously characterized (see table 1 for their current annotation status in miRBase, August 2009, and supplementary fig. S1, Supplementary Material online, for their genome coordinates). Moreover, we found that the physical distances between neighboring miRNAs were also conserved across the primates, as shown in figure 3A. For example, the distance between mir-892a and mir-892b is constant across all five species at exactly 455 bp, suggesting strong selective pressure on this locus to maintain genomic integrity. Because the region between mir-891b and mir-891a spans ∼26 kb, it is expected to have a higher chance of accumulating uneven indels in different species. In contrast with the indels in other primates (chimpanzee, orangutan, and macaque), the genomic region between mir-891b and mir-891a in marmoset has been substantially expanded by extensive duplication and insertion events, leading to a ∼1.5-fold increase in the physical distance between these two miRNA genes (compared with human). Further, with the identified orthologs of the six human miRNAs across the primate lineage (we excluded mir-888 of chimpanzee because the orthologous locus was marked as Ns in the current assembly), phylogenetic analysis (see fig. 1) revealed that most of these miRNA genes had diverged before the speciation of the five primate species, and thus, they have evolved independently since the split of the New World monkeys and the Old World monkeys (ca. 43 Ma). mir-891a/891b in marmoset show a different pattern in the phylogenetic tree (fig. 1), possibly due to the relatively short sequences (the precursor miRNA sequences are usually ∼77 bp) used in constructing the tree. Given the emergence time at ∼43 Ma (the upper bound), we next sought to pinpoint the lower bound of the emergence time for the entire ∼33-kb miRNA cluster region.
We found the overall genomic structure of this entire miRNA cluster to be well conserved across human, chimpanzee, orangutan, and rhesus macaque, discernible in marmoset, but unidentifiable in nonprimate mammals, including mouse, rat, dog, horse, cow, opossum, and platypus. We used BLAT when searching the closely related primate genomes (Kent 2002) and Blast for the nonprimate mammals. Note that the genome assemblies of these nonprimate species are of sufficiently high quality, so it is unlikely that this miRNA cluster was missed due to insufficient sequencing coverage. This finding, in line with the involvement of several primate-specific genomic rearrangement events in shaping the cluster structure (described below), suggests that the entire miRNA cluster region emerged after the rodent-primate split (∼75 Ma) but before the split of the New World monkeys and the Old World monkeys (∼43 Ma) (Steiper and Young 2006). Lending additional support, further inspection revealed that the level of sequence conservation at single-nucleotide resolution was rather low, interspersed with high conservation in some small fragmented regions. This trend is demonstrated in figure 3B, in which sequence conservation was quantified as UCSC phastCons scores derived from a 17-way vertebrate genome comparison (Siepel et al. 2005).

Functional Divergence between Homologous miRNAs

Having investigated the conservation of the genomic structure of the entire miRNA cluster across primate species, we next studied the possible functional diversification of the individual miRNAs. Because miRNAs mainly function through the seed region of their mature sequence (positions 2-8 from the 5′ end) to recognize their target messenger RNAs (mRNAs), we used the pattern of nucleotide substitutions in the mature and seed regions to gauge their degree of functional diversification (Lewis et al. 2003, 2005; Bartel 2004, 2009; Doench and Sharp 2004).
Figure 4 shows the nucleotide substitutions for each precursor miRNA sequence in the five primates: the mature miRNA regions are shown inside red windows, and the seed regions (positions 2-8) are delimited by blue vertical lines; all sequences are based on the miRBase annotation for the human miRNAs. Although figure 2 shows that mir-890/888/892a/892b in human share substantial sequence similarity over the entire sequence length, figure 4 shows that the mature sequence and the seed region have diverged significantly among these miRNAs. By comparing the same miRNA gene across different primate species, it is clear that nucleotide substitutions occur frequently in regions outside of the mature sequence but are substantially reduced in mature sequences and are rare in seed regions, indicating strong purifying selection on the functional regions. This is consistent with miRNA biogenesis, in which precursor miRNAs are usually cleaved into a single-stranded mature form that modulates gene regulation through seed-mediated base pairing (Bartel and Chen 2004). It is also interesting to note that, for each of the six miRNAs, the seed regions are identical in human, chimpanzee, orangutan, and macaque but not in marmoset. This indicates that purifying selection is strongest on the seeds, implying that the molecular functions of these miRNAs are likely to be conserved across these species, except for marmoset. Indeed, although each miRNA has some species-specific nucleotide substitutions, marmoset seems to have accumulated many more sequence changes than the other primates, especially in the seed regions, revealing functional divergence between marmoset and the other primates at the loci of mir-888, mir-892a, mir-891b, and mir-891a. Lending additional support, close examination of sequence divergence between marmoset and the other primates further revealed that, in marmoset, nucleotide substitutions in nonmature regions might also have affected miRNA maturation and viability.
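The comparison of substitution frequency across flanking, mature, and seed regions described above can be sketched as a simple count of variable alignment columns per region. This is an illustrative simplification rather than the authors' method: the function names, the half-open column ranges, and the toy sequences are all hypothetical, and gaps are treated as an ordinary character.

```python
def variable_columns(aligned_seqs):
    """Flag each alignment column in which the sequences are not all identical."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    return [len({s[i] for s in aligned_seqs}) > 1 for i in range(length)]


def substitution_density(aligned_seqs, mature, seed):
    """Fraction of variable columns in the flank, non-seed mature, and seed regions.

    `mature` and `seed` are (start, end) half-open, 0-based column ranges,
    with the seed lying inside the mature region.
    """
    variable = variable_columns(aligned_seqs)

    def density(columns):
        columns = list(columns)
        if not columns:
            return 0.0
        return sum(variable[i] for i in columns) / len(columns)

    seed_columns = range(*seed)
    mature_nonseed = [i for i in range(*mature) if i not in seed_columns]
    flank = [i for i in range(len(variable)) if not mature[0] <= i < mature[1]]
    return {
        "flank": density(flank),
        "mature_nonseed": density(mature_nonseed),
        "seed": density(seed_columns),
    }


# Toy alignment (hypothetical): three 20-column sequences, one of them variant.
reference = "A" * 5 + "G" * 10 + "C" * 5
variant = list(reference)
variant[1], variant[14], variant[18] = "T", "T", "A"
toy_alignment = [reference, "".join(variant), reference]
print(substitution_density(toy_alignment, mature=(5, 15), seed=(6, 13)))
```

A pattern of flank > mature > seed densities in such a tabulation is what the text interprets as purifying selection on the functional regions.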
As shown in figure 5A, we observed a 7-bp deletion in mir-888 in marmoset, which disrupts its predicted hairpin structure. Meanwhile, extensive nucleotide substitutions in mir-891b and mir-892b in marmoset also disrupted their hairpin structures (fig. 5B and C), which substantially reduced their thermodynamic stability. For the other miRNAs (mir-890, mir-891a, and mir-892a), extensive nucleotide substitutions were not predicted to affect their hairpin structures in marmoset. Therefore, we can infer that the functional forms of mir-890, mir-891a, and mir-892a were likely present in the common ancestor of all five primates. Meanwhile, it is likely that the orthologous sequences of mir-888, mir-891b, and mir-892b have undergone miRNA pseudogenization in marmoset, which resulted in disrupted hairpin structures. Alternatively, it is also possible that these sequences preserved their rudimentary forms in marmoset but independently gained miRNA functionality in the other four catarrhine primates. Similarly, when searching for the six individual miRNA genes in the genomes of mouse, rat, dog, horse, cow, opossum, and platypus, only hsa-mir-892a had two significant hits, both in dog, with sequence similarity around 51%. Importantly, both putative orthologs have accumulated substantial nucleotide substitutions in their putative seed regions (see supplementary fig. S2, Supplementary Material online), suggesting that they recognize and regulate different sets of target genes in dog even if they are functional miRNAs.

FIG. 4. Alignment of the six miRNA genes across five primate species. The red boxes indicate mature miRNA regions; the seed regions are highlighted by two blue vertical lines within the box.

Complex Evolutionary Dynamics on mir-890/888/892a/892b

Complex Tandem Duplication Events Gave Rise to the miRNA Cluster.
As we showed earlier, the entire cluster region can be phylogenetically and physically dissected into two subgroups; in the following, we analyze their evolutionary dynamics separately. For the ∼3-kb region spanning from mir-890 to mir-892b (see fig. 1, encompassing mir-890, mir-888, mir-892a, and mir-892b), as the structure is generally conserved across all five primates, we focused our investigation on the human genome. Our sequence analysis revealed complex evolutionary dynamics in this region, characterized by extensive small-scale segmental duplications and genomic rearrangements both outside and inside this ∼3-kb region. Outside this region, there are extensive interlocus duplication events. We detected that this ∼3-kb subcluster region had undergone at least three rounds of small-scale segmental duplication to its immediate 3′ downstream region (shown in Panel A of fig. 6), accompanied by the insertion of some repetitive elements, including long terminal repeats and Alu elements (members of the Ya5 family). Alu Ya5 is a class of young Alu repeats specifically present in human (Carroll et al. 2001), and its presence suggests very recent evolutionary dynamics at this locus. For convenience, we label the sequence chunks involved in the duplication events as a, b, and c for those in the subcluster region, and a′, b′, and c′ for their counterparts in the immediate 3′ downstream region (see fig. 6A). In addition, a nonalignable region (∼300 bp) that is immediately 5′ upstream of c is labeled as d. We also detected extensive tandem duplication events inside the 3-kb subcluster region, in which d (∼300 bp) is apparently a duplicate copy of the 3′ end of c (∼500 bp), whereas two concatenated copies of c formed a (∼1.1 kb), a duplicate copy of a′. We then used C1 and C2 to designate the two duplicate copies of c in a, and D1 and D2 to designate the homologs of C1 and C2 in a′ (see fig. 6B).

FIG. 5. Structural analysis of three miRNA genes in marmoset.
Although the exact directions and steps of such complex tandem duplication events cannot be identified from the available data, structurally c or d can be regarded as a building block that generated all the other miRNA genes within this region, 3′ downstream from them.

Rapid Turnover of the Functional miRNA Loci.

As sequencing of ∼250 small RNA libraries covering 26 different organ systems and cell types did not find any evidence supporting the existence of functional miRNA genes in a′, b′, and c′ (fig. 6), there are two possible evolutionary scenarios. (1) This region originally harbored miRNAs duplicated from the 5′ upstream regions (encompassing a, b, and c), but they have since degenerated. (2) The cluster of miRNAs in the 5′ upstream region (encompassing a, b, and c) was gained after the duplication event. In the following analysis, we attempted to discern which of these evolutionary scenarios is more likely. As the structure of this region is well conserved across the five primates we studied, the birth and death of the miRNA (and their homologous) loci should date back to before the emergence of their common ancestor. We next performed phylogenetic analysis on the sequences of C1, C2, D1, D2, d, c, and c′ to infer the sequence turnover. All of these regions have a stretch of miRNA-like sequence at the 3′ end, some being functional and some not. To find an outgroup to root the phylogenetic tree, we Blasted the sequences of C1, C2, D1, D2, c, c′, and d against all the sequenced vertebrate genomes, finding only 14 significant hits in horse (Equus caballus, equCab2, with 6.8× coverage) and none in rodents. The alignable regions in mouse are ∼20 bp in length, much shorter than the ∼300 bp of the query sequences. The same holds when the sequences of individual primate miRNAs are used as Blast queries (see Discussion).
Close examination of the horse genome further revealed that the 14 Blast hits are all located on an unanchored contig that cannot be localized to a chromosome. The putative orthologs in horse are interspersed within a region of ∼25 kb in length, compared with only 6 kb in primates (3 kb in the miRNA cluster region and 3 kb 3′ downstream). Figure 7A shows the phylogenetic relationship among these homologous sequences, using the horse sequences as an outgroup. Clearly, although sharing sequence similarity with the primate sequences, the horse sequences form a separate clade, indicating that they are highly divergent from the primate sequences. Thus, even if the miRNAs are still functional in horse, their sequences and thus their molecular functions should have diverged significantly. As indicated in figure 6, mir-890, mir-888, mir-892a, and mir-892b are located at the 3′ ends of C1, C2, c, and d. Under maximum parsimony, figure 7A indicates that the common ancestor of C1, D1, c, and d gained miRNA functionality, which was subsequently lost in D1. In the other clade, C2 independently gained miRNA functionality, which has evolved into mir-888 in its current form. It is worth noting that although evolving an miRNA site from random sequence is generally difficult, in our scenario all the sequences being compared share substantial sequence similarity, and miRNA functionality might have been achieved by changing just a few nucleotides (not necessarily in seed regions) that turned the primitive miRNA-like sequences into functional ones. In this sense, the evolutionary processes shown in figure 7A are plausible. We next examined the rate of such turnover at the actual miRNA sites. We aligned a and a′ (see fig. 6) and then used a sliding window of 77 nt (the length of mir-890 and mir-888 in C1 and C2) to slide along the alignment of a and a′.
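A sliding-window divergence scan of this kind can be sketched in a few lines of Python, here using Kimura's two-parameter distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among the compared sites. The function names and the handling of gaps (columns with a gap or ambiguous base are skipped) are assumptions made for this sketch, not details reported in the paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G, or T are ignored.
    """
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        return float("nan")
    p, q = transitions / sites, transversions / sites
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return float("inf")  # too divergent for the correction to be defined
    return -0.5 * math.log(term1 * math.sqrt(term2))


def sliding_k2p(seq_a: str, seq_b: str, window: int = 77, step: int = 1):
    """K2P distance in each window as it moves along the pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        k2p_distance(seq_a[i : i + window], seq_b[i : i + window])
        for i in range(0, len(seq_a) - window + 1, step)
    ]
```

Windows whose distance falls in the upper tail of the resulting list (for example, the top 5%) would then be flagged as the most divergent sites.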
By moving the sliding window base by base, we calculated the rate of nucleotide substitution within the window using Kimura's two-parameter model. As shown in figure 7B, the two miRNA loci for mir-890 and mir-888 are among the top 5% of sites showing the highest sequence divergence, especially mir-888, suggesting rapid evolution at the miRNA loci. As mir-888 resides at the 3′ end of C2, which gained miRNA functionality as shown in figure 7A, this implies rapid gain of the miRNA site. Conversely, D1 lost an miRNA site, so the high divergence at the mir-890 locus suggests a rapid sequence loss in D1. It is also evident from figure 7B that sequences flanking the junction between a and b (from sequence position ∼1,050 onward, as indicated in figs. 6A and 7B) also exhibit rapid evolution. Although the underlying reason remains to be investigated, we speculate that such rapid evolution might be associated with the genomic rearrangement event that concatenated the sequence chunk a with b (see fig. 6).

Evolution of mir-891b/mir-891a

We next examined the other subcluster in the region, consisting of the two miRNA genes mir-891b and mir-891a (see figs. 1 and 8). This region is much longer than the one we examined above, extending over 30 kb. As shown in figure 8, our sequence analysis suggests that the two miRNA loci were generated by a duplication event (the blue bars in the lower panel of fig. 8), which is identifiable in all five primate genomes we studied (represented by human, rhesus macaque, and marmoset in fig. 8). Due to limited data, we could identify only the time but not the direction of the duplication event, that is, which region existed before the duplication. By cross-species comparison, we found an AluSq element at the 5′ end of each duplicate copy (fig. 8). As Alu elements are mostly primate specific (Kapitonov and Jurka 1996; Carroll et al.
2001), we deduced that the duplication event should have occurred only in the primate lineage, indicating that at least one of the two miRNAs arose in the primate lineage. In addition, the AluSq elements are estimated to have been active at ∼44 Ma (Kapitonov and Jurka 1996; Hayakawa et al. 2001), which is consistent with the divergence time between marmosets and humans (∼43 Ma, Steiper and Young 2006). It is likely that the duplicated miRNA copy is present only in primates after the emergence of New World monkeys and not in the more primitive primates such as mouse lemurs (divergence time is around 77 Ma; Steiper and Young 2006). Although the overall structure of this region is conserved across the five primates, in marmoset, both regions encompassing mir-891b and mir-891a have been expanded; in particular, the 5′ upstream region containing mir-891a has extended to span 21 kb, compared with just 4.7 kb in human and macaque. Notably, the amplification involves a series of tandem duplications of mir-891a, as we detected six other full-length duplicate copies of mir-891a in marmoset by BLAT search. Because these miRNAs are not present in other primate species, the tandem duplication events must have occurred recently in the marmoset lineage. Direct Blast searches using mir-891b and mir-891a as queries did not yield any significant hit in other fully sequenced mammalian genomes. When using the ∼4.7-kb region encompassing these two miRNAs as the Blast query, we were able to find an miRNA-like sequence in the horse genome, which shares ∼50% sequence identity with the two human miRNAs. No hits in other genomes were found. Such low sequence similarity indicates that this miRNA-like sequence may not be a real functional miRNA in horse or, at least, has diverged substantially from the miRNAs in primates.

FIG. 8. Evolutionary dynamics of the subcluster region encompassing mir-891b and mir-891a. It is clear that these two miRNAs were derived from one duplication event.
Functional Analysis of the miRNAs

We next investigated potential cellular functions of the miRNAs residing in the cluster region. Although the six miRNAs within this region are grouped into two phylogenetic classes with different evolutionary histories, they may share similar physiological functions, as they lie within the same transcriptional unit and are predominantly expressed in human epididymis (Landgraf et al. 2007). The epididymis is a long, narrow, single convoluted tubule in which spermatozoa progressively mature into motile cells (Thimon et al. 2007, 2008). We surveyed the miRNA expression levels as relative clone frequency across ∼250 small RNA libraries sequenced in a previous study (Landgraf et al. 2007); table 2 lists the relevant tissues or cell types in which at least one miRNA has been cloned. Two of the six miRNAs (hsa-miR-892a and hsa-miR-891b) are specifically expressed in human epididymis, and the other four are predominantly expressed in epididymis with some low expression in other tissues or cell types. Taken together, the observed high tissue specificity implies that the six miRNAs might be highly specialized in regulating physiological functions of the human epididymis. To better understand their molecular functions, we next predicted mRNA targets for each of the six miRNAs in human using TargetScanS (Lewis et al. 2003, 2005; Friedman et al. 2009) and performed computational functional analysis on these putative miRNA targets. Because of their physical proximity, these miRNAs are thought to belong to the same transcription unit (Landgraf et al. 2007). Therefore, we pooled the predicted targets of the individual miRNAs into a single group of genes under coordinated regulation by the six miRNAs and then examined functional enrichment for this gene set.
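One common way to test a pooled gene set for functional enrichment is a one-sided hypergeometric test per category followed by Benjamini-Hochberg FDR adjustment. The sketch below illustrates that general approach only; the enrichment tool and statistical procedure used in the study are not specified in this section, and all function names, gene identifiers, and category names here are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes among n drawn
    from a background of N genes, K of which carry the annotation."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(targets, background, categories):
    """Map each category name to (p-value, FDR) for the pooled target set.

    `categories` maps a category name to the set of genes annotated with it.
    """
    background = set(background)
    targets = set(targets) & background
    names, pvals = [], []
    for name, members in categories.items():
        members = set(members) & background
        overlap = len(members & targets)
        names.append(name)
        pvals.append(hypergeom_sf(overlap, len(background), len(members), len(targets)))
    fdrs = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, fdrs)}
```

The choice of background matters: restricting it to genes that could in principle be predicted as targets (for example, genes with annotated 3′ UTRs) avoids inflating the enrichment signal.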
First, we found that, in general, these target genes are enriched in functional categories similar to those of the protein-coding genes known to be expressed in epididymis (Thimon et al. 2007), including cell–cell adhesion (false discovery rate [FDR] = 1.3 × 10⁻⁶) and metal-ion transport (FDR = 3 × 10⁻³), which are important in maintaining the epididymal luminal environment (Cornwall 2009). Second, we observed that the gene list is also enriched for anatomical structure development (FDR = 7 × 10⁻¹⁶), system development (FDR = 1.6 × 10⁻¹⁵), tissue remodeling (FDR = 0.08), and morphogenesis of an epithelium (FDR = 0.1), suggesting that these miRNAs might contribute to the morphogenesis of the epididymis. Lending further support to this notion, the target genes are also enriched for tube development (FDR = 0.08), consistent with the overall epididymal morphology. In addition, cell motility is one of the enriched functional classes (FDR = 6 × 10⁻⁴), indicating potential roles of the miRNAs in regulating the mobilization of nonmotile sperm cells produced in the testis, which is one of the primary functions of the epididymis. Indeed, we found several predicted target genes to be associated with sperm-related functions, including SPAG6 (sperm-associated antigen 6), which plays an essential role in regulating sperm flagellar motility and in maintaining the structural integrity of mature sperm (Sapiro et al. 2002). This gene was confidently predicted to be regulated by hsa-miR-888 with a perfect 7-mer seed match, ranking at 97% among all the putative targets (TargetScanS context score is 0.46). Similarly, SPAG1 (sperm-associated antigen 1; Liu et al. 2006), an infertility-related sperm protein, is also predicted to be regulated by hsa-miR-888 (perfect 7-mer seed match with context score 0.3, ranking at 87% of all putative targets). Likewise, SPAG8 (sperm-associated antigen 8; Liu et al. 1996) and RSBN1 (round spermatid basic protein 1, which is the basic protein structure of sperm; Takahashi et al. 2004) were predicted to be regulated by hsa-miR-892a, whereas hsa-miR-890 was predicted to regulate TEDDM1 (transmembrane epididymal protein 1). Collectively, our functional analysis suggests multiple physiological roles of these miRNAs in the epididymis, specifically in regulating epididymal morphogenesis and sperm maturation.

Table 2 (fragment; relative clone frequency)
Tissue     miR-890  miR-888  miR-892a  miR-892b  miR-891b  miR-891a
Pituitary  0.1      0        0         0         0         0

In this paper, we carried out a detailed characterization of a novel cluster of six epididymis-specific miRNAs on the primate X chromosome. With our orthology assignment, we unambiguously identified several new miRNAs in other primates (see table 1) that had not been characterized before. Our analysis revealed that the genomic structure of this miRNA cluster is well conserved within primates but virtually absent from the other mammalian genomes examined, including mouse, rat, cat, dog, cow, horse, opossum, and platypus. This finding, combined with the involvement of several primate-specific genomic rearrangement events (such as Alu insertions and primate-specific segmental duplications) in shaping this cluster region, indicates that this genomic region evolved into its current form only within the primate lineage. However, as some sequences within this cluster region share homology, though highly divergent, with regions in other mammalian species (such as horse in our analysis), the emergence of this cluster in primates was probably not completely de novo; rather, the cluster was likely tinkered and tailored from ancestral sequences by extensive genomic rearrangement events and rapid sequence evolution. This is particularly the case for the six miRNA genes within this region. One miRNA, mir-892a, indeed showed detectable but highly divergent homology with two miRNA-like sequences in the dog genome (see supplementary fig. S2, Supplementary Material online).
When analyzing the evolution of mir-890/888/892a/892b, we identified a homologous region immediately 3′ downstream of the miRNA cluster; several miRNA-like sequences reside in this downstream region, but there is no evidence that they are functional. Using a maximum parsimony approach, we were able to reconstruct the evolutionary history of miRNA birth and death, which, in our case, is a good example illustrating the different fates of genes derived from duplication events. The gain of the miRNA mir-888 in one region (C2) indicates neofunctionalization, whereas the loss of the miRNA site in another region (D1) suggests nonfunctionalization. For mir-890, mir-888, mir-892a, and mir-892b, which share substantial sequence homology but have divergent seed sequences (see fig. 4), coexpression within the same transcriptional unit and in the same tissue suggests that their common ancestor might have been subfunctionalized. The observed rapid evolution at two miRNA loci, as indicated in figure 7, combined with the fact that all the miRNAs are predominantly expressed in the primate reproductive system, whose traits are directly relevant to organismal fitness, points to a functional advantage of these miRNAs for primates. Indeed, the human epididymis shows several unique features compared with nonprimate mammals (Cornwall 2009), including its morphology (such as a less developed cauda region and poor sperm storage; Bedford 1994) and function (such as rapid spermatozoa processing and efficient biochemical modification in sperm maturation; Turner 1995), which are consistent with the predicted roles that the miRNAs might play in regulating epididymal functions. Given the current scarcity of studies on the human epididymis, and even more so on human epididymal miRNAs, our analysis provides a potential molecular basis for the epididymal differences between primate and nonprimate mammals.
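As an illustration of the parsimony reasoning behind the birth-and-death reconstruction mentioned in this discussion, the following is a minimal sketch of Fitch's small-parsimony algorithm applied to a binary presence/absence character on a fixed tree. It is not the authors' procedure, which is not detailed in this section; the tree, the taxon labels `t1` to `t5`, and the presence values are hypothetical toy inputs.

```python
def fitch(tree, leaf_states):
    """Fitch small parsimony for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, leaf_states)
    right_set, right_changes = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # Children disagree: one gain or loss must have occurred on a child branch.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical toy input: 1 = miRNA present, 0 = absent (not data from the study).
toy_tree = (((("t1", "t2"), "t3"), "t4"), "t5")
toy_presence = {"t1": 1, "t2": 1, "t3": 1, "t4": 0, "t5": 0}
root_states, min_changes = fitch(toy_tree, toy_presence)
```

In this toy case the most parsimonious history needs a single change, with absence inferred at the root, which corresponds to one gain on the branch leading to `t1`, `t2`, and `t3`.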
As a future direction, expression profiling of the six miRNAs in the human epididymal caput, corpus, and cauda might be necessary to detect their region-specific expression, which is a common feature of many protein-coding genes in establishing epididymal physiology (Thimon et al. 2007). In addition, whole-genome expression profiling upon in vivo knockdown (and knock-in) of the six miRNAs in human epididymis samples might provide mechanistic clues to their primate-specific physiology.

Supplementary Material

Supplementary figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work is funded by a grant from the Canadian Institutes of Health Research (CIHR), grant number FRN 79302.



Jingjing Li, Yu Liu, Dong Dong, Zhaolei Zhang. Evolution of an X-Linked Primate-Specific Micro RNA Cluster, Molecular Biology and Evolution, 2010, 671-683, DOI: 10.1093/molbev/msp284