Recently transposed Alu repeats result from multiple source genes

Nucleic Acids Research, Oct 1990

A human Alu repeat subfamily (the PV subfamily) whose members include insertional polymorphisms is found, as predicted, to differ by five tightly linked mutations relative to another subfamily of recently inserted Alu repeats. Based on these sequence differences, some of the small number of polymorphic Alus can be selected from the background of nearly one million member sequences which are fixed in the human genome. Shared patterns of mutations suggest that PV subfamily members are the progeny of several different founder sequences. The additional observation that all members of the PV subfamily end in a stretch of uninterrupted polyadenine residues rather than merely A-rich sequences is evidence for post-transcriptional polyadenylation of the presumptive RNA intermediate. The drift of polyadenine sequences toward tandemly repeated A-rich motifs suggests a biological function that may select for the fixation of dispersed Alu repeats.



A. Gregory Matera, Utha Hellmann, Mary F. Hintz, Carl W. Schmid
Department of Chemistry, University of California, Davis, CA 95616, USA

INTRODUCTION

The human Alu family of repetitive DNA elements consists of about 0.5 million dispersed members sharing a recognizable 280 nt consensus sequence (reviews 1-3). Structural evidence strongly suggests that these repeats are retroposons dispersed via an RNA intermediate, but the putative transcript has, until recently, been either elusive or controversial (1-4). Moreover, most human Alus are essentially immobile. Many members predate human and ape or even monkey divergence (1-3). Scrutiny of the many Alu sequences accumulated in data banks has led to essentially identical conclusions by five independent groups (5-9): Alu repeats can be segregated into recognizable subfamilies, each of which is presumably the progeny of a distinct set of founder sequences.
The different degrees of divergence exhibited by the members of each subfamily suggest that they appeared at different evolutionary times. Confirming this interpretation, the few human Alus which are known to be polymorphic insertions into the human genome belong to a subfamily which is evolutionarily the most recently inserted, as judged by the sequence homogeneity of its members. Although five independent groups agree on these broad conclusions, there are some minor differences of opinion regarding the exact sequence assignment of each subfamily. For our present purposes, the consensus sequence of the subfamily identified by Deininger and Slagel (10), which is also identical to the 'precise' subfamily defined by Britten et al. (11), is most informative. This definition can be regarded as a consensus of the several versions proposed for the recent Alu subfamily (5-10). The conclusions of this investigation do not depend on the choice of consensus sequence.

Previous sequence comparisons suggested that two polymorphic human Alu repeats might differ by five shared mutations from the precise subfamily consensus sequence, thereby constituting another discernible subfamily (3,10). This interpretation has been partially confirmed by using an oligonucleotide hybridization probe that incorporates two of these five putative diagnostic differences (4). One Alu repeat selected by library screening under stringent hybridization with this probe is associated with a 300 bp restriction fragment length polymorphism (RFLP). While not yet decisively proven, this RFLP almost certainly results from an Alu insertion. This group of variant Alu sequences has recently expanded in human evolution (i.e. comparing human to chimpanzee and gorilla) and corresponds to one or more transcriptionally active source genes (4).
The five mutations which distinguish these Alu variants from the recently fixed precise subfamily should be tightly linked. Moreover, independently isolated polymorphic Alus should contain these same tightly linked mutations. Here we test and confirm both predictions by sequence analysis. For simplicity, we shall refer to Alus which share these linked mutations as predicted variant (PV) Alus. In predicting this subfamily, we recognize that some PV Alus are fixed in the human genome, possibly predating ape and human divergence, and that not all polymorphic Alus need belong to this group (see Discussion). Rather, these PV Alus result from recognizable source genes that have been active in recent human evolution.

MATERIALS AND METHODS

Library construction and screening

A human λ DASH genomic library (Stratagene) was screened using an Alu oligonucleotide hybridization probe, GM-002, at high stringency (67 °C and 5×SSPE) (4). This stringency corresponds to the melting temperature of an exactly paired duplex with GM-002. GM-002, which is shown below, incorporates two of the five diagnostic mutations of the proposed PV subfamily. Sequences of the resulting clones PV 71, PV 83, and PV 93 are reported in the Results section. Poly(A)+ RNA from a human hydatidiform mole was isolated using the 'Fast-Track' mRNA isolation kit (Invitrogen). A Not I/Eco RI-linked λ GT10 cDNA library (insert size 500 nt and larger) was constructed and amplified from this RNA by Invitrogen, Inc. The library was screened using standard procedures (4) with end-labeled oligonucleotide GM-002 at 60 °C in 5×SSPE. Clone PV 6 was isolated from this library and subcloned as described below.

[Figure 1. Alignment of the PV Alus and the C1 INH Alu with the precise and PV subfamily consensus sequences. The per-clone rows are not recoverable from the extracted text. Surviving consensus rows, labeled by the alignment coordinate of their first base:
  1    GGCCGGGCGC GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA TCACGAGGTC
  71   AGGAGATCGA GACCATCCTG GCTAACACGG TGAAACCCCG TCTCTACTAA AAATA-CAAAA AATTAGCCGG
  141  GCGTGGTGGC GGGCGCCTGT AGTCCCAGCT ACTCGGGAGG CTGAGGCAGG AGAATGGCGT GAACCCGGGA
  211  GGCGGAGCTT GCAGTGAGCC GAGATCGCGC CACTGCACTC CAGCCTGGGC GACAGAGCGA GACTCCGTCT C  (Precise)
  211  GGCGGAGCTT GCAGTGAGCC GAGATCCCGC CACTGCACTC CAGCCTGGGC GACAGAGCGA GACTCCGTCT C  (PV Consensus)]

[Figure 2. Direct repeats and 3' poly(A) tracts flanking PV Alus, in the form: direct repeat, [ALU], (A)n, direct repeat. Flanking sequences outside the direct repeats are not reliably recoverable from the extracted text. Rows are given in extracted order; the row labels in the extracted text read TPA, Mlvi, PV 92, PV 71, APO, PV 83, Consensus:
  GAACACTAC      [ALU] (A)22  GAACACTAC
  GAAAATGT       [ALU] (A)31  GAAAATGT
  GAAAGAA        [ALU] (A)28  GAAAGAA
  GAAGACATTTATG  [ALU] (A)29  GAAGACATTTATG
  GAAGGGCCTTCT   [ALU] (A)20  GAAGGGCCTTCT
  GACTACAAGAACT  [ALU] (A)15  GACTACAAGAACT
  GA(N)x         [ALU] (A)n   GA(N)x]

Plasmid construction and DNA sequencing

Plasmid subclones of Alu family members PV 83 and PV 71 were prepared as follows: a 2.0 kb Bst YI and a 1.3 kb Hinc II/Bst YI fragment from PV 83, a 1.7 kb Dra I/Bst YI and a 0.5 kb Dra I/Bst YI fragment from PV 71. PV 6 was subcloned into pUC 19 from a λ GT10 cDNA clone as a 650 bp Eco RI insert. A 2.4 kb Hind III subclone of the region flanking the human apolipoprotein A1-C3-A4 gene cluster was a generous gift of Dr. S. Karathanasis (12,13). A 0.6 kb Sma I fragment was subcloned from the original Hind III subclone to obtain sequence overlaps. Sequencing was performed on pUC dsDNA with Sequenase 2.0 (USB) using forward, reverse and three different Alu oligonucleotide primers (GM-002, 5'-ATCGAGACCATCCCGGCTAAAA-3'; GM-003, 5'-GGTTTCACCGTTTTAGCCG-3'; GM-004, 5'-GCAGTGAGCCGAGATCC-3' [Synthecell, Inc]). Alignment of sequences was accomplished using the MicroGenie data analysis software (Beckman).
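The three sequencing primers can be compared with the consensus rows of Figure 1 to see which alignment positions they interrogate. The sketch below is illustrative only and is not part of the original methods. It slides each primer, in both orientations, along two consensus segments copied from the extracted Figure 1 text and reports the ungapped placement with the fewest mismatches. Treating those segments as the precise-subfamily consensus is an assumption, and the names `PRECISE_SEGMENTS`, `revcomp`, and `place_primer` are hypothetical.

```python
# Consensus segments copied from the extracted Figure 1 text, keyed by the
# alignment coordinate of their first base. Treating them as the
# precise-subfamily consensus is an assumption of this sketch.
PRECISE_SEGMENTS = {
    71: "AGGAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCG",  # positions 71-110
    221: "GCAGTGAGCCGAGATCGCGC",                      # positions 221-240
}

# Primer sequences as given in the Methods (5' to 3').
PRIMERS = {
    "GM-002": "ATCGAGACCATCCCGGCTAAAA",
    "GM-003": "GGTTTCACCGTTTTAGCCG",
    "GM-004": "GCAGTGAGCCGAGATCC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def place_primer(primer, segments=PRECISE_SEGMENTS):
    """Find the ungapped placement with the fewest mismatches.

    Returns (mismatch_count, strand, diffs), where diffs lists
    (position, consensus_base, primer_base) read on the consensus strand.
    """
    best = None
    for seg_start, seg in segments.items():
        for strand, query in (("+", primer), ("-", revcomp(primer))):
            # The range is empty when the primer is longer than the segment.
            for offset in range(len(seg) - len(query) + 1):
                window = seg[offset:offset + len(query)]
                diffs = [
                    (seg_start + offset + i, ref, base)
                    for i, (ref, base) in enumerate(zip(window, query))
                    if ref != base
                ]
                if best is None or len(diffs) < best[0]:
                    best = (len(diffs), strand, diffs)
    return best


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        count, strand, diffs = place_primer(seq)
        print(name, strand, count, diffs)
```

With these inputs the sketch places GM-002 (sense) and GM-003 (antisense) over alignment positions 89 and 96, where the consensus T and C are replaced by C and A, and GM-004 over position 237, where G is replaced by C. This agrees with the statement that GM-002 incorporates two of the five diagnostic mutations.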
RESULTS

PV Alus share diagnostic sequence features

The base sequences of polymorphic Alus mapping within the Mlvi-2 and TPA loci, as well as that of clone PV 92, have been previously published (4,14,15) (Fig. 1). As mentioned, these Alus share five differences with respect to the precise subfamily sequence. Several investigators have described an Alu insertion-deletion polymorphism mapping 5' to the human apolipoprotein A1 gene (12,13,16). The base sequence of a subclone of this region identifies the presence of an Alu repeat that includes all five diagnostic mutations (APO, Fig. 1). A comparison of the restriction sites within this Alu repeat to those used to map the length polymorphism confirms that the observed polymorphism results from an insertion of this Alu repeat.

Four Alu repeats designated 'PV' in Figure 1 were selected by library screening under stringent hybridization conditions with an Alu oligonucleotide (GM-002, Materials and Methods). PV 92 is a probable insertional polymorphism in the human genome and is absent from chimpanzee DNA, as judged by restriction fragment length polymorphisms (4). The PV 83 Alu is truncated, having a seventeen-nucleotide deletion at its 5' end, but is bounded by direct repeats. Thus, PV 83 presumably results from the insertion of an incomplete cDNA. Interestingly, primer extension of Alu RNA was also found to terminate prematurely at this position (4). This Alu is fixed in a panel of eight humans (4). PV 6 is an incomplete cDNA clone in which the Alu is inverted with respect to the sense of the original transcript. The end of the cDNA occurs at position 270, so that ten terminal nucleotides are missing, none of which are diagnostic (Fig. 1).

Six of the seven predicted variant Alus share all five diagnostic mutations (Fig. 1). PV 83 has four of the diagnostic mutations; the only exception, G at position 145, matches the precise subfamily consensus.
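The per-clone tally of diagnostic mutations amounts to a lookup at a fixed set of alignment positions. The following sketch is an illustration under stated assumptions, not the authors' procedure. `DIAGNOSTIC_SITES` lists only three sites (89, 96 and 237) whose precise and PV bases can be read from the primer and consensus sequences in the extracted text; the remaining two sites are omitted rather than guessed. The clone records and the function name `tally_sites` are hypothetical.

```python
# Three of the five diagnostic sites, as
# {alignment position: (precise base, PV base)}.
# Read from the primer and consensus sequences in the extracted text;
# the other two sites are deliberately left out (assumption of this sketch).
DIAGNOSTIC_SITES = {89: ("T", "C"), 96: ("C", "A"), 237: ("G", "C")}


def tally_sites(bases, sites=DIAGNOSTIC_SITES):
    """Classify each listed site of one clone as PV-like, precise-like, or other."""
    tally = {"PV": 0, "precise": 0, "other": 0}
    for position, (precise_base, pv_base) in sites.items():
        base = bases.get(position)  # None if the clone lacks this position
        if base == pv_base:
            tally["PV"] += 1
        elif base == precise_base:
            tally["precise"] += 1
        else:
            tally["other"] += 1
    return tally


# Hypothetical clones, given as {position: observed base}.
clone_all_pv = {89: "C", 96: "A", 237: "C"}
clone_mixed = {89: "C", 96: "A", 237: "G"}
clone_truncated = {89: "C", 96: "A"}  # sequence ends before position 237

if __name__ == "__main__":
    for name, clone in [
        ("all_pv", clone_all_pv),
        ("mixed", clone_mixed),
        ("truncated", clone_truncated),
    ]:
        print(name, tally_sites(clone))
```

A clone that matches the precise base at one site while carrying the PV base at the others is reported as mixed, which is the situation described for PV 83 at position 145.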
While these Alus share the five diagnostic mutations, there are clear differences between members (Fig. 1). The average pairwise divergence of these Alus is approximately seven nucleotides; they differ by an average of four mutations from the PV consensus. Based on estimates of the fidelity of reverse transcriptase and the mutational rate of non-selected DNA, it is unlikely that these sequences acquired all of their observed differences post-transcriptionally (17,18). Some of the differences among these Alus might result from sequencing artifacts; however, the diversity of these newly inserted Alus is consistent with the possibility of several distinct source genes.

The non-random pattern of mutations is a stronger indication that discernible subgroups were encoded by distinct source genes. As the best example, TPA and PV 71 Alus share three differences: C at position 123, a deletion at position 134 and a T at position 166, relative to the other PV subfamily members (Fig. 1). Two other possible subgroupings are Mlvi and PV 92, which share a deletion at position 157, and PV 71, PV 83, and PV 6, which share a deletion at position 148 (Fig. 1). However, a single deletion, particularly one occurring near a GC-rich run, is not as convincing as the three mutations linking the TPA and PV 71 subgroup. The suggestion that several distinct source genes generated newly inserted Alu repeats is strengthened by the existence of an Alu (C1 INH) that clearly belongs to the precise rather than the PV subfamily (Fig. 1; 19).

The 3' ends of PV Alus are polyadenylated

The 3' ends of Alu repeats are typically A-rich, suggesting that they, like the 3' ends of other retroposons, result from post-transcriptional polyadenylation of an RNA intermediate (1-3).
However, most Alu repeats do not have the usual recognizable polyadenylation signals, and their 3' ends are not exclusively adenine but include other nucleotides, often arranged in simple repeating structures (1,2). Figure 2 compares the ends of PV Alus. These heterogeneous-length sequences are composed exclusively of adenine, exactly the type of product expected for polyadenylation of RNA intermediates. The C1 INH Alu insertional polymorphism, which does not belong to the PV subfamily (Discussion), also ends in a run of forty-two A's (19). Thus, the simple repeats in the A-rich 3' ends of other Alu members must result from post-insertional events and are not encoded by the Alu source gene(s). These findings complement the observation of a high degree of polymorphism associated with the poly A tail of fixed Alu repeats (20).

PV Alus insert at a defined target sequence

The direct repeats surrounding Alu repeats are typically also A-rich and often include similarity to the 5' end of the Alu insert (1). However, because of their mutational divergence and similarity to the simple repeating structures within the A-rich 3' ends, the exact structure of the direct repeats is often ambiguous (1). All direct repeats surrounding PV Alus include the sequence 5'-GANx-3' (Fig. 2). At present, we do not know the molecular basis for this striking target site specificity.

DISCUSSION

Sequence comparisons have compelled the conclusion that Alu repeats can be segregated into subfamilies resulting from distinct source genes (5-9). Further, the shared sequence features of two previously known polymorphic Alus strongly indicate that the currently active source gene(s) will have five distinct base substitutions compared to the most recently fixed subfamily (3,4,7,9,10). This hypothesis is experimentally tested and confirmed by two complementary approaches. A polymorphic Alu mapping near the apolipoprotein gene cluster is found to have the five linked base substitutions.
Diagnostic polymorphic restriction cleavage sites reported for this insertional polymorphism (12,13,16) originally drew our attention to the possibility that a PV Alu was responsible for the polymorphism. Conversely, using an oligonucleotide that incorporates two of the five diagnostic mutations to isolate other PV Alus, we find that all five mutations are tightly linked. As reported by Matera et al. (1990), at least one of the four PV Alus described here is associated with a human DNA length polymorphism (4). Thus, we can selectively enrich for a small number of newly inserted Alus against a background of nearly one million Alu repeats which are fixed in the human genome. When compared to the sequence diversity of the entire human Alu family, these new Alus obviously result from a select source gene or genes.

Although sequence differences between PV subfamily members might reflect multiple source genes, these differences might also result from an error-prone insertion mechanism, random drift or simple sequencing errors. The present finding that PV Alus can be further resolved into recognizable sub-subfamilies strongly supports the existence of multiple but closely related source genes. This interpretation is confirmed by the existence of a polymorphic Alu that closely matches the precise subfamily consensus sequence and does not share any of the tightly linked PV Alu mutations (19). It is notable that 4.5S RNA, an Alu homolog in rodents, is encoded by multiple genes that are arranged in tandem (21). However, there is but a single locus for another Alu homolog, the human 7SL RNA gene (22).

Rosenberg et al. (23) recognized that a consensus sequence for a repeat sequence family is merely an average, which need not exactly match any individual member sequence. Remarkably, the APO Alu differs by only a single nucleotide from the PV consensus sequence.
The PV consensus sequence identified here is likely an exact match to one of the source genes encoding new members of this subfamily.

A-rich 3' ends of Alu repeats are often highly structured, consisting of tandem repeats of simple A-rich elements (1,2). This observation and the absence of conventional polyadenylation signals prompted the suggestion that the A-rich ends might be encoded by Alu genes (2). In contrast, 3' ends of predicted variant Alus are composed of variable-length tracts of pure adenine, which suggests their addition via a post-transcriptional mechanism. Tandemly repeated elements which are present in the 3' A-rich ends of many Alu repeats must form subsequently to their post-transcriptional polyadenylation. Tandem A-rich elements in many instances resemble the short direct repeats flanking the Alu repeats and, apparently, 'diffuse' into the A-rich tract from the 3' end (1). This interpretation is confirmed by the additional observation that many Alu tails begin with a 5' run of pure A's and end with a 3' region consisting of tandemly repeated A-rich elements (20). A high frequency of length polymorphism is associated with the A-rich 3' ends of individual Alu repeats, but these length polymorphisms are stably inherited (20). Evidently, the polyadenine tracts are stabilized by acquiring other nucleotides through some common mechanism.

Another issue is whether the dispersed Alu repeats have a function. We speculate that the A-rich tails of some Alu repeats may serve as transcription terminators. A-rich regions are known Pol II transcription terminators; complementary T-rich regions are thought to be both Pol II and Pol III terminators (24,25). Human 5S genes are interrupted by an inverted Alu repeat providing a probable Pol III transcription terminator (26). Similarly, the A-rich transcription terminator of sea urchin histone genes closely resembles the A-rich 3' ends of numerous Alu repeats (27).
The observation that an interspersed Alu blocks transcriptional interference is also consistent with this speculation (28).

One problem in reconciling short interspersed repeats (SINEs) with a biological function has been that entirely different SINE families populate divergent mammalian genomes (1,2). Presumably, the exact sequence of the dispersed repeat is not essential to whatever function, if any, it serves. Our present speculation also circumvents this difficulty. Selection for A-rich as well as complementary T-rich transcription termination signals between transcription units might account for the ubiquity of mammalian SINEs. Human Alus merely provide one selected pathway to disperse such A-rich elements.

ACKNOWLEDGEMENTS

We thank Dr. S. Karathanasis for helpful discussions and for the apolipoprotein gene subclone and Dr. J. Gatewood for RNA isolation. This research was supported by USPHS grant GM 21346.

REFERENCES

1. Schmid, C.W. and Shen, C.-K.J. (1985). In MacIntyre, R.J. (ed.) Molecular Evolutionary Genetics. Plenum, New York, p. 323-358.
2. Weiner, A.M., Deininger, P.L. and Efstratiadis, A. (1986). In Annual Review of Biochemistry. Annual Reviews, Inc., Palo Alto, Vol. 55, p. 631-661.
3. Schmid, C.W., Deka, N. and Matera, A.G. (1989). In Adolph, K.W. (ed.) Chromosomes: Eukaryotic, Prokaryotic and Viral. CRC Press, Inc., Boca Raton, Vol. I, p. 3-29.
4. Matera, A.G., Hellmann, U. and Schmid, C.W. (1990) Mol. Cell. Biol., accepted.
5. Willard, C.W., Nguyen, H.T. and Schmid, C.W. (1987) J. Mol. Evol., 26, 180-186.
6. Quentin, Y. (1988) J. Mol. Evol., 27, 194-202.
7. Slagel, V., Flemington, E., Traina-Dorge, V., Bradshaw, H. and Deininger, P. (1987) Mol. Biol. Evol., 4, 19-29.
8. Jurka, J. and Smith, T. (1988) Proc. Natl. Acad. Sci. USA, 85, 4775-4778.
9. Britten, R.J., Baron, W.F., Stout, D.B. and Davidson, E.H. (1988) Proc. Natl. Acad. Sci. USA, 85, 4770-4774.
10. Deininger, P.L. and Slagel, V.K. (1988) Mol. Cell. Biol., 8, 4566-4569.
11. Britten, R.J., Stout, D.B. and Davidson, E.H. (1989) Proc. Natl. Acad. Sci. USA, 86, 3718-3722.
12. Antonarakis, S.E., Chakravarti, A., Halloran, S.L., Hudson, R.R., Feissee, L. and Karathanasis, S.K. (1988) Hum. Genet., 80, 265-273.
13. Mietus-Snyder, M., Charmley, P., Korf, B., Ladias, J.A.A., Gatti, R.A. and Karathanasis, S.K. (1990), manuscript submitted.
14. Economou-Pachnis, A. and Tsichlis, P.N. (1985) Nucleic Acids Res., 13, 8379-8387.
15. Degen, S.J.F., Rajput, B. and Reich, E. (1986) J. Biol. Chem., 261, 6972-6985.
16. Lim, D., Coleman, R.T., Assmann, G. and Frossard, P.M. (1986) Am. J. Hum. Genet., 39, abstr. 621.
17. Ricchetti, M. and Buc, H. (1990) EMBO J., 9, 1583-1593.
18. Nei, M. (1987). In Molecular Evolutionary Genetics. Columbia Univ. Press, New York, p. 267.
19. Stoppa-Lyonnet, D., Carter, P.E., Meo, T. and Tosi, M. (1990) Proc. Natl. Acad. Sci. USA, 87, 1551-1555.
20. Economou, E.P., Bergen, A.W., Warren, A.C. and Antonarakis, S.E. (1990) Proc. Natl. Acad. Sci. USA, 87, 2951-2954.
21. Schoeniger, L.O. and Jelinek, W.R. (1986) Mol. Cell. Biol., 6, 1508-1519.
22. Ullu, E. and Weiner, A.M. (1985) Nature, 318, 371-374.
23. Rosenberg, H., Singer, M. and Rosenberg, M. (1978) Science, 200, 394-402.
24. Proudfoot, N.J. (1989) TIBS, 14, 105-110.
25. Bogenhagen, D.F. and Brown, D.D. (1981) Cell, 24, 261-270.
26. Little, R.D. and Braaten, A.C. (1989) Genomics, 4, 376-383.
27. Briggs, D., Jackson, D., Whitelaw, E. and Proudfoot, N.J. (1989) Nucleic Acids Res., 17, 8061-8071.
28. Wu, J., Grindlay, J., Bushell, P., Mendelsohn, L. and Allan, M. (1990) Mol. Cell. Biol., 10, 1209-1216.


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/18/20/6019.full.pdf

A.Gregory Matera, Utha Hellmann, Mary F. Hintz, Carl W. Schmid. Recently transposed Alu repeats result from multiple source genes, Nucleic Acids Research, 1990, 6019-6023, DOI: 10.1093/nar/18.20.6019