Micropia-Dm2, the nucleotide sequence of a rearranged retrotransposon from Drosophila melanogaster
Nucleic Acids Research, Vol. 18, No. 14 4265
© 1990 Oxford University Press
Dirk-Henner Lankenau and Wolfgang Hennig
Department of Molecular and Developmental Genetics, Catholic University, Toernooiveld, NL-6525 ED Nijmegen, The Netherlands
EMBL accession no. X14173
Submitted June 7, 1990
Micropia-Dm2 belongs to a group of four sequenced micropia elements. It was isolated in a screen of a lambda genomic clone bank of Drosophila melanogaster, using micropia-DhMiF2A as a radioactive probe (2, 3). Its sequence features are very similar to those of micropia-Dm11; however, some major rearrangements have taken place. A general description of all micropia elements has been published (4). Here we present the complete nucleotide sequence of micropia-Dm2 and about 500 bp of 5' flanking sequence. The internal sequences of the copia insertion have been partially sequenced (4), but the data are not shown here. Forty-two sequence features are indicated within the sequence. The numbers at the left border of the sequence refer to these features, which are described below.
Description of important sequence features:
1. EcoRI site, end of micropia insert
2. TATA-box
3. Cap-site
4. Polyadenylation signal
5. RNA termination signal
6. Simple tandem repeat
7. Inverted repeat of 3' end of 5' LTR
8. Leu tRNA primer binding site
9. Initiation codon Met of MHC-class I-like (similar) protein gene
10. Insertion of copia, 5' target site duplication
11. Inverted repeat of 5' end of 5' LTR of copia
12. Copia position 4768 (Emori et al. 1985)
13. Purine rich region, plus strand primer binding site of copia
14. 5' inverted repeat of 3' LTR of copia
15. & 16. Promoter: 5' start sites of cellular 2- and 5-kb RNA, shown on 3' LTR of copia
17. Polyadenylation signal of copia
18. Simple tandem repeat in copia
19. 3' inverted repeat of 3' LTR of copia
20. Insertion site of copia, 3' target site duplication, sequence belongs to micropia MHC-like protein gene
21. CCHC-finger motif of retroelements, two repeats
22. Amino terminal end of putative protease
23. Carboxy terminal end of putative protease
24. Insertion site of retroposon, 5' target site duplication
25. 5' end of retroposon, inserted into micropia
26. 3' oligo(A) tail of retroposon insertion
27. Insertion site of retroposon, 3' target site duplication, sequence belongs to micropia. This region between the protease and reverse transcriptase has been described as 'undefined' peptide (3). Recently a significant homology to this region has been identified in the vaccinia virus genome (5).
28. Amino terminal end of putative reverse transcriptase coding region
29. Carboxy terminal end of reverse transcriptase
30. 5' end of so called 'tether'
31. 3' end of 'tether'
32. Amino terminal end of integrase.
33. Carboxy terminal end of RNaseH
34. Amino terminal end of putative integrase
35. Deletion in micropia-Dm11
36. Deletion in micropia-DhMiF2A
37. 3' end of open reading frame and 5' end of 5' non-protein coding region
38. 3' conserved tandem repeats, 5 units
39. Putative 5' (+ strand) primer binding site
40. 5' inverted repeat of 3' LTR
41. First tandem repeat unit and second truncated unit of LTR, potential ecdysteroid receptor binding sites are indicated by gray shading; micropia-Dm2 abruptly ends at this site
42. 5' flanking sequence of micropia
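Some of the features listed above correspond to short, well-known sequence motifs, and a reader working from the EMBL entry may want to locate candidates independently. The following is a minimal sketch, not the authors' annotation method. The motif table is an assumption for illustration: GAATTC is the EcoRI recognition sequence and AATAAA the canonical polyadenylation hexamer, but the note does not state which exact patterns were used for its annotations. The function name `find_motifs` and the example string are hypothetical; only the first ten bases of the example follow the start of the printed sequence.

```python
import re

# Illustrative motif table (an assumption; the note does not define its patterns).
MOTIFS = {
    "EcoRI site": "GAATTC",       # EcoRI recognition sequence
    "poly(A) signal": "AATAAA",   # canonical polyadenylation hexamer
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [1-based start positions]} on the given strand.

    Everything that is not a letter (spaces between 10-base blocks,
    coordinate numbers, line breaks) is discarded before searching.
    """
    seq = re.sub(r"[^A-Za-z]", "", sequence).upper()
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical example: the first block mirrors the start of the printed
# sequence; the remaining blocks are invented to exercise the second motif.
example = "GAATTCATGG AATTGAGAAG TTTAATAAAC"
print(find_motifs(example))
```

The search covers only the strand as written; scanning the reverse complement, or tolerating ambiguity codes, would need to be added for real annotation work.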
REFERENCES
1. Emori,Y., Shiba,T., Kanaya,S., Inouye,S., Yuki,S. and Saigo,K. (1985) Nature 315, 773-776.
2. Huijser,P., Kirchhoff,C., Lankenau,D.-H. and Hennig,W. (1988) J. Mol. Biol. 203, 689-698.
3. Lankenau,D.-H., Huijser,P., Jansen,E., Miedema,K. and Hennig,W. (1988) J. Mol. Biol. 204, 233-246.
4. Lankenau,D.-H., Huijser,P., Jansen,E., Miedema,K. and Hennig,W. (1990) Chromosoma 99, 111-117.
5. Slabaugh,M.B. and Roseman,N.A. (1989) Proc. Natl. Acad. Sci. USA 86, 4152-4155.
RETROTRANSPOSON micropia-Dm2
2?,1_GAATTCATGG 3 AATTGAGAAG
CGCTGCTTGG TTAGAAAGGG AAAACTATAT AATGAAAAGG GAATGCCAAA AACTGGAGTG AGGdGCATTAA TCGTGGAGAA
AGACAAAGCA_GGCTGCACGA
60 120
4 GCCAAAGCAG ACGCAAGTGG ACTCGTTGAC TGCGCACAGC TGCATAAAAT TATATAGTA6 180 5,6 AAAGAGATTT. GAGCGACGCT GAATA_TGq__C q ~ q~!(If_:iGA(i0CCC CTGATATTCT 2 40 fl ACACCCTGCA TTTTCTGAGG ATCAGTGGTG 300
T-AACCCGACATCFAGAAGTGG_GATCTGTGCC GAAATGCAAA ATCGGAATTT
'g§TGCAGTAGTG+ iM GCCGGCACGT
GGCTGAACTT GTAAAAATCA TGCAAGTGAC
360
GAGCAATGTT .GGAATATACT ATTCAACCTA CAAAAATAAC GTTAAACAAC 4 20
-ACTACTTTAA TTTGaTATAA TGGCCRRRRR
RRRRRRRRRR RRRRRRRRRR RRRRRRRRRR
4 80
internal sequence of copia 12 RRRRRRRRRR RRGATCAGTT AACACTTCCC AGAATGCACA CCACCCACAT TTGATAGTTA 540
1314
CTAATGkATA TTATTGTTAT GTTTTTAATT ATAGACGTTA TTTTTGA GG GGCGTGTTGG 600 AATATACTAT TGAAGGTACA AAAATAACGT TAAACAACAC TACTTTATAT TTGATATG AA 660
15 TGGCCACACC TTTTATGCCA TAAAACATAT TGTAAGAGAA TACCACTCTT_TTTATTCCTT 720 15,16 CTTTCCTTCT TGTACGTTTT TTGCTGTGAG TAGGTCGTGG TGCTGGTGTT GCAGTTGAAA 780 j Pf ff g S" XRXUYkdXt"AAACTCAAA CATAAACTTG ACTATTTATT TATTTATTAA 840 T T ATACAAAGCAACCTAG TTAT~GATGTT AAGCTACC6CA 900 1920 ~ ~ -T ATTAAGGACC TOTC ATCTGG 960
*.1.q.G~
TTCTAACTGA AGGGAACTGC TCCAGGAATT ATTTACTCAA
TGACGACGCT TTCTTGCGCA TGCGTACCAG ATGCTCGAGA
21 221
GCACCCCCTT ATCTCAGTGG ATTTCTGCAG CAGCCGCCCG GACTACAAAG TATGGCAAAC AAGTAAGCTA TGACAACCTT
-9qCkC-TT-CTG--T-TC&AAGCCG
aGCAAAGTG GTGTCCAACA AAAGGAAGTA AATTGATCAT GGCACTAAGT CTAACACAAA TCTCGTACCA GGGTATGACT CGCTTTGAAA CCGAAGAGAC GCCGGCCGCT ACTGCCGCCG AATGTTACGC Gr.TGTATGCG TGGCGGAATA TGGAAATAGA AGAAATTGCC ATTGACAGTC GTTTGCAGCG CGTCCTCTTC CAGGCGGAGT TAAAAGCGTT TACGTTCGAC GGACCTGACC ACAAGAACCG TAAGGCATCG GGACATCGAA
TTGCjTGAATG __CCGAAGTAAA
ATAGACAGGC GAAACCGCAG CGTGAAAAAT CAAATGTTAC GTGCTATCGG GqGGACATTT CTCAACCMIG_TGCCCGAAAA ACGGAAGTGC AGCCAAACAA
-AACAGAAGAC TGTTAACCAA TGTTGTGTGA CTGAGCCAAA GGGAAGCTTG GTGAGATCTA TTAGCAGTAA GTGGCAGTGT TGGAAATATT 23 GAGAAATACT
24 25 ALATCAAAAAC
TCCAATTTGT GTTATCTGGT GTGCAGTACA ATTTCATGTA TAAACAAGAC
TTCGATTCCG AAACGTATAA TTGCAAATCT GTCCCGAACG TTTTATGTAA
TGTTAATAAT_TGTTCTTTTA
2,iCTGCATTGTA TTGATTCATC --CGTTACTGAG CGATCGTTTA AGCTCAATTA ATTGAGTTAC 28 TACTCGAGTA AATACAGGCG GCGCCGACCT TACAGACTTA ATTGATAAGA TGTAATATTG CGTCAAAAAG AAGAACGGAA CACGATTTCG GATAAATACC AGCAAATTAT TTCACATGCC
GCTTCTAAT
GTGCAGAGTG ACAATACTGT TGAGTGAAGT AGCAAATGAG TTTTGACATC TATGTTAATT AAATAAATAT TATTGACACC CTCGACTTCA CCGTTTGATT GAGAGAAGTA TTGCTCTCCC
CTTTGTCCGA TTGAAAAGCA AAATGAAAAT GCCCCGAAGA TTCGCCCAAG CCGACCGTCT ATGTGTTGAT CCTTGCCGCT TATCAGCGAT TGGATATGGC AAGTGGTTTC
TGAATCCGTG GAATATACTG CATTTGTGCC CCAGCGCACA GTCATAAATG CACTTGGTGA GGACGACATA ATGGTAGTAT CGCCAACCAA TTTGAATGTT CTTACAAAGG CTGGTTTTAC AACAACGGTT CAGTATTTAG GCTATGAAGT AAAGATATCT TCATTAAGCT CCTTGCCTCC GCCTCTTACT TTCGCAAATT ATTGTATTCA CTTTCGTCTG GTAGCGGCAA 31 CAGACTTAAA GTTGTGACGA TCCTCACAAA CTGATGCAAG GAGTTGCACA ATATCCTATT 32 CCGTAFTAGAA AGTAAGCCCC ATGTAATCGA ATCTAGATAT CACTCCTACG AGCTGGAAAC TCGCCATTAC CTAATTGGCC GTGAGTTCGT TTCTCGCACA AAAATAGATT TAACCCCCAG GTTTAATTTC GAAATTCAGT ATAGAGAGGG AAGAAATCCT TTATCACCCG AACACAATTT AAATCTGTCT GAAATTTCAA GTACTTGGCT AATAGAAATT GTTAACAAAT TGGAGTCAGA TGATTTGCGA AAAGGTGTAT TATATCGCAA ACCAGTTGTA CCCAGAGCTT TCAAATGGTC GCATTTAGGG TGGCAAAAGA CACTTGATAA 33134GAACAAGTAT GTTCGAAAAT TTTTTTCAAA TTCCGGGAAG GTTCAGGCGG AACTTCATTC CATCCACATA GATATAACGG GGAAATTAAG 36,35 TGTTCAGATC GATGCCTATA CAAAGTTTGT CGAAAGCTGT GTTAATGCTA TGAAATCTTC (ACATCCCAAC CAGGGCAGAT GTTTTACTAG GAAAGTTGAA CTTCACTTGA TTGCTACGGG GGTGATGGAA ACACTGAAAA ATTTGTTGTC GGACGCACTT GGCGAAGTCC AACTTGCACT AAGTCCGTTA GAAATGTTAA TTGGTAAACA TGAGACCGAA TGTGAAATAG ATATGGCAAC TTCTTAGCGT CTTACGACAA ATCCCGATTT CACGTAGGTG ACTATGTGCT ATTGAGGAAT AAATTCAGAG GACCGTTTTT GGTAACTGAA TCGTTGACGA GTAACCGATC GTTCA-AGTAT 31 GCAGAAATCC CGAATGAGTT AAACGAGAAT TGAATGAAAA GAAAAGCCCG CCAATGAGTT
29.,30CATTGGCTTG
ACCGATATAA AACTGCATGG
ACGTTTTTAA AGTCGGCTGG GTTACAACCG ACATCCAATG AAGAAGCGAC CCAGTTGTAT ATGCGACAAG TGCGGCCAAC GATGTGACTC CATCAACGAG AAAGACGACA GGCATTGGTG GAAAATATTA CTGATAGGGC
CTCCCTTATT AATGATAAAA CACTATAAAC GAATGATATT CGATAATTTT AAAGTTGTAA GCGCTGTTAT GTTACTGTTA ATAAAAAAAA AAAATTGTTC GTATCACTAG TTTACCAACG GGATACCTCA GATCCAACTA AAACTGTTCA GTGCGAATCG AGGTGAGCGA TTTGCTAGCC CCATGTTGCT TTTAGAGAGC TAAACTCGAA CAAATTGCTA GACTTCGCGG CACCAAATCC CGATTCACCC AAAAATGCGC CATCTGTTTT TCTTTTGTAA TCGTTTACAT TTGGAAAGGT TAAAAACTGT GCTAAATGCA GTTTTCTCAA GAAATTCGTC CGAATGTGCG
AGACGGCCTC CCTTGCTAAC GGAATTGGCT CTTTAACCTT GCGAGCGGGA TCCTCAAACT CTCTCCGGCG TTAGACAATT CGTGTCTGGA TTCTCCCAAC TTATGAAACC GATTACATGG AGCGCTGAGC TGGAAGAGAT TGAGCCTGCT CTGGTAATCT TCGACCCGCA TGCCTGTGGA TATGGAGCGA TACTTTTGCA ATACTTCAGC AAAACAACTA CCTCTGTTGA CTTGGCAGTG GTAAAAGCCG TTAAACATTT TGTCTATACA GACTGCAATT CATTAAAAGC AGTTCACCGC TGGTGGGCCT ACTTACAATC TAAGCGTATG GCTCATGTGG ATTTCCTATC GTCAATAAAC AAGATTCCCG AAAAAAGAGT TCTTGCTGAG CAACGGTTAG ACCTTGAGAT TGAATTGGCC GAAAACTTGG CCAAAACGTA GGTCCAAAGA CGAGGTAGAA CAAGTTATTT AGTAATTAAC CAGGTACACG AGTCGATAAT AGTGTACCAG TATTATTGGT TCGCTAAAAT CTGCATAACT TGTAGATCAG_TOAATCATC CATTCCGAAG ACAAGTATAC CGTGGCACAC TGGCAAGAGC GATTTGAAGG AATATGTCAT TTATCTGTAT.CACACCTTAA AGATAGATGC CATATCCTTA TTTGGAGTAC CFAGATCGCAT CTCTAAGTTT TCAGAGTTTT GCGTATCGCA AATGAGCCGT GCAAATGGGC AAGTGGAACG AGTGGTAGAA TCAAGTCAAC GATCGTGGCA GAATTGTACA ATTTCTCGTG CCACTGAGGC GGCTTGACCC CTTGGATTAG TTCCCCCATG TGTTAGAGCT CATGCGACAG AAAATATGAA GATAGCAGTA GGGCAGCCGT TGACAAACAC GAAGAAAGAC ACCAAACTAA GTTAGATCCG GTATTAGAGG GTGACAGGTA TACACTAAAG TGCCATGAAG ATATACGTAA AATGCCGGAT GTAGAGCAAT AGCTGAAATA TAGAAACAGT
CTTTTG~TGA?A CGAG MTT
1020
TGGCAAGAGT 1080
38' AGGTGAGACG AGATTATTTGTCGTACGATATGGGTTAT qp 4980 TCCAGGAGAG 'ACGATGAGTT TGGATTGAAT TAATAATCAA GTGTGTGTGA ACTGGCGGAA 5040 GATCGATATAj TAGAAATCGA TAAATGATAA TGTTAAGATA AGTTGTGAGC TGATGTATTA 5100 CTGATCAATG GCTGTTTTGT TGAAATTAAA GTATCAAAAG
GAACTGAATA TTCTTCACAG TGTTATTGAA CTTACACGAG
TGAAAATAGA AATTAAGATT TAGTGATGAA GACGTGAAAT
ATAAGTTATC TAAGAAATAC AGTAGGTGAT GTCAGALATGG
CCAGCAACAG ACCTAATAAA CTTGATATCT CCGTGTCGTG
TGAAATAAGA GTCAAACTAA TGGTATCTCG GCGAAAATAA
_39,40 41TGAGTATGCIG TGTAGCGCT GLTTTACTTCT TCTCCAINECTTTGCTA TTATGCGU 7A 'IL 7TATI GAACACGTGG CAAAAAThPC TTGAGGTGTG AAATGAGAAT GATAATGAAT TCTTTATTAC AAATATGFAT AATAAATGTA TCTCCCAAAT ATTCTAGAGTTCGTAT 42 TGGCGATCTT CGTTGAACTT TCGCTCGGCT CTCAGTTGCA GTACTGGGTA GACCGAGAGA
GOGGGGTATGT
CTCAATTCCC ACGATTGTCA GACTGGTTTA CTGTAGTTAG
TAACTCGCGC GGTAGTTAGT TCGGCCACAA CAGTAATGAT GTTGTGGTCA ACTCGTCCAC TCTCCACTCA TATGTGGCCC ACATAGATGC TGCCTCATTG CGTCGATATC ACTAGTAGCT CTGTGATTAT CGTTCGTGTA CACCTCTGTT 5760 GGGCGGCGAC TCTTGCCTCC AGCATCCACT CTGACAGCAG CTCACCCCGT 5820
CTTGCCCCTC GGCATGACGA GAGAGTTTGC TAAGCCACAT GGGGGACACT 5880 GCATTCGCGT CGAGGCCCAG GATTGCGGGG GTTCTGCTGG CCTGCAGCAA GACCGCATCC 5940 ATGTACCTAT ACCAGAGTGC ATCGAACTGC AGTATCGG 6000