Update of AMmtDB: a database of multi-aligned Metazoa mitochondrial DNA sequences


Nucleic Acids Research, 2002, Vol. 30, No. 1, 174–175

© 2002 Oxford University Press

Cecilia Lanave*, Flavio Licciulli¹, Mariateresa De Robertis², Alessandra Marolla¹ and Marcella Attimonelli¹

Centro di Studio sui Mitocondri e Metabolismo Energetico CNR, Via Amendola 165/A, 70126 Bari, Italy
¹Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Via E. Orabona 4, 70126 Bari, Italy
²Dipartimento di Genetica e Anatomia Patologica, Università di Bari, Via E. Orabona 4, 70126 Bari, Italy

Received September 20, 2001; Accepted September 21, 2001

ABSTRACT

The AMmtDB database (http://bighost.area.ba.cnr.it/mitochondriome) has been updated by collecting the multi-aligned sequences of Chordata and Invertebrata mitochondrial genes coding for proteins and tRNAs. Links to the multi-aligned mtDNA intraspecies variants, collected in VarMmtDB at the Mitochondriome web site, have been introduced. The genes coding for proteins are multi-aligned on the basis of the translated sequences, and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for handling multi-aligned data, depending on the user's operating system. The multiple alignments have been produced with the CLUSTALW and PILEUP programs and then carefully optimized manually.

INTRODUCTION

Mitochondrial genomes are frequently used in molecular evolution, including molecular systematics and phylogeny studies. In Metazoa, the mitochondrial (mt) genome (1) is circular; it is ∼15–17 kb long and has a very compact gene organization, i.e. no space between genes, in some cases short overlaps between genes, and only one major non-coding region, which in general contains the main regulatory elements. Because of its reduced size, the metazoan mt genome can be completely sequenced rather easily, making comparative studies possible not only at the gene level but also at the genomic level. An important prerequisite for these studies is the best possible multiple alignment of the sequences under comparison. Given the large number of complete Metazoa mt genes now available, a database of carefully produced multi-alignments of the mtDNA genes, coupled with a system that lets end-users extract and manage the selected data according to their needs, may be extremely useful. Here we present an update of the AMmtDB database (2).
The Chordata data of the previous release have been enriched with Invertebrata data, so that the multiple alignments now cover all the Metazoa mt genes coding for proteins and all the Chordata mt genes coding for tRNAs. The section of multi-aligned D-loop sequences (3) from mammalian species is at present no longer updated (2). The intra-species multi-alignments of variants of complete mtDNA genes coding for proteins and tRNAs, derived from the VarMmtDB collection (http://bighost.area.ba.cnr.it/mitochondriome), have been added in the present release.

AMmtDB DATABASE

Data source

Sequence data are mainly retrieved from the primary databases [EMBL (4) and GenBank (5)], generally using Entrez (5) and SRS (6). The literature is a further source for published sequence data not included in the primary databases. Two resources are also consulted during updating: the site http://megasun.bch.umontreal.ca/ogmp/projects/projects.html and the list of complete organelle genomes available through the Genome section of Entrez. The database was updated in August 2001.

Data organization

The database is organized into three main sections: CDS, tRNA and D-loop sequences. Sequences coding for partial genes are not included. The genes coding for proteins are multi-aligned on the translated sequences, and both the nucleotide and amino acid multi-alignments are provided. The multi-alignments are produced with CLUSTALW (7) and PILEUP (8) and then carefully optimized manually. The resulting multi-alignments are stored in MSF format. For genes coding for tRNAs, the multi-alignments based on the primary structure have been updated in the present release. Each entry in the database corresponds to a class- and gene-specific multi-alignment and is named by combining the class and gene names.

Because mt genomes are variable and are widely used in diversity studies, primary database records reporting sequences of the same gene from the same species have been analyzed and grouped into gene- and species-specific clusters, each identified by a code.
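Aligning protein-coding genes on their translated sequences while also providing the nucleotide alignment implies that each gap in the amino acid alignment corresponds to a three-base gap in the nucleotide alignment. The following Python sketch illustrates that general idea; it is not the authors' pipeline. The function name `thread_codons` and the toy sequences are hypothetical, and the code assumes each coding sequence is ungapped, in frame, and has had any terminal stop codon removed.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Lay an ungapped coding sequence onto one gapped protein alignment row.

    Each residue consumes the next codon of `cds`; each gap column becomes
    a three-character gap, so codon boundaries stay aligned across rows.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be exactly 3 x number of residues")

    pieces = []
    pos = 0  # index of the next unused base in the CDS
    for aa in aligned_protein:
        if aa == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)


# Toy data invented for illustration only (not taken from AMmtDB).
protein_rows = {"seq_a": "MT-L", "seq_b": "MTPL"}
coding_seqs = {"seq_a": "ATGACCCTA", "seq_b": "ATGACACCTCTA"}

nucleotide_rows = {
    name: thread_codons(protein_rows[name], coding_seqs[name])
    for name in protein_rows
}

for name, row in nucleotide_rows.items():
    print(f"{name}  {row}")
```

Because gaps are inserted only in whole-codon units, every row of the resulting nucleotide alignment is exactly three times as long as its protein row.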
Each cluster is stored in a multi-aligned file, and a link to this file is available through the AMmtDB entries. Further information about the variant clusters is available at the Mitochondriome web site in the section Databases/VarMmtDB.

The AMmtDB flat-file format (FF) has been defined and is available at http://bighost.area.ba.cnr.it/BIG/Tutorials/AMmtDB/AMmtDBff.html. Each entry is cross-referenced to the primary databases (DR lines in the FF), to the multi-alignment files (ML lines in the FF) and to the VarMmtDB compilation multi-alignments (MV lines in the FF).

Data content

AMmtDB contains at present 365 entries related to 1046 different species (updated August 2001) and 27 mammalian D-loop sequences (updated August 1999). These data include genes from 114 Chordata and 46 Invertebrata complete mitochondrial genomes. The taxonomic classes for the presently available data are, in alphabetical order: Agnatha, Amphibia, Arthropoda, Aves, Brachiopoda, Chondrichthyes, Echinodermata, Mammalia, Mollusca, Nematoda, Osteichthyes, Platyhelminthes, Reptilia and Urochordata.

AMmtDB availability

The AMmtDB database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome, which also hosts other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, genome maps, links to other mitochondrial sites and related information. Each entry selected through SRS reports the names of its multi-alignment files, which can be viewed and managed using GeneDoc or other programs for handling multi-aligned data, depending on the user's operating system. The multi-alignment files, in both MSF (filename.msf) and FASTA (filename.aln) format, can be downloaded via ftp from bighost.area.ba.cnr.it/pub/Embnet/Database/AMmtDB.

*To whom correspondence should be addressed. Tel: +39 80 548 2180; Fax: +39 80 548 4467; Email: [email protected]
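For readers without an alignment viewer, the FASTA-format files (filename.aln) can be read with a few lines of Python. The sketch below assumes plain FASTA, with a `>` header line followed by one or more sequence lines per record; the function name `read_fasta_alignment` and the toy records are hypothetical and are not taken from AMmtDB. It also checks that all rows have the same length, which any multi-alignment must satisfy.

```python
from typing import Dict, Iterable, List


def read_fasta_alignment(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {name: aligned_sequence}.

    Raises ValueError if the input is malformed or if the rows do not
    all have the same length (i.e. the file is not an alignment).
    """
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a sequence name")
            name = fields[0]  # first word of the header is used as the name
            if name in chunks:
                raise ValueError(f"duplicate sequence name: {name}")
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks[name].append(line)

    alignment = {key: "".join(parts) for key, parts in chunks.items()}
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("rows differ in length; not a valid alignment")
    return alignment


# Toy input invented for illustration only.
example_lines = """>seq_a toy record
ATGACC---
CTA
>seq_b toy record
ATGACACCT
CTA
""".splitlines()

alignment = read_fasta_alignment(example_lines)
for name, row in alignment.items():
    print(f"{name}  {row}")
```

For a downloaded file, the same function can be given an open file handle, for example `with open("filename.aln") as handle: read_fasta_alignment(handle)`, where `filename.aln` stands for whichever alignment file was retrieved.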


Users of this database are kindly requested to cite the present article.

ACKNOWLEDGEMENTS

This work has been supported by 'Ministero Università e Ricerca Scientifica', Italy (PRIN99, Programma Biotecnologie legge 95/95-MURST 5%; Progetto MURST Cluster C03/2000, CEGBA).

REFERENCES

1. Saccone,C. (1994) The evolution of mtDNA. Curr. Opin. Genet. Dev., 4, 875–881.
2. Lanave,C., Licciulli,F., Liuni,S. and Attimonelli,M. (2000) Update of AMmtDB: a database of multi-aligned Metazoa mitochondrial DNA sequences. Nucleic Acids Res., 28, 153–154.
3. Sbisà,E., Tanzariello,F., Reyes,A., Pesole,G. and Saccone,C. (1997) Mammalian mitochondrial D-loop region structural analysis: identification of new conserved sequences and their functional and evolutionary implications. Gene, 205, 125–140.
4. Stoesser,G., Baker,W., van den Broek,A., Camon,E., Garcia-Pastor,M., Kanz,C., Kulikova,T., Lombard,V., Lopez,R., Parkinson,H. et al. (2001) The EMBL Nucleotide Sequence Database. Nucleic Acids Res., 29, 17–21. Updated article in this issue: Nucleic Acids Res. (2002), 30, 21–26.
5. Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11–16. Updated article in this issue: Nucleic Acids Res. (2002), 30, 13–16.
6. Etzold,T., Ulyanov,A. and Argos,P. (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol., 266, 114–128.
7. Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) CLUSTAL W: improved software for multiple sequence alignment. Comput. Appl. Biosci., 8, 189–191.
8. Devereux,J., Haeberli,P. and Smithies,O. (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res., 12, 387–395.
