Micro-Mar: a database for dynamic representation of marine microbial biodiversity
© Pushker et al; licensee BioMed Central Ltd. 2005
Received: 10 May 2005
Accepted: 09 September 2005
Published: 09 September 2005
The cataloging of marine prokaryotic DNA sequences is fundamental to bioprospecting and to the development of evolutionary and speciation models. However, the large number of DNA sequences used to quantify prokaryotic biodiversity requires proper tools for storing, managing and analyzing these data for research purposes.
The Micro-Mar database has been created to collect DNA diversity information from marine prokaryotes for biogeographical and ecological analyses. The database currently includes 11874 sequences corresponding to high-resolution taxonomic genes (16S rRNA, ITS and 23S rRNA) and many other genes, including CDS, of marine prokaryotes, together with the available biogeographical and ecological information.
The database aims to integrate molecular data and taxonomic affiliation with biogeographical and ecological features, allowing a dynamic representation of marine microbial diversity embedded in a user-friendly web interface. It is available online at http://egg.umh.es/micromar/.
The global oceanic ecosystem is highly dependent on the activity of its large population of prokaryotes. However, their small size, relatively dilute environment and reluctance to grow in pure culture make marine prokaryotes one of the least known groups of microbes. During the last decade, PCR-based approaches have produced an enormous amount of information in this field, mostly based on the 16S rRNA genes due to their accepted taxonomic relevance. Other genes such as rpoB and recA have started to be used on the grounds of their evolutionary stability. In recent years the sequencing of large-insert libraries such as BACs or fosmids has produced an important additional source of information. Last year, Whole Genome Shotgun (WGS) sequencing applied to the Sargasso Sea produced a sequence database of 1.045 billion base pairs. Still, many databases deal only with 16S or ITS sequences. For example, the Ribosomal Database Project holds 101632 entries corresponding to 16S sequences and RISSC has more than 1600 entries corresponding to 16S-23S ribosomal spacer sequences. While the submission of genomic sequences is continuously growing, the analytical power applied to this enormous amount of information has not clearly improved in step.
Micro-Mar is a novel database storing publicly available marine prokaryote sequences along with their biogeographical information (sampling site, latitude, longitude etc.) and ecological information (depth, temperature, salinity etc.). Each entry represents an individual marine prokaryote with one or more DNA sequences coming from a particular sampling location and depth. The database aims to provide not only a collection of marine prokaryote data, but also a research tool for relating microbial biodiversity to its environment, opening possibilities for studying adaptations at the level of the microbial community, designing water management strategies, detecting pollution or predicting marine productivity.
Construction and content
Ad hoc queries were used to retrieve marine prokaryote sequences from the NCBI, and manual sequence searches were also carried out. All the sequences obtained were downloaded in GenBank format. Some details, such as geographic origin, depth and temperature, were obtained manually when not available in the records, by searching within the publications or by direct contact with the authors. The type of entry indicates whether a sequence comes from an offshore, inshore or sediment sample and whether it is a PCR product, a cloned DNA product or an isolated strain. A BLAST search against Micro-Mar was performed in order to find the closest marine prokaryote sequence and also the closest taxonomic unit (generally a pure culture). The top fifty BLAST hits are also available on the webpage to give a fuller picture of the similarity profile of a particular sequence. The top fifty BLAST hits against the whole NCBI nucleotide sequence database are also reported. The complete dataset was loaded into MySQL relational tables. Micro-Mar runs on LAMP, the open source web platform. The Geographic Information System (GIS) uses the JpGraph library to display the different sampling locations on a world map. All the web pages follow the HTML 4.01 standard and use CSS for consistent styling.
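The relational layout described above, in which one entry carries its sampling metadata and one or more sequences, might be sketched as follows. This is only an illustration under stated assumptions: the table names (`entry`, `dna_sequence`), the column names and the demo rows are invented for this example, and SQLite from the Python standard library stands in for MySQL so that the sketch is self-contained.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual Micro-Mar MySQL layout.
SCHEMA = """
CREATE TABLE entry (
    entry_id      INTEGER PRIMARY KEY,
    organism      TEXT NOT NULL,
    entry_type    TEXT,   -- e.g. offshore/inshore/sediment, PCR/clone/isolate
    sampling_site TEXT,
    latitude      REAL,
    longitude     REAL,
    depth_m       REAL,
    temperature_c REAL,
    salinity      REAL
);
CREATE TABLE dna_sequence (
    sequence_id INTEGER PRIMARY KEY,
    entry_id    INTEGER NOT NULL REFERENCES entry(entry_id),
    gene        TEXT NOT NULL,   -- e.g. '16S rRNA', 'ITS', '23S rRNA', 'CDS'
    accession   TEXT,
    residues    TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one invented entry and two sequences."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO entry (organism, entry_type, sampling_site, latitude,"
        " longitude, depth_m, temperature_c, salinity)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("uncultured marine bacterium", "offshore, PCR product",
         "demo site", 36.0, -5.0, 50.0, 15.0, 37.0),  # placeholder values
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO dna_sequence (entry_id, gene, accession, residues)"
        " VALUES (?, ?, ?, ?)",
        [(entry_id, "16S rRNA", "DEMO0001", "ACGTACGT"),
         (entry_id, "ITS", "DEMO0002", "TTGACCA")],
    )
    conn.commit()
    return conn


def genes_for_entry(conn, entry_id):
    """Return the gene labels of all sequences attached to one entry."""
    rows = conn.execute(
        "SELECT gene FROM dna_sequence WHERE entry_id = ? ORDER BY gene",
        (entry_id,),
    )
    return [row[0] for row in rows]
```

Keeping the environmental parameters on the entry and the sequences in a separate table reflects the one-to-many relationship stated in the text; the real database may organize these fields differently.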
There are five major options available in the Micro-Mar database: (i) search, (ii) GIS, (iii) local BLAST, (iv) MMSeqUp and (v) forum.
The search option provides an interface for a large number of queries to the database. It can be used to search the database for 5S rRNA, 16S rRNA, ITS, 23S rRNA or CDS sequences in combination with various biogeographical and ecological parameters. The results can be shown either in tabular format or on a world map showing the different sampling sites through GIS. Each entry is linked back to other databases such as NCBI for more information. A number of entries can be selected for further analysis together with user-submitted sequences: they are aligned using CLUSTALW and a tree is created using PHYLIP to show the phylogenetic position of the submitted sequences relative to the Micro-Mar sequences. Alignment files (PHYLIP format) and tree files (Newick tree format, Postscript and PDF) are also available for download.
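A search that combines a gene type with optional ecological ranges could be expressed as a parameterized SQL query. The sketch below is hypothetical: the function `build_search_query`, the tables `entry` and `dna_sequence` and their columns are assumptions made for illustration, not the queries used by the Micro-Mar web interface.

```python
def build_search_query(gene=None, min_depth=None, max_depth=None,
                       min_temp=None, max_temp=None):
    """Build a parameterized query from whichever filters the user supplied.

    Table and column names are hypothetical. The '?' placeholder follows the
    sqlite3 convention; other database drivers may use a different marker.
    """
    # Each filter is a (SQL condition, value) pair; None means "not requested".
    filters = [
        ("s.gene = ?", gene),
        ("e.depth_m >= ?", min_depth),
        ("e.depth_m <= ?", max_depth),
        ("e.temperature_c >= ?", min_temp),
        ("e.temperature_c <= ?", max_temp),
    ]
    clauses = [condition for condition, value in filters if value is not None]
    params = [value for _, value in filters if value is not None]

    sql = ("SELECT e.entry_id, e.organism, s.gene, s.accession "
           "FROM entry AS e JOIN dna_sequence AS s ON s.entry_id = e.entry_id")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Passing the values separately from the SQL text, rather than formatting them into the string, lets the database driver handle quoting of user input.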
The GIS option provides an interface for selecting a particular sampling location on the world map and getting all the sequences from that location and their details. Furthermore, for a selected region on the map, the following information can be obtained: (i) taxonomy report: taxonomic details at different levels (domain, phylum, class, order, family and genus); (ii) depth report: a plot showing the number of sequences vs depth; (iii) biodiversity report: a list of organisms found; (iv) get all entries; and (v) advanced search. The reports allow retrieval of the sequences corresponding to a particular taxonomy, depth or biodiversity.
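The region-based reports can be thought of as a filter on coordinates followed by a tally. The following sketch assumes a rectangular region given as (south, north, west, east) in decimal degrees that does not cross the 180° meridian, and records held as plain dictionaries; the field names (`lat`, `lon`, `depth_m`, `phylum`) and function names are invented for this illustration and do not come from Micro-Mar itself.

```python
from collections import Counter


def in_region(record, south, north, west, east):
    """True if the record's sampling point lies inside the bounding box."""
    return (south <= record["lat"] <= north
            and west <= record["lon"] <= east)


def depth_report(records, region, bin_size=100):
    """Count sequences per depth bin (keyed by the bin's lower bound in metres)."""
    counts = Counter()
    for record in records:
        if in_region(record, *region):
            lower_bound = int(record["depth_m"] // bin_size) * bin_size
            counts[lower_bound] += 1
    return dict(sorted(counts.items()))


def taxonomy_report(records, region, rank="phylum"):
    """Count sequences per taxon at the requested rank within the region."""
    return dict(Counter(record[rank] for record in records
                        if in_region(record, *region)))
```

The depth counts returned here are the numbers one would plot as sequences vs depth; the taxonomy tally can be repeated for any rank stored with the records.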
The local BLAST option can be used to run a BLAST search against the Micro-Mar database to find the sequences most similar to the submitted sequences. The results can be shown either in the default BLAST format or in a tabular format. All the hits can be selected and displayed on the world map using GIS, and all the GIS features discussed above can be used. Selected sequences can be downloaded in FASTA format and also analyzed further by aligning them and creating a tree together with the submitted sequences. Alignment and tree files can also be downloaded in different formats, as described for the search option.
MMSeqUp facilitates online sequence submission to the Micro-Mar database. It allows users to upload a file containing new marine prokaryote sequences. Related biogeographical and ecological parameters can also be submitted on the webpage.
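For readers unfamiliar with the FASTA format mentioned above, a minimal writer is sketched here: each record is a header line starting with `>` followed by the sequence wrapped to a fixed width. The function name `to_fasta`, the tuple layout and the 60-character line width are choices made for this example, not details of the Micro-Mar download feature.

```python
def to_fasta(records, width=60):
    """Format (identifier, description, sequence) tuples as FASTA text.

    The header is '>identifier description'; the sequence is wrapped so that
    no line is longer than `width` characters.
    """
    lines = []
    for identifier, description, sequence in records:
        lines.append(f">{identifier} {description}".rstrip())
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)
```

For instance, an 80-residue sequence written with the default width yields one header line followed by a 60-character line and a 20-character line.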
An online forum powered by PHPBB has been created with the following sections: (i) FAQs: a compilation of frequently asked questions; (ii) suggestions; (iii) discussion: open discussion; (iv) feedback: comments on the Micro-Mar web interface; and (v) what's new: recent developments in the database.
The creation of the Micro-Mar database is an initiative towards cataloging all the information related to marine prokaryotes collected during the last two decades and providing an interface that will help the scientific community to carry out comparative analyses of marine prokaryote sequences and make them amenable to biogeographical and ecological analyses. The database is updated every week to include the most recent marine prokaryotic sequences. In the near future, more samples from extreme environments will be integrated into the database to improve its analytical power and biodiversity range. As more and more entries are incorporated, it will be possible to correlate bacterial biodiversity accurately with biogeographical and ecological parameters, giving a global overview of the various aspects of biodiversity within the oceans. To achieve this, scientists are encouraged to include more information about biogeographical and ecological parameters when submitting their sequences to public databases.
Availability and requirements
List of abbreviations
GIS – Geographic Information System
LAMP – Linux + Apache + MySQL + PERL/PHP/Python
This work was funded by MIRACLE (EVK3-2002-00087), GEMINI (QLK3-CT-2002-02056) projects of the European Commission and "Mineria Genómica" project of Generalitat Valenciana (GRUPOS03/060). We would like to thank Alex Mira for helping with the manuscript and providing constructive comments. We also thank Boris A. Legault and many other users for giving their feedback on the database.
- Woese C, Stackebrandt E, Macke T, Fox G: A phylogenetic definition of the major eubacterial taxa. Syst Appl Microbiol 1985, 6: 143–151.
- Mollet C, Drancourt M, Raoult D: rpoB sequence analysis as a novel basis for bacterial identification. Mol Microbiol 1997, 26: 1005–1011. 10.1046/j.1365-2958.1997.6382009.x
- Karlin S, Weinstock G, Brendel V: Bacterial classifications derived from recA protein sequence comparisons. J Bacteriol 1995, 177: 6881–6893.
- Venter J, Remington K, Heidelberg J, Halpern A, Rusch D, Eisen J, Wu D, Paulsen I, Nelson K, Nelson W, Fouts D, Levy S, Knap A, Lomas M, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith H: Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 2004, 304: 66–74. 10.1126/science.1093857
- Cole J, Chai B, Farris R, Wang Q, Kulam S, McGarrell D, Garrity G, Tiedje J: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res 2005, 33: D294-D296. 10.1093/nar/gki038
- Garcia-Martinez J, Bescos I, Rodriguez-Sala J, Rodriguez-Valera F: RISSC: a novel database for ribosomal 16S-23S RNA genes spacer regions. Nucleic Acids Res 2001, 29: 178–180. 10.1093/nar/29.1.178
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Thompson J, Higgins D, Gibson T: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673–4680.
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle; 2004.
- Wheeler D, Chappey C, Lash A, Leipe D, Madden T, Schuler G, Tatusova T, Rapp B: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2000, 28: 10–14. 10.1093/nar/28.1.10
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.