MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization
© Zhou et al; licensee BioMed Central Ltd. 2006
Received: 06 June 2006
Accepted: 17 October 2006
Published: 17 October 2006
MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data.
MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO.
MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive reports. MannDB is freely available at http://manndb.llnl.gov/.
MannDB was created to meet a need for rapid, comprehensive sequence analysis with an emphasis on protein processing, surface characteristics, and functional classification to support selection of pathogen or virulence-associated proteins suitable as targets for driving the development of protein-based reagents (e.g., antibodies, non-natural amino-acid ligands, synthetic high-affinity ligands) for pathogen detection [1, 2]. Because comprehensive analyses of this type required a large number of open-source tools, and because the computations had to scale to whole proteomes, we built a fully automated system for executing sequence analysis tools and for storage, integration, and display of protein sequence analysis and annotation data. To rapidly examine and compare whole bacterial and viral proteomes for selection of suitable target proteins for bio-defense applications, we compiled data for whole proteomes from representative organisms from all categories of biological threat agents listed by several governmental agencies (APHIS, CDC, HHS, USDA, USFDA, NIAID, and WHO [3–9]), as well as taxonomic near-neighbor species as appropriate. The scope of MannDB is therefore automated sequence analysis and evidence integration for proteins from all currently recognized bio-threat pathogens. Emphasis is placed on analyses that are most useful in characterizing potential protein targets and surface motifs that could be exploited for development of detection reagents. The content of MannDB is updated on a regular basis.
In recent years several software systems and accompanying databases have been developed for microbial genome annotation, each with a particular emphasis [10–19]. Some databases emphasize gene prediction and DNA-based analyses over protein sequence-based analyses, or provide automated (primary) rather than curated (secondary) annotations. Although microbial annotation databases frequently include predictions of biological, chemical, structural, and physical properties of proteins (e.g., antigenicity, post-translational modifications, hydrophobicity, membrane helices), none currently offers the comprehensive suite of analyses (see MannDB website for complete list of tools) contained within MannDB for characterizing viral as well as bacterial proteins from human and agricultural/veterinary pathogens of interest to the bio-defense community and for rapidly identifying putative virulence-associated proteins for development of functional assays. MannDB was built and linked to MvirDB in order to meet these requirements. In addition, we focus on sequence analyses that assist in selection of protein features (e.g., surface characteristics) most suited for targeting detection reagent development.
Construction and content
Utility and discussion
MannDB provides users with pre-computed sequence analyses for complete proteomes of bacterial and viral pathogens from several governmental agencies' lists of bio-threat agents. The genomes and tools are kept up to date, with predictions re-run every 3 months. The user can browse proteomes or BLAST sequences against MannDB to pull up related entries and associated data. MannDB provides a convenient source of automated sequence analyses and downloaded annotation information for whole proteomes of human pathogenic bacteria and viruses and has a high degree of integration with external databases.
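The idea of storing pre-computed results per protein and retrieving them on demand can be pictured with a minimal relational sketch. Everything named here is an illustrative assumption rather than MannDB's actual schema: the tables `protein` and `prediction`, their columns, the accession `DEMO_0001`, the example sequence, and the result strings are all hypothetical, and SQLite simply stands in for whichever database engine is used.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            organism   TEXT NOT NULL,
            accession  TEXT NOT NULL UNIQUE,
            sequence   TEXT NOT NULL
        );
        CREATE TABLE prediction (
            protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
            tool       TEXT NOT NULL,   -- name of the analysis tool
            result     TEXT NOT NULL,   -- stored output summary
            run_date   TEXT NOT NULL    -- when the tool was last run
        );
        """
    )
    return conn


def add_protein(conn, organism, accession, sequence):
    """Insert one protein record and return its generated id."""
    cur = conn.execute(
        "INSERT INTO protein (organism, accession, sequence) VALUES (?, ?, ?)",
        (organism, accession, sequence),
    )
    return cur.lastrowid


def add_prediction(conn, protein_id, tool, result, run_date):
    """Attach one stored tool result to a protein."""
    conn.execute(
        "INSERT INTO prediction (protein_id, tool, result, run_date) "
        "VALUES (?, ?, ?, ?)",
        (protein_id, tool, result, run_date),
    )


def predictions_for(conn, accession):
    """Return (tool, result) pairs for one accession, ordered by tool name."""
    rows = conn.execute(
        "SELECT p.tool, p.result FROM prediction AS p "
        "JOIN protein AS q ON q.protein_id = p.protein_id "
        "WHERE q.accession = ? ORDER BY p.tool",
        (accession,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    pid = add_protein(db, "Demo organism", "DEMO_0001", "MKTAYIAKQR")
    add_prediction(db, pid, "TMHMM", "2 predicted helices", "2006-01-01")
    add_prediction(db, pid, "SignalP", "signal peptide", "2006-01-01")
    for tool, result in predictions_for(db, "DEMO_0001"):
        print(tool, result)
```

Keeping results in a separate table keyed by protein means a periodic re-run only has to replace rows in `prediction`, leaving the protein records untouched.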
MannDB provides sequence analysis information of primary interest to researchers in the bio-defense community. We have been using MannDB for several years to "annotate" DNA signatures and more recently to assist collaborators in efforts to down-select from whole bacterial and viral genomes to identify suitable protein targets and protein features for driving the development of detection reagents. For example, a common requirement for a detection assay is that it be performed with minimal sample disruption. Therefore, an initial down-selection for proteins expected to be on the surface of a bacterial particle might entail identification of proteins that are predicted to be secreted or membrane bound by using tools such as PSORT [23, 24], TMHMM, SignalP, TargetP, TopPred, and HMMTOP. Having results from several tools that provide similar predictions but use different algorithms or slightly different approaches allows us to compare predictions and make selections with greater confidence. Identification of surface features for targeting of detection reagents is done primarily by means of additional sequence- and structure-based analyses, although predictions pertaining to post-translational modifications (e.g., glycosylation, cleavage) are taken into consideration as they may affect protein recognition.
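Comparing predictions across several tools can be sketched as a simple agreement count. This is only an illustration under stated assumptions, not the procedure used with MannDB: it assumes each tool's raw output has already been reduced to one normalized label per protein (the labels `secreted`, `membrane`, and `cytoplasmic` are hypothetical, and real tool output would need parsing first), and the vote threshold `min_votes` and the protein ids are likewise made up.

```python
# Hypothetical normalized labels that count as evidence for surface exposure.
SURFACE_LABELS = {"secreted", "membrane"}


def surface_votes(predictions):
    """Count the tools whose normalized label suggests surface exposure.

    `predictions` maps a tool name to its normalized label for one protein.
    """
    return sum(1 for label in predictions.values() if label in SURFACE_LABELS)


def down_select(proteins, min_votes=2):
    """Return ids of proteins supported by at least `min_votes` tools.

    `proteins` maps a protein id to its per-tool label dictionary.
    Results are ordered by descending vote count, then by protein id.
    """
    scored = [(surface_votes(preds), pid) for pid, preds in proteins.items()]
    kept = [(votes, pid) for votes, pid in scored if votes >= min_votes]
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in kept]


if __name__ == "__main__":
    # Made-up labels for three made-up proteins.
    demo = {
        "protA": {"PSORT": "secreted", "SignalP": "secreted", "TMHMM": "cytoplasmic"},
        "protB": {"PSORT": "cytoplasmic", "SignalP": "cytoplasmic", "TMHMM": "cytoplasmic"},
        "protC": {"PSORT": "membrane", "SignalP": "secreted", "TMHMM": "membrane"},
    }
    print(down_select(demo))
```

Raising `min_votes` trades a shorter candidate list for higher agreement among tools; a weighted scheme that trusts some predictors more than others would be a natural variation.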
MannDB is a genome-centric database containing comprehensive automated sequence analysis predictions for protein sequences from organisms of interest to the bio-defense research community. Computational tools for the MannDB automated pipeline were selected based on customer needs in providing down-selections from large sets of proteins (e.g., whole proteomes) to short lists of proteins most suitable for developing reagents to be used in field assays for detection of pathogens. For that reason we have focused our efforts on applying tools that would enable selection of proteins that meet assay requirements, such as cellular localization, that would assist in determining the value of a surface feature for targeting ligand binding, or that would identify antigenic sub-sequences of particular value in antibody development. As the goals of some of these assays have been to detect toxins or proteins associated with virulence, we constructed hard links between protein sequences in MannDB and entries in MvirDB in order to conveniently identify and characterize protein targets and features for these applications. We believe that MannDB will be of general use to the bio-defense and medical research communities as a resource for predictive sequence analyses and virulence information.
Availability and requirements
MannDB is freely accessible at http://manndb.llnl.gov/. Although the software that populates and updates MannDB is not open-source, the user may request collaborative sequence analysis services by contacting firstname.lastname@example.org.
List of abbreviations
BLAST: Basic local alignment search tool.
APHIS: Animal and Plant Health Inspection Service.
CDC: Centers for Disease Control and Prevention.
HHS: Health and Human Services.
USDA: United States Department of Agriculture.
USFDA: United States Food and Drug Administration.
NIAID: National Institute of Allergy and Infectious Diseases.
WHO: World Health Organization.
This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract no. W-7405-ENG-48 and was supported by funding from the Department of Homeland Security.
- Slezak T, Kuczmarski T, Ott L, Torres C, Mederos D, Smith J, Truitt B, Mulakken N, Lam M, Vitalis E, Zemla A, Zhou C, Gardner S: Comparative genomics tools applied to bioterrorism defense. Briefings in Bioinformatics 2003, 4: 133–149. 10.1093/bib/4.2.133
- Zhou CEZ, Zemla A, Roe D, Young M, Lam M, Schoeinger J, Balhorn R: Computational approaches for identification of conserved/unique binding pockets in the A chain of ricin. Bioinformatics 2005, 21: 3085–3096. [http://bioinformatics.oxfordjournals.org/cgi/reprint/21/14/3089]
- APHIS Agricultural Select Agent Program select agent and toxin list [http://www.aphis.usda.gov/programs/ag_selectagent/ag_bioterr_toxinslist.html]
- CDC bioterrorism agents/diseases list [http://www.bt.cdc.gov/agent/agentlist-category.asp]
- HHS and USDA select agents and toxins list [http://www.cdc.gov/od/sap/docs/salist.pdf]
- USFDA Bad Bug Book [http://www.cfsan.fda.gov/~mow/intro.html]
- NIAID category A, B and C priority pathogens [http://www3.niaid.nih.gov/biodefense/bandc_priority.htm]
- WHO list of major zoonotic diseases [http://www.who.int/zoonoses/diseases/en/]
- WHO list of diseases covered by the Epidemic and Pandemic Alert and Response (EPR) [http://www.who.int/csr/disease/en/]
- Andrade MA, Brown NP, Leroy C, Hoersh S, de Daruvar A, Reigh C, Franchini A, Tamames J, Valencia A, Ousounis C, Sander C: Automated genome sequence analysis and annotation. Bioinformatics 1999, 15: 391–412. 10.1093/bioinformatics/15.5.391
- Frishman D, Albermann K, Hari J, Heumann K, Metanomski A, Zollner A, Mewes H-W: Functional and structural genomics using PEDANT. Bioinformatics 2001, 17: 44–57. 10.1093/bioinformatics/17.1.44
- Gattiker A, Michoud K, Rivoire C, Auchincloss AH, Coudert E, Lima T, Kersey P, Pagni M, Sigrist CJA, Lachaize C, Veuthey A-L, Gasteiger E, Bairoch A: Automated annotation of microbial proteomes in SWISS-PROT. Computational Biology and Chemistry 2003, 27: 49–58. 10.1016/S1476-9271(02)00094-4
- Goesmann A, Linke B, Bartels D, Dondrup M, Drause L, Neuweger H, Oehm S, Paczian T, Wilke A, Meyer F: BRIGEP – the BRIDGE-based genome-transcriptome-proteome browser. Nucleic Acids Research 2005, 33: W710-W716. 10.1093/nar/gki400
- Markowitz VM, Korzeniewski F, Palaniappan K, Szeto P, Ivanova N, Kyrpides NC: The integrated microbial genomes (IMG) system: a case study in biological data management. Proceedings of the 31st VLDB Conference: 2005; Trondheim Norway 2005, 1067–1078.
- Meyer F, Goesmann A, McHardy AC, Bartels D, Bekel T, Clausen J, Kalinowski J, Linke B, Rupp O, Giegerich R, Puhler A: GenDB – an open source genome annotation system for prokaryote genomes. Nucleic Acids Research 2003, 31: 2187–2195. 10.1093/nar/gkg312
- Peterson JD, Umayam LA, Dickinson TM, Hickey EK, White O: The comprehensive microbial resource. Nucleic Acids Research 2001, 29: 123–125. 10.1093/nar/29.1.123
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 2005, 33: D501-D504. 10.1093/nar/gki025
- Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Medigue C: MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Research 2006, 34: 53–65. 10.1093/nar/gkj406
- Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS: BASys: a web server for automated bacterial genome annotation. Nucleic Acids Research 2005, 33: W455-W459. 10.1093/nar/gki593
- MvirDB microbial virulence database [http://mvirdb.llnl.gov]
- Blom N, Hansen J, Blaas D, Brunak S: Cleavage site analysis in picornaviral polyproteins: Discovering cellular targets by neural networks. Protein Science 1996, 5: 2203–2216.
- Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. Journal of Molecular Biology 2004, 340: 783–795.
- Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FSL: PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 2005, 21: 617–623. 10.1093/bioinformatics/bti057
- Nakai K, Horton P: PSORT: a program for detecting the sorting signals of proteins and predicting their subcellular localization. Trends in Biochemical Sciences 1999, 24: 34–35. 10.1016/S0968-0004(98)01336-X
- Krogh A, Larsson B, von Heijne G, Sonnhammer ELL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 2001, 305: 567–580. 10.1006/jmbi.2000.4315
- Emanuelsson O, Nielsen H, Brunak S, von Heijne G: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. Journal of Molecular Biology 2000, 300: 1005–1016. 10.1006/jmbi.2000.3903
- Claros MG, von Heijne G: TopPred II: An improved software for membrane protein structure predictions. CABIOS 1994, 10: 685–686.
- Tusnady GE, Simon I: Principles governing amino acid composition of integral membrane proteins: applications to topology prediction. Journal of Molecular Biology 1998, 283: 489–506. 10.1006/jmbi.1998.2107
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.