MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization
© Zhou et al. 2006
Received: 06 June 2006
Accepted: 17 October 2006
Published: 17 October 2006
MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data.
MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences, either on local systems or by batch submission to external servers. In addition, MannDB protein sequences are compared by BLAST against the protein entries in MvirDB, our database of microbial virulence factors. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO.
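The annotation-linking rule described above (use a SwissProt identifier when the GenBank entry carries one, otherwise fall back to an identical-sequence match) can be illustrated with a minimal sketch. This is not the MannDB implementation, which is not open-source; the `Protein` class, the `link_annotations` function, and the in-memory dictionary standing in for SwissProt are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Protein:
    """Hypothetical record for one protein parsed from a GenBank entry."""
    accession: str
    sequence: str
    swissprot_id: Optional[str] = None  # cross-reference in the entry, if any


def normalize(sequence: str) -> str:
    """Strip whitespace and upper-case so identical sequences compare equal."""
    return "".join(sequence.split()).upper()


def link_annotations(
    proteins: List[Protein],
    swissprot: Dict[str, Tuple[str, str]],
) -> Dict[str, Tuple[str, str]]:
    """Map each accession to (SwissProt id, evidence type).

    `swissprot` is an assumed in-memory stand-in: id -> (sequence, annotation).
    """
    # Index SwissProt entries by normalized sequence for exact-match lookup.
    by_sequence: Dict[str, str] = {}
    for sp_id, (seq, _annotation) in swissprot.items():
        by_sequence.setdefault(normalize(seq), sp_id)

    links: Dict[str, Tuple[str, str]] = {}
    for protein in proteins:
        if protein.swissprot_id in swissprot:
            # Rule 1: an identifier was found in the GenBank entry.
            links[protein.accession] = (protein.swissprot_id, "identifier")
            continue
        # Rule 2: fall back to an identical-sequence match.
        match = by_sequence.get(normalize(protein.sequence))
        if match is not None:
            links[protein.accession] = (match, "identical_sequence")
    return links
```

Recording the evidence type alongside each link keeps the two routes distinguishable when the annotations are later displayed or queried.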
MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports. Access to MannDB is freely available at http://manndb.llnl.gov/.
MannDB was created to meet a need for rapid, comprehensive sequence analysis with an emphasis on protein processing, surface characteristics, and functional classification to support selection of pathogen or virulence-associated proteins suitable as targets for driving the development of protein-based reagents (e.g., antibodies, non-natural amino-acid ligands, synthetic high-affinity ligands) for pathogen detection [1, 2]. Because comprehensive analyses of this type required using a large number of open-source tools, and because it was necessary to scale the computations for analysis of whole proteomes, we built a fully automated system for executing sequence analysis tools and for storage, integration, and display of protein sequence analysis and annotation data. In order to be able to rapidly examine and compare whole bacterial and viral proteomes for selection of suitable target proteins for bio-defense applications, we compiled data for whole proteomes from representative organisms from all categories of biological threat agents listed by several governmental agencies: APHIS, CDC, HHS, USDA, USFDA, NIAID, and WHO [3–9] as well as taxonomic near-neighbor species as appropriate. Therefore, the scope of MannDB is automated sequence analysis and evidence integration for proteins from all currently recognized bio-threat pathogens. Emphasis is placed upon analyses that are most useful in characterizing potential protein targets and surface motifs that could be exploited for development of detection reagents. The content of MannDB is updated on a regular basis.
In recent years several software systems and accompanying databases have been developed for microbial genome annotation, each with a particular emphasis [10–19]. Some databases emphasize gene prediction and DNA-based analyses rather than protein sequence-based analyses, or provide automated (primary) rather than curated (secondary) annotations. Microbial annotation databases frequently include predictions of biological, chemical, structural, and physical properties of proteins (e.g., antigenicity, post-translational modifications, hydrophobicity, membrane helices). However, none currently offers the comprehensive suite of analyses contained within MannDB (see the MannDB website for a complete list of tools) for characterizing viral as well as bacterial proteins from human and agricultural/veterinary pathogens of interest to the bio-defense community and for rapidly identifying putative virulence-associated proteins for development of functional assays. The MannDB database was built and linked to MvirDB in order to meet these requirements. In addition, we focus on sequence analyses that assist in selection of protein features (e.g., surface characteristics) most suited for targeting detection reagent development.
MannDB provides users with pre-computed sequence analyses for complete proteomes of bacterial and viral pathogens from several governmental agencies' lists of bio-threat agents. The genomes and tools are kept up to date, with predictions being re-run every 3 months. The user can browse proteomes or run BLAST searches against MannDB to pull up related entries and associated data. MannDB provides a convenient source of automated sequence analyses and downloaded annotation information for whole proteomes of human pathogenic bacteria and viruses and has a high degree of integration with external databases.
MannDB provides sequence analysis information of primary interest to researchers in the bio-defense community. We have been using MannDB for several years to "annotate" DNA signatures and, more recently, to assist collaborators in efforts to down-select from whole bacterial and viral genomes to identify suitable protein targets and protein features for driving the development of detection reagents. For example, a common requirement for a detection assay is that it be performed with minimal sample disruption. Therefore, an initial down-selection for proteins expected to be on the surface of a bacterial particle might entail identification of proteins that are predicted to be secreted or membrane bound by using tools such as PSORT [23, 24], TMHMM, SignalP, TargetP, TopPred, and HMMTOP. Having results from several tools that provide similar predictions but use different algorithms or slightly different approaches allows us to compare predictions and make selections with greater confidence. Identification of surface features for targeting of detection reagents is done primarily by means of additional sequence- and structure-based analyses, although predictions pertaining to post-translational modifications (e.g., glycosylation, cleavage) are taken into consideration as they may affect protein recognition.
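One simple way to compare predictions from several tools is to count how many of them agree on each protein. The sketch below is an illustration only: the text does not specify how MannDB users combine predictions, so the voting scheme, the `consensus_candidates` function, the `min_votes` threshold, and the protein identifiers are all assumptions. Each tool's output is assumed to have already been reduced to a set of protein identifiers predicted to be secreted or membrane bound.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def consensus_candidates(
    predictions: Dict[str, Iterable[str]],
    min_votes: int = 2,
) -> List[Tuple[str, int]]:
    """Rank proteins by how many tools predict a surface-related location.

    `predictions` maps a tool label to the protein ids that tool flagged.
    Returns (protein id, vote count) pairs with at least `min_votes` votes,
    sorted by descending votes and then by id.
    """
    votes: Counter = Counter()
    for protein_ids in predictions.values():
        # Use a set so each tool contributes at most one vote per protein.
        votes.update(set(protein_ids))
    kept = [(pid, n) for pid, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical, hand-made inputs; not real tool output.
    example = {
        "tool_a": {"prot1", "prot2"},
        "tool_b": {"prot2", "prot3"},
        "tool_c": {"prot1", "prot2"},
    }
    for protein_id, count in consensus_candidates(example):
        print(protein_id, count)
```

A higher `min_votes` value yields a shorter, more conservative candidate list, at the cost of discarding proteins that only one algorithm detects.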
MannDB is a genome-centric database containing comprehensive automated sequence analysis predictions for protein sequences from organisms of interest to the bio-defense research community. Computational tools for the MannDB automated pipeline were selected based on customer needs in providing down-selections from large sets of proteins (e.g., whole proteomes) to short lists of proteins most suitable for developing reagents to be used in field assays for detection of pathogens. For that reason we have focused our efforts on applying tools that would enable selection of proteins that meet assay requirements, such as cellular localization, that would assist in determining the value of a surface feature for targeting ligand binding, or that would identify antigenic sub-sequences of particular value in antibody development. As the goals of some of these assays have been to detect toxins or proteins associated with virulence, we constructed hard links between protein sequences in MannDB and entries in MvirDB in order to conveniently identify and characterize protein targets and features for these applications. We believe that MannDB will be of general use to the bio-defense and medical research communities as a resource for predictive sequence analyses and virulence information.
MannDB is freely accessible at http://manndb.llnl.gov/. Although the software that populates and updates MannDB is not open-source, the user may request collaborative sequence analysis services by contacting firstname.lastname@example.org.
BLAST: Basic local alignment search tool.
APHIS: Animal and Plant Health Inspection Service.
CDC: Centers for Disease Control and Prevention.
HHS: Health and Human Services.
USDA: United States Department of Agriculture.
USFDA: United States Food and Drug Administration.
NIAID: National Institute of Allergy and Infectious Diseases.
WHO: World Health Organization.
This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract no. W-7405-ENG-48 and was supported by funding from the Department of Homeland Security.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.