DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data
© Duncan et al; licensee BioMed Central Ltd. 2010
- Received: 8 October 2009
- Accepted: 22 February 2010
- Published: 22 February 2010
New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available.
To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application that allows researchers with no experience in programming and database management to set up their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses.
DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge.
- Basic Local Alignment Search Tool
- Genome Sequence Data
- Integrate Microbial Genome
- Reverse Vaccinology
- Strain Information
DNA sequencing technology using chain-terminating dideoxy nucleoside triphosphates, first developed by Frederick Sanger [1, 2], has remained the mainstay of genome sequencing efforts for more than thirty years. However, recently developed massively parallel DNA sequencing platforms are now extensively used to generate sequence data at a fraction of the cost and labor required by Sanger technology. Three "next generation" sequencing systems that are currently commercially available are the Roche/454 Genome Sequencer, Illumina/Solexa Genome Analyzer II [4, 5] and Applied Biosystems SOLiD System. In addition, commercial release of two additional platforms, the Helicos Heliscope and the Pacific Biosciences SMRT, is planned for 2010.
Collectively, these systems, with their high depth of coverage and relatively low costs, have allowed individual researchers to initiate genome sequencing projects that were previously feasible only for large genome centers [8–10]. The enhanced sequencing capability afforded by next-generation sequencing has had an especially significant impact on bacterial genomics. By facilitating genome sequencing of multiple isolates of the same bacterial species, several examples of extensive intraspecies genotypic heterogeneity have been revealed, leading to a revision of many long-standing views of microbial speciation [11–14]. One of the first such studies revealed significant genetic variability among eight different strains of Streptococcus agalactiae, group B Streptococcus (GBS). After performing cross strain comparisons, Tettelin et al. found a considerable number of genes not shared among the strains. Their discovery led to the proposal of the bacterial "pan-genome", defined as the global gene repertoire of a bacterial species, comprising the core genome (the set of genes shared by all the strains of the same bacterial species), the dispensable genome (the set of genes present in some but not all of the strains) and the strain specific genes (the set of genes found only in a single strain). Genome heterogeneity has also been noted for Helicobacter pylori, Staphylococcus aureus, and Escherichia coli [13, 15, 16]. As noted by Muzzi et al., comparative genomics of bacterial species has important implications for vaccine development and discovery of novel antimicrobials. Other novel applications for next generation sequencing technologies have also been developed, including bacterial metagenomics [18–20] and transcriptome mapping [21–24].
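The three-way pan-genome partition defined above can be illustrated with a minimal Python sketch. It assumes that genes from different strains have already been grouped into shared gene-family identifiers (for example by sequence similarity); the function name, strain names and gene identifiers are hypothetical and are not part of DraGnET.

```python
from collections import Counter


def partition_pan_genome(strain_genes):
    """Split a pan-genome into core, dispensable and strain-specific sets.

    strain_genes maps a strain name to the set of gene-family identifiers
    found in that strain (hypothetical input layout).
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene family occurs.
    counts = Counter(g for genes in strain_genes.values() for g in set(genes))
    core = {g for g, c in counts.items() if c == n_strains}
    # With a single strain every gene is "core", so require at least two strains.
    strain_specific = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    dispensable = {g for g, c in counts.items() if 1 < c < n_strains}
    return core, dispensable, strain_specific


# Hypothetical example with three strains.
example = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}
core, dispensable, strain_specific = partition_pan_genome(example)
print(sorted(core), sorted(dispensable), sorted(strain_specific))
```

In this sketch the dispensable set excludes strain-specific genes, following the three separate categories listed in the definition.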
Despite the potential for new insights into bacterial diversity and function, important challenges continue to include the organization, management and analysis of genome sequencing data. To address the need for tools for querying, analyzing and comparing multiple genomes of related species, several databases and software tools have been developed, including the Integrated Microbial Genomes (IMG) system [26, 27], Integrated Microbial Genomes-Expert Review (IMG ER) system, GenColors [29, 30], the Microbial Genome Database (MBGD) [31, 32], the Comprehensive Microbial Resource (CMR) and the EDGAR software.
The IMG system contains complete and draft microbial genome sequence data generated by the Joint Genome Institute (JGI) as well as other publicly available genome data not limited to microorganisms. Tools provided through IMG allow users to query, view and perform comparative analysis of genomes, genes and functions. Recently, a new version of IMG called IMG ER has been added to the IMG system. Tools available through IMG ER allow users to analyze and curate annotated microbial genome data whether it is unpublished or published. Although IMG ER allows users to upload their genome sequencing data for curation and analysis, it is not available for download and in-house use. The GenColors software allows users to browse, analyze and compare genome information from complete and ongoing genome projects related to prokaryotic or eukaryotic genomes. Additionally, GenColors may be used for annotation in the case of incomplete projects. The CMR software contains sequence and annotation data for all of the current publicly available completed microbial genomes and provides a variety of comparison tools for the analysis of multiple genomes, including cross-genome analysis capabilities. Currently, however, there is no functionality that allows users to submit genome data for use with CMR. Similar to CMR, MBGD provides users with several tools for the comparison and analysis of complete bacterial genomes. Unlike CMR, MBGD contains a newly added feature called MyMBGD that allows users to add their own genome data to MBGD. The EDGAR software has recently been released and includes comparative analysis tools for the comparison of multiple strains of a given species. EDGAR offers similar capabilities to those found in CMR and MBGD, in addition to features such as phylogenetic analyses and visualization capabilities including Venn diagrams and synteny plots.
While the aforementioned systems include data management and analysis functionalities, there are limitations. For example, genome projects that include proprietary data may be restricted in the submission of the data to third party software. Many of the current data management software tools are not available for download and in-house use, a requirement when access to next generation sequencing instruments can outstrip the availability of experienced bioinformaticians to assist with data management and analysis.
In addition to the already mentioned software applications, there are other tools that are designed for genome annotation or re-annotation of unpublished or published genomes [25, 35, 36]. Several of these tools provide data curation capabilities for the purpose of correcting annotation errors and improving annotated data but are restricted to use with the annotated data generated through specified software packages. Additionally, as with many software applications, they require the researcher to develop a working knowledge of the analysis capabilities of the software as well as provide "expert" curation of the data. With the increased use of next-generation sequencing in academic, industrial and government settings, however, biologists do not always have immediate access to computational support needed to easily manage the data and to initiate comparative analysis.
To overcome some of these limitations, DraGnET was developed specifically to provide biologists with their own web based tool that is both convenient and easy to use. DraGnET allows researchers to independently store, retrieve and curate their own data generated from any annotation engine and to perform genome comparisons during the beginning phase of a sequencing project. Additionally, publicly available genome data can be stored for the purpose of comparing draft genome data with reference genomes. DraGnET includes provisions for data access, searching, and modification as well as access to basic local alignment search tool (BLAST) functionalities for amino acid sequence similarity searches and cross strain comparisons. As a consequence, DraGnET allows investigators to immediately begin testing of biologically relevant hypotheses without having to devote time to learning sophisticated analysis programs or to depend on computational support from designated personnel. Additionally, the DraGnET source code has been made available, allowing researchers to further customize and develop the software to meet the needs of specific sequencing projects.
To demonstrate the utility of DraGnET, we have successfully established a DraGnET project, deployed for Internet access, and performed preliminary cross strain comparisons to identify potential vaccine targets against the animal pathogen Haemophilus parasuis. Microbial genome sequencing has proven to be a powerful approach to identify new, protective vaccines via reverse vaccinology, i.e., discovery of vaccine targets by scanning sequence data for potential surface-exposed antigens. Moreover, broadly protective antigens may be identified by comparison of genomes from multiple strains of a single species [17, 39, 40]. Reverse vaccinology has led to the development of new vaccines for several human and animal pathogens where previously vaccines were not available [41–44]. DraGnET enables facile preliminary comparisons of multiple draft or complete genome sequences of any number of organisms, including identification of protein encoding genes shared by multiple strains, making DraGnET a useful bioinformatic tool.
DraGnET project setup
DraGnET is an open source web application designed to provide researchers with a tool for storing their own unpublished annotated draft and complete genome data from multiple strains of a species in a database, making gene and strain information available for retrieving, searching, modifying and downloading. The application also provides a web interface for the use of BLAST, allowing for protein sequence similarity searches and cross strain comparisons of strains stored in the database. In addition, DraGnET provides a link for the automatic generation of FASTA files for each genome stored in the database. The files are available for download and can be used with other software and tools for further analysis. The details of the functionalities of DraGnET are provided in the following section.
DraGnET is set up to allow any user to search, view and compare genome sequence data stored in the database; however, only curators may insert and modify the data by signing in to the application. This was designed to prevent inconsistencies in the data and to protect the application when it is being accessed by multiple users from different locations.
Formatting BLAST database files
BLAST functionalities for sequence similarity searches and cross strain comparisons are provided through DraGnET web interfaces. To use these functionalities, BLAST database files for each strain stored in the database must first be created from the command line. The command used to format BLAST database files is the same for each strain; only the FASTA file used for BLAST database file generation changes. Details of this process are included in the DraGnET setup package. Once the BLAST databases are created, all BLAST functionalities offered with DraGnET are available for use.
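As a rough illustration of this step, the Python sketch below builds one database-formatting command per strain. It assumes the `formatdb` program from the legacy NCBI BLAST 2.2.x distribution, where `-i` names the input FASTA file and `-p T` marks the sequences as protein; the exact command and options documented in the DraGnET setup package may differ, and the file names are hypothetical.

```python
def formatdb_command(fasta_path, protein=True):
    """Return the argument list for legacy BLAST `formatdb` on one FASTA file.

    Assumption: `formatdb` is on the PATH; only -i and -p are set here.
    """
    return [
        "formatdb",
        "-i", str(fasta_path),            # input FASTA file for this strain
        "-p", "T" if protein else "F",    # T = protein, F = nucleotide
    ]


def commands_for_strains(fasta_files):
    """Same command for every strain; only the FASTA file changes."""
    return [formatdb_command(path) for path in fasta_files]


# Hypothetical FASTA file names, one per strain.
for cmd in commands_for_strains(["strain_A.fasta", "strain_B.fasta"]):
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to inspect the commands before creating any database files.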
The following functionalities are implemented through the web interface and are available for all users.
Quick and Advanced Search
Batch BLAST Search
Batch BLAST Dissimilarity Search
"Batch BLAST Dissimilarity Search" takes as input the results of "Batch BLAST Search" and extracts the gene identifiers associated with protein sequences that produced a "no hits found" result. The resulting set of gene identifiers represents genes whose protein sequences have no detectable similarity to any sequence in the selected search database. Results are written to a text file.
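The extraction step can be sketched in a few lines of Python. This is not DraGnET's implementation; it assumes a plain-text BLAST report in which each query starts with a `Query=` line and queries without matches contain the phrase "No hits found", as in the legacy NCBI BLAST text output. The sample report and gene identifiers are made up for illustration.

```python
def genes_without_hits(blast_text):
    """Return query identifiers whose report section says 'No hits found'."""
    no_hit_ids = []
    current_query = None
    for line in blast_text.splitlines():
        if line.startswith("Query="):
            fields = line[len("Query="):].split()
            # The first token after 'Query=' is taken as the gene identifier.
            current_query = fields[0] if fields else None
        elif "No hits found" in line and current_query is not None:
            no_hit_ids.append(current_query)
            current_query = None
    return no_hit_ids


# Hypothetical, heavily abbreviated report text.
sample_report = """\
Query= geneA hypothetical protein
Sequences producing significant alignments:
strainX_0042   55   1e-10

Query= geneB lipoprotein
 ***** No hits found ******

Query= geneC transporter
Sequences producing significant alignments:
strainX_0100   80   2e-20
"""
print(genes_without_hits(sample_report))
```

The returned list could then be written to a text file, one identifier per line, to mirror the output described above.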
Generate FASTA Files
The "Generate FASTA Files" option automatically generates FASTA files for each strain stored in the database. When users click on the "Generate FASTA Files" button a set of files in FASTA format, one for each strain, will be available to download. Subsequently, the files can be used with other publicly available comparative analysis software tools or they can be saved as text files for use with "Batch BLAST Search".
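For readers unfamiliar with the format, the following Python sketch shows one way to produce a FASTA file per strain from stored protein records. The record layout (strain, gene identifier, description, protein sequence) and all names are assumptions for illustration; the actual DraGnET database schema and file naming are not described here.

```python
from collections import defaultdict


def to_fasta(records, width=60):
    """Format (gene_id, description, sequence) tuples as FASTA text."""
    lines = []
    for gene_id, description, sequence in records:
        lines.append(f">{gene_id} {description}".rstrip())
        # Wrap the sequence into fixed-width lines.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def fasta_by_strain(rows, width=60):
    """Group (strain, gene_id, description, sequence) rows into one FASTA text per strain."""
    grouped = defaultdict(list)
    for strain, gene_id, description, sequence in rows:
        grouped[strain].append((gene_id, description, sequence))
    return {strain: to_fasta(recs, width) for strain, recs in grouped.items()}


# Hypothetical rows; sequences are placeholders, not real proteins.
rows = [
    ("strain_A", "demo_0001", "hypothetical protein", "MKTAYIAKQR"),
    ("strain_B", "demo_1001", "lipoprotein", "MSLLKKT"),
]
for strain, text in fasta_by_strain(rows, width=5).items():
    print(f"--- {strain}.fasta ---")
    print(text, end="")
```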
Case study: Haemophilus parasuis genome data
To demonstrate the functionalities of DraGnET, we used the web application to store genomic data from three strains of Haemophilus parasuis, two draft genomes (strains 29755 and 12939) and a complete reference genome (strain SH0165), and to perform preliminary cross strain comparisons to identify protein products common to each strain. H. parasuis is a bacterial pathogen that causes severe respiratory disease in swine, and vaccines effective against multiple isolates are lacking. Since outer membrane proteins, including lipoproteins, that are shared among the H. parasuis strains represent potential broadly protective antigens, identifying common genes is a first step toward vaccine development. Draft genome sequence data for strains 29755 and 12939 were generated using the Illumina/Solexa Genome Analyzer II platform (G. Phillips, D. Dyer, and K. Register, unpublished data). The genomes were assembled against the SH0165 reference genome using NextGene software (State College, PA). Annotation was performed through the Institute for Genome Sciences (IGS) Annotation Engine offered by the University of Maryland, School of Medicine.
Initially, annotated genome sequence data representing the three strains were formatted for use with DraGnET by conversion to semicolon-separated files. Subsequently, information for each strain was entered and the corresponding file was uploaded and populated in the database using the application's web interface (Figure 4). Once in the database, the annotated data were available for searching and modifying. As shown in Figure 7, "Quick Search" was used to search for information related to the gene identifier "hph_875", which returned a table with the annotated gene information. Data modification is an important functionality provided through DraGnET, especially in the case of draft genome data. To demonstrate this capability, gene information related to gene identifier "hph_1391" was selected for updating. As shown in Figure 6, the following gene attributes were selected and modified: gene description, localization, and signal sequence. Once submitted, all modifications to the data were confirmed using "Quick Search". DraGnET provides additional functionalities for preliminary analysis of draft and complete genome data. To identify protein products common to all three of the H. parasuis strains, "Advanced Search" and BLAST functionalities provided through the DraGnET interface were used to perform preliminary cross strain comparisons. This demonstrates that the DraGnET application is well suited for smaller companies or academic labs that are just beginning to use next-generation sequencing for vaccine development.
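The semicolon-separated input mentioned at the start of this case study can be read with standard tools. The Python sketch below parses such a file into dictionaries; the column names and their order are hypothetical, chosen to match the gene attributes named above (description, localization, signal sequence), and the required DraGnET column layout should be taken from the setup package documentation. The rows are invented placeholders, not data from the H. parasuis project.

```python
import csv
import io

# Hypothetical file content; real files would be opened with open(path, newline="").
sample_file = io.StringIO(
    "gene_id;description;localization;signal_sequence\n"
    "demo_0001;hypothetical protein;cytoplasm;no\n"
    "demo_0002;putative lipoprotein;outer membrane;yes\n"
)


def read_annotation(handle):
    """Parse a semicolon-separated annotation file into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=";"))


genes = read_annotation(sample_file)
for gene in genes:
    print(gene["gene_id"], "|", gene["description"], "|", gene["localization"])
```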
While data from genome sequencing projects typically become publicly available through sequence repositories, the rate at which large-scale sequence information is being generated, together with the time needed for subsequent analysis, will in many cases delay public availability of the data. In addition, sequencing projects where proprietary data are generated are limited as to how the information can be managed and analyzed until it is ready for public deposition. This limitation emphasizes the need for software applications that provide researchers with in-house data management and analysis capabilities. While some of the features of DraGnET are provided with other applications, our software provides a user friendly in-house web application that enables researchers to manage their own unpublished or proprietary annotated draft genome data at the initial stage of development without prior knowledge of the query languages necessary for data storage, retrieval and update.
Additional features of the application include BLAST capabilities and the automatic generation of FASTA files from protein sequence data stored in the database. A web interface is provided for use of stand-alone BLAST, alleviating the requirement to perform searches through the command line and allowing users to search against a single strain or multiple strains as well as perform cross strain comparisons once the BLAST database files are created. DraGnET was designed to store and compare different strains from the same species; however, the web interface design is generic enough to accommodate multiple organisms and their related strains. Additionally, the DraGnET software can be further developed to customize the program for specific needs.
As demonstrated in the case study, DraGnET provides researchers with an application that can be used as a first step toward data curation and analysis. Subsequently, after the data are made publicly available, more comprehensive analysis may be performed, for example by any of the aforementioned analysis software. Alternatively, the sequence data can continue to be analyzed using in-house programs, including annotation and BLAST comparisons [37, 48].
Currently gene attributes selected for storage and use with DraGnET are fixed. Further development of DraGnET will include the storage of more comprehensive annotation data as well as more advanced functionalities for comparative analysis.
DraGnET currently contains draft and complete genome data from three strains of H. parasuis made available for collaborative research efforts. Readers are encouraged to visit the DraGnET website located at http://www.dragnet.cvm.iastate.edu and examine the functionalities of the software.
New genome sequencing methods now allow multiple draft genomes to be generated, assembled, and annotated at an unprecedented rate and at modest expense. Following sequencing, assembly and annotation, there is an immediate need for the data to be organized, stored, curated and formatted for comparative analysis. The DraGnET software is an in-house tool that allows (i) storage and integration of annotated data generated from different annotation platforms in a database, (ii) retrieval of gene and strain information based upon basic or advanced search parameters, (iii) management of gene and strain information, (iv) generation of FASTA formatted files for all strains stored in the database, (v) sequence similarity searches using BLAST, (vi) Batch BLAST searches for cross strain comparisons and (vii) retrieval of strain specific genes based upon Batch BLAST results. The application allows for the setup of individual projects used on local machines or may be deployed through Internet (or Intranet) access for use by other researchers across different locations. To demonstrate this, we set up a DraGnET project, deployed it for Internet access, and identified potential vaccine targets in multiple strains of H. parasuis using preliminary cross strain comparisons.
Project name: Draft Genome Evaluation Tool (DraGnET)
Project home page: http://www.dragnet.cvm.iastate.edu
Operating system(s): Microsoft Windows 2003 and Windows XP
Programming language: Java
Other requirements: JRE 1.6.0, MySQL 5.0, MySQL 5.0 GUI Tools, Apache Tomcat 6.0 and BLAST 2.2.18
License: GNU GPL
The authors thank Fadi Towfic for helpful suggestions and Josh Mack for technical support. The authors also thank Dr. Michelle Giglio for support during the annotation process performed through the Institute for Genome Sciences (IGS) Annotation Engine offered by the University of Maryland, School of Medicine (http://ae.igs.umaryland.edu/cgi/index.cgi). This work was funded in part by the Iowa Healthy Livestock Initiative and the National Pork Board.
- Sanger F, Coulson AR: A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 1975, 94: 441–448. 10.1016/0022-2836(75)90213-2
- Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977, 74: 5463–5467. 10.1073/pnas.74.12.5463
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437: 376–380.
- Fedurco M, Romieu A, Williams S, Lawrence I, Turcatti G: BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Nucleic Acids Res 2006, 34: e22. 10.1093/nar/gnj023
- Turcatti G, Romieu A, Fedurco M, Tairi AP: A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis. Nucleic Acids Res 2008, 36: e25. 10.1093/nar/gkn021
- Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM: Accurate multiplex polony sequencing of an evolved bacterial genome. Science 2005, 309: 1728–1732. 10.1126/science.1117389
- Mardis ER: Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 2008, 9: 387–402. 10.1146/annurev.genom.9.081307.164359
- Marguerat S, Wilhelm BT, Bahler J: Next-generation sequencing: applications beyond genomes. Biochem Soc Trans 2008, 36: 1091–1096. 10.1042/BST0361091
- Schuster SC: Next-generation sequencing transforms today's biology. Nat Methods 2008, 5: 16–18. 10.1038/nmeth1156
- Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008, 26: 1135–1145. 10.1038/nbt1486
- Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-genome. Curr Opin Genet Dev 2005, 15: 589–594. 10.1016/j.gde.2005.09.006
- Field D, Wilson G, Gast C: How do we compare hundreds of bacterial genomes? Curr Opin Microbiol 2006, 9: 499–504. 10.1016/j.mib.2006.08.008
- Fukiya S, Mizoguchi H, Tobe T, Mori H: Extensive genomic diversity in pathogenic Escherichia coli and Shigella Strains revealed by comparative genomic hybridization microarray. J Bacteriol 2004, 186: 3911–3921. 10.1128/JB.186.12.3911-3921.2004
- Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, et al.: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA 2005, 102: 13950–13955. 10.1073/pnas.0506758102
- Bjorkholm B, Lundin A, Sillen A, Guillemin K, Salama N, Rubio C, Gordon JI, Falk P, Engstrand L: Comparison of genetic divergence and fitness between two subclones of Helicobacter pylori. Infect Immun 2001, 69: 7832–7838. 10.1128/IAI.69.12.7832-7838.2001
- Fitzgerald JR, Sturdevant DE, Mackie SM, Gill SR, Musser JM: Evolutionary genomics of Staphylococcus aureus: insights into the origin of methicillin-resistant strains and the toxic shock syndrome epidemic. Proc Natl Acad Sci USA 2001, 98: 8821–8826. 10.1073/pnas.161098098
- Muzzi A, Masignani V, Rappuoli R: The pan-genome: towards a knowledge-based discovery of novel targets for vaccines and antibacterials. Drug Discov Today 2007, 12: 429–439. 10.1016/j.drudis.2007.04.008
- Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE: Metagenomic analysis of the human distal gut microbiome. Science 2006, 312: 1355–1359. 10.1126/science.1124234
- Leininger S, Urich T, Schloter M, Schwark L, Qi J, Nicol GW, Prosser JI, Schuster SC, Schleper C: Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 2006, 442: 806–809. 10.1038/nature04983
- Wegley L, Edwards R, Rodriguez-Brito B, Liu H, Rohwer F: Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ Microbiol 2007, 9: 2707–2719. 10.1111/j.1462-2920.2007.01383.x
- Bainbridge MN, Warren RL, Hirst M, Romanuik T, Zeng T, Go A, Delaney A, Griffith M, Hickenbotham M, Magrini V, et al.: Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics 2006, 7: 246. 10.1186/1471-2164-7-246
- Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD: Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics 2006, 7: 272. 10.1186/1471-2164-7-272
- Torres TT, Metta M, Ottenwalder B, Schlotterer C: Gene expression profiling by massively parallel sequencing. Genome Res 2008, 18: 172–177. 10.1101/gr.6984908
- Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB: Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol 2007, 144: 32–42. 10.1104/pp.107.096677
- Medigue C, Moszer I: Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol 2007, 158: 724–736. 10.1016/j.resmic.2007.09.009
- Markowitz VM, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A, Zhao X, Dubchak I, Hugenholtz P, Anderson I, et al.: The integrated microbial genomes (IMG) system. Nucleic Acids Res 2006, 34: D344–348. 10.1093/nar/gkj024
- Markowitz VM, Szeto E, Palaniappan K, Grechkin Y, Chu K, Chen IM, Dubchak I, Anderson I, Lykidis A, Mavromatis K, et al.: The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res 2008, 36: D528–533. 10.1093/nar/gkm846
- Markowitz VM, Mavromatis K, Ivanova NN, Chen IM, Chu K, Kyrpides NC: IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 2009, 25: 2271–2278. 10.1093/bioinformatics/btp393
- Romualdi A, Felder M, Rose D, Gausmann U, Schilhabel M, Glockner G, Platzer M, Suhnel J: GenColors: annotation and comparative genomics of prokaryotes made easy. Methods Mol Biol 2007, 395: 75–96.
- Romualdi A, Siddiqui R, Glockner G, Lehmann R, Suhnel J: GenColors: accelerated comparative analysis and annotation of prokaryotic genomes at various stages of completeness. Bioinformatics 2005, 21: 3669–3671. 10.1093/bioinformatics/bti606
- Uchiyama I: MBGD: microbial genome database for comparative analysis. Nucleic Acids Res 2003, 31: 58–62. 10.1093/nar/gkg109
- Uchiyama I: MBGD: a platform for microbial comparative genomics based on the automated construction of orthologous groups. Nucleic Acids Res 2007, 35: D343–346. 10.1093/nar/gkl978
- Peterson JD, Umayam LA, Dickinson T, Hickey EK, White O: The Comprehensive Microbial Resource. Nucleic Acids Res 2001, 29: 123–125. 10.1093/nar/29.1.123
- Blom J, Albaum SP, Doppmeier D, Puhler A, Vorholter FJ, Zakrzewski M, Goesmann A: EDGAR: a software framework for the comparative analysis of prokaryotic genomes. BMC Bioinformatics 2009, 10: 154. 10.1186/1471-2105-10-154
- Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Medigue C: MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res 2006, 34: 53–65. 10.1093/nar/gkj406
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al.: The RAST Server: rapid annotations using subsystems technology. BMC Genomics 2008, 9: 75. 10.1186/1471-2164-9-75
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410.
- Rappuoli R: Reverse vaccinology, a genome-based approach to vaccine development. Vaccine 2001, 19: 2688–2691. 10.1016/S0264-410X(00)00554-5
- Serruto D, Serino L, Masignani V, Pizza M: Genome-based approaches to develop vaccines against bacterial pathogens. Vaccine 2009, 27: 3245–3250. 10.1016/j.vaccine.2009.01.072
- Bambini S, Rappuoli R: The use of genomics in microbial vaccine development. Drug Discov Today 2009, 14: 252–260. 10.1016/j.drudis.2008.12.007
- Maione D, Margarit I, Rinaudo CD, Masignani V, Mora M, Scarselli M, Tettelin H, Brettoni C, Iacobini ET, Rosini R, et al.: Identification of a universal Group B streptococcus vaccine by multiple genome screen. Science 2005, 309: 148–150. 10.1126/science.1109869
- Myers GS, Parker D, Al-Hasani K, Kennan RM, Seemann T, Ren Q, Badger JH, Selengut JD, Deboy RT, Tettelin H, et al.: Genome sequence and identification of candidate vaccine antigens from the animal pathogen Dichelobacter nodosus. Nat Biotechnol 2007, 25: 569–575. 10.1038/nbt1302
- Pizza M, Scarlato V, Masignani V, Giuliani MM, Arico B, Comanducci M, Jennings GT, Baldi L, Bartolini E, Capecchi B, et al.: Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science 2000, 287: 1816–1820. 10.1126/science.287.5459.1816
- Al-Hasani K, Boyce J, McCarl VP, Bottomley S, Wilkie I, Adler B: Identification of novel immunogens in Pasteurella multocida. Microb Cell Fact 2007, 6: 3. 10.1186/1475-2859-6-3
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Yue M, Yang F, Yang J, Bei W, Cai X, Chen L, Dong J, Zhou R, Jin M, Jin Q, Chen H: Complete genome sequence of Haemophilus parasuis SH0165. J Bacteriol 2009, 191: 1359–1360. 10.1128/JB.01682-08
- Rapp-Gabrielson VJ, Oliveira SR, Pijoan C: Haemophilus parasuis. In Diseases of Swine. 8th edition. Edited by: Straw BEJJZ, D'Allaire S, Taylor DJ. Ames, I.A.: Blackwell Publishing; 2006:475–481.
- Berriman M, Rutherford K: Viewing and annotating sequence data with Artemis. Brief Bioinform 2003, 4: 124–132. 10.1093/bib/4.2.124
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.