The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage
© Ganesan et al; licensee BioMed Central Ltd. 2008
Received: 29 March 2008
Accepted: 07 August 2008
Published: 07 August 2008
Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation. The tool is also available as a downloadable package.
The SeqWord Genome Browser (SWGB) was developed to visualize the natural compositional variation of DNA sequences. The applet is also used for identification of divergent genomic regions both in annotated sequences of bacterial chromosomes, plasmids, phages and viruses, and in raw DNA sequences prior to annotation by comparing local and global OU patterns. The applet allows fast and reliable identification of clusters of horizontally transferred genomic islands, large multi-domain genes and genes for ribosomal RNA. Within the majority of genomic fragments (also termed genomic core sequence), regions enriched with housekeeping genes, ribosomal proteins and the regions rich in pseudogenes or genetic vestiges may be contrasted.
The SWGB applet presents a range of comprehensive OU statistical parameters calculated for a range of bacterial species, plasmids and phages. It is available on the Internet at http://www.bi.up.ac.za/SeqWord/mhhapplet.php.
The study of genome OU signatures has a long history dating back to early publications by Karlin et al. who focused mainly on dinucleotide compositional biases and their evolutionary implications [1–3]. Statistical approaches of OU comparison were further advanced by Deschavanne et al., who applied chaos game algorithms ; and by Pride et al., who extended the analysis to tetranucleotides using Markov Chain Model simulations . Later, a number of practical tools for phylogenetic comparison of bacterial genomes [4, 6, 7], identification of horizontally transferred genomic islands [8–13] and assignment of unknown genomic sequences [14, 15] based on OU statistics became publicly available. These approaches exploited the notion that genomic OU composition was less variable within genomes rather than between them, regardless of which genomic regions had been taken into consideration . A general belief was that if a significant compositional difference was discovered in genomic fragments relative to the core genome, these loci most likely can be assigned to horizontally transferred genetic elements (transposons, prophages or integrated plasmids). This approach was criticized by several researchers [17, 18] who pointed out that codon bias and base composition are poor indicators of horizontal gene transfer. Therefore, there is a need for more informative parameters which also take into account higher order DNA variation. An overview of the current OU statistical methods based on di-, tetra- and hexanucleotides has been published recently. The conclusion of the review was that all methods were context dependent and, though being efficient and powerful, none of them were superior in all applications . Thus, the major motivation of our work was to develop more flexible and informative algorithms seamlessly integrating di- to heptanucleotides OU analysis for reliable identification of divergent genomic regions.
Recently we have introduced the concept of OU patterns into the literature . Each OU pattern is characterized by a number of OU statistical parameters namely, local pattern deviation (D), pattern skew (PS), relative variance (RV) and others (see Methods section). Novelties of the developed algorithms relative to other existing methods include the following: i) distances between patterns of different word length (from di- through to heptanucleotides) calculated for the same sequences are comparable; i.e. one may use longer word patterns to perform a large scale analysis and then switch to shorter word patterns for a more detailed view; ii) OU patterns calculated for sequences of different lengths are comparable provided that the length of the sequence is longer than the corresponding thresholds (specified in the Methods section); iii) alterations of OU patterns may be analyzed by different non-redundant parameters (D, PS and RV with different schemes of normalization by frequencies of shorter constituent words). Superimposition of these OU characteristics allows better discrimination of divergent genomic regions relative to other contemporary approaches .
Calculation of OU statistical parameters has been described previously [20, 21]. OU pattern was denoted as a matrix of deviations Δ[ξ 1...ξ N]of observed from expected counts for all possible words of length N:
Δ[ξ 1...ξ N]= (C[ξ 1...ξ N]|obs- C[ξ 1...ξ N]|e)/C[ξ 1...ξ N]|0 (1)
where ξn is any nucleotide A, T, G or C in the N-long word; C[ξ 1...ξ N]|obsis the observed count of the word [ξ1...ξN]; C[ξ 1...ξ N]|eis the expected count and C[ξ 1...ξ N]|0is a standard count estimated from the assumption of an equal distribution of words in the sequence: (C[ξ 1...ξ N]|0= L seq × 4-N).
Two approaches of normalization have been exploited where the F values were calculated for the complete sequence of a chromosome, plasmid, etc (generalized normalization) or for a given sliding window (local normalization). The normalization by equation 2 allows identification of words, frequencies of which cannot be predicted exactly by frequencies of shorter constituent words.
Application of ranks instead of relative oligonucleotide frequency statistics made the comparison of OU patterns less biased to the sequence length provided that the sequences are longer than the limits of 0.3, 1.2, 5, 18.5, 74 and 295 kbp for di-, tri-, tetra-, penta-, hexa- and heptanucleotides, respectively .
PS is a particular case of D where patterns i and j were calculated for the same DNA but for direct and reversed strands, respectively. Dmax = 4 N × (4 N - 1)/2 and Dmin = 0 when calculating a D or, in a case of PS calculation, Dmin = 4 N if N is an odd number or Dmin = 4 N - 2 N if N is an even number due to presence of palindromic words . Normalization of D-values by Dmax ensures that the distances between two sequences are comparable regardless of the word length of OU patterns.
where L seq is sequence length, and N is word length. Normalization of OU pattern variance by σ0 makes the variances comparable regardless of the word length of OU patterns and the sequence length. The regression equation was tested on 300 randomly generated sequences with an equiprobable occurrence of all 4 nucleotides by the DataFit 7.1.44 software.
The SWGB is coded in Java to be used as an applet in a Web-browser either on the Internet or locally (the programs OligoWords in Python and SeqWord_Viewer, which respectively calculate and visualize the OU patterns for DNA sequences, are available for download from the SWGB website). SWGB should run on any platform with a Java 1.5.x runtime environment or newer.
Sliding window size and OU pattern types (oligomer lengths) selected for sequences of different length present in the SeqWord database.
OU pattern type
> 2 Mbp
from 1 mbp to 2 Mbp
from 0.5 mbp to 1 Mbp
< 0.5 Mbp
User familiarity with the abbreviations of the various OU statistical parameters is important. Different types of OU patterns were abbreviated as type_N mer. Types might be "n0" for non-normalized, or "n1" for normalized by mononucleotide frequencies. For example, the non-normalized tetranucleotide usage pattern is denoted as n0_4mer; tetranucleotide usage pattern normalized by mononucleotide content is n1_4mer etc. The genomes in the SWGB database were analyzed by the following statistical parameters: D – distance between two patterns of the same type (in this work we used distances D between local patterns calculated for overlapping genome fragments and the global genome patterns calculated for the complete sequence – the local pattern deviation); PS – pattern skew, distance between the two patterns of the direct and reverse strands of the same DNA sequence; RV and GRV – oligonucleotide usage variances normalized locally and globally, respectively, and reduced to the OU variance expected for a randomly generated sequence (see Materials section); GC-content (GC) and GC-skew (GCS) in DNA fragments.
The applet GUI and database of pre-calculated OU patterns
The SeqWord Genome Browser (SWGB) applet is available via the Internet [22–24] and is mouse and menu driven. The Web-based applet is used to visualize DNA compositional variations in bacterial and viral genomes stored in the SeqWord database. Every genome in the database is represented by a set of statistical OU parameters (D, PS, GV, GRV, GC and GCS) calculated for genomic fragments, which were selected by a sliding window (sliding window length and step were set according to the total length of the sequence as demonstrated in Table 1). While in 70 to 99% of genomic fragments the OU compositional bias is similar to the complete genome OU pattern, some regions with atypical OU composition, however, are always present. Superimposition of different OU parameters allows discrimination of divergent genomic regions, as was published previously . Briefly: rRNA operons are characterized by extremely high PS and low RV; giant genes with multiple repeated elements have high or moderate PS and high RV; horizontally transferred genetic elements are characterized by increased divergence between RV and GRV accompanied by high D; and genes for ribosomal proteins show a moderate increase of D, PS and RV above genomic averages. Having analyzed 1243 sequences of different microorganisms including viruses and plasmids in the SeqWord database, we confirmed that the approaches we have developed and tested previously  (mainly on Pseudomonas putida KT2440 chromosomal DNA) are appropriate and useful for analysis of genomic sequences of other microorganisms and viruses.
The SWGB is basically comprised of two views, denoted by the 'Gene Map' and 'Diagram' tabs. The applet is instrumental for visualization of natural variation in DNA sequences by the interactive diagrams on the 'Gene Map' and 'Diagram' tabs. Users may save the current diagram in JPG format by using the 'Save picture' function in the 'File' menu.
The 'Zoom' function is straight-forward and allows users to control the amount of data viewed in the plot area. Clicking the 'Enter' button after setting the desired zoom value will then redraw the map. A 'Zoom into region' function under the 'Tools' drop-down menu allows users to zoom into exact genomic regions by merely entering their desired co-ordinates into the pop-up dialog box. The 'Tools' → 'Select region' menu item allows highlighting of selected regions without zooming. Use the option 'Clear ...' in the 'Tools' menu to undo zooming or highlighting. To locate a genomic region by gene, click the button 'Select Gene'. In the pop-up dialog box one may order the gene list by gene names, functionality or coordinates, then select a gene in the list and click 'OK'. When a gene annotation is not available, the values of the locus coordinates are used as a gene name. The applet window will be scrolled to the selected gene highlighted on the map (see Fig. 2).
The 'Diagram' tab allows flexible filtering of the underlying data based on the criteria chosen by users. Although the underlying data is pre-calculated, the user may, by simply changing selected parameters, generate very different images which give different insights into the natural genomic variation. To start with, the 'Diagram' view offers a bar chart or a dot-plot presentation of the pre-calculated data. To view a bar chart of the distribution statistics for a given OU parameter, select the desired parameters from the X or Y-axis drop-downs and click 'Enter'. The number of bars displayed can be adjusted using the '# Bars' selector.
Having set up the dot-plot, users will be able to identify divergent genomic regions (see next section). To retrieve annotations of genomic fragments corresponding to a group of dots, frame the dots of interest by clicking and dragging over the desired area. A selector frame then appears around the dots (Fig. 3). Clicking the 'Get' button displays the selected genomic fragments with their coordinates and gene annotations. Furthermore, identification and isolation of specific genomic regions may be improved significantly by filtering dots by OU parameters. The simplest way of filtering is by the third (Z axis) parameter. One may select an area on the colour bar to exclude all dots from the plot lying outside of the selected colour range (see an example in help files on-line). The hidden dots will not be selected by the 'Get' button. A more sophisticated way to filter genomic regions is provided by the 'Filter' button. An example will be discussed below.
Identification of divergent genomic islands
Several routines have been developed to identify the horizontally transferred genomic islands, genes for ribosomal RNA and proteins, non-functional pseudogenes and genes of other functional categories. All these routines are described in detail with illustrations in supplementary web-pages (use the 'Help' link in the applet window).
A known 40 kbp bacteriophage insertion [2586000–2626000] is, surprisingly, not among the genomic fragments selected in the SWGB using this filter. Although the prophage is still perceptible on the 'Gene Map' view (see a figure in the supplementary help web-pages), the OU parameters of the region do not differ markedly enough from the core sequence to be isolated automatically as a horizontally transferred region.
As the SWGB uses parameters that are based on comparison of local fragments to the global genomic average, strains with abundant insertions of homogenous DNA can confound this form of analysis. One example is the Methanosarcina acetivorans C2A genome which is composed of an estimated 25% of putatively horizontally acquired DNA, one of the highest amounts discovered to date . As a result of these insertions, the genomic signature has been strongly influenced, resulting in a large amount of scatter and a poorly defined core genome on the plots. On the other hand, this type of analysis allows estimation of genome stability in a simple, multi parameter view (see the Vibrio cholerae N16961-O1-eltor example in the online help files). To conclude, filtering provides a convenient way to automatically isolate divergent genomic regions of interest. However, some regions may erroneously remain undetected due to possible amelioration of older inserts  or a higher level of noise in unstable genomes. However, many problematic genomic fragments can in some cases be easily attributed to functional gene categories using the SWGB 'Diagram' window (see Fig. 2).
Methodologies for discovering long modular genes have already been discussed in a previous publication . Briefly, long genes display a particular tetranucleotide usage and can be discovered by plotting n0_4mer:D (X axis) versus n1_4mer:RV (Y axis). The positively linear correlated outlier fragments (towards the top right of the image) are often fragments of long genes with their characteristic repeats. An example using the gene encoding the 1.12 megadalton cell surface protein of Synechococcus sp. WH8102  marked on the dot-plot is shown in Fig. 4. Ribosomal RNA operons (but not genes for ribosomal proteins) are characterized by extremely high pattern skew and a large distance from the core genome (Fig. 2). Thus, there is a tendency to find many genomic fragments containing rRNA genes coloured dark brown to red in the bottom right section of the 'Diagram' tab. The annotation for rRNA operons is not present in the database; therefore, these are seen in the 'Gene Map' tab as unannotated areas with high pattern skew (Fig. 2). Ribosomal proteins tend to be increasingly present at a slightly greater than average RV and above average D (see Fig. 2), which is in agreement with observations that highly expressed genes for ribosomal proteins have a highly specific codon usage compared to housekeeping genes of the organism . The majority of genomic fragments form a cluster characterized by average and higher than average RV, stable OU patterns (low D) and low PS. These tend to be the core, or bulk genes and genomic regions with their typical tetranucleotide usage. Some other core sequence fragments spread from this area toward lower RV and less specific OU patterns (higher D and PS) – these are all characteristics of an unstable or randomly generated sequence . These regions were found to be enriched with many hypothetical genes, prophages and transposons. (The data is not shown but is easily verified with any genome using the 'Get' button. Consider, for example, this area in the pseudogene rich Mycobacterium leprae TN or Methanosarcina acetivorans C2A genomes [11, 31], and the relatively homogenous Alcanivorax borkumensis SK2 genome .) These regions were thus categorized as rich in 'vestigial' genes in contrast to the core genome regions rich in housekeeping genes (Fig. 4).
It must be stressed that with an average length of genes being around 1 kbp and overlapping sliding windows of 8 kbp, one cannot expect precise separation of housekeeping and vestigial genes by the method described above. However, when analyzing an unknown DNA sequence prior to annotation, it may be helpful to identify genomic regions enriched with a higher proportion of these so called housekeeping genes and other regions rich in vestigial genes. These tentative results should be verified with other complementary algorithms such as BLAST, gene finding and annotation techniques.
The SWGB applies novel OU statistics to visualize and discern divergent genomic regions. It has been extensively tested in practice for large scale genome analysis [32, 34], and for identification and comparison of horizontally transferred genomic islands . The applet is linked to a database of pre-calculated OU patterns of bacterial genomes (1243 complete sequences, including bacterial chromosomes, plasmids and some viruses were available at the time of manuscript submission, however, new sequences are regularly being added). The SWGB allows tentative annotation of the various divergent regions and provides overviews for use in comparative genomics. Users may download the command line version of the OligoWords program to analyze their own sequences. A packaged version of the SWGB allows users to view and manipulate their OligoWords results locally using a compatible web-browser.
Although there are several readily available tools for DNA compositional analysis, genomic island identification and large scale genome analysis [36–42], the SWGB surpasses previous approaches in making use of a wider range of parameters which allow identification of divergent genomic regions and even visual tentative attribution of these DNA fragments to various categories. We have found superimposition of these parameters to be more informative than a simple GC average or a relative OU frequency deviation since they allow discrimination of divergent genomic regions (large modular genes, ribosomal RNAs, ribosomal protein clusters and the horizontally transferred genomic islands, see Fig. 2) all of which are characterized by an alternative OU composition relative to the core sequence. In addition, our approach provides some insight into the physicochemical state of the analysed DNA and the stability/state of flux of a genome as tetranucleotides exert a strong structural signal [20, 21]. Consideration of flux inferred oligonucleotide usage is particularly interesting when comparing, for example, multiple replicating units of the same strain. Using the simple analysis described here, the second chromosome of Vibrio cholerae N16961-O1-eltor was demonstrated to be far less conserved than the first, with differences in mononucleotide content and distance from core genomic values implying a more heterogenous chromosome consistent with its role as a gene capture system .
Furthermore, no single oligonucleotide word size has been found to be optimal for all purposes, such as finding conserved or horizontally transferred DNA, plasmid host comparisons or testing distant homology . The SWGB crucially provides the opportunity to analyse DNA sequences with various oligomer lengths and normalisation schemes. For example, genomic regions of particular interest may be multiply analysed with progressively smaller oligomer sizes to provide more detailed information on oligomer usage in individual genes.
Availability and requirements
The SWGB applet is freely available to any researcher wishing to use it for non-commercial purposes via the Internet [22–24]. It has been tested on openSUSE 10.2, Gentoo Linux 2.6, Fedora Core 5 and Microsoft Windows XP workstations using Microsoft Internet Explorer 6.0, Maxthon 1.5.9, Mozilla Firefox 2.0, Mozilla SeaMonkey 1.1.1, Safari 3.0.4 for Mac, Konqueror 3.5.5 and 3.5.7, and Opera 9.10 browsers with Java 1.5. At the time of manuscript submission, a problem likely related to the local firewall was encountered with the Firefox browser on SUSE 10.2 and some other browsers (see 'Compatibility' link on the SWGB front page). The problem will be tackled in later releases of the SWGB. Feedback from users (addressed to the corresponding author) is very much appreciated.
SeqWord Genome Browser
distance between two oligonucleotide usage patterns
oligonucleotide usage pattern skew
relative variance of the oligonucleotide usage
globally normalized relative variance of the oligonucleotide usage
guanine + cytosine content
guanine versus cytosine skew in DNA strands.
This work was funded by the National Bioinformatics Network of South Africa http://www.nbn.ac.za. CFD is supported by the Deutsche Forschungsgemeinschaft grant DFG 653/3. The authors are thankful to Prof. Anton Nekrutenko from Penn State University (Pennsylvania, USA) for hosting the SeqWord applet at http://seqword.bx.psu.edu/.
- Karlin S, Burge C: Dinucleotide relative abundance extremes: a genomic signature. Trends Genet 1995, 11(7):283–290. 10.1016/S0168-9525(00)89076-9View ArticlePubMedGoogle Scholar
- Karlin S, Mrázek J, Campbell A: Compositional biases of bacterial genomes and evolutionary implications. J Bacteriol 1997, 179(12):3899–3913.PubMed CentralPubMedGoogle Scholar
- Karlin S: Global dinucleotide signatures and analysis of genomic heterogeneity. Curr Opin Microbiol 1998, 1: 598–610. 10.1016/S1369-5274(98)80095-7View ArticlePubMedGoogle Scholar
- Deschavanne PJ, Giron A, Vilain J, Fagot G, Fertil B: Genomic signature: characterization and classification of species assessed by chaos game representation of sequences. Mol Biol Evol 1999, 16(10):1391–1399.View ArticlePubMedGoogle Scholar
- Pride DT, Meinersmann RJ, Wassenaar TM, Blaser MJ: Evolutionary implications of microbial genome tetanucleotide frequency biases. Genome Res 2003, 13: 145–155. 10.1101/gr.335003PubMed CentralView ArticlePubMedGoogle Scholar
- Coenye T, Vandamme P: Use of the genomic signatures in bacterial classification and identification. System Appl Microbiol 2004, 27: 175–185. 10.1078/072320204322881790View ArticleGoogle Scholar
- van Passel MW, Bart A, Luyf AC, van Kampen AH, Ende A: The reach of the genome signature in prokaryotes. BMC Evol Biol 2006, 6: 84. 10.1186/1471-2148-6-84PubMed CentralView ArticlePubMedGoogle Scholar
- Mrázek J, Karlin S: Detecting alien genes in bacterial genomes. Ann NY Acad Sci 1999, 870: 314–329. 10.1111/j.1749-6632.1999.tb08893.xView ArticlePubMedGoogle Scholar
- Azad RK, Lawrence JG: Use of artificial genomes in assessing methods for atypical gene detection. PLoS Comput Biol 2005, 1: e56. 10.1371/journal.pcbi.0010056PubMed CentralView ArticlePubMedGoogle Scholar
- Becq J, Gutierrez MC, Rosas-Magallanes V, Rauzier J, Gicquel B, Neyrolles O, Deschavanne P: Contribution of horizontally acquired genomic islands to the evolution of tubercle bacilli. Mol Biol Evol 2008, 24: 1861–1871. 10.1093/molbev/msm111View ArticleGoogle Scholar
- Dufraigne C, Fertil B, Lespinats S, Giron A, Deschavanne P: Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Res 2005, 33: e6. 10.1093/nar/gni004PubMed CentralView ArticlePubMedGoogle Scholar
- Nakamura Y, Itoh T, Matsuda H, Gojobori T: Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet 2004, 36: 760–766. 10.1038/ng1381View ArticlePubMedGoogle Scholar
- Pride DT, Blaser MJ: Identification of horizontally acquired elements in Helicobacter pylori and other prokaryotes using oligonucleotide difference analysis. Genome Let 2002, 1: 2–15. 10.1166/gl.2002.003View ArticleGoogle Scholar
- Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T: Informatics for unveiling hidden genome signatures. Genome Res 2003, 13: 693–702. 10.1101/gr.634603PubMed CentralView ArticlePubMedGoogle Scholar
- Teeling H, Meyerdierks A, Bauer M, Amann R, Glöckner FO: Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol 2004, 6: 938–947. 10.1111/j.1462-2920.2004.00624.xView ArticlePubMedGoogle Scholar
- Jernigan RW, Baran RH: Pervasive properties of the genomic signature. BMC Genomics 2002, 3: 23. 10.1186/1471-2164-3-23PubMed CentralView ArticlePubMedGoogle Scholar
- Koski LB, Morton RA, Golding GB: Codon bias and base composition are poor indicators of horizontally transferred genes. Mol Biol Evol 2001, 18: 404–412.View ArticlePubMedGoogle Scholar
- Wang B: Limitations of compositional approach to identify horizontally transferred genes. J Mol Evol 2001, 53: 244–250. 10.1007/s002390010214View ArticlePubMedGoogle Scholar
- Bohlin J, Skjerve E, Ussery D: Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes. BMC Genomics 2008, 9: 104. 10.1186/1471-2164-9-104PubMed CentralView ArticlePubMedGoogle Scholar
- Reva ON, Tümmler B: Global features of sequences of bacterial chromosomes, plasmids and phages revealed by analysis of oligonucleotide usage patterns. BMC Bioinformatics 2004, 5: 90. 10.1186/1471-2105-5-90PubMed CentralView ArticlePubMedGoogle Scholar
- Reva ON, Tümmler B: Differentiation of regions with atypical oligonucleotide composition in bacterial genomes. BMC Bioinformatics 2005, 6: 251. 10.1186/1471-2105-6-251PubMed CentralView ArticlePubMedGoogle Scholar
- SWGB mirror site at the University of Pretoria in South Africa[http://www.bi.up.ac.za/SeqWord/mhhapplet.php]
- SWGB mirror site at Hannover Medical School in Germany[http://genomics1.mh-hannover.de/seqword/genomebrowser/mhhapplet.php]
- SWGB mirror site at Penn State University in the USA[http://seqword.bx.psu.edu/mhhapplet.php]
- Reva O, Tümmler B: Oligonucleotide usage signatures of the Pseudomonas putida KT2440 genome. In Pseudomonas: Genomics and Molecular Biology. Volume Chapter 3. Edited by: Cornelis P. Caister Academic Press; 2008:43–64.Google Scholar
- Weinel C, Nelson KE, Tümmler B: Global features of the Pseudomonas putida KT2440 genome sequence. Environ Microbiol 2002, 4: 809–818. 10.1046/j.1462-2920.2002.00331.xView ArticlePubMedGoogle Scholar
- Lawrence JG, Ochman H: Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 1997, 44: 383–397. 10.1007/PL00006158View ArticlePubMedGoogle Scholar
- Reva O, Tümmler B: Think big – giant genes in bacteria. Environ Microbiol 2008, 10: 768–777. 10.1111/j.1462-2920.2007.01500.xView ArticlePubMedGoogle Scholar
- McCarren J, Brahamsha B: SwmB, a 1.12-megadalton protein that is required for nonflagellar swimming motility in Synechococcus . J Bacteriol 2007, 189: 1158–1162. 10.1128/JB.01500-06PubMed CentralView ArticlePubMedGoogle Scholar
- Puigbò P, Romeu A, Garcia-Vallvé S: HEG-DB: a database of predicted highly expressed genes in prokaryotic complete genomes under translational selection. Nucleic Acids Res 2008, 36: D524-D527. 10.1093/nar/gkm831PubMed CentralView ArticlePubMedGoogle Scholar
- Lawrence JG, Hendrix RW, Casjens S: Where are the pseudogenes in bacterial genomes? Trends Microbiol 2001, 9: 535–540. 10.1016/S0966-842X(01)02198-9View ArticlePubMedGoogle Scholar
- Reva ON, Hallin PF, Willenbrock H, Sicheritz-Ponten T, Tümmler B, Ussery DW: Global features of the Alcanivorax borkumensis SK2 genome. Environ Microbiol 2008, 10: 614–625. 10.1111/j.1462-2920.2007.01483.xView ArticlePubMedGoogle Scholar
- SWGB FTP site[ftp://milliways.bi.up.ac.za/SeqWord/GenomeBrowser/]
- Chen XH, Koumoutsi A, Scholz R, Eisenreich A, Schneider K, Heinemeyer I, Morgenstern B, Voss B, Hess WR, Reva O, Junge H, Voigt B, Jungblut PR, Vater J, Süssmuth R, Liesegang H, Strittmatter A, Gottschalk G, Borriss R: Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat Biotechnol 2007, 25: 1007–1014. 10.1038/nbt1325View ArticlePubMedGoogle Scholar
- Klockgether J, Reva O, Tümmler B: Spread of genomic islands between clinical and environmental isolates. In Prokaryotic Diversity: Mechanisms and Significance. Edited by: Logan NA, Lapping-Scott HM, Oyston PCF. Cambridge: Cambridge University Press; 2006:187–200.View ArticleGoogle Scholar
- δρ-WEB and Compare_Islands[http://deltarho.amc.nl/cgi-bin/bin/index.cgi]
- The Horizontal Gene Transfer Database (HGT-DB)[http://www.tinet.org/~debb/HGT/]
- Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, Surovcik K, Meinicke P, Merkl R: Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics 2006, 7: 142. 10.1186/1471-2105-7-142PubMed CentralView ArticlePubMedGoogle Scholar
- Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, Gill SR, Nelson KE, Read TD, Tettelin H, Richardson D, Ermolaeva MD, Vamathevan J, Bass S, Qin H, Dragoi I, Sellers P, McDonald L, Utterback T, Fleishmann RD, Nierman WC, White O, Salzberg SL, Smith HO, Colwell RR, Mekalanos JJ, Venter JC, Fraser CM: DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae . Nature 2000, 406: 477–483. 10.1038/35020000View ArticlePubMedGoogle Scholar
- NCBI, Complete Microbial Genomes[http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.