- Open Access
BorreliaBase: a phylogeny-centered browser of Borrelia genomes
BMC Bioinformatics volume 15, Article number: 233 (2014)
The bacterial genus Borrelia (phylum Spirochaetes) consists of two groups of pathogens represented respectively by B. burgdorferi, the agent of Lyme borreliosis, and B. hermsii, the agent of tick-borne relapsing fever. The number of publicly available Borrelia genomic sequences is growing rapidly with the discovery and sequencing of Borrelia strains worldwide. There is, however, a lack of dedicated online databases to facilitate comparative analyses of Borrelia genomes.
We have developed BorreliaBase, an online database for comparative browsing of Borrelia genomes. The database is currently populated with sequences from 35 genomes of eight Lyme-borreliosis (LB) group and seven relapsing-fever (RF) group Borrelia species. Distinct from genome repositories and aggregator databases, BorreliaBase serves manually curated comparative-genomic data including genome-based phylogeny, genome synteny, and sequence alignments of orthologous genes and intergenic spacers.
With a genome phylogeny at its center, BorreliaBase allows online identification of hypervariable lipoprotein genes, potential regulatory elements, and recombination footprints by providing evolution-based expectations of sequence variability at each genomic locus. The phylo-centric design of BorreliaBase (http://borreliabase.org) is a novel model for interactive browsing and comparative analysis of bacterial genomes online.
Spirochetes belonging to the genus Borrelia are obligate, tick- or louse-borne parasites of vertebrates and cause zoonotic infections in humans worldwide. Phylogeny based on multiple genomic loci has confirmed the presence of two major Borrelia lineages, one represented by B. burgdorferi, the agent of Lyme borreliosis (LB), and the other by B. hermsii, the agent of tick-borne relapsing fever (RF). Members of the genus Borrelia appear to have the most complex genomic architecture among known prokaryotes. Sequencing of the first Borrelia genome, that of B. burgdorferi B31, a tick isolate from New York, USA, revealed a 910-kilobase linear chromosome and twenty-one linear and circular plasmids with a total length of 610 kilobases [3, 4]. Subsequent sequencing efforts showed that the linear chromosome is conserved and co-linear between the LB and RF groups [5–13]. A group of 30-kilobase circular plasmids appears to be conserved and syntenic across the entire genus. A large ~160-kilobase linear plasmid specific to the RF group shares considerable syntenic regions with the lp54 plasmids of the LB group due to lateral transfer. The linear plasmid lp23 in two RF-group genomes is syntenic to the cp26 plasmids in LB-group genomes. Within the B. burgdorferi sensu lato species group (the LB group), a linear plasmid lp54 and a circular plasmid cp26 are universally present and largely syntenic among all species, while the presence and gene order of other plasmids are much less conserved [16, 17]. Nevertheless, the total gene complement is quite stable within LB-group Borrelia [16, 18].
The number of sequenced Borrelia genomes is rising rapidly. At the time of writing, the Bacterial Bioinformatics Resource Center (the PATRIC database, http://www.patricbrc.org) lists thirty-four completed and draft Borrelia genomes. These genomic sequences represent twenty-eight LB Borrelia strains and six RF Borrelia strains, covering eight recognized or proposed species of the LB group, including B. burgdorferi sensu stricto, B. afzelii, B. garinii, B. bavariensis, B. spielmanii, B. valaisiana, B. bissettii, and B. finlandensis, and five species of the RF group, including B. recurrentis, B. duttonii, B. crocidurae, B. hermsii, and B. turicatae. The chromosome of B. miyamotoi, an RF species, has also recently been sequenced. A search of the GOLD genome-project registry (http://genomesonline.org/) yielded 47 completed, in-progress, and targeted Borrelia genome projects.
The rapid increase in sequenced Borrelia genomes makes it possible to understand Borrelia genome function through comparative analysis, such as the identification of host-interacting genes, regulatory elements, and pathways that are conserved among all species, as well as species- and strain-specific genomic variations. The comparative analysis of Borrelia genomes is presently hindered by the lack of a dedicated online portal for their archival, dissemination, and comparative analysis. The Spirochete Genome Browser (SGB, http://sgb.fli-leibniz.de) currently provides web-based tools for analyzing the genomes of four strains of LB-group Borrelia (PKo, ZS7, B31, and PBi), five strains of RF-group Borrelia (DAH, A1, 91E135, and Ly), three species of Leptospira (L. biflexa, L. borgpetersenii, and L. interrogans), and two species of Treponema (T. denticola and T. pallidum). Web portals devoted to the comparative analysis of a single bacterial genus or species have also been developed. A notable example is the Pseudomonas Genome Database, which facilitates population and evolutionary analyses of Pseudomonas genomes by providing up-to-date, computationally predicted orthologous gene sequences at both the within- and between-species levels. GenoList, a comprehensive database of over 700 completed bacterial genomes, provides tools and information for comparative proteomics, such as the identification of proteins specific to a bacterial group. At present, GenoList hosts genomic and proteomic information for three major Borrelia species causing Lyme disease (B. burgdorferi sensu stricto, B. garinii, and B. afzelii) and three Borrelia species causing relapsing fever (B. recurrentis, B. duttonii, and B. hermsii). The PATRIC database hosts a more comprehensive list of genomes of pathogenic bacterial genera, including 34 Borrelia genomes.
Comparative genomic information is also available from OrtholugeDB, another comprehensive database of orthologous genes, which uses a rigorous, phylogeny-based algorithm to improve orthology predictions in completed bacterial and archaeal genomes. To the best of our knowledge, there is at present no genome database dedicated specifically to Borrelia or providing comparative information, such as sequences of orthologous ORFs, for this genus.
We have recently released complete and draft genome sequences of twenty-one additional LB-group strains [10–13]. Population genomic analysis revealed genome-wide recombination among co-existing strains and adaptive genome diversification driven by antigenic variation within local Borrelia populations [17, 25]. Despite large variations in plasmid composition and in the gene order on the majority of plasmids, the total gene content is remarkably stable among LB-group genomes, suggesting that species divergence is driven mainly by copy-number and adaptive sequence variations of surface lipoprotein genes. Here we describe the design, content, and usage of BorreliaBase, a database developed during the course of these studies. BorreliaBase is not only the first publicly available database dedicated to Borrelia genomes; it also implements a novel graphical user-interface design that encourages comparison of bacterial genomes within the framework of their shared phylogenetic history.
Construction and content
BorreliaBase is currently populated with sequences of 35 genomes representing eight LB-group Borrelia species, including B. burgdorferi sensu stricto, B. garinii, B. afzelii, B. bavariensis, B. spielmanii, B. valaisiana, B. bissettii, and B. finlandensis, and seven RF-group Borrelia species, including B. miyamotoi, B. hermsii, B. parkeri, B. turicatae, B. crocidurae, B. recurrentis, and B. duttonii. B. burgdorferi sensu stricto, a major pathogenic species causing Lyme disease in North America and Europe, is represented by genomes of eleven North American and two European strains. Table 1 lists the sequenced Borrelia genomes currently in BorreliaBase.
While BorreliaBase includes basic genome annotations such as plasmid identity, ORF descriptions, and graphic maps of ORF locations, single-genome information is not its focus, because such information is already available in primary or aggregate databases such as the NCBI Microbial Genomes website (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_taxtree.html), the UCSC Microbial Genome Browser (http://microbes.ucsc.edu/), and the PATRIC database (http://patricbrc.org). Instead, BorreliaBase focuses on providing high-quality comparative information for sequenced Borrelia genomes, including genome phylogeny, genome synteny, and orthologous ORFs and intergenic spacers (IGSs) (Table 2).
Orthologous ORFs and IGSs
In its current state, BorreliaBase contains sequences of 750, 26, and 61 orthologous ORF families on the main chromosome, the cp26 plasmid, and the lp54 plasmid, respectively, as well as sequences of 541, 17, and 29 sets of previously unpublished orthologous IGSs on the main chromosome, the cp26 plasmid, and the lp54 plasmid, respectively. Methods for identifying orthologous ORFs on the main chromosome, lp54, and cp26 have been described previously. Briefly, ORF orthology was determined with a combination of sequence-similarity searches using BLAST, clustering of homologous proteins using MCL, and manual validation based on genome synteny. Orthologous IGS sequences were then identified as those sharing a pair of flanking orthologous ORFs. To ensure the quality of IGS sequences in individual genomes, a consensus start-codon position was identified for each orthologous ORF family, under the assumption that start-codon positions and N-terminal sequences are conserved within as well as between species. As a result, many ORF sequences have been manually adjusted, and their N-terminal sequences may differ from those in NCBI and other genome repositories. We plan to reconcile ORF coordinates between BorreliaBase and the NCBI genome repository.
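The rule used to pair IGSs across genomes (two spacers are orthologous when they are bounded by the same pair of orthologous ORF families) can be illustrated with a short Python sketch. It assumes, hypothetically, that each replicon has already been reduced to an ordered list of ORF family identifiers by the BLAST/MCL/synteny step; the function name `orthologous_igs`, the family labels, and the choice to discard flanking pairs that occur more than once in a genome are illustrative, not BorreliaBase's actual code.

```python
from collections import defaultdict


def orthologous_igs(genomes, min_genomes=2):
    """Group intergenic spacers (IGSs) by the pair of ORF families flanking them.

    genomes: dict mapping a genome name to the ordered list of ORF family IDs
             along one replicon (a simplified, hypothetical input format).
    Returns {flank_pair: {genome: i}}, where i identifies the spacer lying
    between the i-th and (i+1)-th ORF of that genome.
    """
    hits = defaultdict(lambda: defaultdict(list))
    for genome, families in genomes.items():
        for i in range(len(families) - 1):
            # Sort the flanking pair so a replicon listed in the opposite
            # orientation still yields the same key.
            pair = tuple(sorted((families[i], families[i + 1])))
            hits[pair][genome].append(i)

    orthologs = {}
    for pair, per_genome in hits.items():
        # Discard pairs that occur more than once in any genome (ambiguous).
        if any(len(idx) != 1 for idx in per_genome.values()):
            continue
        # Keep only spacers shared by enough genomes to be comparative.
        if len(per_genome) >= min_genomes:
            orthologs[pair] = {g: idx[0] for g, idx in per_genome.items()}
    return orthologs


if __name__ == "__main__":
    demo = {
        "B31": ["fam1", "fam2", "fam3", "fam4"],
        "PKo": ["fam1", "fam2", "fam4"],          # fam3 absent
        "PBr": ["fam4", "fam3", "fam2", "fam1"],  # opposite orientation
    }
    for pair, members in sorted(orthologous_igs(demo).items()):
        print(pair, members)
```

In the demo, the spacer between fam2 and fam4 in PKo (created by the absence of fam3) is dropped because no other genome shares that flanking pair, while the reversed PBr replicon still matches B31 and PKo.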
Comparative data in BorreliaBase currently include a statistically robust genome phylogeny. First, 266 single-copy orthologous ORF families present in all 35 Borrelia genomes were identified from the ortholog database (Figure 1); most of these orthologs are chromosome-borne. Each ORF family was then translated into protein sequences, which were aligned with MUSCLE (v3.7). Codon alignments corresponding to the aligned protein sequences were obtained and concatenated using custom Perl scripts based on BioPerl. A maximum-likelihood tree was obtained with RAxML using three partitions of nucleotide substitution rates corresponding to the three codon positions. The phylogenetic position of strain 297, whose main chromosome remains to be sequenced, was determined from phylogenies previously derived using plasmid-borne SNPs and added manually to the RAxML tree.
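The codon-alignment and concatenation steps above were carried out with custom BioPerl scripts; the Python sketch below only illustrates the same logic under simple assumptions: protein alignments use '-' for gaps, each coding sequence may end in one untranslated stop codon, and every family contains each strain exactly once. The helper names are hypothetical, and the partition lines imitate the codon-position syntax described in the RAxML manual (`DNA, name = start-end\3`); they should be checked against the RAxML version in use.

```python
def codon_align(protein_aln, cds):
    """Back-translate a protein alignment into a codon alignment.

    protein_aln: {strain: aligned protein string, '-' marking gaps}
    cds:         {strain: unaligned coding sequence (nucleotides)}
    Each residue is replaced by its codon and each gap by '---'.
    """
    aligned = {}
    for strain, prot in protein_aln.items():
        seq, pos, codons = cds[strain].upper(), 0, []
        for aa in prot:
            if aa == "-":
                codons.append("---")
            else:
                codons.append(seq[pos:pos + 3])
                pos += 3
        # Tolerate one untranslated trailing stop codon; anything else is a mismatch.
        if len(seq) - pos not in (0, 3):
            raise ValueError(f"{strain}: CDS length does not match protein")
        aligned[strain] = "".join(codons)
    return aligned


def concatenate(families):
    """Join codon alignments of single-copy families shared by all strains."""
    strains = sorted(families[0])
    if any(set(fam) != set(strains) for fam in families):
        raise ValueError("every family must contain exactly the same strains")
    return {s: "".join(fam[s] for fam in families) for s in strains}


def codon_partitions(length):
    """One partition per codon position, written in RAxML partition-file style."""
    return [f"DNA, codon{k} = {k}-{length}\\3" for k in (1, 2, 3)]


if __name__ == "__main__":
    fam = codon_align({"s1": "MK-V", "s2": "MKLV"},
                      {"s1": "ATGAAAGTTTAA", "s2": "ATGAAACTGGTT"})
    matrix = concatenate([fam, fam])
    print(matrix)
    print("\n".join(codon_partitions(len(matrix["s1"]))))
```

Building the codon alignment from the protein alignment, rather than aligning nucleotides directly, keeps every codon intact, which is what makes the three codon-position partitions meaningful.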
In addition to serving the raw orthologous sequences, BorreliaBase provides alignments of these sequences to allow direct interactive exploration of sequence variability (see Utility and Discussion). For orthologous IGS sequences, we provide online visualization of nucleotide alignments produced by MUSCLE (v3.7) with individual sequences ordered according to the genome phylogeny (see Use Case 3 below). For each set of orthologous ORF sequences, we provide MUSCLE-aligned amino-acid sequences as well as the corresponding codon alignment based on the amino-acid alignment (see Use Case 2 below).
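Ordering alignment rows by the genome phylogeny amounts to reading the tree's leaves from left to right. A minimal Python sketch, assuming a Newick string with unquoted leaf names (no spaces or special characters); the regular expression and function names are illustrative and not taken from BorreliaBase.

```python
import re

# A leaf label follows '(' or ',' and ends at ':', ',', ')' or ';'.
# Internal-node labels (which follow ')') and branch lengths (after ':') are skipped.
LEAF = re.compile(r"[(,]\s*([^\s():,;]+)")


def tip_order(newick):
    """Return leaf names in the left-to-right order they appear in a Newick tree."""
    return LEAF.findall(newick)


def order_by_tree(alignment, newick):
    """Reorder alignment rows to follow the tree; rows absent from the tree go last."""
    ranked = [name for name in tip_order(newick) if name in alignment]
    rest = sorted(set(alignment) - set(ranked))
    return [(name, alignment[name]) for name in ranked + rest]


if __name__ == "__main__":
    tree = "((B31:0.1,ZS7:0.2)90:0.3,PKo:0.5);"
    for name, seq in order_by_tree({"PKo": "AC", "B31": "AG", "ZS7": "AT"}, tree):
        print(name, seq)
```

Because sister strains end up in adjacent rows, a variant shared by a clade appears as a contiguous block, which is what makes phylogenetically inconsistent columns stand out visually.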
Relational database and web interface
Utility and discussion
With BorreliaBase, we aim to facilitate comparative analysis of Borrelia genomes in a phylogenetic framework. For example, users can download manually curated orthologous ORF and IGS sequences or directly visualize sequence variations online. We do not duplicate existing single-genome features already provided by general-purpose microbial genome browsers such as the NCBI Microbial Genomes and the Integrated Microbial Genomes (IMG) databases. These databases provide comprehensive genome annotations including genome maps, plasmid identities, and ORF descriptions. In the following, we describe four examples to illustrate how BorreliaBase may be productively used for comparative exploration of Borrelia genomes.
Use case 1: synteny analysis to identify gene gains and losses
Synteny analysis compares the co-linearity of genes on a replicon to identify evolutionary gains and losses of genes. A reliable synteny analysis requires both a set of orthologous genes identified across multiple genomes and a phylogeny of these genomes. In this example, orthologs of ospB are absent from the genomes of two B. garinii strains, PBr and Far04 (Figure 2). By displaying gene synteny in the context of the genome phylogeny, BorreliaBase allows us to conclude, based on the principle of maximum parsimony, that there was a single, lineage-specific deletion of ospB on the branch leading to B. garinii. This evolutionary history is consistent with ospB being expressed primarily in a mammalian environment and with the fact that B. garinii uses birds as its primary reservoir hosts. Because the synteny view in BorreliaBase serves dynamic rather than static images, one can obtain customized synteny views centered on a specific gene (ospA in this case, Figure 2). As a further customization, genomes can be selected or hidden by clicking on the branches or tips of the tree.
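The parsimony argument can be made concrete with Fitch's algorithm, a standard method for counting the minimum number of state changes of a character on a tree. It is swapped in here as an illustration and is not necessarily how BorreliaBase itself reasons. The tree below is a hypothetical, simplified topology using strains named in the text (B31 and ZS7 for B. burgdorferi, PKo for B. afzelii, PBr and Far04 for B. garinii), with gene presence coded as 1 and absence as 0.

```python
def fitch(tree, states):
    """Fitch parsimony for one binary character on a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees
    states: {leaf name: 1 if the gene is present, 0 if absent}
    Returns (candidate root states, minimum number of gain/loss events).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one gain or loss must be placed below this node.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical topology for illustration only.
    tree = ((("B31", "ZS7"), "PKo"), ("PBr", "Far04"))
    ospB = {"B31": 1, "ZS7": 1, "PKo": 1, "PBr": 0, "Far04": 0}
    print(fitch(tree, ospB))  # ({0, 1}, 1): one event explains the absence
```

A single event suffices when absence is confined to one clade, whereas scattered absences require more. Fitch's algorithm counts events but does not by itself decide between a gain and a loss; that call needs the ancestral state, for example the presence of the gene in outgroup species.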
Use case 2: phylogenomics to visualize synonymous and nonsynonymous variations
Lipoproteins, especially those localized to the outer membrane, are primary candidates for the development of diagnostics for, and vaccines against, Borrelia infection, since they play key roles in establishing infection in arthropod vectors and vertebrate hosts [37, 38]. For example, B. burgdorferi undergoes phase changes in its outer surface protein composition, expressing ospA in the tick phase, ospC during host invasion, and vls genes in the mammalian phase. Genes encoding outer surface lipoproteins tend to show elevated sequence variability within species, between species, or both [16, 40]. As such, sequence variability is a reliable guide for identifying Borrelia proteins and protein regions that interact directly with the host. A frequent request we receive from Borrelia researchers is to identify sequences orthologous to a gene of interest in all sequenced genomes. BorreliaBase meets this need by providing online access to orthologous ORF sequences.
A user begins by searching for a gene of interest using either a locus name (e.g., a15, the locus name for ospA in strain B31, the reference genome for LB-group Borrelia) or a gene name (e.g., ospA). If the gene exists, a synteny view centered on the requested ORF appears, showing its genomic context (Figure 2). At the same time, both the amino-acid alignment and the corresponding codon alignment of its orthologs are displayed (Figure 3). The integration of codon and amino-acid alignments makes it easy to determine whether a nucleotide substitution is synonymous or nonsynonymous (Figure 3). With the codon-alignment view, a user can visually scan for genes or regions that are potentially host-interacting, as indicated by a high density of nonsynonymous substitutions, before conducting a formal nonsynonymous-to-synonymous (KA/KS) ratio test. Conversely, a user may identify gene regions with low rates of both synonymous and nonsynonymous substitutions between Borrelia species. For example, the codon-alignment view of BorreliaBase shows that the 5′ leader sequence of ospC is highly conserved across the eight LB-group Borrelia species at both synonymous and nonsynonymous sites (Figure 3). Conservation of synonymous sites at the 5′ end of a gene is common in bacteria and is typically associated with a preference for rare codons and a high level of gene expression.
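The synonymous-versus-nonsynonymous call that the codon-alignment view lets users make by eye can be sketched as a codon-by-codon comparison under the standard genetic code. This is an illustrative Python sketch, not BorreliaBase code, and it is not a KA/KS estimate: a formal test additionally normalizes by the numbers of synonymous and nonsynonymous sites (for example with the Nei-Gojobori method). The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI translation table 1); the bacterial
# table 11 assigns the same amino acids and differs only in start codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq1, seq2):
    """Label each differing codon in two aligned CDSs as synonymous or nonsynonymous.

    Codons with gaps or ambiguous bases are skipped. A codon differing at several
    positions is classified by its overall effect on the encoded amino acid.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    labels = []
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        aa1, aa2 = CODE.get(c1), CODE.get(c2)
        if aa1 is None or aa2 is None:
            continue  # gap ('-') or ambiguous base: not classified
        labels.append((i // 3, "synonymous" if aa1 == aa2 else "nonsynonymous"))
    return labels


if __name__ == "__main__":
    print(classify_codon_differences("ATGAAACTG", "ATGAAGCCG"))
    # [(1, 'synonymous'), (2, 'nonsynonymous')]
```

Here AAA and AAG both encode lysine (synonymous), whereas CTG (leucine) and CCG (proline) differ (nonsynonymous).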
A user may download aligned and unaligned DNA and protein sequences of a set of orthologous ORFs. Further, users may select or de-select strains using the genome phylogeny.
Use case 3: phylogenetic footprinting to identify regulatory sequences
For each set of orthologous IGS sequences, a user can browse a nucleotide alignment to identify potential cis-regulatory sequences by looking for nucleotide sites conserved among Borrelia species. Functional elements embedded in IGSs, e.g., cis-regulatory sequences and sequences encoding small RNAs, tend to be conserved among closely related bacterial species, allowing their identification through a comparative-genomic approach called phylogenetic footprinting [43–45]. Using BorreliaBase, for example, we were able to show a high level of cross-species conservation of cis-regulatory sequences upstream of the ospAB operon, including the ribosome-binding site, the transcription start position, the −35 and −10 RpoD (σ70)-binding sites, and a T-rich region required for ospA/B repression in mammalian hosts (Figure 4).
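A crude version of the footprinting scan, flagging runs of alignment columns that are identical in every species, is sketched below in Python. It assumes a gapped nucleotide alignment held as a dict of equal-length strings. Real footprinting methods typically weight conservation by phylogenetic distance; the `min_len` threshold and the toy sequences (made up to contain promoter-like TTGACA and TATAAT motifs, not real ospAB data) are hypothetical.

```python
def conserved_runs(alignment, min_len=6):
    """Return 0-based, half-open (start, end) column ranges in which every
    sequence carries the same base (no gaps) for at least min_len columns."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("alignment rows must have equal length")
    runs, start = [], None
    for col in range(length):
        column = {s[col] for s in seqs}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col  # a conserved run begins
        elif not conserved and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None  # the run is broken
    # Close a run that extends to the last column.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


if __name__ == "__main__":
    toy = {
        "sp1": "TTGACAATGCTATAAT",
        "sp2": "TTGACAGCGCTATAAT",
        "sp3": "TTGACATTGCTATAAT",
    }
    print(conserved_runs(toy))  # [(0, 6), (8, 16)]
```

Requiring identity across all rows is strict; with many divergent species a per-column identity threshold would be a natural relaxation.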
Use case 4: population genomics to identify recombination
Homologous recombination, which in bacteria typically takes the form of gene conversion, facilitates adaptive evolution [47, 48]. Indeed, gene conversion underlies much of the hypervariability observed at loci encoding dominant antigens, such as ospC in LB-group Borrelia and the rfb operon (coding for the O-antigen) and the fim locus (coding for an adhesin) in Escherichia coli [17, 49]. We have previously established that geographically coexisting Borrelia strains recombine at a rate approximately three times the mutation rate [17, 25]. Identification of homoplasy, a genetic variation that is inconsistent with the overall genome phylogeny, forms the basis of many tests of recombination, such as the four-gamete test and the homoplasy test [50–52]. With sequence alignments displayed in the context of a genome phylogeny, a user of BorreliaBase can visually identify genomic regions affected by recombination and even estimate local recombination rates based on identified homoplasies. In the first example, among strains of the same Borrelia species (B. burgdorferi sensu stricto), the presence of all four haplotypes at a pair of SNP sites indicates at least one past recombination event involving one of the SNP sites (Figure 5A). The second example shows that a portion of the b22 gene of B. burgdorferi sensu stricto strain BOL26 is more similar to its homologs in B. afzelii, a different but sympatric species, than to its homologs in conspecific strains, indicating an unambiguous case of cross-species recombination (Figure 5B). The breakpoints of this recombination event were identified by locating where the phylogenetic inconsistency ends. In the last example, one estimates recombination rates relative to mutation rates by scanning for sequence differences between a pair of sister-group strains (Figure 5C). A SNP segregating between two phylogenetic sister-group strains may be caused either by mutation or by recombination.
For such a SNP site, if one of the two SNP states does not occur in any other strain in a large population sample, it is likely to be a recent point mutation, whereas it is more likely to have been introduced by recombination if both states also occur in other strains. Because these visual methods for identifying footprints of recombination do not require any statistical analysis, we have found BorreliaBase to be an effective teaching tool as well.
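The four-gamete check and the sister-strain rule described above both reduce to simple set operations on aligned SNPs. The Python sketch below is illustrative only: `four_gamete_pairs` assumes biallelic sites and no repeated mutation at a site (the infinite-sites assumption), and `classify_sister_snps` implements the private-versus-shared rule for sister-strain SNPs, which is only meaningful with a large population sample. Function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def four_gamete_pairs(snps):
    """Return pairs of biallelic SNP sites at which all four two-site haplotypes occur.

    snps: {strain: string of alleles at the SNP sites, in the same site order}
    Under the infinite-sites assumption, each such pair implies at least one
    recombination event in the sample's history.
    """
    haps = list(snps.values())
    n_sites = len(haps[0])
    # The four-gamete logic assumes exactly two alleles per site.
    biallelic = [i for i in range(n_sites) if len({h[i] for h in haps}) == 2]
    return [(i, j) for i, j in combinations(biallelic, 2)
            if len({(h[i], h[j]) for h in haps}) == 4]


def classify_sister_snps(seq_a, seq_b, others):
    """Label sites that differ between two sister-group strains.

    others: aligned sequences of the remaining strains in the population sample.
    'mutation' if either state is absent from all other strains (private change);
    'recombination' if both states also occur elsewhere in the sample.
    """
    labels = []
    for i, (x, y) in enumerate(zip(seq_a, seq_b)):
        if x == y or "-" in (x, y):
            continue  # identical or gapped site: nothing to classify
        seen = {s[i] for s in others}
        labels.append((i, "recombination" if x in seen and y in seen else "mutation"))
    return labels


if __name__ == "__main__":
    print(four_gamete_pairs({"s1": "AAC", "s2": "AGC", "s3": "GAT", "s4": "GGT"}))
    # [(0, 1), (1, 2)]
    sites = classify_sister_snps("ACGTA", "ACTTG", ["ACGTA", "ACTTA", "TCGTA"])
    print(sites)  # [(2, 'recombination'), (4, 'mutation')]
    print(Counter(label for _, label in sites))
```

Tallying the two labels gives a rough ratio of SNPs introduced by recombination to SNPs introduced by mutation; because one recombination event can import several SNPs, this counts substitutions rather than events.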
The phylogeny-centered design of BorreliaBase encourages comparing Borrelia genomes within the framework of their shared phylogenetic history. More than simply serving as a graphical guide for selecting and de-selecting strains, the genome phylogeny provides visual, evolution-based expectations for comparative genomic information, such as the average level of sequence variability and the expected phylogenetic consistency of SNPs. Only when an observed pattern deviates significantly from these evolutionary expectations should functional importance be conjectured. For example, the high within-species sequence variability of lipoprotein genes such as ospC, dbpA, and vls deviates greatly from the expected variability, suggesting strong diversifying selection at these loci driven by, e.g., immune escape from the host. Conversely, the high sequence conservation of cis-regulatory elements deviates from the expected level of sequence variation between Borrelia species, consistent with the gene-regulatory roles these IGS elements play (Figure 4). A phylogenetic expectation is especially valuable for identifying genomic footprints of recombination (Figure 5). Nexplorer, an online tool for analyzing the evolution of characters (e.g., the presence and absence of introns) associated with a protein family, employs a similar phylogeny-centered user-interface design.
While allowing interactive comparative exploration of Borrelia genomes, the current version of BorreliaBase (public release 1.0) is admittedly limited in genomic scope. For example, comparative features, including gene synteny and orthologs, are available only for the three universally present replicons (the main chromosome, cp26, and lp54). We plan to include results for other replicons in future releases of BorreliaBase.
- ORF:
Open Reading Frame
- B. burgdorferi s.l.:
Borrelia burgdorferi sensu lato
- B. burgdorferi s.s.:
Borrelia burgdorferi sensu stricto
Barbour AG, Hayes SF: Biology of Borrelia species. Microbiol Rev. 1986, 50: 381-400.
Wu D, Jospin G, Eisen JA: Systematic identification of gene families for use as “Markers” for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS One. 2013, 8: e77033.
Fraser CM, Casjens S, Huang WM, Sutton GG, Clayton R, Lathigra R, White O, Ketchum KA, Dodson R, Hickey EK, Gwinn M, Dougherty B, Tomb J-F, Fleischmann RD, Richardson D, Peterson J, Kerlavage AR, Quackenbush J, Salzberg S, Hanson M, van Vugt R, Palmer N, Adams MD, Gocayne J, Weidman J, Utterback T, Watthey L, McDonald L, Artiach P, Bowman C, et al: Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature. 1997, 390: 580-586.
Casjens S, Palmer N, Van Vugt R, Mun Huang W, Stevenson B, Rosa P, Lathigra R, Sutton G, Peterson J, Dodson RJ, Haft D, Hickey E, Gwinn M, White O, Fraser MC: A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochete Borrelia burgdorferi. Mol Microbiol. 2000, 35: 490-516.
Elbir H, Gimenez G, Robert C, Bergström S, Cutler S, Raoult D, Drancourt M: Complete genome sequence of Borrelia crocidurae. J Bacteriol. 2012, 194: 3723-3724.
Lescot M, Audic S, Robert C, Nguyen TT, Blanc G, Cutler SJ, Wincker P, Couloux A, Claverie J-M, Raoult D, Drancourt M: The genome of Borrelia recurrentis, the agent of deadly louse-borne relapsing fever, is a degraded subset of tick-borne Borrelia duttonii. PLoS Genet. 2008, 4: e1000185.
Hue F, Langeroudi AG, Barbour AG: Chromosome sequence of Borrelia miyamotoi, an uncultivable tick-borne agent of human infection. Genome Announc. 2013, 1: e00713-13.
Glöckner G, Lehmann R, Romualdi A, Pradella S, Schulte-Spechtel U, Schilhabel M, Wilske B, Sühnel J, Platzer M: Comparative analysis of the Borrelia garinii genome. Nucleic Acids Res. 2004, 32: 6038-6046.
Glöckner G, Schulte-Spechtel U, Schilhabel M, Felder M, Sühnel J, Wilske B, Platzer M: Comparative genome analysis: selection pressure on the Borrelia vls cassettes is essential for infectivity. BMC Genomics. 2006, 7: 211.
Casjens SR, Mongodin EF, Qiu W-G, Dunn JJ, Luft BJ, Fraser-Liggett CM, Schutzer SE: Whole-genome sequences of two Borrelia afzelii and two Borrelia garinii lyme disease agent isolates. J Bacteriol. 2011, 193: 6995-6996.
Casjens SR, Fraser-Liggett CM, Mongodin EF, Qiu W-G, Dunn JJ, Luft BJ, Schutzer SE: Whole genome sequence of an unusual Borrelia burgdorferi sensu lato isolate. J Bacteriol. 2011, 193: 1489-1490.
Schutzer SE, Fraser-Liggett CM, Qiu W-G, Kraiczy P, Mongodin EF, Dunn JJ, Luft BJ, Casjens SR: Whole-genome sequences of Borrelia bissettii, Borrelia valaisiana, and Borrelia spielmanii. J Bacteriol. 2012, 194: 545-546.
Schutzer SE, Fraser-Liggett CM, Casjens SR, Qiu W-G, Dunn JJ, Mongodin EF, Luft BJ: Whole-genome sequences of thirteen isolates of Borrelia burgdorferi. J Bacteriol. 2011, 193: 1018-1020.
Stevenson B, Porcella SF, Oie KL, Fitzpatrick CA, Raffel SJ, Lubke L, Schrumpf ME, Schwan TG: The relapsing fever spirochete Borrelia hermsii contains multiple, antigen-encoding circular plasmids that are homologous to the cp32 plasmids of lyme disease spirochetes. Infect Immun. 2000, 68: 3900-3908.
Miller SC, Porcella SF, Raffel SJ, Schwan TG, Barbour AG: Large linear plasmids of Borrelia species that cause relapsing fever. J Bacteriol. 2013, 195: 3629-3639.
Casjens SR, Mongodin EF, Qiu W-G, Luft BJ, Schutzer SE, Gilcrease EB, Huang WM, Vujadinovic M, Aron JK, Vargas LC, Freeman S, Radune D, Weidman JF, Dimitrov GI, Khouri HM, Sosa JE, Halpin RA, Dunn JJ, Fraser CM: Genome stability of lyme disease spirochetes: comparative genomics of Borrelia burgdorferi plasmids. PLoS One. 2012, 7: e33280.
Haven J, Vargas LC, Mongodin EF, Xue V, Hernandez Y, Pagan P, Fraser-Liggett CM, Schutzer SE, Luft BJ, Casjens SR, Qiu W-G: Pervasive recombination and sympatric genome diversification driven by frequency-dependent selection in Borrelia burgdorferi, the lyme disease bacterium. Genetics. 2011, 189: 951-966.
Mongodin EF, Casjens SR, Bruno JF, Xu Y, Drabek EF, Riley DR, Cantarel BL, Pagan PE, Hernandez YA, Vargas LC, Dunn JJ, Schutzer SE, Fraser CM, Qiu W-G, Luft BJ: Inter- and intra-specific pan-genomes of Borrelia burgdorferi sensu lato: genome stability and adaptive radiation. BMC Genomics. 2013, 14: 693.
Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, Gillespie JJ, Gough R, Hix D, Kenyon R, Machi D, Mao C, Nordberg EK, Olson R, Overbeek R, Pusch GD, Shukla M, Schulman J, Stevens RL, Sullivan DE, Vonstein V, Warren A, Will R, Wilson MJC, Yoo HS, Zhang C, Zhang Y, Sobral BW: PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 2014, 42 (D1): D581-D591.
Pagani I, Liolios K, Jansson J, Chen I-MA, Smirnova T, Nosrat B, Markowitz VM, Kyrpides NC: The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2012, 40 (Database issue): D571-D579.
Norris SJ, Lin T: Out of the woods: the remarkable genomes of the genus Borrelia. J Bacteriol. 2011, 193: 6812-6814.
Winsor GL, Lam DKW, Fleming L, Lo R, Whiteside MD, Yu NY, Hancock REW, Brinkman FSL: Pseudomonas genome database: improved comparative analysis and population genomics capability for Pseudomonas genomes. Nucleic Acids Res. 2011, 39 (Database issue): D596-D600.
Lechat P, Hummel L, Rousseau S, Moszer I: GenoList: an integrated environment for comparative analysis of microbial genomes. Nucleic Acids Res. 2008, 36 (Database issue): D469-D474.
Whiteside MD, Winsor GL, Laird MR, Brinkman FSL: OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Res. 2012, 41: D366-D376.
Qiu W-G, Schutzer SE, Bruno JF, Attie O, Xu Y, Dunn JJ, Fraser CM, Casjens SR, Luft BJ: Genetic exchange and plasmid transfers in Borrelia burgdorferi sensu stricto revealed by three-way genome comparisons and multilocus sequence typing. Proc Natl Acad Sci U S A. 2004, 101: 14150-14155.
Brenner EV, Kurilshikov AM, Stronin OV, Fomenko NV: Whole-genome sequencing of Borrelia garinii BgVir, isolated from taiga ticks (Ixodes persulcatus). J Bacteriol. 2012, 194: 5713.
Jiang B, Yao H, Tong Y, Yang X, Huang Y, Jiang J, Cao W: Genome sequence of Borrelia garinii strain NMJW1, isolated from China. J Bacteriol. 2012, 194: 6660-6661.
Jiang B-G, Zheng Y-C, Tong Y-G, Jia N, Huo Q-B, Fan H, Ni X-B, Ma L, Yang XF, Jiang J-F, Cao W-C: Genome sequence of Borrelia afzelii strain HLJ01, isolated from a patient in China. J Bacteriol. 2012, 194: 7014-7015.
Barbour AG, Campeau Miller S: Genome Sequence of Borrelia parkeri, an agent of enzootic relapsing fever in Western North America. Genome Announc. 2014, 2 (1): doi:10.1128/genomeA.00018-14
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL: BLAST+: architecture and applications. BMC Bioinformatics. 2009, 10: 421.
Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30: 1575-1584.
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797.
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618.
Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014, 30: 1312-1313.
Liang FT, Caimano MJ, Radolf JD, Fikrig E: Borrelia burgdorferi outer surface protein (osp) B expression independent of ospA. Microb Pathog. 2004, 37: 35-40.
Vollmer SA, Feil EJ, Chu C-Y, Raper SL, Cao W-C, Kurtenbach K, Margos G: Spatial spread and demographic expansion of Lyme borreliosis spirochaetes in Eurasia. Infect Genet Evol. 2013, 14: 147-155.
Caimano MJ, Iyer R, Eggers CH, Gonzalez C, Morton EA, Gilbert MA, Schwartz I, Radolf JD: Analysis of the RpoS regulon in Borrelia burgdorferi in response to mammalian host signals provides insight into RpoS function during the enzootic cycle. Mol Microbiol. 2007, 65: 1193-1217.
Revel AT, Talaat AM, Norgard MV: DNA microarray analysis of differential gene expression in Borrelia burgdorferi, the Lyme disease spirochete. Proc Natl Acad Sci U S A. 2002, 99: 1562-1567.
Tilly K, Bestor A, Rosa PA: Lipoprotein succession in Borrelia burgdorferi: similar but distinct roles for OspC and VlsE at different stages of mammalian infection. Mol Microbiol. 2013, 89: 216-227.
Coutte L, Botkin DJ, Gao L, Norris SJ: Detailed analysis of sequence changes occurring during vlsE antigenic variation in the mouse model of Borrelia burgdorferi infection. PLoS Pathog. 2009, 5: e1000293.
Hurst LD: The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 2002, 18: 486.
Goodman DB, Church GM, Kosuri S: Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013, 342: 475-479.
Eddy SR: A model of the statistical power of comparative genome sequence analysis. PLoS Biol. 2005, 3: e10.
Su J, Teichmann SA, Down TA: Assessing computational methods of cis-regulatory module prediction. PLoS Comput Biol. 2010, 6: e1001020.
Degnan PH, Ochman H, Moran NA: Sequence conservation and functional constraint on intergenic spacers in reduced genomes of the obligate symbiont Buchnera. PLoS Genet. 2011, 7: e1002252.
Xu Q, McShan K, Liang FT: Two regulatory elements required for enhancing ospA expression in Borrelia burgdorferi grown in vitro but repressing its expression during mammalian infection. Microbiology. 2010, 156: 2194-2204.
Didelot X, Maiden MCJ: Impact of recombination on bacterial evolution. Trends Microbiol. 2010, 18: 315-322.
Fraser C, Hanage WP, Spratt BG: Recombination and the nature of bacterial speciation. Science. 2007, 315: 476-480.
Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E, Bonacorsi S, Bouchier C, Bouvet O, Calteau A, Chiapello H, Clermont O, Cruveiller S, Danchin A, Diard M, Dossat C, Karoui ME, Frapy E, Garry L, Ghigo JM, Gilles AM, Johnson J, Le Bouguénec C, Lescat M, Mangenot S, Martinez-Jéhanne V, Matic I, Nassif X, Oztas S, et al: Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 2009, 5: e1000344.
Hudson RR, Kaplan NL: Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics. 1985, 111: 147-164.
McVean G, Awadalla P, Fearnhead P: A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002, 160: 1231-1241.
Smith JM, Smith NH: Detecting recombination from gene trees. Mol Biol Evol. 1998, 15: 590-599.
Guttman DS, Dykhuizen DE: Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science. 1994, 266: 1380-1383.
Gopalan V, Qiu W-G, Chen MZ, Stoltzfus A: Nexplorer: phylogeny-based exploration of sequence family data. Bioinformatics. 2006, 22: 120-121.
We thank two anonymous reviewers for constructive critiques. We thank Sabeel Kazi for preparing the Entity-Relation diagram. We thank Sviatoslav Kendall for assistance in literature search. We thank Patricia Rosa for bringing our attention to the codon-usage conservation in ospC leader sequences. This work was supported by Public Health Service grants AI107955 (to WGQ), AI37256 (BJL), AI49003 (SRC), and AI30071 (CMF and SES) from the National Institute of Allergy and Infectious Diseases (NIAID), and the grant MD007599 (Hunter College) from the National Institute on Minority Health and Health Disparities (NIMHD) of the National Institutes of Health (NIH). The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of NIAID, NIMHD, or NIH.
The authors declare that they have no competing interests.
LD designed and implemented the web interface. PP and CM developed prototype websites for ORFs and IGSs, respectively. PP matched GenBank Locus Tags to individual ORFs in the back-end genome database. PP and SA identified consensus start-codon positions for each orthologous ORF family. GR prepared figures and online documentation. DP configures and administers the website. EM, CF, SE, BL, SC, and WQ sequenced and annotated the majority of the genomes in the current BorreliaBase. SC identified plasmids in these genomes. WQ conceived of the project, developed the back-end database, and drafted the manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Di, L., Pagan, P.E., Packer, D. et al. BorreliaBase: a phylogeny-centered browser of Borrelia genomes. BMC Bioinformatics 15, 233 (2014) doi:10.1186/1471-2105-15-233
- Lyme disease
- Vector-borne relapsing fever
- Genome browser
- Population genomics