SynteBase/SynteView: a tool to visualize gene order conservation in prokaryotic genomes
© Lemoine et al; licensee BioMed Central Ltd. 2008
Received: 04 September 2008
Accepted: 16 December 2008
Published: 16 December 2008
It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances.
After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on-the-fly and display the distribution of synteny blocks which are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene, to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Webstart at http://www.synteview.u-psud.fr.
SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way which provides a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.
As prokaryotic species diverge, their gene order progressively fades away, except in rare locations where a few genes retain their neighborhood. Such observations gave rise to the concept of genomic context [1–9]. Accordingly, it is assumed that a small number of genes remain adjacent either because they are expressed at the same time, or because they encode proteins that are constituents of the same molecular machine (e.g. membrane ATPase) or are involved in the same cellular function. These genes that remain persistently adjacent in constantly moving genomes form synteny blocks. In a recent work, we identified such synteny blocks in a large and diverse set of nearly 600 microbial genomes using a three-step process. In step one, we compared each protein encoded by a completely sequenced genome with all other available microbial proteomes in order to identify the full set of homologous proteins they share. In step two, we outlined an approach allowing the identification of bona fide orthologues among all recognized homologues when comparing many pairs of genomes. This second step is based on an adaptation of the method designed by Wall et al. to compute the reciprocal smallest distance (RSD) that separates the homologues present in a pair of genomes. In step three, we searched among the correctly identified orthologues to pinpoint those that belong to a minimal unit that is conserved in each pair of genomes, i.e., a pair of positional orthologous genes (POGs) that remain adjacent in each genome. Then, after extending these minimal units as far as possible, it becomes feasible to assess the relative amount and size of synteny blocks in close and distant species. Such synteny blocks are appealing candidates in the study of the mechanisms of genome evolution and in the verification of the functional annotation of neighboring genes.
Accordingly, visualizing these blocks in a large number of genomes at various taxonomic distances helps to study their features. In this paper, we describe how to assemble all these synteny data in a relational database (SynteBase) and we develop a tool (SynteView) to visualize all conserved synteny blocks in a large number of completely sequenced prokaryotic genomes.
SynteView was designed to display homology and gene context data that are organized in a relational database, SynteBase, described in detail below.
Creating a relational database for synteny data and populating its tables with a dedicated suite of software and other tools
Step one: searching for homologues
Table 1. A suite of programs to detect and identify synteny blocks
identifying orthologues by RBH; Perl script rsd ortho; FL, this work
Perl script famtrans; FL, this work
graph algorithm (Perl library); FL, this work
extracting significant clusters
identifying pairs of adjacent orthologous genes; SQL query on SynteBase; FL, this work
discovering synteny blocks; Perl script synblock; FL, this work
Step two: identifying orthologues among the collected homologues
We further adapted the Reciprocal Best Blast Hit approach to analyze the Blast results obtained in the first step. The best RSD orthologous pairs were determined in each comparison of two proteomes as follows. Protein a, encoded by genome G_A, and protein b, encoded by genome G_B, form the best pair of orthologues if the distance separating a from b is smaller than both the distance separating a from any other protein encoded by G_B and the distance separating b from any other protein encoded by G_A. We automated this search (Table 1, step 2a). The data obtained were used to populate the ortho table (Fig. 1).
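The reciprocity criterion described above can be illustrated with a minimal Python sketch. It is not the authors' Perl script: the function names, the dictionary input format and the toy distances are all assumptions made for illustration, and the distances are taken as already computed. Ties are rejected here because the criterion asks for a strictly smaller distance.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (protein of G_A, protein of G_B)


def _unique_closest(candidates: List[Tuple[float, str]]) -> Optional[str]:
    """Return the partner with the strictly smallest distance, or None on a tie."""
    ranked = sorted(candidates)
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None
    return ranked[0][1]


def reciprocal_best_pairs(dist: Dict[Pair, float]) -> Set[Pair]:
    """Keep (a, b) when b is the closest protein to a and a is the closest to b.

    `dist` maps (a, b) to a precomputed evolutionary distance (hypothetical input).
    """
    by_a = defaultdict(list)  # a -> [(distance, b), ...]
    by_b = defaultdict(list)  # b -> [(distance, a), ...]
    for (a, b), d in dist.items():
        by_a[a].append((d, b))
        by_b[b].append((d, a))
    closest_b = {a: _unique_closest(c) for a, c in by_a.items()}
    closest_a = {b: _unique_closest(c) for b, c in by_b.items()}
    return {(a, b) for a, b in closest_b.items()
            if b is not None and closest_a[b] == a}


# Toy distances between three proteins of G_A and two proteins of G_B.
toy_dist = {
    ("a1", "b1"): 0.1, ("a1", "b2"): 0.5,
    ("a2", "b1"): 0.3, ("a2", "b2"): 0.2,
    ("a3", "b2"): 0.4,
}
pairs = reciprocal_best_pairs(toy_dist)
```

In this toy input, a3 is closest to b2, but b2 is closer to a2, so a3 is left without an orthologue.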
Step three: identifying positional orthologous genes among the collected orthologues
Once populated, the first three tables were used to identify the synteny blocks. We devised a specific SQL query (see [Additional file 2]) to discover the pairs of adjacent orthologous genes (Table 1, step 3a). Then, blocks of size greater than 2 were detected by progressive accretion of blocks of size 2 which shared a common pair of orthologues (Table 1, step 3b). These computed data were entered in the neighborpairs and synteny blocks tables, respectively (Fig. 1).
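The two sub-steps above (finding adjacent orthologous pairs, then merging them into larger blocks) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SQL query of Additional file 2 or the synblock script: genomes are modelled as plain ordered lists of gene names, orthology is assumed to be one-to-one, adjacency is taken to be strict (no intervening gene), and the wrap-around of circular chromosomes is ignored. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def adjacent_pairs(order_a: List[str], order_b: List[str],
                   ortho: Dict[str, str]) -> List[Tuple[int, int]]:
    """Positions (i, i+1) in genome A whose orthologues are also neighbours in B."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    pairs = []
    for i in range(len(order_a) - 1):
        g1, g2 = order_a[i], order_a[i + 1]
        if g1 in ortho and g2 in ortho:
            # Either orientation counts, so compare absolute positions.
            if abs(pos_b[ortho[g1]] - pos_b[ortho[g2]]) == 1:
                pairs.append((i, i + 1))
    return pairs


def extend_blocks(order_a: List[str],
                  pairs: List[Tuple[int, int]]) -> List[List[str]]:
    """Merge size-2 blocks that share a gene into maximal blocks (accretion)."""
    spans: List[List[int]] = []
    for start, end in pairs:  # pairs come sorted by position in genome A
        if spans and spans[-1][1] == start:
            spans[-1][1] = end  # shared gene: grow the current block
        else:
            spans.append([start, end])
    return [order_a[s:e + 1] for s, e in spans]


# Toy genomes: a4 has no orthologue, and b3-b2-b1 is inverted relative to A.
order_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
order_b = ["b9", "b3", "b2", "b1", "b7", "b5", "b6"]
ortho = {"a1": "b1", "a2": "b2", "a3": "b3", "a5": "b5", "a6": "b6"}

pairs = adjacent_pairs(order_a, order_b, ortho)
blocks = extend_blocks(order_a, pairs)
```

With one-to-one orthology, two size-2 blocks that share a gene necessarily extend the block on opposite sides in the second genome, which is why a single left-to-right pass over the reference genome is enough in this simplified setting.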
Architecture of SynteView
Visualizing synteny data with SynteView
Using SynteView for comparative analysis of gene context
Using SynteView for comparative analysis of multiple views
SynteView was also designed to allow complex studies by means of easy and simple operations. For example, looking at a particular set of species makes it possible to immediately visualize new assortments of synteny blocks. This is done simply by selecting a new reference species by clicking on a species name on the left of the display and/or by changing the list of compared genomes. Moreover, contrary to competing tools (see Discussion below), SynteView allows global analyses of the synteny data from various points of view. Scrolling up and down the same window, one can assess the level of conservation of gene order at various taxonomic depths, the relative density of the synteny blocks along the whole genome, the relative size of the blocks, and the respective events of gene insertion/deletion in close and distant species.
Using SynteView to quantify synteny data
Obtaining information on synteny blocks
B. subtilis a Taxonomy
Synechocystis species
Mycobacterium tuberculosis
Methanosarcina acetivorans
SynteView was designed to allow fast and easy visualization of the conservation of gene adjacency in many genomes for which orthology and neighborhood data were computed and stored in a dedicated relational database, SynteBase. Our goal was to develop a flexible yet powerful tool to work directly with home-computed data obtained after comparing large and diverse sets of species. Indeed, our tool can be easily installed on any personal computer running one of the main operating systems (Windows, Mac OS X or Linux). Moreover, SynteView can be customized in many respects. In particular, it can be used with another, home-made, database in place of SynteBase. We observed that among the other tools to visualize synteny data [16–20] that have been designed to be locally installed, not one is adapted to the use of the abundant genomic data for prokaryotic species. Contrary to these previously published software tools [16–20], SynteView allows the user to compare the gene order in many different genomes in the same window. Finally, the tight coupling between SynteBase and SynteView allows the user to extend the study of gene order by means of specific queries on SynteBase. In addition to the visualization of synteny blocks, it is possible to obtain useful information through various requests such as "How many genes are involved in a neighbouring relationship, for each pair of genomes?"
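As an illustration of the kind of request quoted above, the following Python sketch runs such a count against an in-memory SQLite database. The table layout, column names and genome labels are hypothetical and are not the actual SynteBase schema; SQLite merely stands in for the database server so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per pair of adjacent genes of genome_a whose
# orthologues are also adjacent in genome_b.
conn.execute(
    """CREATE TABLE neighborpairs (
           genome_a TEXT, genome_b TEXT, gene1_a TEXT, gene2_a TEXT)"""
)
conn.executemany(
    "INSERT INTO neighborpairs VALUES (?, ?, ?, ?)",
    [
        ("Bsub", "Mtub", "g1", "g2"),
        ("Bsub", "Mtub", "g2", "g3"),  # g2 is shared and must be counted once
        ("Bsub", "Mace", "g7", "g8"),
    ],
)

# Count the distinct genes of genome_a that take part in at least one
# neighbouring relationship, for each pair of genomes.
query = """
SELECT genome_a, genome_b, COUNT(DISTINCT gene) AS n_genes
FROM (
    SELECT genome_a, genome_b, gene1_a AS gene FROM neighborpairs
    UNION
    SELECT genome_a, genome_b, gene2_a AS gene FROM neighborpairs
) AS g
GROUP BY genome_a, genome_b
ORDER BY genome_a, genome_b
"""
counts = conn.execute(query).fetchall()
conn.close()
```

The inner UNION puts both genes of each pair in a single column, so a gene that belongs to two consecutive pairs is counted only once.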
We anticipate that we will be inundated by thousands of completely sequenced genomes in the next few years. Our tool SynteBase/SynteView has been designed to support such large sets of prokaryotic data. This tool will serve to quickly evaluate the conservation of gene order in newly published genomes as soon as they have been compared to those already analyzed.
Availability and requirements
Project name: SynteView/SynteBase
Project home page: http://www.synteview.u-psud.fr
Operating System(s): Windows, Linux, MacOS X (Java web start)
Programming Language: Java
Other requirements: Java 1.5
License: GNU GPL
Any restrictions to use by non-academics: none
Perl scripts: available on request
POGs: Positional Orthologous Genes
RSD: Reciprocal Smallest Distance
SQL: Structured Query Language
FL is a PhD student supported by the French Ministry of Research. This work was funded by the CNRS (UMR 8621) and the Agence Nationale de la Recherche (ANR-05-MMSA-0009 MDMS NV 10). We gratefully acknowledge Stéphane Descorps-Declère for his help in designing the genome comparison pipeline and Mary Bouley (Université de Bourgogne) for her aid in improving the quality of our manuscript.
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends in Biochemical Sciences 1998, 23: 324–328. 10.1016/S0968-0004(98)01274-2
- Huynen MA, Bork P: Measuring genome evolution. Proc Natl Acad Sci USA 1998, 95: 5849–5856. 10.1073/pnas.95.11.5849
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402: 86–90. 10.1038/47056
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285(5428): 751–753. 10.1126/science.285.5428.751
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 1999, 96: 2896–2901. 10.1073/pnas.96.6.2896
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96: 4285–4288. 10.1073/pnas.96.8.4285
- Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000, 18: 609–613. 10.1038/76443
- Huynen M, Snel B, Lathe W, Bork P: Exploitation of gene context. Curr Opin Struct Biol 2000, 10: 366–370. 10.1016/S0959-440X(00)00098-1
- Wolf Y, Rogozin I, Kondrashov A, Koonin E: Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Research 2001, 3: 356–372. 10.1101/gr.GR-1619R
- Huynen M, Snel B, Lathe W, Bork P: Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences. Genome Research 2000, 10: 1204–1210. 10.1101/gr.10.8.1204
- Lemoine F, Lespinet O, Labedan B: Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data. BMC Evol Biol 2007, 7: 237. 10.1186/1471-2148-7-237
- Wall D, Fraser H, Hirsh A: Detecting putative orthologs. Bioinformatics 2003, 19: 1710–1711. 10.1093/bioinformatics/btg213
- PostgreSQL database management systems [http://www.postgresql.org/]
- Le Bouder-Langevin S, Capron-Montaland I, De Rosa R, Labedan B: A strategy to retrieve the whole set of protein modules in microbial proteomes. Genome Research 2002, 12: 1961–1973. 10.1101/gr.393902
- Java Technology [http://java.sun.com/]
- Sinha AU, Meller J: Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms. BMC Bioinformatics 2007, 8: 82. 10.1186/1471-2105-8-82
- Wang H, Su Y, Mackey AJ, Kraemer ET, Kissinger JC: SynView: a GBrowse-compatible approach to visualizing comparative genome data. Bioinformatics 2006, 22: 2308–2309. 10.1093/bioinformatics/btl389
- Hunt E, Hanlon N, Leader DP, Bryce H, Dominiczak AF: The visual language of synteny. OMICS 2004, 8: 289–305. 10.1089/omi.2004.8.289
- Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics 2005, 21: 3461–3468. 10.1093/bioinformatics/bti555
- Byrne KP, Wolfe KH: The Yeast Gene Order Browser: Combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Research 2005, 15: 1456–1461. 10.1101/gr.3672305
- Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005, 33: 5691–5702. 10.1093/nar/gki866
- protein BLAST [http://blast.ncbi.nlm.nih.gov/]
- MCL – a cluster algorithm for graphs [http://micans.org/mcl/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.