SynteBase/SynteView: a tool to visualize gene order conservation in prokaryotic genomes

Background
It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances.

Results
After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on-the-fly and display the distribution of synteny blocks which are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene, to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Webstart at .

Conclusion
SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way which provides a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.


Setting up the Database
A PostgreSQL database must be built to hold your own synteny data. You are free to design your own schema, but the database must contain the following mandatory data: proteins (identifier, coding strand, sequence, function, and length), genomes (species name, species name abbreviation, strain name, and taxonomy), and synteny blocks (identifier, and the pairs of orthologous proteins belonging to each block).
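As an illustration only, the mandatory data listed above could be laid out as in the sketch below. Every table and column name is hypothetical (SynteView does not prescribe a schema), and Python's built-in sqlite3 module stands in for PostgreSQL solely so that the sketch runs without a database server.

```python
import sqlite3

# Hypothetical minimal schema: all table and column names are assumptions,
# chosen only to cover the mandatory data listed in the text.
SCHEMA = """
CREATE TABLE genome (
    abbreviation TEXT PRIMARY KEY,   -- short species name
    speciesname  TEXT NOT NULL,      -- full name, including strain if any
    taxonomy     TEXT NOT NULL       -- node names separated by ';'
);
CREATE TABLE protein (
    pid        INTEGER PRIMARY KEY,  -- integer protein identifier
    species    TEXT NOT NULL REFERENCES genome(abbreviation),
    strand     TEXT NOT NULL,        -- coding strand, assumed '+' or '-'
    sequence   TEXT NOT NULL,
    "function" TEXT,                 -- free-text functional annotation
    length     INTEGER NOT NULL      -- length in amino acids
);
CREATE TABLE synteny_pair (
    block_id INTEGER NOT NULL,       -- synteny block identifier
    pid_ref  INTEGER NOT NULL REFERENCES protein(pid),
    pid_cmp  INTEGER NOT NULL REFERENCES protein(pid)
);
"""


def create_schema(conn):
    """Create the three illustrative tables on an open connection."""
    conn.executescript(SCHEMA)


# In-memory SQLite database used as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Storing one row per orthologous pair, tagged with its block identifier, is one simple way to represent a synteny block; other layouts would serve equally well as long as the mandatory queries can be answered.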

Configuring SynteView
Once the database is operational, several mandatory queries must be defined; SynteView needs them to visualize gene order and conserved synteny blocks.
Here, the abbreviation field stands for a short name of the species. It is displayed in the synteny panel for visualization purposes. The taxonomy field must be a list of node names separated by ";". The speciesname field gives the full name of the species, including the strain name if any. The nbprotein field specifies the number of proteins encoded by the genome of the species.
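To make the expected taxonomy format concrete, here is a minimal sketch of how a ";"-separated taxonomy string can be split into node names. The function name and the example lineage are illustrative assumptions; they are not taken from SynteView itself.

```python
def parse_taxonomy(taxonomy):
    """Split a ';'-separated taxonomy string into a list of node names.

    Surrounding whitespace is removed and empty entries (for example from
    a trailing ';') are dropped.
    """
    return [node.strip() for node in taxonomy.split(";") if node.strip()]


# Illustrative lineage, ordered from the root towards the species.
lineage = parse_taxonomy("Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales")
```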

WHERE ...pid=%pid
Here, the condition pid=%pid is mandatory. When necessary, SynteView replaces %pid with the identifier of the protein under analysis. The pid field is the identifier of the protein, which must be an integer. The chromosome field is the name of the replicon carrying the gene that encodes the protein under analysis. The species field is the short name of the species to which the protein belongs. The function field is a string containing a text description of the functional annotation of the protein. The start, stop, and genename fields describe the gene encoding the protein under analysis (its start position, stop position, and name, respectively), and length is the length of the protein in amino acids.

Retrieving information on a genome:
SELECT pid, genename, strand, start, chromosome, function FROM ... WHERE ... abbreviation='%spabrv'
The condition abbreviation='%spabrv' is mandatory. SynteView will replace %spabrv by the corresponding species abbreviation when necessary. The definition of the different fields is as previously described.
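The placeholder mechanism can be pictured as plain textual substitution. The sketch below is an assumption about the general idea, not a description of SynteView's internal code; the function name fill_query and the table name genes in the usage example are hypothetical.

```python
def fill_query(template, pid=None, spabrv=None):
    """Return the query template with %pid and/or %spabrv filled in.

    pid is forced to an integer, since protein identifiers must be integers.
    Single quotes in spabrv are doubled so the value stays inside the
    quoted SQL string literal of the template.
    """
    query = template
    if pid is not None:
        query = query.replace("%pid", str(int(pid)))
    if spabrv is not None:
        query = query.replace("%spabrv", spabrv.replace("'", "''"))
    return query


# Hypothetical template: the table name "genes" is an assumption.
template = "SELECT pid, genename FROM genes WHERE abbreviation='%spabrv'"
query = fill_query(template, spabrv="smel")
```

Converting pid with int() rejects non-numeric input early, which is one simple way to keep a substituted query well-formed.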

Instruction manual

Initializing SynteView before use
To use the local access mode, fill in the database connection fields on the settings panel: database server name, port, login, and password. In the web service mode, these fields are filled in by default.
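Before entering these values in the settings panel, it can help to check them outside SynteView. The sketch below assembles a standard PostgreSQL connection URI from the same four fields; the database name is an additional assumption (it is not one of the listed fields), and the function name and all example values are hypothetical.

```python
from urllib.parse import quote


def postgres_uri(server, port, login, password, dbname):
    """Build a PostgreSQL connection URI from the connection settings.

    Login, password and database name are percent-encoded so that
    characters such as '@' or ':' do not break the URI.
    """
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(login, safe=""),
        quote(password, safe=""),
        server,
        int(port),
        quote(dbname, safe=""),
    )


# Hypothetical values; 5432 is the default PostgreSQL port.
uri = postgres_uri("localhost", 5432, "synte", "p@ss", "mysynteny")
```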

Tool bar
All the actions that can be performed with SynteView are accessible via the toolbar on the left-hand side of the graphical user interface. Various buttons display the following panels: Choosing the reference species and the set of compared species

The central panel
It displays the strict conservation of gene order by comparing a reference species chosen by the user (first line) with all other available species, which are automatically sorted by taxonomy. Each gene in the genome of the reference species is depicted as a rectangle with a colour code: blue (positive strand) or yellow (negative strand). Grey rectangles are genes without orthologs belonging to a synteny block in another species.
Conserved neighbouring genes in the compared species are shown on the following lines using the same colour code. Please remember that blocks that appear adjacent in a compared genome are not necessarily physical neighbours in the actual genome.
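The colour code described above amounts to a small decision rule, sketched here for clarity. The function name and the '+'/'-' strand encoding are assumptions made for the example; SynteView's own representation may differ.

```python
def gene_colour(strand, in_synteny_block):
    """Return the display colour of a gene rectangle in the central panel.

    strand: '+' for the positive strand, '-' for the negative strand
            (this encoding is an assumption of the sketch).
    in_synteny_block: True if the gene has an ortholog belonging to a
            synteny block in another species.
    """
    if not in_synteny_block:
        return "grey"
    if strand == "+":
        return "blue"
    if strand == "-":
        return "yellow"
    raise ValueError("strand must be '+' or '-'")
```

For example, gene_colour("-", True) gives "yellow", whereas any gene outside a conserved block is grey regardless of its strand.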

Choosing the reference species and the set of compared species
The figure below shows the species selection panel. It contains two tabs: the first is for choosing the reference species, and the second is for choosing the set of compared species. Both processes are guided by the species taxonomy (according to the NCBI).

Selecting the reference species
When browsing the taxonomy, please select a particular taxon (for instance, Rhizobiales). Once a taxon is selected on the left-hand side of the panel, all the species it contains are displayed on the right-hand side. Clicking on a species selects it as the reference species.
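The taxon-driven selection can be sketched as a filter over the stored taxonomy strings: a species belongs to a taxon if the taxon name appears among its ";"-separated taxonomy nodes. The function name and the three example genomes are illustrative assumptions, not SynteBase content.

```python
def species_in_taxon(genomes, taxon):
    """Return the sorted names of all species whose taxonomy contains taxon.

    genomes maps a species name to its ';'-separated taxonomy string.
    """
    selected = []
    for name, taxonomy in genomes.items():
        nodes = [node.strip() for node in taxonomy.split(";")]
        if taxon in nodes:
            selected.append(name)
    return sorted(selected)


# Illustrative data only.
genomes = {
    "Sinorhizobium meliloti 1021": "Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales",
    "Agrobacterium tumefaciens C58": "Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales",
    "Escherichia coli K-12": "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales",
}
rhizobiales = species_in_taxon(genomes, "Rhizobiales")
```

Matching on whole node names rather than on substrings avoids selecting, for instance, Alphaproteobacteria species when the user picks a taxon whose name merely occurs inside another node name.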

Selecting the compared species
When browsing the species tree and selecting a particular taxon as described above, its species are displayed and can be dragged and dropped onto the right panel. This can be repeated for other taxa. For instance, in the screenshot below, several cyanobacteria have been selected and it is now possible to add various archaeal species.