mGSV was implemented with free, open-source software in a Linux environment. PHP (http://www.php.net) and MySQL (http://www.mysql.com) were used to implement the web interface and the backend database, respectively. JavaScript, jQuery (http://jquery.com) and Raphael (http://raphaeljs.com/) were embedded extensively in the PHP code to provide the interactive browsing features.
Input data
mGSV requires a genome synteny data file and optionally accepts a genome annotation data file. Both are tab-delimited text files as described in the original GSV package [11]. The synteny data file specifies the genomic location of each conserved region in each pair of genomic sequences. A noteworthy characteristic of the synteny data file is its open-ended format: users can add columns with further information, such as alignment score or percentage similarity or identity, to characterize each conserved region. This additional information can then be used to select regions of interest for visualization in the synteny browser described below. The optional annotation file lists accompanying genomic features (e.g., genes) to be displayed as annotation tracks along with the reference genomes. In the annotation file, users can define how each feature is displayed, i.e., the shape and color of each annotation track, and these display settings can also be changed dynamically in the synteny browser. In addition, if a feature name is annotated with an HTML-style hyperlink, clicking on that feature in the synteny display opens the URL in a new tab, which is useful for linking to external information about a particular genomic feature. The mGSV/GSV data formats were developed because none of the existing formats could easily achieve the above goals. Additional details about the input file format are available at http://cas-bioinfo.cas.unt.edu/mgsv/tutorial.php. To speed up uploads, the input files can be submitted either as plain text or in compressed format (i.e., .zip or .gz). Besides direct file upload, the input files can also be retrieved from a user-specified URL.
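As an illustration of the open-ended synteny format, the Python sketch below reads a tab-delimited synteny file (optionally gzip-compressed) and keeps any extra columns alongside the coordinates. The column names in `CORE_COLUMNS` and the presence of a header row are assumptions made for this example only; the actual GSV/mGSV layout is defined in the tutorial linked above. The `.zip` case is not handled here.

```python
import csv
import gzip

# Hypothetical core columns, assumed for illustration only; the real
# GSV/mGSV column layout is described in the mGSV tutorial.
CORE_COLUMNS = ["org1", "org1_start", "org1_end", "org2", "org2_start", "org2_end"]
COORD_COLUMNS = ["org1_start", "org1_end", "org2_start", "org2_end"]


def open_text(path):
    """Open a plain-text or gzip-compressed (.gz) file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def parse_synteny(lines):
    """Parse tab-delimited synteny rows; any extra columns are kept as strings.

    Returns (regions, extra_columns).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    missing = [c for c in CORE_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    regions = []
    for row in reader:
        for key in COORD_COLUMNS:
            row[key] = int(row[key])  # coordinates become integers
        regions.append(row)
    # Open-ended part of the format: everything beyond the core columns.
    extra = [c for c in reader.fieldnames if c not in CORE_COLUMNS]
    return regions, extra
```

A typical call would be `with open_text("synteny.txt.gz") as fh: regions, extra = parse_synteny(fh)`, after which `extra` lists user-defined columns such as a score that can drive filtering.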
Backend database
After users submit their data through mGSV, the data are stored in a MySQL relational database as described in Revanna et al. [11]. For each submitted dataset, a separate set of synteny and annotation tables is created, so that different datasets are stored independently without interfering with each other. This design allows the backend database structure to easily support the “open-ended” format of the synteny file. For example, if a synteny data file is uploaded with additional columns such as “score”, “evalue” or any other columns, the dynamically generated synteny table will contain these additional columns as fields that exactly match the columns in the submitted data file. Additional details about the mGSV database architecture can be found in the ARCHITECTURE file in the mGSV downloadable package.
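The sketch below, a hypothetical illustration rather than mGSV's actual PHP code, shows how a per-dataset table can mirror whatever columns were uploaded. The table naming scheme `synteny_<id>`, the column types, and the use of `TEXT` for user-defined columns are all assumptions. Names are validated because they come from user input, and SQL identifiers cannot be passed as bound parameters.

```python
import re

# Hypothetical types for the core columns; the real schema is documented
# in the ARCHITECTURE file of the mGSV package.
CORE_TYPES = {
    "org1": "VARCHAR(255)", "org1_start": "INT", "org1_end": "INT",
    "org2": "VARCHAR(255)", "org2_start": "INT", "org2_end": "INT",
}
SAFE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def synteny_table_ddl(dataset_id, columns):
    """Build a MySQL CREATE TABLE statement whose fields mirror the uploaded columns."""
    table = f"synteny_{dataset_id}"  # one table per dataset (hypothetical naming)
    if not SAFE_NAME.match(table):
        raise ValueError(f"unsafe table name: {table}")
    fields = []
    for col in columns:
        if not SAFE_NAME.match(col):
            raise ValueError(f"unsafe column name: {col}")
        # Extra, user-defined columns (e.g. score, evalue) default to TEXT here.
        fields.append(f"`{col}` {CORE_TYPES.get(col, 'TEXT')}")
    return f"CREATE TABLE `{table}` (\n  " + ",\n  ".join(fields) + "\n)"
```

Because each dataset gets its own table, adding a column such as `evalue` to one upload changes only that dataset's schema.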
Synteny browser
After the data are uploaded, users are first presented with a summary page (Figure 1), in which all the input genomes are arranged in a circle, with the conserved regions between them shown as an overview (similar to how Circos, a standalone visualization tool, can be used to visualize multiple genome sequence comparisons; http://www.circos.ca). An “Associations Provided” chart on the summary page lists all pairs of genomes specified in the uploaded input data and the number of conserved regions for each pair. Below this chart, buttons allow the user to select either the pairwise or the multiple viewing mode.
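The per-pair counts in the “Associations Provided” chart amount to a simple tally over the synteny rows. The sketch below assumes hypothetical `org1`/`org2` keys on each region and treats A-B and B-A as the same pair, which may not match how mGSV itself reports them.

```python
from collections import Counter


def association_counts(regions):
    """Count conserved regions per unordered genome pair (assumption: A-B == B-A)."""
    counts = Counter()
    for r in regions:
        pair = tuple(sorted((r["org1"], r["org2"])))
        counts[pair] += 1
    return counts
```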
The pairwise viewing mode displays the conserved genomic regions between adjacent genomes (Figure 2). At the top of the synteny browser, multiple pull-down menus allow users to select specific genomes to display in the order of their choice. Pull-down menus can be added and removed, so that each genome can be displayed more than once if necessary. For example, to compare genome A with three other genomes B, C and D, the display can be ordered as ‘B-A-C-A-D’. Note that there is no designated “reference” genome, so users have full control over the order of the displayed genomes. Buttons at the top left corner act on all displayed genomes at once, zooming in/out, moving left/right or showing each genome in its entirety. The mGSV browser is divided into two main display windows: control panels (for zoom and filtering functions) on the left and synteny displays on the right. In the synteny display window, each selected genome is represented as a horizontal ruler with tick marks showing genomic positions. The conserved regions between adjacent pairs of selected genomes are displayed as colored translucent blocks. When users click on a conserved region, a pop-up menu appears showing its start and end positions. Using the control panels on the left, users can zoom in/out, move left/right or select specific regions on individual genomes for display. Users can also filter the conserved regions by the associated characteristics listed in the synteny file, such as the length of the conserved regions, similarity score, and so on. Selecting and filtering allow users to focus on regions of interest that meet certain criteria; for example, by applying a stringent similarity cutoff, users can display only highly conserved regions.
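To make the adjacency rule concrete, this sketch computes which conserved regions would be drawn between each pair of neighboring tracks for a user-chosen order such as B-A-C-A-D, after applying minimum-value filters. Region dictionaries with `org1`/`org2` keys and numeric filter columns such as `score` are hypothetical, and this is not mGSV's JavaScript implementation.

```python
def adjacent_tracks(order, regions, filters=None):
    """For each adjacent pair in `order`, return the regions drawn between them.

    `filters` maps a column name to a minimum value, e.g. {"score": 90}.
    """
    filters = filters or {}
    tracks = []
    for top, bottom in zip(order, order[1:]):
        selected = []
        for r in regions:
            # Only regions linking exactly these two genomes belong to this track.
            if {r["org1"], r["org2"]} != {top, bottom}:
                continue
            # Keep the region only if it passes every cutoff.
            if all(float(r[col]) >= cutoff for col, cutoff in filters.items()):
                selected.append(r)
        tracks.append(((top, bottom), selected))
    return tracks
```

For the order B-A-C-A-D, `adjacent_tracks` yields four tracks (B-A, A-C, C-A, A-D), so the A-C regions are drawn twice, once on each side of genome C.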
By default, each conserved region is colored differently, but users can give all the displayed regions in a synteny track a uniform color via the Colors option. If an annotation file is also provided, a selected annotation track (e.g., genes) is displayed inside each selected genome. If multiple types of annotations are provided in the annotation file (e.g., both genes and expression profiles), the current implementation displays only one track per genome at a time to avoid overcrowding. However, users can easily switch among the tracks or change the colors and shapes of the selected tracks on the fly.
The multiple viewing mode (Figure 3) differs from the pairwise viewing mode in several ways. Most importantly, conserved genomic regions are shown by default among all displayed genomes, rather than only between adjacent tracks. The display of conserved regions between any pair of genomes can be switched on and off by clicking on the highlighted synteny pairs above the synteny view. Any genome can be added to or removed from the display, provided that each genome is shown only once. Because the conserved-region blocks overlap in this view, genome annotations are not shown, and all conserved regions in a synteny track share the same color so that overlapping regions can be discerned. All conserved regions can be filtered using a single filter panel above the synteny view.
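In the multiple viewing mode, every pair of on-screen genomes is a candidate synteny track. The hypothetical helper below enumerates those pairs, enforces the one-appearance-per-genome rule, and drops pairs the user has switched off; it only illustrates the logic described above.

```python
from itertools import combinations


def visible_pairs(order, hidden=frozenset()):
    """All genome pairs shown in multiple mode, minus pairs toggled off.

    `hidden` is a set of frozensets, e.g. {frozenset(("A", "C"))}.
    """
    if len(set(order)) != len(order):
        raise ValueError("each genome may appear only once in multiple mode")
    pairs = []
    for a, b in combinations(order, 2):
        if frozenset((a, b)) not in hidden:
            pairs.append((a, b))
    return pairs
```

With n genomes on screen there are n(n-1)/2 possible tracks, which is why this mode is most readable for a small number of genomes.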
As mentioned above, mGSV provides two viewing modes because each has its own advantages for specific user needs. For example, when many conserved regions are shown, the pairwise viewing mode can appear less crowded. It also allows genome annotations to be shown in each genome block, and it gives users more control over filtering, since each synteny track can be filtered independently. On the other hand, the multiple viewing mode can show synteny for more genome pairs in the same viewing window, since all pairwise combinations of the on-screen genomes can be visible. This compactness is especially useful when only a few genomes are compared, because all of their conserved regions can then be visualized on the same screen. Because the conserved regions for any pair of genomes can be selectively turned off, even complex patterns can be explored in this mode.
Improving the genome display order
By default, mGSV displays the genomes in the order in which they are specified in the user-supplied synteny file. However, this order is not always optimal; the display of the conserved regions may be improved if different genomes are placed next to each other. Although users can adjust the order manually, we have developed greedy heuristic algorithms for both the pairwise and multiple viewing modes. The algorithmic details are described in the additional file [see Additional file 1]. Although the algorithms are not guaranteed to find the optimal order, they can improve visual clarity by rearranging adjacent genomes based on the total size of the conserved genomic regions between each pair of genomes. Because the algorithms involve graph-theoretic and sorting techniques, they can be time-consuming for datasets with many genomes (e.g., hundreds of genomes, which would be unsuitable for visual inspection anyway). If the optimization can be completed quickly (i.e., before the PHP server times out), an “Optimize order” button appears in the synteny browser as an option for users (Figures 2 and 3).
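The authors' heuristics are given in Additional file 1 and are not reproduced here. As a generic illustration of the idea, the sketch below uses a simple greedy chaining strategy (a swapped-in technique, not necessarily the one mGSV uses): the weight of a genome pair is the total conserved length between them, the heaviest pair seeds the order, and the remaining genome with the strongest link to either end of the chain is attached next. Measuring region length on `org1` coordinates only is an assumption.

```python
from collections import defaultdict


def pair_weights(regions):
    """Total conserved length (on org1 coordinates) per unordered genome pair."""
    w = defaultdict(int)
    for r in regions:
        length = abs(r["org1_end"] - r["org1_start"]) + 1
        w[frozenset((r["org1"], r["org2"]))] += length
    return w


def greedy_order(genomes, weights):
    """Chain genomes so that neighbors share as much conserved sequence as possible."""
    genomes = list(genomes)
    if len(genomes) < 2:
        return genomes

    def weight(a, b):
        return weights.get(frozenset((a, b)), 0)

    # Seed the chain with the heaviest pair.
    first, second = max(
        ((a, b) for i, a in enumerate(genomes) for b in genomes[i + 1:]),
        key=lambda p: weight(*p),
    )
    order = [first, second]
    remaining = [g for g in genomes if g not in order]
    while remaining:
        # Pick the genome with the strongest link to either end of the chain.
        best_gain, best_genome, at_front = -1, None, False
        for g in remaining:
            for gain, front in ((weight(g, order[0]), True), (weight(order[-1], g), False)):
                if gain > best_gain:
                    best_gain, best_genome, at_front = gain, g, front
        if at_front:
            order.insert(0, best_genome)
        else:
            order.append(best_genome)
        remaining.remove(best_genome)
    return order
```

Both the seeding step and the extension loop scan all pairs, so the cost grows roughly quadratically with the number of genomes; like any greedy method, it can miss the best order.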
Web service for machine-to-machine communication
Besides allowing users to submit input files manually to the mGSV web server, we have also implemented a Web Service for machine-to-machine communication, so that other programs (e.g., remote bioinformatics databases) can automatically send input data to the mGSV server and obtain the results. The mGSV Web Service is based on the standard SOAP specification (http://www.w3.org/TR/soap/). It is implemented as a standalone Java server application that runs alongside the mGSV web server as a background process. The Web Service listens on port 8081 for requests from remote client programs, which make standard XML-SOAP requests providing either the synteny and annotation data or URLs pointing to the data files. After receiving the data, the Web Service responds with a unique ID, which can be used to access the visualization results on the mGSV web server. The binary and source files for both the server and client programs are distributed with the downloadable version of mGSV, together with detailed documentation on their installation and usage. The documentation also includes the mGSV Web Service protocol specification, provided as a WSDL file (http://www.w3.org/TR/wsdl).
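The exact operation names, message structure and endpoint path of the mGSV Web Service are defined in the WSDL file shipped with the package and are not reproduced here. The sketch below therefore uses a hypothetical namespace (`urn:example:mgsv`), operation (`submitData`), element names and endpoint path purely to show the shape of a SOAP 1.1 request that passes data-file URLs to a service on port 8081; consult the WSDL before adapting it.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace and operation; the real ones are defined in the mGSV WSDL.
SERVICE_NS = "urn:example:mgsv"
OPERATION = "submitData"


def build_envelope(synteny_url, annotation_url=None):
    """Build a SOAP 1.1 request body that passes data-file URLs to the service."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{OPERATION}")
    ET.SubElement(op, f"{{{SERVICE_NS}}}syntenyUrl").text = synteny_url
    if annotation_url:
        ET.SubElement(op, f"{{{SERVICE_NS}}}annotationUrl").text = annotation_url
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def send(envelope, host="localhost", port=8081, path="/"):
    """POST the envelope and return the raw XML reply (path is hypothetical)."""
    req = urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=envelope,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": f'"{OPERATION}"'},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

The reply would carry the unique ID described above; how it is wrapped in the response message is again specified by the WSDL.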
Utility programs
The mGSV package also provides scripts for converting outputs of BLAST [12] and BLASTZ [13], as well as GFF3 (http://gmod.org/wiki/GFF3#GFF3) format files into mGSV input files.
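The conversion scripts themselves ship with mGSV. As a rough illustration of what such a conversion involves, the sketch below maps BLAST tabular output (`-outfmt 6`, the standard 12 columns) to synteny rows, carrying percent identity, e-value and bit score along as extra columns. The output column names are hypothetical placeholders for the real mGSV layout.

```python
import csv
import sys

# Standard BLAST tabular (-outfmt 6) columns.
BLAST_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
# Hypothetical mGSV column layout, assumed for illustration only.
OUT_FIELDS = ["org1", "org1_start", "org1_end", "org2", "org2_start", "org2_end",
              "pident", "evalue", "bitscore"]


def convert_hits(blast_lines):
    """Turn BLAST tabular hits into synteny rows (lists of strings)."""
    rows = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        hit = dict(zip(BLAST_FIELDS, line.rstrip("\n").split("\t")))
        # Minus-strand hits have sstart > send; they are passed through unchanged.
        rows.append([hit["qseqid"], hit["qstart"], hit["qend"],
                     hit["sseqid"], hit["sstart"], hit["send"],
                     hit["pident"], hit["evalue"], hit["bitscore"]])
    return rows


def write_synteny(rows, out):
    """Write a header plus rows as tab-delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(OUT_FIELDS)
    writer.writerows(rows)


if __name__ == "__main__":
    write_synteny(convert_hits(sys.stdin), sys.stdout)
```

Saved as, say, `blast_to_synteny.py` (a hypothetical name), it could be used as `blastn -query a.fa -subject b.fa -outfmt 6 | python blast_to_synteny.py > synteny.txt`.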
Additional features
If the user submits an email address, an email is sent immediately with two URLs: one links to the current mGSV submission, and the other gives access to all results associated with that email address from the last sixty days.