Selection of the technology to generate the graphics
Currently, graphics can be created dynamically according to web standards using two different techniques: Scalable Vector Graphics (SVG) and Canvas. Both are well supported in modern browsers and can be manipulated with standard JavaScript.
SVG is a markup language like HTML; as such, each component of the graphic is represented as an element of the document and is held in memory in the browser as part of the Document Object Model (DOM). Because of this, the whole content of the graphic can be embedded in the same document as its parent web page. Canvas, on the other hand, is a single element in the web document whose content is manipulated via JavaScript; only the final result is stored in memory. As a consequence, Canvas requires less memory than SVG, but lacks the per-element control that SVG offers [18].
We chose to use SVG in the development of PINV because of the atomic control over the elements: proteins (SVGPathElement) and interactions (SVGLineElement), which allows us to bind events and apply styles in the same way as any other HTML element.
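The per-element model can be illustrated with a minimal TypeScript sketch that serialises a small network to SVG markup, with one path element per protein and one line element per interaction. The type names, the `toSvg` and `proteinPath` helpers, the class names and the circular node shape are assumptions made for illustration and are not taken from the PINV code base.

```typescript
// Hypothetical data model; PINV's internal structures may differ.
interface ProteinNode { id: string; x: number; y: number; }
interface InteractionEdge { source: string; target: string; }

// Draw a protein as a circular path of radius r centred on (x, y),
// built from two half-circle arcs.
function proteinPath(p: ProteinNode, r: number): string {
  const d = `M ${p.x - r} ${p.y} a ${r} ${r} 0 1 0 ${2 * r} 0 a ${r} ${r} 0 1 0 ${-2 * r} 0`;
  return `<path id="${p.id}" class="protein" d="${d}"/>`;
}

// Serialise the network: every protein and interaction becomes its own element.
function toSvg(proteins: ProteinNode[], interactions: InteractionEdge[]): string {
  const byId: Record<string, ProteinNode> = {};
  for (const p of proteins) byId[p.id] = p;

  const lines: string[] = [];
  for (const e of interactions) {
    const s = byId[e.source];
    const t = byId[e.target];
    if (!s || !t) continue; // skip interactions whose endpoints are not drawn
    lines.push(`<line class="interaction" x1="${s.x}" y1="${s.y}" x2="${t.x}" y2="${t.y}"/>`);
  }
  const paths = proteins.map((p) => proteinPath(p, 4));
  // Lines are emitted first so that proteins are painted on top of them.
  return `<svg xmlns="http://www.w3.org/2000/svg">${lines.join("")}${paths.join("")}</svg>`;
}
```

Because every protein and interaction is a separate element, a stylesheet rule such as `.interaction { stroke: grey; }` or an event listener registered on a single path applies in the same way as it would for an HTML element.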
There are JavaScript libraries for Canvas that allow manipulation of the elements painted on it; however, the poorer quality of zooming and the lack of styling via CSS motivated us to choose SVG.
The cost of not using Canvas becomes evident in PINV when there is a large number of elements to display (around one thousand interactions, depending on the machine). Holding the information of all components in the DOM is memory intensive because SVG stores data for components that may be hidden, out of focus, or so small that they are not actually visible. In contrast, a Canvas strategy is able to deal with many more elements. For this reason, our road map for future development includes a study of the performance of PINV using Canvas instead of SVG, or a hybrid solution, e.g. using Canvas when there are more than X elements and SVG otherwise.
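The hybrid idea could be sketched as follows in TypeScript. This is only an illustration of the proposed road-map item: the `chooseRenderer` function and the `Renderer` type are hypothetical, and the threshold is left as a parameter because its value (X above) has not been determined.

```typescript
type Renderer = "svg" | "canvas";

// Hypothetical helper: pick a rendering backend from the number of elements.
// `maxSvgElements` stands for the undetermined threshold X; it is supplied by
// the caller because no value has been established.
function chooseRenderer(elementCount: number, maxSvgElements: number): Renderer {
  return elementCount > maxSvgElements ? "canvas" : "svg";
}
```

For example, with a purely illustrative threshold of 1000, `chooseRenderer(500, 1000)` selects SVG and `chooseRenderer(5000, 1000)` selects Canvas.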
Comparison with similar tools
Most interactive online visualization tools only allow the viewing of networks that are preloaded, meaning that users cannot view a network they have generated themselves.
The online version of Graphle [19] is limited to graphs generated by bioPIXIE in yeast, MEFIT in E. coli, or HEFalMp for human data. STRING [15], a database of known and predicted protein-protein interactions, and STITCH [20], its sister project for chemical-protein interactions, integrate interaction information from various sources in addition to in-house predictions. Both projects use the same library to display their data. However, the network visualized in both databases is barely customizable. The public implementation of VisANT [21], a web-based (via Java Applets) workbench for the integrative analysis of biological networks, is based on the Predictome database. PINV, on the other hand, allows users to view preloaded data and also to load their own data.
Another alternative is Cytoscape Web [22]. However, the fact that its introductory tutorial [23] requires users to code in JavaScript indicates that this tool is mainly intended for developers who want to display networks on the web, not for the scientist who has a network to visualize. In this regard, Cytoscape Web was considered as an alternative to D3 as the underlying technology for generating the graphics, as it is closer to the biological concepts tied to this tool. Unfortunately, Cytoscape Web uses Adobe Flash to generate the graphics, which goes against our objective of providing a native web application (i.e. one developed using recent web standards). A more recent development is Cytoscape.js [24]. The principles behind this project are similar to those followed by PINV: a modern web toolset to display interactions. Nonetheless, the two projects differ in that Cytoscape.js is a library for programmers, while PINV is an application. A combined strategy of Cytoscape Web and its well-known stand-alone parent project is discussed further below.
We also compared PINV with the stand-alone version of Cytoscape, using the network from the biological example described in the following section. The original network contains all interactions from the three organisms and the orthologs, with a total of 165,000 interactions. The performance of Cytoscape notably exceeded that of PINV when dealing with a network of this size. Moreover, Cytoscape offers considerably more manipulation options than those currently provided by PINV. However, Cytoscape and its many plugins need to be installed by the user and by any collaborator sharing the results.
Cytoscape Web includes a showcase demo that allows files to be opened in several formats. In order to reproduce the same graphic as in Figure 3, we preprocessed the network in the stand-alone tool to filter it and focus on the target orthologs. This step was necessary because Cytoscape Web fails when loading a network of this size. Subsequently, we successfully uploaded the subnetwork. Filters and styles can be manipulated online, but extending the subnetwork requires loading additional data.
We are aware that PINV will also struggle to display the 165,000 interactions of the network. However, the strategy of only visualizing by request and the use of prefilters allow the user to navigate the network by limiting the graphic to the interactions of interest. In contrast, Cytoscape Web does not provide tools to explore a large dataset and requires the use of the stand-alone application (or other software) in order to filter and create a subnetwork.
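The idea of visualizing by request can be sketched in TypeScript as a filter that keeps only the interactions involving the queried proteins and passing a prefilter. This is a simplified illustration under stated assumptions: the `StoredInteraction` type, the `visibleInteractions` function and the placeholder identifiers are hypothetical, and PINV's actual query mechanism may work differently.

```typescript
// Hypothetical record type; field names are illustrative.
interface StoredInteraction { source: string; target: string; }

// Return only the interactions that involve a queried protein and pass
// the caller-supplied prefilter, so that the full dataset is never drawn.
function visibleInteractions(
  all: StoredInteraction[],
  queried: string[],
  prefilter: (e: StoredInteraction) => boolean
): StoredInteraction[] {
  const wanted: Record<string, boolean> = {};
  for (const id of queried) wanted[id] = true;
  return all.filter(
    (e) => (wanted[e.source] === true || wanted[e.target] === true) && prefilter(e)
  );
}

// Usage with placeholder identifiers (not real proteins).
const network: StoredInteraction[] = [
  { source: "A", target: "B" },
  { source: "B", target: "C" },
  { source: "C", target: "D" },
];
// Query protein A with a prefilter that accepts everything:
// only the A-B interaction is kept for display.
const shown = visibleInteractions(network, ["A"], () => true);
```

The number of elements drawn then depends on the size of the query result rather than on the size of the complete network.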
A biological use case scenario
To illustrate the richness of information that can be conveyed using PINV graphics, we use the biological example of Mycobacterium tuberculosis (MTB), its host Homo sapiens (HS), and the two related mycobacteria species M. leprae (MLP) and M. smegmatis (MSM). In particular, we look at the DNA translocase protein FtsK (UniProt accession O33290), which coordinates cell division and chromosome segregation in MTB, and is considered a high confidence drug target. We view this protein in the context of its interaction network within MTB, its host-pathogen interactions with proteins in HS, and orthologous proteins in MSM and MLP.
Figure 3[25] shows FtsK and its interactors in MTB, along with orthologous proteins (and their interactors) in MSM and MLP. Using custom rules to control the rendering of elements, we are able to highlight various annotations such as species (by node shape and placement), interaction type (by line color), functional class (by node color) and the network property of betweenness (by node size). The higher number of orthologs shared between MTB and MLP is immediately evident in the graphic, and reflects the closer relationship between these two species, which are both slow-growing bacteria in contrast to the fast-growing MSM.
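A display rule of this kind amounts to a mapping from annotations to visual attributes. The following TypeScript sketch shows one possible form of such a mapping for nodes; the type and function names, the specific shapes and colors, the placeholder functional class names, and the assumption that betweenness is normalised to the range [0, 1] are all illustrative and do not describe PINV's rule syntax.

```typescript
// Hypothetical annotation and style types.
interface NodeAnnotation { species: string; functionalClass: string; betweenness: number; }
interface NodeStyle { shape: string; fill: string; radius: number; }

// Illustrative lookup tables; the actual shapes and colors in Figure 3 may differ.
const SHAPE_BY_SPECIES: Record<string, string> = { MTB: "circle", MSM: "square", MLP: "triangle" };
const FILL_BY_CLASS: Record<string, string> = { "class-1": "#1f77b4", "class-2": "#ff7f0e" };

// Map annotations to visual attributes: species to shape, functional class
// to color, and betweenness (assumed to lie in [0, 1]) to node size.
function styleNode(a: NodeAnnotation): NodeStyle {
  const b = Math.min(1, Math.max(0, a.betweenness)); // clamp to [0, 1]
  return {
    shape: SHAPE_BY_SPECIES[a.species] || "circle",
    fill: FILL_BY_CLASS[a.functionalClass] || "#999999",
    radius: 4 + 8 * b,
  };
}
```

An analogous table keyed on interaction type could assign line colors to the interactions.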
Figure 4[26] shows FtsK with its MTB interactions again, this time in the context of host-pathogen interactions with Homo sapiens proteins. A recursive mode query has been used to extract and highlight multiple connections between subnetworks in MTB and HS, specifically FtsZ’s direct interaction with HS’s C-X-C motif chemokine 13, as well as an indirect path to the same HS protein via the MTB protein Q7D8P2 and its interaction with the human tyrosine aminotransferase.
Both of these visualisations were created using simple queries and display rules, and can easily be shared in their fully interactive form via a simple URL. An example of the former graphic, embedded in a third-party web page, can be found here [27].