MetNetGE: interactive views of biological networks and ontologies
© Jia et al; licensee BioMed Central Ltd. 2010
Received: 15 March 2010
Accepted: 17 September 2010
Published: 17 September 2010
Linking high-throughput experimental data with biological networks is a key step for understanding complex biological systems. Currently, visualization tools for large metabolic networks often result in a dense web of connections that is difficult to interpret biologically. The MetNetGE application organizes and visualizes biological networks in a meaningful way to improve performance and biological interpretability.
MetNetGE is an interactive visualization tool based on the Google Earth platform. MetNetGE features novel visualization techniques for pathway and ontology information display. Instead of simply showing hundreds of pathways in a complex graph, MetNetGE gives an overview of the network based on the hierarchical pathway ontology, using a novel layout called the Enhanced Radial Space-Filling (ERSF) approach that allows the network to be summarized compactly. The non-tree edges in the pathway or gene ontology, which represent pathways or genes that belong to multiple categories, are linked using orbital connections in a third dimension. Biologists can easily identify highly activated pathways or gene ontology categories by mapping summary experiment statistics, such as coefficient of variation and overrepresentation values, onto the visualization. After identifying such pathways, biologists can focus on the corresponding region to explore detailed pathway structure and experimental data in an aligned 3D tiered layout. In this paper, the use of MetNetGE is illustrated with pathway diagrams and data from E. coli and Arabidopsis.
MetNetGE is a visualization tool that organizes biological networks according to a hierarchical ontology structure. The ERSF technique assigns attributes in 3D space, such as color, height, and transparency, to any ontological structure. For hierarchical data, the novel ERSF layout enables the user to identify pathways or categories that are differentially regulated in particular experiments. MetNetGE also displays complex biological pathways in an aligned 3D tiered layout for exploration.
Biological Pathway Visualization
The availability of high-throughput experimental data provides new possibilities for understanding biological systems and creates new challenges for visualization tools as well. Data in high-throughput experiments can encompass thousands of RNAs, metabolites and/or polypeptides. Mapping such data onto a network that represents the interactions in an organism is essential for biologists to understand how the parts of the system influence each other and to generate new data-driven hypotheses [1–3]. Popular representations of such networks include node-link graphs and adjacency matrices. In the node-link graph, nodes represent genes, gene products, metabolites and reactions, and edges represent specific interactions, e.g., transcription, translation, catalysis, and different types of regulation.
A number of publicly accessible pathway databases containing data about genes, gene products, and interactions are available, e.g., BioCyc, MetNetDB, and KEGG. In order to get better insight from such vast data sets, many graph visualization tools have been developed, such as Cytoscape and VisANT. Suderman et al. reviewed 35 such visualization tools and noted key useful features such as generation of good layouts and integration with analysis software. They also pointed out several key drawbacks of those tools. For example, the use of generic layout algorithms often produces network drawings that are very messy, with many crossing edges. Furthermore, most tools cannot visually represent dynamic information (e.g., gene expression data) at a large-scale, systems-wide level.
3D approaches to displaying networks may offer ways to show more information [9, 10]; however, most popular bioinformatics tools do not support 3D directly. Stacked 2D layouts have been used to compare similar pathways across several species. This representation is very effective at highlighting small differences between two species; however, it cannot be directly applied to pathway diagrams, since adjoining pathways do not have common structures.
Arena3D puts nodes into different layers to reveal interactions between node types. BioCichlid divides proteins and genes into separate layers in 3D to look at genetic regulation. These works show the promise of using an extra dimension, where the network complexity is reduced by separating the whole graph into several 2D planes. However, since BioCichlid and Arena3D compute separate layouts for each layer, edges between layers are often cluttered and difficult to follow. MetNetGE uses a 3D tiered layout for each individual pathway, in which the layout algorithm aligns the layouts in each plane based on the most important plane. This helps the basic pathway structure stand out and creates a clearer and more easily understandable drawing.
Requirements for Visualizing Biological Ontologies
In addition to being able to visualize a network organized in a meaningful manner, biologists need an overview of broader functional categories and network performance under different experimental conditions, e.g., to be able to ask questions such as whether degradation pathways have many highly expressed genes, or which biological process categories are overrepresented in the data. The remainder of this paper will use graph terminology to describe the ontology and visualization techniques. Thus, the term "tree" means the data structure, a "leaf" node is a node in the tree structure that has no children, and a "non-leaf" node is a node with at least one child; none of these terms refers to plant anatomy.
Controlled vocabularies are graph-theoretical structures consisting of terms (which form the nodes of each corresponding graph) linked together by means of edges called relations. Structured hierarchical ontologies such as the Pathway Ontology (PO) and Gene Ontology (GO) impose biological relationships on the network, which can aid in interpretation. For example, the PO can be used for analysis tasks such as identifying pathways that belong to multiple biological categories. Pathways that belong to multiple categories are ontological terms with multiple inheritance, which is not easily presented by most visualization tools. The GO and PO can also be combined with experimental data to see whether any categories or terms are statistically overrepresented, by mapping the experimental data and its aggregated values onto the ontology of interest. After mapping, biologists need to evaluate which categories are significantly different among the experimental conditions and establish how those categories are related in the hierarchy. Based on the task types that biologists typically perform, the basic requirements for biological ontology visualization are to:
View the whole ontology on a single screen to gain a global feeling for the data and the main hierarchical structure.
View ontology details by navigation and/or interaction (zoom, pan, rotation).
Map attributes on the ontology so that they are easily visible.
Clearly show non-tree connections.
Normally, one single visualization method cannot satisfy all of the user requirements or facilitate all types of tasks. Therefore, MetNetGE links the global view of an ontology using ERSF to more traditional visualizations and data representation methods, including indented lists, parallel coordinate plots, and spreadsheets.
Related Work in Ontology Visualization
Biologists with more computer experience often use specialized Java tools, e.g., MetNetDB's PathTree, to map expression data on different ontologies, and then export the file to well-established visualization platforms, e.g., Cytoscape. In Cytoscape, they can use well-known 2D layouts, e.g., hierarchic and organic, to view data. We outline the procedure for creating visualizations via this method in the use case section as a comparison to our approach.
Treemap-based systems are able to visualize the whole GO with mapped data on one screen, and are suitable for identifying regions of interest. However, the hierarchical structure is hard to see in a treemap since it is a nesting-based layout which overplots the parent nodes with their child nodes. Another limitation of treemaps is the lack of a meaningful representation of non-tree edges, a key requirement. Although Fekete added non-tree edges as an overlay to the treemap, this method created many edge crossings, which made the task of tracing those edges very difficult. Treemaps and other space-filling layouts normally duplicate nodes which have multiple parents. If the duplicated node is a non-leaf node, the whole substructure rooted at this node is duplicated as well. Thus, duplicating nodes in a hierarchic dataset may greatly increase a graph's visual complexity. Duplication also causes confusion for the user. For example, when users find that two regions have similar visual patterns in a treemap, they may think that they have discovered two groups of genes functioning similarly. Unfortunately, these often turn out to be identical GO terms drawn twice.
Besides the visualization methods mentioned above, Katifori et al. have also presented many tools and layout algorithms to visualize ontologies and graphs in general. For example, a hyperbolic tree can handle thousands of nodes. However, in a hyperbolic tree visualization, it is difficult to distinguish between tree and non-tree edges among hundreds of edges since they are all represented as links. Another disadvantage is that hyperbolic trees are not space efficient, and normally only a couple of pixels are used for each node. Therefore, attributes (like gene expression data) mapped on nodes become hard to distinguish and interpret.
Space-filling methods are considered very space-efficient and are good for mapping attributes on node regions. Despite the disadvantages of rectangular space-filling methods (such as treemaps), evaluations find that radial space-filling (RSF) methods are quite effective at preserving hierarchical relations.
Researchers in economics have utilized 3D RSF to study hierarchical time-dependent data. However, like traditional RSF and treemaps, their method also suffers from the problem of duplicating nodes for non-tree edges. We propose the enhanced RSF (ERSF) algorithm, which uses an intuitive orbit metaphor to explicitly visualize the non-tree edges and makes them appear different from the major hierarchic structure.
Contributions of MetNetGE
In order to show the overall structure of complete ontologies, MetNetGE provides a space-filling view of the biological network and maps the experimental data onto this view using the Google Earth software and API. MetNetGE is also designed to aid biologists in better understanding complex individual pathways, using a 3D tiered layout, where different entity types and interactions are located on different tiers. MetNetGE allows exploration of new patterns in the data. The contributions of the MetNetGE system to biological data visualization are:
A 3D tiered layout that shows the main pathway structure and cross layer patterns.
A novel representation and interaction based on our enhanced radial space filling (ERSF) technique with orbits for visualizing cross-links in an ontology dataset.
Methods that link the summary statistics of experimental high-throughput data to the ontology visualization.
MetNetGE is implemented in Python. The pathway and ontology drawings were created as Keyhole Markup Language (KML) files, and were loaded into Google Earth through its COM API. The graphical user interface (GUI) is written with PyQt.
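As a rough illustration of how one region of a drawing could be expressed in KML, the sketch below writes a single ring sector as an extruded polygon. It is not the MetNetGE code: the function names `sector_to_kml` and `kml_document`, the direct mapping of layout coordinates to degrees of longitude and latitude around (0, 0), and the arc sampling are all assumptions made for illustration. Only standard KML elements (Placemark, Polygon, extrude, altitudeMode, LinearRing, coordinates) are used.

```python
import math
from xml.sax.saxutils import escape


def sector_to_kml(name, r_inner, r_outer, start_deg, end_deg, height_m, steps=16):
    """Return a KML Placemark for one ring sector, extruded to height_m.

    Assumption: layout x/y are used directly as longitude/latitude in
    degrees around the point (0, 0); a real tool may map them differently.
    """
    def arc(radius, a0, a1):
        pts = []
        for i in range(steps + 1):
            a = math.radians(a0 + (a1 - a0) * i / steps)
            pts.append((radius * math.cos(a), radius * math.sin(a)))
        return pts

    # Outer arc forward, inner arc backward, then close the ring.
    ring = arc(r_outer, start_deg, end_deg) + arc(r_inner, end_deg, start_deg)
    ring.append(ring[0])
    # KML coordinate tuples are longitude,latitude,altitude.
    coords = " ".join(f"{x:.6f},{y:.6f},{height_m:.1f}" for x, y in ring)
    return (
        f"<Placemark><name>{escape(name)}</name>"
        "<Polygon><extrude>1</extrude>"
        "<altitudeMode>relativeToGround</altitudeMode>"
        f"<outerBoundaryIs><LinearRing><coordinates>{coords}</coordinates>"
        "</LinearRing></outerBoundaryIs></Polygon></Placemark>"
    )


def kml_document(placemarks):
    """Wrap Placemark strings in a minimal KML document."""
    body = "".join(placemarks)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<kml xmlns="http://www.opengis.net/kml/2.2">'
            f"<Document>{body}</Document></kml>")
```

The resulting string can be saved with a .kml extension and opened in Google Earth; how MetNetGE itself passes files through the COM API is not shown here.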
The major research focus of MetNetGE is on summarizing large biological networks on a single display. There are many well-known visualization platforms that are tailored to show biological pathways, e.g., Cytoscape and VisANT; however, none of them can handle the large number of 3D geometries used in the ERSF technique. Google Earth was designed to smoothly handle large 3D geometric datasets. The GE API provides methods for controlling the level of detail, zooming, selection, etc. Moreover, GE is a well-established and widely used platform with a well-known user interface; thus, users may not need intensive training for successful use of MetNetGE.
However, GE also has many limitations when used as an information visualization tool. For example, GE's COM API, which we currently use, does not support dynamic modification and removal of content. As a result, our program cannot support many interactive actions, such as dragging and rearranging the network, at this stage.
3D Aligned Tiered Layout
MetNetGE features a 3D layout that is both aligned and tiered. Separating the nodes into different layers according to their types (e.g., metabolite, polypeptide, RNA, or DNA) provides a visually-clearer structure on which to superimpose the data.
In the MetNetGE tiered layout, the node placement is based on the results from a user-selected major plane, rather than computing each planar layout independently. The layout of nodes occurs on the major plane first, and then other nodes are set based on their relation to the major plane nodes. For example, for the metabolic pathways, the metabolite layer serves as a natural choice of major plane. In Figure 1, a spring-embedded layout has been selected to first place nodes in the metabolite layer; the algorithm then identifies the nodes in the protein plane that connect to these metabolites so that they line up with the major plane. Next, MetNetGE places nodes that connect to these proteins. Finally, the remaining RNAs and DNAs are placed under the respective polypeptides.
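The placement order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: the function name `aligned_tiered_layout` is hypothetical, the major-plane positions are assumed to come from any 2D layout (for example a spring embedder), and each node in a lower tier is simply placed under the centroid of its already-placed neighbors.

```python
from collections import defaultdict


def aligned_tiered_layout(major_xy, tiers, edges, tier_gap=1.0):
    """Place nodes tier by tier, aligning each tier to the tiers above it.

    major_xy : {node: (x, y)} positions already computed on the major plane
    tiers    : list of node lists; tiers[0] is the major (top) plane
    edges    : iterable of (u, v) pairs, treated as undirected here
    Returns {node: (x, y, z)} with the major plane at the greatest height.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    top_z = (len(tiers) - 1) * tier_gap
    pos = {n: (x, y, top_z) for n, (x, y) in major_xy.items()}

    for level, nodes in enumerate(tiers[1:], start=1):
        z = top_z - level * tier_gap
        placed_here = {}
        for n in nodes:
            # Only nodes from earlier tiers are in pos at this point.
            anchors = [pos[m] for m in neighbors[n] if m in pos]
            if anchors:
                x = sum(p[0] for p in anchors) / len(anchors)
                y = sum(p[1] for p in anchors) / len(anchors)
            else:
                x, y = 0.0, 0.0  # fallback for unconnected nodes (assumption)
            placed_here[n] = (x, y, z)
        pos.update(placed_here)
    return pos
```

With metabolites in `tiers[0]`, proteins in `tiers[1]`, and RNAs or DNAs in later tiers, each protein lines up with the metabolites it connects to, and each RNA or DNA sits under its polypeptide.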
Visualizing Tree Structure of Ontologies using Radial Space-Filling Methods
To summarize metabolic networks in a meaningful global view, MetNetGE employs the BioCyc pathway ontology, which hierarchically organizes the pathways as a directed acyclic graph, where many parents may point to the same child. For simplicity, we first assume the ontology is a pure tree structure which does not have any non-tree edges, and explain how the traditional radial space-filling (RSF) technique can visualize this simplified biological ontology. The next section shows how the Enhanced RSF algorithm can visualize a biological ontology which contains non-tree edges.
Tree visualization is a widely studied topic. Among all the existing tree visualization techniques, we implement RSF because it effectively utilizes the screen space and clearly shows the hierarchical relationships between concepts. In addition, in RSF each non-leaf node has its own region, which provides the ability to map cumulative values onto those regions.
To the best of our knowledge, MetNetGE is the first application of 3D RSF in biology and the first algorithm to visualize non-tree edges on an RSF plot.
RSF visualization of a pure tree uses the following rules:
Each circular region represents one node in the tree. The leaf nodes are placed on the edge of the drawing and the root node is placed at the center. Nodes with the same depth form one layer in the visualization, i.e., the root node forms layer 0, the nodes with depth 1 form layer 1, and so on.
Each circular region has five variables: sweeping angle, depth, radius length, height, and color.
The sweeping angle of a leaf node is determined by an attribute of the corresponding pathway. For the pathway ontology, we have set each pathway to an equal weight, so that every pathway spans the same angle. For visualizing the gene ontology, the angle could also be linked to other factors, such as the number of differentially expressed genes within a category.
The sweeping angle of a non-leaf node is the sum of all its children's sweeping angles.
In the initial network view, we use structure-based coloring, where the leaf node regions are colored according to the color wheel and each non-leaf node region is colored as the weighted average of its children's colors to convey the hierarchical relationships. The height of each region is set proportional to the height of the subtree rooted at that node. Since color and height only apply to the individual region of each pathway or category, we later use them to map experiment values.
Figure 2a shows a small tree with eight leaf nodes and five non-leaf nodes, labeled as graph G1. Figure 2b shows the result of using RSF in 3D on graph G1. In the PO, non-leaf nodes correspond to pathway categories, e.g., "A" may represent the category of acid resistance, and the leaf nodes represent the pathways that relate to this function, e.g., "A2" may represent the arginine dependent acid resistance pathway. In this example, we use a uniform radius length, structure-based coloring, and map the height of the subtree to the region's height.
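The angle and layer rules for a pure tree can be sketched in a few lines. The function name `rsf_layout` and the dictionary-based tree representation are assumptions for illustration; the sketch covers only sweeping angles and layers, not color or height, and the small tree in the usage note is hypothetical rather than graph G1.

```python
def rsf_layout(children, root, leaf_weight=None, total_angle=360.0):
    """Compute (start_angle, sweep_angle, layer) for every node of a tree.

    children    : {node: [child, ...]}; leaves may be absent or map to []
    leaf_weight : optional {leaf: weight}; leaves default to equal weight
    """
    leaf_weight = leaf_weight or {}
    weights = {}

    def weight(node):
        # A non-leaf node's weight is the sum of its children's weights.
        if node not in weights:
            kids = children.get(node, [])
            if kids:
                weights[node] = sum(weight(k) for k in kids)
            else:
                weights[node] = float(leaf_weight.get(node, 1.0))
        return weights[node]

    scale = total_angle / weight(root)
    layout = {}

    def place(node, start, depth):
        layout[node] = (start, weight(node) * scale, depth)
        for kid in children.get(node, []):
            place(kid, start, depth + 1)
            start += layout[kid][1]  # next sibling starts where this one ends

    place(root, 0.0, 0)
    return layout
```

For a tree such as `{"root": ["A", "B"], "A": ["A1", "A2", "A3"], "B": ["B1"]}` with equal leaf weights, each leaf spans 90 degrees, category "A" spans 270 degrees on layer 1, and "B" spans the remaining 90 degrees.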
Visualize Ontology as Directed Acyclic Graph
As noted earlier, RSF cannot support non-tree edges which are very common in the pathway and gene ontologies. As a result, we developed the Enhanced RSF (ERSF) layout which uses orbits to represent non-tree edges.
This concept is illustrated in Figure 3. Graph G2 (Figure 3a) adds four non-tree edges to G1. The metaphor of "satellite orbits" represents such non-tree edges as circular links. For each child tree node with at least two parents, one orbit circle is drawn on the layer of that node (Figure 3b). The parent that connects the node in the spanning tree is the major parent and other parents are minor parents. The region of each node is placed under the region of its major parent. For every minor parent, a green edge from the center of its region to the orbit of the child is called the 'downlink'. The intersections between links and orbits are called access points which are represented by red dots.
To help users find and trace interesting non-tree edges, the orbits need to be distinguishable from one another. The orbits are first restricted to span in the middle area of each layer, thus leaving a visually apparent gap between orbits in adjacent layers. To distinguish orbits in the same layer, our algorithm puts them at different heights and distances from the center. The orbit with most downlinks is placed as the most distant and highest. This arrangement can help users answer questions like 'Does the aldehyde degradation pathway belong to many categories?'.
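The bookkeeping behind the orbit metaphor can be sketched as below. The source does not state how the spanning tree (and hence the major parent) is chosen, so the sketch assumes the first listed parent is the major one; the function name `plan_orbits` and the data layout are likewise hypothetical. Within each layer, orbits are sorted so that the orbit with the most downlinks comes last, which a renderer could then draw as the most distant and highest.

```python
def plan_orbits(parents, layer):
    """Derive orbits for a DAG whose nodes may have several parents.

    parents : {child: [parent, ...]}; the first parent is taken as the
              major (spanning-tree) parent, which is an assumption here
    layer   : {node: layer index in the radial layout}
    Returns {layer index: [(child, minor_parents), ...]}, with the orbits
    of each layer sorted by increasing number of downlinks.
    """
    by_layer = {}
    for child, plist in parents.items():
        if len(plist) < 2:
            continue  # a node with a single parent needs no orbit
        minors = plist[1:]  # one downlink per minor parent
        by_layer.setdefault(layer[child], []).append((child, minors))
    for orbits in by_layer.values():
        # Ties are broken by node name to keep the order deterministic.
        orbits.sort(key=lambda o: (len(o[1]), o[0]))
    return by_layer
```

The position of an orbit in the sorted list can serve as its rank when assigning height and distance from the center within the middle area of its layer.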
Coloring strategies also help the user visually distinguish orbits that are located on the same layer. When the orbit color is the same as the child's region, and non-leaf categories' regions are transparent, it becomes easier to see links between regions.
Figure 3 shows that visualizing the ontology using the orbit metaphor has several advantages. First, this design clearly distinguishes between spanning tree relationships and non-tree edges. Second, compared to treemaps with a crosslink overlay, there are far fewer edge crossings. Third, all downlinks of a parent share only one link edge. Thus, the total length of those edges is the same as the length of the longest link. This property reduces the graph complexity, especially when one parent is the minor parent for many other child nodes or a child node belongs to many parent nodes.
Mapping Experimental Values onto an Ontology
The strategy of a biological scientist evaluating experimental data is to look for which parts of the network show significantly different measurements across different conditions. Questions such as 'Which pathways or categories are most changed under anaerobic stress?' can be addressed by mapping the values onto the Pathway or Gene Ontologies.
MetNetGE uses animation to show the values of a series of experiments. For instance, a time-series experiment with 7 time points is presented as an animation of 7 frames. Users can use either the time controller from Google Earth or the animation control panel in MetNetGE to control the animation.
Differentially expressed genes can also be mapped directly on the ontology drawing. The user first defines a threshold based on the experimental data, e.g., a fold change of 0.7 is the default suggested by collaborating biologists. Every gene whose value changes by more than the threshold is considered differentially expressed. A differentially expressed gene is down-regulated if the treatment value is lower; otherwise it is up-regulated.
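The classification rule can be sketched as follows. The source does not say whether the change is a difference or a ratio, so this sketch assumes expression values are on a log scale and uses the difference between treatment and control; the function name `classify_genes` is hypothetical.

```python
def classify_genes(control, treatment, threshold=0.7):
    """Label each gene 'up', 'down', or None (not differentially expressed).

    control, treatment : {gene: expression value}
    Assumption: values are on a log scale, so the change is a difference.
    """
    labels = {}
    for gene, c in control.items():
        change = treatment[gene] - c
        if abs(change) <= threshold:
            labels[gene] = None      # change does not exceed the threshold
        elif change < 0:
            labels[gene] = "down"    # treatment value is lower
        else:
            labels[gene] = "up"
    return labels
```

If raw intensities were used instead, the difference would be replaced by a ratio and the threshold chosen accordingly.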
Another important value for biologists is the statistical significance of the observed differences in the experimental data. Simple analyses of experimental data, such as p-values for over-representation of PO terms, can be calculated using Fisher's exact test. One typical working scenario is: identify a list of genes of interest that are differentially expressed between conditions, or are of particular interest; calculate p-values of over-representation of pathways and categories using a Fisher's exact test; visualize these p-values on the ontology drawing.
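The over-representation p-value in this scenario can be computed with the standard library alone. The sketch below uses the one-sided form of Fisher's exact test, which equals the upper tail of the hypergeometric distribution; the function and parameter names are hypothetical, and whether MetNetGE uses a one-sided or two-sided test is not stated in the text.

```python
from math import comb


def overrepresentation_p(total_genes, genes_in_term, selected_genes, selected_in_term):
    """One-sided Fisher's exact test p-value for over-representation.

    This is the hypergeometric probability of seeing at least
    `selected_in_term` members of the term when `selected_genes` genes
    are drawn without replacement from `total_genes`.
    """
    upper = min(genes_in_term, selected_genes)
    denom = comb(total_genes, selected_genes)
    tail = sum(
        comb(genes_in_term, k) * comb(total_genes - genes_in_term, selected_genes - k)
        for k in range(selected_in_term, upper + 1)
    )
    return tail / denom
```

For example, with 10 genes in total, 4 in a pathway, and 3 genes of interest of which 2 fall in the pathway, the tail is (6·6 + 4·1) / 120 = 1/3. A small p-value for a pathway or category suggests that it contains more genes of interest than expected by chance.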
This section demonstrates the use of MetNetGE through illustrations of how the visualizations work, along with the results of a pilot user study for the ERSF visualization of the pathway ontology. A use case comparing MetNetGE with the standard methods for analyzing microarray expression data using an ontology shows the effectiveness of the method for providing a global understanding.
3D Tiered Layout
In this section, we present the 3D Tiered layout for Arabidopsis pathways. Two examples illustrate how MetNetGE enhances insight in both signaling and metabolic pathways. Figure 1 shows the ethylene biosynthesis and methionine cycle in Arabidopsis in both Cytoscape and MetNetGE. The 3D tiered layout in MetNetGE shows a clear circular structure in the top metabolite layer which illustrates the flow of mass through the metabolic cycle. The three blue edges from the protein layer indicate that three protein complexes catalyze the metabolic reactions. This structure separates out the signaling control from the metabolic flow very effectively. In Cytoscape, the same pathway is more difficult to interpret as it combines metabolic flow and regulation in a complex network.
Figure 4 shows the Arabidopsis ethylene signaling pathway. The MetNetGE 3D tiered layout shows how one metabolite (ethylene) and one protein (erf1) have many regulation links to other layers. This linkage is not obvious in the Cytoscape view.
Use Case Comparison of ERSF and Standard Methods
This section compares how biologists can work with the E. coli Pathway Ontology using an experiment on BaeSR from the E. coli gene expression database, M3D. This case study compares a traditional workflow with the use of the ERSF in MetNetGE to highlight differences between the approaches.
Exploring Data with Traditional Methods
The standard working method for each of our test biologists is to use three tools together: Microsoft™ Excel, PathTree, and Cytoscape. The PO is recorded in PathTree, and is represented as an expandable indented list similar to that shown in Additional file 1, Figure 2. The gene expression data is recorded in Excel files, with one row per gene and one column per condition. The summarized data for the PO is also stored in an Excel file, where each row represents one pathway/category and each column shows one value, e.g. average gene expression or p-value.
To view the ontology structure, users need to export the PO to an XML file that is loaded into Cytoscape. Users can then use Cytoscape's layouts to arrange the nodes automatically (normally using the hierarchic or organic layout). However, due to the limitations of these node-link based layouts, the visual result is not comprehensible for the whole ontology (see Additional file 1, Figure 3). In addition, the indented list is not linked with Cytoscape, which makes searching for a category or pathway difficult. However, this difficulty could be overcome in the future by writing PathTree as a plugin for Cytoscape.
Although the node-link based layout can show the non-tree edges in the PO, those edges are buried among cluttered edges and are difficult to detect. In order to trace the related ontology terms of a given node, users need to make the surrounding edges non-overlapping by manually moving the nodes around in the local region. Due to the difficulty of those operations, users normally export only a small subset of the whole PO, e.g. only pathways from a specific category such as biosynthesis. The normal size of the output network is between 10 and 50 nodes.
Exploring Data with ERSF using MetNetGE
Using the same ontology data from E. coli, MetNetGE presents the user with the ERSF view shown in Figure 5. It is clear that most orbits are concentrated on the third layer, and that one category (methylglyoxal detoxification) contains many children in other categories because its green uplink intersects many light blue orbits.
Users can tilt the view to see the height of each region (Figure 7e). In this view, one category (unusual fatty acid biosynthesis) stands out, because it and its descendants have very high CoV and expression values. This discovery demonstrates the benefit of using 3D to show these two attributes together. A similarly interesting discovery is the pathway alanine biosynthesis III, which also has very high CoV but very low expression values.
By switching between these two conditions, we notice that most of the pathways and categories have a greenish color under the treatment, which indicates lower expression values in the treatment condition than in the control condition. This is an interesting trend, since in most experiments the treatments normally have greater values. To confirm this trend, we can map the difference between these two conditions directly on the ontology.
To reflect the up/down regulation property of an entire category or pathway, the ratio of the number of up-regulated genes to that of down-regulated genes is calculated and mapped onto the region's color. As a result, the reddish regions are mainly up-regulated while the greenish regions are mainly down-regulated. Other non-interesting regions (ones with few genes differentially expressed) are left transparent to let the interesting ones stand out. It is clear that there are more interesting regions than the dozens listed in supplemental Figure 6, and their relationships can be seen. We also confirmed the hypothesis that most categories are mainly down-regulated, since most of the regions are greenish.
After tilting the view (as in the bottom of Figure 8), we can see that among 12 superpathways, only two have many differentially expressed genes. Clicking one of these, superpathway of chorismate, opens a pop-up dialog with information about this pathway and shows its details in the linked indented list. To get more information about the genes in this pathway, one can select the "add genes" button to add all its genes to a spreadsheet table and a parallel coordinate plot.
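One possible way to turn the up/down counts into a region color is sketched below. Several details are assumptions rather than the MetNetGE implementation: the function name `region_color`, the cutoff `min_genes` for "few genes", and the use of the fraction of up-regulated genes in place of the up-to-down ratio (which avoids division by zero when no gene is down-regulated). The returned string follows the KML color convention of hexadecimal alpha, blue, green, red.

```python
def region_color(n_up, n_down, min_genes=3):
    """Return a KML color string (aabbggrr hex) for one region.

    Assumptions: regions with fewer than `min_genes` differentially
    expressed genes are fully transparent, and the color blends linearly
    from green (all down-regulated) to red (all up-regulated).
    """
    total = n_up + n_down
    if total < min_genes:
        return "00ffffff"  # alpha 00 means fully transparent
    frac_up = n_up / total
    red = round(255 * frac_up)
    green = round(255 * (1 - frac_up))
    return f"ff00{green:02x}{red:02x}"  # alpha, blue, green, red
```

A region with only up-regulated genes is drawn opaque red, one with only down-regulated genes opaque green, and mixed regions fall in between.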
To enable biologists to create various views of data, MetNetGE provides customization of the mappings of color and height of each region. Moreover, biologists can easily switch between different views to gain combined knowledge.
In summary, the "differential view" approach (Figure 8) can help biologists answer critical questions like: which pathways or categories have many genes differentially expressed, or is one particular pathway mainly up-regulated?
Pilot User Testing
We conducted a qualitative pilot user test involving four users. The goal was to better understand the needs of the biologist-users and to test the effectiveness of the ERSF and MetNetGE. Users were presented with several tasks in two categories, following the analysis procedure in the ERSF use case presented above. The goal was to see how well the users understood the ontology structure and what pathways were affected by the gene expression data. The users worked in a relaxed setting where the tasks were not timed. Users worked through those tasks with assistance as required on both the traditional method and MetNetGE, and were encouraged to think aloud during the whole procedure. The users were interviewed at completion to determine preferences and their overall impression. Among the four users, two are postdoctoral biologists in plant research, one is a graduate student in the program of Bioinformatics and Computational Biology, and one is a graduate student in Computer Science who has been involved in a biology-related project.
All users who participated in the pilot user test preferred the MetNetGE solution to the traditional one. The users cited the ability to show the whole ontology structure and see the relationships between concepts as an important feature. Visualizing the entire network is especially useful when viewing a system-scale experimental dataset. Users found this difficult when viewing only a small subset of the system at a time. Moreover, users generally gave up on some time-consuming tasks. For example, finding the pathways that belong to at least two categories is extremely difficult using indented lists and node-link based layouts.
The user testing also exposed a critical drawback of structure-based coloring. Users thought that the red regions around zero degrees should represent similar types of pathways, but they turned out to be pathways in completely different categories. Similarly, the spatial proximity of adjacent pathway categories led users to believe that they are similar, but in many cases they are not. As a result, a future improvement of the layout will be to leave sufficient blank space between categories and use distinct color mappings for some major categories.
Animation proved useful to show trends in time-variant experiments. However, users said that the ability to rearrange a sequence of conditions and quickly switch back and forth between two conditions is much more important. MetNetGE provides this ability with a simple table containing all the conditions, so users can click the name of a condition to see its data mapping.
When users worked with MetNetGE, the team noted that although ERSF provides a 3D view of the ontology, users mostly viewed it from the top-down orientation, which is essentially a 2D ERSF layout. Therefore, when users were given the choice to map an attribute to either color or height, all of them preferred mapping the most important attribute to color. Some possible reasons include: biologists are used to traditional 2D tools, and height is hard to interpret precisely due to foreshortening. Nevertheless, the 3D view provides the benefit of mapping two variables simultaneously (color and height). This ability is important for some tasks that may lead to interesting discoveries, e.g. finding pathways that have high CoV and high expression value.
Users in our pilot study were eager to map their own statistical analyses onto the ontology and proposed many new statistical approaches that would be useful. For example, besides the Fisher exact test currently provided by MetNetGE, users requested other models to calculate the statistical significance of ontology terms. To accommodate this request, MetNetGE allows importation of a list of genes or statistical test results in a simple comma-separated-values (CSV) format. Users can then use their own data to generate values via statistical tools in R or exploRase, and visualize these on ERSF drawings.
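Reading such a file takes only a few lines with Python's standard csv module. The exact column layout expected by MetNetGE is not described in the text, so the two-column "term, value" format below, the function name `read_term_values`, and the example header are assumptions for illustration.

```python
import csv
import io


def read_term_values(csv_text):
    """Parse 'term,value' rows into {term: float}.

    Assumed format: first column is an ontology term or pathway name,
    second column is a numeric value such as a p-value. Rows whose
    second column is not numeric (e.g. a header) are skipped.
    """
    values = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue
        try:
            values[row[0].strip()] = float(row[1])
        except ValueError:
            continue  # header or non-numeric row
    return values
```

The resulting dictionary can then be mapped to the color or height of the corresponding regions.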
Another drawback of MetNetGE is the lack of interactive modification of the drawing. Since Google Earth's COM API currently does not support dynamic creation and deletion of KML content, MetNetGE is constrained to show only static drawings and framed animations.
Case Study with Arabidopsis Pathway Ontology
The MetNetGE analysis found that some pathway ontology terms, including sugar degradation, are down-regulated in the transgenic line, a result that was not found in the previous analysis (Figure 9). By selecting that ontology term, we were able to see the data from the 163 genes in this term in a parallel coordinate plot. Some genes that were clearly down-regulated (e.g. AT1G12240) were not found in the previous analysis because their p-values were not small enough to be in the selected list.
Conclusions and Future Work
This work is motivated by the need for methods that help biologists understand changes in system-wide datasets in large metabolic networks. The key step is using existing controlled vocabularies such as the Plant Ontology and the Gene Ontology to structure the data in a metabolic pathway or functional category context. The proposed ERSF algorithm provides easy visual identification and navigation of non-tree edges in an ontology. It also allows large-scale experimental data to be mapped and navigated in the context of the hierarchical structure of the ontology, which may lead to discoveries at the system level. To facilitate the study of system-level experimental data, multiple types of summary statistics can be mapped onto the ERSF visualization to characterize the behavior of groups of genes. The ERSF method is best suited for visualizing medium-sized multivariate hierarchical data (100 to 1000 nodes) with multiple inheritance.
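The idea of a radial space-filling drawing with separately handled non-tree edges can be sketched in a few lines of Python. This is a basic radial space-filling layout, not the authors' ERSF algorithm: each node receives an angular sector proportional to the number of leaves below it, and a term with several parents keeps its first listed parent as the tree edge while the remaining parents are returned as non-tree edges. That tie-breaking rule and all names are assumptions made for this example.

```python
def split_dag(parents):
    """Split {child: [parent, ...]} into a tree and a list of non-tree edges.

    Assumption: the first listed parent defines the tree edge.
    """
    tree, extra = {}, []
    for child, plist in parents.items():
        tree.setdefault(plist[0], []).append(child)
        extra.extend((p, child) for p in plist[1:])
    return tree, extra

def leaf_count(tree, node):
    """Number of leaves in the subtree rooted at node."""
    kids = tree.get(node, [])
    return 1 if not kids else sum(leaf_count(tree, k) for k in kids)

def radial_layout(tree, node, start=0.0, end=360.0, depth=0, out=None):
    """Return {node: (ring_index, start_angle, end_angle)} in degrees."""
    if out is None:
        out = {}
    out[node] = (depth, start, end)
    kids = tree.get(node, [])
    total = sum(leaf_count(tree, k) for k in kids)
    angle = start
    for k in kids:
        span = (end - start) * leaf_count(tree, k) / total
        radial_layout(tree, k, angle, angle + span, depth + 1, out)
        angle += span
    return out

# Hypothetical ontology fragment: term A2 belongs to both A and B.
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A", "B"]}
tree, non_tree_edges = split_dag(parents)
layout = radial_layout(tree, "root")
```

In this example `non_tree_edges` contains the pair `("B", "A2")`, which is the kind of link that ERSF draws as an orbital connection in the third dimension instead of duplicating the node.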
MetNetGE also uses the 3D Aligned Tiered Layout to visualize individual pathways. This layout groups nodes into distinct layers based on node type or sub-cellular location. Instead of generating a layout for each layer independently, the positions of the nodes on one major plane are established first, and the positions of the remaining nodes are then computed from them. This layout helps the user visualize cross-layer patterns and helps the main metabolic reactions stand out. Ongoing work aims to improve this layout to visualize more complex pathways that may contain both metabolic and signaling reactions. The natural distinction between layers can help distinguish mass flow from signaling and regulation in complex pathways.
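A minimal Python sketch of the alignment idea follows. It assumes that 2D positions on the major plane are already available and places every other node at the centroid of the major-plane nodes it is linked to, lifted to the height of its own layer. The centroid rule, the layer spacing, and all names are assumptions made for illustration; the text above states only that the other positions are computed from the major plane.

```python
def tiered_positions(major_xy, links, layer_of, layer_gap=10.0):
    """Return {node: (x, y, z)} for a tiered layout.

    major_xy: {node: (x, y)} for nodes on the major plane (z = 0).
    links:    {node: [linked major-plane nodes]} for all other nodes.
    layer_of: {node: layer index}, where 1 is the first layer above the plane.
    """
    positions = {n: (x, y, 0.0) for n, (x, y) in major_xy.items()}
    for node, neighbors in links.items():
        xs = [major_xy[n][0] for n in neighbors]
        ys = [major_xy[n][1] for n in neighbors]
        positions[node] = (
            sum(xs) / len(xs),           # centroid x of linked nodes
            sum(ys) / len(ys),           # centroid y of linked nodes
            layer_of[node] * layer_gap,  # height set by the node's layer
        )
    return positions

# Hypothetical pathway fragment: one enzyme linked to two metabolites.
major = {"m1": (0.0, 0.0), "m2": (4.0, 0.0)}
pos = tiered_positions(major, {"enzyme1": ["m1", "m2"]}, {"enzyme1": 1})
```

Because the upper-layer node sits directly above the reaction it is linked to, cross-layer relationships appear as short, near-vertical connections.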
Pilot user testing shows that users prefer the global ERSF approach to their current working solutions, which are based on indented lists and node-link layouts. Future work will include evaluations of representations for non-tree edges, along with linked views of the data including pathways and statistical plots. Initial user testing shows that an ontology structure linked to statistical graphics is powerful in real-world usage. Therefore, a larger quantitative user study of the effectiveness of ERSF and the linked views is planned for the near future.
Availability and requirements
Project name: MetNetGE
Project home page: http://www.metnetge.org
Operating systems: Windows only. Because the Google Earth COM API supports only Windows, the program cannot be used on Mac OS, Linux, or other operating systems.
Programming language: Python
Other requirements: Python 2.5 or higher; Google Earth, PyQt and other required libraries (listed in the documentation on the project home page)
License: Freely available under the GNU GPL.
Restrictions to use by non-academics: None
Source code: The program's source code is provided as Additional file 2.
This work is supported by NSF grants #IIS0612240 and #EEC-0813570. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We would like to thank Muhieddine Kaissi for insightful discussions on software design. We are grateful to Drs. Li Ling, Siva Swaminathan and Sudhansu Dash for valuable biological input during development of this tool. We thank Erin Boggess for narrating the demonstration video.
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic acids research 2003, 31(1):248–250. 10.1093/nar/gkg056
- Karp PD, Ouzounis CA, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahren D, Tsoka S, Darzentas N, Kunin V, Lopez-Bigas N: Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic acids research 2005, 33(19):6083–6089. 10.1093/nar/gki892
- Wurtele E, Li L, Berleant D, Cook D, Dickerson J, Ding J, Hofmann H, Lawrence M, Lee E, Li J, Mentzen W, Miller L, Nikolau B, Ransom N, Wang Y: MetNet: Systems Biology Software for Arabidopsis. In Concepts in Plant Metabolomics. Springer Verlag; 2007:145–158.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T: KEGG for linking genomes to life and the environment. Nucleic acids research 2008, (36 Database):D480–484.
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 2003, 13(11):2498–2504. 10.1101/gr.1239303
- Hu Z, Ng DM, Yamada T, Chen C, Kawashima S, Mellor J, Linghu B, Kanehisa M, Stuart JM, DeLisi C: VisANT 3.0: new modules for pathway visualization, editing, prediction and construction. Nucleic acids research 2007, (35 Web Server):W625–632. 10.1093/nar/gkm295
- Suderman M, Hallett M: Tools for visually exploring biological networks. Bioinformatics (Oxford, England) 2007, 23(20):2651–2659. 10.1093/bioinformatics/btm401
- yWorks GmbH: yFiles for Java Developer's Guide. Volume Chapter 5. Tübingen, Germany: yWorks GmbH, the diagramming company; 2010. Automatic Graph Layout
- Rojdestvenski I: Metabolic pathways in three dimensions. Bioinformatics (Oxford, England) 2003, 19(18):2436–2441. 10.1093/bioinformatics/btg342
- Yang Y, Engin L, Wurtele ES, Cruz-Neira C, Dickerson JA: Integration of metabolic networks and gene expression in virtual reality. Bioinformatics (Oxford, England) 2005, 21(18):3645–3650. 10.1093/bioinformatics/bti581
- Brandes U, Dwyer T, Schreiber F: Visual Understanding of Metabolic Pathways across Organisms Using Layout in Two and a Half Dimensions. Journal of Integrative Bioinformatics 2004, 1(1):2.
- Pavlopoulos GA, O'Donoghue SI, Satagopam VP, Soldatos TG, Pafilis E, Schneider R: Arena3D: visualization of biological networks in 3D. BMC systems biology 2008, 2: 104. 10.1186/1752-0509-2-104
- Ishiwata RR, Morioka MS, Ogishima S, Tanaka H: BioCichlid: central dogma-based 3D visualization system of time-course microarray data on a hierarchical biological network. Bioinformatics (Oxford, England) 2009, 25(4):543–544. 10.1093/bioinformatics/btp008
- Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C: Relations in biomedical ontologies. Genome Biol 2005, 6(5):R46. 10.1186/gb-2005-6-5-r46
- Katifori A, Halatsis C, Lepouras G, Vassilakis C, Giannopoulou E: Ontology visualization methods a survey. ACM Computing Surveys 2007, 39(4):10. 10.1145/1287620.1287621
- Katifori A, Torou E, Vassilakis C, Lepouras G, Halatsis C: Selected results of a comparative study of four ontology visualization methods for information retrieval tasks. Research Challenges in Information Science, 2008 RCIS 2008 Second International Conference on: 2008 2008, 133–140.
- PathTree and GOTree in Miscellaneous Tools[http://metnet.vrac.iastate.edu/misc/]
- Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S: AmiGO: online access to ontology and annotation data. Bioinformatics (Oxford, England) 2009, 25(2):288–289. 10.1093/bioinformatics/btn615
- Baehrecke EH, Dang N, Babaria K, Shneiderman B: Visualization and analysis of microarray and gene ontology data with treemaps. BMC bioinformatics 2004, 5: 84. 10.1186/1471-2105-5-84
- Day-Richter J, Harris MA, Haendel M, Lewis S: OBO-Edit--an ontology editor for biologists. Bioinformatics (Oxford, England) 2007, 23(16):2198–2200. 10.1093/bioinformatics/btm112
- Ellson J, Gansner ER, Koutsofios E: Graphviz and dynagraph static and dynamic graph drawing tools. Technical report, AT&T Labs - Research 2003.
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics (Oxford, England) 2005, 21(16):3448–3449. 10.1093/bioinformatics/bti551
- Tekusova T, Schreck T: Visualizing Time-Dependent Data in Multivariate Hierarchic Plots - Design and Evaluation of an Economic Application. Information Visualisation, 2008 IV '08: 9–11 July 2008 2008; Columbus, OHIO, USA 2008, 143–150.
- Fekete J, Wang D: Overlaying Graph Links on Treemaps. Information Visualization 2003 Symposium Poster Compendium, IEEE: 2003 2003, 82–83.
- Munzner T: Exploring Large Graphs in 3D Hyperbolic Space. IEEE Computer Graphics and Applications 1998, 18(4):18–23. 10.1109/38.689657
- John S: An evaluation of space-filling information visualizations for depicting hierarchical structures. Volume 53. Academic Press, Inc; 2000:663–694.
- Yang J, Ward MO, Rundensteiner EA, Patro A: InterRing: a visual interface for navigating and manipulating hierarchies. Volume 2. Palgrave Macmillan; 2003:16–30.
- Jia M, Swaminathan S, Wurtele E, Dickerson J: MetNetGE: Visualizing Biological Networks in Hierarchical Views and 3D Tiered Layouts. First International Workshop on Graph Techniques for Biomedical Networks: Nov. 1–4 2009; Washington D.C., USA 2009.
- Google Earth COM API[http://earth.google.com/comapi/]
- PyQt Website[http://www.riverbankcomputing.co.uk/]
- Zoetendal EG, Smith AH, Sundset MA, Mackie RI: The BaeSR two-component regulatory system mediates resistance to condensed tannins in Escherichia coli. Applied and environmental microbiology 2008, 74(2):535–539. 10.1128/AEM.02271-07
- Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, Gardner TS: Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic acids research 2008, (36 Database):D866–870.
- Munzner T: Process and Pitfalls in Writing Information Visualization Research Papers. In Information Visualization: Human-Centered Issues and Perspectives. Springer-Verlag; 2008:134–153.
- Lawrence M, Eun-Kyung L, Cook D, Hofmann H, Wurtele E: exploRase: Exploratory Data Analysis of Systems Biology Data. Coordinated and Multiple Views in Exploratory Visualization, 2006 Proceedings International Conference on: 2006 2006, 14–20.
- Choi SY: Metabolomic and transcriptomic analysis of polyhydroxybutyrate (PHB) accumulating Arabidopsis and switchgrass: Unveiling metabolic consequences of bioplastic accumulation in plant plastids. Ames: Iowa State Univ; 2009.
- Storey JD, Tibshirani R: Statistical methods for identifying differentially expressed genes in DNA microarrays. Methods in molecular biology (Clifton, NJ) 2003, 224: 149–157.
- Google Earth API[http://code.google.com/apis/earth/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.