LucidDraw: Efficiently visualizing complex biochemical networks within MATLAB
© He et al; licensee BioMed Central Ltd. 2010
Received: 13 July 2009
Accepted: 15 January 2010
Published: 15 January 2010
Biochemical networks play an essential role in systems biology. Rapidly growing network data and versatile research activities call for convenient visualization tools that help researchers intuitively perceive the abstract structures of networks and gain insight into their functional implications. Various kinds of network visualization software exist, but they are usually not adequate for visual analysis of complex biological networks, mainly for two reasons: 1) most existing drawing methods suitable for biochemical networks have high computational loads and can hardly achieve near real-time visualization; 2) available network visualization tools are designed to work within particular network modeling platforms, so they are inconvenient for general analyses because they lack a broad range of readily accessible numerical utilities.
We present LucidDraw as a visual analysis tool, which features (a) speed: typical biological networks with several hundred nodes can be drawn in a few seconds through a new layout algorithm; (b) ease of use: working within MATLAB makes it convenient to manipulate and analyze the network data using a broad spectrum of sophisticated numerical functions; (c) flexibility: layout styles and the incorporation of other available information about functional modules can be controlled by users with little effort, and the output drawings are interactively modifiable.
Equipped with a new grid layout algorithm proposed here, LucidDraw serves as an auxiliary network analysis tool capable of visualizing complex biological networks in near real-time with controllable layout styles and drawing details. The framework of the algorithm enables easy incorporation of extra biological information, if available, to influence the output layouts with predefined node grouping features.
The prevalence of computer-aided technologies for modeling large-scale biochemical networks creates a strong demand for visualization tools that present complex network structures intuitively. The key part of drawing a network is to place nodes in a low-dimensional (mostly 2D) space such that the geometric distances between nodes reflect the topological proximities described by the network. For very large complex networks involving many thousands of nodes, drawings may aim at capturing the global features, or macro characteristics, of the whole network [1, 2], and the network details are often not readable. In contrast, a typical biochemical network such as a metabolic network has some hundreds of nodes, and its visualization needs to show clearly both the global features (modules) and all individual links. To meet this need, grid layout methods have recently been developed and shown to have advantages in generating compact layouts with biologically comprehensible modules of biochemical networks [3–8].
A main issue of grid layout methods is their high computational cost, which seriously limits their applications. Miyano and co-workers proposed a method termed sweep calculation to speed up the layout process. Biological attributes of nodes are also used as extra input to reduce the search space and yield biologically interesting layouts [3–5]. Barsky et al. use a similar strategy in their software Cerebral, in which nodes are placed in predefined layers according to their subcellular localizations. They also use a technique that bundles edges connected to hub nodes and dramatically improves the visual effect when high-degree nodes are present. Cerebral has recently been developed further as a visualization tool for analyzing experimental data in the context of an interaction graph model.
Extra biological attributes such as subcellular localizations can be employed as constraints on node positions and consequently decrease the computational complexity substantially. In certain cases this helps generate high-quality layouts [3–5]. Nonetheless, the use of such information is limited by several factors: 1) the extra information is often unavailable or incomplete; 2) deciding how to arrange the layout areas allocated to nodes with different attributes is rather artificial; 3) when the number of nodes sharing some attribute is large, good placement of these nodes still relies solely on the topology. Therefore, speeding up node placement without additional constraints remains an essential problem, which is the first motivation of this work.
As research into network properties attracts growing interest, there is another demand for automatic visualization as an auxiliary analysis tool. Available drawing tools for biochemical networks are designed to work in particular network modeling platforms such as Cytoscape, PATIKA, VisANT, Cell Illustrator [13, 14], and CADLIVE [15, 16]. Because these modeling platforms are designed for specific purposes, most numerical utilities related to network analysis are not provided. In this respect, a drawing tool accessible within a more versatile numerical software environment would be convenient. For example, GraphViz (http://www.graphviz.org), integrated in the Bioinformatics Toolbox of MATLAB, gives researchers a way to visualize networks while making use of the powerful numerical analysis functions of MATLAB. However, the general graph drawing algorithms implemented in GraphViz are usually not adequate to produce satisfactory drawings of complex biochemical networks. This is the second motivation of this work.
In this paper, we present our solution, LucidDraw, for easy and quick visualization of complex biochemical networks. The tool is powered by a new grid layout algorithm and accessible from within MATLAB.
The cost function and the weight matrix
A network layout is a configuration of the nodes and edges properly placed on a 2D plane. Generally, all nodes are represented as points without regard to their sizes and all edges are drawn as straight lines. Under such a drawing convention, a layout is fully described by the nodes' coordinates, denoted by R = (r_1, r_2, ..., r_n), where n is the number of nodes and r_i = (x_i, y_i) are the coordinates of node i. Because nodes are placed on grid points, all x_i and y_i are forced to be integers.
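Given the quantities defined in this subsection, a natural reading of the cost of a layout R is a weighted sum of pairwise distances. The form below is an assumption inferred from the surrounding definitions of the interaction weight and the Manhattan distance, not a quoted formula:

```latex
f(R) = \sum_{i<j} w_{ij}\, d_{ij}, \qquad d_{ij} = |x_i - x_j| + |y_i - y_j|
```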
In the cost function, w_ij is the interaction weight of nodes i and j, which describes how the two nodes interact. The weights of all node pairs constitute the weight matrix. The term d_ij is the Manhattan distance between nodes i and j. For a detailed explanation of the design principles of the cost function, please refer to Ref .
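As a minimal sketch of how such a cost could be evaluated, the Python snippet below assumes the cost is the sum of w_ij times d_ij over all unordered node pairs. The function names, the example weights, and the sign convention (positive weights pull nodes together, negative weights push them apart under minimization) are illustrative assumptions rather than details taken from LucidDraw.

```python
from itertools import combinations

def manhattan(a, b):
    """Manhattan (city-block) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def layout_cost(coords, weights):
    """Assumed cost: sum of w_ij * d_ij over all unordered node pairs."""
    n = len(coords)
    return sum(weights[i][j] * manhattan(coords[i], coords[j])
               for i, j in combinations(range(n), 2))

# Hypothetical 3-node example: pairs 0-1 and 1-2 have weight 2,
# pair 0-2 has weight -1.
weights = [[0, 2, -1],
           [2, 0, 2],
           [-1, 2, 0]]
coords = [(0, 0), (1, 0), (2, 0)]
print(layout_cost(coords, weights))  # 2*1 + (-1)*2 + 2*1 = 2
```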
The layout algorithm
The layout algorithm seeks the best layout by minimizing the cost function. It can be described as follows:
Set R to a random layout
Repeat the following steps niter times:
    Generate R' by perturbing R
    Locally optimize R'
    If cost(R') < cost(R), set R = R'
    (Otherwise, R remains unchanged.)
Output R as the final result
At the beginning, a random layout R is set as the initial state. The algorithm then optimizes R through a neighborhood-test procedure that repeatedly tries to move every single node to its adjacent vacant sites in order to lower the cost. As the neighborhood test proceeds, the layout eventually reaches a state whose quality cannot be improved further by moving any single node, i.e., the cost function attains a local minimum. To optimize R fully, the layout must escape from this local minimum. For this reason, the algorithm perturbs the layout by moving each node, with a given probability p, to a randomly chosen neighboring location. The perturbed layout is then passed to the neighborhood-test procedure again. When this re-optimization-after-perturbation process has been repeated sufficiently many times, the layout is expected to be satisfactory and the computation ends.
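The procedure can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the C++ implementation used by LucidDraw: the cost is assumed to be the sum of w_ij times the Manhattan distance, the eight surrounding grid cells are assumed to be the "adjacent" sites, perturbation is restricted to vacant sites so that nodes never overlap, the grid is a bounded square, and all function names are hypothetical.

```python
import random

# Assumption: the 8 surrounding grid cells count as "adjacent" sites.
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def cost(R, W):
    """Assumed cost: sum of w_ij times the Manhattan distance over node pairs."""
    n = len(R)
    return sum(W[i][j] * (abs(R[i][0] - R[j][0]) + abs(R[i][1] - R[j][1]))
               for i in range(n) for j in range(i + 1, n))

def vacant_neighbors(R, i, size):
    """Adjacent sites of node i that lie on the size x size grid and are unoccupied."""
    occupied = set(R)
    x, y = R[i]
    return [(x + dx, y + dy) for dx, dy in OFFSETS
            if 0 <= x + dx < size and 0 <= y + dy < size
            and (x + dx, y + dy) not in occupied]

def local_optimize(R, W, size):
    """Neighborhood test: move single nodes until no single move lowers the cost."""
    R, best, improved = list(R), cost(R, W), True
    while improved:
        improved = False
        for i in range(len(R)):
            for site in vacant_neighbors(R, i, size):
                trial = R[:i] + [site] + R[i + 1:]
                c = cost(trial, W)
                if c < best:
                    R, best, improved = trial, c, True
                    break  # node i moved; its new neighborhood is tested next sweep
    return R

def perturb(R, p, size, rng):
    """Move each node, with probability p, to a random adjacent vacant site."""
    R = list(R)
    for i in range(len(R)):
        sites = vacant_neighbors(R, i, size)
        if sites and rng.random() < p:
            R[i] = rng.choice(sites)
    return R

def layout(W, size, niter=60, p=0.7, seed=0):
    """Re-optimization after perturbation; keep a candidate only if it lowers the cost."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    R = local_optimize(rng.sample(cells, len(W)), W, size)
    for _ in range(niter):
        candidate = local_optimize(perturb(R, p, size, rng), W, size)
        if cost(candidate, W) < cost(R, W):
            R = candidate
    return R

# Hypothetical 4-node path 0-1-2-3: weight 1 on edges, 0 elsewhere.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
R = layout(W, size=5, niter=20)
print(R, cost(R, W))
```

Because every accepted move strictly lowers the cost and the number of layouts on a finite grid is finite, the local optimization in this sketch always terminates.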
An important feature of the algorithm is that it uses a simple global search strategy that relies on the perturbation probability p. When p = 0, no node is perturbed and the output layout remains unchanged. When p = 1, all nodes change their positions and the output layout bears little relation to the input. For 0 < p < 1, some parts of the input layout are unchanged, or "memorized". Heavy perturbations (i.e., perturbations with large p) discard much of the previous optimization effort, so the re-optimization demands relatively high computational expense. In practice, however, the performance is not very sensitive to p; moderate values, say 0.3-0.7, usually work well. In LucidDraw, the default value of p is set to 0.7.
Generally, computation speed and layout quality are largely controlled by niter, the number of iterations. A small niter is obviously preferred for computation speed but usually results in relatively low layout quality. Although layout quality benefits from more iterations, a very large niter is usually unnecessary because, as the optimization proceeds, better layouts become harder and harder to obtain by re-optimization after perturbation. To balance effort and gain, the layout process should stop when the search efficiency becomes very low. In practice, a moderate value of niter = 60 is usually enough to generate satisfactory layouts.
The graphical user interface
Treatment of node labels
Node labels are necessary for comprehending network structures shown graphically. Displaying labels appropriately is not trivial: in drawings of large biochemical networks, room for labels is limited, so careless label placement usually adds visual complexity. Showing all labels simultaneously is usually unsatisfactory because labels overlap with each other and with nodes. Barsky et al. use a greedy method to select as many labels as possible for display without label overlaps, with the advantage that more labels are shown at higher zoom levels.
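The idea of a greedy, overlap-free label selection can be illustrated with a short Python sketch. It is a generic greedy pass, not Cerebral's actual implementation; the function names, the (x, y, width, height) box convention, and the assumption that zooming scales node positions while label sizes stay fixed on screen are all hypothetical.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles a and b, given as (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def select_labels(labels, zoom=1.0):
    """Greedily keep each label whose box overlaps none of the boxes kept so far."""
    kept = []
    for name, x, y, w, h in labels:
        # Assumption: positions scale with the zoom level, label sizes do not.
        box = (x * zoom, y * zoom, w, h)
        if all(not overlaps(box, other) for _, other in kept):
            kept.append((name, box))
    return [name for name, _ in kept]

# Hypothetical labels: (name, x, y, width, height) in drawing units.
labels = [("glc", 0, 0, 4, 1), ("g6p", 3, 0, 4, 1), ("f6p", 8, 0, 4, 1)]
print(select_labels(labels, zoom=1.0))  # ['glc', 'f6p']
print(select_labels(labels, zoom=2.0))  # ['glc', 'g6p', 'f6p']
```

At zoom 1 the second label collides with the first and is skipped; at zoom 2 the anchor points spread apart and all three labels fit, which mirrors the behavior described above.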
In this work we use three kinds of labels to keep the desired node information readable without adding much visual complexity. The first kind is the engraved label, which is shown within the node symbol if the space is large enough. The second kind is the floating label, which appears automatically when the mouse pointer hovers over a node and disappears when the pointer moves away. The third kind is the mandatory label, which is shown statically for right-clicked nodes and stays displayed until the zoom level is changed or the "clear labels" button is pressed.
For maximal computation speed, the layout algorithm was implemented in C++ and compiled into a .mexw32 file to work in MATLAB. The GUI for displaying layout results and controlling drawing details was written in Java based on the JGraph library. With a few auxiliary MATLAB programs, all executables can be used seamlessly in conjunction with MATLAB, giving users a convenient way to analyze complex networks visually.
Network data and example layouts
Network analysis with the help of LucidDraw
A good layout algorithm depends on two factors: a proper cost function and an efficient optimization method. LucidDraw adopts a cost function similar to that of the previous work but uses a new optimization procedure with much higher efficiency. By drastically reducing the search area of every node, the neighborhood-test method greatly lowers the computational cost. To fully optimize the cost function, the re-optimization-after-perturbation strategy forces the layout to escape from the current local minimum and search for better layouts. Despite its simplicity, the perturbation strategy performs rather well compared with more sophisticated heuristics such as simulated annealing. The technique has also been employed in other discrete global optimization problems [22, 23]. Together with the neighborhood-test approach, it speeds up the layout process dramatically and makes it possible for LucidDraw to serve as an instant visualization tool in a wide range of network analysis tasks. The effect of the optimization strategy is substantial. For a network with 677 nodes, our new algorithm takes ~30 sec to generate an acceptable layout, whereas our previous algorithm needs >3 hr of CPU time and a large amount (~1 GB) of memory. Another available grid layout tool, Cerebral, can produce a layered layout in ~3 min, provided that all nodes of the network have already been divided into appropriate groups and the order of the layers has been supplied in advance by the user.
For ease of use with large networks, LucidDraw helps users obtain node information conveniently through three kinds of labels. By comparison, other network modeling tools offer fewer ways to display node labels. For instance, Cytoscape, VisANT, and YANAsquare use two labeling methods (engraved and floating); VANTED uses only engraved labels.
In LucidDraw, we design more flexible weight matrices and, based on extensive experiments, provide three elaborate evaluation schemes for the weight matrix. Compared with previous work implemented for network modeling software, LucidDraw also offers the flexibility to make customized drawings that aid visual network analysis, drawing on the powerful numerical capabilities of MATLAB.
LucidDraw does not depend on predefined module information to produce layouts in which nodes belonging to the same module are located close together (Figures 6(A-C)). This does not exclude the use of module data; such data are easy to incorporate by modifying the weights so that nodes are distributed with the desired position propensities (Figure 6(D)).
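One simple way to picture this kind of weight modification is sketched below in Python. The function name, the additive bonus, and the use of None for nodes without a module annotation are illustrative assumptions; the scheme actually used by LucidDraw may differ. The sketch also assumes a minimized cost in which a larger weight draws two nodes closer together.

```python
def add_module_bonus(weights, modules, bonus=1.0):
    """Return a copy of the weight matrix with `bonus` added for same-module pairs."""
    n = len(weights)
    out = [list(row) for row in weights]  # leave the input matrix untouched
    for i in range(n):
        for j in range(n):
            same = modules[i] is not None and modules[i] == modules[j]
            if i != j and same:
                out[i][j] += bonus
    return out

# Hypothetical 3-node network; nodes 0 and 2 share a module, node 1 is unannotated.
weights = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
modules = ["glycolysis", None, "glycolysis"]
biased = add_module_bonus(weights, modules, bonus=0.5)
print(biased)  # [[0, 1, 0.5], [1, 0, 1], [0.5, 1, 0]]
```

Nodes without module information keep their original weights, so their placement still relies on the network topology alone.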
It should be noted that some network modeling software, such as Cytoscape and VANTED, provides grid-based visualizations, but the underlying layout methods clearly differ from ours. For comparison, please refer to Additional file 1. A remaining issue of LucidDraw is edge-node crossings, which occur occasionally but obscure the relations between a few nodes. To alleviate the problem, Miyano and co-workers introduced penalty terms in the cost function [4, 5] at the expense of higher computational complexity. Another feasible choice is to use curved edges. A thorough solution of the edge-node crossing problem must take node sizes into account, which is a future direction of this work.
We present a MATLAB tool, LucidDraw, to meet the need for convenient visualization of complex biochemical networks. The tool is fully accessible within MATLAB and capable of drawing typical networks in seconds with appropriately separated modules in a compact space. Users can control layout styles, drawing details, and extra biological attributes to obtain sufficiently customized drawings.
Availability and Requirements
Project name: LucidDraw
Project home page: http://bioinf.jiangnan.edu.cn
Operating system(s): Windows (32-bit version)
Programming language: Java, C++
Other requirements: MATLAB 7.5 (32-bit version), Java 1.6
License: Free for non-commercial use.
The LucidDraw programs and sample data are given in Additional file 2. A demonstration video is provided in Additional file 3. The latest software and more example networks can be found at http://bioinf.jiangnan.edu.cn.
We acknowledge the support of the State High-Tech Development Program of China (no. 2006AA020204).
- Hashimoto T, Nagasaki M, Kojima K, Miyano S: BFL: a node and edge betweenness based fast layout algorithm for large scale networks. BMC Bioinformatics 2009, 10: 19. doi:10.1186/1471-2105-10-19
- Li W, Kurata H: Visualizing Global Properties of Large Complex Networks. PLoS ONE 2008, 3(7):e2541. doi:10.1371/journal.pone.0002541
- Barsky A, Gardy JL, Hancock REW, Munzner T: Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics 2007, 23(8):1040–1042. doi:10.1093/bioinformatics/btm057
- Kato M, Nagasaki M, Doi A, Miyano S: Automatic drawing of biological networks using cross cost and subcomponent data. Genome Inform 2005, 16(2):22–31.
- Kojima K, Nagasaki M, Jeong E, Kato M, Miyano S: An efficient grid layout algorithm for biological networks utilizing various biological attributes. BMC Bioinformatics 2007, 8: 76. doi:10.1186/1471-2105-8-76
- Kojima K, Nagasaki M, Miyano S: Fast grid layout algorithm for biological networks with sweep calculation. Bioinformatics 2008, 24(12):1433–1441. doi:10.1093/bioinformatics/btn196
- Li W, Kurata H: A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics 2005, 21(9):2036–2042. doi:10.1093/bioinformatics/bti290
- Suderman M, Hallett M: Tools for visually exploring biological networks. Bioinformatics 2007, 23(20):2651–2659. doi:10.1093/bioinformatics/btm401
- Barsky A, Munzner T, Gardy J, Kincaid R: Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. IEEE Transactions on Visualization and Computer Graphics 2008, 14(6):1253–1260. doi:10.1109/TVCG.2008.117
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research 2003, 13(11):2498–2504. doi:10.1101/gr.1239303
- Demir E, Babur O, Dogrusoz U, Gursoy A, Nisanci G, Cetin-Atalay R, Ozturk M: PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics 2002, 18(7):996–1003. doi:10.1093/bioinformatics/18.7.996
- Hu Z, Mellor J, Wu J, DeLisi C: VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics 2004, 5: 17. doi:10.1186/1471-2105-5-17
- Nagasaki M, Doi A, Matsuno H, Miyano S: Genomic Object Net: I. A platform for modelling and simulating biopathways. Applied Bioinformatics 2003, 2(3):181–184.
- Doi A, Nagasaki M, Fujita S, Matsuno H, Miyano S: Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension. Applied Bioinformatics 2003, 2(3):185–188.
- Kurata H, Masaki K, Sumida Y, Iwasaki R: CADLIVE dynamic simulator: Direct link of biochemical networks to dynamic models. Genome Research 2005, 15(4):590–600. doi:10.1101/gr.3463705
- Kurata H, Matoba N, Shimizu N: CADLIVE for constructing a large-scale biochemical network based on a simulation-directed notation and its application to yeast cell cycle. Nucleic Acids Research 2003, 31(14):4071–4084. doi:10.1093/nar/gkg461
- Oberhardt MA, Puchalka J, Fryer KE, Martins dos Santos VAP, Papin JA: Genome-Scale Metabolic Network Analysis of the Opportunistic Pathogen Pseudomonas aeruginosa PAO1. Journal of Bacteriology 2008, 190(8):2790–2803. doi:10.1128/JB.01583-07
- Holme P, Huss M, Jeong H: Subnetwork hierarchies of biochemical pathways. Bioinformatics 2003, 19(4):532–538. doi:10.1093/bioinformatics/btg033
- Barry CE: Interpreting cell wall 'virulence factors' of Mycobacterium tuberculosis. Trends in Microbiology 2001, 9(5):237–241. doi:10.1016/S0966-842X(01)02018-2
- Bhave DP, Muse III WB, Carroll KS: Drug Targets in Mycobacterial Sulfur Metabolism. Infect Disord Drug Targets 2007, 7(2):140–158. doi:10.2174/187152607781001772
- Jain M, Petzold CJ, Schelle MW, Leavell MD, Mougous JD, Bertozzi CR, Leary JA, Cox JS: Lipidomics reveals control of Mycobacterium tuberculosis virulence lipids via metabolic coupling. PNAS 2007, 104(12):5133–5138. doi:10.1073/pnas.0610634104
- Zhipeng L, Jin-Kao H: A Critical Element-Guided Perturbation Strategy for Iterated Local Search. In Proceedings of the 9th European Conference on Evolutionary Computation in Combinatorial Optimization. Tübingen, Germany: Springer-Verlag; 2009:1–12.
- Mei J, He S, Shi G, Wang Z, Li W: Revealing network communities through modularity maximization by a contraction-dilation method. New Journal of Physics 2009, 11: 043025. doi:10.1088/1367-2630/11/4/043025
- Schwarz R, Liang C, Kaleta C, Kuhnel M, Hoffmann E, Kuznetsov S, Hecker M, Griffiths G, Schuster S, Dandekar T: Integrated network reconstruction, visualization and analysis using YANAsquare. BMC Bioinformatics 2007, 8: 313. doi:10.1186/1471-2105-8-313
- Junker B, Klukas C, Schreiber F: VANTED: A system for advanced data analysis and visualization in the context of biological networks. BMC Bioinformatics 2006, 7: 109. doi:10.1186/1471-2105-7-109
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.