LucidDraw: Efficiently visualizing complex biochemical networks within MATLAB
 Sheng He^{1, 2, 3},
 Juan Mei^{1, 2},
 Guiyang Shi^{1, 2},
 Zhengxiang Wang^{1, 2} and
 Weijiang Li^{1, 2}
DOI: 10.1186/1471-2105-11-31
© He et al; licensee BioMed Central Ltd. 2010
Received: 13 July 2009
Accepted: 15 January 2010
Published: 15 January 2010
Abstract
Background
Biochemical networks play an essential role in systems biology. Rapidly growing network data and versatile research activities call for convenient visualization tools that help users intuitively perceive the abstract structures of networks and gain insights into their functional implications. Various kinds of network visualization software exist, but they are usually inadequate for the visual analysis of complex biological networks, mainly for two reasons: 1) most existing drawing methods suitable for biochemical networks have high computational loads and can hardly achieve near-real-time visualization; 2) available network visualization tools are designed to work within specific network modeling platforms, so they are inconvenient for general analyses because they lack a broad range of readily accessible numerical utilities.
Results
We present LucidDraw, a visual analysis tool featuring (a) speed: typical biological networks with several hundred nodes can be drawn in a few seconds through a new layout algorithm; (b) ease of use: working within MATLAB makes it convenient to manipulate and analyze network data using a broad spectrum of sophisticated numerical functions; (c) flexibility: layout styles and the incorporation of other available information about functional modules can be controlled by users with little effort, and the output drawings are interactively modifiable.
Conclusions
Equipped with the new grid layout algorithm proposed here, LucidDraw serves as an auxiliary network analysis tool capable of visualizing complex biological networks in near real time with controllable layout styles and drawing details. The framework of the algorithm enables easy incorporation of extra biological information, if available, so that predefined node-grouping features can influence the output layouts.
Background
The prevalence of computer-aided technologies for modeling large-scale biochemical networks has created a strong demand for visualization tools that present complex network structures intuitively. The key part of drawing a network is to place nodes in a low-dimensional (mostly 2D) space such that the geometric distances between nodes reflect the topological proximities described by the network. For very large complex networks involving many thousands of nodes, drawings may aim at grasping the global features, or macro characteristics, of the whole network [1, 2], while the network details are often unreadable. In contrast, a typical biochemical network, such as a metabolic network, has a few hundred nodes, and its visualization needs to show clearly both the global features (modules) and all individual links. To meet these needs, grid layout methods have recently been developed and shown to have advantages in generating compact layouts with biologically comprehensible modules of biochemical networks [3–8].
A main issue of grid layout methods is their high computational cost, which seriously limits their applications. Miyano and coworkers proposed a method termed sweep calculation to speed up the layout process [6]. Biological attributes of nodes have also been used as extra input to reduce the search space and yield biologically interesting layouts [3–5]. Barsky et al. use a similar strategy in their software Cerebral, in which nodes are placed in predefined layers according to their subcellular localizations. They also use a technique to bundle edges connected to hub nodes, which dramatically improves the visual effect when high-degree nodes are present [3]. Recently, Cerebral has been further developed into a new visualization tool for analyzing experimental data in the context of an interaction graph model [9].
Extra biological attributes such as subcellular localizations can be employed as constraints on node positions and consequently decrease the computational complexity substantially. In certain cases this helps generate high-quality layouts [3–5]. Nonetheless, the use of such information is limited by several factors: 1) the extra information is often unavailable or incomplete; 2) deciding how to arrange the layout areas allocated to nodes with different attributes is rather artificial; 3) when many nodes share the same attribute, good placement of these nodes relies solely on the topology. Therefore, speeding up node placement without additional constraints remains an essential problem, which is the first motivation of this work.
As interest in in-depth research on network properties grows, there is another demand for automatic visualization as an auxiliary analysis tool. Available drawing tools for biochemical networks are designed to work within specific network modeling platforms such as Cytoscape [10], PATIKA [11], VisANT [12], Cell Illustrator [13, 14], and CADLIVE [15, 16]. Because these modeling platforms are designed for specific purposes, most numerical utilities related to network analysis are not provided. In this respect, a drawing tool accessible within a more versatile numerical software environment would be convenient. For example, GraphViz (http://www.graphviz.org), integrated in the Bioinformatics Toolbox of MATLAB, provides researchers with a way to visualize networks while making use of MATLAB's powerful numerical analysis functions. However, the general-purpose graph drawing algorithms implemented in GraphViz are usually not adequate for producing satisfactory drawings of complex biochemical networks. This is the other motivation of this work.
In this paper, we present our solution, LucidDraw, for easy and quick visualization of complex biochemical networks. The tool is powered by a new grid layout algorithm and accessible from within MATLAB.
Implementation
The cost function and the weight matrix
A network layout is a configuration of nodes and edges placed on a 2D plane. Generally, all nodes are represented as points regardless of their sizes, and all edges are drawn as straight lines. Under this drawing convention, a layout is fully described by the node coordinates $R = (r_1, r_2, \ldots, r_n)$, where $n$ is the number of nodes and $r_i = (x_i, y_i)$ are the coordinates of node $i$. Because nodes are placed on grid points, all $x_i$ and $y_i$ are constrained to be integers.
The quality of a layout is evaluated by a cost function of the form

$$\mathrm{cost}(R) = \sum_{i<j} w_{ij}\, d_{ij},$$

where $w_{ij}$ is the interaction weight of nodes $i$ and $j$, which describes how the nodes interplay. The weights between all node pairs constitute the weight matrix. The term $d_{ij} = |x_i - x_j| + |y_i - y_j|$ is the Manhattan distance between nodes $i$ and $j$. For detailed explanations of the design principles of the cost function, please refer to Ref. [7].
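As a minimal Python sketch, assuming the cost is the weighted sum of pairwise Manhattan distances, $\mathrm{cost}(R) = \sum_{i<j} w_{ij} d_{ij}$, implied by the definitions of $w_{ij}$ and $d_{ij}$ above, a layout can be scored as follows. The function names and the toy weight matrix are hypothetical; LucidDraw's own weight-matrix schemes are not reproduced here.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # integer grid coordinates (x_i, y_i)


def manhattan(a: Point, b: Point) -> int:
    """Manhattan distance d_ij = |x_i - x_j| + |y_i - y_j|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def layout_cost(R: List[Point], W: Sequence[Sequence[float]]) -> float:
    """Assumed cost: sum over node pairs i < j of w_ij * d_ij.

    R holds one grid point per node; W is a symmetric n x n weight matrix.
    """
    return sum(W[i][j] * manhattan(R[i], R[j])
               for i, j in combinations(range(len(R)), 2))


# Toy path graph 0-1-2 with weight 1 on edges and 0 elsewhere (illustrative only).
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(layout_cost([(0, 0), (1, 0), (2, 0)], W))  # straight line: 2
print(layout_cost([(0, 0), (2, 0), (1, 0)], W))  # nodes 1 and 2 swapped: 3
```

Lower scores correspond to drawings in which strongly weighted pairs sit close together on the grid.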
The layout algorithm
The layout algorithm searches for the best layout by minimizing the cost function. It can be described as follows:
Set R to a random layout
Repeat the following steps niter times:
    Generate R' by perturbing R
    Locally optimize R'
    If cost(R') < cost(R), set R = R'
    (Otherwise, R remains unchanged.)
End repeat
Output R as the final result
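The loop above can be sketched in Python as follows. The callables `cost`, `perturb`, and `local_opt` are hypothetical hooks standing in for LucidDraw's C++ routines, which are not shown in this paper; the one-dimensional toy problem exists only to exercise the control flow.

```python
import random
from typing import Callable, TypeVar

Layout = TypeVar("Layout")


def optimize_layout(R: Layout,
                    cost: Callable[[Layout], float],
                    perturb: Callable[[Layout, random.Random], Layout],
                    local_opt: Callable[[Layout], Layout],
                    niter: int,
                    rng: random.Random) -> Layout:
    """Outer loop of the listing: perturb, re-optimize locally, keep if better."""
    for _ in range(niter):
        candidate = local_opt(perturb(R, rng))  # generate R' and locally optimize it
        if cost(candidate) < cost(R):           # accept strict improvements only
            R = candidate
        # otherwise R remains unchanged
    return R


# Toy 1-D problem used only to exercise the control flow (not a real layout).
def toy_cost(x: int) -> float:
    return (x - 7) ** 2


def toy_perturb(x: int, rng: random.Random) -> int:
    return x + rng.choice((-3, 3))


def toy_local_opt(x: int) -> int:
    # Greedy unit-step descent until neither neighbour is better.
    while True:
        best = min((x - 1, x + 1), key=toy_cost)
        if toy_cost(best) < toy_cost(x):
            x = best
        else:
            return x


print(optimize_layout(0, toy_cost, toy_perturb, toy_local_opt,
                      niter=5, rng=random.Random(1)))  # 7
```

Keeping the driver generic separates the acceptance rule from the grid-specific perturbation and local search.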
At the beginning, a random layout R is set as the initial state. The algorithm then optimizes R through a neighborhood-test procedure that repeatedly tries to move each single node to an adjacent vacant site so as to lower the cost. As the neighborhood test proceeds, the layout eventually reaches a state whose quality cannot be further improved by moving any single node, i.e., the cost function attains a local minimum. To optimize R further, the layout must escape from this local minimum. For this purpose, the algorithm perturbs the layout by moving each node, with a given probability p, to a randomly chosen neighboring location. The perturbed layout is then passed to the neighborhood-test procedure again. When this reoptimization-after-perturbation process has been repeated sufficiently many times, the layout is expected to be satisfactory and the computation ends.
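The neighborhood-test procedure can be sketched as a greedy descent on the grid. This is a minimal Python illustration under stated assumptions, not LucidDraw's implementation: the grid is a `width` × `height` rectangle, "adjacent" means the four horizontal and vertical neighbors, the first improving move found for a node is accepted, and the cost is assumed to be the symmetric weighted sum $\sum_{i<j} w_{ij} d_{ij}$, so the cost change from moving node $i$ depends only on the terms involving $i$.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-neighbourhood


def node_cost(i: int, p: Point, R: List[Point],
              W: Sequence[Sequence[float]]) -> float:
    """Terms of the assumed cost sum_{i<j} w_ij d_ij that involve node i at p."""
    return sum(W[i][j] * (abs(p[0] - q[0]) + abs(p[1] - q[1]))
               for j, q in enumerate(R) if j != i)


def neighborhood_test(R: List[Point], W: Sequence[Sequence[float]],
                      width: int, height: int) -> List[Point]:
    """Move single nodes to adjacent vacant sites while that lowers the cost."""
    R = list(R)
    occupied = set(R)
    improved = True
    while improved:                      # stop when a full pass makes no move
        improved = False
        for i in range(len(R)):
            old = R[i]
            base = node_cost(i, old, R, W)
            for dx, dy in STEPS:
                new = (old[0] + dx, old[1] + dy)
                inside = 0 <= new[0] < width and 0 <= new[1] < height
                if not inside or new in occupied:
                    continue             # only vacant grid sites are eligible
                if node_cost(i, new, R, W) < base:
                    occupied.remove(old)
                    occupied.add(new)
                    R[i] = new           # accept the first improving move
                    improved = True
                    break
    return R


# Two linked nodes drift together on a 5 x 1 strip.
print(neighborhood_test([(0, 0), (3, 0)], [[0, 1], [1, 0]], width=5, height=1))
# [(1, 0), (2, 0)]
```

Each accepted move strictly lowers the cost and the grid has finitely many configurations, so the loop ends at a layout where no single-node move to a vacant neighboring site helps, i.e., a local minimum in the sense described above.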
An important feature of the algorithm is its simple global search strategy, which relies on the perturbation probability p. When p = 0, no node is perturbed and the layout remains unchanged. When p = 1, all nodes change their positions and the perturbed layout bears little relation to the input. For 0 < p < 1, some parts of the input layout remain unchanged, or "memorized". Heavy perturbations (i.e., perturbations with large p) discard much of the previous optimization effort, so the reoptimization demands relatively high computational expense. In practice, however, the performance is not very sensitive to p; moderate values, say 0.3 to 0.7, usually work well. In LucidDraw, the default value of p is 0.7.
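A sketch of the perturbation step follows, assuming that the "neighboring location" is one of the four adjacent grid sites and that only vacant sites are eligible (a node with no vacant neighbor stays put). With `p = 0` the layout is returned unchanged; with `p = 1` every node that can move does. The function name and parameters are illustrative, not LucidDraw's API.

```python
import random
from typing import List, Tuple

Point = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-neighbourhood


def perturb(R: List[Point], p: float, width: int, height: int,
            rng: random.Random) -> List[Point]:
    """Move each node with probability p to a random vacant neighbouring site."""
    R = list(R)
    occupied = set(R)
    for i, (x, y) in enumerate(R):
        if rng.random() >= p:
            continue                     # node keeps its place ("memorized")
        candidates = [(x + dx, y + dy) for dx, dy in STEPS
                      if 0 <= x + dx < width and 0 <= y + dy < height
                      and (x + dx, y + dy) not in occupied]
        if candidates:                   # assumption: a boxed-in node stays put
            new = rng.choice(candidates)
            occupied.remove((x, y))
            occupied.add(new)
            R[i] = new
    return R


R = [(0, 0), (5, 5), (9, 9)]
rng = random.Random(42)
print(perturb(R, 0.0, 10, 10, rng) == R)   # True: p = 0 changes nothing
print(perturb(R, 0.7, 10, 10, rng))        # each node moves one step with probability 0.7
```

Because `rng.random()` returns values in [0, 1), the test `rng.random() >= p` never fires for p = 1 and always fires for p = 0, matching the two limiting cases in the text.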
Generally, computation speed and layout quality are largely controlled by niter, the number of iterations. A small niter is obviously preferable for speed but usually results in lower-quality layouts. Although layout quality benefits from more iterations, a very large niter is usually unnecessary because, as the optimization proceeds, better layouts become harder and harder to obtain by reoptimization after perturbation. To balance effort and gain, the layout process should stop when the search efficiency becomes very low. In practice, a moderate value of niter = 60 is usually enough to generate satisfactory layouts.
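The notion of "search efficiency" is left informal here. One hypothetical way to make it concrete is to track the fraction of recent perturbation rounds that lowered the cost and stop once it drops below a threshold, with a hard cap such as the niter = 60 mentioned above. The class, window size, and threshold below are illustrative assumptions, not LucidDraw parameters.

```python
from collections import deque


class EfficiencyMonitor:
    """Hypothetical stopping rule based on the recent success rate of rounds."""

    def __init__(self, window: int = 10, min_rate: float = 0.1,
                 max_iter: int = 60) -> None:
        self.recent = deque(maxlen=window)   # outcomes of the last `window` rounds
        self.window = window
        self.min_rate = min_rate
        self.max_iter = max_iter             # hard cap on the number of rounds
        self.iterations = 0

    def record(self, improved: bool) -> None:
        """Log whether the latest reoptimization-after-perturbation round helped."""
        self.recent.append(bool(improved))
        self.iterations += 1

    def should_stop(self) -> bool:
        if self.iterations >= self.max_iter:
            return True
        if len(self.recent) < self.window:
            return False                     # not enough history yet
        return sum(self.recent) / self.window < self.min_rate


monitor = EfficiencyMonitor(window=5, min_rate=0.2)
for outcome in [True, True, False, False, False, False, False]:
    monitor.record(outcome)
print(monitor.should_stop())  # True: no improvement in the last 5 rounds
```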
Computational complexity
The graphical user interface
Treatment of node labels
Node labels are necessary to comprehend network structures shown graphically. Displaying labels appropriately is not trivial: in drawings of large biochemical networks, room for labels is limited, so careless label placement usually adds visual complexity. Showing all labels simultaneously is usually unsatisfactory because labels overlap one another and the nodes. Barsky et al. [9] use a greedy method to select as many labels as possible to display without overlaps, with the advantage that more labels are shown at higher zoom levels.
In this work we use three kinds of labels to make the desired node information readable without adding much visual complexity. The first kind is engraved labels, which are shown within the node symbols when there is enough space. The second kind is floating labels: a floating label is automatically shown when the mouse pointer hovers over a node and disappears when the pointer moves away. The third kind is mandatory labels, which are shown statically for right-clicked nodes and stay displayed until the zoom level is changed or the "clear labels" button is pressed.
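The display rules for the three label kinds can be modeled as a small piece of state. The sketch below is a hypothetical Python illustration, not the Java/JGraph GUI code: it assumes a label is the node name, estimates text width from an assumed average character width, and lets an engraved label take precedence when it fits.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LabelState:
    """Hypothetical model of engraved, floating, and mandatory labels."""
    char_width: float = 7.0                        # assumed average glyph width (px)
    pinned: Set[str] = field(default_factory=set)  # right-clicked nodes
    hovered: Optional[str] = None                  # node under the mouse pointer

    def right_click(self, node: str) -> None:
        self.pinned.add(node)                      # mandatory label appears

    def on_zoom_changed(self) -> None:
        self.pinned.clear()                        # mandatory labels vanish on zoom

    def clear_labels(self) -> None:
        self.pinned.clear()                        # ... or on "clear labels"

    def visible_labels(self, node_widths: Dict[str, float]) -> Dict[str, str]:
        """Map each node with a visible label to the kind of label shown."""
        shown: Dict[str, str] = {}
        for name, width in node_widths.items():
            if len(name) * self.char_width <= width:
                shown[name] = "engraved"           # text fits inside the symbol
            elif name in self.pinned:
                shown[name] = "mandatory"
            elif name == self.hovered:
                shown[name] = "floating"
        return shown


state = LabelState()
widths = {"ATP": 30.0, "glucose-6-phosphate": 30.0, "NADH": 10.0}
state.hovered = "NADH"
state.right_click("glucose-6-phosphate")
print(state.visible_labels(widths))
# {'ATP': 'engraved', 'glucose-6-phosphate': 'mandatory', 'NADH': 'floating'}
```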
Results
For maximal computation speed, the layout algorithm was implemented in C++ and compiled into a .mexw32 file to work in MATLAB. The GUI for displaying layout results and controlling drawing details was written in Java based on the JGraph library. With the help of a few auxiliary MATLAB programs, all executables can be used seamlessly in conjunction with MATLAB, providing users with a convenient way to visually analyze complex networks.
Network data and example layouts
Network analysis with the help of LucidDraw
Discussion
A good layout algorithm depends on two factors: a proper cost function and an efficient optimization method. LucidDraw adopts a cost function similar to that of our previous work [7] but a new optimization procedure with much higher efficiency. Because the search area of every node is drastically reduced, the neighborhood-test method greatly lowers the computational cost. To fully optimize the cost function, the reoptimization-after-perturbation strategy forces the layout to escape from the current local minimum and search for better layouts. Despite its simplicity, the perturbation strategy performs rather well compared with more sophisticated heuristics such as simulated annealing. The technique has also been employed in other discrete global optimization problems [22, 23]. Together with the neighborhood-test approach, the technique speeds up the layout process dramatically and makes it possible for LucidDraw to serve as an instant visualization tool in a wide range of network analysis tasks. The effect of the optimization strategy is substantial: for a network with 677 nodes, our new algorithm takes ~30 s to generate an acceptable layout, whereas our previous algorithm [7] needs more than 3 h of CPU time and a large amount (~1 GB) of memory. Another available grid layout tool, Cerebral [3], can produce a layered layout in ~3 min, provided that all nodes of the entire network have already been divided into appropriate groups and the order of the layers has been specified in advance by the user.
To make large networks easier to explore, LucidDraw provides a comprehensive solution that helps users obtain node information conveniently through three kinds of labels. By comparison, other network modeling tools offer fewer choices for displaying node labels. For instance, Cytoscape [10], VisANT [12], and YANAsquare [24] use two labeling methods (engraved and floating), and VANTED [25] uses only engraved labels.
In LucidDraw, we designed more flexible weight matrices and, based on extensive experiments, provide three elaborate schemes for evaluating the weight matrix. Compared with previous work implemented within network modeling software, LucidDraw also offers the flexibility to make customized drawings for visual network analysis, aided by the powerful numerical capabilities of MATLAB.
LucidDraw does not depend on predefined module information to produce layouts in which nodes belonging to the same module are located close to each other (Figures 6(A-C)). This does not preclude the use of module data; on the contrary, such data are easy to incorporate by modifying the weights so that nodes are distributed with the desired positional propensities (Figure 6(D)).
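How exactly the weights are modified is not spelled out here. As one hypothetical scheme, assuming the cost grows with $w_{ij} d_{ij}$ so that a larger weight pulls two nodes closer, one can add a bonus to every pair of nodes sharing a module label. The function name, the `bonus` value, and the use of `None` for unassigned nodes are illustrative.

```python
from typing import List, Optional, Sequence


def add_module_attraction(W: Sequence[Sequence[float]],
                          modules: Sequence[Optional[str]],
                          bonus: float = 1.0) -> List[List[float]]:
    """Return a copy of W with `bonus` added to w_ij for same-module pairs.

    modules[i] is the module label of node i, or None if unknown.
    """
    n = len(W)
    W2 = [list(row) for row in W]            # leave the input matrix untouched
    for i in range(n):
        for j in range(n):
            same = modules[i] is not None and modules[i] == modules[j]
            if i != j and same:
                W2[i][j] += bonus            # stays symmetric: the test is symmetric
    return W2


W = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
print(add_module_attraction(W, ["glycolysis", "glycolysis", None], bonus=0.5))
# [[0, 1.5, 0], [1.5, 0, 0], [0, 0, 0]]
```

Nodes without module data keep their original weights, so the topology alone still governs their placement.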
It should be noted that some network modeling tools, such as Cytoscape [10] and VANTED [25], provide grid-based visualizations, but their underlying layout methods are clearly different from ours. For a comparison, please refer to Additional file 1. A remaining issue of LucidDraw is edge-node crossings, which occur occasionally but do confuse the relations between a few nodes. To relieve this problem, Miyano and coworkers introduced penalty terms into the cost function [4, 5], at the expense of higher computational complexity. Another feasible option is to use curved edges [3]. Note that a thorough solution to the edge-node crossing problem must take node sizes into account, which is a future direction of this work.
Conclusions
We present a MATLAB tool, LucidDraw, to meet the need for convenient visualization of complex biochemical networks. The tool is fully accessible within MATLAB and capable of drawing typical networks in seconds, with appropriately separated modules in a compact space. Users can control layout styles and drawing details, and can also incorporate extra biological attributes, to obtain well-customized drawings.
Availability and Requirements

Project name: LucidDraw

Project home page: http://bioinf.jiangnan.edu.cn

Operating system(s): Windows (32-bit version)

Programming language: Java, C++

Other requirements: MATLAB 7.5 (32-bit version), Java 1.6

License: Free for non-commercial use.
The LucidDraw programs and sample data are given in Additional file 2. A demonstration video is provided in Additional file 3. The latest software and more example networks can be found at http://bioinf.jiangnan.edu.cn.
Declarations
Acknowledgements
We acknowledge the support of the State High-Tech Development Program of China (No. 2006AA020204).
Authors’ Affiliations
References
[1] Hashimoto T, Nagasaki M, Kojima K, Miyano S: BFL: a node and edge betweenness based fast layout algorithm for large scale networks. BMC Bioinformatics 2009, 10:19. doi:10.1186/1471-2105-10-19
[2] Li W, Kurata H: Visualizing Global Properties of Large Complex Networks. PLoS ONE 2008, 3(7):e2541. doi:10.1371/journal.pone.0002541
[3] Barsky A, Gardy JL, Hancock REW, Munzner T: Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics 2007, 23(8):1040–1042. doi:10.1093/bioinformatics/btm057
[4] Kato M, Nagasaki M, Doi A, Miyano S: Automatic drawing of biological networks using cross cost and subcomponent data. Genome Inform 2005, 16(2):22–31.
[5] Kojima K, Nagasaki M, Jeong E, Kato M, Miyano S: An efficient grid layout algorithm for biological networks utilizing various biological attributes. BMC Bioinformatics 2007, 8:76. doi:10.1186/1471-2105-8-76
[6] Kojima K, Nagasaki M, Miyano S: Fast grid layout algorithm for biological networks with sweep calculation. Bioinformatics 2008, 24(12):1433–1441. doi:10.1093/bioinformatics/btn196
[7] Li W, Kurata H: A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics 2005, 21(9):2036–2042. doi:10.1093/bioinformatics/bti290
[8] Suderman M, Hallett M: Tools for visually exploring biological networks. Bioinformatics 2007, 23(20):2651–2659. doi:10.1093/bioinformatics/btm401
[9] Barsky A, Munzner T, Gardy J, Kincaid R: Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. IEEE Transactions on Visualization and Computer Graphics 2008, 14(6):1253–1260. doi:10.1109/TVCG.2008.117
[10] Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research 2003, 13(11):2498–2504. doi:10.1101/gr.1239303
[11] Demir E, Babur O, Dogrusoz U, Gursoy A, Nisanci G, Cetin-Atalay R, Ozturk M: PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics 2002, 18(7):996–1003. doi:10.1093/bioinformatics/18.7.996
[12] Hu Z, Mellor J, Wu J, DeLisi C: VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics 2004, 5:17. doi:10.1186/1471-2105-5-17
[13] Nagasaki M, Doi A, Matsuno H, Miyano S: Genomic Object Net: I. A platform for modelling and simulating biopathways. Applied Bioinformatics 2003, 2(3):181–184.
[14] Doi A, Nagasaki M, Fujita S, Matsuno H, Miyano S: Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension. Applied Bioinformatics 2003, 2(3):185–188.
[15] Kurata H, Masaki K, Sumida Y, Iwasaki R: CADLIVE dynamic simulator: Direct link of biochemical networks to dynamic models. Genome Research 2005, 15(4):590–600. doi:10.1101/gr.3463705
[16] Kurata H, Matoba N, Shimizu N: CADLIVE for constructing a large-scale biochemical network based on a simulation-directed notation and its application to yeast cell cycle. Nucleic Acids Research 2003, 31(14):4071–4084. doi:10.1093/nar/gkg461
[17] Oberhardt MA, Puchalka J, Fryer KE, Martins dos Santos VAP, Papin JA: Genome-Scale Metabolic Network Analysis of the Opportunistic Pathogen Pseudomonas aeruginosa PAO1. Journal of Bacteriology 2008, 190(8):2790–2803. doi:10.1128/JB.01583-07
[18] Holme P, Huss M, Jeong H: Subnetwork hierarchies of biochemical pathways. Bioinformatics 2003, 19(4):532–538. doi:10.1093/bioinformatics/btg033
[19] Barry CE: Interpreting cell wall 'virulence factors' of Mycobacterium tuberculosis. Trends in Microbiology 2001, 9(5):237–241. doi:10.1016/S0966-842X(01)02018-2
[20] Bhave DP, Muse III WB, Carroll KS: Drug Targets in Mycobacterial Sulfur Metabolism. Infect Disord Drug Targets 2007, 7(2):140–158. doi:10.2174/187152607781001772
[21] Jain M, Petzold CJ, Schelle MW, Leavell MD, Mougous JD, Bertozzi CR, Leary JA, Cox JS: Lipidomics reveals control of Mycobacterium tuberculosis virulence lipids via metabolic coupling. PNAS 2007, 104(12):5133–5138. doi:10.1073/pnas.0610634104
[22] Zhipeng L, Jin-Kao H: A Critical Element-Guided Perturbation Strategy for Iterated Local Search. In Proceedings of the 9th European Conference on Evolutionary Computation in Combinatorial Optimization. Tübingen, Germany: Springer-Verlag; 2009:1–12.
[23] Mei J, He S, Shi G, Wang Z, Li W: Revealing network communities through modularity maximization by a contraction-dilation method. New Journal of Physics 2009, 11:043025. doi:10.1088/1367-2630/11/4/043025
[24] Schwarz R, Liang C, Kaleta C, Kuhnel M, Hoffmann E, Kuznetsov S, Hecker M, Griffiths G, Schuster S, Dandekar T: Integrated network reconstruction, visualization and analysis using YANAsquare. BMC Bioinformatics 2007, 8:313. doi:10.1186/1471-2105-8-313
[25] Junker B, Klukas C, Schreiber F: VANTED: A system for advanced data analysis and visualization in the context of biological networks. BMC Bioinformatics 2006, 7:109. doi:10.1186/1471-2105-7-109
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.