SpectralNET – an application for spectral graph analysis and visualization
© Forman et al; licensee BioMed Central Ltd. 2005
Received: 01 June 2005
Accepted: 19 October 2005
Published: 19 October 2005
Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that can carry different weights. SpectralNET is a flexible application for analyzing and visualizing these biological and chemical networks.
Available both as a standalone .NET executable and as an ASP.NET web application, SpectralNET was designed specifically with the analysis of graph-theoretic metrics in mind, a computational task not easily accessible using currently available applications. Users can choose either to upload a network for analysis using a variety of input formats, or to have SpectralNET generate an idealized random network for comparison to a real-world dataset. Whichever graph-generation method is used, SpectralNET displays detailed information about each connected component of the graph, including graphs of degree distribution, clustering coefficient by degree, and average distance by degree. In addition, extensive information about the selected vertex is shown, including degree, clustering coefficient, various distance metrics, and the corresponding components of the adjacency, Laplacian, and normalized Laplacian eigenvectors. SpectralNET also displays several graph visualizations, including a linear dimensionality reduction for uploaded datasets (Principal Components Analysis) and a non-linear dimensionality reduction that provides an elegant view of global graph structure (Laplacian eigenvectors).
SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. SpectralNET is publicly available as both a .NET application and an ASP.NET web application from http://chembank.broad.harvard.edu/resources/. Source code is available upon request.
The field of graph theory concerns itself with the formal study of graphs – structures containing vertices and edges linking these vertices. Scientifically, graphs can be used to represent networks embodying many different relationships among data, including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as nodes (vertices) and interactions (edges) that can carry different weights.
Graph-theoretic metrics, including eigenspectra, have been used to analyze diverse sets of data in the fields of computational chemistry and bioinformatics. Protein-protein interaction networks in Saccharomyces cerevisiae, for example, have been shown to exhibit scale-free properties, and databases of mRNAs can be mined using spectral properties of graphs created by their secondary structure. Graph theory has also been used in conjunction with combinations of small-molecule probes to derive signatures of biological states using chemical-genomic profiling.
Despite the widespread use of graph theory in these fields, however, there are few user-friendly tools for analyzing network properties. SpectralNET is a graphical application that calculates a wide variety of graph-theoretic metrics, including eigenvalues and eigenvectors of the adjacency matrix (a simple matrix representation of the nodes and edges of a graph), Laplacian matrix, and normalized Laplacian matrix, for networks that are either randomly generated or uploaded by the user. SpectralNET is available both as an ASP.NET web application and as a standalone .NET executable. While SpectralNET was originally written to analyze chemical genetic assay data, it should be of use to any researcher interested in graph-theoretic metrics and eigenspectra.
SpectralNET was originally written as an ASP.NET application in C# and was subsequently ported to a standalone .NET executable (also written in C#). ASP.NET was originally chosen because it provided a fast, easy way to offer users a thin client. This removes the need for the substantial client-side computational power that large matrix calculations often require. A standalone version was created for three primary reasons: it avoids the time-outs inherent in a web interface (a potential issue for long-duration calculations), it is more easily distributable, and porting from ASP.NET to a .NET executable is relatively simple.
Many computations are performed directly in C#, such as graph instantiation and metric calculation. Matrix computations (including eigendecomposition) are performed using the NMath Suite (CenterSpace Software, Corvallis, Oregon). Because the NMath Suite is a commercially licensed library, those receiving source code from the authors must supply their own means of performing matrix eigendecomposition in order to modify and redeploy the application. The implementation of the Fermi-Dirac integral, used in the calculation of spectral density, is ported from Michele Goano's implementation in FORTRAN (Goano, 1995). Because SpectralNET uses a third-party library for matrix calculations that is partially implemented using Managed Extensions for C++, SpectralNET will not be portable to Linux until the Mono implementation of this C++ language feature is complete.
Results and discussion
Idealized random networks can be generated automatically by the application, or networks can be uploaded by the user for analysis. SpectralNET can automatically generate random Erdős-Rényi graphs, Barabási-Albert (scale-free) graphs, re-wiring Barabási-Albert graphs, Watts-Strogatz (small-world) graphs, or hierarchical graphs. Each automatically generated graph type is customizable with algorithmic parameters. SpectralNET was designed with extensibility in mind. Users can request additional random graph types by submitting a succinct algorithm to the author, or they can implement their own.
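The paper does not show SpectralNET's C# generator code. As an illustration only, the following Python sketch implements the Barabási-Albert preferential-attachment rule. Several details are our assumptions rather than SpectralNET's actual interface: the hypothetical function name `barabasi_albert`, the use of a complete graph on m + 1 vertices as the seed network, and the edge-set return type.

```python
import random
from typing import List, Optional, Set, Tuple


def barabasi_albert(n: int, m: int, seed: Optional[int] = None) -> Set[Tuple[int, int]]:
    """Grow an undirected graph by linear preferential attachment.

    Hypothetical helper: starts from a complete graph on m + 1 vertices, then
    attaches each new vertex to m distinct existing vertices chosen with
    probability proportional to their current degree.
    """
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges: Set[Tuple[int, int]] = set()
    # Each vertex appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over vertices.
    endpoints: List[int] = []
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.add((u, v))
            endpoints.extend((u, v))
    for new in range(m + 1, n):
        targets: Set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.add((t, new))
            endpoints.extend((t, new))
    return edges
```

For example, `barabasi_albert(1000, 2, seed=42)` returns 2,995 edges. That is m(m + 1)/2 edges for the seed clique plus m edges for each of the n - m - 1 added vertices.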
Networks can be uploaded by the user in the form of a Pajek file or a tab-delimited text file with one edge per line (see additional file 1: HumanPPI_nodenodeweight.txt for an example network definition file defining a network of human protein-protein interactions). Raw data files can also be uploaded to the application, where each line of data is represented as a labeled vertex. Vertices can be connected with edge weights equal to the square of the correlation of their associated input data, or according to their Euclidean distance as defined by the Eigenmap algorithm. If raw data is uploaded by the user, principal component analysis (PCA) [14, 15] can optionally be performed on the data before calculating edge weights.
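The correlation-based weighting described above can be sketched as follows. The sketch assumes Pearson correlation; the text says only "correlation". The hypothetical helpers `pearson` and `correlation_edges`, the rule that constant rows get zero correlation, and the optional `min_weight` cut-off are also illustrative assumptions, not SpectralNET's documented behavior.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant row is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(rows: List[Sequence[float]], labels: List[str],
                      min_weight: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Connect every pair of labeled rows with weight r^2, keeping weights above min_weight."""
    edges: Dict[Tuple[str, str], float] = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            w = pearson(rows[i], rows[j]) ** 2
            if w > min_weight:
                edges[(labels[i], labels[j])] = w
    return edges
```

Squaring the correlation makes perfectly anti-correlated rows as strongly connected as perfectly correlated ones, because both give a weight of 1.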
After processing the input network, SpectralNET displays a wide variety of graph-analytic metrics. For example, the degree and clustering coefficient are displayed for each vertex. The degree of a vertex is the number of edges incident upon that vertex; for weighted graphs, SpectralNET calculates it as the sum of those edges' weights. The clustering coefficient of a node is the fraction of possible edges among its neighbors that are actually present, and is calculated for a node $i$ as:

$$C_i = \frac{2 n_i}{k_i (k_i - 1)}$$

where $n_i$ denotes the number of edges connecting neighbors of node $i$ to each other, and $k_i$ denotes the number of neighbors of node $i$. In addition, the minimum, average, and maximum distances of each vertex are displayed, defined as the shortest, average, and maximum distances, respectively, from the node to any other node in the graph. The components of the adjacency, Laplacian, and normalized Laplacian eigenvectors corresponding to the vertex are also shown, where the adjacency matrix is defined as the matrix $A$ with the following elements:

$$A_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$
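the Laplacian matrix is defined as the matrix $L$ with the following elements:

$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -1 & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$

and the normalized Laplacian matrix is defined as the matrix $\mathcal{L}$ with the following elements:

$$\mathcal{L}_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } d_i \neq 0 \\ -\frac{1}{\sqrt{d_i d_j}} & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$

where $d_i$ denotes the degree of node $i$. Note that Chung defines the Laplacian matrix as the normalized form above, but we use the more commonly found definition (see, for example, Mohar).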
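The per-vertex metrics above can be sketched in Python for an unweighted graph stored as adjacency sets. This is an assumption-laden illustration, not SpectralNET's code:

- The helper names `clustering_coefficient` and `distance_stats` are hypothetical.
- Distances are hop counts found by breadth-first search, so weighted shortest paths are not covered.
- The convention that a node with fewer than two neighbors has $C_i = 0$ is our choice.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # unweighted, undirected adjacency sets


def clustering_coefficient(adj: Graph, i: int) -> float:
    """C_i = 2 n_i / (k_i (k_i - 1)); defined here as 0 when node i has fewer than two neighbors."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # n_i: edges among the neighbors of i, each unordered pair counted once
    n = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * n / (k * (k - 1))


def distance_stats(adj: Graph, source: int) -> Tuple[int, float, int]:
    """(min, mean, max) hop distance from source to every other vertex it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields shortest hop counts
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != source]
    if not others:
        return 0, 0.0, 0
    return min(others), sum(others) / len(others), max(others)
```

Because the search only reaches vertices in the same connected component, the distances are naturally restricted to the active component.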
Many large networks derived from biological data are composed of multiple subgraphs that are not connected to one another. SpectralNET therefore computes many properties based on the selected or "active" connected component. For the active connected component, its size and average diameter are displayed, in addition to graphs of degree distribution, clustering coefficient by degree, and average distance by degree. Graphs of eigenvalues, eigenvectors, inverse participation ratios, and spectral densities of the three matrix types are also displayed. The inverse participation ratio is defined for each eigenvector as:

$$I_j = \sum_{k=1}^{N} \left(e_{j,k}\right)^4$$

where $e_{j,k}$ is the $k$th component of the $j$th normalized eigenvector and $N$ is the number of vertices. The spectral density is defined as:

$$\rho(\lambda) = \frac{1}{N} \sum_{j=1}^{N} \delta(\lambda - \lambda_j)$$

where $\lambda_j$ is the $j$th eigenvalue and $\delta$ represents the delta function, implemented as described above. Most graphs can be clicked to select the vertex corresponding to a desired data point, and eigenvalue graphs can be sorted by value or by vertex degree. All calculated graph metrics can be exported as a tab-delimited text file for further analysis.
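A minimal NumPy sketch of these spectral quantities is shown below. It assumes a symmetric adjacency matrix without self-loops. The function names are hypothetical.

One substitution is made deliberately. SpectralNET's own approximation of the delta function is not reproduced here. Instead, the sketch approximates $\rho(\lambda)$ with a normalized histogram, a different and simpler technique, so that each delta is spread over one bin. For weighted graphs, the normalized Laplacian off-diagonal entries become $-w_{ij}/\sqrt{d_i d_j}$.

```python
import numpy as np


def graph_matrices(A):
    """Adjacency, Laplacian (D - A) and Chung's normalized Laplacian for a symmetric A without self-loops."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)  # (weighted) degrees
    L = np.diag(d) - A
    with np.errstate(divide="ignore"):
        inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)  # isolated vertices get 0
    norm_L = np.diag((d > 0).astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return {"adjacency": A, "laplacian": L, "normalized_laplacian": norm_L}


def inverse_participation_ratios(M):
    """Eigenvalues (ascending) and I_j = sum_k e_{j,k}^4 for each unit-norm eigenvector."""
    vals, vecs = np.linalg.eigh(M)  # eigenvectors are the columns of vecs
    return vals, np.sum(vecs ** 4, axis=0)


def spectral_density(vals, bins=20):
    """Histogram estimate of rho(lambda): returns (density, bin_edges), integrating to 1."""
    return np.histogram(vals, bins=bins, density=True)
```

The inverse participation ratio ranges from $1/N$ for an eigenvector spread evenly over all vertices to 1 for an eigenvector concentrated on a single vertex. This range is why it is used to detect localized eigenvectors.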
Visualization and dimensionality reduction
The main graph display window of SpectralNET offers two interactive graphical network displays that support zooming and allow vertex selection by mouse-click. The default view shows the graph laid out by the Fruchterman-Reingold algorithm, which positions vertices by force-directed placement. The other available display is the network's Laplacian embedding, which locates vertices in two-dimensional Euclidean space using their components of the second and third Laplacian eigenvectors. The first Laplacian eigenvector, with eigenvalue zero, is constant on a connected component and therefore carries no positional information. Exporting the remaining Laplacian eigenvector components allows visualization in higher dimensions.
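The Laplacian embedding described above can be sketched in a few lines of NumPy. The sketch assumes an unweighted or weighted symmetric adjacency matrix for a single connected component. The name `laplacian_embedding` is hypothetical.

```python
import numpy as np


def laplacian_embedding(A, dims=2):
    """Coordinates from Laplacian eigenvectors 2..dims+1 of a connected graph."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Column 0 has eigenvalue 0 and is constant on a connected graph, so skip it.
    return vecs[:, 1:dims + 1]
```

For a cycle graph this places every vertex at the same distance from the origin, recovering the ring. Calling it with `dims=3` gives the kind of three-dimensional coordinates that could be exported for VRML viewing.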
In conjunction with uploaded raw data, Laplacian embedding allows the user to see a reduced-dimensionality view of high-dimensionality input, once this input is converted into a network. If the user chooses to process input data using the Eigenmap algorithm, Laplacian embedding shows the reduced-dimensionality result. Dimensionality reduction has proven to be a useful tool in computational chemistry and bioinformatics; for example, Agrafiotis used multidimensional scaling (MDS) to reduce the dimensionality of combinatorial library descriptors, and Lin used PCA to analyze single nucleotide polymorphisms from genomic data. We chose to implement Laplacian embedding rather than MDS or other algorithms in SpectralNET because of promising results in the field of machine learning. Although dimensionality reduction is especially useful for analyzing high-dimensional data, Laplacian embedding is an elegant display choice for any input network (see the next section for an example using a scale-free biological network). For a simpler (linear) dimensionality-reduced view of the input data, SpectralNET also has the option of viewing the results of PCA (though this view is not available when a network definition file, such as a Pajek file, is used). Both Laplacian embedding and PCA can be viewed in three dimensions with a Virtual Reality Modeling Language (VRML) viewer.
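For comparison with the nonlinear embedding, here is a minimal PCA sketch computed via the singular value decomposition of mean-centered data. The name `pca` is hypothetical. The sketch deliberately does not rescale columns; descriptors measured in very different units might call for standardization first, which is an assumption we leave to the user.

```python
import numpy as np


def pca(X, n_components=2):
    """Project mean-centered rows of X onto their leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column (descriptor)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # coordinates in the principal-component basis
    explained = S ** 2 / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]
```

Because PCA is a linear projection, it cannot "unroll" curved structure in the data, whereas the Laplacian embedding can.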
Example analysis of a randomly-generated small-world network and a biological scale-free network
SpectralNET provides an easy-to-use interface for creating a randomly generated small-world network. The user supplies only three values: the number of nodes, the number of neighbors to which each node is connected, and the probability with which each edge is rewired. For this example, we create a network with 300 nodes in which each node is connected to four neighbors and edges are rewired with 4% probability.
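The three parameters map directly onto the Watts-Strogatz construction. The following Python sketch is illustrative and is not SpectralNET's implementation. It assumes the classic procedure: start from a ring lattice, then move one endpoint of each lattice edge with probability p. It also assumes that self-loops and duplicate edges are avoided. The name `watts_strogatz` is hypothetical.

```python
import random
from typing import Dict, Optional, Set


def watts_strogatz(n: int, k: int, p: float, seed: Optional[int] = None) -> Dict[int, Set[int]]:
    """Ring lattice of n vertices, each joined to k nearest neighbors, with rewiring probability p."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    # Ring lattice: connect each vertex to k/2 neighbors on each side.
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    # Rewire the far endpoint of each lattice edge (v, v + j) with probability p.
    for j in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj
```

The example network would be `watts_strogatz(300, 4, 0.04, seed=1)`. Rewiring preserves the edge count, so the result always has 300 × 4 / 2 = 600 edges.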
Dimensionality reduction of a real-world chemical dataset to analyze QSAR
In addition to performing spectral analysis of networks, SpectralNET can also perform dimensionality reduction on chemical datasets to analyze quantitative structure-activity relationships (QSAR). In this example, we upload a set of chemical descriptor data into SpectralNET and analyze it using the Laplacian Eigenmap algorithm originally developed by Belkin and Niyogi. Each row of the input file corresponds to one small molecule, and all of the molecules were created by the same diversity-oriented synthesis pathway. Each column represents a different molecular descriptor: a metric that captures one aspect of the compound, such as volume, surface area, or number of rings.
The Laplacian Eigenmap algorithm in SpectralNET connects these small molecules to their K-nearest neighbors (measured by Euclidean distance), where K is an algorithmic parameter supplied by the user. In this example, we choose K = 7 to yield a reasonable number of edges in the resulting graph. Weights are assigned to each edge in one of two ways – every edge can have a weight of one, or weights can be assigned to edges by the following formula:
$$W_{ij} = e^{-\|x_i - x_j\|^2 / t}$$

where $W_{ij}$ represents the weight of the edge connecting vertices $i$ and $j$, $x_i$ and $x_j$ are the corresponding input data vectors, and $t$ is an algorithmic parameter. For the molecular descriptor dataset, edge weights of one were chosen. When the second method was applied to this dataset, results converged to those obtained with unit weights as $t$ increased, at around $t$ = 20,000. This is expected, because every heat-kernel weight approaches one as $t$ grows. SpectralNET also offers the option of performing PCA on the input data before running the Laplacian Eigenmap algorithm. This step is enabled by default and was left enabled for this example.
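The Eigenmap steps described above can be sketched in NumPy, following Belkin and Niyogi's formulation:

1. Build a K-nearest-neighbor graph on Euclidean distance.
2. Use either unit or heat-kernel weights.
3. Solve the generalized eigenproblem $L y = \lambda D y$.

This is a sketch, not SpectralNET's code. The name `laplacian_eigenmap` is hypothetical. Keeping an edge when either point lists the other as a neighbor is our symmetrization choice. The optional PCA pre-processing step is omitted, and the sketch assumes the neighborhood graph is connected.

```python
import numpy as np


def laplacian_eigenmap(X, k=7, t=None, dims=2):
    """Embed rows of X with a k-nearest-neighbor graph and Laplacian eigenvectors.

    t=None gives unit edge weights; otherwise W_ij = exp(-||x_i - x_j||^2 / t).
    Assumes the resulting neighborhood graph is connected.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared Euclidean distances
    W = np.zeros((n, n))
    for i in range(n):
        dist = sq[i].copy()
        dist[i] = np.inf  # never pick a point as its own neighbor
        nearest = np.argsort(dist)[:k]
        W[i, nearest] = 1.0 if t is None else np.exp(-sq[i, nearest] / t)
    W = np.maximum(W, W.T)  # keep edge i-j if either point lists the other
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # Generalized problem L y = lambda D y, solved via D^{-1/2} L D^{-1/2} v = lambda v, y = D^{-1/2} v
    inv_sqrt = 1.0 / np.sqrt(d)
    _, vecs = np.linalg.eigh(inv_sqrt[:, None] * L * inv_sqrt[None, :])
    Y = inv_sqrt[:, None] * vecs
    return Y[:, 1:dims + 1]  # drop the trivial solution with eigenvalue 0
```

With `k=7` and `t=None` this mirrors the settings chosen for the descriptor example, namely K = 7 and unit weights. Passing a very large `t` makes every heat-kernel weight close to one, which is consistent with the convergence noted above.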
SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. The software allows users to analyze idealized random networks or uploaded real-world datasets, and exposes metrics like the clustering coefficient, average distance, and degree distribution in an easy-to-use graphical manner. In addition, SpectralNET calculates and plots eigenspectra for three important matrices related to the network and provides several powerful graph visualizations.
SpectralNET is available as both a standalone .NET executable and an ASP.NET web application. Source code is available by request from the author.
Availability and requirements
Project name: SpectralNET
Project home page: http://chembank.broad.harvard.edu/resources/
Operating system(s): Windows
Programming language: C#
Other requirements: The .NET framework v1.1 or higher
License: The SpectralNET software is provided "as is" with no guarantee or warranty of any kind. SpectralNET is freely redistributable in binary format for all non-commercial use. Source code is available to non-commercial users upon request to the primary author. Any other use of the software requires special permission from the primary author.
Any restriction to use by non-academics: Contact authors
We gratefully acknowledge the Broad Institute of Harvard University and MIT, the National Cancer Institute (Initiative for Chemical Genetics), and the National Institute of General Medical Sciences (Center of Excellence for Chemical Methodology and Library Development) for support of this research. S.L.S. is an Investigator at the Howard Hughes Medical Institute.
- Eisenberg E, Levanon E: Preferential attachment in the protein network evolution. Phys Rev Lett 2003, 91: 138701. 10.1103/PhysRevLett.91.138701
- Fera D, Kim N, Shiddelfrim N, Zorn J, Laserson U, Gan HH, Schlick T: RAG: RNA-As-Graphs web resource. BMC Bioinformatics 2004, 5: 88. 10.1186/1471-2105-5-88
- Haggarty S, Clemons P, Schreiber S: Chemical genomic profiling of biological networks using graph theory and combinations of small molecule perturbations. J Am Chem Soc 2003, 125: 10543–10545. 10.1021/ja035413p
- Chartrand G: Introductory Graph Theory. New York: Dover; 1985.
- Chung F: Spectral Graph Theory. Providence: American Mathematical Society; 1997.
- Goano M: Algorithm 745: Computation of the complete and incomplete Fermi-Dirac integral. ACM Trans Math Software 1995, 21: 221–232. 10.1145/210089.210090
- Erdős P, Rényi A: On random graphs I. Publ Math Debrecen 1959, 6: 290–297.
- Barabási AL, Albert R: Emergence of scaling in random networks. Science 1999, 286: 509–512.
- Albert R, Barabási AL: Topology of evolving networks: Local events and universality. Phys Rev Lett 2000, 85: 5234–5237. 10.1103/PhysRevLett.85.5234
- Watts D, Strogatz S: Collective dynamics of 'small-world' networks. Nature 1998, 393: 440–442. 10.1038/30918
- Barabási AL, Dezso Z, Ravasz E, Yook SH, Oltvai Z: Scale-free and hierarchical structures in complex networks. Modeling of Complex Systems 2003, 661: 1–16.
- Batagelj V, Mrvar A: PAJEK – program for large network analysis. Connections 1998, 21: 47–57.
- Belkin M, Niyogi P: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 2003, 15: 1373–1396. 10.1162/089976603321780317
- Hotelling H: Analysis of a complex of statistical variables into principal components. J Educ Psychol 1933, 24: 417–441.
- Lin Z, Altman R: Finding haplotype tagging SNPs by use of principal components analysis. Am J Hum Genet 2004, 75: 850–861. 10.1086/425587
- Nacher JC, Ueda N, Yamada T, Kanehisa M, Akutsu T: Clustering under the line graph transformation: application to reaction network. BMC Bioinformatics 2004, 5: 207. 10.1186/1471-2105-5-207
- Mohar B: Some applications of Laplace eigenvalues of graphs. In Graph Symmetry: Algebraic Methods and Applications. Edited by: Hahn G, Sabidussi G. Dordrecht: Kluwer Academic Publishers; 1997:225–275.
- Barabási AL, Ravasz E, Vicsek T: Deterministic scale-free networks. Physica A 2001, 299: 559–564. 10.1016/S0378-4371(01)00369-7
- Chung F: The average distance and the independence number. J Graph Theory 1988, 12: 229–235.
- Farkas I, Derenyi I, Barabási AL, Vicsek T: Spectra of "real-world" graphs: Beyond the semicircle law. Phys Rev E 2001, 64: 026704. 10.1103/PhysRevE.64.026704
- Fruchterman T, Reingold E: Graph drawing by force-directed placement. Software: Practice and Experience 1991, 21: 1129–1164.
- Agrafiotis D, Lobanov V: Multidimensional scaling of combinatorial libraries without explicit enumeration. J Comput Chem 2001, 22: 1712–1722. 10.1002/jcc.1126
- Belkin M, Niyogi P: Semi-supervised learning on Riemannian manifolds. Machine Learning 2004, 56: 209–239. 10.1023/B:MACH.0000033120.25363.1e
- Comellas F, Sampels M: Deterministic small-world networks. Physica A 2002, 309: 231–235. 10.1016/S0378-4371(02)00741-0
- Stavenger R, Schreiber S: Asymmetric Catalysis in Diversity-Oriented Organic Synthesis: Enantioselective Synthesis of 4320 Encoded and Spatially Segregated Dihydropyrancarboxamides. Angew Chem Intl Ed 2001, 40: 3417–3421. 10.1002/1521-3773(20010917)40:18<3417::AID-ANIE3417>3.0.CO;2-E
- Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Mark P, Stümpflen V, Mewes H, Ruepp A, Frishman D: The MIPS mammalian protein-protein interaction database. Bioinformatics 2005, 21: 832–834. 10.1093/bioinformatics/bti115
- Kalkhoven E, Valentine J, Heery D, Parker M: Isoforms of steroid receptor co-activator 1 differ in their ability to potentiate transcription by the oestrogen receptor. The EMBO Journal 1998, 17: 232–243. 10.1093/emboj/17.1.232
- Farkas I, Jeong H, Vicsek T, Barabási AL, Oltvai Z: The topology of the transcription regulatory network in the yeast, S. cerevisiae. Physica A 2003, 318: 601–612. 10.1016/S0378-4371(02)01731-4
- Jeong H, Mason S, Barabási AL, Oltvai Z: Lethality and centrality in protein networks. Nature 2001, 411: 41–42. 10.1038/35075138
- Douali L, Villemin D, Cherqaoui D: Neural networks: Accurate nonlinear QSAR model for HEPT derivatives. J Chem Inf Comput Sci 2003, 43: 1200–1207. 10.1021/ci034047q
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.