- Methodology article
- Open Access
Finding local communities in protein networks
© Voevodski et al; licensee BioMed Central Ltd. 2009
Received: 21 April 2009
Accepted: 18 September 2009
Published: 18 September 2009
Protein-protein interactions (PPIs) play fundamental roles in nearly all biological processes, and provide major insights into the inner workings of cells. A vast amount of PPI data for various organisms is available from BioGRID and other sources. The identification of communities in PPI networks is of great interest because they often reveal previously unknown functional ties between proteins. A large number of global clustering algorithms have been applied to protein networks, where the entire network is partitioned into clusters. Here we take a different approach by looking for local communities in PPI networks.
We develop a tool, named Local Protein Community Finder, which quickly finds a community close to a queried protein in any network available from BioGRID or specified by the user. Our tool uses two new local clustering algorithms Nibble and PageRank-Nibble, which look for a good cluster among the most popular destinations of a short random walk from the queried vertex. The quality of a cluster is determined by proportion of outgoing edges, known as conductance, which is a relative measure particularly useful in undersampled networks. We show that the two local clustering algorithms find communities that not only form excellent clusters, but are also likely to be biologically relevant functional components. We compare the performance of Nibble and PageRank-Nibble to other popular and effective graph partitioning algorithms, and show that they find better clusters in the graph. Moreover, Nibble and PageRank-Nibble find communities that are more functionally coherent.
The Local Protein Community Finder, accessible at http://xialab.bu.edu/resources/lpcf, allows the user to quickly find a high-quality community close to a queried protein in any network available from BioGRID or specified by the user. We show that the communities found by our tool form good clusters and are functionally coherent, making our application useful for biologists who wish to investigate functional modules that a particular protein is a part of.
Using the link structure of a network to gain insight into the function of its nodes is a ubiquitous technique in biological, social, and computer networks [1–11]. For example, Kleinberg used the link structure of the Internet to give each node a hub and an authority index , and Brin and Page utilized the structure of the Web, rather than its content, to rank Web pages [10, 11]. Of particular interest is the identification of network communities, also in the context of the Internet [6–8], and social and biological networks [1–5]. Communities are especially relevant in protein-protein interaction (PPI) networks because they often represent protein complexes or other modules with related function.
There are several ways to define a community in a network. One definition is to say that a community is a group of highly interconnected nodes. Finding large cliques in a graph, which are subsets of nodes that are completely connected, is a well-studied problem. It is computationally infeasible for large networks: finding the size of the largest clique in a graph is NP-Complete , and approximating it is hard as well . Thus many heuristic methods have been developed, which look for defective cliques (cliques that are missing some edges), or more generally dense components.
In the context of biological networks, Bu et al. use spectral analysis of the adjacency matrix of a graph to find nodes in cliquish components . They apply this technique to yeast PPI networks to identify imperfect cliques, in order to predict the functions of the unknown proteins that they contain. In addition to using spectral approaches, a simpler strategy for finding defective cliques is to first enumerate maximal cliques using an exact (but time consuming) solution, and then combine them if they have significant overlap .
A different way to characterize a community is to look at it as a cluster: a group of nodes that are highly interconnected among themselves, but sparsely connected to the rest of the network. The quality of a cluster can be measured by the ratio of the number of its outgoing edges to the sum of the degrees of its nodes, known as conductance . For example, a cluster has a conductance of 0 if it is disconnected from the rest of the network, and 1 if there are no edges within the cluster.
In order to find clusters in a network, we can use two kinds of algorithms that differ in whether or not they consider the entire graph. A global clustering algorithm seeks to partition the entire graph into clusters, while a local clustering algorithm only considers a small part of the graph close to a given vertex. Much effort has been devoted to using global partitioning algorithms on protein networks [17–22], but the same cannot be said about local techniques. Local algorithms have an obvious advantage because they are often faster. In addition, they may be more useful if the user is only interested in a certain neighborhood of the network. Even though a local algorithm uses less information, it can utilize a local view of the graph to find a quality cluster .
The objective of this work is to develop a tool (Local Protein Community Finder) for finding high-quality communities near a queried protein in a PPI network. Here we measure the quality of a community by its conductance, and the functional coherence of its proteins. We expect clusters of interacting proteins to be functional modules, yet this may not always be the case. Therefore, we need to validate the biological significance of the found communities. To evaluate functional coherence, we consider how functionally related the proteins inside the cluster are to each other, and to the other proteins in the network. In order to determine the functional distance between a pair of proteins, we use a measure derived from annotation based on biological literature .
The Local Protein Community Finder uses a recently developed local clustering algorithm called Nibble , and its extension PageRank-Nibble . Both of them start from a single vertex, and look for a cluster of good conductance in its neighborhood. To our knowledge, our study is the first application of non-heuristic local clustering algorithms to protein networks. We compare Nibble and PageRank-Nibble to Metis, a popular and effective graph partitioning algorithm , and a common spectral clustering implementation . Metis partitions the entire network into mutually disjoint balanced clusters, keeping the edge cut, which is the set of edges with endpoints in different clusters, as small as possible. The intuition behind spectral clustering is described in [27–34], and Metis works better than commonly used spectral clustering methods in terms of the size of the resulting edge cut . We show that among the algorithms considered, Nibble and PageRank-Nibble find better clusters in terms of conductance and functional coherence, making our tool useful for finding interesting communities close to a queried protein.
We model a protein network as an undirected, unweighted graph where the nodes are the proteins, and two nodes are connected by an edge if the corresponding proteins are annotated as interacting with each other.
In a random walk it is useful to consider a probability distribution vector p over all the nodes in the graph. Here p is a row vector, where p(u) is the probability that the walk is at node u, and ∑u ∈ Vp(u) = 1. Because we transition between nodes with probabilities given by W, if p t is the probability distribution vector at time t, then pt+1= p t W.
The s vector specifies the probability distribution for where the walk transitions when it resets. The original PageRank algorithm used a uniform unit vector for s [10, 11]. PageRank with non-uniform starting vectors is known as personalized PageRank, and has been used in context-sensitive search on the Web [35, 36].
In other words, L G = D G - A G . The eigenvectors of L G reveal some structure of the graph , and are often used by spectral graph partitioning algorithms.
The lower the conductance, the better the cluster. Notice that a cluster can have low conductance without being dense. Using the minimum of vol(S) and vol( ) in the definition disregards vacuous clusters (for example, when S = ∅ and = V).
Nibble, the local clustering algorithm of Spielman and Teng , works by looking for a cluster of low conductance among the most popular destinations of a short random walk from the starting vertex. The algorithm starts with a probability distribution vector p that has all of its probability in the starting vertex, and at every iteration updates p by setting p t = pt-1W, where W is the lazy random walk transition probability matrix. A lazy random walk is a modified random walk where the probability of remaining at each vertex is 1/2; it is used to ensure that the walk reaches a steady state. After each iteration of the random walk, Nibble checks the probability distribution vector for a cluster of low conductance by performing a "sweep" of p.
The algorithm finds Φ (p) by sorting the entries of p by degree normalized probability, and then computing the conductance of each sweep set to find the minimum. The degree of each vertex v, denoted by d(v), is proportional to the amount of probability that v has in the stationary distribution of the random walk. Therefore the sweep sets contain vertices that have significantly more probability in p than they do in the stationary distribution, meaning that they are visited more often in a walk from the starting node than they are at steady-state.
In order to bound the runtime of the algorithm, Nibble only looks at a small part of the graph close to the starting vertex by using a truncation operation on the probability distribution vector. Given a parameter ϵ, after each iteration of the random walk, we set p(v) = 0 for every v such that p(v) ≤ ϵ d(v). Nibble takes as input the number of iterations that it performs, as well as ϵ, and returns the sweep set of smallest conductance over all iterations.
Deviating from the algorithm presented in , we also implement a constrained version of Nibble, which always reports a cluster containing the starting vertex. Here when we perform a sweep of the probability distribution vector, we always put the starting vertex s first in the order (set v1 = s), no matter how much probability there is at that vertex. Therefore, Constrained-Nibble only considers sweep sets that include the starting vertex. Similarly, Nibble can also be modified to only consider sweep sets of a certain size, which is useful when we wish to find a cluster in a specified size range.
PageRank-Nibble  is similar to Nibble in that it looks for a cluster containing the closest vertices to the starting node. However, instead of using an evolving probability distribution of a random walk from starting node s, PageRank-Nibble uses a personalized PageRank vector that gives the stationary distribution of a random walk that always returns to s when it resets. Once the personalized PageRank vector is computed, the same sweep technique described in the previous section is used to return the cluster of lowest conductance among its sweep sets.
In addition, to bound the amount of time necessary to compute the PageRank vector and perform a sweep, the algorithm uses an approximation of it. The approximation algorithm computes an ϵ-approximate PageRank vector by conducting a random walk that only considers vertices v that have more than ϵ d(v) probability in them. The resulting PageRank vector has few non-zero entries, and the amount of error in it for any subset of vertices S is less than ϵ·vol(S).
PageRank-Nibble uses the same sweep technique to find a partition, so it can also be constrained to only consider sweep sets of a certain size, if we wish to find a cluster in a specified size range. Unlike Nibble, PageRank-Nibble always reports a cluster containing the starting vertex because the starting vertex has the most degree normalized probability in the computed PageRank vector. For calculations of PageRank, α (the reset probability in the PageRank equation) is typically chosen to be 0.15. However, we find that using a lower value of α (such as 0.02) gives us clusters of lower conductance.
Metis is a global graph partitioning algorithm that outperforms other state-of-the-art methods . It takes the number of clusters (k) as an argument, and maps each vertex to one of k balanced clusters, minimizing the size of the edge cut, which is the set of edges with endpoints in different clusters. Metis is a multilevel algorithm, which coarsens the graph to perform the partitioning, and then improves it as the graph is rebuilt. There are two variations of Metis algorithms for partitioning graphs: Kmetis and Pmetis. We try both, and decide to use Pmetis because it gives us clusters of lower conductance. Pmetis works by recursively bisecting the graph, it is slower but returns clusters that are more balanced in size. The partitions reported by Pmetis are almost all of exactly the same size, so to get clusters of a certain size we simply set k accordingly. Because Pmetis is a global algorithm, we partition the entire graph once, and for starting vertex s return the cluster containing s.
We use a common spectral clustering implementation, taking the first d eigenvectors of the Laplacian matrix of the graph (other than the one corresponding to the lowest eigenvalue), to put each vertex in a d-dimensional space. We then use k-means clustering to partition the vertices into k clusters, again choosing k to get clusters of the desired size. However, the sizes of the found clusters vary greatly, so we also use a variation where we simply return the k closest vertices to the starting vertex in the spectral embedding space. Once again, if we partition the entire graph, we return the cluster that contains the starting vertex.
Measuring Functional Distance
In order to assess the functional coherence of the found clusters, we use functional distances from Yu et al. . These values are derived using the Gene Ontology (GO) Process classification scheme, where functional-distance(a, b) is the number of gene pairs that share all of the least common ancestors of a and b in the classification hierarchy. A low functional distance score means that two proteins are more functionally related, because there are few protein pairs that have the same functional relationship.
The functional distance measure of Yu et al., which the authors refer to as the "total ancestry measure for GO," has the obvious advantage that it considers all known functions of a pair of proteins, allowing for a great degree of precision in assessing functional similarity. Moreover, unlike other methods that derive distances from the GO classification scheme, this method is very resilient to rough functional descriptions, because it still assigns low distances to pairs of proteins that only share very broad terms, as long as there are few other protein pairs that share all of those terms.
Calculating Functional Coherence
Correlating Conductance and Functional Coherence
To determine whether communities with better conductance are more likely to be functionally coherent, we choose groups of proteins from each network, rank these groups by conductance, absolute, and relative functional coherence, and compute the correlation between the ranks using the Pearson Correlation Coefficient . How to choose the protein groups for this experiment is non-trivial. They cannot be selected by taking random subsets of nodes in each network, which will most certainly produce disconnected groups with bad conductance and functional coherence. Furthermore, they cannot be selected by using algorithms that minimize conductance, which will produce groups with strong bias towards low conductance. A better way to choose these groups is to first randomly select a vertex, and then choose k - 1 of its nearest neighbors, breaking ties in distance randomly. Such "random" protein groups will be connected in the network, with reasonable and variable conductance and coherence values. The size of each group is randomly chosen in the range 10 ≤ k ≤ 40, because we expect biologically relevant communities to be in this size range.
The Protein Networks
The protein interaction data that we use in our study is from BioGRID , Version 2.0.53, updated June 1, 2009. BioGRID lists interacting protein pairs, and for each pair gives the experimental method used to observe the interaction, as well as the source that submitted it. In our study we use several yeast protein-protein interaction (PPI) networks formed from interactions detected by different methods.
Two of the networks, where protein interactions are detected from bait-and-prey type experiments are Affinity Capture-Western (referred to as ac-western in the figures), and Affinity Capture-MS (ac-ms). These networks tend to be much more cliquish and contain dense components, which is due to the nature of the experiment used to detect the interactions. A single protein (bait) is used to pull in a set of other proteins (prey), and an interaction is predicted either between the bait and each prey (the spoke model), or between every protein in the group (the matrix model) . We also use Two-Hybrid data in our study. Two-Hybrid methods detect binary interactions, therefore PPI networks based on Two-Hybrid data tend to be less dense and cliquish than ones derived from Affinity Capture experiments.
In addition to using a network formed from the union of all Two-Hybrid interactions listed in BioGRID (two-hybrid), we also consider a subset of this data submitted by  (two-hybrid-2). This network is sparser, but is believed to be of higher quality.
Results and Discussion
A Biological Example
First we provide a concrete example where using our tool reveals a biologically meaningful community, while a different commonly used approach does not produce a functionally coherent group. We use our tool to query the local community of yeast protein GET3 in the network from , which returns a group of 38 proteins. To investigate the functions of these proteins, we use the GO Term Finder  to look for Gene Ontology biological process terms that are significantly enriched in this group (p-value ≤ 0.001). GET3 is a subunit of the GET complex, which is involved in Golgi to ER trafficking and insertion of proteins into the ER membrane . We find that the local community of GET3 contains many proteins with similar functions. The community is significantly enriched for relevant Gene Ontology process categories such as protein localization (13 genes, p = 2 × 10-6), cellular protein localization (12 genes, p = 3 × 10-6), intracellular protein transport (11 genes, p = 9 × 10-6), and establishment of localization in cell (15 genes, p = 5 × 10-5). On the other hand, when we simply take the 38 nearest neighbors of GET3 in the protein network (breaking ties in distance randomly), we get a group of proteins that are not significantly enriched for any of these categories. Thus here it is clear that considering the local community of a protein in a PPI network is much more meaningful than simply looking at its immediate interaction partners.
Nibble and PageRank-Nibble have theoretic performance guarantees [23, 25], but their effectiveness has not been tested in practice. In order to justify our choice of partitioning algorithms, we compare the performance of Nibble and PageRank-Nibble to two other algorithms commonly used for this purpose: Metis and spectral clustering. We use two variations of spectral clustering: one that uses k-means clustering in the spectral embedding space and reports the cluster containing the starting vertex, and another that simply returns the closest neighbors of the starting vertex in the spectral embedding space (see Methods). To determine whether partitioning a protein interaction network is an effective approach to finding functionally coherent communities, we also compare with a simple nearest neighbor heuristic. Given a starting vertex, nearest neighbor returns the k nearest neighbors in the graph, ties broken randomly. Of course, we do not expect this heuristic to return clusters of low conductance, but we do expect it to find functionally coherent groups, because nodes that are connected in a PPI network tend to be much more functionally related .
Comparing the Algorithms
To compare the performance of the algorithms, we run all of them from the same set of nodes in each PPI network described in Methods, and record the conductance and absolute/relative functional coherence of the found clusters. We then average the statistics of every algorithm in each network, and report the standard error to see if the differences are statistically significant. Moreover, we compare the algorithms when they search for clusters of varying size for the following reasons:
Functional coherence is sensitive to cluster size (it is usually lower for smaller clusters), therefore we need to control the size of the reported clusters for a rigorous comparison.
This gives a more thorough performance comparison because it is possible that different algorithms are better at finding clusters of different size.
Thus we compare all methods when they search for small (size 10-20), medium (size 20-30) and large (size 30-40) clusters. We consider this size range because we expect biologically relevant functional groups to be of roughly this size (10-40 proteins). Setting the parameters of each algorithm to only report clusters in a specified size range is straightforward: for k-nearest neighbor methods we simply set k accordingly; we restrict the local clustering algorithms to only consider sweep sets of a certain size (see Methods); and we vary the desired number of clusters when using the global partitioning algorithms, which gives us clusters in the right size range if the partition is balanced.
Comparing Performance in Terms of Conductance
From Figure 3 we also see that the local clustering algorithms outperform the global ones, which may seem surprising at first. However, a global partitioning algorithm assigns every node to a cluster in order to minimize the edge cut over all clusters in the graph. It is not clear why the same algorithm should be best at minimizing the edge cut for a single cluster in a given neighborhood. The local algorithms, on the other hand, use a view of the graph relative to the starting node, which is specific to each vertex, to find the partition. The intuition for why Nibble works well is that if we take a node inside a cluster of low conductance, and start a random walk from that node, most of the probability will stay inside the cluster because there are proportionally few outgoing edges (and probability in a random walk moves along the edges). Therefore, most of the nodes in the cluster will have a lot of probability in them, and will be in one of the sweep sets (sets of nodes that have the most probability in them) considered by Nibble. Even though PageRank-Nibble does not perform quite as well on all of our data, its theoretic performance guarantee is better than that of Nibble , and we believe that it can be especially useful for other networks.
Comparing Performance in Terms of Functional Coherence
However, we do not expect to see as large a difference in performance between the algorithms used by our tool and other methods in terms of functional coherence of the found communities. There are several reasons for this:
Our functional coherence calculations depend on the quality of the Gene Ontology annotations from which the pairwise functional distances are derived.
The protein networks that we use are noisy, and contain false positive and false negative interactions.
We should not expect methods designed to minimize conductance to perform equally well when their performance is assessed using other, albeit related, measures.
Still, we expect our local partitioning algorithms to find communities with high functional coherence, giving more evidence that the communities reported by our tool are biologically relevant.
Using conductance to measure the quality of a cluster is especially useful in PPI networks because of the low sampling rate of protein interaction data. The extremely low sampling rate (said to be as low as 3-9% in Two-Hybrid data ) presents a problem for any computational method based on PPI network topology. However, undersampling is not likely to affect the conductance of a cluster, because the proportion of outgoing edges is likely to stay the same. Thus if a good cluster exists in the true PPI network, it will also exist in the undersampled network, and will be found by our tool if it is looking in the right neighborhood. On the other hand, algorithms that use absolute criteria in looking for a community (such as seeds of highly-connected nodes, or k-cores ) will work poorly on PPI data unless these criteria are carefully chosen.
Our results also make an important point about whether it makes sense to use a local partitioning algorithm when it is feasible to use a global one. A local partitioning algorithm only looks at a part of the graph, which gives it an obvious advantage when the graph is very large. However, not much is said about the quality of partitions that a local algorithm finds, compared to a global method. It is easy to think that since a local algorithm only sees a part of the graph, it must always do worse. However, here we show that this is not the case, as the two local partitioning algorithms outperform the global ones. Additionally, the concept of local clustering is quite natural, as often we care about a community local to a part of the graph, rather than how all the nodes cluster. This may be especially true in biology, where researchers typically work on specific proteins of interest.
Finally, we address the biological relevance of communities found in PPI networks. Our results show that by locally partitioning a protein network we can find communities whose proteins are functionally related to each other, and less related to the other proteins in the network.
Konstantin Voevodski is supported by an IGERT Fellowship through NSF grant DGE-0221680 awarded to the ACES Training Program at BU Center for Computational Science. Shang-Hua Teng is supported in part by NSF grant CCR-0635102. Yu Xia is supported by a Research Starter Grant in Informatics from the PhRMA Foundation.
- Krogan N, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis A, Punna T, Peregrin-Alvarez J, Shales M, Zhang X, Davey M, Robinson M, Paccanaro A, Bray J, Sheung A, Beattie B, Richards D, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete M, Vlasblom J, Wu S, Orsi C, Collins S, Chandran S, Haw R, Rilstone J, Gandi K, Thompson N, Musso G, Onge PS, Ghanny S, Lam M, Butland G, Altaf-U A, Kanaya S, Shilatifard A, O'Shea E, Weissman J, Ingles J, Hughes T, Parkinson J, Gerstein M, Wodak S, Emili A, Greenblatt J: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006, 440: 637–643. 10.1038/nature04670View ArticlePubMedGoogle Scholar
- Newman M: Fast algorithm for detecting community structure in networks. Eur Phys J B 2004, 38: 321–330. 10.1140/epjb/e2004-00124-yView ArticleGoogle Scholar
- Girvan M, Newman M: Community structure in social and biological networks. Proc Natl Acad Sci USA 2002, 99: 7821–7826. 10.1073/pnas.122653799PubMed CentralView ArticlePubMedGoogle Scholar
- Palla G, Derenyi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435: 814–818. 10.1038/nature03607View ArticlePubMedGoogle Scholar
- Clauset A: Finding local community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys 2005, 72: 026132.View ArticlePubMedGoogle Scholar
- Gibson D, Kleinberg J, Raghavan P: Inferring Web communities from link topology. Proc ACM Conf on Hypertext and Hypermedia 1998, 225–234.Google Scholar
- Kumar R, Raghavan P, Rajagopalan S, Tomkins A: Trawling the Web for emerging cyber-communities. Computer Networks 1999, 31: 1481–1493. 10.1016/S1389-1286(99)00040-7View ArticleGoogle Scholar
- Flake G, Lawrence S, Giles C, Coetzee F: Self-organization and identification of Web communities. Computer 2002, 35: 66–70. 10.1109/2.989932View ArticleGoogle Scholar
- Kleinberg J: Authoritative sources in a hyperlinked environment. Proc ACM Conf on Hypertext and Hypermedia 1998, 604–632.Google Scholar
- Page L, Brin S, Motwani R, Winograd T: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University 1998.Google Scholar
- Brin S, Page L: The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 1998, 30: 107–117. 10.1016/S0169-7552(98)00110-XView ArticleGoogle Scholar
- Karp R: Reducibility among combinatorial problems. Proc Sympos on Complexity of Computer Computations 1972, 85–103.View ArticleGoogle Scholar
- Feige U, Goldwasser S, Lovasz L, Szegedy M: Approximating clique is almost NP-complete. Proc IEEE Foundations of Computer Science 1991, 2–12.Google Scholar
- Bu D, Zhao Y, Cai L, Xue H, Zhu X, Lu H, Zhang J, Sun S, Ling L, Zhang N, Li G, Chen R: Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Res 2003, 31: 2443–2450. 10.1093/nar/gkg340PubMed CentralView ArticlePubMedGoogle Scholar
- Yu H, Paccanaro A, Trifonov V, Gerstein M: Predicting interactions in protein networks by completing defective cliques. Bioinformatics 2006, 22: 823–829. 10.1093/bioinformatics/btl014View ArticlePubMedGoogle Scholar
- Sinclair A, Jerrum M: Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation 1989, 82: 93–113. 10.1016/0890-5401(89)90067-9View ArticleGoogle Scholar
- Bader G, Hogue C: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003, 4: 2. 10.1186/1471-2105-4-2PubMed CentralView ArticlePubMedGoogle Scholar
- Spirin V, Mirny L: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA 2003, 100: 12123–12128. 10.1073/pnas.2032324100PubMed CentralView ArticlePubMedGoogle Scholar
- Chen J, Yuan B: Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics 2006, 22: 2283–2290. 10.1093/bioinformatics/btl370View ArticlePubMedGoogle Scholar
- King A, Przulj N, Jurisica I: Protein complex prediction via cost-based clustering. Bioinformatics 2004, 20: 3013–3020. 10.1093/bioinformatics/bth351View ArticlePubMedGoogle Scholar
- Brun C, Chevenet F, Martin D, Wojcik J, Guenoche A, Jacq B: Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biol 2003, 5: R6. 10.1186/gb-2003-5-1-r6PubMed CentralView ArticlePubMedGoogle Scholar
- Vlasblom J, Wodak S: Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinformatics 2009, 10: 99. 10.1186/1471-2105-10-99PubMed CentralView ArticlePubMedGoogle Scholar
- Spielman DA, Teng S: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning. 2008. 0809.3232v1 [cs.DS]Google Scholar
- Yu H, Jansen R, Gerstein M: Developing a similarity measure in biological function space. Bioinformatics 2007, 23: 2163–2173. 10.1093/bioinformatics/btm291View ArticlePubMedGoogle Scholar
- Andersen R, Chung F, Lang K: Local graph partitioning using PageRank vectors. Proc IEEE Foundations of Computer Science 2006, 475–486.Google Scholar
- Abou A, Karypis G: Multilevel algorithms for partitioning power-law graphs. Proc IEEE Int Prallel & Distributed Processing Sympos 2006.Google Scholar
- Alpert C, Kahng A, Yao Z: Spectral partitioning: The more eigenvectors the better. Discreet Applied Mathematics 1999, 90: 3–26. 10.1016/S0166-218X(98)00083-3View ArticleGoogle Scholar
- Meila M, Shi J: Learning Segmentation by Random Walks. Neural Information Processing Systems 2001.Google Scholar
- Kannan R, Vempala S, Vetta A: On clusterings: good, bad and spectral. Proc IEEE Foundations of Computer Science 2000, 367–377.Google Scholar
- Hagen L, Kahng A: New spectral methods for ratio cut partitioning and clustering. IEEE Trans on CAD of Integrated Circtuis and Systems 1992, 11: 1074–1085. 10.1109/43.159993View ArticleGoogle Scholar
- Spielman DA, Teng S: Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra Appl 2007, 421: 284–305. 10.1016/j.laa.2006.07.020View ArticleGoogle Scholar
- Kelner J: Spectral Partitioning, Eigenvalue Bounds, and Circle Packings for Graphs of Bounded Genus. SIAM J Comput 2006, 35: 882–902. 10.1137/S0097539705447244View ArticleGoogle Scholar
- Biswal P, Lee JR, Rao S: Eigenvalue Bounds, Spectral Partitioning, and Metrical Deformations via Flows. Proc IEEE Foundations of Computer Science 2008, 751–760.Google Scholar
- Alpert C, Kahng A, Yao S: Spectral partitioning with multiple eigenvectors. Discrete Applied Mathematics 1999, 90: 3–26. 10.1016/S0166-218X(98)00083-3View ArticleGoogle Scholar
- Fogaras D, Racz B: Towards scaling fully personalized pagerank. Proc Workshop on Algorithms and Models for the Web-Graph 2004, 105–117.View ArticleGoogle Scholar
- Jeh G, Widom J: Scaling personalized web search. Proc World Wide Web Conf 2003, 271–279.Google Scholar
- Rodgers JL, Nicewander WA: Thirteen ways to look at the correlation coefficient. Am Stat 1988, 42: 59–66. 10.2307/2685263View ArticleGoogle Scholar
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006, 34: D535–9. 10.1093/nar/gkj109PubMed CentralView ArticlePubMedGoogle Scholar
- Han J, Dupuy D, Bertin N, Cusick M, Vidal M: Effect of sampling on topology predictions of protein-protein interactions. Nat Biotechnol 2005, 23: 839–844. 10.1038/nbt1116View ArticlePubMedGoogle Scholar
- Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabasi AL, Tavernier J, Hill DE, Vidal M: High-quality binary protein interaction map of the yeast interactome network. Science 2008, 322: 104–110. 10.1126/science.1158684PubMed CentralView ArticlePubMedGoogle Scholar
- Hu Z, Hung J, Wang Y, Chang Y, Huang C, Huyck M, DeLisi C: VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Res 2009, 37: W115–121. 10.1093/nar/gkp406PubMed CentralView ArticlePubMedGoogle Scholar
- Boyle E, Weng S, Jin H, Botstein D, Cherry J, Sherlock G: GO::TermFinder-open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 2004, 20: 321–330.Google Scholar
- Hong EL, Balakrishnan R, Dong Q, Christie KR, Park J, Binkley G, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hirschman JE, Hitz BC, Krieger CJ, Livstone MS, Miyasato SR, Nash RS, Oughtred R, Skrzypek MS, Weng S, Wong E, Zhu KK, Dolinski K, Botstein D, Cherry JM: Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res 2008, 36: D577–581. 10.1093/nar/gkm909PubMed CentralView ArticlePubMedGoogle Scholar
- Seidman S: Network structure and minimum degree. Social Networks 1983, 5: 269–287. 10.1016/0378-8733(83)90028-XView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.