- Research article
- Open Access
RRW: repeated random walks on genome-scale protein networks for local cluster discovery
- Kathy Macropol^{1},
- Tolga Can^{2} and
- Ambuj K Singh^{1}
https://doi.org/10.1186/1471-2105-10-283
© Macropol et al; licensee BioMed Central Ltd. 2009
- Received: 23 January 2009
- Accepted: 9 September 2009
- Published: 9 September 2009
Abstract
Background
We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins.
Results
We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values.
Conclusion
RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.
Keywords
- Random Walk
- Protein Interaction Network
- Significant Cluster
- Original Network
- Biological Process Term
Background
In recent years, much effort has gone into finding the complete set of interacting proteins in an organism [1]. Such genome-scale protein networks have been realized with the help of high throughput methods, like yeast-two-hybrid (Y2H) [2, 3] and affinity purification with mass spectrometry (APMS) [4, 5]. In addition, information integration techniques that utilize indirect genomic evidence have provided both increased genome coverage by predicting new interactions and more accurate associations with multiple supporting evidence [6–9].
Complementary to the availability of genome-scale protein networks, various graph analysis techniques have been proposed to mine these networks for pathway or molecular complex discovery [10–15], function assignment [16–18], and complex membership prediction [19, 20]. Bader and Hogue [21] propose a clustering algorithm to detect densely connected regions in a protein interaction network for discovering new molecular complexes. Spirin and Mirny [22] use superparamagnetic clustering (SPC) and a Monte Carlo (MC) algorithm to cluster a given protein interaction network. These algorithms work on undirected unweighted graphs and partition the network of proteins into non-overlapping clusters. However, genome-wide networks constructed with multiple supporting evidence have edges with varying degrees of confidence. The strength of confidence should be considered when identifying strongly connected proteins. Also, it is known that there are many multi-functional proteins which may play important roles in different functional modules. Therefore, a biologically more sensitive cluster identification technique should report clusters that may sometimes overlap. Several clustering techniques have since been proposed that take into account the given edge confidence [23] or overlapping clusters [24, 25]. However, these algorithms all account for the two problems separately, and do not both use given biological edge confidences and find overlapping clusters at the same time.
In this paper, we propose a novel algorithm, repeated random walk (RRW for short), for molecular complex and functional module discovery within genome-scale protein interaction networks. This new algorithm both utilizes given edge weights and finds overlapping clusters. The idea is based on expansion of a given cluster to include the protein with the highest proximity to that cluster. Starting with a cluster of size one (any protein in the network), this iterative process is repeated either k times, or until a stopping condition is met, to obtain clusters of size ≤ k. All significant overlapping clusters are recorded and post-processed to remove redundant clusters based on a given overlap threshold. We use random walks with restarts to find the closest proteins to a given cluster. To increase the algorithm's speed, the random walk results from a given cluster are computed using linear combinations of precomputed random walk results obtained starting from single proteins. Unlike other techniques proposed for pathway discovery, the random walk method implicitly exploits the global structure of a network by simulating the behavior of a random walker [26].
We apply RRW on a genome-scale functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results by comparison to known complexes in the MIPS complex catalogue database [27]. By comparison to an existing clustering technique, we show that using edge weights in addition to connectivity information and allowing certain amounts of overlap between clusters are the key characteristics of RRW for finding biologically more significant clusters.
Results and discussion
Problem statement and algorithm
Let G = (V, E) be the graph representing a genome scale protein interaction network, where V is the set of nodes (proteins), and E is the set of weighted undirected edges between pairs of proteins. The edges are weighted by the strength of supporting evidence for functional association.
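As a concrete illustration of this representation, the following minimal sketch builds a row-normalized transition matrix from a list of weighted undirected edges. The function name `transition_matrix`, the protein labels, and the weights are hypothetical and chosen only for illustration; the treatment of isolated nodes is an assumption.

```python
from collections import defaultdict

def transition_matrix(nodes, weighted_edges):
    """Return row-normalized transition probabilities as a dict of dicts."""
    adj = defaultdict(dict)
    for u, v, w in weighted_edges:
        adj[u][v] = w  # undirected edge: store both directions
        adj[v][u] = w
    P = {}
    for u in nodes:
        total = sum(adj[u].values())
        # Assumption: an isolated node simply gets an empty row.
        P[u] = {v: w / total for v, w in adj[u].items()} if total > 0 else {}
    return P

# Toy network with hypothetical protein names and confidence weights.
edges = [("A", "B", 3.0), ("A", "C", 1.0), ("B", "C", 2.0)]
P = transition_matrix(["A", "B", "C"], edges)
```

Each row of `P` sums to one, so a walker at a protein moves to a neighbor with probability proportional to the confidence of the connecting edge.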
Problem definition
Given a physical protein interaction or predicted functional network of an organism, our goal is to find biologically significant groups of proteins in the network. Here, the definition of a biologically significant group entails proteins that function together in a biological pathway or are members of a protein complex. Moreover, significant clusters may contain proteins from different complexes, therefore revealing modular interactions at a higher level.
List of notations used
Symbol | Definition | Symbol | Definition |
---|---|---|---|
G | Undirected, weighted graph | α | Random walk restart probability |
V | Vertices in graph | λ | Early cutoff value |
E | Edges in graph | k | Number of iterations (maximum cluster size) |
P | Transition matrix for graph | s _{ i } | Restart vector for a node (or set of nodes) i |
C | Vector consisting of a cluster of nodes | x _{ i } | Random Walk stationary vector from a node (or set of nodes) i |
Random walks with restarts
Repeated random walk algorithm
The random walk algorithm finds proteins that are in close proximity to a start node. Below we describe a linear combination technique to simulate a random walk starting from a set of proteins.
We can add the closest protein to the start set and repeat the random walk. Successive iterations can be used to identify clusters of any given size. Repeated random walks is based on this idea. However, the large number of random walks necessary to obtain a cluster in this way greatly reduces the speed of the algorithm. To lower the computational costs, the number of random walks performed can be reduced and the affinity vectors found using an alternative method.
Precomputed random walk results starting from single proteins in the set can be linearly combined to obtain the affinity vector for larger clusters starting from multiple proteins, as shown below.
Theorem 1 Let P be the row normalized adjacency (transition) matrix defined by the graph, G.
Let s_{C} be the restart vector for a set of nodes, C, that contains a value of 1/|C| in all entries corresponding to nodes in C, and 0 for other entries. Then, the stationary vector, x_{C}, for a random walk with restarts starting from the set of nodes, C, is x_{C} = (1/|C|) Σ_{i ∈ C} x_{i}, where x_{i} is the stationary vector of random walk with restarts from node i.
Noting the form of Equations 2 and 3, and since the stationary vector is unique, we conclude that x_{C} = (1/|C|) Σ_{i ∈ C} x_{i}.
■
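The linear combination property can also be checked numerically. The sketch below assumes the usual random walk with restart update, x ← (1 − α)·P^T·x + α·s, with a restart vector that places equal mass on the nodes of C; the function name `rwr`, the toy matrix, and the value of α are illustrative assumptions, not taken from the paper.

```python
def rwr(P, s, alpha, iters=200):
    """Random walk with restart by repeated application of the assumed update rule."""
    n = len(P)
    x = list(s)
    for _ in range(iters):
        x = [alpha * s[j] + (1 - alpha) * sum(x[i] * P[i][j] for i in range(n))
             for j in range(n)]
    return x

# Row-normalized transition matrix of a toy 4-node weighted graph (illustrative).
P = [[0.0, 0.5, 0.5, 0.0],
     [0.25, 0.0, 0.25, 0.5],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
alpha = 0.5
C = [0, 3]

# Walk restarted from the whole set C: restart mass 1/|C| on each member.
s_C = [1.0 / len(C) if j in C else 0.0 for j in range(len(P))]
x_C = rwr(P, s_C, alpha)

# Average of the walks restarted from each member of C separately.
singles = [rwr(P, [1.0 if j == i else 0.0 for j in range(len(P))], alpha) for i in C]
combined = [sum(x[j] for x in singles) / len(C) for j in range(len(P))]

max_diff = max(abs(a - b) for a, b in zip(x_C, combined))
```

Under these assumptions `max_diff` is zero up to floating point error, which is what allows the single-node vectors to be computed once and reused for every cluster.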
These expanded clusters are afterwards post-processed based on a given overlap threshold. The less significant of highly overlapping (redundant) clusters are then discarded. The overlap ratio between two clusters, C_{1} and C_{2}, is given by |C_{1} ∩ C_{2}|/min {|C_{1}|, |C_{2}|} and is between 0.0 and 1.0.
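The overlap ratio and one possible pruning pass can be sketched as follows. The greedy strategy shown (visit clusters from most to least significant and keep a cluster only if its overlap with every kept cluster does not exceed the threshold) is an assumption about how the post-processing could be done; the function names and the example clusters are hypothetical.

```python
def overlap_ratio(c1, c2):
    """|C1 ∩ C2| / min(|C1|, |C2|), a value between 0.0 and 1.0."""
    c1, c2 = set(c1), set(c2)
    return len(c1 & c2) / min(len(c1), len(c2))

def prune_overlapping(scored_clusters, threshold):
    """Greedy pruning: lower p-value wins when two clusters overlap too much."""
    kept = []
    for cluster, p_value in sorted(scored_clusters, key=lambda t: t[1]):
        if all(overlap_ratio(cluster, other) <= threshold for other, _ in kept):
            kept.append((cluster, p_value))
    return kept

# Hypothetical clusters paired with hypothetical p-values.
clusters = [
    ({"A", "B", "C", "D", "E"}, 0.01),
    ({"A", "B", "C", "F"}, 0.05),       # overlaps the first cluster by 3/4
    ({"E", "G", "H", "I", "J"}, 0.02),  # overlaps the first cluster by 1/5
]
kept = prune_overlapping(clusters, threshold=0.2)
```

With a threshold of 0.2, the second cluster is discarded while the third is kept, since sharing one protein between two clusters of size 5 gives a ratio of exactly 0.2. The pairwise comparison is quadratic in the number of clusters.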
The complexity of the RRW algorithm is linear in the size of the graph and the maximum cluster size, O(|V|·R + |V|·k), where R is the complexity of the RandomWalk algorithm, and the complexity of post-processing is O(n^{2}), where n is the number of clusters created. In large graphs, the bottleneck of the RRW algorithm is the set of calls to the RandomWalk method made at the beginning. On a protein network with |V| = 4,681 and |E| = 34,000, the random walk calls take about fifteen minutes in total (using a machine with a 3.2 GHz Intel Xeon CPU and 8 GB of RAM running the Ubuntu 8.04 operating system), versus less than a minute spent computing the clusters using the linear combination method after the Random Walk affinity vectors have been computed and stored.
In order to reduce this complexity, one can skip using the RandomWalk and simply use the best neighbors based solely on edge weights. However, this naïve nearest neighbor approach does not capture the structure of the network around starting nodes. Our experiments show that this is indeed the case.
Statistical significance of a cluster
Given a set of proteins that form a cluster in a genome-scale protein network, we assign a statistical significance to that set. To create a quantitative representation of a cluster, we compute a score which is the average value of the random walk distance between all nodes in the cluster. (Since the affinity vectors from each node in the graph are already precomputed and stored during the RRW computations, this can be done quickly and efficiently.) Since the "distances" are the stationary probabilities, the average score value will range from 0 to 1.
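A minimal sketch of such a score is given below. It assumes that "between all nodes" means all ordered pairs of distinct cluster members and that `affinity[i][j]` holds the stationary probability of node j in the walk restarted from node i; the function name, the singleton convention, and the toy values are hypothetical.

```python
from itertools import permutations

def cluster_score(cluster, affinity):
    """Mean of affinity[i][j] over all ordered pairs of distinct cluster members."""
    pairs = list(permutations(cluster, 2))
    if not pairs:
        return 0.0  # assumption: a single-protein cluster has no pairwise score
    return sum(affinity[i][j] for i, j in pairs) / len(pairs)

# Precomputed affinity vectors for a toy network (illustrative values).
affinity = {
    "A": {"A": 0.6, "B": 0.3, "C": 0.1},
    "B": {"A": 0.2, "B": 0.6, "C": 0.2},
    "C": {"A": 0.1, "B": 0.3, "C": 0.6},
}
score_ab = cluster_score(["A", "B"], affinity)  # (0.3 + 0.2) / 2
```

Because only stored vectors are read, scoring a cluster of size m takes m·(m − 1) lookups and no additional random walks.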
The computation of the significance of a score requires estimating the cdf of scores and computing p-value(s) = 1 - cdf(s). Score distributions can be computed empirically by sampling clusters of different sizes. However, we found that the typical scores we worked with had very small tail probabilities. For example, for a cluster size of 10, the mean was 3.27·10^{-5}, the standard deviation was 1.28·10^{-4}, and the tail probability had to be computed for a score of 0.0359, which is about 280 standard deviations away from the mean. It is difficult to apply sampling to compute such small tail probabilities.
For our purposes, we assumed a simple relationship between the cdf, scores, and cluster sizes. Clearly, the cdf value of a score is monotonic in score. It is also monotonic in cluster size since the probability of a cluster having an average score less than a threshold increases with cluster size. We attempted a number of different estimates of cdf-values: (score·log |C|), (score·√|C|), and (score·|C|). Both (score·√|C|) and (score·|C|) were correlated significantly with biological significance in MIPS clusters (the percent of proteins in the cluster that belong to the same MIPS complex). For a sample size of 1,855 clusters, the Pearson correlation coefficient between the biological significance and (score·log |C|), (score·√|C|), and (score·|C|) was 0.00787, 0.158, and 0.229, respectively. Since the critical value of the correlation coefficient p for 1,855 items is 0.0763 at 0.001 probability, it can be seen that (score·√|C|) and (score·|C|) are both significantly correlated to biological significance.
In our experiments, a slower growing function of |C| (such as √|C|) led to better precision and worse recall than a faster growing function of |C| (such as |C|). Choosing clusters with higher precision over recall, we adopted the function √|C| and present results for p-value = 1 - (score·√|C|).
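The sketch below shows how such an estimate could be used to rank clusters, assuming the adopted growth function is √|C|. The function name `estimated_p_value` is hypothetical; the first cluster uses the score 0.0359 and size 10 quoted above, and the other two entries are made-up values for comparison.

```python
import heapq
import math

def estimated_p_value(score, cluster_size):
    """Crude significance estimate 1 - score * sqrt(|C|); smaller ranks higher."""
    return 1.0 - score * math.sqrt(cluster_size)

# Priority queue ordered by estimated p-value (smallest first).
queue = []
for name, score, size in [("c1", 0.0359, 10), ("c2", 0.0100, 10), ("c3", 0.0359, 5)]:
    heapq.heappush(queue, (estimated_p_value(score, size), name))

most_significant = heapq.heappop(queue)[1]
```

The estimate decreases as either the score or the cluster size grows, so `c1` is ranked ahead of both the lower-scoring `c2` and the smaller `c3`.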
Experimental results
In this section, we report our experimental results conducted on different variants of a S. cerevisiae protein interaction network, setting λ to be 0.6, k to be 11, and the overlap threshold to be 0.2. Varying the overlap threshold between 0.01 and 0.4 was found experimentally to affect the reported results only slightly, and so a value of 0.2 (one overlapping protein allowed in a cluster of size 5) was chosen. Likewise, the values for λ and k were found not to significantly alter the majority of returned results, as the returned clusters tended to favor smaller sizes (on average 5-6 proteins). These values, however, were chosen after evaluating various parameter settings. For the model organism S. cerevisiae, we used the WI-PHI network by Kiemer et al. [29]. WI-PHI is a weighted undirected protein interaction network encompassing a large majority of yeast proteins. It is constructed by integration of various heterogeneous data sources such as application of tandem affinity purification coupled to MS (TAP-MS), large-scale yeast two-hybrid studies, and results of small-scale experiments stored in dedicated databases. The network contains 50,000 interactions for 5,955 yeast proteins. The weights, included in the original file, are determined by assessing each data source's performance in reproducing the results of a high confidence benchmark interactome. We also created noisy versions of these networks to demonstrate the robustness of RRW under noise.
Comparison to known MIPS complexes
In order to evaluate the performance of RRW, we use protein complexes from the MIPS complex catalog [27]. All proteins belonging to the same MIPS complex are determined to be interacting with each other. Two statistical results are obtained. First, the quality of a cluster is assessed by finding the percentage of proteins belonging to the same MIPS complex within that cluster. If multiple complex annotations are mapped to the same cluster, the annotation with the highest number of proteins contained in the cluster is chosen. In addition, benchmark protein complexes from the MIPS catalog were used to obtain precision, recall, and accuracy measures. The MIPS benchmark contains 49 protein complexes each of which contains 5 to 10 proteins. The goal was to find clusters as close as possible to the actual complex or pathway, as measured by: precision = number of true positives/local cluster size, recall = number of true positives/size of complex or pathway, and accuracy = √(precision·recall), where true positives are proteins in the same benchmark complex which are found in the local cluster.
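These three measures can be computed per cluster as in the following sketch. It assumes accuracy is the geometric mean of precision and recall, which is consistent with the tabulated values (for example, √(0.765·0.734) ≈ 0.749 for RRW on WI-PHI); the function name and the protein labels are hypothetical.

```python
import math

def evaluate_cluster(cluster, complex_members):
    """Return (precision, recall, accuracy) of a cluster against one benchmark complex."""
    cluster, complex_members = set(cluster), set(complex_members)
    true_positives = len(cluster & complex_members)
    precision = true_positives / len(cluster)
    recall = true_positives / len(complex_members)
    accuracy = math.sqrt(precision * recall)  # assumed: geometric mean
    return precision, recall, accuracy

# Hypothetical cluster of 5 proteins, 4 of which lie in a benchmark complex of 8.
cluster = {"p1", "p2", "p3", "p4", "x1"}
benchmark = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}
precision, recall, accuracy = evaluate_cluster(cluster, benchmark)
```

Here precision is 4/5 and recall is 4/8, so a small, clean cluster scores high on precision even when it recovers only half of the complex.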
Precision, recall, & accuracy on pre-selected MIPS clusters with various MCL inflation parameter values
Network | Precision (inflation 2.0/2.5/3.0) | Recall (inflation 2.0/2.5/3.0) | Accuracy (inflation 2.0/2.5/3.0) |
---|---|---|---|
WI-PHI | 0.471/0.512/0.524 | 0.858/0.832/0.780 | 0.636/0.657/0.639 |
FP40 | 0.469/0.538/0.605 | 0.859/0.768/0.837 | 0.635/0.643/0.711 |
FN40 | 0.400/0.423/0.432 | 0.719/0.670/0.628 | 0.537/0.532/0.521 |
Rewire40 | 0.455/0.480/0.550 | 0.666/0.565/0.461 | 0.550/0.520/0.504 |
Results for the WI-PHI network
% in same MIPS category | RRW | MCL | Naïve |
---|---|---|---|
90+% | 50% | 17% | 7.8% |
80+% | 71% | 30% | 19% |
70+% | 72% | 42% | 31% |
60+% | 86% | 57% | 44% |
50+% | 91% | 77% | 70% |
25+% | 98% | 99% | 98% |
Results for the FP40 network
% in same MIPS category | RRW | MCL | Naïve |
---|---|---|---|
90+% | 52% | 20% | 6.3% |
80+% | 72% | 34% | 17% |
70+% | 74% | 50% | 26% |
60+% | 84% | 63% | 37% |
50+% | 87% | 82% | 63% |
25+% | 99% | 99% | 96% |
Results for the FN40 network
% in same MIPS category | RRW | MCL | Naïve |
---|---|---|---|
90+% | 47% | 10% | 7.7% |
80+% | 67% | 23% | 19% |
70+% | 68% | 32% | 26% |
60+% | 87% | 49% | 44% |
50+% | 91% | 79% | 63% |
25+% | 99% | 98% | 95% |
Results for the Rewire40 network
% in same MIPS category | RRW | MCL | Naïve |
---|---|---|---|
90+% | 43% | 17% | 4.8% |
80+% | 64% | 32% | 13% |
70+% | 64% | 46% | 21% |
60+% | 77% | 65% | 33% |
50+% | 80% | 81% | 57% |
25+% | 97% | 98% | 93% |
Precision, recall, and accuracy on pre-selected MIPS clusters
Network | Precision (RRW/MCL/Naïve) | Recall (RRW/MCL/Naïve) | Accuracy (RRW/MCL/Naïve) |
---|---|---|---|
WI-PHI | 0.765/0.512/0.363 | 0.734/0.832/0.791 | 0.749/0.657/0.535 |
FP40 | 0.788/0.538/0.362 | 0.724/0.768/0.795 | 0.755/0.643/0.537 |
FN40 | 0.708/0.423/0.326 | 0.595/0.670/0.699 | 0.649/0.532/0.477 |
Rewire40 | 0.667/0.480/0.370 | 0.545/0.565/0.706 | 0.603/0.520/0.511 |
Comparison to known biological processes
Results for the WI-PHI network
% same GO annotation | RRW | MCL | Naïve |
---|---|---|---|
90+% | 39% | 6% | 13% |
80+% | 60% | 22% | 26% |
70+% | 62% | 32% | 35% |
60+% | 76% | 49% | 57% |
50+% | 79% | 67% | 69% |
25+% | 96% | 97% | 98% |
Results for the FP40 network
% same GO annotation | RRW | MCL | Naïve |
---|---|---|---|
90+% | 42% | 18% | 10% |
80+% | 60% | 43% | 21% |
70+% | 62% | 58% | 28% |
60+% | 76% | 75% | 46% |
50+% | 79% | 85% | 57% |
25+% | 95% | 99% | 95% |
Results for the FN40 network
% same GO annotation | RRW | MCL | Naïve |
---|---|---|---|
90+% | 41% | 4.5% | 18% |
80+% | 60% | 19% | 32% |
70+% | 61% | 31% | 40% |
60+% | 81% | 50% | 61% |
50+% | 83% | 68% | 71% |
25+% | 97% | 98% | 99% |
Results for the Rewire40 network
% same GO annotation | RRW | MCL | Naïve |
---|---|---|---|
90+% | 30% | 13% | 7.8% |
80+% | 49% | 29% | 21% |
70+% | 49% | 35% | 25% |
60+% | 64% | 57% | 40% |
50+% | 67% | 70% | 52% |
25+% | 92% | 98% | 91% |
Analysis of select clusters for biological significance
To further validate the biological significance of the clusters discovered by RRW, we next discuss several statistically significant clusters discovered by our technique that are also biologically meaningful. One high scoring cluster found by RRW, and not created by either MCL or the naïve method, consisted of the proteins YML049c, YMR240c, YMR288w, YOR319w, and YPR094w. Though not all listed within the same MIPS complex, these 5 proteins were among the 7 found to interact in the yeast SF3b U2 snRNP subunits that associate with the pre-mRNA branchpoint region [35]. Another cluster found consisted of 5 proteins: YBL097w, YDR325w, YFR031c, YLR086w, and YLR272c. The MIPS complex catalogue did not list any of these five together in the same physical complex. However, their corresponding genes exactly match the 5-subunit S. cerevisiae condensin complex [36], essential for chromosome segregation during mitosis, demonstrating the ability of RRW to discover significant functional complexes as well as physical ones. Another 5-protein cluster discovered contained YDR200c, YFR008w, YLR238w, YMR029c, and YMR052w. Again, though not all contained within the same MIPS complex, these proteins have all been found to be part of a six-member group of interacting proteins that prevent recovery from pheromone arrest in yeast [37].
Conclusion
In this paper, we proposed a novel algorithm based on repeated random walks on graphs for discovering functional modules within genome-scale protein networks. We applied the RRW on an interaction network of yeast genes by Kiemer et al. [29] and efficiently identified statistically significant clusters of proteins. We validated the biological significance of the results by comparison to known complexes in both the MIPS complex catalogue database [27] and GO functional annotations [34], as well as to existing clustering techniques. The repeated random walk technique offers significant improvements in precision over existing clustering techniques by making use of the strength of functional associations as well as the network topology and providing clusters of desired overlap ratio. Overlapping clusters proved a more accurate model of real biological networks with multifunctional proteins. In summary, our technique discovers biologically more significant clusters in a genome-wide protein interaction network using global connectivity and supporting evidence information accurately and efficiently.
Methods
The Random Walk and the Repeated Random Walk algorithms
Figure 1 gives the algorithm for finding the stationary vector of a Random Walk with restarts from a single starting node. The complexity of the algorithm is O(w·|V|^{2}), where w is the number of iterations to converge. The value of w is determined by the structure of the network and the restart probability α. In general, the ratio of the first two eigenvalues of a transition matrix specifies the rate of convergence to the stationary probability [38].
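Since the figure is not reproduced here, the following is a minimal sketch of a random walk with restarts computed by power iteration. It assumes the common update x ← (1 − α)·P^T·x + α·s with a row-normalized P and stops when the L1 change falls below a tolerance; the function name, tolerance, toy matrix, and value of α are illustrative assumptions rather than the authors' settings.

```python
def random_walk_with_restart(P, s, alpha, tol=1e-12, max_iter=10000):
    """Iterate x <- (1 - alpha) * P^T x + alpha * s until the L1 change is below tol."""
    n = len(P)
    x = list(s)
    for _ in range(max_iter):
        nxt = [alpha * s[j] for j in range(n)]
        for i in range(n):
            if x[i] == 0.0:
                continue  # no probability mass to push from node i
            for j in range(n):
                nxt[j] += (1.0 - alpha) * x[i] * P[i][j]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Row-normalized transition matrix of a 3-node toy graph (illustrative weights).
P = [[0.0, 0.75, 0.25],
     [0.6, 0.0, 0.4],
     [1 / 3, 2 / 3, 0.0]]
x0 = random_walk_with_restart(P, [1.0, 0.0, 0.0], alpha=0.5)
```

Each pass over the dense matrix costs |V|^{2} operations, so w passes give the stated O(w·|V|^{2}) bound. In this toy example the start node keeps the largest stationary probability, followed by its most strongly connected neighbor.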
The Repeated Random Walk (RRW) and Random Walk starting from a cluster (ClusterRWSimulation) algorithms are given in Figures 2 and 3. For the RRW algorithm, starting from every node in the network, sets of strongly connected proteins are found by expanding the clusters repeatedly using the ClusterRWSimulation method. Clusters of size ≤ k are inserted into a priority queue ordered by their statistical significance. To expand a cluster C, the ClusterRWSimulation method is run and the closest protein in its stationary vector is recorded. This neighbor protein is added to C, as long as its weight is within the early cutoff, λ, of the weight of the protein previously added to the cluster, resulting in one new cluster to be further expanded. The complexity is linear in the maximum cluster size, O(|V|·k).
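The expansion loop described above can be sketched as follows. This is not the authors' implementation: all function names are hypothetical, the early cutoff is read as "stop when the best affinity drops below λ times the affinity of the previously added protein", the cluster vector is taken as the mean of its members' precomputed vectors, and the significance estimate 1 − score·√|C| is assumed. The affinity matrix `x` holds made-up values.

```python
import heapq
import math

def cluster_affinity(cluster, x):
    """Stationary vector from a cluster as the mean of its members' vectors."""
    return [sum(x[i][j] for i in cluster) / len(cluster) for j in range(len(x))]

def expand_from(seed, x, k, lam):
    """Grow a cluster from one seed node, yielding each expanded cluster."""
    cluster, prev = [seed], None
    while len(cluster) < k:
        aff = cluster_affinity(cluster, x)
        candidates = [j for j in range(len(aff)) if j not in cluster]
        if not candidates:
            break
        best = max(candidates, key=lambda j: aff[j])
        # Assumed reading of the early cutoff lambda.
        if prev is not None and aff[best] < lam * prev:
            break
        cluster.append(best)
        prev = aff[best]
        yield list(cluster)

def p_value(cluster, x):
    """Assumed estimate: 1 - (mean pairwise affinity) * sqrt(|C|)."""
    pairs = [(i, j) for i in cluster for j in cluster if i != j]
    score = sum(x[i][j] for i, j in pairs) / len(pairs)
    return 1.0 - score * math.sqrt(len(cluster))

def repeated_random_walk(x, k, lam):
    """Expand from every node; return a heap of (p-value, cluster) without duplicates."""
    heap, seen = [], set()
    for seed in range(len(x)):
        for cluster in expand_from(seed, x, k, lam):
            key = tuple(sorted(cluster))
            if key not in seen:
                seen.add(key)
                heapq.heappush(heap, (p_value(cluster, x), list(key)))
    return heap

# Precomputed single-node affinity vectors for a toy 4-node network (made up).
x = [[0.60, 0.30, 0.07, 0.03],
     [0.30, 0.60, 0.06, 0.04],
     [0.05, 0.05, 0.60, 0.30],
     [0.04, 0.06, 0.30, 0.60]]
found = sorted(c for _, c in repeated_random_walk(x, k=3, lam=0.6))
```

On this toy input the two tightly connected pairs are returned and the cutoff prevents either pair from absorbing a weakly attached third node. A post-processing pass over the heap would then remove highly overlapping clusters.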
An implementation of the RRW algorithm is available for download at http://cs.ucsb.edu/~kpm/RRW/
Declarations
Acknowledgements
This work was supported in part by NSF grant IIS-0612327. Tolga Can is partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) Career Program Grant #106E128.
Authors’ Affiliations
References
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417: 399–403. doi:10.1038/nature750
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci 2001, 98: 4569–4574. doi:10.1073/pnas.061034498
- Uetz P, Cagney G, Mansfield TA, Judson R, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403: 623–627. doi:10.1038/35001009
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415: 141–147. doi:10.1038/415141a
- Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415: 180–183. doi:10.1038/415180a
- Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D: Prolinks: a database of protein functional linkages derived from coevolution. Genome Biology 2004, 5(5):R35. doi:10.1186/gb-2004-5-5-r35
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data. Science 2003, 302: 449–453. doi:10.1126/science.1087361
- Lee I, Date SV, Adai AT, Marcotte EM: A Probabilistic Functional Network of Yeast Genes. Science 2004, 306: 1555–1558. doi:10.1126/science.1099511
- von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P: STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Research 2005, 33: D433-D437. doi:10.1093/nar/gki005
- Arnau V, Mars S, Marin I: Iterative cluster analysis of protein interaction data. Bioinformatics 2005, 21(3):364–378. doi:10.1093/bioinformatics/bti021
- Bader JS: Greedily building protein networks with confidence. Bioinformatics 2003, 19(15):1869–1874. doi:10.1093/bioinformatics/btg358
- Hu H, Yan X, Huang Y, Han J, Zhou XJ: Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 2005, 21(Suppl 1):i213-i221. doi:10.1093/bioinformatics/bti1049
- Scholtens D, Vidal M, Gentleman R: Local modeling of global interactome networks. Bioinformatics 2005, 21(17):3548–3557. doi:10.1093/bioinformatics/bti567
- Scott J, Ideker T, Karp RM, Sharan R: Efficient algorithms for detecting signaling pathways in protein interaction networks. In Proceedings of RECOMB. Cambridge, MA, USA; 2005:1–13.
- Yamanishi Y, Vert JP, Kanehisa M: Protein network inference from multiple genomic data: a supervised approach. Bioinformatics 2004, 20(S1):i363-i370. doi:10.1093/bioinformatics/bth910
- Lanckriet GRG, Deng M, Cristianini N, Jordan MI, Noble WS: Kernel-based data fusion and its application to protein function prediction in yeast. Pac Symp Biocomput 2004, 300–11.
- Letovsky S, Kasif S: Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics 2003, 19: i197-i204. doi:10.1093/bioinformatics/btg1026
- Tsuda K, Noble WS: Learning kernels from biological networks by maximizing entropy. Bioinformatics 2004, 20(S1):i326-i333. doi:10.1093/bioinformatics/bth906
- Asthana S, King OD, Gibbons FD, Roth FP: Predicting Protein Complex Membership Using Probabilistic Network Reliability. Genome Research 2004, 14: 1170–1175. doi:10.1101/gr.2203804
- Can T, Çamoğlu O, Singh AK: Analysis of protein-protein interaction networks using random walks. Proceedings of the 5th ACM SIGKDD Workshop on Data Mining in Bioinformatics, Chicago 2005.
- Bader GD, Hogue CWV: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003, 4: 2. doi:10.1186/1471-2105-4-2
- Spirin V, Mirny LA: Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences (PNAS) 2003, 100(21):12123–12128. doi:10.1073/pnas.2032324100
- Enright A, Dongen SV, Ouzounis C: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research 2002, 30(7):1575–1584. doi:10.1093/nar/30.7.1575
- Adamcsek B, Palla G, Farkas I, Derényi I, Viscek T: CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics 2006, 22(8):1021–1023. doi:10.1093/bioinformatics/btl039
- Asur S, Ucar D, Parthasarathy S: An ensemble framework for clustering protein-protein interaction networks. Bioinformatics 2007, 23(13):i29-i40. doi:10.1093/bioinformatics/btm212
- Lovasz L: Random walks on graphs: A survey. Combinatorics, Paul Erdos is Eighty 1996, 2: 353–398.
- Mewes HW, Amid C, Arnold R, Frishman D, Guldener U, Mannhaupt G, Munsterkotter M, Pagel P, Strack N, Stumpflen V, Warfsmann J, Ruepp A: MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Research 2004, 32: D41-D44. doi:10.1093/nar/gkh092
- Tong H, Faloutsos C, Pan JY: Fast Random Walk With Restart and Its Applications. International Conference on Data Mining (ICDM), Hong Kong 2006, 613–622.
- Kiemer L, Costa S, Ueffing M, Cesareni G: WI-PHI: A weighted yeast interactome enriched for direct physical interactions. Proteomics 2007, 7: 932–943. doi:10.1002/pmic.200600448
- Dongen SV: Graph clustering by flow simulation. PhD thesis. University of Utrecht, The Netherlands; 2000.
- Brohee S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 2006, 7: 488. doi:10.1186/1471-2105-7-488
- Andreopoulos B, An A, Wang X, Schroeder M: A roadmap of clustering algorithms: finding a match for a biomedical application. Brief Bioinform 2009, 10(4):297–314.
- Myers C, Barrett D, Hibbs M, Huttenhower C, Troyanskaya O: Finding function: evaluation methods for functional genomic data. BMC Genomics 2006, 7: 187. doi:10.1186/1471-2164-7-187
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. doi:10.1038/75556
- Wang Q, Rymond B: Rds3p Is Required for Stable U2 snRNP Recruitment to the Splicing Apparatus. Molecular and Cellular Biology 2003, 23(20):7339–7349. doi:10.1128/MCB.23.20.7339-7349.2003
- Freeman L, Aragon-Alcaide L, Strunnikov A: The Condensin Complex Governs Chromosome Condensation and Mitotic Transmission of rDNA. The Journal of Cell Biology 2000, 149(4):811–824. doi:10.1083/jcb.149.4.811
- Kemp H, Sprague G: Far3 and Five Interacting Proteins Prevent Premature Recovery from Pheromone Arrest in the Budding Yeast Saccharomyces cerevisiae. Molecular and Cellular Biology 2003, 23(5):1750–1763. doi:10.1128/MCB.23.5.1750-1763.2003
- Corso GMD: Estimating an Eigenvector by the Power Method with a Random Start. SIAM J Matrix Anal Appl 1997, 18(4):913–937. doi:10.1137/S0895479895296689
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.