Resolving the structure of interactomes with hierarchical agglomerative clustering
- Yongjin Park^{1, 2} and
- Joel S Bader^{1, 2}
https://doi.org/10.1186/1471-2105-12-S1-S44
© Park and Bader; licensee BioMed Central Ltd. 2011
Published: 15 February 2011
Abstract
Background
Graphs provide a natural framework for visualizing and analyzing networks of many types, including biological networks. Network clustering is a valuable approach for summarizing the structure in large networks, for predicting unobserved interactions, and for predicting functional annotations. Many current clustering algorithms suffer from a common set of limitations: poor resolution of top-level clusters; over-splitting of bottom-level clusters; requirements to pre-define the number of clusters prior to analysis; and an inability to jointly cluster over multiple interaction types.
Results
A new algorithm, Hierarchical Agglomerative Clustering (HAC), is developed for fast clustering of heterogeneous interaction networks. This algorithm uses maximum likelihood to drive the inference of a hierarchical stochastic block model for network structure. Bayesian model selection provides a principled method for collapsing the fine-structure within the smallest groups, and for identifying the top-level groups within a network. Model scores are additive over independent interaction types, providing a direct route for simultaneous analysis of multiple interaction types. In addition to inferring network structure, this algorithm generates link predictions that with cross-validation provide a quantitative assessment of performance for real-world examples.
Conclusions
When applied to genome-scale data sets representing several organisms and interaction types, HAC provides the overall best performance in link prediction when compared with other clustering methods and with model-free graph diffusion kernels. Investigation of performance on genome-scale yeast protein interactions reveals roughly 100 top-level clusters, with a long-tailed distribution of cluster sizes. These are in turn partitioned into 1000 fine-level clusters containing 5 proteins on average, again with a long-tailed size distribution. Top-level clusters correspond to broad biological processes, whereas fine-level clusters correspond to discrete complexes. Surprisingly, link prediction based on joint clustering of physical and genetic interactions performs worse than predictions based on individual data sets, suggesting a lack of synergy in current high-throughput data.
Background
Graphs or networks provide an excellent organizing framework for representing data from high-throughput experiments that measure interactomes, or genome-scale biological interactions: physical interactions between proteins; genetic interactions or specific phenotypes such as synthetic lethality between genes; gene regulation interactions between transcription factors and genes; and metabolic connections between enzymes and metabolites. In these networks, vertices represent genes, proteins, or other molecules, and edges represent specific interaction types [1, 2].
An important current challenge is to develop methods to analyze these and other networks, such as social networks [3]. One challenge is to infer network structure by identifying subgroups of related vertices, which in the biological domain may be inferred to have similar functions. A second challenge is to predict links that might exist but which are not represented in the data. Missing links are prevalent in biological interactomes, where over half the true interactions may be absent from current data sets, and where spurious interactions may overwhelm true interactions in raw data [4]. Even the most ambitious physical interaction mapping effort was limited to ~20% of the total possible interaction space [5]. Models based only on degree distribution have been unable to predict missing interactions [6].
Stochastic block models, in which vertices belong to groups and vertex-vertex interactions are determined by group membership, have shown promising results for network clustering in terms of probabilistic mixtures [7, 8] (blocks) and admixtures [9] (blocks of blocks) of communities. Typically these models assume a flat structure of K top-level groups, which has the technical drawback of requiring a pre-specified value or a search over a pre-specified range. A more serious problem, however, is a “resolution limit” in which the existence of large groups fundamentally prevents the discovery of small groups [10].
A recent hierarchical network model [11] proposed by Clauset, Moore, and Newman (CMN) provides a principled method for investigating structure at all levels by defining a probability distribution over network structures. This model avoids the resolution limit problem. It is also flexible in describing both assortative and disassortative networks. Unfortunately, it requires lengthy Markov chain Monte Carlo (MCMC) simulation to sample over network structures. More fundamentally, this model imposes an exhaustive hierarchical structure at both the top level (unrelated top-level groups are forced to merge together) and the bottom level (cohesive groups are exhaustively partitioned) of a network.
Here we describe a new algorithm, Hierarchical Agglomerative Clustering (HAC), that provides a fast, deterministic approximation for optimizing a network probability motivated by CMN. A key observation exploited by Newman and Leicht [12] is that interactions with vertices outside a group often provide more information than within-group interactions. Methods that focus on within-cluster interactions, such as Bayesian Hierarchical Clustering [13], modularity scores [14], and even spectral methods [15], often miss this information. We use this information to drive accurate bottom-up clustering, using a novel model selection strategy to identify groups to merge and to detect when a subtree should be collapsed into a single cluster, similar to Power Graph [16] but with a firm statistical foundation. A similar Bayesian model selection step determines when clustering should be terminated, yielding a set of top-level clusters lacking evidence for further hierarchical structure.
We then show that HAC achieves better accuracy in predicting missing links than other state-of-the-art algorithms. Moreover, the automated detection of structure at both the top and bottom level is shown to be expressive and flexible when applied to physical and genetic interactomes.
Methods
Preliminary definitions
Notation
A graph G is defined by a set of vertices V and edges E that connect pairs of vertices. This work considers undirected, unweighted edges with no self edges. Extensions to directed, weighted, and self-edges are possible but are not discussed here.
A “flat” model. A model M defines how vertices are collected into groups. These groups are denoted C_{1}, C_{2}, …, C_{ K } for a model with K groups. Each vertex is assigned to one of the K groups, and the groups are disjoint. This model can be summarized as M= {C_{ k } : k ∈ 1, …, K}. Subscripts u, v typically refer to individual vertices, and subscripts i, j, k refer to groups.
where Beta is the standard Beta function and x^{ x } = 1 for x = 0.
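As a minimal sketch of how such scores can be evaluated, the Python snippet below assumes that each block (a within-group or between-group set of vertex pairs) contributes a Bernoulli term with e edges and h holes. Under that assumption the maximum-likelihood contribution is (e/t)^e (h/t)^h with t = e + h, and the Bayesian counterpart under a uniform prior on the edge probability is Beta(e + 1, h + 1). These forms are assumptions consistent with the conventions stated above, not a transcription of the original equations, and the function names are illustrative.

```python
import math

def log_ml_block(e, h):
    """Log of (e/t)^e * (h/t)^h with t = e + h, using the convention x^x = 1 at x = 0."""
    t = e + h
    score = 0.0
    if e > 0:
        score += e * math.log(e / t)
    if h > 0:
        score += h * math.log(h / t)
    return score

def log_bayes_block(e, h):
    """Log Beta(e + 1, h + 1): the Bernoulli likelihood averaged over a uniform prior."""
    return math.lgamma(e + 1) + math.lgamma(h + 1) - math.lgamma(e + h + 2)

# Example: a block with 3 edges and 1 hole.
print(log_ml_block(3, 1), log_bayes_block(3, 1))
```

Working in log space avoids underflow for large blocks, and the explicit checks for e = 0 or h = 0 implement the x^x = 1 convention.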
Generalization to a hierarchical model
Top-level terms depend on the edges e_{rr′} and holes h_{rr′} crossing between the top-level groups r and r′, with t_{rr′} = e_{rr′} + h_{rr′}. For all tree nodes r, the same convention gives t_{r} = e_{r} + h_{r}. For branching nodes (including the top-level nodes), the edges e_{r} and holes h_{r} refer to those crossing between the left and right sub-trees; for non-branching terminals, e_{r} and h_{r} refer to the edges and holes for vertices within the terminal groups.
Sampling trees with MCMC provides excellent results for predicting missing links by accumulating values for link probabilities between left and right sub-trees [11]. We have found that extending the MCMC approach to genome-scale networks is computationally burdensome. Approximation methods, such as a Variational Bayes approach [17], can reduce computational costs, but still require a good initial estimate of tree structure. Here we consider agglomerative approaches for finding trees T that optimize the objective function L(M) and its fully Bayesian counterpart P(M).
Agglomerative clustering
Maximum likelihood guide tree
where e_{b} and, similarly, h_{b} = t_{b} - e_{b} count the edges and holes between all pairs of top-level groups before merging 1 and 2, and e_{12} and h_{12} count the edges and holes just between groups 1 and 2. Under the star model, any two groups with the same values of e_{12} and t_{12} will have identical ratios. At the initial step, every pair of vertices will have one of two merging scores, depending on whether e_{12} = 1 or 0. Additional criteria are then required to avoid bad merges at the start of clustering. In contrast, the maximum likelihood ratio gathers information from shared patterns of connectivity with other groups. In particular, at the initial step when each group is a single vertex, the ratio depends only on the number of mismatches between the connectivity patterns of the two vertices.
Greedy agglomerative algorithm
The likelihood ratio leads to an agglomerative algorithm that successively merges the two clusters that have the largest ratio.
Initialize top-level clusters as {{v} : v ∈ V}
Initialize K ← |V|
while K > 1 do
    Find top-level clusters i, j with the largest likelihood ratio
    Add top-level cluster r; L(r) = i and R(r) = j
    Remove clusters i and j from the top level
    K ← K – 1
end while
We call this method HAC-ML. The time complexity of a naïve implementation scales as O(V^{4}), but using a priority queue, restricting possible merging pairs to clusters that share at least one common neighbor, and lazy evaluation of λ reduce the complexity to O(EJ log V), where E is the total number of edges and J is the average vertex degree.
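The greedy loop can be illustrated with a naïve Python sketch. It assumes a particular form for the merging score: the change in maximum-likelihood log score when the separate edge densities from clusters i and j to every other top-level cluster k are pooled into one density. That form is an assumption chosen to match the description of shared connectivity patterns, and the names `hac_ml_sketch`, `loglik`, and `log_ratio` are hypothetical. The sketch recomputes every score at every step, so it corresponds to the naïve scaling and not to the priority-queue implementation.

```python
import math
from itertools import combinations

def loglik(e, h):
    """ML log score of e edges and h holes sharing one density e/(e+h); 0*log(0) = 0."""
    t = e + h
    out = 0.0
    if e > 0:
        out += e * math.log(e / t)
    if h > 0:
        out += h * math.log(h / t)
    return out

def hac_ml_sketch(n, edges):
    """Greedy agglomeration over vertices 0..n-1; returns merges as (left, right, new_id)."""
    adj = {frozenset(p) for p in edges}
    members = {v: [v] for v in range(n)}  # top-level cluster id -> member vertices
    merges = []
    next_id = n

    def counts(a, b):
        # Edges and holes crossing between top-level clusters a and b.
        e = sum(1 for u in members[a] for v in members[b] if frozenset((u, v)) in adj)
        return e, len(members[a]) * len(members[b]) - e

    def log_ratio(i, j):
        # Assumed score: cost of pooling the densities of i and j toward each other cluster.
        total = 0.0
        for k in members:
            if k in (i, j):
                continue
            e1, h1 = counts(i, k)
            e2, h2 = counts(j, k)
            total += loglik(e1 + e2, h1 + h2) - loglik(e1, h1) - loglik(e2, h2)
        return total

    while len(members) > 1:
        i, j = max(combinations(sorted(members), 2), key=lambda p: log_ratio(*p))
        merges.append((i, j, next_id))
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return merges

# Vertices 0 and 1 both link to 2 and 3, and to nothing else.
print(hac_ml_sketch(4, [(0, 2), (0, 3), (1, 2), (1, 3)]))
```

In this small bipartite example, vertices 0 and 1 share no edge but have identical neighbors, so they merge first, followed by 2 and 3. Under the assumed score, each mismatched neighbor of two single vertices costs 2 log 2, so the initial ranking is determined by the number of mismatches.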
Bayesian model selection for top-level and terminal clusters
A reasonable stopping criterion is a threshold on the Bayes factor for the best merge [18]. While there are K(K – 1)/2 possible merges, we do not include this factor in the stopping criterion.
For each merged cluster, a collapse ratio λ^{C} is calculated, where the subscripts indicate edges and holes within and between groups. The merged cluster is collapsed when this ratio favors a single group. Clusters of two vertices are always merged because λ^{C} = 1. While there are many ways for the reverse process of splitting a cluster into two non-empty groups of sizes n_{1} and n_{2}, we do not include this factor in the model selection.
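A hedged sketch of such a collapse test follows. It assumes that each block's Bayesian evidence is Beta(e + 1, h + 1) (a uniform prior on the edge probability) and that a cluster is collapsed when the evidence for one pooled group is at least the evidence for the two-group split. Both the functional form and the threshold of 1 are assumptions, chosen because they reproduce the stated property that two single vertices always give a ratio of 1. The function names are illustrative.

```python
import math

def log_beta_evidence(e, h):
    """Log Beta(e + 1, h + 1) for a block with e edges and h holes."""
    return math.lgamma(e + 1) + math.lgamma(h + 1) - math.lgamma(e + h + 2)

def log_collapse_ratio(e11, h11, e22, h22, e12, h12):
    """Log evidence of one pooled group minus log evidence of the two-group split."""
    pooled = log_beta_evidence(e11 + e22 + e12, h11 + h22 + h12)
    split = (log_beta_evidence(e11, h11)
             + log_beta_evidence(e22, h22)
             + log_beta_evidence(e12, h12))
    return pooled - split

def should_collapse(e11, h11, e22, h22, e12, h12):
    """Collapse when the pooled model is at least as probable (assumed threshold of 1)."""
    return log_collapse_ratio(e11, h11, e22, h22, e12, h12) >= -1e-12

# A 4-clique split as 2 + 2 collapses; two linked pairs with no cross edges do not.
print(should_collapse(1, 0, 1, 0, 4, 0), should_collapse(1, 0, 1, 0, 0, 4))
```

The small tolerance in `should_collapse` guards against floating-point round-off when the ratio is exactly 1, as it is for any pair of single vertices.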
Extension to multiple edge types
The HAC-ML algorithm is directly applicable to networks with multiple edge types. Rather than merging the edges into a single superimposed network, each edge type α defines its own likelihood L^{(}^{ α }^{)}(M) and probability P^{(}^{ α }^{)}(M) for a particular model M. The full likelihood and full probability are then obtained as products over the edge types, L = ∏_{ α }L^{(}^{ α }^{)} and P = ∏_{ α } P^{(}^{ α }^{)}.
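Because the full likelihood is a product over edge types, log scores simply add. The sketch below illustrates this with a hypothetical data layout in which each edge type maps to a list of (edges, holes) counts for the blocks of one shared model M. The per-block Bernoulli form and all names are assumptions for illustration.

```python
import math

def log_ml_block(e, h):
    """ML log score for a block with e edges and h holes; 0*log(0) is taken as 0."""
    t = e + h
    score = 0.0
    if e > 0:
        score += e * math.log(e / t)
    if h > 0:
        score += h * math.log(h / t)
    return score

def joint_log_likelihood(blocks_by_type):
    """Sum log scores over edge types; every type is scored under the same grouping."""
    return sum(
        log_ml_block(e, h)
        for blocks in blocks_by_type.values()
        for e, h in blocks
    )

# Two edge types counted over the same three blocks of a shared model.
counts = {
    "physical": [(3, 0), (1, 5), (0, 3)],
    "genetic": [(0, 3), (4, 2), (1, 2)],
}
print(joint_log_likelihood(counts))
```

The same additivity applies to merging scores, so a joint analysis can rank candidate merges by the sum of per-type log ratios without superimposing the networks.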
Performance Evaluation
Data preparation
Experimental evidence codes listed in the BioGRID database (http://thebiogrid.org) provide a way to distinguish physical from genetic interaction pairs. We built a physical network by collecting all physically binding or interacting pairs, and a genetic network restricted to negative interactions corresponding to the experimental evidence codes Negative Genetic, Synthetic Growth Defect, Synthetic Haploinsufficiency, and Synthetic Lethality. We ignored redundant pairs within each type of network so that the resulting graphs were undirected and unweighted. We then iteratively removed isolated or degree-1 vertices, as these provide scant information for clustering. For other non-BioGRID genetic interaction datasets we filtered out positively weighted pairs and applied the same iterative removal. In joint-network analysis, we restricted attention to the common intersection of genes.
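The iterative removal step can be sketched as follows. This is an illustrative Python version, not the authors' C++ code: it deduplicates pairs, drops self edges, and repeatedly removes vertices whose degree falls below 2 until none remain. The function name and the `min_degree` parameter are hypothetical.

```python
from collections import defaultdict

def prune_low_degree(edges, min_degree=2):
    """Return the undirected edge set left after iteratively removing low-degree vertices."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self edges; sets absorb redundant pairs
            adj[u].add(v)
            adj[v].add(u)
    adj = dict(adj)
    queue = [v for v in adj if len(adj[v]) < min_degree]
    while queue:
        v = queue.pop()
        if v not in adj:
            continue  # already removed
        for w in adj.pop(v):
            adj[w].discard(v)
            if len(adj[w]) < min_degree:
                queue.append(w)  # removal may expose new low-degree vertices
    return {frozenset((u, v)) for u in adj for v in adj[u]}

# A triangle with a dangling chain 2-3-4: the chain is removed, the triangle is kept.
print(prune_low_degree([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (1, 0)]))
```

Removal must be iterated because deleting a degree-1 vertex can reduce a neighbor to degree 1, as happens to vertex 3 in the example.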
Other methods
We compared HAC-ML with other deterministic methods: Fast Modularity (CNM; Clauset et al. [14]), Variational Bayes Modularity (VBM; Hofman and Wiggins [19]), and Graph Diffusion Kernel (GDK; Qi et al. [20]). CNM is an efficient algorithm that directly optimizes Newman modularity [21]. VBM simplifies network data to one intra- and one inter-community probability distribution. For GDK, Qi et al. [20] improved link prediction performance by discriminating between even-length and odd-length paths, particularly for disassortative (bipartite-like) networks. We used the odd parity kernel with the recommended damping parameter set to 1.0.
Different merging scores
where d_{u} and d_{v} are vertex degrees and E is the total number of edges. This algorithm is essentially CNM, but retains the hierarchical structure defined by the merge order for link prediction (rather than predicting links based on the cut that maximizes modularity). Replacing the maximum likelihood merging score with ρ_{e}, ρ_{e} + ρ_{s}, and Q yields algorithms HAC-E, HAC-ES, and HAC-Q, respectively.
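For reference, the standard Newman modularity of a partition can be computed as in the sketch below, which uses the vertex degrees and the total number of edges E in the usual way: the fraction of edges inside groups minus the fraction expected from degrees alone. This is the textbook definition, offered as an assumption about what Q denotes here; the exact merging-score expression from the original equation is not reproduced, and the function name is illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q for an undirected edge list and a vertex -> group mapping."""
    n_edges = len(edges)
    within = 0
    degree_sum = defaultdict(int)  # total degree of each group
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            within += 1
    expected = sum((d / (2 * n_edges)) ** 2 for d in degree_sum.values())
    return within / n_edges - expected

# Two triangles joined by a single edge, split into their natural groups.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, groups))
```

A greedy modularity method merges the pair of groups giving the largest increase in Q, which depends only on the edges between the two groups and their total degrees.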
Link prediction
We assessed correctness of a model in the framework of link prediction as presented in Henderson et al. [8]. Starting with a real-world network, training networks are generated by deleting a specified fraction of edges. A test set is defined by the held-out edges and a random choice of an equal number of holes. We then ran all methods on the training data set. The trained group structure provides maximum likelihood estimates for edges within and between clusters (Eq. 9). For VBM and CNM, we estimated edge densities between all pairs of clusters and within all clusters. For hierarchical models, we estimated densities between all left and right clusters at all tree levels. For GDK, each pair’s diffusion score was used directly to rank pairs. Finally we assessed precision and recall of pairs in the test set ranked by link probability or GDK score. The counts of true positives (TP), false positives (FP), and false negatives (FN) as a function of the number of predictions define the Precision, TP/(TP+FP), and the Recall, TP/(TP+FN). The F-score is the maximum value of the harmonic mean of Precision and Recall. This test set definition is suitable for assessment, but overstates practical performance by reducing the number of negative test examples for a sparse network. Note that for large real-world networks, group assignments are generally unknown, making it difficult to assess group assignments directly.
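The F-score defined above can be computed from a ranked test set as in the following sketch. The input format, a list of (score, is_edge) pairs, and the function name are assumptions for illustration; ties in score are broken arbitrarily by the sort.

```python
def max_f_score(scored_pairs):
    """Maximum harmonic mean of precision and recall over all prediction cutoffs."""
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    positives = sum(1 for _, is_edge in ranked if is_edge)
    tp = 0
    best = 0.0
    for n_pred, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            tp += 1
        fp = n_pred - tp       # predicted pairs that are holes
        fn = positives - tp    # held-out edges not yet predicted
        if tp > 0:
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Held-out edges (True) and sampled holes (False) with predicted link scores.
test_set = [(0.9, False), (0.8, True), (0.7, True), (0.1, False)]
print(max_f_score(test_set))
```

With equal numbers of held-out edges and holes, predicting every pair gives a precision of 0.5 and a recall of 1, so the F-score of any ranking is at least 2/3 under this test set definition.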
Implementation
Algorithms were implemented in C++ and are available under an open source BSD license as supplementary material and from http://www.baderzone.org.
Results and Discussion
Data preparation
Network data sets
Name | V | E | Average degree | Kind | Organism | Source |
---|---|---|---|---|---|---|
Arabidopsis | 777 | 1,831 | 4.71 | Physical | A. thaliana | BioGRID^{1} |
Celegans | 1,089 | 2,842 | 5.22 | Physical | C. elegans | BioGRID^{1} |
Drosophila | 4,692 | 19,876 | 8.47 | Physical | D. melanogaster | BioGRID^{1} |
Human | 6,094 | 26,112 | 8.57 | Physical | H. sapiens | BioGRID^{1} |
Yeast-PPI | 5,105 | 50,542 | 19.80 | Physical | S. cerevisiae | BioGRID^{1} |
Yeast-GEN | 4,763 | 85,855 | 36.05 | Genetic | S. cerevisiae | BioGRID^{1,2} |
SGA | 4,398 | 108,369 | 49.38 | Genetic | S. cerevisiae | Costanzo et al.^{3} |
dSLAM | 627 | 4,710 | 15.02 | Genetic | S. cerevisiae | Pan et al.^{4} |
Empirical evaluation
Link prediction performance of 85/15 cross validation (7.5% of observed edges held out).
Physical interactions | |||||||
---|---|---|---|---|---|---|---|
Data | HAC-ML | GDK | CNM | VBM | HAC-ES | HAC-E | HAC-Q |
Yeast-PPI | 0.79±0.5 | 0.69±0.3 | 0.69±0.7 | 0.76±0.4 | 0.71±0.5 | 0.69±0.7 | 0.69±0.8 |
Drosophila | 0.73±0.8 | 0.66±0.2 | 0.67±0.4 | 0.70±0.4 | 0.67±0.3 | 0.67±0.3 | 0.67±0.4 |
Human | 0.73±0.9 | 0.75±0.7 | 0.71±0.5 | 0.70±0.6 | 0.67±0.4 | 0.68±0.5 | 0.69±1.0 |
Celegans | 0.68±1.5 | 0.67±1.3 | 0.68±1.3 | 0.66±0.6 | 0.66±0.8 | 0.66±0.7 | 0.67±0.8 |
Arabidopsis | 0.80±8.3 | 0.92±2.2 | 0.92±3.2 | 0.90±3.6 | 0.78±11.0 | 0.87±10.8 | 0.88±11.4 |
Genetic interactions | |||||||
Data | HAC-ML | GDK | CNM | VBM | HAC-ES | HAC-E | HAC-Q |
Yeast-GEN | 0.78±2.3 | 0.67±0.0 | 0.69±0.7 | 0.74±6.0 | 0.73±0.8 | 0.67±0.1 | 0.69±0.7 |
SGA | 0.76±1.5 | 0.67±0.0 | 0.67±0.2 | 0.76±0.3 | 0.70±0.2 | 0.67±0.0 | 0.69±0.2 |
dSLAM | 0.92±1.0 | 0.91±0.5 | 0.68±0.8 | 0.67±0.3 | 0.84±2.9 | 0.76±1.0 | 0.67±0.3 |
Among top-ranked pairs, the flat models CNM and VBM perform worse than the hierarchical models. The performance of CNM is improved to nearly the performance of HAC-ML by using HAC-Q to determine the merge order. The poor performance of CNM and VBM in the high-precision region may reflect the inherent resolution limit of a flat model [10], by which hierarchical models do not appear to be limited.
Methods that consider shared neighbors, including HAC-ML and GDK, also perform better than methods that ignore this information, such as HAC-E. Shared neighbors are strong predictors of missing links in networks of protein interactions [25] and genetic interactions [26]. Methods that consider shared neighbors, as opposed to just modularity or density, perform better for disassortative networks such as Yeast-GEN. The VBM method, which assumes homogeneous groups, may also perform poorly when applied to networks with a mix of assortative and disassortative group structures.
Multi-resolution views of a physical interaction network
Visual inspection indicates that the bottom-level clusters are subsets of known GO annotation categories, and may provide greater resolution than existing bottom-level GO categories. These results also indicate connections between GO categories learned from high-throughput data. An example is the process of autophagy, which starts by forming a membrane-bound component that engulfs excess cytosolic proteins and delivers them for degradation in the lysosome or other vacuoles [30, 31]. Therefore “vesicle fusion” and “vesicle-mediated transport” are its mechanistic processes, and proper “protein localization” and targeting are required. Connections with plasma membrane proteins have recently become known, suggesting that the plasma membrane is the source of the autophagosome and that autophagy is initiated by de novo assembly of proteins and lipids [32, 33]. Autophagy is also a response to starvation [30] that re-uses available intracellular resources. We find that disjoint low-level clusters correspond to “autophagy” and “golgi to plasma membrane transport”, suggesting that different proteins are responsible for transport in each direction. Moreover, a seemingly distant relationship to “exocytosis” is under investigation [34].
Synergy in mixed networks
Link prediction performance of joint analysis.
HAC-ML | Prediction of | ||
---|---|---|---|
Trained by | PPI | SGA | GEN |
PPI | 0.75±1.6 | | |
SGA | | 0.77±1.0 | |
GEN | | | 0.78±1.4 |
PPI+SGA | 0.69±0.5 | 0.73±0.8 | |
PPI+GEN | 0.71±1.1 | | 0.79±0.5 |
SGA+GEN | | 0.77±1.0 | 0.78±1.1 |
PPI+SGA+GEN | 0.68±1.2 | 0.73±0.3 | 0.78±0.6 |
This lack of synergy may arise from high-throughput studies exploring different subsets of genes and proteins. Moreover, our joint analysis assumes that different types of edges are generated under a common group structure, but this pattern might be disrupted by a large fraction of false positive interactions, or some edge types might conflict with others. In the presence of prevalent false positive interactions, physical and genetic interactions might not be directly complementary or orthogonal to each other, contrary to Kelley et al. [36]. In our simulation study, where orthogonality is well preserved, HAC-ML trained on multiple data sources performed significantly better than HAC-ML trained on a single source (results not shown). To resolve this issue, a kernel-based method used in previous studies [35] may be beneficial, but this is an open research problem.
Conclusions
The hierarchical agglomerative clustering method HAC-ML is effective at discovering structure in real-world networks, with the ability to resolve both top-level and bottom-level groups. It provides superior performance for link prediction when applied to real-world networks, with a good tradeoff between efficiency and accuracy.
A general weakness of deterministic optimization heuristics is the possibility of becoming trapped in a local minimum. A more fundamental weakness is that different aspects of cross-cutting network structure may be reflected by multiple pertinent local minima. Even so, the group structure generated by HAC-ML can be used as a starting point for MCMC sampling over tree structures, which can provide better results than any single tree [11].
Unlike many agglomerative algorithms which effectively introduce a new parameter every time two groups are merged, HAC-ML starts from a full model and removes parameters at each step. This approach gathers information from shared interaction patterns in building a guide tree, and then uses Bayesian model selection to collapse the bottom level of the tree and terminate the clustering at the top level. Extensions to joint analysis of multiple networks are provided, and extensions to more complex networks with weighted, directed, and time-varying edges are easily envisioned within the same probabilistic framework.
Declarations
Acknowledgements
We acknowledge funding from the NIH and the NSF. We acknowledge helpful discussions with Cris Moore, Mark Newman, Aaron Clauset, Chris Wiggins, and David Bader.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
References
- Bader GD, Hogue CWV: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003, 4: 2. 10.1186/1471-2105-4-2
- Spirin V, Mirny LA: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA 2003, 100(21):12123–12128. 10.1073/pnas.2032324100
- Zachary WW: An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research 1977, 33(4):452–473.
- Huang H, Bader JS: Precision and recall estimates for two-hybrid screens. Bioinformatics 2009, 25(3):372–8. 10.1093/bioinformatics/btn640
- Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabasi AL, Tavernier J, Hill DE, Vidal M: High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science 2008, 322(5898):104–110. 10.1126/science.1158684
- Han JDJ, Dupuy D, Bertin N, Cusick ME, Vidal M: Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 2005, 23(7):839–844. 10.1038/nbt1116
- Zhang H, Qiu B, Giles C, Foley H, Yen J: An LDA-based community structure discovery approach for large-scale social networks. IEEE Intelligence and Security Informatics 2007.
- Henderson K, Eliassi-Rad T, Papadimitriou S, Faloutsos C: HCDF: A Hybrid Community Discovery Framework. SDM, SIAM 2010, 754–765.
- Airoldi E, Blei D, Fienberg S, Xing E: Mixed Membership Stochastic Blockmodels. The Journal of Machine Learning Research 2008, 9.
- Fortunato S, Barthélemy M: Resolution limit in community detection. Proc Natl Acad Sci USA 2007, 104: 36–41. 10.1073/pnas.0605965104
- Clauset A, Moore C, Newman MEJ: Hierarchical structure and the prediction of missing links in networks. Nature 2008, 453(7191):98–101. 10.1038/nature06830
- Newman MEJ, Leicht EA: Mixture models and exploratory analysis in networks. Proc Natl Acad Sci USA 2007, 104(23):9564–9569. 10.1073/pnas.0610537104
- Heller K, Ghahramani Z: Bayesian hierarchical clustering. The 22nd International Conference on Machine Learning 2005.
- Clauset A, Newman MEJ, Moore C: Finding community structure in very large networks. Physical review E, Statistical, nonlinear, and soft matter physics 2004, 70(6 Pt 2):66111. 10.1103/PhysRevE.70.066111
- Luxburg UV: A Tutorial on Spectral Clustering. Tech. Rep. March, Max Planck Institute for Biological Cybernetics 2007.
- Royer L, Reimann M, Andreopoulos B, Schroeder M: Unraveling protein networks with power graph analysis. PLoS computational biology 2008, 4(7):e1000108. 10.1371/journal.pcbi.1000108
- Park Y, Moore C, Bader JS: Dynamic networks from hierarchical bayesian graph clustering. PloS one 2010, 5: e8118. 10.1371/journal.pone.0008118
- Kass RE, Raftery AE: Bayes Factors. Journal of the American Statistical Association 1995, 90(430):773–795. 10.2307/2291091
- Hofman JM, Wiggins CH: Bayesian approach to network modularity. Phys Rev Lett 2008, 100(25):258701. 10.1103/PhysRevLett.100.258701
- Qi Y, Suhail Y, Lin Yy, Boeke JD, Bader JS: Finding friends and enemies in an enemies-only network: A graph diffusion kernel for predicting novel genetic interactions and co-complex membership from yeast genetic interactions. Genome Research 2008, 18(12):1991–2004. 10.1101/gr.077693.108
- Newman MEJ: Modularity and community structure in networks. Proc Natl Acad Sci USA 2006, 103(23):8577–8582. 10.1073/pnas.0601602103
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006, 34(Database Issue):D535. 10.1093/nar/gkj109
- Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JLY, Toufighi K, Mostafavi S, Prinz J, St Onge RP, VanderSluis B, Makhnevych T, Vizeacoumar FJ, Alizadeh S, Bahr S, Brost RL, Chen Y, Cokol M, Deshpande R, Li Z, Lin ZY, Liang W, Marback M, Paw J, San Luis BJ, Shuteriqi E, Tong AHY, van Dyk N, Wallace IM, Whitney JA, Weirauch MT, Zhong G, Zhu H, Houry WA, Brudno M, Ragibizadeh S, Papp B, Pál C, Roth FP, Giaever G, Nislow C, Troyanskaya OG, Bussey H, Bader GD, Gingras AC, Morris QD, Kim PM, Kaiser CA, Myers CL, Andrews BJ, Boone C: The genetic landscape of a cell. Science 2010, 327(5964):425–31. 10.1126/science.1180823
- Pan X, Ye P, Yuan DS, Wang X, Bader JS, Boeke JD: A DNA integrity network in the yeast Saccharomyces cerevisiae. Cell 2006, 124(5):1069–1081. 10.1016/j.cell.2005.12.036
- Goldberg DS, Roth FP: Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci USA 2003, 100(8):4372–4376. 10.1073/pnas.0735871100
- Ye P, Peyser BD, Pan X, Boeke JD, Spencer FA, Bader JS: Gene function prediction from congruent synthetic lethal interactions in yeast. Mol Syst Biol 2005, 1: 2005.0026. 10.1038/msb4100034
- Clauset A, Shalizi CR, Newman MEJ: Power-Law Distributions in Empirical Data. SIAM Review 2009, 51(4):661. 10.1137/070710111
- Palla G, Derényi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435(7043):814–8. 10.1038/nature03607
- Rivera CG, Vakil R, Bader JS: NeMo: Network Module identification in Cytoscape. BMC Bioinformatics 2010, 11(Suppl 1):S61. 10.1186/1471-2105-11-S1-S61
- Mizushima N, Levine B, Cuervo AM, Klionsky DJ: Autophagy fights disease through cellular self-digestion. Nature 2008, 451(7182):1069–1075. 10.1038/nature06639
- He C, Klionsky DJ: Regulation Mechanisms and Signaling Pathways of Autophagy. Annual Review of Genetics 2009, 43: 67–93. 10.1146/annurev-genet-102808-114910
- Cuervo AM: The plasma membrane brings autophagosomes to life. Nat Cell Biol 2010, 12(8):735–737. 10.1038/ncb0810-735
- Ravikumar B, Moreau K, Jahreiss L, Puri C, Rubinsztein DC: Plasma membrane contributes to the formation of pre-autophagosomal structures. Nat Cell Biol 2010, 12(8):747–757. 10.1038/ncb2078
- Pfeffer SR: Unconventional secretion by autophagosome exocytosis. The Journal of Cell Biology 2010, 188(4):451–452. 10.1083/jcb.201001121
- Qiu J, Noble WS: Predicting Co-Complexed Protein Pairs from Heterogeneous Data. PLoS Comput Biol 2008, 4(4):e1000054. 10.1371/journal.pcbi.1000054
- Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol 2005, 23(5):561–566. 10.1038/nbt1096
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.