RocSampler: regularizing overlapping protein complexes in protein-protein interaction networks
 Osamu Maruyama^{1} and
 Yuki Kuwahara^{2}
https://doi.org/10.1186/s1285901719205
© The Author(s) 2017
Published: 6 December 2017
Abstract
Background
In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources to elucidate various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. It is relatively difficult to identify them due to their simpler internal structure, but unfortunately complexes of such sizes are dominant in major protein complex databases, such as CYC2008. The other difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins, because CYC2008 and other databases include such protein complexes. Thus, how overlaps between predicted complexes are modeled is critical for identifying them simultaneously.
Results
In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), whose scoring function includes a regularization term for the overlaps of predicted complexes and another for the distribution of the sizes of predicted complexes. We have implemented RocSampler in MATLAB, and its executable file for Windows is available at http://imi.kyushuu.ac.jp/~om/software/RocSampler/.
Conclusions
We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that the design of scoring functions including regularization terms is an effective approach for protein complex prediction.
Keywords
Background
In recent years, protein-protein interaction (PPI) datasets have been recognized as important resources to elucidate various biological processes and cellular mechanisms. The prediction of protein complexes from PPIs (see, for example, survey papers [1–3]) is one of the most challenging inference problems from PPIs because protein complexes are essential entities in the cell. Proteins’ functions are manifested in the form of a protein complex. Thus, the identification of protein complexes is necessary for the precise description of biological systems.
For protein complex prediction, many computational methods have been proposed, which were directly or indirectly designed based on the observation that densely connected subgraphs, or clusters of proteins, of a whole PPI network often overlap with known complexes. This observation is often valid for relatively large protein complexes. However, small complexes, consisting of two or three proteins, form a major category of the known complexes of an organism [4, 5]. For example, a yeast protein complex database, CYC2008 [6], with 408 protein complexes includes 172 (42%) complexes consisting of two different proteins (called heterodimeric complexes), and 87 (21%) complexes consisting of three different proteins (called heterotrimeric complexes). Unfortunately, the density measure for a cluster of proteins, being a predicted complex, works less well for smaller clusters because the connectivity of PPIs within such a complex shows little variation. For example, a cluster with two components either has an interaction or not. Thus, how to predict small complexes accurately is a critical issue.
To resolve this issue, we have proposed a sampling-based method for predicting protein complexes, PPSampler2 [4]. The concept of PPSampler2 involves regulating the frequency of the sizes of predicted clusters by a regularization term designed based on the observation that the distribution of the sizes of the complexes of an organism (see, for example, CYC2008 [6] for yeast and CORUM [7] for human) can be approximated by a power-law distribution. Namely, the regularization term evaluates how closely the distribution of the sizes of predicted clusters follows a power-law distribution. The regularization term is used as part of the whole scoring function of PPSampler2. As a result, it is possible to identify small predicted complexes with relatively high accuracy.
However, there is a drawback to the model for the collection of clusters of proteins predicted by PPSampler2. This model involves a partition of all proteins in a given PPI network, and every element with two or more proteins is taken as a predicted complex. Thus, any two predicted complexes are exclusive, namely, they never share any common proteins due to the structure of partition. This partition model is also adopted by the Markov cluster algorithm (MCL), which is a popular node-clustering algorithm for an edge-weighted undirected graph based on the simulation of stochastic flow in the graph [8]. On the other hand, it is known that many complexes overlap with each other, namely they share common proteins. Actually, CYC2008 has 216 pairs of complexes sharing one or more common proteins. In this sense, the partition model is not the best model for a collection of predicted complexes. However, PPSampler2 and MCL are reported to achieve relatively good performance [4]. This implies that the partition model is a good approximation model for a set of predicted complexes.
Some existing methods indirectly allow predicted complexes to overlap with each other. Such methods often adopt the same scheme, which can be called the cluster-expansion approach. This involves repeatedly expanding a cluster of proteins by adding a protein from outside the cluster, where an initial cluster is either a single protein or a pair of proteins sharing an interaction, until a stop criterion is satisfied. After this expansion process is applied to all initial clusters, some of the resulting clusters can overlap with each other. If two predicted clusters have a large overlap, the high-scoring one remains and the other is discarded, or they are merged into one. This pruning process is repeated until there are no large overlaps between clusters. As a result, some clusters still overlap with each other. Examples of the cluster-expansion approach are ClusterONE [9], RRW [10], and NWE [11].
In this work, to address both of the issues of predicting small complexes and overlapping complexes simultaneously, we improve PPSampler2 by relaxing the partition model for a set of predicted complexes, so that predicted complexes are allowed to overlap with each other. To realize this relaxation, we propose a regularization term for controlling overlaps of predicted complexes, and add it as part of the whole scoring function of the new method. Furthermore, we have designed a proposal function, by which a current set of predicted complexes, some of which can overlap with each other, is partially modified into a new one. We call the resulting method RocSampler (Regularizing Overlapping Complexes). In addition, RocSampler uses refined terms of the scoring function of PPSampler2. We have empirically shown that RocSampler is superior to existing methods on five different yeast PPI datasets.
Methods
We construct a Metropolis-Hastings algorithm for P(X,γ) with a fixed constant, T. This algorithm generates a sequence of samples from the distribution over (X,γ). Furthermore, for the Metropolis-Hastings algorithm, we introduce a cooling scheme, that is, a way of decreasing T gradually. Thus, the resulting method becomes a simulated annealing algorithm, shown in Algorithm 1, where a state of (X,γ) is denoted by Z for simplicity. We call the resulting algorithm RocSampler (Regularizing Overlapping Complexes). Among all samples, the one whose score is lowest is returned as the output of an execution.
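The overall loop can be sketched as follows. This is a minimal illustration rather than the authors' Algorithm 1: it assumes that a lower score is better and that a candidate is accepted with probability min{1, exp(−Δ/T)}, it uses a geometric cooling schedule chosen only for the example, and it applies the plain Metropolis rule, so the Hastings correction for an asymmetric proposal is left out. The names `simulated_annealing`, `toy_score`, and `toy_propose` are hypothetical.

```python
import math
import random


def simulated_annealing(initial_state, score, propose, num_steps, t_start, t_end, rng=None):
    """Return the lowest-scoring state visited and its score."""
    rng = rng or random.Random(0)
    state, current = initial_state, score(initial_state)
    best_state, best_score = state, current
    for step in range(num_steps):
        # Assumed geometric cooling from t_start down to t_end.
        frac = step / max(1, num_steps - 1)
        temperature = t_start * (t_end / t_start) ** frac
        candidate = propose(state, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, current = candidate, candidate_score
            if current < best_score:
                best_state, best_score = state, current
    return best_state, best_score


# Toy problem standing in for f(X, gamma): minimize (z - 3)^2 over the integers.
def toy_score(z):
    return (z - 3) ** 2


def toy_propose(z, rng):
    return z + rng.choice((-1, 1))


best_state, best_score = simulated_annealing(
    0, toy_score, toy_propose, num_steps=2000, t_start=1.0, t_end=0.01
)
```

In RocSampler the state would be the pair (X,γ), the score would be f(X,γ), and the proposal would be the four-way move described under "Proposal function".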
In the subsequent section, we give the models of the input and output of the scoring function, f(X,γ), and some notations used throughout this paper. After that, we describe three key components of our methods: (i) the scoring function, f(X,γ), (ii) a proposal function that randomly generates a candidate state, (X′,γ′), from a current one, (X,γ), and (iii) a cooling scheme of T.
Notations
Every predicted cluster, \(x_{i} \in X\), should have two or more components as it models a protein complex. Note that, in this model, clusters are allowed to overlap with each other.
Scoring function
where \(S_{\max}\) is the upper bound on the size of a predicted cluster, whose default value is simply set to 100, and \(c_{clu-dis}\), \(c_{clu-size}\), \(c_{hy}\), and \(c_{pro-num}\) are the coefficients of the corresponding terms.
Here, we briefly explain each term. After that, we give their details. The first term, \(b(X)\), checks the minimum requirements for the predicted clusters in X. Whenever there is a cluster in X violating at least one of them, the resulting probability of X is zero. The second term, \(h_{clu-den}(X)\), calculates the negative of the sum of the generalized densities of the predicted clusters in X. The effectiveness of these two terms for protein complex prediction is empirically shown in our previous works [4, 12]. The term \(h_{clu-dis}(X)\) is a newly introduced regularizer to penalize overlaps between predicted clusters of X. The remaining terms, \(\sum_{s=2}^{S_{\max}} h_{clu-size,s}(X,\gamma)\), \(h_{hy}(\gamma)\), and \(h_{pro-num}(X)\), are regularization terms refined from the original ones of the previous works. The group of terms, \(\sum_{s=2}^{S_{\max}} h_{clu-size,s}(X,\gamma)\), is a regularizer that checks how similar the distribution of the sizes of predicted clusters in X is to the power-law distribution with scaling exponent γ. The term \(h_{pro-num}(X)\) is another regularizer, which restricts the number of proteins included in X.
Basic constraints on the model of a protein complex
The minimum size of predicted clusters is set to be two in our method since a true complex has two or more components. The Boolean term does not include this minimum size requirement because our procedure never produces a predicted cluster with fewer than two components.
Density measure
Regularizing overlaps of clusters
One of the mathematical models representing a set of predicted clusters of proteins is a partition of all proteins of a given set of PPIs, where each element with two or more components in the partition represents a predicted cluster. For example, this model is adopted by MCL [8], SPICi [13], and PPSampler2 [4]. If those clusters were allowed to overlap slightly with each other, the prediction performance of those tools would be expected to improve through the identification of overlapping complexes. We therefore design a regularization term that gives a larger penalty for a larger overlap (that is, a lower dissimilarity) between two predicted clusters.
Note that this dissimilarity measure has a similar role to the repulsive force term used in the task of simultaneously finding multiple sequence motifs [14].
Regularizing the distribution of cluster sizes
The parameter \(\gamma_{0}\) is set to be 2.5, the midpoint of the interval (2,3), which is the typical range of a scaling exponent of power-law distributions in physics, biology, and the social sciences [15]. Note that this prior distribution of γ is introduced in this work, although γ was fixed to be 2 in the previous work [4], which is almost the same as 2.02, the scaling exponent of the power-law regression curve mentioned above.
Regularizing the number of proteins in clusters
Using the term \(h_{pro-num}(X)\), we also control the total number of proteins over all predicted clusters in X. The term is simply formulated as the square of that number, that is,
\[h_{pro-num}(X) = \left| \bigcup_{x \in X} x \right|^{2}.\]
This term is simpler than the corresponding term, \(\left(\left| \bigcup_{x \in X} x \right| - \lambda \right)^{2}\), given in the previous work [4, 12], where λ is a parameter representing a target number of proteins over all clusters. Thus, we do not need to specify that parameter in our new method.
Proposal function

randomly add a new cluster with two components to a set of predicted clusters, X,

randomly add a new protein to a cluster in X,

randomly remove a cluster with two components in X, and

randomly remove a protein from a cluster in X.
Details of the four procedures are explained in the subsequent sections. After executing one of the above four options, the proposal function subsequently proposes a new candidate value of γ, which is max{10^{−10},γ+ε} where \(\varepsilon \sim \mathcal {N}(0,0.001)\). Note that \(\mathcal {N}(\mu,\sigma ^{2})\) is the normal distribution with mean parameter, μ, and variance parameter, σ ^{2}. The minimum value of 10^{−10} is used to avoid the value γ being negative.
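The update of γ described above can be sketched as follows. The function name `propose_gamma` and the constants are hypothetical; the only point that needs care is that \(\mathcal{N}(0,0.001)\) is parameterized by its variance, whereas Python's `random.gauss` expects a standard deviation.

```python
import math
import random

GAMMA_FLOOR = 1e-10    # minimum value that keeps gamma positive
STEP_VARIANCE = 0.001  # variance of the Gaussian perturbation


def propose_gamma(gamma, rng):
    """Return max{1e-10, gamma + eps} with eps drawn from N(0, 0.001)."""
    # random.gauss takes the standard deviation, so convert from the variance.
    epsilon = rng.gauss(0.0, math.sqrt(STEP_VARIANCE))
    return max(GAMMA_FLOOR, gamma + epsilon)


rng = random.Random(1)
clamped = propose_gamma(-5.0, rng)  # far below zero, so the floor applies
nearby = propose_gamma(2.5, rng)    # small perturbation around 2.5
```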
Adding a new cluster with two components
Adding a protein to a cluster
 1. A cluster, x, is uniformly chosen at random from X.
 2. A protein, u, is randomly chosen from N(x) with probability proportional to w(u,x), which is the sum of the weights of the interactions between u and all components of x.
 3. The chosen protein, u, is added to x.
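The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: interaction weights are stored in a dictionary keyed by unordered protein pairs, and N(x) is taken to be the set of proteins outside x that interact with at least one member of x. The function and variable names are hypothetical.

```python
import random


def connection_weight(u, cluster, weights):
    """w(u, x): sum of the weights of the interactions between u and members of x."""
    return sum(weights.get(frozenset((u, v)), 0.0) for v in cluster)


def propose_add_protein(clusters, weights, rng):
    """Return a copy of `clusters` in which one cluster has gained one protein."""
    # Step 1: choose a cluster uniformly at random.
    index = rng.randrange(len(clusters))
    cluster = clusters[index]
    # Assumed N(x): proteins outside x with positive total weight to x.
    all_proteins = {p for pair in weights for p in pair}
    neighbors = sorted(
        u for u in all_proteins - cluster
        if connection_weight(u, cluster, weights) > 0
    )
    if not neighbors:
        return list(clusters)  # nothing can be added; keep the state unchanged
    # Step 2: choose u with probability proportional to w(u, x).
    scores = [connection_weight(u, cluster, weights) for u in neighbors]
    chosen = rng.choices(neighbors, weights=scores, k=1)[0]
    # Step 3: add u to x.
    updated = list(clusters)
    updated[index] = cluster | {chosen}
    return updated


toy_weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.5,
    frozenset(("C", "D")): 0.4,
}
# Only C interacts with the cluster {A, B}, so C is the only possible addition.
result = propose_add_protein([{"A", "B"}], toy_weights, random.Random(0))
```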
Removing a cluster with two components
This procedure removes a cluster with two components from X. It chooses a cluster, x, of size two from X at random with probability proportional to the inverse of the weight of the unique interaction of x. The probability of this proposal is given as
Removing a protein from a cluster
 1. A cluster, x, is uniformly chosen at random from the clusters with three or more components in X.
 2. A protein, u, in x is randomly chosen with probability proportional to 1/w(u,x), representing the inverse of the strength of the connectivity between u and x.
 3. The chosen protein, u, is removed from x.
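The removal steps above can be sketched in the same style. The dictionary representation of the weights, the function names, and the small `floor` that guards against division by zero when a member has no weighted interaction with the rest of its cluster are all assumptions made for this example.

```python
import random


def connection_weight(u, cluster, weights):
    """w(u, x): sum of the interaction weights between u and the other members of x."""
    return sum(weights.get(frozenset((u, v)), 0.0) for v in cluster if v != u)


def propose_remove_protein(clusters, weights, rng, floor=1e-12):
    """Return a copy of `clusters` in which one cluster of size >= 3 has lost a protein."""
    # Step 1: choose uniformly among clusters with three or more components.
    eligible = [i for i, x in enumerate(clusters) if len(x) >= 3]
    if not eligible:
        return list(clusters)  # no cluster can shrink; keep the state unchanged
    index = rng.choice(eligible)
    cluster = clusters[index]
    # Step 2: choose u with probability proportional to 1 / w(u, x).
    members = sorted(cluster)
    inverse = [1.0 / max(connection_weight(u, cluster, weights), floor) for u in members]
    removed = rng.choices(members, weights=inverse, k=1)[0]
    # Step 3: remove u from x.
    updated = list(clusters)
    updated[index] = cluster - {removed}
    return updated


toy_weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.1,
}
result = propose_remove_protein([{"A", "B"}, {"A", "B", "C"}], toy_weights, random.Random(0))
```

Restricting the choice to clusters with three or more components keeps every cluster at size two or larger, which matches the minimum size stated under "Basic constraints on the model of a protein complex".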
Cooling schedule for the temperature
Performance measure
The former represents the subset of \(\mathcal{X}\), each of which matches at least one known complex in \(\mathcal{K}\) with η. The latter is the subset of \(\mathcal{K}\), each of which matches at least one predicted cluster in \(\mathcal{X}\) with η. For an integer i (≥2), we denote by \(X_{i}\) the subset of X whose elements have i components, that is, \(X_{i} = \{x \in X \mid |x| = i\}\), and by \(X_{\geq i}\) the subset of X whose elements have i or more components, that is, \(X_{\geq i} = \{x \in X \mid |x| \geq i\}\). Similarly, we introduce the notations of \(K_{i}\) and \(K_{\geq i}\) for K. We then formulate the precision and recall as follows:
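A minimal sketch of such an evaluation is given below. It assumes that a predicted cluster and a known complex match when their Jaccard index is at least η, and that precision and recall are the fractions of matched predicted clusters and matched known complexes, respectively, without stratifying by size. The matching criterion, the value of η, and the function names are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard index of two non-empty protein sets."""
    return len(a & b) / len(a | b)


def precision_recall(predicted, known, eta):
    """Return (precision, recall) under Jaccard matching with threshold eta."""
    matched_predicted = [
        x for x in predicted if any(jaccard(x, k) >= eta for k in known)
    ]
    matched_known = [
        k for k in known if any(jaccard(x, k) >= eta for x in predicted)
    ]
    precision = len(matched_predicted) / len(predicted) if predicted else 0.0
    recall = len(matched_known) / len(known) if known else 0.0
    return precision, recall


toy_predicted = [{"A", "B"}, {"C", "D", "E"}, {"X", "Y"}]
toy_known = [{"A", "B"}, {"C", "D"}, {"P", "Q"}, {"R", "S"}]
# {C, D, E} matches {C, D} because their Jaccard index is 2/3 >= 0.5.
precision, recall = precision_recall(toy_predicted, toy_known, eta=0.5)
```

Size-specific scores could be obtained by first filtering `predicted` and `known` to the clusters with exactly i, or at least i, components.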
Results and discussion
Input PPI datasets and gold standard protein complexes
Table 1 Input PPI datasets
Dataset          #Protein  #PPI    Average degree  Threshold
WI-PHI           5,953     49,607  16.7            N/A
Collins          1,622     9,074   11.2            top 9,074
Krogan core      2,708     7,123   5.3             0.273
Krogan extended  3,672     14,317  7.8             0.101
Gavin            1,855     7,669   8.3             5
In addition to the WI-PHI dataset, we also use four other weighted PPI datasets, denoted by Collins [19], Gavin [20], Krogan core, and Krogan extended [21], which were also used in [9]. As shown in Table 1, the number of proteins included in each of these four datasets is much smaller than the number of all yeast proteins, which is about 6,000. Those datasets are filtered by thresholds on their weights, shown in Table 1, to retain reliable PPIs. Those thresholds are the same as in the original papers [19–21] of the PPI datasets and the work [9].
The frequency of overlap sizes of protein complexes in CYC2008
Overlap size  1    2   3  4   5  6  7   8  9  10  11  12  13  14  15  16  17
Frequency     151  22  9  13  4  1  10  1  0  1   1   1   0   0   0   1   1
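A frequency table of this kind can be derived by counting, for every pair of complexes, the number of shared proteins. The sketch below shows the counting on a toy set of complexes (not CYC2008 itself) and checks that the frequencies listed above sum to the 216 overlapping pairs mentioned in the Background. The function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def overlap_size_frequencies(complexes):
    """Count pairs of complexes by the number of proteins they share (pairs sharing none are skipped)."""
    counts = Counter()
    for a, b in combinations(complexes, 2):
        shared = len(a & b)
        if shared > 0:
            counts[shared] += 1
    return counts


toy_complexes = [{"A", "B", "C"}, {"C", "D"}, {"A", "B", "E"}, {"F", "G"}]
toy_counts = overlap_size_frequencies(toy_complexes)

# Frequencies for overlap sizes 1 to 17, copied from the table above.
reported = [151, 22, 9, 13, 4, 1, 10, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1]
total_overlapping_pairs = sum(reported)
```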
Performance comparison
To evaluate how well RocSampler works, we carry out a performance comparison with existing methods: MCL [8], SPICi [13], ClusterONE [9], NWE [11], and PPSampler2 [4]. For each tool and each PPI dataset, the parameter set with the highest F-measure is determined as follows.
MCL is a popular clustering-based method. It alternately repeats two different steps. One is the expansion step, which takes the square of a current transition matrix of an input PPI network. The other is the inflation step, in which all transition probabilities are raised to the power of the value of the inflation parameter and normalized. The inflation parameter is optimized over the range from 1.2 to 5.0 in steps of 0.1.
SPICi is a clustering algorithm using the weighted version of the standard density measure. The parameters of minimum cluster density and minimum support threshold are independently chosen in the range from 0.1 to 0.9 in steps of 0.1. The graph mode parameter is also optimized over 0 (sparse graph), 1 (dense graph), and 2 (large sparse graph).
ClusterONE is also a clustering algorithm using a cohesiveness score. The most important parameter is the minimum density of predicted complexes. We optimized the parameter value over the range from 0.1 to 0.9 in steps of 0.1.
NWE executes random walks with restarts and constructs predicted clusters based on the probability from one protein to another obtained from the random walks. Here, three parameters are optimized. The restart probability takes the range from 0.4 to 0.8 in steps of 0.1. The early cutoff is optimized in the range from 0.3 to 0.7 in steps of 0.1. The overlap threshold is selected from the range from 0.1 to 0.4 in steps of 0.1.
PPSampler2 is an MCMC (Markov chain Monte Carlo)-based method whose structure of a set of predicted clusters is a partition of proteins. The following four parameters are optimized. The coefficient of the term regulating the power-law distribution of sizes of predicted clusters is selected among 500, 1,000, and 1,500. The scaling exponent is optimized over 2.0, 2.5, and 3.0. The coefficient of the term regulating the number of proteins over predicted clusters is selected from 10^{5}, 10^{6}, and 10^{7}. The target number of proteins used in that term, λ, is selected from 1,000, 2,000, and 3,000.
For RocSampler, the parameter β and the four coefficients of the scoring function are optimized over the ranges: β∈{0.2,0.3,0.4}, \(c_{clu-size}\)∈{200,300,400,500}, \(c_{hy}\)∈{5,10,15,20}, \(c_{pro-num}\)∈{5×10^{−5},10^{−4},1.5×10^{−4}}, and \(c_{clu-dis}\)∈{70,90,110,130,150,170}. The repeat count, L, is fixed to 5,000,000.
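The search over the RocSampler ranges listed above can be sketched as an exhaustive grid search. The code below only enumerates the parameter sets and picks the one that maximizes a user-supplied F-measure function; that function, the dictionary keys, and the helper names are hypothetical, and whether the authors searched the full Cartesian product is an assumption.

```python
from itertools import product

ROCSAMPLER_GRID = {
    "beta": [0.2, 0.3, 0.4],
    "c_clu_size": [200, 300, 400, 500],
    "c_hy": [5, 10, 15, 20],
    "c_pro_num": [5e-5, 1e-4, 1.5e-4],
    "c_clu_dis": [70, 90, 110, 130, 150, 170],
}


def parameter_sets(grid):
    """Yield every combination of values in the grid as a dictionary."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def best_parameter_set(grid, f_measure_of):
    """Return the parameter set with the highest value of f_measure_of."""
    return max(parameter_sets(grid), key=f_measure_of)


num_sets = sum(1 for _ in parameter_sets(ROCSAMPLER_GRID))


# Stand-in objective used only to exercise the search; a real run would
# execute the predictor and score its output against the gold standard.
def toy_objective(params):
    return -abs(params["c_clu_dis"] - 110)


best = best_parameter_set(ROCSAMPLER_GRID, toy_objective)
```

With these ranges the full grid contains 3 × 4 × 4 × 3 × 6 = 864 parameter sets per dataset.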
Note that MCL, SPICi, and PPSampler2 do not allow predicted clusters to overlap with each other.
Prediction from WI-PHI
Selected parameters
Method      Parameters                                                Value
MCL         Inflation                                                 3.4
SPICi       Density, support, graph                                   0.1, 0.5, 0
ClusterONE  Density                                                   0.2
NWE         Restart, cutoff, overlap                                  0.4, 0.3, 0.3
PPSampler2  Size dist coef, scaling exp, protein num coef, λ          500, 3, 10^{6}, 2000
RocSampler  c_{clu-dis}, β, c_{clu-size}, c_{hy}, c_{pro-num}         110, 0.2, 500, 10, 5×10^{−5}
We here compare the performances of PPSampler2 and RocSampler in detail because RocSampler is an improved version of PPSampler2. The precision scores of PPSampler2 and RocSampler are 145/396 = 0.37 and 147/281 = 0.52, respectively. On the other hand, their recall scores are, as mentioned, the same, 156/408 = 0.38. Thus, RocSampler improves the precision score without reducing the recall score. As a result, the F-measure score of RocSampler, 0.44, is 19% higher than that of PPSampler2, 0.37.
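The reported F-measure scores can be reproduced from the precision and recall fractions above, assuming the standard definition of the F-measure as the harmonic mean of precision and recall. The function and variable names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed standard definition)."""
    return 2 * precision * recall / (precision + recall)


recall_both = 156 / 408
roc_f = f_measure(147 / 281, recall_both)   # RocSampler
pps2_f = f_measure(145 / 396, recall_both)  # PPSampler2

print(round(roc_f, 2), round(pps2_f, 2))  # prints: 0.44 0.37
```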
Surprisingly, no predicted clusters of RocSampler overlap with others, although we had expected that some would overlap with each other. A relatively sparse set of predicted clusters might be a good approximation to the current gold standard protein complexes, although further investigation of this issue is required.
We have mentioned that the scaling exponent of the power-law regression curve in Fig. 1 is 2.02. The value of γ found by RocSampler is 1.91, which is close to that exponent.
Prediction from other PPI datasets
Selected parameters for the Collins, Gavin, Krogan core, and Krogan extended PPI datasets
Method      Collins                         Gavin                           Krogan core                     Krogan extended
MCL         2.1                             2.5                             2.4                             1.6
SPICi       0.1, 0.5, 0                     0.4, 0.4, 0                     0.6, 0.3, 0                     0.6, 0.4, 0
ClusterONE  0.7                             0.4                             0.6                             0.7
NWE         0.4, 0.3, 0.1                   0.4, 0.3, 0.2                   0.4, 0.3, 0.4                   0.4, 0.7, 0.1
PPSampler2  1500, 3, 10^{7}, 1000           500, 2.5, 10^{5}, 1000          1000, 2, 10^{7}, 1000           1500, 3, 10^{7}, 1000
RocSampler  90, 0.3, 300, 5, 1.5×10^{−4}    170, 0.2, 200, 20, 10^{−4}      170, 0.3, 500, 15, 1.5×10^{−4}  150, 0.3, 200, 15, 1.5×10^{−4}
Example of overlapping clusters
RocSampler has succeeded in predicting overlapping clusters only from the Collins PPI dataset. We here give an example of such overlapping clusters, which are good predictions of known complexes.
On the other hand, PPSampler2 found the cluster with Mud1p, Luc7p, Prp42p, Snu56p, Snu71p, Nam8p, Snp1p, Prp40p, Yhc1p, Prp39p, Sto1p, Cbc2p, and Smx3p. This cluster includes only Smx3p among the seven components of the heteroheptameric complex. Although it matches the commitment complex and U1 snRNP complex, the Jaccard indexes are 0.61 and 0.58, lower than the corresponding ones of RocSampler. It can be expected that all or most of the remaining components of the heteroheptameric complex are included in another cluster which matches the U4/U6.U5 tri-snRNP complex, but PPSampler2 failed to find such a cluster. Thus, we can say that, by allowing predicted clusters to overlap with each other, more refined predictions are obtained.
Conclusion
In this work, we have proposed a novel sampling-based protein complex prediction method, RocSampler, which is a successor to PPSampler2. The major difference between them is that RocSampler exploits a regularization term for controlling overlaps of predicted clusters, whereas PPSampler2 does not allow predicted clusters to overlap with each other. RocSampler also introduces a new proposal function for generating overlapping clusters and regularization terms refined from those of PPSampler2. We have shown that RocSampler outperforms five other methods on five different PPI datasets. RocSampler has succeeded in finding overlapping clusters from the Collins PPI dataset, but it has not done so from the other PPI datasets. Future work is required to identify the reason for this and to devise a new scoring function to attain higher performance and simultaneously to find overlapping clusters of proteins.
Declarations
Acknowledgements
Not applicable.
Funding
This work was supported by JSPS KAKENHI Grant Numbers JP26330330, JP17K00407. Publication costs were funded by JSPS KAKENHI Grant Number JP17K00407.
Availability of data and materials
WI-PHI: https://application.wileyvch.de/contents/jc_2120/2007/pro200600448_s.html
Other PPI datasets: http://www.paccanarolab.org/static_content/clusterone/cl1_datasets.zip.
CYC2008: http://wodaklab.org/cyc2008/.
MCL: https://micans.org/mcl/.
SPICi: http://compbio.cs.princeton.edu/spici/.
ClusterONE: http://www.paccanarolab.org/clusterone/
NWE, PPSampler2, and RocSampler: http://imi.kyushuu.ac.jp/~om/
About this supplement
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 15, 2017: Selected articles from the 6th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume18supplement15.
Authors’ contributions
In the initial stage of this research project, YK participated in designing the computational methods, implementing the computer programs, executing the computational experiments, and analyzing the outputs. OM carried out all remaining tasks. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 1. Brohée S, van Helden J. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006; 7:488.
 2. Li X, Wu M, Kwoh CK, Ng SK. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics. 2010; 11(Suppl 1):3.
 3. Srihari S, Leong HW. A survey of computational methods for protein complex prediction from protein interaction networks. J Bioinforma Comput Biol. 2013; 11:1230002.
 4. Widita CK, Maruyama O. PPSampler2: Predicting protein complexes more accurately and efficiently by sampling. BMC Syst Biol. 2013; 7(Suppl 6):14.
 5. Yong C, Maruyama O, Wong L. Discovery of small protein complexes from PPI networks with size-specific supervised weighting. BMC Syst Biol. 2014; 8(Suppl 5):3.
 6. Pu S, Wong J, Turner B, Cho E, Wodak SJ. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009; 37:825–31.
 7. Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Mewes HW. CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res. 2010; 38:497–501.
 8. Enright AJ, Dongen SV, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002; 30:1575–84.
 9. Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012; 9:471–2.
 10. Macropol K, Can T, Singh AK. RRW: Repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics. 2009; 10:283.
 11. Maruyama O, Chihara A. NWE: Node-weighted expansion for protein complex prediction using random walk distances. Proteome Sci. 2011; 9(Suppl 1):14.
 12. Kobiki S, Maruyama O. ReSAPP: predicting overlapping protein complexes by merging multiple-sampled partitions of proteins. J Bioinform Comput Biol. 2014; 12(6):1442004.
 13. Jiang P, Singh M. SPICi: a fast clustering algorithm for large biological networks. Bioinformatics. 2010; 26:1105–11.
 14. Ikebata H, Yoshida R. Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets. Bioinformatics. 2015; 31(10):1561–68.
 15. Clauset A, Shalizi CR, Newman MEJ. Power-law distributions in empirical data. SIAM Rev. 2009; 51:661–703.
 16. Yong CH, Wong L. From the static interactome to dynamic protein complexes: Three challenges. J Bioinforma Comput Biol. 2015; 13:1571001.
 17. Maruyama O, Wong L. Regularizing predicted complexes by mutually exclusive protein-protein interactions. In: Proc. of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015: 2015. p. 1068–75.
 18. Kiemer L, Costa S, Ueffing M, Cesareni G. WI-PHI: A weighted yeast interactome enriched for direct physical interactions. Proteomics. 2007; 7:932–43.
 19. Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege FCP, Weissman JS, Krogan NJ. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics. 2007; 6:439–50.
 20. Gavin AC, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006; 440:631–6.
 21. Krogan NJ, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006; 440:637–43.
 22. Neubauer G, Gottschalk A, Fabrizio P, Séraphin B, Lührmann R, Mann M. Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc Natl Acad Sci U S A. 1997; 94:385–90.
 23. Legrain P, Seraphin B, Rosbash M. Early commitment of yeast pre-mRNA to the spliceosome pathway. Mol Cell Biol. 1988; 8:3755–60.
 24. Gottschalk A, Neubauer G, Banroques J, Mann M, Lührmann R, Fabrizio P. Identification by mass spectrometry and functional analysis of novel proteins of the yeast [U4/U6.U5] tri-snRNP. EMBO J. 1999; 18(16):4535–48.
 25. Stevens S, Abelson J. Purification of the yeast U4/U6.U5 small nuclear ribonucleoprotein particle and identification of its proteins. Proc Natl Acad Sci U S A. 1999; 96:7226–31.