Detecting temporal protein complexes from dynamic protein-protein interaction networks
 Le OuYang^{1},
 DaoQing Dai^{1},
 XiaoLi Li^{2},
 Min Wu^{2},
 XiaoFei Zhang^{3} and
 Peng Yang^{2}
https://doi.org/10.1186/1471-2105-15-335
© OuYang et al.; licensee BioMed Central Ltd. 2014
Received: 10 May 2014
Accepted: 23 September 2014
Published: 4 October 2014
Abstract
Background
Proteins dynamically interact with each other to perform their biological functions. The dynamic operation of protein-protein interaction (PPI) networks is also reflected in the dynamic formation of protein complexes. Existing protein complex detection algorithms usually overlook the inherent temporal nature of protein interactions within PPI networks. Systematically analyzing temporal protein complexes can not only improve the accuracy of protein complex detection, but also strengthen our biological knowledge of the dynamic protein assembly processes for cellular organization.
Results
In this study, we propose a novel computational method to predict temporal protein complexes. In particular, we first construct a series of dynamic PPI networks by joint analysis of time-course gene expression data and protein interaction data. Then a Time Smooth Overlapping Complex Detection model (TSOCD) is proposed to detect temporal protein complexes from these dynamic PPI networks. TSOCD can naturally capture the smoothness of networks between consecutive time points and detect overlapping protein complexes at each time point. Finally, a nonnegative matrix factorization based algorithm is introduced to merge very similar temporal complexes across different time points.
Conclusions
Extensive experimental results demonstrate that the proposed method is more effective in detecting temporal protein complexes than state-of-the-art complex detection techniques.
Keywords
Background
With the technological advances in high-throughput screening techniques, large-scale protein-protein interaction (PPI) data have been generated and catalogued for many species [1–3]. Proteins seldom act alone; they often bind together to form complexes that carry out their biological functions [4–6]. Comprehensive investigation of protein complexes could help to reveal the structure of PPI networks, predict protein functions and elucidate cellular mechanisms underlying various diseases [7]. Computational detection of complexes has thus attracted tremendous attention during the past decade [6, 8–14].
According to their lifetime, PPIs can be classified as stable or transient [15, 16]. Stable PPIs, which are important in maintaining cell fitness and stability, are usually permanent and irreversible. Transient PPIs, in contrast, associate and dissociate temporarily, and thus provide a mechanism for the cell to respond quickly to extracellular stimuli. Because the physical interactions determined by popular high-throughput technologies, e.g. yeast two-hybrid (Y2H) and Tandem Affinity Purification with mass spectrometry (TAP-MS), lack temporal information, the majority of existing complex detection methods treat the PPI network as a static network, which cannot be used to detect temporal protein complexes. In reality, however, cellular systems are highly dynamic and responsive to environmental cues [17]. The real PPI network in a cell keeps changing over different stages of the cell cycle [18], leading to multiple dynamic protein interaction networks. As such, it is desirable to design novel computational methods that take the inherent dynamic characteristics of PPI networks into consideration to better detect temporal protein complexes.
The advent of DNA microarray technologies has enabled the differential expression of thousands of genes under various experimental conditions to be monitored simultaneously and quantitatively [19, 20], which provides useful temporal information to complement the static protein interaction data at the gene level. There have been some attempts to investigate the temporal properties of individual proteins and protein interactions by integrating PPI data with time-course gene expression data [21–29]. For example, in [22], the authors proposed a three-sigma principle to identify active time points for individual proteins. They further investigated the temporal protein associations and protein state transitions at the identified active time points.
Temporal protein complexes are typically constructed by the dynamic assembly or disassembly of proteins to perform various biological functions. Tracking temporal protein complexes could reveal important insights into dynamic modular mechanisms and improve our understanding of disease pathways [23, 30]. To detect temporal protein complexes, we need to leverage the temporal information from gene expression data to construct time-evolving dynamic protein interaction networks. In [31], the authors incorporated the "time" factor for proteins, in the form of cell-cycle phases, into the analysis of complexes and studied the temporal phenomena of complex assembly and disassembly across various cell cycles. Wang et al. identified temporal protein complexes from the dynamic PPI networks by applying static complex detection methods (e.g., MCL) at each time point [22]. In [28], the authors proposed the DHAC (Dynamical Hierarchical Agglomerative Clustering) complex mining method to detect temporal complexes from individual dynamic PPI networks.
We observe that the above methods for predicting temporal protein complexes suffer from two major limitations. Firstly, they focus on the individual dynamic PPI networks and fully ignore the correlations between the networks at consecutive time points. Note that while different temporal complexes occur at different time points, many protein complexes will still form stable macromolecular complexes to perform their important biological functions [21]. As many stable interactions that perform fundamental roles for the cell are conserved across different time points, the corresponding complexes will also occur in multiple consecutive dynamic PPI networks, and they should thus change smoothly across time [24, 26] to maintain cell fitness and stability and to avoid disrupting the basic operations of the cell. These existing methods, however, have overlooked the smoothness of the temporal complexes at different time points and simply apply static complex detection methods to each individual dynamic PPI network. Secondly, as multifunctional proteins are often involved in different complexes, it is highly desirable to discover overlapping complexes to better decipher the inherent overlapping modular structures of PPI networks. However, existing methods, namely DHAC and MCL, do not generate overlapping protein complexes and are thus less accurate.
To address the above two issues, in this paper we propose a novel technique to detect temporal protein complexes from dynamic PPI networks. We first construct a series of dynamic PPI networks by identifying stable and transient interactions through the integration of protein interaction data and gene expression data. In particular, the stable interactions are preserved across different time points to serve as the backbone of the protein interaction networks, while the existence of a transient interaction at a certain time point depends on the specific activities and functions required from the two associated proteins. Then, based on the concept of overlapping temporal communities [32], we propose a novel Time Smooth Overlapping Complex Detection model (TSOCD) to detect overlapping temporal protein complexes from the constructed dynamic PPI networks, which allows an individual complex to grow and shrink across different time points. Finally, a Nonnegative Matrix Factorization (NMF) based method is introduced to merge very similar temporal complexes across time and track their evolutionary process. We have performed extensive experiments to evaluate the performance of our TSOCD model. Experimental results show that TSOCD achieves significantly better results than state-of-the-art algorithms for detecting protein complexes. Moreover, our algorithm is available as a tool, which can be downloaded from http://mail.sysu.edu.cn/home/stsddq@mail.sysu.edu.cn/dai/others/TSOCD.zip.
Methods
In this section, we first present how to construct dynamic PPI networks, and subsequently introduce how to detect overlapping temporal protein complexes from the constructed dynamic PPI networks.
Constructing dynamic PPI networks
The dynamic protein-protein interaction networks (DPPI networks) are constructed by integrating time-course gene expression data with static PPI networks. A static PPI network is often modelled as an undirected graph G = (V,E), where V consists of |V| = N proteins and E consists of |E| edges (protein interactions under different conditions between two proteins in V). The time-course gene expression data of these N proteins across T time points are represented by an N × T matrix GE, whose entry GE_{ it } is the expression level of gene i at time point t.
Now, we infer a DPPI network for each time point from GE and G. Existing methods construct DPPI networks solely by determining the peak time points of expression for each protein [22], and the connections among the networks at different time points are ignored. To address this problem, we first extract stable protein interactions from G, which are supposed to appear at all time points, as they are encoded by globally co-expressed gene pairs [27]. In particular, for each protein interaction in G, we calculate the Pearson Correlation Coefficient (PCC) of the two proteins based on their gene expression profiles across all time points in GE. The protein interactions with PCC values greater than a certain cutoff δ are then defined as stable interactions, owing to their globally co-expressed genes (we discuss how to determine the value of δ in the next section). These stable interactions represent the static part of the DPPI networks and are likely to be preserved across all time points. An N × N symmetric matrix S is introduced to indicate the stable interactions in the given PPI network G = (V,E), where S_{ ij } = 1 if proteins i and j have a stable interaction, i.e. e_{ ij } ∈ E and PCC(e_{ ij }) > δ; S_{ ij } = 0 otherwise.
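The construction of S described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `stable_interaction_matrix`, the toy expression matrix and the cutoff value 0.5 are assumptions made only for this example.

```python
import numpy as np

def stable_interaction_matrix(edges, GE, delta):
    """Return the N x N 0/1 matrix S of stable interactions.

    edges: iterable of (i, j) protein index pairs from the static network G.
    GE:    N x T array of expression values.
    delta: PCC cutoff (hypothetical value supplied by the caller).
    """
    N = GE.shape[0]
    S = np.zeros((N, N), dtype=int)
    for i, j in edges:
        # Pearson correlation of the two expression profiles over all T points.
        pcc = np.corrcoef(GE[i], GE[j])[0, 1]
        if pcc > delta:
            S[i, j] = S[j, i] = 1  # keep S symmetric
    return S

# Toy data: proteins 0 and 1 are co-expressed, proteins 0 and 2 are not.
GE = np.array([[1.0, 2.0, 3.0, 4.0],
               [2.0, 4.1, 6.0, 8.2],
               [4.0, 1.0, 3.5, 0.5]])
edges = [(0, 1), (0, 2)]
S = stable_interaction_matrix(edges, GE, delta=0.5)
```

Only pairs that are already edges of G are tested, so a highly correlated pair without a physical interaction never enters S.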
The remaining interactions are treated as transient, and their presence at a given time point depends on whether both proteins are active at that time point. Following the three-sigma principle of [22], the active threshold of protein i is $\mathit{\text{AT}}(i)=u(i)+3\sigma (i)\left(1-F(i)\right)$, where $u(i)=\frac{1}{T}{\sum}_{t=1}^{T}{\mathit{\text{GE}}}_{\mathit{\text{it}}}$ and σ(i) are the arithmetic mean and standard deviation of the expression values over times 1 to T for protein i respectively, and F(i) = 1/(1 + σ^{2}(i)) is a weight function which reflects the fluctuation of the expression values of protein i. For more details, please refer to [22]. Each edge in the static PPI network (i.e., e_{ ij } ∈ E) is present at time point t if proteins i and j are both in their active states (i.e., GE_{ it } ≥ AT(i) and GE_{ jt } ≥ AT(j)). The dynamic PPI networks can be represented by a set of graphs, G^{(t)} = (V,E^{(t)}), t = 1,…,T, where V denotes the original set of proteins and E^{(t)} represents the set of edges present at time point t. In particular, edge ${e}_{\mathit{\text{ij}}}^{(t)}\in {E}^{(t)}$ if S_{ ij } = 1 (i.e. stable interaction) or if e_{ ij } ∈ E, GE_{ it } ≥ AT(i) and GE_{ jt } ≥ AT(j) (i.e. transient interaction). For each dynamic PPI network G^{(t)}, ${A}^{(t)}=\left[{A}_{\mathit{\text{ij}}}^{(t)}\right]\in {\{0,1\}}^{N\times N}$ is introduced to represent its adjacency matrix, where ${A}_{\mathit{\text{ij}}}^{(t)}=1$ if ${e}_{\mathit{\text{ij}}}^{(t)}\in {E}^{(t)}$ and ${A}_{\mathit{\text{ij}}}^{(t)}=0$ otherwise.
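The rule for building the adjacency matrices A^{(t)} can be illustrated as follows. The sketch assumes the three-sigma active threshold of [22] in the form AT(i) = u(i) + 3σ(i)(1 − F(i)), and it uses the population standard deviation; both are our reading rather than something confirmed by the text above. The function names and the toy data are hypothetical.

```python
import numpy as np

def active_thresholds(GE):
    """Active threshold AT(i) for every protein (assumed form from [22])."""
    u = GE.mean(axis=1)
    sigma = GE.std(axis=1)          # population std; an assumption
    F = 1.0 / (1.0 + sigma ** 2)    # weight reflecting expression fluctuation
    return u + 3.0 * sigma * (1.0 - F)

def dynamic_adjacency(edges, S, GE):
    """Return a T x N x N 0/1 array whose slice t is A^(t)."""
    N, T = GE.shape
    AT = active_thresholds(GE)
    active = GE >= AT[:, None]      # N x T boolean activity table
    A = np.zeros((T, N, N), dtype=int)
    for i, j in edges:
        if S[i, j] == 1:
            present = np.ones(T, dtype=bool)    # stable: kept at every t
        else:
            present = active[i] & active[j]     # transient: both active
        A[:, i, j] = A[:, j, i] = present
    return A

# Toy data: 3 proteins, 4 time points.
GE = np.array([[0.0, 0.0, 0.0, 0.4],
               [0.0, 0.0, 0.4, 0.4],
               [0.4, 0.0, 0.0, 0.0]])
edges = [(0, 1), (1, 2)]
S = np.zeros((3, 3), dtype=int)
S[1, 2] = S[2, 1] = 1               # declare edge (1, 2) stable
A = dynamic_adjacency(edges, S, GE)
```

In this toy example the transient edge (0, 1) appears only at the last time point, where both proteins exceed their thresholds, while the stable edge (1, 2) is present in every network.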
Detecting overlapping temporal protein complexes
Our objective is to infer D^{(t)}(1 ≤ t ≤ T), a sequence of time-evolving protein complexes, from the dynamic networks G^{(t)}(1 ≤ t ≤ T). Let ${D}^{(t)}=\left\{{D}_{k}^{(t)},k=1,\dots ,{r}_{t}\right\}$ contain r_{ t } predicted complexes at time point t. We define an N × r_{ t } protein-complex assignment matrix H^{(t)} to indicate the membership of proteins in complexes, where ${H}_{\mathit{\text{ik}}}^{(t)}=1$ if protein i belongs to a complex ${D}_{k}^{(t)}\left(\text{i.e.}\phantom{\rule{2.77626pt}{0ex}}i\in {D}_{k}^{(t)}\right)$, and ${H}_{\mathit{\text{ik}}}^{(t)}=0$ otherwise. Here we allow overlapping proteins to occur in multiple protein complexes simultaneously, i.e. ${H}_{\mathit{\text{ik}}}^{(t)}=1$, ${H}_{\mathit{\text{iz}}}^{(t)}=1$, and k ≠ z. Obviously, if we can compute H^{(t)}, we can easily infer D^{(t)}.
We further introduce another N × N matrix U^{(t)}, where each element ${U}_{\mathit{\text{ij}}}^{(t)}$ is the number of predicted complexes in D^{(t)} which contain both proteins i and j, i.e., ${U}_{\mathit{\text{ij}}}^{(t)}=\left|\left\{{D}_{k}^{(t)}\in {D}^{(t)}:i\in {D}_{k}^{(t)},j\in {D}_{k}^{(t)},1\le k\le {r}_{t}\right\}\right|$. Clearly, U^{(t)} represents the co-complex membership among proteins at time point t, which allows a protein to belong to more than one complex. Meanwhile, we have ${U}_{\mathit{\text{ij}}}^{(t)}=\sum _{k=1}^{{r}_{t}}{H}_{\mathit{\text{ik}}}^{(t)}{H}_{\mathit{\text{jk}}}^{(t)}$.
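The relation between H^{(t)} and U^{(t)} is simply a matrix product, U = H Hᵀ. A small numerical illustration, with a made-up assignment matrix for four proteins and two overlapping complexes:

```python
import numpy as np

# Hypothetical binary assignment matrix: rows are proteins, columns complexes.
# Proteins 1 and 2 belong to both complexes (overlap).
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]])

# U[i, j] counts the complexes that contain both protein i and protein j.
U = H @ H.T
```

Here `U[1, 2]` is 2 because proteins 1 and 2 share both complexes, `U[0, 3]` is 0 because proteins 0 and 3 never co-occur, and each diagonal entry `U[i, i]` counts the complexes that contain protein i.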
Model formulation
In order to predict D^{(t)}, we first infer U^{(t)} from the dynamic networks G^{(t)},1 ≤ t ≤ T. In particular, we study the following three factors that are relevant for estimating U^{(t)}.
Secondly, stable interactions are preserved across all the dynamic PPI networks, whereas transient interactions are present only at particular time points and absent at the others. Therefore, we introduce a smoothness regularization term R to enforce the stable interactions (with S_{ ij } = 1) and their corresponding complex membership ${U}_{\mathit{\text{ij}}}^{(t)}$ in U^{(t)} to change smoothly over time, rather than dramatically between two consecutive time points. Here, the smoothness regularization term ${R}_{t}=\sum _{i,j}{S}_{\mathit{\text{ij}}}{\left({U}_{\mathit{\text{ij}}}^{(t+1)}-{U}_{\mathit{\text{ij}}}^{(t)}\right)}^{2}$ measures the temporal smoothness between ${U}_{\mathit{\text{ij}}}^{(t)}$ and ${U}_{\mathit{\text{ij}}}^{(t+1)}$. Correspondingly, $R={\sum}_{t=1}^{T-1}{R}_{t}$ measures the overall smoothness across all time points.
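To make the regularizer concrete, the sketch below evaluates R_t = Σ_{i,j} S_{ij} (U_{ij}^{(t+1)} − U_{ij}^{(t)})² and R = Σ_{t=1}^{T−1} R_t for a sequence of matrices. The function name `smoothness` and the toy matrices are assumptions for illustration only.

```python
import numpy as np

def smoothness(U_seq, S):
    """Return (R_t for t = 1..T-1, overall R) for a T x N x N sequence."""
    U_seq = np.asarray(U_seq, dtype=float)
    diffs = U_seq[1:] - U_seq[:-1]            # U^(t+1) - U^(t)
    # Only pairs with a stable interaction (S_ij = 1) are penalized.
    R_t = (S[None, :, :] * diffs ** 2).sum(axis=(1, 2))
    return R_t, R_t.sum()

S = np.array([[0, 1],
              [1, 0]])
U1 = np.array([[1, 1], [1, 1]])   # proteins 0 and 1 share a complex
U2 = np.array([[1, 0], [0, 1]])   # they no longer do
U3 = U2.copy()                    # unchanged from t = 2 to t = 3
R_t, R = smoothness([U1, U2, U3], S)
```

The change between the first two time points costs 2 (one unit for each of the symmetric entries), while the unchanged third step costs nothing, so R is 2.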
Finally, as ${U}_{\mathit{\text{ij}}}^{(t)}={\sum}_{k=1}^{{r}_{t}}{H}_{\mathit{\text{ik}}}^{(t)}{H}_{\mathit{\text{jk}}}^{(t)}$, the rank of matrix U^{(t)} cannot be larger than the number of complexes r_{ t }. As we have no prior knowledge of r_{ t }, a low-rank restriction on each U^{(t)} is needed when estimating U^{(t)}. In this paper, we use the trace norm constraint ∥U^{(t)}∥_{∗} as a relaxation of the low-rank constraint [32], which prevents our model from producing too many complexes and controls the overlaps among complexes. In particular, ∥U^{(t)}∥_{∗} is the sum of the singular values of U^{(t)}. From this definition, it is easy to obtain ${\parallel {U}^{(t)}\parallel}_{\ast}={\parallel {H}^{(t)}\parallel}_{F}^{2}$, where ∥·∥_{ F } denotes the Frobenius norm.
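The identity between the trace norm and the squared Frobenius norm follows in a few steps once U^{(t)} is written as H^{(t)} H^{(t)ᵀ}. The derivation below is our own spelling-out of that step; the eigenvalue symbol μ_i is introduced here only for the derivation.

```latex
\begin{aligned}
U^{(t)} &= H^{(t)} {H^{(t)}}^{\top} \succeq 0
  && \text{(symmetric positive semidefinite)} \\
\|U^{(t)}\|_{\ast} &= \sum_{i} \sigma_i\!\left(U^{(t)}\right)
  = \sum_{i} \mu_i\!\left(U^{(t)}\right)
  = \operatorname{tr}\!\left(U^{(t)}\right)
  && \text{(singular values equal eigenvalues } \mu_i \ge 0\text{)} \\
&= \operatorname{tr}\!\left(H^{(t)} {H^{(t)}}^{\top}\right)
  = \sum_{i,k} \left(H^{(t)}_{ik}\right)^{2}
  = \|H^{(t)}\|_{F}^{2}
\end{aligned}
```

This is why a penalty on the trace norm of U^{(t)} can be handled as a simple squared-norm penalty on H^{(t)}.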
Temporal protein complex detection
where λ ≥ 0 and β ≥ 0 are the trade-off parameters that control the balance among the three factors. The optimization problem (4) is combinatorial, as U^{(t)} specifies all the possible co-complex memberships among proteins at time point t. As such, exhaustive search is impractical since there are exponentially many possible combinations. To address this problem, we relax the constraints on U^{(t)} and H^{(t)} from integers to real numbers with U^{(t)} ≥ 0 and H^{(t)} ≥ 0.
Here, ${{H}^{(t)}}_{\mathit{\text{ik}}}^{\star}=1$ indicates that protein i is in predicted complex k at time point t, while ${{H}^{(t)}}_{\mathit{\text{ik}}}^{\star}=0$ indicates that protein i is not in predicted complex k. In this study, the value of τ is set to 0.3, the same as in [14] (in the next section, we discuss how changing this parameter affects the final results). In addition, we only consider predicted complexes with at least three proteins [12]. The detailed TSOCD algorithm for identifying temporal protein complexes is illustrated in Additional file 1: Figure S4.
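The discretization step can be sketched as below. Because the thresholding equation itself is not reproduced in the text, the sketch assumes the simplest reading: an entry of the relaxed H^{(t)} becomes 1 when it is at least τ and 0 otherwise. The function name `extract_complexes` and the example matrix are hypothetical.

```python
import numpy as np

def extract_complexes(H, tau=0.3, min_size=3):
    """Binarize a relaxed assignment matrix and list the kept complexes.

    Assumed rule: H_star[i, k] = 1 if H[i, k] >= tau, else 0.
    Columns with fewer than min_size member proteins are discarded.
    """
    H_star = (H >= tau).astype(int)
    complexes = []
    for k in range(H_star.shape[1]):
        members = set(np.flatnonzero(H_star[:, k]).tolist())
        if len(members) >= min_size:
            complexes.append(members)
    return H_star, complexes

H = np.array([[0.90, 0.00],
              [0.50, 0.40],
              [0.31, 0.35],
              [0.10, 0.20]])
H_star, complexes = extract_complexes(H, tau=0.3)
```

With τ = 0.3 the first column yields the three-protein complex {0, 1, 2}, while the second column contains only two proteins and is dropped by the size filter.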
Merging temporal protein complexes
Since the dynamic PPI networks, G^{(t)}(1 ≤ t ≤ T), contain a considerable fraction of stable interactions, some complexes detected at different time points will be quite similar. We therefore need to merge those similar complexes to generate a final set of predicted complexes. Note that we only match and merge very similar complexes, and still retain the time-specific complexes that occur only in certain dynamic PPI networks.
In this paper, we use a Nonnegative Matrix Factorization (NMF) model to merge similar temporal protein complexes, which provides a low-rank approximation of a nonnegative matrix and has been widely used as a clustering method [35, 36]. After we compute a series of protein-complex assignment matrices $\mathcal{H}=\left\{{{H}^{(1)}}^{\star},\dots ,{{H}^{(T)}}^{\star}\right\}$, a combined protein-complex assignment matrix Y is defined as Y = [H^{(1)}^{⋆},…,H^{(T)}^{⋆}]. According to this definition, matrix Y = [Y_{ il }] ∈ {0,1}^{N×L} contains N rows and L = r_{1} + … + r_{ T } columns, each of which represents a complex detected at the corresponding time point, where Y_{ il } = 1 if protein i belongs to complex l and Y_{ il } = 0 otherwise. Our objective is to detect similar complexes from Y.
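Building Y amounts to placing the per-time-point assignment matrices side by side. The sketch below only covers this stacking step (the NMF factorization itself is not shown), and the two small matrices are made up for illustration.

```python
import numpy as np

# Hypothetical binarized assignment matrices for N = 5 proteins.
H1_star = np.array([[1, 0],
                    [1, 0],
                    [1, 1],
                    [0, 1],
                    [0, 1]])       # time point 1: r_1 = 2 complexes
H2_star = np.array([[1],
                    [1],
                    [1],
                    [0],
                    [0]])          # time point 2: r_2 = 1 complex

# Y has N rows and L = r_1 + r_2 columns, one per detected complex.
Y = np.hstack([H1_star, H2_star])
```

Columns 0 and 2 of `Y` are identical here: the same complex was found at both time points, which is the kind of redundancy the merging step is meant to remove.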
Results and discussion
In this section, we will first introduce the data, evaluation metrics and parameter settings. Then, we will present detailed experimental results.
Data, evaluation metrics and parameter settings
Protein interaction networks and time course gene expression data
Two yeast PPI networks have been employed for evaluating the performance of various complex detection methods: 1) the DIP PPI network [38], and 2) the BioGrid PPI network (version 3.1.77) [39]. DIP contains 21592 interactions among 4850 proteins, while BioGrid contains 59748 interactions among 5640 proteins.
Note that both DIP and BioGrid are aggregates of protein interactions obtained under different conditions or time points. In order to extract dynamic PPI networks from these datasets, we have used yeast metabolic cycle (YMC) gene expression microarrays [40] to infer stable and transient interactions. YMC reports the expression values for 3552 significant periodic genes [40] at 12 time points (i.e. T = 12 in our experiments, with about 25 minutes per time interval) over three successive cycles. The raw data are available on Gene Expression Omnibus (GEO) [41] with the accession number GSE3431. Similar to [22], in our experiment, the average expression value of each gene at the same time point of the three cycles is used as its expression value at that time point. Among the 3552 genes, 2389 occur in DIP and 3057 occur in BioGrid. Thus, we retain these genes and their corresponding interactions in DIP and BioGrid respectively.
Gold standard protein complexes
To measure whether the predicted complexes match with known experimentally determined protein complexes, we have chosen two benchmark complex sets as our gold standard. They are derived from CYC2008 [42] and MIPS [43] respectively. For both gold standard sets, to avoid selection bias, we filter out the proteins that are not involved in the two PPI networks. Moreover, we only consider complexes with at least 3 proteins.
Metrics
We utilize two independent quality criteria, namely the PR metric [44] and the f-measure [6], to evaluate the performance of various complex detection methods. Of these two measures, the PR metric judges how well the predicted complexes match known complexes mainly by considering the percentage of their overlapping proteins. The f-measure is the harmonic mean of recall and precision, where recall measures how many known gold standard complexes are matched by the predicted complexes, while precision measures how many predicted complexes are matched with known complexes. The two metrics have complementary strengths and can thus evaluate the prediction performance from different perspectives. In addition, both give a value in the range 0 to 1, where higher values indicate better performance.
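As a small worked example of the harmonic mean, the helper below combines a precision and a recall value into an f-measure. The function name is ours, and the criterion for deciding when a predicted complex matches a known one is not modelled here; the two value pairs are the TSOCD precision and recall reported against CYC2008 in this article.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# TSOCD on BioGrid and on DIP, using the reported precision and recall.
f_biogrid = round(f_measure(0.363, 0.741), 3)
f_dip = round(f_measure(0.429, 0.524), 3)
```

The rounded results, 0.487 and 0.472, agree with the f-measure values reported for TSOCD.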
Please refer to Additional file 1 for a more detailed description of the two PPI networks, the two gold standard complex sets and the two evaluation metrics.
Parameter setting
When extracting dynamic PPI networks from given static PPI networks, we distinguish stable interactions from transient interactions by calculating the PCCs of their associated gene pairs’ expression values across all time points (i.e., PCC(e_{ ij })). Physical interactions with PCC values greater than a certain cutoff δ are defined as stable interactions. To determine the cutoff threshold, we use the PCC values of all the physical interactions and fit the PCC distribution with two parametric distributions, assuming one from the stable interactions and the other from the transient interactions.
Both TSOCD and NMF need the number of complexes to be specified, i.e. {r_{1},…,r_{ T }} and K. With our low-rank constraint on each U^{(t)}, we can give TSOCD a relatively large value of r_{ t }, since the model can adaptively control the number of generated complexes. When merging similar temporal protein complexes via nonnegative matrix factorization, similar complexes are likely to associate with the same latent index, and irrelevant latent indexes always obtain lower associations. As such, the value of K can also be relatively large, since irrelevant dimensions will be filtered out. In this study, the values of r_{ t } (t = 1,…,T) and K are set to 1000, since our algorithm is not sensitive to their values.
Recall that TSOCD has three parameters τ, λ and β, where τ is the threshold parameter, and λ and β control the effects of the smoothness regularization term R and the low-rank constraint respectively. To fully understand how these three parameters affect the performance of TSOCD, we perform sensitivity studies. In particular, we first keep τ = 0.3, run TSOCD with different combinations of λ (λ ∈ {2^{-7},2^{-6},…,2^{-1}}) and β (β ∈ {2^{0},2^{1},…,2^{6}}), and assess how well the predicted complexes match the gold standard sets. Then we fix λ and β at the values that give the best performance, and study the effect of τ on the performance of TSOCD by setting τ = 0.1,0.2,…,0.6, respectively. Moreover, in order to verify the generalization of TSOCD, we select the best parameter values by testing the performance of TSOCD on DIP and BioGrid in terms of f-measure with respect to the reference set MIPS. Therefore, the performance of TSOCD on DIP and BioGrid with respect to the other reference set, CYC2008, can validate the general performance of TSOCD.
Comparison with static complex detection methods
In order to demonstrate the benefits of using our constructed dynamic PPI networks, we compare our proposed TSOCD method with five state-of-the-art algorithms, namely ClusterONE [12], MCL [8], MINE [45], COACH [46] and SPICi [47], which were originally designed for detecting protein complexes from static PPI networks. We apply these five algorithms to the available static PPI networks (full PPI networks assembled from both stable and transient interactions) and apply TSOCD to our constructed dynamic PPI networks, and evaluate the predicted complexes in terms of the two metrics with respect to the two gold standards. Note that optimal parameters are set for MCL, MINE, COACH and SPICi to generate their best results (in terms of f-measure with respect to MIPS and CYC2008), while ClusterONE uses the default parameters set by its authors. For detailed parameter settings of these five algorithms, please refer to Additional file 1.
Comparative results of various algorithms on two PPI networks using CYC2008 as benchmark
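| Network | Algorithm | # complexes | Avg size | Std | Precision (CYC2008) | Recall (CYC2008) | f-measure (CYC2008) |
|---|---|---|---|---|---|---|---|
| BioGrid | ClusterONE | 260 | 5.97 | 4.93 | 0.312 | 0.655 | 0.422 |
| BioGrid | SPICi | 136 | 6.06 | 5.23 | 0.294 | 0.448 | 0.355 |
| BioGrid | MCL | 264 | 8.53 | 29 | 0.167 | 0.405 | 0.236 |
| BioGrid | COACH | 182 | 7.94 | 7.48 | 0.324 | 0.560 | 0.411 |
| BioGrid | MINE | 219 | 6.06 | 7.48 | 0.311 | 0.655 | 0.421 |
| BioGrid | OCD | 209 | 6.62 | 6.19 | 0.332 | 0.647 | 0.439 |
| BioGrid | TSOCD | 606 | 7.42 | 6.95 | 0.363 | 0.741 | 0.487 |
| DIP | ClusterONE | 166 | 4.18 | 1.56 | 0.301 | 0.447 | 0.360 |
| DIP | SPICi | 78 | 4.26 | 1.22 | 0.453 | 0.417 | 0.435 |
| DIP | MCL | 338 | 5.21 | 5.04 | 0.163 | 0.592 | 0.255 |
| DIP | COACH | 151 | 4.45 | 2.12 | 0.305 | 0.544 | 0.390 |
| DIP | MINE | 121 | 4.03 | 1.91 | 0.355 | 0.505 | 0.417 |
| DIP | OCD | 82 | 4.24 | 1.93 | 0.415 | 0.417 | 0.416 |
| DIP | TSOCD | 254 | 3.99 | 1.43 | 0.429 | 0.524 | 0.472 |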
As shown in Figure 6, for both DIP and BioGrid, our TSOCD outperforms the other 5 existing methods in terms of the two metrics based on the benchmark CYC2008 (we obtain similar results with respect to the MIPS benchmark in Additional file 1: Figure S2). For instance, on DIP data, TSOCD achieves the highest f-measure, 0.472, which is 8.5% higher than the second best f-measure, 0.435, achieved by SPICi. On BioGrid data, TSOCD also achieves the highest f-measure, 0.487, which is 15.4% higher than the second best f-measure among the existing methods, 0.422, achieved by ClusterONE. Table 1 shows that TSOCD achieves good performance owing to its high recall and precision. Additional file 1: Table S1 shows similar results with respect to the MIPS benchmark. Interestingly, we also observe that OCD achieves better performance than the above 5 existing algorithms on both DIP and BioGrid data. Thus, even without using time-course gene expression information, our method can also be used to better detect complexes from static PPI networks. On the other hand, by taking into account the temporal gene expression data to construct dynamic PPI networks, our method is able to capture time-evolving protein complexes and thus detect complexes much more accurately.
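The percentage gains quoted above are relative improvements over the runner-up, which can be checked directly from the reported f-measures. The helper name below is ours.

```python
def relative_gain(best, runner_up):
    """Relative improvement of `best` over `runner_up`, in percent."""
    return 100.0 * (best - runner_up) / runner_up

gain_dip = round(relative_gain(0.472, 0.435), 1)       # TSOCD vs SPICi
gain_biogrid = round(relative_gain(0.487, 0.422), 1)   # TSOCD vs ClusterONE
```

Both values reproduce the figures in the text (8.5 and 15.4), confirming that the gains are relative rather than absolute differences in f-measure.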
Comparison with dynamic complex detection methods
Recently, Park et al. [28] proposed the Dynamical Hierarchical Agglomerative Clustering (DHAC) method to detect protein complexes from dynamic PPI networks, with two different versions, i.e. DHAC-const and DHAC-local. The existing methods, such as ClusterONE, SPICi, MCL, COACH and MINE, can also be adapted to handle each of the dynamic PPI networks across different time points. For a fair comparison, we have also applied nonnegative matrix factorization (NMF) to merge the clusters predicted by each method into its own final set of predicted complexes.
Besides NMF, there are other algorithms that could be used to merge similar complexes. One widely used approach is based on the overlap between different complexes. To study the effectiveness of NMF in merging similar complexes, we also apply the reduction strategy proposed by Wang et al. [22]. Since their method is based on the overlap between different complexes, choosing the value of the similarity threshold is an important problem. In this study, the similarity threshold is set to 0.65, as recommended by the authors. For more details about the reduction strategy proposed by Wang et al., please refer to [22]. The results of using this reduction strategy are shown in Additional file 1: Table S3, from which we can see that TSOCD still achieves the best performance. Furthermore, NMF is more accurate in merging similar complexes, since better precision and recall are obtained when using NMF as the reduction strategy.
Detecting multifunctional proteins
Protein complexes predicted by various methods can be used for protein function prediction [48]: an unknown protein can be assigned the functions of the complex it is involved in. However, multifunctional proteins carry out different functions by interacting with different partners at different time points [11]. It is thus a challenging task for traditional complex detection methods to predict multifunctional proteins based on the static view of PPI networks, which cannot reflect the dynamic nature of real PPI networks. Our proposed TSOCD method, on the other hand, can handle this task well, as it is specially designed to detect time-evolving overlapping protein complexes by integrating PPI data with temporal gene expression data. Next, we present an interesting case study to show how the complexes predicted by our method help to detect and analyze multifunctional proteins.
Moreover, when SPICi is run on the dynamic BioGrid PPI networks, it predicts two different complexes containing YOR210W, i.e., {YOR210W, YOR224C, YGR005C, YOR341W, YOR340C, YDR156W, YJL148W} and {YJL164C, YER125W, YHL024W, YOR151C, YOL005C, YGR005C, YOR210W, YOR224C}. These two complexes match the RNA polymerase I and II complexes. Recall that SPICi can only generate one cluster involving YOR210W from the static BioGrid data. Hence, dynamic networks indeed provide more insight into the proteins’ temporal activities for dynamic complex formation. In addition, as shown in Figure 8(c), TSOCD predicts a novel protein YIR010W for both the DNA-directed RNA polymerase I and III complexes. As protein YIR010W interacts with most members of RNA polymerase I and all members of RNA polymerase III, we infer that YIR010W is likely to be multifunctional and highly related to RNA polymerase. By checking the literature, we find that YIR010W is a component of the MIND kinetochore complex, which is required for correct chromosome alignment and is related to the assembly of the RNA polymerase complex.
Conclusion
In real biological environments, protein interaction networks are not static; they change dynamically across different time points [29]. Many existing protein complex mining methods, however, detect protein complexes from an overly simplified static PPI network model, which cannot capture the inherent dynamic nature of protein interactions or of modular temporal protein complexes.
Temporal protein complexes are typically constructed by the dynamic assembly or disassembly of proteins to perform various biological functions [49]. As they better reflect the real-world dynamic molecular mechanisms inside cellular systems, it is crucial to detect them by systematically analyzing dynamic PPI networks. Although a few methods have been proposed to identify temporal protein complexes by applying static complex detection methods at each individual time point, they fully ignore the correlations between consecutive dynamic protein networks and thus cannot work well. In addition, these methods cannot generate overlapping protein complexes, and so do not reflect the biological observation that proteins are frequently involved in multiple protein complexes [6] to play diverse biological functions.
To address these problems, in this study we introduce a novel Time Smooth Overlapping Complex Detection model (TSOCD) to detect overlapping temporal protein complexes from dynamic PPI networks. In particular, we construct a series of dynamic PPI networks by identifying stable and transient interactions through the integration of protein interaction data and gene expression data. Our proposed TSOCD allows an individual complex to be assembled and disassembled across different time points. Furthermore, with the smoothness regularization term, our model can detect conserved protein complexes that play fundamental roles in cellular systems. The analysis on real biological data shows that our proposed TSOCD significantly outperforms existing state-of-the-art temporal complex detection methods. Furthermore, with the constructed dynamic PPI networks, our method can detect multifunctional proteins more accurately. All the experimental results, including the predicted stable complexes and temporal complexes, are shown in Additional files 2 and 3. We also investigate the benefit of the smoothness regularization term by comparing against our model without this term. Our experimental results show that with the smoothness constraint, our method detects temporal protein complexes more accurately, as it better accounts for the conserved protein interactions between consecutive networks. The detailed comparison is shown in Additional file 1.
In summary, compared with existing methods, our model has the following advantages:

We distinguish two types of protein interactions when constructing dynamic PPI networks. Stable interactions are preserved across different time points and serve as the backbone of the protein interaction networks, while transient interactions are present only under certain conditions and thus occur in the dynamic part of the PPI networks.

It models the dynamic assembly process, i.e., individual complexes can be assembled and disassembled across different time points. In addition, the smoothness regularization prevents the co-complex similarity assigned to proteins with stable interactions from changing too dramatically between consecutive time points.

It generates overlapping temporal protein complexes, which reflect the biological reality of proteins’ multifunctional roles.

Finally, our proposed method is unsupervised and is thus generic enough to be applied to dynamic complex detection in other species.
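The role of the smoothness regularization in the second point can be made concrete with a short sketch. It is an illustration under stated assumptions, not the TSOCD objective itself: the co-complex similarity of two proteins is assumed to be the inner product of their rows in a nonnegative protein-by-complex membership matrix, and the penalty is assumed to be the squared change of that similarity between consecutive time points, summed over stable protein pairs. The names `cocomplex_similarity` and `smoothness_penalty` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows: proteins, columns: complexes


def cocomplex_similarity(H: Matrix, i: int, j: int) -> float:
    """Inner product of the membership rows of proteins i and j (assumed form)."""
    return sum(a * b for a, b in zip(H[i], H[j]))


def smoothness_penalty(Hs: List[Matrix], stable_pairs: List[Tuple[int, int]]) -> float:
    """Sum of squared changes in co-complex similarity over consecutive
    time points, restricted to protein pairs with stable interactions.

    The number of complexes (columns) may differ between time points,
    because only the pairwise similarities are compared.
    """
    total = 0.0
    for H_prev, H_next in zip(Hs, Hs[1:]):
        for i, j in stable_pairs:
            diff = cocomplex_similarity(H_next, i, j) - cocomplex_similarity(H_prev, i, j)
            total += diff * diff
    return total


# Toy data: three proteins, two complexes, two time points.
H1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
H2 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
penalty = smoothness_penalty([H1, H2], stable_pairs=[(0, 1)])
```

Here protein 1 partly leaves the complex it shares with protein 0, so the similarity of the stable pair (0, 1) drops from 1.0 to 0.5 and the penalty is 0.25; an unchanged membership matrix would give a penalty of zero.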
The computational complexity of updating $H^{(t)}$ is $O(N^{2} r_{t})$, where $N$ is the number of proteins and $r_{t}$ is the number of complexes at time $t$. Thus the overall time cost of TSOCD is $O(N^{2}(r_{1} + \dots + r_{T}) I)$, where $T$ is the number of time points and $I$ is the number of iterations. In practice the time cost will be much smaller, since $H^{(t)}$ is sparse and the number of proteins at each time point is less than $N$.
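To give a feel for the magnitude of this bound, the following sketch evaluates its leading term for hypothetical problem sizes. The function name `tsocd_cost_bound` and the numbers used are illustrative assumptions; constant factors are ignored, and, as noted above, sparsity makes the real cost smaller.

```python
from typing import Sequence


def tsocd_cost_bound(n_proteins: int, complexes_per_time: Sequence[int], n_iterations: int) -> int:
    """Leading term N^2 * (r_1 + ... + r_T) * I of the stated complexity,
    ignoring constant factors and any savings from sparsity."""
    return n_proteins ** 2 * sum(complexes_per_time) * n_iterations


# Hypothetical sizes: N = 1000 proteins, T = 3 time points, I = 100 iterations.
bound = tsocd_cost_bound(1000, [50, 60, 40], 100)
```

With these made-up sizes the bound is 1.5 × 10^10 elementary operations, dominated by the quadratic dependence on the number of proteins.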
Applying our proposed TSOCD method to dynamic PPI networks can effectively track the underlying dynamic modular organization and provide new biological knowledge and insights about molecular systems. In this study, we use time-course gene expression data to help construct dynamic PPI networks, since it is one of the most abundant data sources that carry temporal information about proteins at the gene level. However, because gene expression data are noisy, the performance of our algorithm may be limited by their quality. Moreover, there are several other related information sources, including genomics, functional genomics and genetics studies and their corresponding result datasets, biological pathway databases, cellular compartment information and biomedical ontologies. In future work, we will therefore study how to reduce the noise in gene expression data and how to incorporate other biological evidence to construct more accurate dynamic PPI networks, which could further improve the detection of temporal protein complexes.
Declarations
Acknowledgements
This work is supported by the National Science Foundation of China [11171354, 61375033 and 61402190 to LOY, DQD, XFZ], the Ministry of Education of China [20120171110016 to LOY, DQD, XFZ], the Natural Science Foundation of Guangdong Province [S2013020012796 to LOY, DQD, XFZ], and the International Program Fund of 985 Project, Sun Yat-sen University.
Authors’ Affiliations
References
 Yu H, Braun P, Yıldırım MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, et al: High-quality binary protein interaction map of the yeast interactome network. Science. 2008, 322 (5898): 104-110. 10.1126/science.1158684.
 Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, et al: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440 (7084): 631-636. 10.1038/nature04532.
 Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrín-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Mand Vlasblom CJM, Wu S, Orsi C, Collins SR, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440 (7084): 637-643. 10.1038/nature04670.
 Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, et al: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141-147. 10.1038/415141a.
 Spirin V, Mirny LA: Protein complexes and functional modules in molecular networks. Proc Nat Acad Sci USA. 2003, 100 (21): 12123-12128. 10.1073/pnas.2032324100.
 Li X, Wu M, Kwoh CK, Ng SK: Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics. 2010, 11 (Suppl 1): S3. 10.1186/1471-2164-11-S1-S3.
 Lage K, Karlberg EO, Størling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tümer Z, Pociot F, Tommerup N, Moreau Y, Brunak S: A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007, 25 (3): 309-316. 10.1038/nbt1295.
 Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30 (7): 1575-1584. 10.1093/nar/30.7.1575.
 Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003, 4: 2. 10.1186/1471-2105-4-2.
 Cho YR, Hwang W, Ramanathan M, Zhang A: Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinformatics. 2007, 8: 265. 10.1186/1471-2105-8-265.
 Becker E, Robisson B, Chapple CE, Guénoche A, Brun C: Multifunctional proteins revealed by overlapping clustering in protein interaction network. Bioinformatics. 2012, 28: 84-90. 10.1093/bioinformatics/btr621.
 Nepusz T, Yu H, Paccanaro A: Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012, 9 (5): 471-472. 10.1038/nmeth.1938.
 Zhang XF, Dai DQ, OuYang L, Wu MY: Exploring overlapping functional units with various structure in protein interaction networks. PloS ONE. 2012, 7 (8): e43092. 10.1371/journal.pone.0043092.
 OuYang L, Dai DQ, Zhang XF: Protein complex detection via weighted ensemble clustering based on Bayesian nonnegative matrix factorization. PloS ONE. 2013, 8 (5): e62158. 10.1371/journal.pone.0062158.
 Wang J, Peng X, Peng W, Wu FX: Dynamic protein interaction network construction and applications. Proteomics. 2014, 14 (4–5): 338-352.
 Nooren I, Thornton JM: Diversity of protein–protein interactions. EMBO J. 2003, 22 (14): 3486-3492. 10.1093/emboj/cdg359.
 Xiao Q, Wang J, Peng X, Wu FX: Detecting protein complexes from active protein interaction networks constructed with dynamic gene expression profiles. Proteome Sci. 2013, 11 (Suppl 1): S20. 10.1186/1477-5956-11-S1-S20.
 Przytycka TM, Singh M, Slonim DK: Toward the dynamic interactome: it’s about time. Brief Bioinformatics. 2010, 11: 15-29. 10.1093/bib/bbp057.
 Lo K, Raftery A, Dombek K, Zhu J, Schadt E, Bumgarner R, Yeung K: Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst Biol. 2012, 6: 101. 10.1186/1752-0509-6-101.
 Li XL, Tan YC, Ng SK: Systematic gene function prediction from gene expression data by using a fuzzy nearest-cluster method. BMC Bioinformatics. 2006, 7 (Suppl 4): S23. 10.1186/1471-2105-7-S4-S23.
 Han JDJ, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature. 2004, 430 (6995): 88-93. 10.1038/nature02555.
 Wang J, Peng X, Li M, Pan Y: Construction and application of dynamic protein interaction network based on time course gene expression data. Proteomics. 2013, 13 (2): 301-312. 10.1002/pmic.201200277.
 Yu H, Lin CC, Li YY, Zhao Z: Dynamic protein interaction modules in human hepatocellular carcinoma progression. BMC Syst Biol. 2013, 7 (5): 113.
 Song L, Kolar M, Xing EP: KELLER: estimating time-varying interactions between genes. Bioinformatics. 2009, 25 (12): i128-i136.
 Ahmed A, Xing EP: Recovering time-varying networks of dependencies in social and biological studies. Proc Nat Acad Sci. 2009, 106 (29): 11878-11883. 10.1073/pnas.0901910106.
 Du N, Zhang Y, Li K, Gao J, Mahajan SD, Nair BB, Schwartz SA, Zhang A: Evolutionary analysis of functional modules in dynamic PPI networks. Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 7–10 October, 2012; Orlando, Florida. 2012, New York: ACM, 250-257.
 Das J, Mohammed J, Yu H: Genome-scale analysis of interaction dynamics reveals organization of biological networks. Bioinformatics. 2012, 28 (14): 1873-1878. 10.1093/bioinformatics/bts283.
 Park Y, Bader JS: How networks change with time. Bioinformatics. 2012, 28 (12): i40-i48. 10.1093/bioinformatics/bts211.
 Kim Y, Han S, Choi S, Hwang D: Inference of dynamic networks using time-course data. Brief Bioinformatics. 2014, 15 (2): 212-228. 10.1093/bib/bbt028.
 Vinayagam A, Hu Y, Kulkarni M, Roesel C, Sopko R, Mohr SE, Perrimon N: Protein complex-based analysis framework for high-throughput data sets. Sci Signal. 2013, 6 (264): rs5.
 Srihari S, Leong HW: Temporal dynamics of protein complexes in PPI networks: a case study using yeast cell cycle dynamics. BMC Bioinformatics. 2012, 13 (Suppl 17): S16.
 Chen Y, Kawadia V, Urgaonkar R: Detecting overlapping temporal community structure in time-evolving networks. arXiv preprint arXiv:1303.7226. 2013.
 Ball B, Karrer B, Newman M: Efficient and principled method for detecting communities in networks. Phys Rev E. 2011, 84 (3): 036103.
 Lee DD, Seung HS: Algorithms for nonnegative matrix factorization. Adv Neural Inf Process Syst. 2001, Cambridge: The MIT Press, 556-562.
 Lee D, Seung H: Learning the parts of objects by nonnegative matrix factorization. Nature. 1999, 401 (6755): 788-791. 10.1038/44565.
 Ding C, He X, Simon H: On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the SIAM International Conference on Data Mining (SDM’05). 2005, Philadelphia: Society for Industrial and Applied Mathematics, 606-610.
 Schmidt MN, Laurberg H: Nonnegative matrix factorization with Gaussian process priors. Comput Intell Neurosci. 2008, 2008: 3.
 Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The database of interacting proteins: 2004 update. Nucleic Acids Res. 2004, 32 (suppl 1): D449-D451.
 Chatr-aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, Nixon J, Ramage L, Kolas N, O’Donnell L, Reguly T, Breitkreutz A, Sellam A, Chen D, Chang C, Rust J, Livstone M, Oughtred R, Dolinski K, Tyers M: The BioGRID interaction database: 2013 update. Nucleic Acids Res. 2013, 41 (D1): D816-D823. 10.1093/nar/gks1158.
 Tu BP, Kudlicki A, Rowicka M, McKnight SL: Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science. 2005, 310 (5751): 1152-1158. 10.1126/science.1120499.
 Edgar R, Domrachev M, Lash AE: Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-210. 10.1093/nar/30.1.207.
 Pu S, Wong J, Turner B, Cho E, Wodak SJ: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009, 37 (3): 825-831. 10.1093/nar/gkn1005.
 Mewes HW, Amid C, Arnold R, Frishman D, Güldener U, Mannhaupt G, Münsterkötter M, Pagel P, Strack N, Stümpflen V, Warfsmann J, Ruepp A: MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res. 2004, 32 (suppl 1): D41-D44.
 Song J, Singh M: How and when should interactome-derived clusters be used to predict functional modules and protein function?. Bioinformatics. 2009, 25 (23): 3143-3150. 10.1093/bioinformatics/btp551.
 Rhrissorrakrai K, Gunsalus KC: MINE: module identification in networks. BMC Bioinformatics. 2011, 12: 192. 10.1186/1471-2105-12-192.
 Wu M, Li X, Kwoh CK, Ng SK: A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinformatics. 2009, 10: 169. 10.1186/1471-2105-10-169.
 Jiang P, Singh M: SPICi: a fast clustering algorithm for large biological networks. Bioinformatics. 2010, 26 (8): 1105-1111. 10.1093/bioinformatics/btq078.
 Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S: Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006, 7: 207. 10.1186/1471-2105-7-207.
 Chen B, Fan W, Liu J, Wu FX: Identifying protein complexes and functional modules–from static PPI networks to dynamic PPI networks. Brief Bioinformatics. 2013, 15 (2): 212-228.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.