 Research article
 Open access
 Published:
Protein complexes detection based on node local properties and gene expression in PPI weighted networks
BMC Bioinformatics volume 23, Article number: 24 (2022)
Abstract
Background
Identifying protein complexes from protein–protein interaction (PPI) networks is a crucial task, and many related algorithms have been developed. Most algorithms usually employ direct neighbors of nodes and ignore resource allocation and secondorder neighbors. The effective use of such information is crucial to protein complex detection.
Result
Based on this observation, we propose a new way by combining node resource allocation and gene expression information to weight protein network (NRAGEWPN), in which protein complexes are detected based on coreattachment and secondorder neighbors.
Conclusions
Through comparison with eleven methods in Yeast and Human PPI network, the experimental results demonstrate that this algorithm not only performs better than other methods on 75% in terms of fmeasure+, but also can achieve an ideal overall performance in terms of a composite score consisting of five performance measures. This identification method is simple and can accurately identify more complexes.
Background
Proteins are the basis of biological activities, and their functions are generally expressed by the interactions between proteins [1]. In organisms, protein–protein interaction (PPI) networks consist of proteins and protein interactions. PPI networks provide an elegant means for expressing gene regulation and metabolic pathways in complex biological systems [2]. Protein complexes are the locally dense regions of PPI networks and possess graphlike structures in which a node represents a protein and an edge represents interaction between two proteins [3].
Complexes take part in many diverse biochemical activities that are fundamental to all kinds of functions, such as cell homeostasis, cell cycle control, growth, and proliferation. Moreover, specific functional modules usually are related to certain diseases.
Although great progress has been made in identifying protein complexes, laboratorybased methods are expensive, ineffective and sometimes even infeasible, and only parts of protein complexes are located. In addition, experiments in the laboratory are often incomplete because of the constraints of experimental conditions. As it is necessary to overcome the lacking of laboratorybased methods, a large number of computational algorithms have been designed as alternative methods to identify protein clusters, such as densitybased clustering [4,5,6,7,8], hierarchical clustering[8,9,10], partitionbased clustering [11, 12], flow simulationbased clustering[13,14,15,16] and other methods with integrating biological and topological multiple information [17,18,19,20]. Although methods of protein complexes detection have achieved some effective results, how to reasonably integrate PPI node local data and gene expression biological information to construct weighted graphs, and how to define effective detection methods to identify complexes from the weighted network still need further study. Only direct neighbors are applied to PPI network clustering problems, which is not sufficient. In fact, node resource allocation information and secondorder neighbors often contain some important potential information in PPI networks.
Aiming at the solution for the abovementioned problems, we introduce a novel method based on resource allocation and gene expression in weighted PPI networks (called NRAGEWPN) with based on coreattachment structure and secondorder neighbors searching. First, based on the resource allocation and gene expression of the PPI network, a new weight metric is designed to accurately describe the interaction between proteins. Then our method detects a series of dense complex cores based on density and network diameter constraints and the final complexes are recognized by expanding the secondorder neighbors of nodes in core complexes. This identification method is simple and can accurately identify more complexes.
Methods
Protein complex detection with a computational approach from PPI data is useful as the useful supplement to the limited experimental methods. Besides the enhancement in graph clustering techniques, successful and accurate methods for protein complex prediction depends more on the construction of weighted graphs. Therefore, constructing weighted graph for protein interactions is essential. In this section, we introduce a novel method based on resource allocation and gene expression in weighted PPI networks with two main steps. First, a method is proposed to evaluate the reliability of the protein interaction data considering both the common neighbor information and gene expression profiles through the weighted graph construction. Second, protein complexes are detected based on coreattachment and secondorder neighbors in this new weighted graph. The workflow of our method is shown in Fig. 1.
Assessing the reliability of protein interaction
To represent a PPI network, a 3element tuple \(G = ({\text{V}},E,{\text{W}})\) is employed, where \(V = (V_{i} )(1 \le i \le N)\) is a set of N proteins, and \(E = \{ e_{ij} \}\) is the set of PPI edges whose values are stored in matrix W. For each pair of nodes, \(i,j \in V\) and the edge \(e_{ij}\) is assigned a score as \(w_{ij}\). Inspired by the reference [21], resource allocation index (RA), is introduced to measure the similarity of interaction proteins in a network and a weighted graph based on resource allocation (WRA) is constructed in this step.
Taking Fig. 2 as an example, there is an edge between node 1 and node 2 and no common neighbors between them, but \(e_{12}\) is an important bridge for information transmission between node group{1, 2, 6, 7} and node group{1, 2, 3, 4, 5}. Simply, it is assumed that the transmitter 1 can carry resources, and will equally deliver it among all its neighbors. Based on this, the similarity of two nodes is shown in Eq. (1). We can consider node i and node j, which are directly connected without common neighbors and the node i can transmit the information to node j through edge \(e_{ij}\) to help the communication between two clusters {1, 2, 6, 7} and {1, 2, 3, 4, 5}. The value range of WRA belongs to [0 1]. This measure requires only the information of the nearest neighbors which therefore has very low computational complexity. \(N(i)\) is the set of the neighbors of node i and node i, N(j) is the set of the neighbors of node j and node j.
Pearson’s correlation of expression levels
Coexpression genes tend to encode interacting proteins [22]. In this paper, we mainly concentrate on linear gene expression networks unless explicitly stated otherwise and Pearson’s correlation coefficient of expression levels (PCC) is employed as biological information for interacting protein pair p and q. According to GBA principle (i.e. genes with similar expression spectrums have similar biological functions) [23], a higher correlation suggests a higher confidence in their interaction. PCC is generally used to measure the strength of the linear relationship between two variables and is also commonly used to measure the linear relationship between two sets of gene expression values. Suppose there are two columns of gene expression profiles \({\text{X}} = (x_{1} , \ldots ,x_{n} )\) and \({\text{Y}} = ({\text{y}}_{1} , \ldots ,{\text{y}}_{n} )\). Matrix W_{p} is formed by the PCC calculation formula, which is defined in Eq. (3). The value range of PCC belongs to [− 1 1]. If PCC (X,Y) < 0, it means that gene X and Y show a negative correlation; if PCC (X,Y) > 0, it means gene X and Y show a positive correlation, PCC (X, Y) = 0 means that there is no correlation between genes X and Y. If PCC(X, Y) < 0, protein pairs will be removed from PPI network in order to reduce the negative effect of low noise data on the detection results of mining protein complexes. The value range [0 1] of PCC is employed in this step.
where \(\overline{x}\) denotes the average value of the expression value of gene X at 36 different times and \(\overline{y}\) denotes the average value of the expression value of gene Y at 36 different times.
Weighted graph construction
In this part, we first describe how to compute the weighted value by combining gene expression information (GEI) based on PCC and RA information between two interaction proteins. The final weighted construction formula is proposed in Eq. (5).
Matrix \(W_{P}\) is constructed based on Pearson correlation coefficient and matrix \(W_{N}\) is constructed based on RA, respectively. After a simple calculation, the range of values can be known from 0 to 2. The final values are normalized to [0 1]. \(\alpha\)(\(0 \le \alpha \le 1\)) is a constant, where a smaller \(\alpha\) indicates that the importance of the modules is dependent more on RA information of the network, and a bigger \(\alpha\) indicates that the importance of the modules depends more on gene expression information. When \(\alpha = 0\), the weighted method only considers RA information. When \(\alpha = 1\), the weighted method only considers gene expression information. Therefore the Eq. (5) can measure the differential importance of interaction in protein networks by integrating node local information and biological information.
Detecting protein complexes in weighted graphs
The proposed algorithm, NRAGEWPN, consists of two phases: weighted graph construction and coreattachment protein complex detection based on secondorder neighbors searching. In the weighted graph construction phase, gene expression information and common neighbor information are integrated. A detailed description of the algorithm is outlined in Algorithm 1. Line 1 is for constructing matrix W_{N} with the given PPI datasets. Line 2 is for constructing matrix_{.} W_{p} with the gene expression data. Line 3 is for constructing the new matrix W based on W_{N} and W_{p}, and the protein interaction confidence is the sum of the weights of W_{N} and W_{p}. Lines 4–8 are for identifying core clusters. Lines 9–11 are for enlarging core clusters based on secondorder neighbors of nodes in each core.
In this algorithm, density and diameter are employed as the condition for complex detection.
If a node meets the two constraints in condition (7), it is added to the current cluster (subgraph). Generally, \(\lambda\) is usually set to 0.7 and \(\delta\) is set to 2, according to the references [12, 24].
(1) Density: The degree of a node V is the sum of the weights for each edge connecting to this node. Density in the weighted subgraph G = (V, E) is defined in (6). \(\left {\text{N}} \right\) is the number of nodes in G and w(e) is the weight of the edge \(e_{ij}\) in G.
(2) Network Diameter: Diameter is the shortest path in a cluster.
Results
Datasets
The effectiveness of our method is evaluated using PPI networks and gold standards of protein complexes from yeast and human and the detail information is shown in Table 1 and relative detail information can be find in reference [25]. GSE3431 dataset [26] is employed in our paper which records the data of 36 time points during three successive metabolic cycles.
Evaluation criteria
To evaluate our method on benchmark datasets and compare NRAGEWPN with other methods, evaluation measures are given in this section, such as sensitivity (SN), positive predictive value (PPV), accuracy (ACC), separation (SEP), fraction match (FRM), maximum matching ratio(MMR), precision (Prec), recall (Rec) and fmeasure, precision+, recall+, fmeasure+, the sum (F_MMR) of MMR and fmeasure+, the composite score(CS) of MMR, FRM, SEP, ACC and fmeasure [25]. Given a set of benchmark protein complexes \(R = \{ R_{1} ,R_{2} , \ldots ,R_{n} \}\) and a set of predicted clusters \(P = \{ P_{1} ,P_{2} , \ldots ,P_{n} \}\), two protein complexes, namely, \(R_{i}\) and \(P_{j}\), are generated from benchmark complex datasets R and predicted protein complex sets P, respectively. \(T_{ij}\) is the number of proteins in common between ith benchmark complex \(R_{i}\) and jth predicted complex \(P_{j}\). \(S{\text{N}}\), PPV and ACC are defined as follows. \(N_{i}\) presents the size of proteins in the ith benchmark module. Here, n is the number of benchmark complexes and m is the number of predicted complexes.
To evaluate protein complex prediction in terms of precision and recall, the Jaccard index is employed. The located complex \(P_{{\text{j}}}\) is defined to match the real complex \(R_{{\text{i}}}\) if the Jacquard similarity is greater than 0.5.
In terms of precision+, recall+ and fmeasure+, neighborhood affinity score \(NA(P_{j} ,R_{i} )\) between \(P_{j}\) and \(R_{i}\), as defined in Eq. (10) can be used to determine whether they match with each other. If \(NA(P_{j} ,R_{i} )\) = ω, \(\omega \ge t\), \(\omega\) is greater than 0.2, \(P_{j}\) and \(R_{i}\) are considered to be matching. In this paper, t is usually set as 0.20. \(\left{\mathrm{P}}_{\mathrm{i}}\right\) \(\left {P_{i} } \right\) and \(\left{\mathrm{R}}_{\mathrm{j}}\right\) \(\left {R_{j} } \right\) are the numbers of proteins in \({\mathrm{P}}_{\mathrm{i}}\) \(P_{i}\) and \(R_{j}\), respectively.
Comparison with other methods
To inspect the performance of our proposed algorithm, we compare our algorithm with MCODE [6], Cfinder [4], ClusterOne [20], ProRank+ [27], MCL [28], PC2P [25], CLE [7], CW [8], CLP [29], CI [13], DPCT [30] in different measures as shown in Additional file 1: Table S1 and all the weighted graphs are constructed based on Eq. (5). Comparison results about CS measure in four PPI networks of Yeast on CYC2008 are shown in Fig. 3.
Comparative analysis is performed with the sum score of MMR, FMR, SEP, ACC and fmeasure. Performances among different methods are compared for yeast and human with the corresponding complex datasets and PPI networks. First, as is illustrated in Fig. 3 that NRAGEWPN can achieve best performance in Collins, Gavin and KroganExt network and perform better than other ten methods except PC2P in KroganCore in terms of CS on CYC2008. On MIPS, NRAGEWPN outperforms all methods on MIPS in network Collins and ten methods in network Gavin, KroganExt and KroganCore except PC2P in Additional file 1: Table S1. On CORUM in 2 combinations, NRAGEWPN can achieve best performances in terms of CS. Second, in terms of fmeasure+, NRAGEWPN results the best performance except in Collins on MIPS. Third, in the rest measures, NRAGEWPN performs better than most other methods and the all detail information can be shown in Additional file 1: Table S1.
Assessment performances of fmeasure+ and accuracy with parameter α
By evaluating the importance of parameter \(\alpha\), we can more intuitively observe the influence of a certain parameter on the experimental results, and it is helpful to understand the advantages and disadvantages of the algorithm and enhance it. The critical parameter \(\alpha\) in our method is mainly employed to show the effectiveness of information fusion from local neighbors and gene expression information and to affect the detection results of protein complexes. This experiment investigates the effects of different parameters \(\alpha\) from 0.1 to 0.9 at interval of 0.1 on complex detection performance. Using fmeasure+ and accuracy as our experimental evaluation criterion, the performances with different \(\alpha\) are evaluated as shown in the Figs. 4 and 5, respectively. In Fig. 4,when the parameter \(\alpha\) is greater than or equal to 0.3, the fmeasure+ tends to stable. In In Fig. 5, when \(\alpha\) = 0.3, the best performance of accuracy can be achieved. In this article, we take \(\alpha\) = 0.3.
Robustness to the different thresholds (t)
In order to illustrate the comprehensive performance of NRAGEWPN, we demonstrate fmeasure+ performances with nine thresholds \(t = \left\{ {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9} \right\}\) among different methods in Fig. 6. Figure 6a shows the comparisons of fmeasure+ performances on the CYC2008 benchmark dataset in Collins. It can be illustrated that NRAGEWPN outperforms other eleven methods. Similar results can also be found on the CYC2008 benchmark in Gavin in Fig. 6b. Other comparisons are shown in Additional file 1: Fig. S1, which illustrates that NRAGEWPN performs better than other combinations on 50%. This further demonstrates the effectiveness of the fusion information from local node and gene expression data.
Discussion
Functional analysis
For the protein complexes identified by the NRAGEWPN algorithm, we measure the effectiveness of the algorithm quantitatively and qualitatively. We analyze the biological significance of the identified protein complexes. Real protein complexes often present high functional homogeneity, so the function enrichment test is employed to demonstrate the biological significances of detected protein complexes [31]. The function enrichment analysis of protein complexes identified from yeast PPI network is carried out to further verify the effectiveness of NRAGEWPN algorithm. The analysis and comparison of P value are shown in Table 2. P value of each complex can be divided into one of four intervals from small to large: < E15, [E15, E10], [E10, E5], [E5, 0.001]. When P value is greater than 0.001, it is generally considered that the function of the complex is very likely to be randomly assigned and has no biological significance. The percentages in brackets in Table 2 indicate the ratio of the number of complexes in a certain interval to the number of complexes in all intervals. For example, a total of 325 complexes are predicted by NRAGEWPN on CYC2008 in Collins and effective percentage of NRAGEWPN is greater than other eleven algorithms. Further, with respect to the biological relevance, the enrichment score of the annotations are employed to evaluate the performance of predicted complex. The average of detected complexes with at least one enriched annotation over all clusters among eleven approaches on six datasets is compared in Additional file 1: Table S2. The results illustrate that NRAGEWPN predicts biologically relevant clusters with enrichment scores with the top 70% of other methods in terms of the different GO categories.
Effectiveness of RA
Due to the noise data in the PPI network, NRAGEWPN uses gene expression and RA information to score a weight to each interaction of the PPI network. To assess the effect of using RA in the fmeasure+ for complexes detection, we conduct NRAGEWPN without considering RA information and compare its results with normal the NRAGEWPN which employs both gene expression and RA information. Without using RA situation, a weighted PPI network is constructed by gene expression only. Figure 7 shows the results of NRAGEWPN in RAOFF and RAON in Collins, Gavin, KroganCore and KroganExt datasets with CYC2008 and MIPS benchmarks, respectively. From Fig. 7, it can be shown that by introducing RA, the quality performance of F_MMR is enhanced. In term of RAON mode in Collins data, F_MMR increases 8.8% for the CYC2008 benchmark and 8.3% for the MIPS benchmark. According to Fig. 7, the same trend can also be shown in other three PPI datasets on two benchmarks, respectively. This experiment shows that using RA can reduce noise data and improve the overall performance of complexes detection.
Effectiveness of secondorder neighbors searching (SNS)
The second phase of the NRAGEWPN method is to enlarge the core complexes by secondorder neighbors. After detecting core protein complexes from weighted PPI network, due to the nature of complexes of coreattachment, there may be many attachment parts to be added to the cores. In this situation, the cores and attachment parts are combined to form final complexes. In order to assess the effect of introducing secondorder neighbors searching(SNS), we conduct NRAGEWPN without its second phase. Figure 8 shows the comparison between secondorder neighbors searchingon (SNSON) and secondorder neighbors searchingoff (SNSOFF) modes in terms of fmeasure+. On the CYC2008 benchmark, when NRAGEWPN uses the SNS phase, we can see a 5.2%, 3.2%, 7.8% and 7.7% rise in Collins, Gavin, KroganCore, KroganExt, respectively. As the results show, performance of fmeasure+ can be improved by introducing the secondorder neighbors searching.
Assessment of density in different weighted graphs
Although PCC cannot identify whether gene variables are directly regulated or indirectly regulated [33,34,35], in this paper, we mainly focus on PCC as biological information to construct weighted graph network based on gene expression, which is one of the most commonly used methods for constructing gene regulatory networks. At the same time, we discuss the influence of nonlinear correlation of gene expression on the density of whole network. We construct another four weighted graphs based on KBRV [32] method and the density of networks are compared in Fig. 9. First, the results show that four weighted networks based on KBRV can increase the density of PPI network. The reason is that the weighted value of the protein pairs that can be increased by (12). Second, we can find that when \(\alpha\) belongs to [0.3 0.5], the densities of four weighted graph by (5) decrease slow. In our experiment, \(\alpha\) = 0.3 is used. Lastly, in our future work,we will focus on the nonlinear correlation of gene expression for weighted graph construction and complex detection.
Conclusions
The identification of protein complexes is important for discovering and understanding the cellular organizations and biological processes in PPI networks. In this paper a new approach named NRAGEWPN is proposed for identifying protein complexes in protein–protein interaction networks. Based on the resource allocation and gene expression of the PPI network, we first design a new weight metric to accurately describe the interaction between proteins. Our method then constructs a series of dense complex cores based on density and network diameter constraints, and the final complexes are recognized by expanding the secondorder neighbors of nodes in core complexes. Through comparison with eleven methods in Yeast and Human PPI network, the experimental results demonstrate that this algorithm not only performs better than other methods on 75% in terms of fmeasure+, but also can achieve an ideal overall performance in terms of a composite score consisting of five performance measures. In the future work, we will focus on locating sparse and density protein complexes by integrating multiple information.
Availability of data and materials
The datasets are available at https://github.com/graceyy000/dataset and NRAGEWPN codes are available from the corresponding author on reasonable request.
Abbreviations
 PPI:

Protein–protein interaction
 NRAGEWPN:

A new way by combining node resource allocation and gene expression information to weight protein network
 RA:

Resource allocation
 WRA:

Weighted graph based on resource allocation
 PCC:

Pearson’s correlation coefficient
 GEI:

Gene expression information
 DP:

Detected protein complexes
 SN:

Sensitivity
 PPV:

Positive predictive value
 ACC:

Accuracy
 SEP:

Separation
 FRM:

Fraction match
 MMR:

Maximum matching ratio
 Prec:

Precision
 Rec:

Recall
 F_MMR:

The sum of MMR and fmeasure+
 CS:

The composite score of MMR, FRM, SEP, ACC and fmeasure
References
Lei X, Yang X, Wu F. Artificial fish swarm optimization based method to identify essential proteins. IEEE/ACM Trans Comput Biol Bioinf. 2018;17(2):495–505.
Bo W, Pourshafeie A, Zitnik M, Zhu J, Bustamante CD, Batzoglou S, Leskovec J. Network enhancement as a general method to denoise weighted biological networks. Nat Commun. 2018;9:1–8.
Rehman ZU, Idris A, Khan A. Multidimensional scaling based grouping of known complexes and intelligent protein complex detection. Comput Biol Chem. 2018;74:149–56.
Adamcsek B, Palla G, Farkas IJ, Derényi I, Vicsek T. CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006;22:1021–3.
AltafUlAmin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinform. 2006;7(1):1–13.
Bader GD, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 2003;4(1):2.
Newman M. Finding community structure in networks using the eigenvectors of matrices. PhRvE. 2006;74:036104.
Libraries M. Computing communities in large networks using random walks. In: Computer and information sciences—ISCIS 2005; 2005.
Ahn YY, Bagrow J, Lehmann S. Link communities reveal multiscale complexity in networks. Nature. 2010;466:761–4.
Arnau V, Mars S, Marín I. Iterative cluster analysis of protein interaction data. Bioinformatics. 2004;21(3):364–78.
Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007;312:972–6.
King AD, Przulj N, Jurisica I. Protein complex prediction via costbased clustering. Bioinformatics. 2004;20(17):3013–20.
Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci. 2008;105(4):1118–23.
Enright AJ, Dongen SV, Ouzounis CA. An efficient algorithm for largescale detection of protein families. Nucl Acids Res. 2002;7(30):1575–84.
PereiraLeal JB, Enright AJ, Ouzounis CA. Detection of functional modules from protein interaction networks. ProteinsStruct Funct Bioinform. 2004;54:49–57.
Cho YR, Hwang W, Ramanathan M, Zhang A. Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinform. 2007;8:1–13.
Hwang W, Cho YR, Zhang A, Ramanathan M. CASCADE: a novel quasi all pathsbased network analysis algorithm for clustering biological interactions. BMC Bioinform. 2008;9(1):64.
Kentaro I, Weijiang L, Hiroyuki K, Ernberg IT. Diffusion model based spectral clustering for proteinprotein interaction networks. PLoS ONE. 2010;5(9):e12623.
Lecca P, Re A. Detecting modules in biological networks by edge weight clustering and entropy significance. Front Genet. 2015;6:265.
Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in proteinprotein interaction networks. Nat Methods. 2012;9(5):471–2.
Zhou T, Lü L, Zhang YC. Predicting missing links via local information. Eur Phys J B. 2009;71(4):623–30.
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of largescale data sets of proteinprotein interactions. Nature. 2002;417(6887):399–403.
Wolfe CJ, Kohane IS, Butte AJ. Systematic survey reveals general applicability of “guiltbyassociation” within gene coexpression networks. BMC Bioinform. 2005;6(1):227–220.
Li M, Chen JE, Wang JX, Hu B, Chen G. Modifying the DPClus algorithm for identifying protein complexes based on new topological structures. BMC Bioinform. 2008;9:1–16.
Sara O, Angela A, Zoran N. PC2P: parameterfree networkbased prediction of protein complexes. Bioinformatics. 2021;37:73–81.
Tu PB. Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science. 2005;310(5751):1152–8.
Hanna EM, Zaki N. Detecting protein complexes in protein interaction networks using a ranking algorithm with a refined merging procedure. BMC Bioinform. 2014;15(1):204.
van Dongen S. Graph clustering by flow simulation. Ph.D. thesis, University of Utrecht, Utrecht, The Netherlands 2000.
Raghavan UN, Albert R, Kumara S. Near linear time algorithm to detect community structures in largescale networks. Phys Rev E. 2007;76(3 Pt 2):036106.
SabziNezhad A, Jalili S. DPCT: a dynamic method for detecting protein complexes from TAPaware weighted PPI network. Front Genet. 2020;11:567.
Ma J, Wang J, Ghoraie LS, Men X, HaibeKains B, Dai P. A comparative study of cluster detection algorithms in proteinprotein interaction for drug target discovery and drug repurposing. Front Pharmacol. 2019;10:109.
Yao Z, Zhang J, Zou X. A general index for linear and nonlinear correlations for high dimensional genomic data. BMC Genomics. 2020;21(1):1–14.
Guo X, Zhang Y, Hu W, Tan H, Wang X. Inferring nonlinear gene regulatory networks from gene expression data based on distance correlation. PLoS ONE. 2014;9(2):e87446.
Kontio JA, RintaAho MJ, Sillanpää MJ. Estimating linear and nonlinear gene coexpression networks by semiparametric neighborhood selection. Genetics. 2020;215(3):597–607.
Piran M, Karbalaei R, Piran M, Aldahdooh J, Mirzaie M, AnsariPour N, Tang J, Jafari M. Can we assume the gene expression profile as a proxy for signaling network activity? Biomolecules. 2020;10(6):850.
Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege F, Weissman JS, Krogan NJ. Toward a comprehensive atlas of the physical interactome of saccharomyces cerevisiae. Mol Cell Proteomics. 2007;6:439–50.
Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dümpelfeld B. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440(7084):631–6.
Krogan N, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis A. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440(7084):637–43.
Damian S, Andrea F, Stefan W, Kristoffer F, Davide H, Jaime HC, Milan S, Alexander R, Alberto S, Tsafou KP. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(D1):D447–52.
McDowall MD, Scott MS, Barton GJ. PIPs: human proteinprotein interaction prediction database. Nucl Acids Res. 2009;37:D651–6.
Pu S, Jessica W, Brian T, Emerson C, Wodak SJ. Uptodate catalogues of yeast protein complexes. Nucl Acids Res. 2009;37(3):825–31.
Mewes HW, Amid C, Arnold R, Frishman D, Güldener U, Mannhaupt G, Münsterkötter M, Pagel P, Strack N, Stümpflen V. MIPS: analysis and annotation of proteins from whole genomes. Nucl Acids Res. 2004;32(suppl_1):169–72.
Giurgiu M, Reinhard J, Brauner B, DungerKaltenbach I, Fobo G. CORUM: the comprehensive resource of mammalian protein complexes2019. Nucleic Acids Res. 2018;47:D559–63.
Acknowledgements
We wish to thank the authors of the toolkits used in this paper and the reviewers of this paper.
Funding
This research is supported by the Liaoning Natural Science Foundation Project of China (20180550918). The funder YY took part in the formulation and development of methodology, and provided financial support for this study.
Author information
Authors and Affiliations
Contributions
YY designed the study and contributed manuscript preparation; DZK conducted the study. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional fle 1.
The Additional fle 1 contains Figures 1, Tables 1 and 2. Figure 1 shows comparative analysis of approaches for prediction of protein complexes in Yeast on different threshold t. Table 1 shows Comparative analysis of eleven algorithms with respect to different measures. Table 2 shows the average of enrichment score of predicted complexes with at least one enrichedannotation over all clusters compared among eleven methods across six datasets.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Yu, Y., Kong, D. Protein complexes detection based on node local properties and gene expression in PPI weighted networks. BMC Bioinformatics 23, 24 (2022). https://doi.org/10.1186/s12859021045434
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859021045434