- Research article
- Open Access
Interlog protein network: an evolutionary benchmark of protein interaction networks for the evaluation of clustering algorithms
© Jafari et al. 2015
- Received: 16 March 2015
- Accepted: 29 September 2015
- Published: 5 October 2015
In network science, identifying the principal modules or communities of a complex network is critical for deducing its relationships and organization. This approach opens an arena for further study of biological functions in network biology. Because the clustering algorithms currently used to find modules have innate uncertainties, external and internal validations are necessary.
Sequence and network structure alignment has been used to define the Interlog Protein Network (IPN). This network is an evolutionarily conserved network with communal nodes and fewer false-positive links. In the current study, the IPN is employed as an evolution-based benchmark for validating module finding methods. The clustering results of five algorithms, namely Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Cartographic Representation (CR), Laplacian Dynamics (LD) and Genetic Algorithm to find communities in Protein-Protein Interaction networks (GAPPI), are assessed by the IPN in four distinct Protein-Protein Interaction Networks (PPINs).
MCL proves to be the more accurate algorithm under this evolutionary benchmarking approach. In addition, the biological relevance of the proteins in the IPN modules generated by MCL is compatible with standard biological databases such as Gene Ontology, KEGG and Reactome.
In this study, the IPN shows its potential for validation of clustering algorithms due to its biological logic and straightforward implementation.
- Protein-protein interaction network
- Interlog protein network
- Network module
One of the important challenges in the interpretation of proteomic data is the detection of the active cellular process by exploring protein function. The newly emerging discipline of network science has demonstrated that the majority of biological and evolutionary concepts make sense in the light of Systems Biology [1, 2]. Hence, protein function and, consequently, cell function are more clearly demonstrated in the context of the protein interaction network. Protein-protein interactions make up the major branch in the study of protein interaction networks. From a biochemical view, these interactions can be divided into two categories: physical and functional [4, 5]. Several methods are employed to describe these interactions, and their advantages and disadvantages have been widely reviewed [6–8]. Limitations such as slow and small-scale performance, inability to identify protein complexes, artificial interactions obtained from in vitro assays and operational restrictions led to the discovery of methods that complement each other. Some experimental methods are involved in determining physical and functional interactions [10, 11]. The phylogenetic profile, Rosetta stone, gene neighborhood and co-evolution are the most prevalent computational methods [11–13].
Biological network modules
After constructing a Protein-Protein Interaction Network (PPIN), the next step is to explore the protein tasks within this complex circuit. As Alessandro Vespignani mentioned, “evolution thinks modular”; a cell’s activity is the result of groups of interacting proteins in the PPIN, known as functional modules, whose members do not necessarily interact at the same time and place [15, 16]. Therefore, the PPIN modules should be identified, and a biological function can then be assigned to them based on the protein annotations. Sometimes this procedure is more successful for protein complexes, which work together at the same time and place, than for functional modules [6, 9].
Module detection can be divided into two approaches, namely graph clustering and distance-based clustering. In the first approach, algorithms seek communities of nodes in the graph that contain more intra-edges than inter-edges, e.g. Super Paramagnetic Clustering (SPC), Highly Connected Subgraph (HCS) based on the Monte Carlo algorithm, Markov clustering (MCL) and Restricted Neighborhood Search Clustering (RNSC). In the distance-based approach, clustering algorithms such as hierarchical or k-means clustering are used, with graph-theoretic distances and their associated measures serving as the similarity measures. Some of the distances used in this approach are as follows: shortest path, number of edges [22, 23], shortest path profiles [24, 25] and a combination of distance and statistical objects. The detected modules are then characterized biologically based on information beyond the network topology, such as gene expression, cell localization, virulence and knockout phenotypes [6, 27].
Although protein complexes consist of functional and physical interactions at the same time, many functional interactions are clearly absent from the protein complex data. Several protein interactions happen transiently and indirectly, and as such are not detectable by routine empirical tests. These are important steps in protein interactions which are lacking in the MIPS database [33–35]. In addition, modeling and representing as a graph the protein complexes acquired by some experimental techniques (e.g. the “spoke” and “matrix” models) is a challenging issue. Furthermore, one cannot ignore the experimental errors inherent to these techniques, or the fact that many protein complexes have not yet been fully studied. These flaws may lead to misinterpretation in the validation step if only the MIPS database is used.
On the other hand, GO is a standard glossary of biological terms known as the first and most common reference for the Biological Process (BP), Molecular Function (MF) and Cellular Component (CC) of proteins. Additionally, a significant correlation between the node distances in some biological networks and the semantic similarity of their GO terms has been reported [39–42]. Although GO contains comprehensive and organized information, it has some limitations: insufficient GO annotations (35–55 % false-negatives); inaccurate GO annotations (false-positives; most of the annotations in GO are obtained by indirect methods such as gene manipulation, as well as from heterogeneous experimental and computational data); the functional diversity of proteins under different conditions, which results in different and sometimes conflicting annotations for one protein (false-positives); and errors due to the manual annotation approach. These deficiencies also lead to misinterpretation in the validation step.
The goal of the present study
Given the aforementioned restrictions in the validation step of module finding algorithms, we propose a network-based evolutionary benchmark as a complementary approach that addresses some of these issues. Recently, in a companion study, we introduced a common network that had low false-positives and tuned false-negatives. Using four PPINs, a network with a high degree of conservation between four species was constructed. We call this common network the Interlog Protein Network (IPN) (Additional file 1). In the present study, the IPN, which is confirmed using experimental proteomic data, is proposed as a complementary benchmark for validating different module finding algorithms, namely Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Cartographic Representation (CR), Laplacian Dynamics (LD) [48, 49] and Genetic Algorithm to find communities in Protein-Protein Interaction networks (GAPPI).
In the current study, the mitochondrial IPN of four eukaryotic species was constructed: human, rat, fruit fly and worm. The IPN was obtained through the interlog finding procedure applied to the mitochondrial PPINs of these species. In other words, the IPN is an evolutionarily conserved network obtained from the overlap of orthologous proteins and reinforced by gene expression. By pair-wise sequence alignment (≥30 %), the 226 Orthologous Protein Sets (OPSs) were obtained. Each OPS contained four orthologous proteins from the four species (83 human, 82 rat, 83 fruit fly and 80 worm). Finally, the IPN showed 29 nodes, 61 edges, an average degree of 4.34, a diameter of 6, and an average clustering coefficient of 0.625 (Additional files 1 and 2). This network represents the evolutionarily conserved topological network features shared among these species.
The expression data are used to validate the IPN empirically. The significantly high correlation between the protein concentrations endorsed the edges in the IPN. Specifically, a correlation (co-expression) network was reconstructed from these concentration data and compared to the IPN. In the previous study, the rat mitochondrial proteins (~500) were analyzed by several electrophoresis techniques. It was claimed that different electrophoresis techniques are capable of fractionating proteins with different subcellular localizations. Hence, a significant correlation between the expression profiles of proteins across the different electrophoresis techniques implies that they are co-localized proteins [51–53].
IPN expression data
Table: node and edge counts of the STRING-derived networks and the co-expression networks, with the ratio of matched edges, for the Interlog protein network (IPN) and the protein-protein interaction network (PPIN).
Comparison of network clustering methods
All the methods (RNSC, MCL, CR, LD and GAPPI) were run to discover modules in all four PPINs and in the IPN. It should be noted that all of these algorithms are unsupervised and that the network size affects the number of clusters. With the IPN defined as the benchmark, external measures were used to validate all the methods. The comparison indices used for all species were Jaccard, Rand, Fowlkes-Mallows and Minkowski. The Minkowski index has the range [0, +∞), where values near zero indicate greater similarity; the other indices have the range [0, 1], where values closer to zero indicate greater inconsistency.
As represented in Fig. 2, benchmarking against the IPN showed that MCL outperforms GAPPI, RNSC, LD and CR in terms of the external measure indices. The range of the standard deviations showed that MCL is more dependent on the size of the graph. This superiority was evident in all the indices, even in the Rand index with the above-mentioned imperfection. However, in the case of the human PPIN, a large network, MCL and GAPPI cluster with similar accuracy (Additional file 2).
Meanwhile, the superiority of MCL is consistent with earlier results. Using a distinct approach, that study presented a comparative assessment of clustering algorithms and showed that MCL was remarkably robust to graph alterations and capable of extracting complexes from PPINs. CR took a long computation time and could not specify the modularity as well as the other algorithms within a reasonable time. The RNSC, LD and CR clusterings showed a similar ability to find modules robustly, but the LD algorithm showed the lowest standard deviation among all the indices calculated. GAPPI, the most recently proposed algorithm for this problem, works better than RNSC, LD and CR. It takes second place after MCL in all comparisons and clusters large networks as well as MCL does. This pattern was almost repeated in the recent study using MIPS as the gold standard.
The biological relevance of the proteins in each module detected by MCL was assessed. Using the Enrichr tool, three well-known standard biological databases were queried, namely Gene Ontology (Biological Process), KEGG and Reactome. The results show that each module is significantly enriched and separately annotated (Additional file 3). Briefly, for the 3 modules of this conserved IPN, the results are as follows. The first module is related to the citrate/TCA cycle and oxidative phosphorylation, based on these ID numbers (GO:0006099, GO:0022904, GO:0022900, ko00020 and ko00190). The second module is related to mitochondrial protein import, based on these ID numbers (GO:0006626, GO:0070585, GO:0072655 and GO:0006839). The last module is related to mitochondrial translation, ribosome and nucleoside biosynthetic process, based on these ID numbers (GO:0046031, GO:0009133, GO:0046033, GO:0009135, GO:0009179, ko03010, ko00240 and ko00230). These results are compatible with our earlier study of biologically meaningful communities in the IPN as a pure evolutionary extract of the mitochondrial PPIN.
There are several module detection methods based on different approaches. Validation assays are required to compare them and select the best one for network analysis. The major prerequisite for validation is a reliable benchmark. A standard topological and functional PPIN helps us to assess and verify the PPIN modularity results. In earlier studies, researchers used the MIPS or GO dataset as the gold standard in validation assays. As mentioned earlier, these datasets are not flawless gold standards and each has its own particular shortcomings. In other words, these databases have been designed for specific purposes and are conceptually diverse.
In the current study, we used pair-wise sequence alignment and comparative interactomics of evolutionarily distant species to reconstruct a conserved and common network that can be used as the benchmark or ground truth. The proposed benchmark does not have the above-mentioned limitations. First, the edges (interaction data) in the IPN and in the networks compared against it are generally of the same origin. This implies that if the edges of the compared networks are predicted and designated computationally, the benchmark is also built from computational data, and likewise for experimentally identified interactions. In other words, the IPN edges are the result of the filtering procedure (see Additional file 4) and do not originate from logically distinct methods. Second, the IPN reconstruction procedure most likely leads to a network with low false-positives and tuned false-negatives. This issue has a high impact on the assessment results in the validation step. Third, the reconstruction of the IPN is possible for all sequenced proteins and genes that are well conserved across multiple species and have predicted interactions. This implies that the approach does not require special, expensive and time-consuming techniques to generate experimental data and evaluate molecular networks.
Similar to the previous result, but with a different approach, we found MCL to be the outstanding algorithm based on its performance in the comparison study. In the traditional method, the MIPS database was used to evaluate the different clustering methods. The sensitivity and accuracy of the different methods were also examined by adding and removing edges, i.e. artificial false-positives and false-negatives (note that those tests did not include large changes in network size). Our findings on the GAPPI implementation are also consistent with the prior study, which showed the improved ability of a genetic algorithm to search for modules in PPINs based on the MIPS database.
In the present study, however, the interaction data were retrieved from the STRING database, which includes different sources of information, including various experimental, computational and even text mining methods [60, 61]. In addition, an independent set of empirical data was applied and the IPN quality was experimentally confirmed. Our goal was to search for and introduce a method that could segregate the functional modules. It should be noted again that a functional module means a group of cell components and their interactions that carry out a specific biological function, whether or not at the same time and place. These modules therefore also include all the protein complexes, and the validation standard should not lack functional interaction data. MCL is the superior module detection method for exploring protein complexes and also functional modules, based on the previous and current studies, respectively. In terms of different graph sizes, however, MCL appears to be less robust than the other algorithms, based on the range of the standard deviation.
In this study, we suggest the IPN for validating the modularity results of any PPIN because of the three advantages mentioned above. A graph clustering algorithm would be inefficient if it could not find analogous modules in the individual PPINs and in the IPN, which is a purified, conserved and confirmed network. This approach to building a new benchmark may also help to assess and verify other biological networks, e.g. gene regulatory networks or gene correlation networks, and other biological network analysis methods such as network motif finding or orienting PPINs, which are subjects for further research. Again, this approach uses an evolutionary concept, i.e. conservation, to evaluate biological networks. This is reminiscent of the well-known quote, “Nothing in biology makes sense except in the light of evolution”.
Interlog protein network (IPN)
Construction of the common PPIN, or IPN, has been described earlier in detail. Briefly, the reviewed mitochondrial proteins of four eukaryotic model species (Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens) were retrieved from the UniProtKB database (UniProt release 2013_02). Then, using the Needleman-Wunsch algorithm, the homologous proteins were identified and grouped into the OPSs. In the next step, four distinct mitochondrial PPINs of the four species were obtained from the STRING database (Ver. 9). The four PPINs were extracted with the default value in the database, using all the prediction methods. Finally, by applying a stringent rule, namely the existence of an interlog in all four species, the mitochondrial IPN of these species was extracted.
In more detail, an edge links two OPSs, say OPS1 = (p1h, p1r, p1f, p1w) and OPS2 = (p2h, p2r, p2f, p2w), if the protein pairs (p1h, p2h), (p1r, p2r), (p1f, p2f) and (p1w, p2w) all interact according to the STRING database (h, r, f and w indicate human, rat, fruit fly and worm, respectively). OPSs that do not satisfy this condition are not used in the IPN. In other words, the IPN was constructed so that each edge between two OPSs indicates six interlogs in these species. In this example, if there is a link between OPS1 and OPS2, then there are six interlogs: ((p1h, p2h) and (p1f, p2f)), ((p1h, p2h) and (p1w, p2w)), ((p1h, p2h) and (p1r, p2r)), ((p1r, p2r) and (p1f, p2f)), ((p1r, p2r) and (p1w, p2w)) and ((p1f, p2f) and (p1w, p2w)). It should be noted that in each step some proteins are omitted in order to discern conserved structures (Additional file 4).
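The edge rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the data layout (`ops_sets`, `interactions`), the function names and the toy protein identifiers are all assumptions made for the example, and the STRING lookup is replaced by in-memory sets.

```python
from itertools import combinations

SPECIES = ("human", "rat", "fly", "worm")  # h, r, f, w in the text


def build_ipn_edges(ops_sets, interactions):
    """Return IPN edges as pairs of OPS identifiers.

    ops_sets:     hypothetical dict, OPS id -> {species: protein id}
    interactions: hypothetical dict, species -> set of frozenset({a, b})
                  standing in for the per-species PPIN edge lists
    """
    edges = set()
    for ops_a, ops_b in combinations(sorted(ops_sets), 2):
        # Stringent rule: the orthologous pair must interact in ALL species.
        conserved = all(
            frozenset((ops_sets[ops_a][s], ops_sets[ops_b][s])) in interactions[s]
            for s in SPECIES
        )
        if conserved:
            edges.add((ops_a, ops_b))
    return edges


def interlog_pairs(species=SPECIES):
    """Each IPN edge implies one interlog per unordered pair of species."""
    return list(combinations(species, 2))


# Toy data (made up): OPS1-OPS2 is conserved everywhere, OPS2-OPS3 is
# missing in worm and is therefore excluded from the IPN.
ops_sets = {
    f"OPS{i}": {"human": f"p{i}h", "rat": f"p{i}r", "fly": f"p{i}f", "worm": f"p{i}w"}
    for i in (1, 2, 3)
}
interactions = {
    "human": {frozenset(("p1h", "p2h")), frozenset(("p2h", "p3h"))},
    "rat": {frozenset(("p1r", "p2r")), frozenset(("p2r", "p3r"))},
    "fly": {frozenset(("p1f", "p2f")), frozenset(("p2f", "p3f"))},
    "worm": {frozenset(("p1w", "p2w"))},
}
ipn_edges = build_ipn_edges(ops_sets, interactions)
```

With four species there are six unordered species pairs, which is where the six interlogs per IPN edge come from.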
The results of the mitochondrial proteomic study of rat were used for the empirical evaluation. In that shotgun proteomics strategy, the rat liver proteome from different cellular compartments was detected and quantified by several gel-based fractionation techniques. In the present study, normalized peptide counts were used to estimate the protein concentrations in a label-free quantification method. Then, Pearson correlation was applied to find the correlated proteins (|r coefficient| ≥ 0.7, P-value ≤ 0.05). Given the distinction made by the electrophoresis methods, the correlated proteins are likely to be co-localized and, as discussed earlier, co-localization can confirm a protein-protein interaction. The ratio of correlated proteins in the rat PPIN and in the IPN was then computed separately and compared with the hypergeometric test (P-value ≤ 0.001). Thus, the IPN edges were examined statistically against independent experimental data.
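The two statistical steps above can be illustrated with a short standard-library sketch. It is not the authors' pipeline: the toy profiles are invented, the significance test on r is omitted, and the parameterization of the hypergeometric test (population = PPIN edges, successes = PPIN edges matched by the co-expression network, draws = IPN edges, observed = matched IPN edges) is an assumption, since the text does not spell it out.

```python
from itertools import combinations
from math import comb, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Made-up normalized peptide counts across four fractionation techniques.
profiles = {
    "P1": [1, 2, 3, 4],
    "P2": [2, 4, 6, 9],
    "P3": [4, 1, 3, 2],
}

# Keep protein pairs whose profiles pass the |r| >= 0.7 threshold.
correlated = [
    (a, b)
    for a, b in combinations(sorted(profiles), 2)
    if abs(pearson_r(profiles[a], profiles[b])) >= 0.7
]
```

Under the assumed parameterization, a call such as `hypergeom_upper_tail(k, N, K, n)` would then ask how likely it is to see at least `k` co-expression-supported edges among the `n` IPN edges by chance.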
Network clustering algorithms
Main features of the graph clustering methods presented in this study
Markov clustering (MCL): flow simulation & PageRank centrality; Enright A.J. et al. (2002)
Restricted Neighborhood Search Clustering (RNSC): cost-based local search; King A.D. et al. (2004)
Laplacian dynamics (LD): multiscale modular structure; (1) Lambiotte R. et al. (2007); (2) Blondel V.D. et al. (2008)
Cartographic Representation (CR): inter- and intramodule connection; Guimera R. & Amaral LAN (2005)
Genetic Algorithm to find communities in Protein-Protein Interaction networks (GAPPI): search inspired by natural evolution; Pizzuti C. & Rombo S. E. (2014)
Features compared: allow multiple assignations; allow unassigned nodes; edge-weighted graphs supported
Applications: protein family detection; protein complex prediction; high modularity partitions of large (more than million) networks finding; protein-protein interaction networks
Implementation: upon Gephi program
Evaluation of the clustering results
After clustering, validation is required to confirm the results or to compare the different methods. A new benchmark, the IPN, was introduced in the validation step, so that the modules found in each of the PPINs are compared with the IPN’s modules. In effect, the IPN was used as the ground truth in the standard external measures assay. Note that the clustering results on each PPIN are restricted to those proteins that are also in the IPN. A successful algorithm was expected to find analogous modules in the PPIN and the IPN. In order to assess the clustering results, similarity matrices (symmetric binary matrices) of the clustering results were constructed, such that a 1 indicated that two objects were placed in the same cluster or module and a 0 indicated the opposite. Then, the entries of each PPIN matrix were compared with those of the IPN matrix. If all corresponding entries in the two matrices were equal, the two clusterings resulted in the same clusters. Four conditions can occur. Agreements: n11 (number of paired entries in the similarity matrices in which both are 1) and n00 (number of paired entries in which both are 0). Disagreements: n10 (number of paired entries that are 1 in the PPIN similarity matrix and 0 in the IPN similarity matrix) and n01 (number of paired entries that are 1 in the IPN similarity matrix and 0 in the PPIN similarity matrix).
In order to perform the biological evaluation of the IPN modules, the Enrichr software was used. In this web-based tool, significantly enriched terms are extracted based on the Gene Ontology (Biological Process), KEGG Orthology and Reactome databases. The combined score, consisting of the Z-score and adjusted p-value, was used to rank and define enriched terms. This validation was done for the modules defined by the MCL algorithm, the superior algorithm in our comparison.
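As a hedged illustration of how the four counts feed the comparison indices, the sketch below computes n11, n00, n10 and n01 from two cluster labelings and then the Jaccard, Rand, Fowlkes-Mallows and Minkowski indices. The formulas are the standard pair-counting definitions, with the IPN treated as the reference partition; they are assumed to match the ones used in this study. The function names and the toy labels are illustrative only, and the sketch assumes each partition places at least one pair of proteins in the same cluster, so that no denominator is zero.

```python
from itertools import combinations
from math import sqrt


def pair_counts(ppin_labels, ipn_labels):
    """Count agreements and disagreements over all protein pairs.

    Both arguments map protein id -> cluster label. Only proteins present
    in the IPN are compared, mirroring the restriction described above.
    """
    n11 = n00 = n10 = n01 = 0
    for a, b in combinations(sorted(ipn_labels), 2):
        same_ppin = ppin_labels[a] == ppin_labels[b]
        same_ipn = ipn_labels[a] == ipn_labels[b]
        if same_ppin and same_ipn:
            n11 += 1
        elif not same_ppin and not same_ipn:
            n00 += 1
        elif same_ppin:
            n10 += 1  # together in the PPIN clustering only
        else:
            n01 += 1  # together in the IPN clustering only
    return n11, n00, n10, n01


def external_indices(n11, n00, n10, n01):
    """Standard pair-counting indices; the IPN is the reference partition."""
    return {
        "jaccard": n11 / (n11 + n10 + n01),
        "rand": (n11 + n00) / (n11 + n00 + n10 + n01),
        "fowlkes_mallows": n11 / sqrt((n11 + n10) * (n11 + n01)),
        # 0 means identical partitions; the value is unbounded above.
        "minkowski": sqrt((n10 + n01) / (n11 + n01)),
    }


# Toy example with four proteins and made-up cluster labels.
ipn_labels = {"A": 0, "B": 0, "C": 1, "D": 1}
ppin_labels = {"A": 0, "B": 0, "C": 0, "D": 1}
counts = pair_counts(ppin_labels, ipn_labels)
scores = external_indices(*counts)
```

In this toy case the counts are n11 = 1, n00 = 2, n10 = 2 and n01 = 1. The Rand index counts n00 as agreement, so it tends to be inflated when most pairs are separated in both partitions, which is one commonly cited weakness of that index.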
The authors would like to express their gratitude to Dr. Anne-Ruxandra Carvunis for thoroughly reviewing the manuscript and to Dr. Sayed-Amir Marashi for his useful comments. Valuable contributions were also made by Zhi Wang, Jianzhi Zhang (University of Michigan) and Clara Pizzuti (National Research Council of Italy) in providing module finding programs (CR and GAPPI respectively) and by Mohammad-Reza Okhovat in graphically designing some of the Figures. The computation was performed at Math. Computing Center of IPM (http://math.ipm.ac.ir/mcc). The article processing charge has been waived by BioMed Central.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Barabási AL. The network takeover. Nat Phys. 2011;8(1):14–6.
- Carvunis AR. From proteins and their interactions to evolutionary principles of biological systems. Université Joseph Fourier. 2011.
- Hou J, Chi X. Predicting protein functions from PPI networks using functional aggregation. Math Biosci. 2012;240(1):63–9.
- Srivas R, Hannum G, Ruscheinski J, Ono K, Wang P-L, Smoot M, et al. Assembling global maps of cellular function through integrative analysis of physical and genetic networks. Nat Protoc. 2011;6(9):1308–23.
- Bandyopadhyay S, Kelley R, Krogan NJ, Ideker T. Functional maps of protein complexes from quantitative genetic interaction data. PLoS Comput Biol. 2008;4(4):e1000065.
- Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol. 2007;3(1):88.
- Sharan R, Ideker T. Modeling cellular machinery through biological network comparison. Nat Biotechnol. 2006;24(4):427–33.
- Futschik ME, Chaurasia G, Herzel H. Comparison of human protein–protein interaction maps. Bioinformatics. 2007;23(5):605–11.
- Luonan C, Rui-Sheng W, Xiang-Sun Z. Biomolecular networks methods and applications in systems biology. USA: John Wiley and Sons, Inc; 2009.
- Braun P, Tasan M, Dreze M, Barrios-Rodiles M, Lemmens I, Yu H, et al. An experimentally derived confidence score for binary protein-protein interactions. Nat Methods. 2009;6(1):91–7.
- Ngounou Wetie AG, Sokolowska I, Woods AG, Roy U, Deinhardt K, Darie CC. Protein-protein interactions: switch from classical methods to proteomics and bioinformatics-based approaches. Cellular and Molecular Life Sciences: CMLS. 2013.
- Junker BH, Schreiber F. Analysis of biological networks. USA: Wiley; 2007.
- Yu D, Kim M, Xiao G, Hwang TH. Review of biological network data and its applications. Genomics Inform. 2013;11(4):200–10.
- Vespignani A. Evolution thinks modular. Nat Genet. 2003;35(2):118–9.
- Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999;402(6761 Suppl):C47–52.
- Pizzuti C, Rombo SE. Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods. Bioinformatics. 2014;30(10):1343–52.
- Blatt M, Wiseman S, Domany E. Superparamagnetic clustering of data. Phys Rev Lett. 1996;76(18):3251–4.
- Hartuv E, Shamir R. A clustering algorithm based on graph connectivity. Inf Process Lett. 2000;76(4-6):175–81.
- Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–84.
- King AD, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics. 2004;20(17):3013–20.
- Arnau V, Mars S, Marín I. Iterative cluster analysis of protein interaction data. Bioinformatics. 2005;21(3):364–78.
- Vazquez A, Flammini A, Maritan A, Vespignani A. Global protein function prediction from protein-protein interaction networks. Nat Biotechnol. 2003;21(6):697–700.
- Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006;7:207.
- Rives AW, Galitski T. Modular organization of cellular networks. Proc Natl Acad Sci U S A. 2003;100(3):1128–33.
- Maciag K, Altschuler SJ, Slack MD, Krogan NJ, Emili A, Greenblatt JF, et al. Systems-level analyses identify extensive coupling among gene expression machines. Mol Syst Biol. 2006;2:2006.0003.
- Samanta MP, Liang S. Predicting protein functions from redundancies in large-scale protein interaction networks. Proc Natl Acad Sci U S A. 2003;100(22):12579–83.
- Zhou H. Network landscape from a Brownian particle’s perspective. Phys Rev E. 2003;67:041908.
- Bader GD, Betel D, Hogue CWV. Bind: the biomolecular interaction network database. Nucleic Acids Res. 2003;31(1):248–50.
- Spirin V, Mirny LA. Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci U S A. 2003;100(21):12123–8.
- Dunn R, Dudbridge F, Sanderson CM. The use of edge-betweenness clustering to investigate biological function in protein interaction networks. BMC Bioinformatics. 2005;6:39.
- Pereira-Leal JB, Enright AJ, Ouzounis CA. Detection of functional modules from protein interaction networks. Proteins. 2004;54(1):49–57.
- Brohée S, van Helden J. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006;7:488.
- Mewes HW, Ruepp A, Theis F, Rattei T, Walter M, Frishman D, et al. MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res. 2011;39(Database issue):D220–4.
- Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M, et al. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 2002;30(1):31–4.
- Mewes HW, Dietmann S, Frishman D, Gregory R, Mannhaupt G, Mayer KFX, et al. MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res. 2008;36(Database issue):D196–201.
- Wang Z, Zhang J. In search of the biological significance of modular structures in protein networks. PLoS Comput Biol. 2007;3(6):e107.
- Phan HTT, Sternberg MJE. PINALOG: a novel approach to align protein interaction networks–implications for complex detection and function prediction. Bioinformatics. 2012;28(9):1239–45.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Gene. 2000;25(1):25–9.
- Sevilla JL, Segura V, Podhorski A, Guruceaga E, Mato JM, Martínez-Cruz LA, et al. Correlation between gene expression and GO semantic similarity. IEEE ACM Trans Comput Biol Bioinformatics. 2005;2(4):330–8.
- Lord PW, Stevens RD, Brass A, Goble CA. Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003;19(10):1275–83.
- du Plessis L, Skunca N, Dessimoz C. The what, where, how and why of gene ontology–a primer for bioinformaticians. Brief Bioinform. 2011;12(6):723–35.
- Cho YR, Hwang W, Ramanathan M, Zhang A. Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinformatics. 2007;8:265.
- Dotan-Cohen D, Letovsky S, Melkman AA, Kasif S. Biological process linkage networks. PLoS One. 2009;4(4):e5313.
- Luciani D, Bazzoni G. From networks of protein interactions to networks of functional dependencies. BMC Syst Biol. 2012;6(1):44.
- Khatri P, Drăghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics. 2005;21(18):3587–95.
- Jafari M, Sadeghi M, Mirzaie M, Marashi SA, Rezaei-Tavirani M. Evolutionary conserved motifs and modules in mitochondrial protein interaction network. Mitochondrion. 2013;13(6):668.
- Guimerà R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005;433(7028):895–900.
- Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech Theor Exp. 2008;10:P10008.
- Lambiotte R, Ausloos M, Hołyst JA. Majority model on a network with communities. Phys Rev E. 2007;75(3):030101.
- Jafari M, Primo V, Smejkal GB, Moskovets EV, Kuo WP, Ivanov AR. Comparison of in-gel protein separation techniques commonly used for fractionation in mass spectrometry-based proteomic profiling. Electrophoresis. 2012;33(16):2516–26.
- Chen L, Wang RS, Zhang XS. Biomolecular networks: methods and applications in systems biology: Wiley. 2009.
- Zhang A. Protein interaction networks: computational analysis: Cambridge University Press. 2009.
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003;302(5644):449–53.
- Jiang D, Tang C, Zhang A. Cluster analysis for gene expression data: a survey. IEEE Trans Knowl Data Eng. 2004;16(11):1370–86.
- Campello RJGB. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recogn Lett. 2007;28(7):833–41.
- Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. 2013.
- Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
- Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2014;42(D1):472–7.
- Tieri P, Nardini C. Signalling pathway database usability: lessons learned. Mol BioSyst. 2013;9(10):2401–7.
- von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, et al. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005;33(Database issue):D433–7.
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39(Database issue):D561–8.
- Dobzhansky T. Nothing in biology makes sense except in the light of evolution. Am Biol Teach. 1973;35:125–9.
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34(Database issue):D187.