Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules

Background It is generally acknowledged that a functional understanding of a biological system can only be obtained by understanding the collective of molecular interactions in the form of biological networks. Protein networks are one network type of special importance, because proteins form the functional base units of every biological cell. On a mesoscopic level of protein networks, modules are of significant importance because these building blocks may be the next elementary functional level above individual proteins, allowing us to gain insight into fundamental organizational principles of biological cells. Results In this paper, we provide a comparative analysis of five popular and four novel module detection algorithms. We study these module prediction methods for simulated benchmark networks as well as 10 biological protein interaction networks (PINs). A particular focus of our analysis is placed on the biological meaning of the predicted modules by utilizing the Gene Ontology (GO) database as gold standard for the definition of biological processes. Furthermore, we investigate the robustness of the results by perturbing the PINs, simulating in this way our incomplete knowledge of protein networks. Conclusions Overall, our study reveals that there is a large heterogeneity among the different module prediction algorithms if one zooms in on the level of biological processes in the form of GO terms, and that all methods are severely affected by a slight perturbation of the networks. However, we also find pathways that are enriched in multiple modules, which could provide important information about the hierarchical organization of the system.

of the same biological function or pathway. This may also be reflected in the evolution of the organisms [8, 13-15]. As a complicating factor, in reality, these pathways are not discrete, but each gene may take part in multiple biological functions, and therefore can be a part of multiple communities. Hence, a biological network with a modular structure can contain multiple overlapping communities, which might also contribute to the fact that biological networks are robust [16,17].
For protein interaction networks (PINs) it is known that there are two types of modular structure that are of significant importance. These modules can be formed either by protein complexes or by dynamic functional units [18]. The modules in PINs of different species have also been explained as supporting the efficient functioning of a cell and as the basis of evolution, allowing organisms to adapt quickly to changes in the environment [19,20]. In [21] the existence of two further types of structural components of modules in protein networks has been revealed, which have been termed core components and ring components. The core components are more conserved and perform key biological functions, while the ring components perform certain specialized functions under particular circumstances potentially triggered by environmental changes. Furthermore, several methods have been developed to integrate protein networks with gene expression or other datasets, such as disease-gene associations, in order to identify the functional activity of modules in different disease conditions [22-25]. Finally, in [26] the algorithm ClusterONE has been developed to identify overlapping nodes in modules in protein networks. These examples demonstrate that any systems-based analysis on the genomic level is incomplete without a network understanding of interactions on the molecular level.
Our study has four major objectives. The first objective is to compare community detection algorithms for benchmark networks as well as 10 protein interaction networks. Second, we provide an in-depth analysis of the biological meaning of the predicted modules across a variety of different biological aspects. Third, because all PINs are inferred from experimental data, they carry a certain uncertainty with respect to the correctness of the inferred interactions. For this reason, we perform a robustness analysis of the predicted modules by perturbing the PINs by edge deletions. Finally, we investigate overlapping pathways that may form functional bridges between more specialized modules.
For the community detection analysis, we use the 5 most popular module detection algorithms, fast-greedy [27], walktrap [28], label propagation [29], spinglass [30] and multi-level community [31], that have been developed for application to large networks, and we propose four novel approaches. Briefly, for our approaches, we assign weights to each pair of nodes depending on the distance between them in the network and utilize this for the module prediction. This provides competitive modularity measures for artificial and biological networks in comparison to other community detection algorithms. The details about all measures will be given in the Methods section. Typically, for large real networks there is only limited information available about the true module structure within these networks because of our lack of understanding of the underlying phenomena. However, for protein networks we can make use of the Gene Ontology (GO) database [32], which provides a comprehensive overview of thousands of biological processes in a variety of different organisms. Utilizing this information allows a biologically meaningful assessment of the predicted modules. Specifically, in our analysis, we use protein networks of 10 different species to investigate the modularity predicted by the different community detection algorithms. This paper is organized as follows. In the next section, we describe all methods, measures and data sets used for our analysis, including a description of the protein interaction networks. In the Results section, we present our numerical findings, and the paper finishes with the Conclusions section summarizing and discussing our results.

Modularity
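Fig. 1 Normalized mutual information of different module detection algorithms for the benchmark networks
The module detection algorithms studied in this paper optimize the modularity of a network. The measure for the modularity has been introduced in [27,33] and is defined as follows:
$$Q = \frac{1}{2m} \sum_{vw} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w) = \sum_i \left( e_{ii} - a_i^2 \right),$$
where $e_{ij}$ is the fraction of edges between communities $i$ and $j$, $A_{vw}$ is the adjacency matrix element between $v$ and $w$, $m$ is the total number of edges, $c_v$ is the community of node $v$, and $a_i$ is the fraction of edges connected to the nodes in community $i$, i.e.,
$$a_i = \frac{1}{2m} \sum_{v \in i} k_v.$$
Here $k_v$ is the degree of node $v \in i$.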
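As a minimal sketch of how this measure can be evaluated, the following Python function computes $Q = \sum_i (e_{ii} - a_i^2)$ for an undirected, unweighted graph, assuming the standard form of the modularity from [27]. The function name, the edge-list representation and the toy graph are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Return Q = sum_i (e_ii - a_i^2) for an undirected, unweighted graph.

    edges: list of (v, w) pairs, each undirected edge listed once.
    membership: dict mapping node -> community label.
    """
    m = len(edges)
    e_in = defaultdict(float)  # e_ii: fraction of edges inside community i
    a = defaultdict(float)     # a_i: fraction of edge ends attached to community i
    for v, w in edges:
        cv, cw = membership[v], membership[w]
        if cv == cw:
            e_in[cv] += 1.0 / m
        a[cv] += 0.5 / m
        a[cw] += 0.5 / m
    return sum(e_in[c] - a[c] ** 2 for c in a)


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {v: "A" for v in range(6)}

print(modularity(toy_edges, two_modules))  # partition into the two triangles
print(modularity(toy_edges, one_module))   # everything in a single community
```

Placing all nodes in a single community gives a modularity of zero, whereas the partition into the two triangles gives a positive value, which is the behaviour the optimization exploits.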

Fast-greedy algorithm
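This method was proposed in [27]. The algorithm starts with the assumption that each individual node is an independent community and assigns a modularity score, $Q_{ij}$, to each pair of nodes, and $a_i$ to each community. The $Q_{ij}$ and $a_i$ are defined as follows:
$$Q_{ij} = \begin{cases} \frac{1}{2m} - \frac{k_i k_j}{(2m)^2} & \text{if } i, j \text{ are connected,} \\ 0 & \text{otherwise,} \end{cases} \qquad a_i = \frac{k_i}{2m},$$
where $m$ is the total number of edges and $k_i$ is the degree of node $i$.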
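The algorithm starts by calculating $Q_{ij}$. Then it merges the two communities for which $Q_{ij}$ is largest. After that, it updates $Q$ and $a_i$ for each community and repeats all steps until all communities are merged into one community. When two communities, $i$ and $j$, are merged, $Q$ is updated as follows [27]:
$$Q'_{jk} = \begin{cases} Q_{ik} + Q_{jk} & \text{if } k \text{ is connected to both } i \text{ and } j, \\ Q_{ik} - 2 a_j a_k & \text{if } k \text{ is connected to } i \text{ but not to } j, \\ Q_{jk} - 2 a_i a_k & \text{if } k \text{ is connected to } j \text{ but not to } i, \end{cases}$$
and $a'_j = a_j + a_i$, where the merged community keeps the label $j$.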
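The merging procedure can be illustrated with a deliberately naive Python sketch. It is not the efficient implementation of [27]: instead of maintaining the update rules, it recomputes the modularity for every candidate merge, which is only feasible for very small graphs. All function names and the toy graph are hypothetical, and node labels are assumed to be sortable.

```python
def modularity(edges, membership):
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted graph."""
    m = len(edges)
    inside, ends = {}, {}
    for v, w in edges:
        cv, cw = membership[v], membership[w]
        if cv == cw:
            inside[cv] = inside.get(cv, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        ends[cw] = ends.get(cw, 0) + 1
    return sum(inside.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)


def greedy_modules(nodes, edges):
    """Greedily merge connected communities; return the best partition seen."""
    membership = {v: v for v in nodes}  # every node starts as its own community
    best_q, best = modularity(edges, membership), dict(membership)
    while True:
        # Candidate merges: pairs of communities joined by at least one edge.
        pairs = sorted({tuple(sorted((membership[v], membership[w])))
                        for v, w in edges if membership[v] != membership[w]})
        if not pairs:
            break  # all connected communities have been merged

        def merged(pair):
            i, j = pair
            return {v: (i if c == j else c) for v, c in membership.items()}

        # Merge the pair that yields the largest modularity after merging.
        choice = max(pairs, key=lambda p: modularity(edges, merged(p)))
        membership = merged(choice)
        q = modularity(edges, membership)
        if q > best_q:
            best_q, best = q, dict(membership)
    return best, best_q


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition, q = greedy_modules(range(6), toy_edges)
print(partition, q)
```

The sketch follows the merges all the way to a single community and reports the intermediate partition with the highest modularity, which corresponds to cutting the resulting dendrogram at its best level.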

Walktrap algorithm
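This method was proposed in [28]. The algorithm starts with the assumption that if two vertices, $i$ and $j$, are in the same community, then the random walks of length $t$ from $i$ and from $j$ to the nodes of other communities are similar, $P^t_{ik} \sim P^t_{jk}$. The probability that a random walk steps from vertex $i$ to vertex $j$ is described as follows:
$$P_{ij} = \frac{A_{ij}}{d(i)},$$
where $d(i)$ is the degree of vertex $i$. The probability $P^t_{ij}$ of going from $i$ to $j$ through a path of length $t$ is the corresponding entry of the $t$-th power of the transition matrix $P$.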
In the first step of the algorithm, all nodes are considered as individual communities. In the second step, the two closest communities are merged based on the distance between them, and the community structure is updated. Then the second step is repeated until all communities are merged into one community.
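Fig. 3 Distribution of the size of modules detected in PPI networks by different module detection algorithms
The distance between communities is calculated as follows. Suppose there are $C = \{C_1, C_2, \ldots, C_k\}$ communities in the network. $\sigma_k$ is a mean square distance between two communities. The distance $r$ is defined as follows [28]:
$$r_{C_1 C_2} = \sqrt{\sum_{l=1}^{n} \frac{\left( P^t_{C_1 l} - P^t_{C_2 l} \right)^2}{d(l)}},$$
where $n$ is the number of vertices and $P^t_{C l}$ is the probability of reaching vertex $l$ in $t$ steps when starting from a vertex chosen uniformly at random in community $C$.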
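To make the random-walk distance concrete, the sketch below computes it for two single vertices (that is, for communities of size one), assuming the degree-normalized Euclidean distance between rows of $P^t$ used in [28]. The dense matrix representation, the function names and the toy graph are hypothetical choices for illustration and would not scale to the PINs studied here.

```python
from math import sqrt


def transition_matrix(adj):
    """P[i][j] = A[i][j] / d(i); assumes every vertex has at least one edge."""
    return [[a / sum(row) for a in row] for row in adj]


def mat_mul(x, y):
    """Plain dense matrix product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_probabilities(adj, t):
    """Return P^t, whose entry (i, j) is the t-step probability from i to j."""
    p = transition_matrix(adj)
    pt = p
    for _ in range(t - 1):
        pt = mat_mul(pt, p)
    return pt


def vertex_distance(adj, t, i, j):
    """Degree-normalized distance between the t-step walks from i and j."""
    pt = walk_probabilities(adj, t)
    degree = [sum(row) for row in adj]
    return sqrt(sum((pt[i][k] - pt[j][k]) ** 2 / degree[k]
                    for k in range(len(adj))))


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for v, w in toy_edges:
    adj[v][w] = adj[w][v] = 1

print(vertex_distance(adj, 3, 0, 1))  # vertices in the same triangle
print(vertex_distance(adj, 3, 0, 5))  # vertices in different triangles
```

Vertices in the same triangle end up closer to each other than vertices in different triangles, which is the property the agglomerative merging relies on.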

Label propagation algorithm
This method was proposed in [29]. In this approach, a node x joins the community to which the maximum number of its neighbours belong. The communities in the network are identified by the following steps.
1. Assign a unique label to each node.
2. Order the nodes randomly.
3. Going through the nodes in this order, give each node the label that occurs most often in its neighbourhood.
4. If every node has a label that occurs most often in its neighbourhood, stop the algorithm; otherwise repeat step 3.
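The four steps can be sketched in Python as follows. This is a minimal illustration rather than the reference implementation of [29]: ties between equally frequent labels are broken uniformly at random, a cap on the number of sweeps (`max_sweeps`) is added as a safeguard, and the function name, the adjacency-dictionary representation and the toy graph are hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_sweeps=1000):
    """adj: dict mapping node -> list of neighbours (undirected graph)."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # step 1: unique label per node
    order = list(adj)
    rng.shuffle(order)            # step 2: random node order

    def top_labels(v):
        """Labels that occur most often among the neighbours of v."""
        counts = Counter(labels[u] for u in adj[v])
        best = max(counts.values())
        return sorted(l for l, c in counts.items() if c == best)

    for _ in range(max_sweeps):
        for v in order:           # step 3: adopt a most frequent neighbour label
            if adj[v]:
                labels[v] = rng.choice(top_labels(v))
        # step 4: stop once every node carries a most frequent neighbour label
        if all(labels[v] in top_labels(v) for v in order if adj[v]):
            break
    return labels


# Hypothetical toy graph: two disjoint triangles.
toy_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
toy_labels = label_propagation(toy_adj)
print(toy_labels)
```

Because the result depends on the random order and on the tie-breaking, different seeds can give different partitions on less clear-cut graphs.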

Spinglass community algorithm
This method was proposed in [30]. In this approach the community detection is mapped to finding the ground state of an infinite-range Potts spin glass model by combining the information from both present and missing links, where the clusters are represented by the occupied spin states. In the Spinglass algorithm, existing edges within a community and non-existing edges between communities are rewarded, while missing edges within a community and existing edges between communities are penalized.

Multi-level community algorithm
This method was proposed in [31]. This algorithm is divided into two phases. In the first phase, all nodes are considered as independent communities. Then communities are merged into a larger community if the modularity of the network increases. The first phase stops when there is no further increase in the modularity. In the second phase, each community is represented as a node, and the edges between and within communities are replaced by weighted edges: the edges between two communities are replaced by a single weighted edge, and all edges within a community are replaced by a weighted self-loop. After the construction of this new weighted network, the first phase is repeated to obtain an improvement in modularity. These two phases are iterated until there is no further improvement in the modularity of the network.

Correlation based hierarchical clustering
In this approach, we start with the assumption that if two nodes are part of the same community, then their shortest path distances to all other nodes are positively correlated. We first calculate the shortest path distance, S(G) for a graph G, between all pairs of nodes and then calculate the correlation between each pair of nodes. Here we provide some shortest path based measures to calculate the correlation between pairs of nodes. Let S(G) be the shortest path distance matrix and ρ(S(G)) the correlation matrix; then the distance between each pair of nodes is described as follows: The second measure for correlation is described as follows: Let A and S(G) be the adjacency and shortest distance matrices for a graph G, then the weight matrix of pairs of nodes.
Fig. 4 Scatter plot between the number of modules and the modularity. Each method is color coded by a different color. The shown curves correspond to Least Squares regression models. For A*SP Pearson, no statistically significant model could be fit that would be different from a horizontal line
M is an asymmetric weight matrix in which each row corresponds to a node and the columns hold its weights with respect to the other nodes. If two nodes are from the same community, then their weights w.r.t. other nodes are strongly correlated. The distance matrix is defined as follows: We use these two different distance measures for hierarchical clustering (Ward algorithm). To obtain the optimal number of clusters we use the modularity measure by Newman [27] described in the "Modularity" section.
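The idea behind the first measure can be sketched in Python as follows. The sketch assumes, purely for illustration, that the distance between two nodes is one minus the Pearson correlation of their rows in the shortest path matrix S(G); this specific form, the function names and the toy graph are assumptions and are not taken from the definitions above. The graph is assumed to be connected, and the Ward clustering step is omitted.

```python
from collections import deque
from math import sqrt


def shortest_path_matrix(adj):
    """All-pairs shortest path lengths by breadth-first search.

    adj: dict mapping node -> list of neighbours; the graph must be connected.
    """
    nodes = sorted(adj)
    s = []
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        s.append([dist[v] for v in nodes])
    return s


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_distance(adj):
    """Assumed distance: 1 - Pearson correlation between rows of S(G)."""
    s = shortest_path_matrix(adj)
    return [[1.0 - pearson(ri, rj) for rj in s] for ri in s]


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
d = correlation_distance(toy_adj)
print(d[0][1], d[0][5])
```

Nodes of the same triangle receive a small distance and nodes of different triangles a large one, so a hierarchical clustering of this matrix would be expected to separate the two triangles.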

Data
In the Results section, we first analyze the performance of the community detection algorithms with artificially generated benchmark networks, and then we study protein interaction networks of different species. A description of these networks is provided in the following subsections.

Benchmark networks
The benchmark networks are generated by an algorithm proposed by [34]. It has been introduced with the purpose to generate benchmark networks for testing module detection algorithms. The generation of the network proceeds along the following steps.
(1) The degree, d, of each node is randomly assigned from a power law distribution with exponent γ, which in our case is 1. The degree distribution depends on the maximum degree, d max = {20, 40}, and the average degree, d avg = 10, which are selected as input.
(2) Each node is assigned a fraction of edges, μ, that are shared with nodes of other communities; the remaining fraction, 1 − μ, is shared within the community.
(3) The minimum and maximum community sizes, k min and k max, are chosen such that k min > d min and k max > d max, so that each node can be assigned to a community. The community sizes are drawn from a power law distribution so that the sum of the nodes in all communities is equal to the number of nodes in the network.
(4) Initially, no node is assigned to a community. Then nodes are assigned randomly to a community if the community size exceeds the number of neighbours of the node within the community. This step is repeated until all nodes are assigned to a community.
(5) In order to ensure that each node has the right approximation of μ and 1 − μ for external and internal edges, several rewiring steps are iterated.

Protein interaction networks
The protein interaction networks (PINs) we use for our analysis are obtained from the BioGRID database [35]. In total, we use 10 PINs from 10 different species. The details are described in Table 1. These networks are pre-processed using the R package igraph [36] by extracting the giant connected component (GCC) of each network.
As one can see in Table 1 these biological networks show a large variety in the network parameters such as number of nodes and number of edges.

Normalized mutual information (NMI)
In order to assess the quality of the modules predicted by the algorithms, we use the normalized mutual information (NMI) [37-39].
The normalized mutual information is defined as follows. Suppose we have two community detection algorithms, U and V, that predict |R| and |C| communities in a network. The overlap between the two sets of predicted communities is shown in the contingency Table 2; for example, communities U 2 and V 1 share n 21 nodes. Then the NMI [37-39] is calculated as follows.
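A minimal Python sketch of an NMI computation from two membership vectors is given below. Several normalizations of the mutual information are in use; the sketch assumes the variant $2 I(U,V) / (H(U) + H(V))$, which may differ from the one used in our analysis. The function name and the example vectors are hypothetical.

```python
from collections import Counter
from math import log


def nmi(u, v):
    """Normalized mutual information of two membership vectors.

    u[k] and v[k] are the community labels that the two partitions assign to
    node k. Assumed normalization: 2 * I(U, V) / (H(U) + H(V)).
    """
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    joint = Counter(zip(u, v))  # contingency table entries n_ij
    h_u = -sum(c / n * log(c / n) for c in count_u.values())
    h_v = -sum(c / n * log(c / n) for c in count_v.values())
    mi = sum(c / n * log(c * n / (count_u[a] * count_v[b]))
             for (a, b), c in joint.items())
    if h_u + h_v == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mi / (h_u + h_v)


same_up_to_labels = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_labels, independent)
```

Identical partitions give a value of one regardless of how the communities are labelled, and statistically independent partitions give zero.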

Benchmark networks
We start our analysis by investigating the performance of the community detection algorithms on benchmark networks. The benchmark networks are generated by an algorithm [34], as described in the Methods section, that results in networks with a predefined modularity structure. Hence, it is known that the networks have a module structure, and they can be used as a reference to quantify the performance of the community detection algorithms in an objective manner.
In the following, we study various parameters of the benchmark algorithm to generate benchmark networks. Specifically, we set the network size to |V| = 1000 nodes, and for the average degree of the vertices we use d avg = 10. As performance measure for assessing the predicted modules of the community detection algorithms we use the normalized mutual information (NMI); see the Methods section. The NMI compares the true communities with the predicted communities, as identified by the different algorithms. The distribution of NMI values for the different community detection algorithms is shown in Fig. 1. Overall, the figure shows that as the mixing parameter, μ, increases, the performance of all module detection algorithms deteriorates. Compared to all other algorithms, the Label propagation algorithm underperforms throughout all values of μ, and the Spinglass community algorithm performs better than all other algorithms, except for low values of the mixing parameter. This indicates that the method has an optimal working point for intermediately connected modules, which is a counterintuitive behavior. Furthermore, our distance measure-based approaches, notably A*SP Pearson and A*SP Spearman, show in general a good performance, and compared to Fast greedy and Walktrap they even show a favourable performance.
Fig. 7 Bar plots of the number of pathways that are enriched in multiple modules. The numbers inside each bar correspond to the maximum number of modules to which pathways are enriched

Performance of module detection algorithms by adding random edges
In this analysis, we test the robustness of different module detection algorithms against noise by adding a certain percentage of edges randomly to the network. Specifically, in the first step we generate synthetic networks, G = (V, E), with N modules as described in section Benchmark networks. Then we add a certain fraction of random edges to obtain G' = (V, E ∪ E'), where E' is a randomly chosen set of edges between vertices in V of the benchmark network G. We then compare the modularity of the modules predicted by the different module detection algorithms in G' with the modules in G. The main objective of this analysis is to test the robustness of the module detection algorithms with respect to the addition of random edges to the network. The results are shown in Fig. 2, which plots the modularity against the mixing parameter (μ). From this analysis we find that the modularity of the modules predicted by the different algorithms decreases as the percentage of added edges increases. The decrease in modularity is larger when the mixing parameter is higher. However, a small fraction of added edges does not affect the modularity, which can be seen in Fig. 2a and b. We also find that the fast greedy and label propagation algorithms are the worst performing algorithms: for higher values of the mixing parameter (μ) label propagation performs worst, and for lower values fast greedy performs worst compared to the other algorithms. The spinglass algorithm performs best in all cases. The multilevel algorithm also performs well, but for higher values of the mixing parameter (μ) the walktrap community and the clustering algorithms show a slightly better performance than the multilevel algorithm.
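The edge-addition step can be sketched in Python as follows. The sketch assumes that the added edges are drawn uniformly from the node pairs that are not yet connected and that the fraction is taken relative to the number of existing edges; both choices, as well as the function name and the toy graph, are assumptions made for illustration.

```python
import random
from itertools import combinations


def add_random_edges(nodes, edges, fraction, seed=0):
    """Return the edge list of G' = (V, E u E').

    E' is sampled uniformly from the currently unconnected node pairs and
    contains round(fraction * |E|) edges (an assumption, see the text).
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    candidates = [pair for pair in combinations(sorted(nodes), 2)
                  if frozenset(pair) not in existing]
    k = min(len(candidates), round(fraction * len(edges)))
    return list(edges) + rng.sample(candidates, k)


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
noisy_edges = add_random_edges(range(6), toy_edges, fraction=0.3)
print(noisy_edges)
```

Enumerating all unconnected pairs is acceptable for networks with about 1000 nodes; for much larger networks a rejection-sampling scheme would be a more economical choice.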

Biological networks
Next, we extend our investigation to biological networks. Specifically, we use 10 PPI networks from different species. Details of these networks can be found in Table 1.

Modularity in PPI networks
First, we estimate the modularity, Q, and the number of modules in these PPI networks for the 9 community detection algorithms. The results of this analysis are shown in Tables 3 and 4 respectively.
The first observation we make is that the best performing algorithms are the Multilevel and the Spinglass community algorithms. Interestingly, for some organisms, e.g., Schizosaccharomyces pombe and Homo sapiens, the Label propagation algorithm almost entirely fails to detect communities. In contrast, Fast-greedy and Walktrap also find acceptable modularity values for the networks for which the Label propagation algorithm has problems. Among the distance-based measures, $D_M$ Pearson is the best performing method.
For the predicted number of modules, the Walktrap algorithm results in many more modules than any other method, whereas the remaining methods predict a comparable number of modules. For instance, for the PPI network of Homo sapiens (9606), Walktrap predicts 38 times more modules than Fast-greedy and 163 times more modules than the Spinglass method. This is interesting because the larger number of modules does not translate into superior modularity values Q (see Table 3).
Aside from the number of predicted modules, it is important to know their size distribution, i.e., how many proteins belong to the corresponding modules. The distributions of the sizes of the modules for the studied organisms are shown in Fig. 3. Here one can see that there is a considerable variation among the methods. For instance, the variation of the module sizes predicted by Walktrap is generally smaller for all organisms. This is understandable because the predicted number of modules is by far the largest for this method, which leads in general to rather small modules. In contrast, the variations for the correlation-based methods depend crucially on the organism. Overall, the largest variability is observed for the Label propagation algorithm.
Considering the agreement among different methods, the module structure of Candida albicans is least different and, hence, shows the highest level of consensus. For this organism, even Walktrap results in a moderate number of predicted modules, which is comparable to all other methods.
In Fig. 4 we combine the results from Tables 3 and 4 as a scatter plot between the number of modules and the modularity. For reasons of clarity, we show only results for four out of the nine methods because the other algorithms add nothing to the following discussion. The interesting observation is that Fast greedy displays a curious behavior: for an increasing number of predicted modules in the networks, the modularity decreases. In order to confirm this observation quantitatively, we fit a polynomial regression of second order by the Least Squares method, minimizing the residual sum of squares (RSS). For the linear and the quadratic term we obtain p-values of 0.0194 and 0.0211, which are significant for α = 0.05. This confirms our observation statistically. In contrast, Multilevel and Spinglass can be approximated by a linear regression model, with p-values of $10^{-5}$ and 0.004.
Interestingly, the A*SP Pearson algorithm is somehow located between these models in the sense that the best linear fit would only use an intercept but no slope; see Fig. 4.
Table 7 GO pathways which are enriched in more than one module predicted by the spinglass and multilevel community detection algorithms and that are common among 6 organisms (see Fig. 8)

Comparison of algorithms
In order to investigate the similarity of the identified modules for different algorithms in detail, we again use the NMI measure. However, this time we use the NMI to compare the predicted community structure of one method with the predicted community structure of another method. In this way, the similarity of the predicted communities is assessed. In other words, this analysis provides us with information about the consistency of results among different methods, but it does not allow us to gain insight into the absolute quality of the predicted module structures, because the ground truth does not enter this analysis.
The results of this analysis are shown in the form of level plots of the NMI values between different community detection algorithms in Fig. 5. The color code of the NMI values goes from violet (low values) to blue (high values); see Fig. 5 for the different scales for the different organisms. In general, there is agreement among the different methods, but only on a moderate level. For instance, for Drosophila melanogaster the NMI values are around 0.4. The same holds for House mouse and Homo sapiens. In contrast, for Norway rat the NMI values for A*SP Spearman are markedly lower than for all other algorithms. Label propagation also stands out in a similar way for Plasmodium falciparum and yeast.
By looking at the scale of the NMI values, one can see that for Candida albicans the lower end of the scale is higher than for all other organisms, with values ranging from 0.86 to 1.00. This indicates that the similarity among all community detection algorithms is highest for this PPI network, confirming our observation in Fig. 3, where we have seen that the variation of the size of modules is similar and quite small for all methods. Finally, we want to note that, in general, the distance-based measures show a higher similarity among each other than to the other community detection algorithms.

Robustness of module detection regarding perturbations
Our next analysis investigates the robustness of the predicted modules for perturbed PPI networks. Specifically, we test how a module detection algorithm changes its performance if some interactions in a PPI network are randomly deleted. The rationale of our analysis is based on the assumption that biological networks, and the interactions they are made of, are not known with absolute certainty. Instead, some interactions present in our PPI networks may be false positives due to measurement errors. Since all PPI networks we are using are inferred from experimental data, we think this assumption is very reasonable.
In order to study the effect of false positive interactions, we generate 20 perturbed networks for each PPI network, $G^{sub}_1, G^{sub}_2, \ldots, G^{sub}_{20}$, by randomly deleting 5 % of the edges in a PPI network. In order to make sure that the resulting networks are still connected, we remove only edges from nodes having a degree of $D(v_i, G) \geq 2$ and prevent removal of the last remaining edge. Then, we apply the community detection algorithms to the networks $G^{sub}_1, G^{sub}_2, \ldots, G^{sub}_{20}$ and compare the predicted modules with the results from the unperturbed PPI network by using the NMI.
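A minimal Python sketch of this perturbation is shown below. It interprets the degree rule as: an edge may be removed only if both of its end nodes currently have a degree of at least 2, so that no node loses its last remaining edge. This interpretation, the function name and the toy ring network are assumptions for illustration; note that the rule as sketched keeps every node attached to at least one edge but does not by itself test global connectivity.

```python
import random


def delete_random_edges(edges, fraction=0.05, seed=0):
    """Remove about `fraction` of the edges without isolating any node.

    edges: list of unique (v, w) tuples of an undirected graph.
    """
    rng = random.Random(seed)
    degree = {}
    for v, w in edges:
        degree[v] = degree.get(v, 0) + 1
        degree[w] = degree.get(w, 0) + 1
    target = int(fraction * len(edges))
    order = list(edges)
    rng.shuffle(order)
    removed = set()
    for v, w in order:
        if len(removed) == target:
            break
        # Only delete the edge if neither end node would lose its last edge.
        if degree[v] >= 2 and degree[w] >= 2:
            removed.add((v, w))
            degree[v] -= 1
            degree[w] -= 1
    return [e for e in edges if e not in removed]


# Hypothetical toy network: a ring of 40 nodes (40 edges, all degrees equal 2).
ring = [(i, (i + 1) % 40) for i in range(40)]
perturbed = [delete_random_edges(ring, fraction=0.05, seed=s) for s in range(20)]
print(len(perturbed), len(perturbed[0]))
```

Using one seed per perturbed network makes the 20 replicates reproducible.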
The results of this analysis are shown in Fig. 6. The first observation we make is that in all but two cases the NMI values are considerably smaller than 1.00, indicating a large change in the predicted communities. One exception is the Label propagation algorithm for Saccharomyces cerevisiae, and the other is Candida albicans for all algorithms except Label propagation and Spinglass. For all other algorithms and the remaining organisms, the obtained NMI values are much smaller, with the lowest value observed for Label propagation for Plasmodium falciparum. In general, compared to other methods and across the organisms, the most robust method appears to be Walktrap.
Affinity chromatography  104  12  11  42  70  7  11  23  4215
Two hybrid  19  14  18  20  88  15  27  25  995
Biochemical  n 9  15  13  10  58  31  24  22  604
Pull down  11  7  17  13  75  156  46  25  370
Overall, the results show that even a moderate change in a PPI network usually leads to quite large changes in the predicted module structure, regardless of the algorithm or the organism.

Biological meaning of predicted modules
So far, we have focused on the more technical aspects of the predicted modules. Now we switch gears and investigate the biological meaning of these modules. We do this by using external information, not included in the network structure itself, for assessing the predicted modules. As source for this external information we use the Gene Ontology (GO) database [32], which provides comprehensive information about the involvement of genes across many organisms in diverse biological processes.
Specifically, we performed an enrichment analysis of biological pathways obtained from the GO database for the modules detected by the community detection algorithms. In order to test the statistical significance of biological pathways, corresponding to an over-representation of genes from a particular biological process, we use Fisher's exact test. Since we are conducting thousands of hypothesis tests, we need to apply a multiple testing correction. For this reason, we apply a conservative Bonferroni correction for a significance level of 0.001. The results of this analysis are shown in Table 5.
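The enrichment test for a single module and a single pathway can be sketched in Python as follows. The sketch assumes a one-sided Fisher's exact test for over-representation, which is equivalent to the upper tail of the hypergeometric distribution, followed by a Bonferroni correction at a significance level of 0.001. The function names and the numbers in the example are hypothetical and are not results from our analysis.

```python
from math import comb


def enrichment_p_value(universe, pathway, module, overlap):
    """One-sided Fisher's exact test for over-representation.

    universe: number of annotated proteins in the network
    pathway:  number of proteins annotated to the GO term
    module:   number of proteins in the predicted module
    overlap:  number of module proteins annotated to the GO term
    Returns P(X >= overlap) for a hypergeometric random variable X.
    """
    upper = min(pathway, module)
    tail = sum(comb(pathway, x) * comb(universe - pathway, module - x)
               for x in range(overlap, upper + 1))
    return tail / comb(universe, module)


def bonferroni_significant(p_values, alpha=0.001):
    """Flag the tests that pass a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical numbers, for illustration only.
p_values = [
    enrichment_p_value(universe=1000, pathway=40, module=50, overlap=15),
    enrichment_p_value(universe=1000, pathway=40, module=50, overlap=2),
]
print(p_values, bonferroni_significant(p_values))
```

The Bonferroni threshold is the significance level divided by the number of tests, which is why it is considered conservative when thousands of pathways are tested.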
In the last column of this table, the total number of tested biological processes is shown as a reference. Overall, the Multilevel and Spinglass community detection algorithms have the largest number of enriched biological pathways. In general, however, these numbers are not too far apart from those of the remaining methods, with some exceptions. It is interesting to note that for Plasmodium falciparum (36329) none of the algorithms predicts modules that contain at least one enriched pathway. The reason for this may be the very small number of total pathways (51) tested for this organism.
In Table 6, we show the percentage of enriched pathways. The highest percentage is observed for Arabidopsis thaliana (3702), Saccharomyces cerevisiae (559292) and Drosophila melanogaster (7227) for different module detection algorithms. In contrast, Norway rat (10116) shows the lowest percentage, about 6 %.
The results in Tables 5 and 6 provide us with an overview of the enriched pathways, but they do not tell us if a pathway is enriched in just one predicted module or in several. This information is shown in Fig. 7. In this figure, we color-coded the number of pathways showing enrichment for multiple modules, ranging from 1 to 11 modules. The maximum number of modules is also shown as a number in the barplots, for each algorithm. From the shown results, we see that most pathways are only enriched in one module (red), indicating a biological specialization of these modules. In general, the number of enriched pathways decreases with an increasing number of modules for all methods and across all organisms. These observations support the hypothesis that modules are used as functional units to carry out specific biological functions. In general, the modules predicted by the Walktrap community algorithm have a larger number of pathways enriched in multiple modules. Furthermore, House mouse and Arabidopsis thaliana have a higher number of pathways that are enriched in the maximum number of modules. The Label propagation algorithm predicts the lowest number of pathways enriched in multiple modules, except for Arabidopsis thaliana, which is a potential indicator of a poor predictability of modules in PPI networks. Another interesting aspect is that the algorithms Multilevel and Spinglass, which predicted modules with the highest modularity, have in general the largest number of pathways enriched in the maximum number of modules.
Next, we study the significant pathways that are common across different organisms. Specifically, in Fig. 8, we plot a distribution of common pathways. The Multilevel and Spinglass algorithms have three and ten pathways, respectively, in common among 6 organisms; see Table 7. These processes are mostly involved in metabolic processes and cell communication. The other algorithms, except Label propagation, predict pathways common to four to five organisms, while Label propagation has pathways that are common to only three organisms. Overall, the Walktrap community algorithm predicts the largest number of pathways, 287, that are common in at least two modules.

Subnetwork analysis of Homo sapiens obtained from different experimental methods
We extend our investigation to the subnetworks of Homo sapiens. Specifically, we use the 4 largest connected PPI subnetworks from different experimental methods. Details of these networks can be found in Table 8. We estimate the modularity, Q, and the number of modules in these PPI networks for the 9 community detection algorithms. The results of our analysis are shown in Tables 9 and 10, respectively. The subnetwork obtained from affinity chromatography shows a slightly higher modularity for the fast-greedy, multilevel and spinglass algorithms. However, for the other subnetworks the modularity is considerably higher compared to the complete PPI network of Homo sapiens. The modularities of the subnetworks highlight the fact that different subnetworks obtained from different experimental methods provide a mixture of different structural properties of the complete PPI network. The analysis also highlights the fact that the multilevel and spinglass algorithms consistently perform better than the other algorithms, and that walktrap community predicts a larger number of modules compared to the other algorithms. Also, the clustering-based algorithms and the label propagation algorithm, which perform better in synthetic networks, show the lowest modularity. In the next step of the analysis we performed an enrichment analysis of pathways obtained from the CORUM complex database [40]. The results of this analysis are shown in Tables 11 and 12. The percentage of enriched pathways of the CORUM complex database is higher compared to the GO pathways for individual organisms, except for the subnetwork obtained from two-hybrid experimental data.
Fig. 9 Bar plots of the number of pathways (CORUM complex) that are enriched in multiple modules of PPI subnetworks of Homo sapiens from different experimental types. The numbers inside each bar correspond to the maximum number of modules to which pathways are enriched
In the next step, we examined whether a pathway is enriched in just one predicted module or in several. This information is shown in Fig. 9. In this figure, the color-coded bar plots show the number of pathways that are enriched in multiple modules, ranging from 1 to 4 modules. In this analysis, a large fraction of pathways is enriched in just one module, and a few pathways are enriched in two or three modules predicted by the different module detection algorithms. A list of pathways that are enriched in more than one module predicted by the multilevel and spinglass algorithms is shown in Table 13.
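The counting behind such a bar plot can be sketched as follows. This is only an illustration under stated assumptions: enrichment is tested with a one-sided hypergeometric test, the significance threshold `alpha` is a placeholder, and no multiple-testing correction is applied, which a real analysis would normally include. The function names and the toy pathway names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n_module, n_pathway, n_total):
    """P(X >= k) for a module of n_module proteins drawn from n_total
    proteins, of which n_pathway are annotated to the pathway."""
    upper = min(n_module, n_pathway)
    denom = comb(n_total, n_module)
    tail = sum(comb(n_pathway, i) * comb(n_total - n_pathway, n_module - i)
               for i in range(k, upper + 1))
    return tail / denom


def modules_per_pathway(modules, pathways, n_total, alpha=0.05):
    """Count, for each pathway, the number of modules it is enriched in.

    modules: list of sets of proteins (one set per predicted module).
    pathways: dict mapping a pathway name to its set of proteins.
    alpha: assumed significance threshold (no multiple-testing correction).
    """
    counts = {}
    for name, members in pathways.items():
        members = set(members)
        hits = 0
        for module in modules:
            module = set(module)
            k = len(module & members)
            if k > 0 and hypergeom_pvalue(
                    k, len(module), len(members), n_total) < alpha:
                hits += 1
        counts[name] = hits
    return counts


# Toy example with 100 proteins and three modules.
toy_modules = [set(range(0, 5)), set(range(5, 10)), set(range(10, 100))]
toy_pathways = {
    "pathway_1": {0, 1, 2, 3},        # concentrated in the first module
    "pathway_2": {0, 1, 2, 5, 6, 7},  # split over the first two modules
    "pathway_3": {10, 50},            # scattered in the large module
}
toy_counts = modules_per_pathway(toy_modules, toy_pathways, n_total=100)
```

Tabulating how many pathways fall on each count (1, 2, 3, ...) for every algorithm would give the bar heights of a plot in the style of Fig. 9.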

Time complexity of the algorithms
Finally, we show results for the time complexity of the community detection algorithms. Table 14 shows the run times in seconds for the analysis of the PPI networks. Overall, the fastest algorithm is Label propagation, which provided the quickest results for all studied networks, always below one second. For all other methods, even those that are fast in general, there is at least one network that requires much more time. For instance, Fast-greedy is in general quite fast and comparable to Label propagation, but for the networks Saccharomyces cerevisiae (559292) and Homo sapiens (9606) it takes more than 463 and 2287 times longer, respectively, than Label propagation. A similar observation can be made for Walktrap.
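Run-time comparisons of this kind can be obtained with a simple timing harness. The sketch below is an assumption about how such measurements could be made, not a description of the procedure used for Table 14: it records the best wall-clock time over a few repeats and expresses each run time as a multiple of the fastest method. The helper names and the example timings are hypothetical.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown_vs_fastest(run_times):
    """Express each run time as a multiple of the fastest one.

    run_times: dict mapping a method name to a positive time in seconds.
    """
    fastest = min(run_times.values())
    return {name: t / fastest for name, t in run_times.items()}


# Hypothetical timings in seconds, for illustration only.
example_times = {"method_a": 0.5, "method_b": 1.0, "method_c": 2.0}
example_ratios = slowdown_vs_fastest(example_times)

# Timing an arbitrary callable (here the built-in sum as a stand-in).
elapsed = time_call(sum, list(range(1000)))
```

Reporting ratios rather than absolute seconds makes statements such as "more than 463 times longer" independent of the hardware on which the measurements were taken.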

Discussion and conclusion
In our analysis, we used 9 community detection algorithms to predict modules in PPI networks of 10 different organisms. Overall, our analysis provides a comprehensive understanding of the performance of a broad set of community detection algorithms. Also, our analysis highlights organism-specific differences of PPI networks and the biological meaning of the predicted modules.
Overall, from our analysis of these networks we found that the Spinglass, Multilevel and Fast greedy algorithms perform in general much better than the other algorithms. Furthermore, the Multilevel and Fast greedy algorithms have, in addition, a good run time (see Table 14), which allows results for large networks to be obtained within seconds. Interestingly, despite the fact that these three algorithms perform better, their predicted modules do not fully agree, and the results are to a large extent method-specific. Another interesting fact about the Multilevel and Spinglass algorithms is that the number of modules and the modularity are linearly correlated, while the performance of Fast greedy decreases as the number of modules increases (see Fig. 4). At this point it is unclear which behavior best reflects the dependency between modularity and number of modules for biological organisms.
However, it appears reasonable to assume that there is a limiting factor in the growth of modularity of biological networks, which would suggest that the behavior of Fast greedy is a reflection of biological properties of the networks rather than a technical property or a bias of the method. Although we extensively studied the performance of module detection in biological networks and found high modularity for some organisms, for others, such as Homo sapiens and Saccharomyces cerevisiae, we find a low modularity. This is especially surprising for Homo sapiens. One reason for the low modularity in these networks could be the existence of many nodes shared between communities, giving rise to overlapping modules and pathways. Therefore, the standard non-overlapping community prediction methods may not be optimally suited for detecting communities in such organisms. This would suggest that more effort needs to be placed on the development of algorithms for overlapping communities, because only in this way could one shed light on the nature of the overlapping modular structure of PPI networks. Another explanation could be that the PPI networks contain incomplete information. One argument for this is that the highest modularity is predicted by the Spinglass algorithm for Arabidopsis thaliana (3702), which is a less complex organism and for this reason easier to study. Also, the modularity of Arabidopsis thaliana (3702) is consistently predicted to be higher by all other algorithms.
By studying the biological meaning of predicted modules, we found that a large proportion of pathways is enriched in only a single module, in all organisms and for all algorithms. This underlines the role of biological pathways as part of a special functioning component in an organism. However, a small set of biological pathways is enriched in more than one module, and an even smaller proportion of pathways is commonly enriched in multiple modules in all organisms. In general, these pathways can broadly be grouped into the following categories:
• Pathways which are part of a single module only across many organisms.
• Pathways which are part of multiple modules across many organisms.
• Pathways which are part of a single module and a single organism.
• Pathways which are part of multiple modules and a single organism.
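The four categories above can be assigned mechanically once the number of modules per pathway is known for every organism. The sketch below is a hypothetical illustration: it assumes that "many organisms" means more than one organism, and that a pathway counts as "multiple modules" if it is enriched in more than one module in at least one organism. The input layout, function name and example names are not taken from this study.

```python
def classify_pathways(enrichment):
    """Assign each enriched pathway to one of the four categories.

    enrichment: dict {organism: {pathway: number of modules in which
                the pathway is enriched}}; zero counts are ignored.
    """
    per_pathway = {}
    for organism, counts in enrichment.items():
        for pathway, n_modules in counts.items():
            if n_modules > 0:
                per_pathway.setdefault(pathway, []).append(n_modules)
    labels = {}
    for pathway, counts in per_pathway.items():
        # Assumption: one organism with >1 module is enough for "multiple".
        modules = "multiple modules" if max(counts) > 1 else "single module"
        # Assumption: "many organisms" means more than one organism.
        organisms = "many organisms" if len(counts) > 1 else "single organism"
        labels[pathway] = f"{modules}, {organisms}"
    return labels


# Hypothetical enrichment counts for two organisms.
example = {
    "organism_1": {"pw_a": 1, "pw_b": 2, "pw_c": 1, "pw_d": 3, "pw_e": 0},
    "organism_2": {"pw_a": 1, "pw_b": 1},
}
example_labels = classify_pathways(example)
```

Other definitions are possible, for example requiring multiple modules in every organism; the appropriate choice depends on the biological question being asked.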
It would be interesting to see which biological processes these pathways contribute to and what role they play in different organisms, in order to see changes from an evolutionary perspective or the emergence of a higher level of functioning in different organisms.
In summary, the identification of modules in networks is a very complex problem and more work needs to be done. A potential future direction could be to extend the analysis to identify communities with overlapping proteins/genes. This would be a major step forward because it would require the inclusion of the hierarchy among the modules and, as such, fundamentally different algorithms.