Functional module detection by functional flow pattern mining in protein interaction networks


A functional module is a group of molecules that participate in the same functional activities. Various graph-theoretic and data-mining techniques have been applied to discover functional modules in protein interaction networks [1]. However, their performance is compromised by false-positive and false-negative interaction data and by the complex connectivity of these networks. In an earlier study [2], we introduced a functional flow-based approach that efficiently identifies overlapping modules, which are generally large, from interaction networks. In this abstract, we extend that approach by mining functional flow patterns to detect small modules for specific functions.


Our approach consists of three steps. First, we integrate the interaction network with semantic data from Gene Ontology [3] to generate a weighted interaction network that is functionally reliable. Next, we simulate functional flow starting from selected informative proteins and identify primary modules for general-level functions [2]. Finally, we obtain the set of functional flow patterns for each primary module by simulating flow from every node within the module. A functional flow pattern is defined as the sequence of quantities of functional influence that a source protein exerts on the target proteins. Coherent patterns are then grouped by a pattern-based clustering algorithm [4], and the resulting groups are the final modules for specific-level functions. The key assumption is that if two source proteins have similar functional flow patterns across all other target proteins, then they are likely to have the same function.
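The third step can be illustrated with a minimal sketch. It assumes a simple propagation rule in which the influence of a source decays multiplicatively along edge weights and each target keeps the strongest influence that reaches it; the actual flow simulation of [2] and the clustering algorithm of [4] may differ. The names `flow_pattern`, `pattern_similarity`, and `min_flow`, as well as the use of Pearson correlation as the coherence measure, are hypothetical choices made for illustration only.

```python
from math import sqrt


def flow_pattern(graph, source, min_flow=0.01):
    """Return {protein: influence of `source`} (assumed propagation rule).

    graph: dict mapping each protein to {neighbor: weight}, weights in (0, 1].
    Influence is multiplied by the edge weight at every hop and propagation
    stops once it falls below `min_flow` (a hypothetical cutoff).
    """
    influence = {source: 1.0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, w in graph[u].items():
                f = influence[u] * w
                # Keep only the strongest influence reaching each target.
                if f >= min_flow and f > influence.get(v, 0.0):
                    influence[v] = f
                    next_frontier.append(v)
        frontier = next_frontier
    return influence


def pattern_similarity(graph, a, b, min_flow=0.01):
    """Pearson correlation of the flow patterns of a and b over all other proteins."""
    pa = flow_pattern(graph, a, min_flow)
    pb = flow_pattern(graph, b, min_flow)
    targets = [t for t in graph if t not in (a, b)]
    xs = [pa.get(t, 0.0) for t in targets]
    ys = [pb.get(t, 0.0) for t in targets]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant pattern has no defined correlation; treat it as dissimilar.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


# Toy weighted network (made-up weights, not data from the study).
graph = {
    "A": {"C": 0.9, "D": 0.2},
    "B": {"C": 0.8, "D": 0.3},
    "C": {"A": 0.9, "B": 0.8},
    "D": {"A": 0.2, "B": 0.3, "E": 0.9},
    "E": {"D": 0.9},
}
sim_ab = pattern_similarity(graph, "A", "B")
sim_ae = pattern_similarity(graph, "A", "E")
```

In this toy network, A and B both send most of their influence to C and little to D and E, so `sim_ab` is high while `sim_ae` is negative; under the stated assumption, A and B would be placed in the same specific-level module.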


We tested our flow-pattern clustering method on a sub-network formed by the proteins annotated with Cell Cycle and DNA Processing functions and the interactions between them. The output modules were compared with the functional categories and their annotations from MIPS [5] using statistical p-value analysis (see Table 1). We assessed the performance of our algorithm by comparing it with two competing methods: the clique percolation method [6], a density-based approach that finds densely connected sub-graphs, and the betweenness-cut method [7], a hierarchical approach that iteratively separates a graph to find the best partition. Our algorithm achieved approximately 20% higher accuracy than either competing method (see Table 2).
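The abstract does not state which statistical test underlies the p-value analysis. A common choice for scoring a module against a functional category is the upper tail of the hypergeometric distribution, and the sketch below assumes that test. The function name `enrichment_p_value` and the numbers in the usage line are hypothetical and are not results from this study.

```python
from math import comb


def enrichment_p_value(network_size, category_size, module_size, overlap):
    """Probability of observing at least `overlap` category members in a module.

    Assumes the module is a random draw of `module_size` proteins, without
    replacement, from a network of `network_size` proteins of which
    `category_size` belong to the functional category.
    """
    upper = min(category_size, module_size)
    total = comb(network_size, module_size)
    tail = sum(
        comb(category_size, k) * comb(network_size - category_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 8 of the 10 module proteins fall in a 40-protein category.
p = enrichment_p_value(network_size=1000, category_size=40, module_size=10, overlap=8)
```

A smaller p-value means the overlap between the module and the category is less likely to arise by chance, so the module corresponds more closely to that function.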

Table 1
Table 2


The modules identified from protein interaction networks provide an understanding of the functional associations among proteins. In this study, we introduced a framework for detecting functional modules in protein interaction networks. We demonstrated that our approach detects modules accurately despite the erroneous data and complex connectivity of these networks.


