Identification of functional hubs and modules by converting interactome networks into hierarchical ordering of proteins
© Cho et al. 2010
Published: 29 April 2010
Skip to main content
© Cho et al. 2010
Published: 29 April 2010
Protein-protein interactions play a key role in biological processes of proteins within a cell. Recent high-throughput techniques have generated protein-protein interaction data in a genome-scale. A wide range of computational approaches have been applied to interactome network analysis for uncovering functional organizations and pathways. However, they have been challenged because ofcomplex connectivity. It has been investigated that protein interaction networks are typically characterized by intrinsic topological features: high modularity and hub-oriented structure. Elucidating the structural roles of modules and hubs is a critical step in complex interactome network analysis.
We propose a novel approach to convert the complex structure of an interactome network into hierarchical ordering of proteins. This algorithm measures functional similarity between proteins based on the path strength model, and reveals a hub-oriented tree structure hidden in the complex network. We score hub confidence and identify functional modules in the tree structure of proteins, retrieved by our algorithm. Our experimental results in the yeast protein interactome network demonstrate that the selected hubs are essential proteins for performing functions. In network topology, they have a role in bridging different functional modules. Furthermore, our approach has high accuracy in identifying functional modules hierarchically distributed.
Decomposing, converting, and synthesizing complex interaction networks are fundamental tasks for modeling their structural behaviors. In this study, we systematically analyzed complex interactome network structures for retrievingfunctional information. Unlike previous hierarchical clustering methods, this approach dynamically explores the hierarchical structure of proteins in a global view. It is well-applicable to the interactome networks in high-level organisms because of its efficiency and scalability.
Recent high-throughput experimental techniques, such as yeast two-hybrid system  and mass spectrometry , have made remarkable advances in identifying protein-protein interactions on a genome-wide scale. Since the evidence of protein-protein interactions provides insights into the underlying mechanisms of biological processes within a cell, the availability of a large amount interaction data has introduced a new paradigm towards functional characterization of proteins on a system level.
A protein interactome network is structured by the set of genome-wide protein-protein interactions determined in each organism. A wide range of computational approaches [3–6] have attempted to analyze the interaction networks effectively for the purpose of predicting protein function or detecting functional modules. However, unraveling the complex connectivity has been a critical challenge. The false positive interactions, which typically appear in high-throughput experimental data, and functionally inconsistent interacting pairs  have reinforced the complexity. Thus, refining the noisy data and restructuring the complex network into a well-organized data format should be crucial pre-processes to enhance the network analysis.
In recent years, it has been investigated that protein interaction networks are characterized by intrinsic features , such as high modularity and hub-oriented structure. A network comprises a collection of functional modules that are interpreted as sets of proteins participating in the same function . In general, a module is considered as a sub-graph whose nodes are densely connected with each other and sparsely connected with the others. Density-based clustering methods have been proposed to seek densely connected sub-graphs using various density functions [10–13]. However, they are not able to capture the global patterns of functional organizations from protein interaction networks. Functional modules are typically organized in a recursive manner such that a module includes one or more sub-modules having more specific functions. Hierarchical clustering methods have thus been applied to the networks for finding functional organizations[14–17]. The bottom-up approaches iteratively merge nodes or sub-networks, whereas the top-down approaches recursively divide the network into sub-networks. However, as a critical drawback, they are typically sensitive to complex connectivity and noisy data.
Hubs in a scale-free network  play a central role in characterizing its structure. Intramodule hubs ('party' hubs) have high connectivity to the members in a module, and intermodule hubs ('date' hubs) bridge different modules . Previous studies have observed that such hubs in protein interaction networks are essential in terms of functionality [19–22] and, in particular, intramodule hubs have low evolutionary rates [23, 24]. The concepts of modules and hubs, extending from specific (local) to general (global), suggest the potential structure of a hierarchy that might be hidden in complex interaction networks. How can we then effectively extract the hierarchical structure of proteins from the complex network to reveal the global picture of functional organizations?
In this study, we present a novel method for restructuring a complex interactome network into a hierarchical data format in order to reveal functional hubs and organizations. Our algorithm uses a weighted interaction network as an input. Because the network includes a significant number of false positive connections, the reliability or intensity of interactions should be assessed and assigned into the edges as weights. For network restructuring, we design a path strength model which proposes the quantification of functional similarity between two proteins. The interactome network having complex connectivity is then dynamically converted into a hub-oriented tree structure by the definition of path-strength-based centrality. From the hierarchical structure, we score hub confidence for each node, and generate hierarchically organized clusters of proteins. Unlike degree as a local significance measure, the hub confidence estimates the global significance of nodes. It is thus capable of selecting hubs that are located in critical positions of the network. The experimental results demonstrate that the hubs with high confidence are essential for performing functions. In network topology, they mostly bridge different functional modules. Furthermore, our approach has higher accuracy in identifying functional modules than other hierarchical clustering methods.
The path strength of a path p thus has a positive relationship with the weights of the edges on p, and a negative relationship with the weighted degrees of the nodes on p. Formula 2 also implies that the path strength has an inverse relationship with the length of p because the weighted probability, w i(i=1) / d wt (v i ), is in the range between 0 and 1, inclusive. As the length of p increases, the product of the weighted probability decreases monotonically. In the same manner, as the average degree of the nodes on p increases, the path strength of p is likely to decrease.
Since any node pair selected in a small world network  are directly or indirectly connected with a relatively small path length, the maximum path length between them is typically limited. However, Formula 3 still has a computational problem when it enumerates all possible paths between a and b. To solve the computational complexity, we restrict the maximal boundary of path length.
where l ≤ k ≤ l + θ and l is the shortest path length between a and b. Based on the assumption that edge weights represent the likelihood of functional linkage of interacting protein pairs, Formula 5 measures the potential of functional association between two proteins, directly or indirectly connected in a protein interactome network.
Selecting a parent node for each node by Formula 8 then efficiently constructs a hierarchical tree structure. The node having the highest centrality among all the nodes in the network has no parent and becomes the root node. This hierarchical structure is dynamically converted on network growth, depending on the distribution of the path-strength-based centrality of nodes.
The hub confidence in Formula 11 quantifies how likely the node is to be a structural hub. Since an edge weight represents the functional consistency between two ending nodes, the structural hubs have a significant role in not only maintaining topology but also functionality.
We finally generate clusters as functional modules from the tree structure. We iteratively select a structural hub a with the highest hub confidence score and output L a as a cluster until the hub confidence of the selected node a reaches a user-specified threshold. The clusters are hierarchically arranged based on the positions of their hubs in the tree structure.
Currently, genome-wide protein-protein interaction data of several model organisms are publicly available in a number ofopen databases, for example, BioGRID , MIPS , DIP , MINT  and IntAct . They have been mostly generated by high-throughput methods. However, because of unreliability of the high-throughput experimental data, we tested our algorithm using the core protein-protein interaction data of Saccharomyces cerevisiae from DIP, which were curated by other biological information such as protein sequences and expression profiles. They include total 2526 distinct proteins and 5725 interactions between them.
Next, we applied gene co-expression profiles for interacting proteins. The gene expression data were obtained from SMD , and the coherence of expressions was calculated by the Pearson coefficient. Finally, we adopted annotations in the GO  database. The semantic similarity measure  was used to compute the functional similarity of each pair of interacting proteins.
We implemented the conversion of the weighted interaction network to a hierarchical tree structure by Formula 8. We then identified the structural hub proteins based on their hub confidence scores in Formula 11. To make topological assessment of the structural hubs, we tested network vulnerability on random and hub attacks. It has been known that typical scale-free networks are robust on random attacks, but vulnerable on targeted attacks to the hubs. For this experiment, we observed the fractions of the largest component when we repeatedly disrupted a randomly selected node, a hub with the highest degree and a structural hub with the highest hub confidence score, respectively.
Overall, a protein interaction network is more vulnerable on structural hub attacks than random attacks. It is noticeable that the hub confidence measure is effective at selecting topologically significant hub proteins in complex networks. In general, hub confidence has a positive relationship with node degree. However, some low-degree structural hubs with high hub confidence can be detected by our algorithm. Whereas degree is a factor for local significance of nodes in network topology, the hub confidence formula measures the global significance of nodes to select hubs in the hierarchical structure.
We implemented clustering of proteins using the tree structure converted from a protein interaction network, and inspected whether the output clusters are likely to be functional modules. Modularity of a sub-network has been commonly estimated by the ratio of the number of edges within the sub-network to the number of all edges starting from the nodes in the sub-network. However, in this estimation, the modularity depends on the number of nodes in the sub-network. For example, suppose a network G has 500 nodes. Sub-networks G′ and G″ of G consist of 10 and 100 nodes, respectively. A node in G″ has a higher probability having links to the nodes within the same sub-network (intraconnections) and a lower probability having links to the nodes outside of the sub-network (interconnections), comparing to a node in G′. We thus normalized the formula of modularity by the probability of a node in the sub-network being linked to the members in the same sub-network.
Clustering performance comparison by f — measure
Decomposing, converting and synthesizing complex systems are fundamental tasks for modeling their structural behavior. Recently, such approaches in protein interaction networks has been widely attempted to understand biological processes and functional organizations within a cell. We have studied the methodology for converting a protein interactome network into an effective structure for the purpose of functional knowledge discovery. For this task, we designed the path strength model and exploited the novel concept of centrality. The generated hierarchical tree structure can be applied to selecting functionally essential hub proteins and identifying functional modules. Unlike other hierarchical clustering methods, our approach dynamically explores the entire hierarchical structure of proteins in a global view. All the individual parent-child relationships between proteins in the hierarchy are meaningful and comparable. The performance of our approach can be more improved by developing the advanced methods, which efficiently integrate a massive amount of current heterogeneous biological data and accurately analyze the reliability of functional associations between interacting proteins.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 3, 2010: Selected articles from the 2009 IEEE International Conference on Bioinformatics and Biomedicine. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S3.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.