SigFlux: A novel network feature to evaluate the importance of proteins in signal transduction networks
© Liu et al; licensee BioMed Central Ltd. 2006
Received: 23 May 2006
Accepted: 27 November 2006
Published: 27 November 2006
Measuring each protein's importance in signaling networks helps to identify the crucial proteins in a cellular process, locate the fragile parts of the biological system, and inform disease therapy. However, relatively few methods exist to evaluate the importance of proteins in signaling networks.
We developed a novel network feature, which we call SigFlux, to evaluate the importance of proteins in signal transduction networks, based on the concept of minimal path sets (MPSs). An MPS is a minimal set of nodes that can perform signal propagation from ligands to target genes, or that forms a feedback loop. We define SigFlux as the number of MPSs in which each protein is involved. We applied this network feature to the large signal transduction network in the hippocampal CA1 neuron of mice. Significant correlations were observed between SigFlux and both the essentiality and the evolutionary rate of genes. Compared with connectivity, another commonly used network feature, SigFlux reflects a protein's essentiality similarly well or better. Further classification according to protein function shows that high SigFlux, low connectivity proteins are abundant among receptors and transcription factors, indicating that SigFlux can describe the importance of proteins within the context of the entire network.
SigFlux is a useful network feature in signal transduction networks that allows the prediction of the essentiality and conservation of proteins. With this novel network feature, proteins that participate in more pathways or feedback loops within a signaling network are found to be far more likely to be essential and conserved during evolution than their counterparts.
Structural analysis of signal transduction networks can provide insight into the function and evolution of cellular networks. A proper network feature for evaluating the importance of each protein in signaling networks helps to identify the crucial proteins in a cellular process, and further provides a better understanding of complex diseases and a guiding principle for therapy design. Relatively few methods [1–4] have been proposed so far to analyze the structure of signaling networks. In particular, the software tool CellNetAnalyzer was developed to compute feedback cycles and all the signaling paths between any pair of nodes, but these cannot be used to evaluate the importance of each protein in signaling networks.
On another front, the structural analysis of metabolic networks has been well studied, although these studies have scarcely been applied to signaling networks. The method for evaluating the importance of enzymes in metabolic networks is based on the concept of elementary flux modes (EMs) [5, 6]. EMs are minimal sets of enzymes that can operate at steady state. The number of elementary modes in which an enzyme is involved assesses the importance of the enzyme. Elementary mode analysis appears to be well suited to characterizing network properties because each elementary mode is non-redundant. However, the algorithm for EM calculation cannot be applied to signal transduction networks directly, because computing the elementary modes of a given network requires the stoichiometric matrix and the reversibilities of the reactions. In large signaling networks, constructing precise quantitative models is practically infeasible because of the huge number of required but generally unavailable kinetic parameters and concentration values [8, 9].
Connectivity and the clustering coefficient are two well-known topological characteristics describing the importance of a protein in protein interaction networks. The connectivity of a node is the number of its interacting partners, and the clustering coefficient measures the cliquishness of the neighborhood of each node. Proteins with high connectivity and a high clustering coefficient tend to be essential in protein interaction networks [10, 12]. However, it is unknown whether these two characteristics are also suited to measuring the importance of proteins in signaling networks.
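The two characteristics just defined can be illustrated with a short sketch. It assumes an undirected network stored as a mapping from each node to the set of its neighbors; the function names and the four-node toy network are hypothetical and are not taken from the data analyzed in this paper.

```python
from itertools import combinations


def connectivity(adj, node):
    """Number of distinct interaction partners of `node`."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbors no pair exists; report 0 by convention.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical undirected toy network (illustration only).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(connectivity(adj, "A"))            # A has three partners
print(clustering_coefficient(adj, "A"))  # one of three neighbor pairs is linked
```

Both quantities depend only on a node and its immediate neighbors, which is the limitation the path-based feature introduced below is meant to address.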
In this paper, we introduce the concept of minimal path sets (MPSs) to measure the importance of proteins in signaling networks. An MPS in a signal transduction network can be considered a minimal set of proteins functioning together to perform signal propagation. MPSs are inherent and uniquely determined structural features of signaling networks, similar to the EMs known from metabolic networks. The conceptual properties of MPSs offer a number of potential applications, both for obtaining a deep understanding of the structural properties of cellular networks and for finding targets that efficiently activate or inhibit cellular functions. Based on MPSs, we further propose a network feature, which we call SigFlux, to assess the importance of each protein. We examined the usefulness of SigFlux for assessing the importance of proteins in the signaling network of the mouse hippocampal CA1 neuron, using the mutant phenotype and the evolutionary rate of mouse genes. We compared the performance of SigFlux with two other network features, connectivity and the clustering coefficient.
A novel network feature reflecting a protein's importance
Similar to the definition of elementary flux modes in metabolic pathway analysis, an MPS in a signal transduction network is a minimal set of proteins that can propagate the signal from input to output, where the extracellular ligand is regarded as the input and the finally regulated gene in the nucleus as the output. In addition, feedback loops, which have been suggested to have a specific biological function, are widespread in signaling networks and can be regarded as another type of MPS.
One important application of EMs in metabolic networks is to evaluate the importance of one enzyme or a set of enzymes. Similarly, MPSs facilitate the assessment of the importance of each protein in a signaling network. In this paper, we developed a C++ program that computes MPSs by generating all the paths between input and output using the breadth-first search method. Feedback loops in the networks are identified using MFinder [16, 17] and are also counted as MPSs. Because feed-forward loops have already been counted in the computation of paths between the input layer and the output layer, there is no need to count them separately.
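The path-generation step can be illustrated with a small sketch. The Python code below is not the authors' C++ program; it is a minimal breadth-first enumeration of simple paths, assuming the network is given as a directed adjacency mapping and that a path ends at the first output node it reaches. The function name `enumerate_paths` and the toy network are hypothetical.

```python
from collections import deque


def enumerate_paths(graph, sources, targets):
    """Return all simple directed paths from any source to any target.

    The queue holds partial paths, so shorter paths are completed first.
    Assumption of this sketch: a path stops at the first target it reaches.
    """
    targets = set(targets)
    paths = []
    queue = deque([src] for src in sources)
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last in targets:
            paths.append(path)
            continue
        for nxt in graph.get(last, ()):
            if nxt not in path:  # never revisit a node, so paths stay simple
                queue.append(path + [nxt])
    return paths


# Hypothetical toy network: ligand -> receptor -> kinases -> TF -> gene.
toy = {
    "ligand": ["receptor"],
    "receptor": ["kinase1", "kinase2"],
    "kinase1": ["tf", "kinase2"],
    "kinase2": ["tf"],
    "tf": ["gene"],
}
toy_paths = enumerate_paths(toy, ["ligand"], ["gene"])
for p in toy_paths:
    print(" -> ".join(p))
```

The number of simple paths can grow exponentially with network size, so exhaustive enumeration of this kind is only practical for networks of moderate size or with additional pruning.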
We propose the following SigFlux measure to assess the importance of each protein in signaling networks. For each protein, we define SigFlux to be proportional to the number of MPSs in which the protein is involved. More precisely, the SigFlux of protein i is defined as
$$\mathrm{SigFlux}_i = \frac{m_{pi} + m_{fi}}{\max_{1 \le j \le n} \left( m_{pj} + m_{fj} \right)}$$
where $m_{pi}$ denotes the number of signaling paths from input to output in which protein i is involved, $m_{fi}$ denotes the number of feedback loops that include protein i, and n is the number of proteins in the network. The more MPSs protein i is involved in, the more important protein i is for the signaling network. The value of SigFlux varies between 0 and 1. The extreme value of zero occurs when protein i is not a member of any MPS, and the extreme value of one is assigned to the most important protein in the network, i.e. the protein whose removal causes the greatest disruption in the topological structure of the network.
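As a rough illustration of this definition, the sketch below counts, for every protein, the paths and feedback loops that contain it and then rescales the counts. It assumes that scaling is done by the largest count in the network, which is one reading consistent with the stated range of 0 to 1; the function name `sigflux`, the input format, and the toy MPSs are hypothetical.

```python
from collections import Counter


def sigflux(paths, feedback_loops, proteins):
    """Count MPS membership per protein and scale by the largest count.

    Assumption: normalization by the maximum count over all proteins,
    so the most involved protein scores 1 and uninvolved proteins score 0.
    """
    counts = Counter()
    for mps in list(paths) + list(feedback_loops):
        counts.update(set(mps))  # a protein counts once per MPS
    top = max((counts[p] for p in proteins), default=0)
    if top == 0:
        return {p: 0.0 for p in proteins}
    return {p: counts[p] / top for p in proteins}


# Hypothetical toy input: two input-to-output paths and one feedback loop.
example_paths = [
    ["receptor", "kinase1", "tf"],
    ["receptor", "kinase2", "tf"],
]
example_loops = [["kinase1", "phosphatase"]]
example_proteins = ["receptor", "kinase1", "kinase2", "tf", "phosphatase", "orphan"]

scores = sigflux(example_paths, example_loops, example_proteins)
for name in example_proteins:
    print(name, scores[name])
```

In this toy case the receptor, kinase1 and the transcription factor each belong to two MPSs and share the top score, while the protein that lies on no path or loop scores zero.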
To demonstrate the performance of SigFlux, we studied the signaling network in the hippocampal CA1 neuron of the mouse. We collected the mutant phenotype and evolutionary rate of mouse genes to evaluate SigFlux [see Additional file 1].
SigFlux is significantly correlated with a protein's essentiality
The functional significance of a gene is defined, at the most basic level, by its essentiality. In simple terms, an essential gene is one whose removal renders the cell unviable. An effective way to evaluate the importance of a gene in a cell or organism is to mutate the gene and observe the resulting phenotype. Thus, to assess SigFlux as a measure of the importance of genes, we first examined the correlation between SigFlux and the essentiality of genes.
We remark that although there was a high consistency between essentiality and SigFlux, some exceptions exist. One case in point is that some proteins with an essential mutant phenotype are not involved in any MPS, either because these proteins have no global impact on the signaling network or because the signaling network has not been completely uncovered.
SigFlux may act as an indicator of protein evolutionary rate
Previous studies [18, 19] reported that proteins with topological importance in networks and increased essentiality are likely to be conserved in evolution to preserve their functional stability. Proteins involved in more MPSs are therefore expected to evolve more slowly, since an MPS can be seen as a minimal functional module. We investigated the relationship between SigFlux and evolutionary rate to test this hypothesis.
Pearson correlation coefficients r (p-values) between three network features and the evolutionary rate of mouse genes.
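For readers who want to reproduce this kind of comparison on their own data, the following is a minimal sketch of the Pearson correlation coefficient between a network feature and an evolutionary rate. The function name and the numbers are hypothetical and are not values from this study. The sketch computes r only; a p-value would need a statistics package, for example `scipy.stats.pearsonr`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four genes (illustration only, not study data).
feature_values = [0.1, 0.4, 0.7, 1.0]       # e.g. a network feature per gene
evolutionary_rates = [0.40, 0.30, 0.20, 0.10]  # e.g. average dN/dS per gene

r = pearson_r(feature_values, evolutionary_rates)
print(round(r, 6))
```

A negative r in such a comparison would mean that genes with higher feature values tend to evolve more slowly.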
Comparison of SigFlux with connectivity
Although we found that both SigFlux and connectivity were correlated with the essentiality and evolutionary rate of mouse genes, they describe different topological features of signaling networks. The distribution of connectivity across the nodes of a network has been used as a measure to characterize natural networks and was suggested to correlate with the importance of a protein. However, this is only valid if the immediate neighbors are the only ones determining the properties of a protein in the network. In contrast, SigFlux indicates how important a node is within the wider context of the entire network. Rather than only its direct interactors, all related proteins between the input layer and the output layer are taken into account when determining a protein's importance in signaling networks. Thus, it seems reasonable to combine SigFlux and connectivity when analyzing the structural properties of signaling networks.
The definition of MPSs is in principle similar to that of EMs in metabolic networks, but there are significant differences between them. In metabolic networks we are particularly interested in the reactions (edges) because they correspond to enzymes, which are subject to regulatory processes and can be knocked out in experiments. In contrast, in signaling networks we usually focus on the nodes, since they are often knocked out in experiments or targeted in medical treatment, whereas an edge in a signaling network mostly represents a direct interaction between a pair of nodes with no mediator. An MPS in a signaling network is a set of proteins functioning together, rather than a set of enzymes as in an EM. From this point of view, the MPS and EM methods have very different biological interpretations, although their mathematical definitions are similar.
In this paper we introduce a novel network feature, SigFlux, for signal transduction networks, based on the concept of minimal path sets. We found a significant correlation between SigFlux and essentiality, as well as between SigFlux and the evolutionary rate of genes. These correlations held true for connectivity but not for the clustering coefficient. The comparison between SigFlux and connectivity implies that both may be useful features for measuring the importance of proteins, although they describe different topological properties of signaling networks. Further classification according to protein function demonstrated that high SigFlux, low connectivity (HSLC) proteins are abundant among receptors and transcription factors. This means that SigFlux may identify some important proteins located at the connected regions of high clustering, even though they have low connectivity.
Although in this paper we focused on signaling networks, the method can readily be applied to other kinds of interaction network, such as gene regulatory networks. The final aim of analyzing the structural properties of a network is to provide clues for better understanding the function of the biological network. Insights into the inherent properties of biological systems will provide a better understanding of complex diseases and a guiding principle for therapy design.
Signaling network in the hippocampal CA1 neuron of a mouse
We downloaded the signaling network of the mouse hippocampal CA1 neuron from previously published supplemental material. It includes 608 biological molecules and 1427 interactions between them.
In this signaling network, three types of functional links are specified: links may be activating, inhibitory or neutral. Some pathways activate a target gene while others inhibit it. For example, EGF (epidermal growth factor) [9, 21] activates cell growth, survival or differentiation through some pathways and inhibits apoptosis through others. The network can therefore be separated and investigated separately according to whether pathways act as activators or inhibitors of a target node. However, in the computation of the network feature proposed above, activating and inhibitory paths are not distinguished and both are counted as MPSs.
The method to generate all the MPSs
Phenotype and evolutionary rate of mouse genes
Rich data are available on the mutant phenotypes of mouse genes and on the evolutionary relationships between mouse genes and their orthologs, and they can be downloaded from public databases such as MGD and the Ensembl Gene database.
After removing lipids, messengers, ions and other nodes that have no corresponding genes from the mouse signal transduction network, 549 proteins remained for investigation. Of these, mutant phenotypes for 383 genes were found in MGD, including 34 genes with no obvious phenotype, 191 with a viable phenotype and 158 with a lethal phenotype. "Lethal" refers to perinatal, postnatal or embryonic lethality; "viable" phenotypes lead to an abnormal response or illness but not death; "no obvious phenotype" refers to mice in which disruption of the gene yields a normal phenotype. For the remaining 166 proteins, phenotypic information is unavailable, either because gene mutation experiments are lacking or because the data are not stored in MGD. According to the mutant phenotype of mouse genes, the essentiality of proteins in signaling networks is grouped into three categories: no obvious, viable and lethal phenotypes.
To build a source of orthology data, we browsed the gene clusters compiled in the Ensembl Gene database, which provides evolutionary information on orthologous gene pairs between mouse and other eukaryotes. From this we downloaded dN/dS values for 525 proteins, computed between Mus musculus genes and their orthologs in Homo sapiens, Bos taurus, Pan troglodytes, Macaca mulatta and Canis familiaris. dN/dS is defined as the number of nonsynonymous substitutions per nonsynonymous site (dN) divided by the number of synonymous substitutions per synonymous site (dS). dS is usually taken as an estimate of the neutral rate of molecular evolution, so dN/dS provides information about the degree of selection acting on a gene. The average dN/dS between a mouse gene and its orthologs in the five other species can therefore represent the evolutionary rate of that mouse gene to some extent.
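The averaging step can be written down in a few lines. The sketch below assumes that each gene comes with one dN/dS value per species and that species without an ortholog are simply skipped; that handling of missing values is an assumption of this example, and the function name and the numbers are hypothetical.

```python
def mean_dnds(ratios_by_species):
    """Average dN/dS over the species for which an ortholog value exists.

    Assumption: a missing ortholog is encoded as None and is skipped.
    Returns None when no value is available at all.
    """
    values = [v for v in ratios_by_species.values() if v is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical dN/dS values for one mouse gene (illustration only).
example_gene = {
    "Homo sapiens": 0.10,
    "Bos taurus": 0.20,
    "Pan troglodytes": None,  # no ortholog value available
    "Macaca mulatta": 0.30,
    "Canis familiaris": 0.20,
}

average_rate = mean_dnds(example_gene)
print(average_rate)
```

Averaging over several species reduces the influence of any single pairwise comparison, at the cost of mixing comparisons with different numbers of available orthologs.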
List of abbreviations
EM(s): elementary flux mode(s)
MPS(s): minimal path set(s)
HSLC: high SigFlux, low connectivity
We thank Songfeng Wu, Jiangqi Li and Xiushan Yin for excellent advice and assistance as well as all members in the bioinformatics lab of Beijing Proteome Research Center for a helpful discussion. This work is supported by the Chinese National Key Program of Basic Research (2001CB510209) and the National High Technology Research & Development Program of China (2002BA711A11 and 2004BA711A21).
1. Papin JA, Hunter T, Palsson BO, Subramaniam S: Reconstruction of cellular signalling networks and analysis of their properties. Nature Reviews Molecular Cell Biology 2005, 6(2):99–111. doi:10.1038/nrm1570
2. Klamt S, Saez-Rodriguez J, Lindquist JA, Simeoni L, Gilles ED: A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinformatics 2006, 7: 56. doi:10.1186/1471-2105-7-56
3. Papin JA, Palsson BO: The JAK-STAT signaling network in the human B-cell: an extreme signaling pathway analysis. Biophysical Journal 2004, 87(1):37–46. doi:10.1529/biophysj.103.029884
4. Papin JA, Palsson BO: Topological analysis of mass-balanced signaling networks: a framework to obtain network properties including crosstalk. J Theor Biol 2004, 227(2):283–297. doi:10.1016/j.jtbi.2003.11.016
5. Schuster S, Fell DA, Dandekar T: A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nature Biotechnol 2000, 18(3):326–332. doi:10.1038/73786
6. Schuster S, Hilgetag C: On elementary flux modes in biochemical reaction systems at steady state. J Biol Syst 1994, 2: 165–182. doi:10.1142/S0218339094000131
7. Oancea I: Topological analysis of metabolic and regulatory networks. Syst Biol 2004, 1: 1. doi:10.1049/sb:20049001
8. Gilman AG, Simon MI, Bourne HR: Overview of the Alliance for Cellular Signaling. Nature 2002, 420(6916):703–706. doi:10.1038/nature01304
9. Schoeberl B, Eichler-Jonsson C, Gilles ED, Muller G: Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat Biotechnol 2002, 20(4):370–375. doi:10.1038/nbt0402-370
10. Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411(6833):41–42. doi:10.1038/35075138
11. Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature 1998, 393(6684):440–442. doi:10.1038/30918
12. Yu H, Greenbaum D, Xin Lu H, Zhu X, Gerstein M: Genomic analysis of essentiality within protein networks. Trends Genet 2004, 20(6):227–231. doi:10.1016/j.tig.2004.04.008
13. Ma'ayan A, Jenkins SL, Neves S, Hasseldine A, Grace E, Dubin-Thaler B, Eungdamrong NJ, Weng G, Ram PT, Rice JJ, Kershenbaum A, Stolovitzky GA, Blitzer RD, Iyengar R: Formation of regulatory patterns during signal propagation in a Mammalian cellular network. Science 2005, 309(5737):1078–1083. doi:10.1126/science.1108876
14. Mangan S, Alon U: Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci USA 2003, 100(21):11980–11985. doi:10.1073/pnas.2133841100
15. Gazit H, Miller GL: An improved parallel algorithm that computes the BFS numbering of a directed graph. Information Processing Letters 1988, 28(2):61–65. doi:10.1016/0020-0190(88)90164-0
16. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network motifs: simple building blocks of complex networks. Science 2002, 298(5594):824–827. doi:10.1126/science.298.5594.824
17. Kashtan N, Itzkovitz S, Milo R, Alon U: Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 2004, 20(11):1746–1758. doi:10.1093/bioinformatics/bth163
18. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science 2002, 296(5568):750–752. doi:10.1126/science.1068696
19. Wuchty S: Evolution and topology in the yeast protein interaction network. Genome Res 2004, 14(7):1310–1314. doi:10.1101/gr.2300204
20. Barabasi AL, Albert R: Emergence of scaling in random networks. Science 1999, 286(5439):509–512. doi:10.1126/science.286.5439.509
21. Wiley HS, Shvartsman SY, Lauffenburger DA: Computational modeling of the EGF-receptor system: a paradigm for systems biology. Trends in Cell Biology 2003, 13(1):43–50. doi:10.1016/S0962-8924(02)00009-0
22. Weng G, Bhalla US, Iyengar R: Complexity in biological signaling systems. Science 1999, 284(5411):92–96. doi:10.1126/science.284.5411.92
23. Hill DP, Begley DA, Finger JH, Ringwald M: The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 2004, (32 Database):D568–571. doi:10.1093/nar/gkh069
24. Birney E, Andrews D, Caccamo M, Hubbard TJ: Ensembl 2006. Nucleic Acids Res 2006, (34 Database):D556–561. doi:10.1093/nar/gkj133
25. Rocha EP, Smith JM, Smith NH, Feil EJ: Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol 2006, 239(2):226–235. doi:10.1016/j.jtbi.2005.08.037
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.