Volume 16 Supplement 17
Selected articles from the Fourth IEEE International Conference on Computational Advances in Bio and medical Sciences (ICCABS 2014): Bioinformatics
Signal reachability facilitates characterization of probabilistic signaling networks
 Haitham Gabr^{1} and
 Tamer Kahveci^{1}
https://doi.org/10.1186/1471-2105-16-S17-S6
© Gabr and Kahveci; 2015
Published: 7 December 2015
Abstract
Background
Studying biological networks is of extreme importance in understanding cellular functions. These networks model interactions between molecules in each cell. A large volume of research has been done to uncover different characteristics of biological networks, such as large-scale organization, node centrality and network robustness. Nevertheless, the vast majority of research done in this area assumes that biological networks have deterministic topologies. Biological interactions are, however, probabilistic events that may or may not appear in different cells or even in the same cell at different times.
Results
In this paper, we present novel methods for characterizing probabilistic signaling networks. Our methods do this by computing the probability that a signal propagates successfully from receptor to reporter genes through interactions in the network. We characterize such networks with respect to (i) centrality of individual nodes, (ii) stability of the entire network, and (iii) important functions served by the network. We use these methods to characterize major H. sapiens signaling networks including Wnt, ErbB and MAPK.
Keywords
probabilistic networks signaling reachability
Background
Studying the structure and functions of individual biological molecules such as genes and proteins has led to major discoveries in molecular biology. However, in order to understand how cells function and respond to internal or external factors, it is crucial to extend our understanding beyond individual molecules to how these molecules collaborate through interactions. These interactions among molecules are often modeled as biological networks, in which nodes represent molecules and edges represent the interactions.
Signaling networks constitute one of the key classes of biological networks. These networks model how extracellular signals propagate inside the cell, leading to designated responses. A signal starts at a receptor protein, typically located at the membrane. It propagates through a series of interactions between intermediate proteins and reaches a reporter protein, typically a transcription factor. Analysis of these networks is of great importance, since their defects cause many disorders such as type 2 diabetes, Alzheimer's disease, neurodegeneration, cancer, obesity, congenital malformations and osteoporosis [1–3].
A fundamental strategy in analyzing biological networks is to characterize their topological features computationally. Many studies have modeled a multitude of characteristics of biological networks. Degree distribution [4–6], joint degree distribution [7], node centrality [8–10] and network robustness [11, 12] are just a few examples. We elaborate on the literature later in this section.
One inherent and key feature of biological networks that is often overlooked in the literature is that their topology is uncertain. Signaling networks share the same uncertainty. There are numerous reasons for such uncertainty. One of them arises from an inherent characteristic of the DNA replication process: replication can initiate at many different locations on the chromosome with varying probabilities [13]. Furthermore, the set of initiation locations can vary across cells of even the same type. Recent studies have demonstrated that replication timing as well as other epigenetic factors can alter the expression of genes [14] and thus the probability of cis-acting and trans-acting interactions taking place in the cell. Therefore, even the most widely accepted signaling networks are better studied under an uncertain model.
Uncertainty of each interaction is often modeled as a probability value which shows the confidence in its presence [15]. These probability values can be obtained from interaction databases, like STRING [16] and MINT [17], or through other methods of interaction data quality assessment such as linear regression of various factors including transcriptome and network topology [18, 19]. In the rest of this paper, we call a network a probabilistic network if it contains at least one uncertain interaction. Otherwise, we call it a deterministic network. We represent a probabilistic network as a graph G = (V, E, P), where V denotes the set of nodes (i.e., proteins), E denotes the set of edges (i.e., interactions), and P : E → [0, 1] denotes a function that returns the existence probability of each edge in E.
The vast majority of literature about characterization of biological networks considers them as deterministic networks, and ignores the probabilistic nature of their underlying topologies. The massive volume of research done in this area cannot be entirely covered in a few pages. We refer the interested readers to an extensive review on the topic [20]. In the following, we summarize some of the key recent studies.
Studies on deterministic networks. Jeong et al. [9] studied node centrality in protein interaction networks. They showed that the protein interaction network from S. cerevisiae follows a scale-free topology. They also showed that the chance that removal of a protein will prove lethal is proportional to the number of interactions the protein takes part in.
Yook et al. [4] presented methods for characterization of protein interaction networks. They characterized networks from four different datasets. First, they showed that both degrees and clustering indices of the nodes, along with the average cluster size, all follow a power-law distribution. Second, they studied the relation between network topology and both functional and localization classes of the nodes. Last, they studied the relations among functional and localization classes.
Jeong et al. [5] investigated the large-scale organization of metabolic networks from 43 different organisms. They showed that, in all studied organisms, the probability that a given substrate participates in k reactions (i.e., node degree) follows a power-law distribution. Furthermore, when a randomly selected group of substrates is removed, the average distance among the remaining ones is not affected. This signifies a high level of robustness and low sensitivity to random perturbation.
Ravasz et al. [21] studied modularity in metabolic networks. They used the average clustering coefficient as a measure for modularity. They showed that metabolic networks follow a special hierarchical model where small modular subnetworks come together to form larger subnetworks, which in turn form larger subnetworks and so on. This model explains the scale-free topology of metabolic networks, as well as their scaling clustering coefficients.
Kwon et al. [12] studied robustness in biological networks based on feedback dynamics. They showed that networks are likely to be more robust against perturbation if they have more positive feedback loops and fewer negative feedback loops. On the other hand, they also showed that nodes with large numbers of feedback loops are more essential to the network, and more lethal if mutated.
All these studies present valuable results about characteristics of various types of biological networks. However, they treat these networks as having purely deterministic topologies. Hence, they fail to acknowledge and account for the probabilistic nature of biological events.
Studies on probabilistic networks. Relatively little research has been done for analysis and characterization of probabilistic biological networks. Network reliability of probabilistic networks has long been represented by special versions of the Tutte polynomial [22–24]. Such models represent the probability that an edge in the network fails. This facilitates the analysis of global characteristics of the network such as network connectivity. However, it does not facilitate the analysis of local characteristics, like node centrality.
Todor et al. [7] developed a novel method for characterizing the degree distribution of probabilistic biological networks. They used probability generating functions to model both degree distributions and joint degree distributions. They showed that power law and lognormal models are the best fit for degree distribution in probabilistic protein interaction networks. They also showed that, in such networks, nodes of high degrees are more likely to interact with nodes of low degrees. The method is specific to characterizing network degree distribution, with no results for network stability or individual node centrality.
We earlier developed an efficient method called PReach [25] for computing the reachability probability between sets of nodes in probabilistic signaling networks. We use this method in order to compute the reachability probability while implementing the methods described in this paper.
Contributions. In this paper, we test the hypothesis that the ability of signals to propagate between the proteins of a probabilistic signaling network determines the key characteristics of that network. The rationale behind this hypothesis is that through these signals, the proteins can regulate the expression levels of different genes and thus the cellular functions and responses. To test this hypothesis, we develop novel computational methods to characterize probabilistic signaling networks based on reachability probability, i.e., the probability that a signal successfully travels from its source nodes (i.e., receptors) to its target nodes (i.e., reporters). More specifically, we characterize signaling networks at three levels of granularity. At the lowest granularity level, we focus on individual proteins (i.e., nodes) of the underlying network. At the second level, we consider the entire network topology including all the proteins along with the interactions between them. At the highest level, we study the biological functions served by the network. Preliminary results for the three levels were published in [26]. We summarize each level next.
At the level of individual nodes, we investigate the centrality of each node in a probabilistic network. Node centrality is a metric that determines the relative importance of a node within the graph [9]. It is well defined in the scope of deterministic networks, with a number of variants including degree centrality and betweenness centrality [8]. However, it is not well defined in the scope of probabilistic networks. Here, we introduce a new node centrality measure that deals with the uncertainty of the network topology. This measure models the centrality of a node in terms of its contribution to the reachability probability between other nodes.
At the level of the entire network, we investigate the stability of the network, that is, how external factors that alter its topology or interaction probabilities affect its ability to carry out its functions. We say that a network is stable if small changes to its topology or edge probabilities do not cause massive changes in the probability that signals initiated at source nodes reach target nodes. In other words, stability means that the network can continue to operate normally after such perturbations. We develop a new method for measuring stability of probabilistic signaling networks.
At the level of the biological functions, we explore the set of functions that a given probabilistic signaling network performs. To do this, we use the Gene Ontology (GO) [27] term annotations of the source and target nodes of the given network. We develop two methods to model two orthogonal characteristics. The first one finds the most popular GO terms (i.e., the GO terms that are enriched by the most reachable target nodes). The second one finds which functions can be initiated with the highest probability (i.e., most reachable GO terms). Collectively, these two methods explain the prevalent functions that are carried out by a given signaling network through propagating signals from receptors to reporters.
Results and discussion
In this section, we present experimental results for characterization of probabilistic signaling networks. We used the Homo sapiens signaling networks taken from KEGG [28] in our experiments. Among those, we used the largest ones (i.e., networks with more than 50 edges), which are ErbB, MAPK and Wnt. We obtained the sources and targets of each signaling network based on the hierarchical organization of its proteins [29]. We set the genes at the top of the hierarchy as the source nodes and the ones at the bottom as the target nodes. We extracted the confidence scores for each interaction from STRING [16] and used them as edge probabilities. STRING computes the confidence values by benchmarking groups of associations against the KEGG functional classification scheme, which is manually curated. STRING has confidence values in the [1, 1000] interval, where 1000 indicates 100% confidence. We normalized this number to the [0, 1] interval for each interaction by dividing by 1000.
Node centrality in probabilistic networks
In this section, we present experimental results for measuring probabilistic node centrality in probabilistic signaling networks. Using the method described in the Methods section, we measured the probabilistic centrality value for all proteins in ErbB, MAPK and Wnt. The first question we need to answer at this point is whether probabilistic networks yield different centrality values than deterministic ones. If yes, what is the significance of the difference? To answer these questions, we compared our results with the betweenness centrality of each node in the underlying deterministic topology, where all edges are certain. We used the betweenness centrality for comparison as it is used frequently in the literature [30–32]. Also, it is the closest centrality measure to ours in terms of the biological meaning of centrality. We ranked the proteins according to both centrality values separately. We then measured the disagreement between the two rankings as follows. For each protein x, we counted the number of proteins whose position relative to x in one of the rankings disagrees with the other. In other words, a protein y was counted if it is more central than x in the deterministic centrality ranking and less central than x in the probabilistic one, or vice versa. We normalized the resulting number to the [0, 1] interval by dividing it by the total number of proteins in the network.
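The ranking comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_disagreement` and the dictionary inputs are hypothetical, and treating tied scores as "no disagreement" is our assumption, since the text does not say how ties are handled.

```python
def rank_disagreement(det_scores, prob_scores):
    """Per-protein disagreement between two centrality rankings.

    det_scores / prob_scores map each protein to its deterministic
    (e.g., betweenness) and probabilistic centrality, respectively.
    """
    proteins = list(det_scores)
    disagreement = {}
    for x in proteins:
        count = 0
        for y in proteins:
            if y == x:
                continue
            det_diff = det_scores[y] - det_scores[x]
            prob_diff = prob_scores[y] - prob_scores[x]
            # y is counted when it is more central than x under one measure
            # and less central under the other (ties are ignored: assumption).
            if det_diff * prob_diff < 0:
                count += 1
        # Normalize by the total number of proteins, as in the text.
        disagreement[x] = count / len(proteins)
    return disagreement


# Toy example with three hypothetical proteins.
det = {"A": 3.0, "B": 2.0, "C": 1.0}
prob = {"A": 3.0, "B": 1.0, "C": 2.0}
scores = rank_disagreement(det, prob)
```

In this toy input, A keeps its top position in both rankings, while B and C swap places, so only B and C receive a nonzero disagreement value.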
Assessment of network stability
In this section, we evaluate the stability of probabilistic signaling networks ErbB, MAPK and Wnt using our method (see the Methods section). We measured network stability in terms of its reaction to random perturbation. The more a network maintains its signal reachability levels under perturbation, the more it is considered stable. On the other hand, if reachability levels drop dramatically in the event of perturbation, then it is considered unstable. We measured stability under two possible network perturbation models: alterations applied to the interaction probabilities and modifications applied to the network topology. We do not present results for MAPK in the topology perturbation experiment as measuring reachability probabilities in such a large number of randomly perturbed topologies takes more time than feasible in that experiment.
Characterizing network functions
In this section, we characterize the important functions performed by probabilistic signaling networks based on their enrichment and reachability, as explained in the Methods section. We use ErbB, MAPK and Wnt networks in our experiments.
Details of gene groups marked in Figure 9 and Figure 8.
N1: EGF, EGFR, TGFA, AREG, EREG, HBEGF, BTC
T1: EGF, EGFR, TGFA, AREG, EREG, HBEGF, BTC
N2: SHC2, GRB2, ERBB3, PIK3R5, AKT3, GAB1
T2: SHC2, GRB2, ERBB3, NRG1, NRG2
Conclusion
We developed a comprehensive set of methods for characterizing probabilistic signaling networks on three levels. First, we developed a method for measuring node centrality, based on the node's contribution towards the probability of signal reachability through the network. Second, we developed methods for characterizing the level of stability of the entire network, based on the amount of change in reachability probability when perturbing either the network topology or interaction probabilities. Last, we developed methods for characterizing the functional terms in the network, based on the enrichment and reachability of these terms.
We presented novel results by applying these methods to the ErbB, MAPK and Wnt signaling networks of H. sapiens obtained from KEGG. For node centrality, our results showed that the novel centrality measure we present here is more suitable for probabilistic signaling networks. We also showed that node centrality is dominated by the network topology in Wnt, as opposed to being more dominated by edge probabilities in ErbB. For network stability, our results showed that the original topology and edge probabilities serve each network better than any randomly perturbed version of them. We also showed that Wnt has the highest level of stability against random perturbations, while ErbB has the lowest. Finally, for the functional terms, our enrichment and reachability results both showed that Wnt is much more specific than both ErbB and MAPK in terms of the functions it serves. Results also showed a small overlap between ErbB and Wnt in terms of the highly reachable functions. We also showed that functional terms tend to be consistent in terms of both enrichment and reachability. Last, we showed that functionally related groups of proteins can be identified by clustering their reachability probabilities towards different functional terms, as well as towards other proteins.
This paper makes a significant contribution over the existing literature. First, it extends our understanding of biological networks from simple deterministic topologies to probabilistic ones. Second, we demonstrate that signaling networks can be computationally characterized in terms of the reachability of signals from receptors to reporters.
Methods
In this section, we present the novel methods we developed to arrive at the results explained above. We characterize probabilistic signaling networks based on the probability that signals can travel in the network, particularly from receptors to reporters. First, we present a method for measuring node centrality. Then, we present methods for evaluating the stability of a network with respect to random perturbation. Last, we present methods for characterizing biological functions served by the network based on their functional enrichments and reachability probabilities.
Throughout the rest of this paper, we use the following notation. We denote the set of nodes (i.e., proteins) by V and the set of directed edges (i.e., interactions) by E. We denote an edge probability function by P : E → [0, 1], which returns the existence probability of each edge e ∈ E. Consider a probabilistic signaling network G = (V, E, P), a set of source nodes S ⊆ V, and a set of target nodes T ⊆ V. We denote the reachability probability from s ∈ V to t ∈ V by P_{reach}(G, s, t), which returns the probability of a signal traveling successfully from s to t in G.
Computing reachability probability
We use PReach [25] for measuring the signal reachability probability between receptors and reporters in probabilistic signaling networks. Let U = {1, . . . , n}, where n = |E|. Let Θ be a subset of U. Let S_{1}, . . . , S_{k} be k different subsets of Θ. Let X and Y be two sets of n variables, where X = {x_{1}, . . . , x_{n}} and Y = {y_{1}, . . . , y_{n}}. Let $x_{S_i} = \prod_{j \in S_i} x_j$ and $y_{S_i} = \prod_{j \in S_i} y_j$, where i ∈ {1, . . . , k}. Let x^{∗} and y^{∗} be two free variables. Let a_{1}, . . . , a_{k}, b and c be real numbers. PReach defines an xy-polynomial over Θ as $F = \sum_{i=1}^{k} a_i x_{S_i} y_{\Theta \backslash S_i} + b x^{*} + c y^{*}$.
PReach associates every edge e_{j} ∈ E with a variable x_{j} ∈ X and a variable y_{j} ∈ Y, where j ∈ U. The variable x_{j} designates the case where e_{j} is present, while y_{j} designates the case where e_{j} is absent. Each of the nonfree terms $a_i x_{S_i} y_{\Theta \backslash S_i}$ represents a combination where e_{j} is present ∀j ∈ S_{i} and absent ∀j ∈ Θ \ S_{i}. The coefficient a_{i} is the probability of this specific combination. The free variable x^{∗} designates the case where T is reachable from S, and b is its probability. Conversely, the free variable y^{∗} designates the case where T is unreachable from S, and c is its probability.
Let p_{i} = P(e_{i}) and q_{i} = 1 − p_{i}. PReach proceeds by associating every edge e_{i} ∈ E with a binomial p_{i}x_{i} + q_{i}y_{i}. It then multiplies these binomials one by one into a growing xy-polynomial. After every multiplication, PReach checks the polynomial for nonfree terms that can be collapsed into one of the two free terms. For any of the nonfree terms $a_i x_{S_i} y_{\Theta \backslash S_i}$, if the edge set associated with S_{i} contains a path from S to T, the term is replaced by a_{i}x^{∗}. If the edge set associated with Θ \ S_{i} contains a cut between S and T, the term is replaced by a_{i}y^{∗}. Any later multiplication of a new term p_{i}x_{i} with bx^{∗} results in bp_{i}x^{∗}. Similarly, (p_{i}x_{i})(cy^{∗}) = cp_{i}y^{∗}, (q_{i}y_{i})(bx^{∗}) = bq_{i}x^{∗}, and (q_{i}y_{i})(cy^{∗}) = cq_{i}y^{∗}. Since p_{i} + q_{i} = 1, the coefficients of the free terms are unchanged by later multiplications. Therefore, the xy-polynomial avoids growing at an exponential rate in practice.
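To make the collapsing step concrete, the following Python sketch mimics the idea on a small directed network. It is a simplified illustration and not the PReach implementation [25]: the function names, the dictionary-based term storage, the breadth-first path test and the fixed edge order are all our assumptions, and no attempt is made to reproduce the optimizations of the original method.

```python
from collections import deque


def connects(edges, sources, targets):
    """Breadth-first search: is any target reachable from any source via `edges`?"""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def reach_by_collapsing(edge_probs, sources, targets):
    """Return (b, c): probabilities that T is reachable / unreachable from S.

    edge_probs maps a directed edge (u, v) to its existence probability.
    """
    sources, targets = set(sources), set(targets)
    all_edges = list(edge_probs)
    # Nonfree terms: (present edge set, absent edge set) -> coefficient a_i.
    terms = {(frozenset(), frozenset()): 1.0}
    b = c = 0.0  # coefficients of the free variables x* and y*
    for edge in all_edges:
        p = edge_probs[edge]
        q = 1.0 - p
        expanded = {}
        for (present, absent), a in terms.items():
            # Multiply the term by the binomial p*x + q*y of the new edge.
            for new_present, new_absent, coeff in (
                (present | {edge}, absent, a * p),
                (present, absent | {edge}, a * q),
            ):
                remaining = [e for e in all_edges if e not in new_absent]
                if connects(new_present, sources, targets):
                    b += coeff  # present edges already contain a path: collapse to x*
                elif not connects(remaining, sources, targets):
                    c += coeff  # absent edges form a cut: collapse to y*
                else:
                    expanded[(new_present, new_absent)] = coeff
        terms = expanded
    return b, c


# Toy network: two parallel two-edge paths from s to t, all edges with probability 0.5.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "b"): 0.5, ("b", "t"): 0.5}
b, c = reach_by_collapsing(toy, {"s"}, {"t"})
```

For the toy network, each path exists with probability 0.25, so the reachability probability is 1 − (1 − 0.25)² = 0.4375, which is the value the sketch accumulates in `b`.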
Characterizing node centrality
The smallest building blocks of a probabilistic signaling network are the individual nodes that make up the network. Therefore, as a first step in characterizing these networks, we focus on the roles of individual nodes in how signaling networks function. To do that, we develop a new model to explain the centrality of individual nodes.
Our method mimics the betweenness centrality measure. Traditionally, this measure has been frequently used for deterministic networks. In such studies, it considers a node x to be between nodes y and z if x is on the shortest path from y to z. These studies however have two major flaws. First, a probabilistic network can yield many alternative deterministic network topologies. As a result, different sets of nodes can be between y and z for different deterministic topologies. Thus, it is not certain whether x is in that set. Second, there is no guarantee that a signal traveling from y to z will always choose the shortest path. Thus, limiting betweenness to only the shortest paths is unrealistic.
We develop a new method for measuring node centrality in a probabilistic network based on reachability probability. We consider a node as highly central in a probabilistic network if a signal traveling from a source node to a target node visits that node with a high probability. Based on this, we measure the node centrality as the expected number of source-target pairs whose connectedness relies on the presence of the subject node. We explain this in detail next.
Given a node v ∈ V and a source-target pair (s, t), we call v an essential node for (s, t) if the removal of v from the network disconnects s and t. Given a node v, for each source-target pair (s, t), we want to measure the probability of v being essential for (s, t). To do this, we first measure the probability of a signal propagating successfully from s to t given the existence of v. This value is denoted by P_{reach}(G, s, t). We then measure that probability in the absence of v. To do this, we construct a modified network G′ by removing v and all its incoming and outgoing edges. We then compute the reachability probability P_{reach}(G′, s, t). The difference between the first and the second probability values represents the probability of a signal having to pass through v in order to travel from s to t. Therefore, given these two probability values, we calculate the probability of v being an essential node for (s, t) as C_{v}(G, s, t) = P_{reach}(G, s, t) − P_{reach}(G′, s, t).
For a given node v, given the value of C_{v}(G, s, t), ∀s ∈ S, t ∈ T, we compute the centrality of v as the average number of (s, t) pairs for which v is essential. To do this, we consider the random variable X_{v} that follows a Poisson binomial distribution with parameters C_{v}(G, s, t), ∀s ∈ S, t ∈ T. Thus, the expected number of (s, t) pairs for which v is essential becomes equivalent to the expected value $E[X_v] = \sum_{s \in S, t \in T} C_v(G, s, t)$.
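The centrality computation can be illustrated with the following Python sketch. To keep the example self-contained, reachability is computed by brute-force enumeration of all edge subsets, which is a stand-in we chose for illustration and is only feasible for very small networks; the paper uses PReach [25] instead. All function names are hypothetical.

```python
from itertools import product


def connected(edges, source, target):
    """Depth-first search for a directed path from source to target."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def reach_prob(edge_probs, s, t):
    """Brute-force P_reach(G, s, t): sum the probability of every edge subset
    that contains a path from s to t (exponential; stand-in for PReach)."""
    edges = list(edge_probs)
    total = 0.0
    for mask in product([True, False], repeat=len(edges)):
        weight = 1.0
        present = []
        for edge, keep in zip(edges, mask):
            weight *= edge_probs[edge] if keep else 1.0 - edge_probs[edge]
            if keep:
                present.append(edge)
        if connected(present, s, t):
            total += weight
    return total


def node_centrality(edge_probs, v, sources, targets):
    """E[X_v]: sum over (s, t) of P_reach(G, s, t) - P_reach(G', s, t),
    where G' is G without v and its incident edges."""
    without_v = {e: p for e, p in edge_probs.items() if v not in e}
    return sum(
        reach_prob(edge_probs, s, t) - reach_prob(without_v, s, t)
        for s in sources
        for t in targets
    )


# Toy network: a path s -> a -> t plus a direct edge s -> t.
toy = {("s", "a"): 0.9, ("a", "t"): 0.8, ("s", "t"): 0.5}
centrality_a = node_centrality(toy, "a", ["s"], ["t"])
```

In this toy network, P_{reach}(G, s, t) = 1 − (1 − 0.72)(1 − 0.5) = 0.86 and removing a leaves only the direct edge with probability 0.5, so the centrality of a is 0.36.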
Computing the centrality of a node involves computing reachability two times: before and after removing the node. Therefore, the time complexity is the same as that of PReach, O(2^{|E|}). However, this is a theoretical upper bound, as PReach avoids the exponential growth in practice [25].
Characterizing network stability
In the previous section, we characterized individual nodes in probabilistic signaling networks. Here, we expand our model to characterize the entire network. More specifically, we develop a method for evaluating stability of probabilistic signaling networks. Briefly, we say that a network is stable if the probability that a signal travels successfully from source to target nodes in that network does not change greatly when a small random perturbation is applied to that network. We consider two types of perturbations: (i) alteration of edge probabilities and (ii) modification of network topology. We describe a parametric model for each of them later in this section.
Consider the given network G = (V, E, P) and the sets of source and target nodes S and T. Let us denote the network obtained after perturbing G by G^{δ}. Given a source node s ∈ S and a target node t ∈ T, the difference P_{reach}(G^{δ}, s, t) − P_{reach}(G, s, t) indicates the change in the reachability probability from s to t after the network is perturbed. We compute the stability of G with respect to G^{δ} in terms of the average of this difference over all possible pairs of s ∈ S and t ∈ T (i.e., $\frac{1}{|S| \times |T|} \sum_{s \in S, t \in T} \left( P_{reach}(G^{\delta}, s, t) - P_{reach}(G, s, t) \right)$). A large magnitude for this value indicates that G is unstable. The sign of this value shows the direction of instability. A negative sign indicates a drop in reachability and thus the cell becoming unresponsive to external signals. A positive sign indicates a rise in reachability and thus the cell becoming oversensitive to such signals.
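The averaged difference above translates directly into code. In the sketch below, `reach` is a hypothetical callable standing in for a reachability routine such as PReach; the toy `direct_edge_reach` function only looks up the probability of a direct edge and is used purely to make the example runnable.

```python
def stability(reach, original, perturbed, sources, targets):
    """Average of P_reach(G^delta, s, t) - P_reach(G, s, t) over all (s, t) pairs.

    `reach(network, s, t)` is any function returning a reachability probability.
    """
    total = 0.0
    for s in sources:
        for t in targets:
            total += reach(perturbed, s, t) - reach(original, s, t)
    return total / (len(sources) * len(targets))


def direct_edge_reach(network, s, t):
    """Toy stand-in (assumption): reachability through a direct edge only."""
    return network.get((s, t), 0.0)


original = {("s", "t"): 0.8}
perturbed = {("s", "t"): 0.6}
change = stability(direct_edge_reach, original, perturbed, ["s"], ["t"])
```

Here the value is about −0.2, i.e., a drop in reachability after the perturbation.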
Next, we describe in detail how we model perturbation of a network G given a perturbation parameter δ to obtain a perturbed network G^{ δ } .

Perturbation of edge probabilities. In this case, the parameter δ denotes the maximum change in edge probabilities. We define a perturbed edge probability function P^{δ} : E → [0, 1] that, for each edge e ∈ E, returns a value drawn uniformly at random from the range [P(e) − δ, P(e) + δ] ∩ [0, 1]. We construct G^{δ} = (V, E, P^{δ}).

Perturbation of network topology. In this case, we inflict topology perturbation by degree-preserving edge shuffling for a fraction of the edges. Each shuffling operation randomly picks a pair of edges (u_{1}, v_{1}) and (u_{2}, v_{2}). It then replaces these edges with (u_{1}, v_{2}) and (u_{2}, v_{1}) and randomly assigns each of the old edge probabilities to the new ones. The parameter δ denotes the fraction of edges to be shuffled. We construct G^{δ} = (V, E^{δ}, P), where E^{δ} is obtained from E by randomly shuffling a fraction δ of the edges.
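Both perturbation models can be sketched as follows. The function names are hypothetical, and several details are our assumptions rather than part of the description above: each swap is counted as shuffling two edges, and a swap that would create a self-loop or duplicate an existing edge is skipped, so fewer than a fraction δ of the edges may end up rewired.

```python
import random


def perturb_probabilities(edge_probs, delta, rng):
    """Draw each new probability uniformly from [P(e) - delta, P(e) + delta] ∩ [0, 1]."""
    perturbed = {}
    for edge, p in edge_probs.items():
        low = max(0.0, p - delta)
        high = min(1.0, p + delta)
        perturbed[edge] = rng.uniform(low, high)
    return perturbed


def shuffle_topology(edge_probs, delta, rng):
    """Degree-preserving shuffling of roughly a fraction delta of the edges."""
    edges = dict(edge_probs)
    swaps = int(delta * len(edges) / 2)  # one swap rewires two edges (assumption)
    for _ in range(swaps):
        (u1, v1), (u2, v2) = rng.sample(list(edges), 2)
        new1, new2 = (u1, v2), (u2, v1)
        # Skip swaps that create a self-loop or a duplicate edge (assumption).
        if u1 == v2 or u2 == v1 or new1 in edges or new2 in edges:
            continue
        probs = [edges.pop((u1, v1)), edges.pop((u2, v2))]
        rng.shuffle(probs)  # randomly assign the old probabilities to the new edges
        edges[new1], edges[new2] = probs
    return edges


rng = random.Random(0)  # fixed seed only to make the example repeatable
net = {("a", "b"): 0.9, ("c", "d"): 0.5, ("e", "f"): 0.05, ("g", "h"): 1.0}
prob_perturbed = perturb_probabilities(net, 0.1, rng)
topo_perturbed = shuffle_topology(net, 1.0, rng)
```

Because each swap exchanges only the endpoints of two edges, every node keeps its in-degree and out-degree, and the multiset of edge probabilities is unchanged.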
Similar to computing node centrality, characterizing network stability involves computing reachability two times: before and after introducing the perturbation. The time complexity is the same as that of PReach, O(2^{|E|}). Again, this is a theoretical upper bound, as PReach avoids the exponential growth in practice [25].
Characterizing network functions
Each signaling network is responsible for carrying out various functions. In this section, we develop a method to mathematically characterize the biological functions that can be realized by a given probabilistic signaling network. We use the GO terms associated with the target genes to denote the set of possible functions of the underlying signaling network. The GO database organizes terms in a hierarchy of "is-a" and "part-of" relationships, such that the highest level is the most generic. We ignore the terms in the top five levels of the hierarchy in our analysis. Note that these terms are commonly ignored as they are generic and commonly assigned to a large number of genes [15].
A target gene's ability to perform the functions it is annotated with is affected by the extracellular signals that reach that gene. Following from this observation, we conjecture that a network is more likely to regulate a function if the set of target genes annotated with that function are reachable with higher probability than the other genes. More specifically, we model two different characteristics of the functional annotations of a probabilistic signaling network in terms of reachability of the nodes of that network. Namely, these are the enrichment and the reachability of the annotations. We elaborate on them next.
Enrichment of functional annotations. Consider the given network G = (V, E, P) and the sets of source and target nodes S and T. For each target node t ∈ T, we compute the reachability probability from at least one of the source nodes in S to t. The resulting reachability probabilities provide a ranking of the target nodes in T. We then consider the set A of all GO terms in G. For each term a ∈ A, given a parameter d ∈ {1, 2, . . . , |T|}, we consider the set T_{d} ⊆ T as the set containing only the top d target nodes in T with the highest reachability probability. We calculate the enrichment value for a in T_{d} as follows. Let N be the number of all targets in T annotated with a, and n be the number of targets annotated with a in T_{d}. We compute the enrichment value of a as P(X ≥ n | |T|, d, N), where X is a random variable following a hypergeometric distribution with these parameters. In other words, the enrichment value shows the probability that a random subset of T of size d contains at least n nodes annotated with a. We try all possible values of d ∈ {1, . . . , |T|} and select the best enrichment value. The lower the enrichment value is, the more significant the term a is.
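The enrichment computation can be sketched with the standard library alone. The function names and the list/set inputs are hypothetical; the target list is assumed to be already sorted by decreasing reachability probability, and "best" is taken to mean the smallest tail probability, in line with the statement that lower values are more significant.

```python
from math import comb


def hypergeom_tail(n, population, draws, successes):
    """P(X >= n) when `draws` items are sampled without replacement from a
    population of size `population` containing `successes` annotated items."""
    upper = min(draws, successes)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(n, upper + 1)
    )
    return favorable / comb(population, draws)


def best_enrichment(ranked_targets, annotated):
    """Smallest P(X >= n | |T|, d, N) over all cutoffs d = 1, ..., |T|.

    ranked_targets: targets sorted by decreasing reachability probability.
    annotated: set of targets annotated with the GO term of interest.
    """
    size = len(ranked_targets)
    total_annotated = sum(1 for t in ranked_targets if t in annotated)  # N
    best = 1.0
    in_top_d = 0  # n: annotated targets among the top d
    for d in range(1, size + 1):
        if ranked_targets[d - 1] in annotated:
            in_top_d += 1
        best = min(best, hypergeom_tail(in_top_d, size, d, total_annotated))
    return best


# Toy example: four hypothetical targets, the two most reachable ones share the term.
ranked = ["t1", "t2", "t3", "t4"]
value = best_enrichment(ranked, {"t1", "t2"})
```

For this toy input the best cutoff is d = 2, where both selected targets carry the term; the probability of that happening by chance is 1 / C(4, 2) = 1/6.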
Reachability of functional annotations. Consider the given network G = (V, E, P) and the sets of source and target nodes S and T. Also consider the set A of all GO terms in the network. For each a ∈ A, we construct the two sets S_{a} ⊆ S and T_{a} ⊆ T. Here, S_{a} and T_{a} denote the sets of source and target nodes that are annotated with a, respectively. We measure the reachability probability of a as the reachability probability from any node in S_{a} to any node in T_{a}. We rank the terms in A according to their reachability probabilities. A higher reachability probability for a term means that the paths available for the associated function contain more reliable interactions and/or are more redundant. Therefore, we expect that the functional terms with higher reachability probability play more critical roles than others within that network.
The time complexity of both methods is dominated by that of calculating reachability probability. Therefore, the time complexity is the same as that of PReach, O(2^{|E|}). Again, this is a theoretical upper bound, as PReach avoids the exponential growth in practice [25].
Data access
All data, scripts and results are available at http://bioinformatics.cise.ufl.edu/PReach/characterization.htm
Declarations
Acknowledgements
This work was supported in part by NSF under grant IIS-0845439.
Publication of this article was funded by NSF under grant CCF-1251599.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 17, 2015: Selected articles from the Fourth IEEE International Conference on Computational Advances in Bio and medical Sciences (ICCABS 2014): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S17.
References
1. Laplante M, Sabatini DM: mTOR signaling in growth control and disease. Cell. 2012, 149 (2): 274-293.
2. MacDonald BT, et al: Wnt/β-catenin signaling: components, mechanisms, and diseases. Developmental cell. 2009, 17 (1): 9-26.
3. Mizuno S, et al: Alzpathway: a comprehensive map of signaling pathways of alzheimer's disease. BMC systems biology. 2012, 6 (1): 52.
4. Yook S, et al: Functional and topological characterization of protein interaction networks. PROTEOMICS. 2004, 4 (4): 928-942.
5. Jeong H, et al: The large-scale organization of metabolic networks. Nature. 2000, 407 (6804): 651-654.
6. Wagner A, et al: The small world inside large metabolic networks. Proceedings of the Royal Society of London Series B: Biological Sciences. 2001, 268 (1478): 1803-1810.
7. Todor A, Dobra A, Kahveci T: Uncertain interactions affect degree distribution of biological networks. Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference. 2012, 1-5.
8. Hahn MW, Kern AD: Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Molecular Biology and Evolution. 2005, 22 (4): 803-806.
9. Jeong H, et al: Lethality and centrality in protein networks. Nature. 2001, 411 (6833): 41-42.
10. Tasa T: Centrality in biological networks. 2011.
11. Alon U: Biological networks: the tinkerer as an engineer. Science. 2003, 301 (5641): 1866-1867.
12. Kwon Y, Cho K: Quantitative analysis of robustness and fragility in biological networks based on feedback dynamics. Bioinformatics. 2008, 24 (7): 987-994.
13. Ryba T, et al: Replication timing: a fingerprint for cell identity and pluripotency. PLoS computational biology. 2011, 7 (10): 1002225.
14. Schübeler D, et al: Genome-wide dna replication profile for drosophila melanogaster: a link between transcription and replication timing. Nature genetics. 2002, 32 (3): 438-442.
15. Todor A, et al: Probabilistic biological network alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012, 99 (PrePrints): 1.
16. Szklarczyk D, et al: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic acids research. 2011, 39 (suppl 1): 561-568.
17. Ceol A, et al: MINT, the Molecular INTeraction database: 2009 update. Nucleic acids research. 2009, 983.
18. Bader JS, et al: Gaining confidence in high-throughput protein interaction networks. Nature biotechnology. 2003, 22 (1): 78-85.
19. von Mering C, et al: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417 (6887): 399-403.
20. Barabási A, Oltvai ZN: Network biology: understanding the cell's functional organization. Nature Reviews Genetics. 2004, 5 (2): 101-113.
21. Ravasz E, et al: Hierarchical Organization of Modularity in Metabolic Networks. Science. 2002, 297 (5586): 1551-1555.
22. Oxley J, Welsh D: Chromatic, flow, and reliability polynomials: the complexity of their coefficients. Combinatorics, Probability & Computing. 2002, 11 (4): 403-426.
23. Brown DB: A computerized algorithm for determining the reliability of redundant configurations. Reliability, IEEE Transactions. 1971, 20 (3): 121-124.
24. Sokal AD: The multivariate tutte polynomial (alias potts model) for graphs and matroids. Surveys in combinatorics. 2005, 327: 173-226.
25. Gabr H, Todor A, Zandi H, Dobra A, Kahveci T: Preach: Reachability in probabilistic signaling networks. Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics BCB'13. 2013, ACM, Washington DC, USA, 3-12.
26. Gabr H, Kahveci T: Characterization of probabilistic signaling networks through signal propagation. Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference. 2014, IEEE, 1-2.
27. Ashburner M, et al: Gene ontology: tool for the unification of biology. Nature genetics. 2000, 25 (1): 25-29.
28. Kanehisa M, et al: The KEGG resource for deciphering the genome. Nucleic acids research. 2004, 32 (suppl 1): 277-280.
29. Gulsoy G, et al: HIDEN: Hierarchical decomposition of regulatory networks. BMC bioinformatics. 2012, 13 (1): 250.
30. Yoon J, et al: An algorithm for modularity analysis of directed and weighted biological networks based on edge-betweenness centrality. Bioinformatics. 2006, 22 (24): 3106-3108.
31. Joy MP, et al: High-betweenness proteins in the yeast protein interaction network. BioMed Research International. 2005, 2005 (2): 96-103.
32. Yu H, et al: The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS computational biology. 2007, 3 (4): 59.
33. Bar-Joseph Z, et al: Fast optimal leaf ordering for hierarchical clustering. Bioinformatics. 2001, 17 (suppl 1): 22-29.
34. Bazley LA, Gullick WJ: The epidermal growth factor receptor family. Endocrine-Related Cancer. 2005, 12 (Supplement 1): 17-27.
35. Gorgoulis V, et al: Expression of egf, tgf-alpha and egfr in squamous cell lung carcinomas. Anticancer research. 1992, 12 (4): 1183.
36. Jackson LF, et al: Defective valvulogenesis in hb-egf and tace-null mice is associated with aberrant bmp signaling. The EMBO journal. 2003, 22 (11): 2704-2716.
37. Pathak B, et al: Mouse chromosomal location of three egf receptor ligands: amphiregulin (areg), betacellulin (btc), and heparin-binding egf (hegfl). Genomics. 1995, 28 (1): 116-118.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.