Hierarchical decomposition of dynamically evolving regulatory networks
 Ahmet Ay^{1},
 Dihong Gong^{2} and
 Tamer Kahveci^{2}Email author
https://doi.org/10.1186/s1285901505299
© Ay et al.; licensee BioMed Central. 2015
Received: 27 October 2014
Accepted: 9 March 2015
Published: 15 May 2015
Abstract
Background
Gene regulatory networks describe the interplay between genes and their products. These networks control almost every biological activity in the cell through interactions. The hierarchy of genes in these networks as defined by their interactions gives important insights into how these functions are governed. Accurately determining the hierarchy of genes is however a computationally difficult problem. This problem is further complicated by the fact that an intrinsic characteristic of regulatory networks is that the wiring of interactions can change over time. Determining how the hierarchy in the gene regulatory networks changes with dynamically evolving network topology remains to be an unsolved challenge.
Results
In this study, we develop a new method, named DHIDEN (DynamicHIerarchical DEcomposition of Networks) to find the hierarchy of the genes in dynamically evolving gene regulatory network topologies. Unlike earlier methods, which recompute the hierarchy from scratch when the network topology changes, our method adapts the hierarchy based on the wiring of the interactions only for the nodes which have the potential to move in the hierarchy.
Conclusions
We compare DHIDEN to five currently available hierarchical decomposition methods on synthetic and real gene regulatory networks. Our experiments demonstrate that DHIDEN significantly outperforms existing methods in running time, accuracy, or both. Furthermore, our method is robust against dynamic changes in hierarchy. Our experiments on human gene regulatory networks suggest that our method may be used to reconstruct hierarchy in gene regulatory networks.
Keywords
Background
Regulatory interactions between genes in a cell are often represented as a network, also called gene regulatory network. These networks are often modeled as directed graphs where nodes represent the genes or their products, and directed edges between these nodes represent the regulation of a gene by another one. Gene regulatory networks govern almost every biological activity in the cell [18]. Due to their central role in the development of organisms and human diseases, the analysis of gene regulatory networks holds the key for understanding how biological processes are regulated. Subtle changes, such as an increase or decrease of the abundance of regulatory proteins or aberrations in regulatory interactions between these proteins, can have profound effects in many biological processes including human diseases such as cancer.
Biological networks have been characterized by a variety of graph theoretic measures such as degree distribution and clustering coefficient [9]. Analysis of the gene regulatory networks has shown that these networks share common characteristics such as the scalefree degree distribution and hierarchical ordering of the genes [1020].
Correct assignment of the hierarchy levels to the genes in the regulatory networks is essential for comprehensive analysis of the gene regulatory networks, particularly for understanding the impact of external perturbations and diseases on the transcription and abundance of genes and gene products. While the experimental approaches can be used to identify the genes and their interactions, finding the hierarchy between these genes remains to be a computational task.
Some of the key approaches for finding the hierarchy of genes in regulatory networks computationally include BFS [11], VertexSort [12], HINO [13], HIDEN, and Divide and Conquer HIDEN (DCHIDEN) [14]. The BFS method uses breadthfirst search technique to assign hierarchies to genes in a network [11]. Although this method is applicable to any gene regulatory network, it fails to correctly assign hierarchy levels for networks that contain cycles. VertexSort uses a topological sort algorithm to assign hierarchies to genes in the given network [12]. This method does not assign a single hierarchy level for each gene; it instead assigns a range of possible levels for genes. HINO improves the BFS method by introducing two correction steps, called “downgrade” and “upgrade”. Downgrade step places each node at the lowest level where that vertex and its regulating nodes are assigned to. Upgrade step assigns nodes to the next higher level if they regulate other nodes which are located at the same level and no other node regulates them. Despite this improvement, its accuracy remains very low. HIDEN [14] formulates the problem using integer linear programming. In comparison to BFS, VertexSort and HINO, HIDEN produces more accurate hierarchy assignments. However, this comes at the price of significantly increased computational cost. As a result, HIDEN does not scale to large gene regulatory networks. To remedy this shortcoming, the same study, proposes a divide and conquer strategy, named DCHIDEN. This strategy computes a set of local solutions for randomly selected subnetworks of the given network and then combines them. As a result of this localization, this method scales to large networks, but its accuracy levels are lower in comparison to that of HIDEN. In summary, all of these aforementioned approaches are either grossly inaccurate or take enormous amount of time as the network size grows. As we explain next, such enormous cost leads to another challenge that makes these methods impractical.
An inherent characteristic of gene regulatory networks is that their topologies are dynamic. New interactions can appear or existing interactions can become infeasible over a period of time [21]. For example, many genes are involved in regulatory networks that control differentiation in different cell types during early development. The interactions between these genes vary throughout development and in different cell types. One reaction that exists at one time point may be lost at a later time point, or a reaction that exists in one cell type may be missing in another cell type. These dynamic changes in the regulatory interactions reflect the changes in the needs of the system and are key for correct development of the organism. After the organism develops into its adult form, cell differentiation is minimal and most of the regulatory interactions that are involved in these regulatory pathways are lost. However, the dynamic rewiring of the gene regulatory networks for other cellular functions is still needed to efficiently respond to environmental changes [2224]. Clearly, such topological changes in the regulatory networks can alter the position of a subset of genes in the hierarchical decomposition, yet we do not know which genes will change or how they will change their hierarchies [1520]. Determining such topological changes is key for understanding how gene regulatory networks function, how the development works for different organisms and finding cures for diseases such as cancer.
To the best of our knowledge, all the methods in the literature targeted at computing the hierarchical decomposition of biological networks ignore the dynamic nature of the network topologies. As a result, they cannot handle dynamic changes in the gene regulatory networks. An obvious way to apply these methods on dynamic network topologies is to recompute the hierarchical decomposition from scratch each time the network topology changes, even when the change is miniscule. This is, however, not practical as finding the hierarchical decomposition is a computationally expensive task. New methods that can quickly update the hierarchical decomposition from the existing hierarchy when network topology is updated slightly are needed.
Our contributions In this paper, we develop a new method, namely Dynamic Hierarchical Decomposition of Regulatory Networks (DHIDEN) which finds the hierarchy in gene regulatory networks. Unlike existing methods, DHIDEN can handle networks whose topologies change dynamically. The idea behind DHIDEN is that small changes in the topology of a network alters the hierarchies of only a small number of nodes of that networks. When the network topology changes, instead of recomputing the hierarchy from scratch for the entire network, DHIDEN computes the hierarchy levels of a small part of the underlying network that is most likely to move in the hierarchy. We formulate this problem as an integer linear programming problem. The challenge here is to predict which subnetwork yields the variables in the resulting mixed integer linear programming. DHIDEN tackles this problem by learning a function that describes the probability that a given gene in the network will move in the hierarchy, based on a given set of alterations in the network topology (i.e., insertions and deletion of interactions). Our extensive experimental results on both synthetic and real networks demonstrate that DHIDEN achieves much better performance (accuracy and running time) than five currently available methods. Applications of our method to the human gene regulatory networks could successfully reconstruct the hierarchy in these networks. Our analysis on the human gene regulatory networks evolving due to cell differentiation reveals changes in the hierarchy of the regulatory roles of genes.
The implementation of the methods we developed in this paper and the datasets we used in our experiments are available at http://bioinformatics.cise.ufl.edu/dhiden.
The rest of the paper is organized as follows. In the Methods section, we formally define the problem and describe the DHIDEN algorithm. In the Results section, we present the results of our method. In the Discussion and Conclusion section we summarize our findings.
Methods
In this section, we first present the key terms that are essential to describe our method. We then explain our method in detail.
Terms and definitions
A gene regulatory network describes the regulatory interactions between genes and their products. Mathematically, we model a gene regulatory network using a directed graph denoted with G=(V,E). In this notation, V denotes the set of nodes where each node corresponds to a unique gene. E⊆V×V denotes the set of directed edges where each edge corresponds to an interaction. In the rest of this paper, we will use the term graph to denote directed network unless otherwise stated explicitly. We start by defining the inversion operation on graphs and how we use it to enrich a graph.
Definition 1 (Inversion of a graph).
Given a graph G=(V,E), we say that graph G ^{ i }=(V,E ^{ i }) is the inverse of G if E ^{ i } is obtained by reversing the directions of all the edges in E. Formally, this happens when both of the following two conditions hold: (i) ∀(u,v)∈E, we have (v,u)∈E ^{ i }, and (ii) ∀(u,v)∈E ^{ i }, we have (v,u)∈E.
Definition 2 (Direction enriched graph).
Consider a graph G=(V,E) and its inverse G ^{ i }=(V,E ^{ i }). We say that graph G ^{ e }=(V,E ^{ e }) is direction enriched graph of G if E ^{ e }=E∪E ^{ i }.
We say that a graph G=(V,E) is connected if its direction enriched graph contains at least one path from all the nodes of V to all the other nodes of V. In the rest of this paper, unless otherwise specified, we assume G to be directed and connected. It is worth noting that our method does not rely on this assumption. We only make this assumption to simplify the method description. We defer the discussion on disconnected graphs to the end of this section.
Definition 3 (Hierarchical decomposition of a graph).
Consider a graph G=(V,E), and a positive integer M denoting the highest possible hierarchy level. The hierarchical decomposition of G is a function H:V→{1,…,M}.
Verbally, the hierarchical decomposition assigns a level, which is described by an integer, to the nodes of the graph. In this hierarchy, larger numbers indicate a higher level in the hierarchy, and thus, M is the highest level of hierarchy. The Figure 1(b) shows an example of hierarchical decomposition where M=3.
The hierarchical decomposition of a graph describes the relative role of the genes as the regulators or the regulated ones in that graph. Ideally, if a gene denoted by node u regulates another gene denoted by node v (i.e., there is a directed edge from u to v in the corresponding graph), then we would like to place node u at a higher level in the hierarchy than node v. The following two definitions capture this.
Definition 4 (Conflicting edge).
Consider a graph G=(V,E), and its hierarchical decomposition, H:V→{1,…,M}. Consider an edge (u,v)∈E. We say that the edge (u,v) is conflicting if H(u)≤H(v).
For instance, in Figure 1(b), the edge (4,5) is a conflicting edge because H(4)=H(5). Similarly, the edge (5,3) is also conflicting as H(5)<H(3).
Definition 5 (Penalty of hierarchical decomposition).
For instance, in Figure 1(b), penalty of the given hierarchy is two since there are two conflicting edges. We say that a hierarchical decomposition of a given graph into M hierarchical levels is optimal if it yields the least penalty among all possible hierarchical decompositions.
Definition 6 (Dynamically evolving graphs).
 1.
V _{ i−1}=V _{ i },
 2.
E _{ i−1}−E _{ i }+E _{ i }−E _{ i−1}≤τ.
In Definition 6 above, the first condition enforces that the set of nodes should be preserved (we will relax this constraint later in this section). The second condition requires that only a limited amount of edges can be altered (i.e., inserted or removed) from one graph to obtain the next graph in the sequence.
When the threshold τ in Definition 6 is small, instead of considering the K graphs in the dynamically evolving sequence G as K independent graphs, we view them as single graph whose topology is gradually changing from G _{1} to G _{2}, then to G _{3}, and so on until we reach G _{ K }.
Notice that as the graph topology evolves as provided in the sequence G, the hierarchy level assigned to the nodes of the graph by the optimal hierarchical decomposition can also change. This is because as new edges are inserted or existing ones are removed, the role of the gene (i.e., node) incident to those edges as the regulator of other genes can change. For instance, in Figure 1(a), if we remove edge (2,4) and insert edge (4,3), we obtain the graph in Figure 1(c). Figure 1(d) presents the optimal hierarchical decomposition of the resulting graph. Notice that the hierarchical decomposition in Figure 1(d) differs from that in Figure 1(b). Following from this observation, we define two sets of nodes based on the graphs corresponding to two consecutive graphs of a given dynamically evolving graphs next.
Definition 7 (Dynamic and complementary nodes).
Consider two consecutive graphs G _{ i−1} and G _{ i } in a sequence of dynamically evolving graphs. Let us denote their optimal hierarchical decomposition with H _{ i−1} and H _{ i } respectively.
We are now ready to define the problem considered in this paper formally.
The naive solution to the problem above is to compute the hierarchical decomposition for each graph G _{ i } in G independently using existing algorithms, such as HIDEN [14]. This is however not practical as the size of the graph (number of nodes and edges) and the number of graphs in G increases, the cost of finding the optimal hierarchical decomposition grows rapidly.
Our algorithm: DynamicHIDEN
Integer linear programming formulation of hierarchical decomposition
DHIDEN computes the hierarchical decomposition of each graph in a sequence of dynamically evolving graphs G using Integer Linear Programming (ILP). Unlike, existing methods that also use ILP, it creates variables only for a small set of nodes and edges. As we will explain later in detail, these variables correspond to the nodes that will change their position in the hierarchical decomposition with a high probability. We present the ILP formulation next by focusing on the ith graph in the sequence denoted by graph G _{ i }=(V _{ i },E _{ i }).
M is the maximum number of hierarchy levels
This formulation minimizes the number of conflicting edges, where ε _{ uv } can take either 0 (if (u,v) is not a conflicting edge) or 1 (if (u,v) is a conflicting edge). The first constraint ensures that if t _{ u }≤t _{ v } (which implies (u,v) is a conflicting edge), then it holds if and only if ε _{ uv } takes 1. While t _{ u }>t _{ v } (which implies (u,v) is not a conflicting edge), the condition holds no matter ε _{ uv } takes 0 or 1. But in order to minimize \(\sum _{(u,v) \in {E_{i}^{s}}}\epsilon _{\textit {uv}}\), the ε _{ uv } will take 0. The second constraint requires ε _{ uv } to be either 0 or 1. The third constraint assigns nodes in D _{ i } to one of the levels from 1 to M. The last constraint says that level assignment for nodes in C _{ i } will be taken from H _{ i−1}.
The problem above has \(D_{i}+{E_{i}^{s}}\) variables. Since the topologies of G _{ i−1} and G _{ i } are highly similar (see Definition 6), it is expected that their corresponding assignments H _{ i−1} and H _{ i } also share high similarity. Thus, we expect that the cardinality of the dynamic set D _{ i } to be very small as compared to that of V _{ i }. In other words, very small number of nodes are likely to change their hierarchy assignments as graphs evolve gradually. As a result, the number of variables in our ILP formulation is dramatically smaller than the standard ILP solution, such as HIDEN.
Dealing with node insertion and deletion
So far we have formulated our DHIDEN algorithm for solving a sequence of dynamic graphs G=(G _{1},G _{2},…,G _{ K }) whose nodes are preserved (i.e., V _{ i }=V _{ j },∀G _{ i },G _{ j }∈G). According to Definition 6 however, the set of nodes may also slightly change as graph evolves. That is, new nodes can be inserted or existing nodes can be removed in a graph in G to obtain the next graph in the sequence. We solve this problem as follows. Consider two consecutive graphs G _{ i−1} and G _{ i } in G such that V _{ i−1}≠V _{ i }. To compute the hierarchy of G _{ i }, we do not need to define any variable for the nodes that are removed from V _{ i−1} (i.e. the nodes in V _{ i−1}−V _{ i }). Since we do not know the initial assignment of newly introduced nodes (i.e., nodes in V _{ i }−V _{ i−1}), we include all of these new nodes in the dynamic node set while creating the ILP constraints.
Dealing with disconnected graphs
Another assumption we have previously made is that the graph is connected. For disjointed graphs, we apply DHIDEN on each of the individual components to obtain hierarchical decompositions. The rationale behind this strategy is the following: as there is no edge between any pair of disjointed components, the decomposition is optimal if and only if the decompositions of these components are optimal.
So, what is the challenge?
The ILP formulation we provide above is brief and varies slightly from the traditional ILP solutions for the same problem. So, the obvious question is: What is the challenge here?
The crux of our algorithm lies in creating the dynamic set of nodes D _{ i }, which will be detailed in this section. Recall that this set can only be precisely known after the decomposition H _{ i } is actually computed. This information is however not known at this stage. It is actually what is computed at the end of this stage. To deal with this cyclic dependency, we predict the set D _{ i } prior to computing H _{ i }. Let us denote our predicted set with \({D^{p}_{i}}\). The performance and the accuracy of our algorithm depends on how well \({D^{p}_{i}}\) represents D _{ i }. Thus, the central challenge lies in predicting the dynamic set precisely. In the following, we describe how we address this challenge.
OUR PREDICTION STRATEGY IN A NUTSHELL. In order to predict whether a node will change its hierarchy level as the graph evolves from G _{ i−1} to G _{ i }, we learn a probabilistic model that predicts whether a given node will change its hierarchy level based on its neighboring nodes. We do this by learning from a large collection of graphs with random topological perturbations. The rationale behind this local modelling strategy is that the penalty arising from the hierarchy level of a node is determined by its neighboring nodes. This is because the penalty associated with this node involves conflicting edges between itself and its neighbors. We explain our method in detail next.
The first term in this equation is the normalized average contribution from the neighboring nodes. The second term is the prior of the given node v. The parameter α is a real number in the [0,1] interval. It controls the relative contributions of these two terms.
Recall that, in order to evaluate Equation (1), we need to compute the prior function f _{0} and the function P _{ ψ } that captures the conditional probability of the relative location of a node in the hierarchy conditioned on the hierarchy of its neighboring nodes. In the following, we explain how we compute these two functions in detail.
Computation of the prior function
The computation of f _{0} involves two parts. The first part trains a model using a large number of random graphs. The second part predicts the value of f _{0} for a given graph.
Algorithm 1 presents the first part. Briefly, for training, the algorithm takes two consecutive graphs G _{1} and G _{2} in the dynamically evolving sequence along with their hierarchical decompositions H _{1} and H _{2} as input. It then iterates over all the edges (lines 527) and identifies the ones that are not common to the two graphs (lines 613). These are the edges that have dynamically changed (i.e., inserted or removed) in the input graphs. For all such edges (u, v), it then counts the number of times we observe that these nodes have moved up, moved down, or stayed at the same level (lines 1415). Next, it records the same statistics for the neighboring nodes of u and v (lines 1625). Finally, it normalizes these counts with the total number of nodes in each class to find the fraction of nodes that move up, move down, or stay at the same level for each class (lines 2835).
Algorithm 2 presents the second part. This algorithm takes the two consecutive graphs G _{1} and G _{2} along with the hierarchical decompositions H _{1} of G _{1}, and the model M learned in Algorithm 1. It starts by initializing the prior function, so that all the nodes stay at the same level with 100% probability (lines 57). This defines the base case, where there is no change in the graph topology. It then iterates over all the edges (lines 931) and identifies the ones that are not common to the two graphs (lines 1117). For each such edge (u, v), it then updates the prior function for nodes u, v and all of the neighbors of u and v using the model M (lines 1829).
Algorithms for MOVEMENTOFNODE and POSITIONOFNODE functions
Computation of the function P _{ ψ } . The function P _{ ψ } computes the conditional probability of the relative locations (i.e, are they at the same level or is one above another) of the neighboring nodes in the hierarchy conditioned on each other. Algorithm 3 shows a pseudocode that describes how we learn this function.
The algorithm takes a graph G _{1}, its hierarchical decomposition H _{1}, and a potentially large integer N denoting the number of random hierarchy perturbations (i.e., number of times, the hierarchy of nodes will be altered) to be used for training as input. It iteratively alters the position of a node N times (line 8). At each iteration, it randomly picks a node and moves it to a different level in the hierarchy (lines 910). After this alteration, it recomputes the hierarchy of the rest of the nodes by fixing the location of this node at the newly selected location (line 11). We denote this new hierarchy with the function H _{2}. It then records the movement of the randomly selected node as well as those of all of its neighbors by comparing H _{1} and H _{2} (lines 1224). It records the number of times each of the relative movement class is observed over all these neighbors (lines 2526). Finally, it returns the distribution of the fraction of times each class is observed as the function P _{ ψ }.
How do we choose the dynamic node set?
So far, we have described how we compute the probability distribution for each node in G _{ i−1} to move up, down, or remain at the same level in the hierarchy based on the topological differences between G _{ i−1} and G _{ i }. Recall that the aim of DHIDEN is to recompute the hierarchy levels for only a small set of nodes (called dynamic node set) that can move in the hierarchy rather than recompute it for the entire graph. The final question we need to answer to complete the description of our method is how we construct the dynamic node set. To find the node in this set, we first sort all the nodes in G _{ i } in ascending order of their probability to stay unchanged in the hierarchy. More specifically, we sort all v∈V _{ i } in increasing order of \(x^{*}_{v,same}\). The reason behind this ranking strategy is that nodes with higher ranks have higher probability to change their hierarchy levels, and are more preferable to be a part of the dynamic node set.
Results
In this section, we evaluate the performance of the DHIDEN method extensively on both synthetic and real datasets. We compare the DHIDEN method to five currently available methods that can only deal with a static network topology: BFS, VertexSort, HINO, HIDEN, and DCHIDEN. To the best of our knowledge, our method is the first one to address the hierarchical decomposition problem for dynamically evolving networks. To use them for a sequence of dynamically evolving networks G=(G _{1},G _{2},…,G _{ K }), we run each of these competing methods independently on each G _{ i } in G. We compute the performance of these methods in terms of the resulting penalty and the running time.
Datasets
We use both synthetically generated and real gene regulatory network datasets in our experiments.
SYNTHETIC DATA SETS We randomly generate scalefree gene regulatory networks following the BarabasiAlbert model [10] for varying number of nodes and densities (i.e., number of edges per node). Using this model, we create the first network, G _{1}, in the sequence of dynamically evolving networks for each sequence, G=(G _{1},G _{2},…,G _{ K }). Let us denote each network G _{ i } in this sequence with G _{ i }=(V _{ i },E _{ i }). In order to construct the subsequent networks G _{ i } (1<i≤K) in each sequence, we mutated the topology of G _{ i−1} using the degree preserving edge shuffling method [25]. This technique is frequently used in the literature to alter network topology while ensuring that all nodes maintain their degrees. Briefly, each mutation step of this technique randomly picks two edges, such that the edges do not share a node. It then swaps the end points of those edges. For instance, assume that the two randomly selected edges are (u _{1}, v _{1}) and (u _{2}, v _{2}). The mutation step replaces these two edges with (u _{1}, v _{2}) and (u _{2}, v _{1}). Given a mutation rate r≥0, we perform \(\left \lceil \frac {r \times E_{i1}}{2}\right \rceil \) mutation steps on G _{ i−1} to generate G _{ i }. We use r=0.1 as the mutation rate in our experiments (i.e. at most 10% of the edges are swapped) unless otherwise specified.

We report the number of conflicting edges (see Definitions 4 and 5) as the penalty of a given hierarchical decomposition. Smaller values of this measure indicates better results.

Recall that our method aims to accelerate the hierarchical decomposition in dynamically evolving networks by predicting the nodes that are likely to move in the hierarchy. Thus, after the first network in the sequence, errors in prediction may increase the penalty of the resulting decomposition. We report the accuracy of our method in terms of how it compares to the exhaustive method, HIDEN, when HIDEN uses all the nodes and edges as variables for all networks in the sequence. Let us denote the penalties of the methods HIDEN and DHIDEN with penalty(HIDEN) and penalty(DHIDEN) respectively. We compute the accuracy as$$\frac{1 + \textrm{penalty(HIDEN)}}{1 + \textrm{penalty(DHIDEN)}}. $$
Accuracy takes a value in the (0,1] interval. Larger values indicate better results. It is worth mentioning that the mixed integer linear programming solution of the hierarchical decomposition problem can have time complexity exponential in the number of variables used. Therefore, the true optimal result can only be computed for very small networks which are not considered in this paper.
Implementation and sytem details We implemented the DHIDEN, HIDEN, DCHIDEN, BFS, VertexSort and HINO algorithms using MATLAB. We conducted all the experiments on a hexa core 4.5GHz CPU (Intel 3960x) 64 GBmemory computer, running the Linux operating system.
Evaluation of large scale networks
We first compare our method against the stateoftheart methods in the literature, namely BFS [11], VertexSort [12], HINO [13] and DCHIDEN [14] for large scale networks. We omit the HIDEN method in this experiment as it does not scale to networks with over 100 nodes.
DHIDEN yields significantly less penalty than all the competing methods for all network sizes (Figure 6(a)). Thus, it is the most accurate among all competing methods. Furthermore, the gap between the accuracy of DHIDEN and the rest of the methods grow consistently as network size increases. These results are highly encouraging especially given the fact that DHIDEN considers updating the hierarchy levels of only a small subset of nodes while all the other methods consider updating all the nodes.
To determine the computational cost of these methods next we compare the running times of the two most accurate methods, DHIDEN and DCHIDEN, as the remaining methods have extremely low accuracies. Figure 6(b) shows that DHIDEN is orders of magnitude faster than DCHIDEN. As the network size grows, DCHIDEN’s running time increases exponentially while DHIDEN’s running time increases linearly. As the network size grows to 1000 nodes, the penalty of the hierarchical decomposition of our method is 15% less than DCHIDEN and it is also about 100 times faster. These results suggest that as the network size increases, our method becomes even more advantageous.
Evaluation of DHIDEN under various network characteristics
So far in our experiments, we have demonstrated the superiority of DHIDEN against other scalable methods. Here, we focus further on our method and compare it against HIDEN. The HIDEN method is exhaustive as it minimizes the penalty while allowing all node and edges to be variables. On the other hand, DHIDEN creates variables for only a small fraction of the nodes and edges. Therefore, HIDEN yields the smallest possible penalty among all the hierarchical assignments (using the ILP method). Thus, the purpose of the next set of experiments is to determine how close our solutions are to those of HIDEN. We use synthetically generated scalefree networks of different characteristics in these experiments. Due to explosive increase in the running time of the HIDEN method, we limit our experiments to the largest possible network sizes and densities we could run HIDEN on our system.
Parameter settings used in our experiments for evaluating the impact of various network characteristics comparison
Evaluated  Network  Number of  Number of allowed 

characteristics  density  nodes  hierarchy levels 
Network density  [1,2,3]  50  5 
Number of nodes  2  [30,50,70]  5 
Hierarchy levels allowed  2  50  [3,5,10] 
Impact of network density
Impact of number of nodes
Impact of the number of allowed hierarchy levels
Evaluation of the impact of the number of dynamic steps
So far, we have analyzed the performance of our method for only one dynamic step. In other words, in the sequence of dynamically evolving networks G=(G _{1},…,G _{ K }), K was set to 2. In this experiment, we evaluate how the performance of our method is affected by the length of this sequence (i.e. K). The purpose is to explore how the error propagates as the deviation between the topology of the subsequent networks from that of the initial network grows with increasing value of K.
Evaluation on large networks
In this experiment, we generate synthetic network with 500 nodes and density equal to 2 to use as the initial network G _{1} in the sequence. We then create 25 subsequent networks G _{1},G _{2},…,G _{25}. To construct each G _{ i } (1<i≤25), we mutate G _{ i−1} with 0.1 mutation rate. We repeat this process 1000 times to construct 1000 such independent sequences of dynamically evolving networks. Because these networks are too large for the HIDEN method, we compare our method to the most accurate competing method (DCHIDEN) which can solve networks of this size.
Evaluation on small networks
Next, we use networks with a small number of nodes to compare the performance of our method against the exhaustive HIDEN method. More specifically, we construct networks with 50 nodes and density equal to 2. Similar to the previous experiment, we generate sequences of dynamically evolving networks with 25 networks in each sequence, and use 0.1 mutation rate while altering the topology of consecutive networks. We repeat this process 1000 times and report the average results.
We observe that as the number of mutation steps increase, the advantage of HIDEN over our method enlarges (Figure 10 (b)). This reveals that HIDEN is intrinsically better than our method in terms of accuracy only (note that again the computational cost of HIDEN is exponentially higher than our method as it exhausitively considers all nodes in all networks. Thus, for networks with over 100 nodes, it becomes intractable). The reason behind this trend is also similar to the previous experiment but this time the initial assignment of base networks were computed by HIDEN. Interestingly the curve converges quickly around step 10, where the penalty of our method is only about 25% higher than HIDEN. This is very promising, because even after mutating the initial network 24 times with 10% mutation at each step, the error incurred by our method does not grow significantly. This suggests that our method does not diverge from the optimal solution greatly even for a large number of dynamic steps.
Results on real data
To assess the applicability of our method to real biological regulatory networks, in this section we analyze four dynamically evolving cell lineages each containing three cell types. In the first three cell lineages embryonic stem cells are developed into hematopoietic stem cells, and the hematopoietic stem cells are developed into erythroid, Tlymphocyte and Blymphocyte respectively. In the fourth cell lineage, embryonic stem cells develop into skeletal myoblast and eventually to skeletal muscle.
Penalty comparison on real data
In the above sections, we have demonstrated the superiority of DHIDEN against the state of the art methods BFS, VertexSort, HINO and DCHIDEN on synthetic datasets. To show the applicability of our method to real gene regulatory networks, here we compare the accuracy of the DHIDEN method to these four methods on a real dataset (described above). In this comparison, we focus on the gene regulatory networks of the seven cell types in our dataset: hESC, CD34+, HSMM, K562, Th1, CD20+ and SKMC. To apply our method; we first find the initial hierarchical decomposition of the hESC gene regulatory network (first network in the chain) using DCHIDEN method and then determine the hierarchy assignments of the genes in the rest of the cells using DHIDEN. In the final step, we compute the penalties for each celltype based on the resulting hierarchy assignments. We apply the other methods as follows; we first find the hierarchical decomposition of all the networks in different cell types using the specific method and then compute the penalties for each network based on the resulting assignments.
Biological derivations on real data
Distribution of the number of genes to different levels for each cell line
Level  hESC  CD34+  HSMM  K562  Th1  CD20+  SKMC 

1  106  117  122  156  152  157  170 
2  95  94  91  90  95  92  90 
3  94  90  91  73  74  83  80 
4  92  90  89  71  81  73  75 
5  146  135  130  103  116  110  114 
Total  533  526  523  493  518  515  529 
Number of genes that move up, move down, stay same, or fluctuate in hierarchy rankings in different cell lineages during the cell development
Cell type  Move up  Move down  Stay same  Fluctuate 

K562  52  105  317  18 
Th1  64  104  332  8 
CN20+  53  118  328  8 
SKMC  44  119  337  16 
For example the MYOG gene moves up in the hierarchy for skeletal muscle cell lineage. Myogenin (MYOG) is the second musclespecific gene to be expressed in embryonic stem cells in the process of muscle cell development [26]. In a study in myoblasts Brunetti and Goldfine found MYOG to be important in muscle cell development [27]. Mice with a mutation in MYOG could develop myoblasts but the myoblasts could not further form functional myofibers; i.e. MYOG is important for later stages of muscle cell differentiation [28]. In addition, Hasty et al. 1993 found that when MYOG is knocked out in mice embryos, the embryos develop but die after birth [29]. These results are consistent with this gene moving up in the genetic hierarchy, as it becomes important later in the muscle cell development. One gene that moves down in the muscle cell lineage is MEF2C because of its importance in early myoblast development. In embryonic stem cells, this gene is used as a marker of cardiac cells and its expression is found to be increased with greater expression of miR499 [30]. In myoblasts, this gene is an important indicator of early muscle cell differentiation activity because of its importance in regulating skeletal muscle differentiation [31,32]. Taken together, genes that move up or down in the hierarchy seem to have a change in their function and relative importance. The literature supports this conclusion, although more research should be performed to assess the exact functions of these genes in these specific cell types at different stages in cell development.
STAT5 is another gene that moves up the hierarchy in the erythroid cell lineage. When active STAT5 is expressed in embryonic stem cells, the cells are more likely to undergo formation of hematopoietic stem cells (CD34+) [33]. STAT5 is suggested to be important for cell proliferation in hematopoietic stem cells. Overexpression of STAT5 in hematopoietic stem cells resulted in greater cell growth of these cells through the PI3kinase/AKT pathway [34]. Furthermore, if STAT5 is over expressed in hematopoietic stem cells (CD34+) there is increased renewal of these cells and the hematopoietic stem cells are much more likely to acquire erythroid cell fate [35]. Other researchers have found that in erythroid cells STAT5 is in its active, phosphorylated state and when STAT5 expression is knocked out, these cells exhibit reduced growth in vitro [36]. One gene that moves down in the erythroid cell lineage is GFI1. In embryonic stem cells, GFI1 and GFI1B are downstream targets of RUNX signaling and when RUNX is knocked out, GFI1 along with GFI1B promote hematopoiesis as in the studies with RUNX/, the cells display increased expression of the hematopoietic cell marker CD41 as well as a rounded shape [37]. This study suggests that in normal embryonic stem cell development, RUNX expression causes GFI1 expression to be reduced. Since GFI1 may prevent proliferation of hematopoietic stem cells [38], and GFI1 is a downstream target of C/EBP α, which prevents cell proliferation when GFI1 levels are low not all cells will become hematopoietic [39]. Thus the roles for the STAT5 and GFI1 genes are changing through development and it is likely that the hierarchy levels of these genes are consequently changing in the gene regulatory network. Our analysis suggests that the STAT5 gene moves up in hierarchy, and GFI1 moves down in the hierarchy. The literature cited above is not clear about the relative importance of these genes in different cell types. So, our method’s predictions suggest potentially new experiments to elucidate the changing role of these genes in different cell types. Clearly, a thorough experimental study is needed to find the relative position of these genes in the regulatory hierarchy in different cell types.
Discussion and conclusion
Biological systems are tightly controlled, and more importantly, they are affected by dynamic changes in the gene regulatory networks. Because of that, understanding the dynamic changes in gene regulatory networks is critical for a true comprehension of biological systems. One particularly important organizational feature of the gene regulatory networks is the hierarchy of the interactions. Hierarchy in gene regulatory networks describes the flow of control in biological systems. Major experimental and computational efforts are performed to establish the gene regulatory networks; however finding hierarchy in these networks is very challenging. There have been studies to discover the hierarchy in gene regulatory networks, however these studies mainly focused on static networks.
In this study, we presented a novel method named DHIDEN for discovering hierarchy in dynamic gene regulatory networks. To the best of our knowledge, this is the first method which can incrementally update the hierarchy in a sequence of dynamically evolving networks. DHIDEN formulates the hierarchy level assignment problem as a local mixed integer linear programming problem. We compared the DHIDEN method to five currently available methods, namely BFS, VertexSort, HINO, HIDEN and DCHIDEN, on synthetic and real gene regulatory networks. Our analysis on these networks demonstrated that the DHIDEN method outperforms the other five methods in terms of minimizing the conflicting edges in hierarchy. In addition to the superior accuracy level of DHIDEN methods, the running time for this method was also faster than the DCHIDEN method, which is the next best method in terms of accuracy. Application of the DHIDEN method to the human gene regulatory networks shows that human genes’ hierarchy levels change dynamically. These changes in the hierarchy levels are not random and reflect functional changes in the network. This method could be applied to other biological datasets such as in cancer biology or neuronal development.
Declarations
Acknowledgements
We thank the members of the Kahveci Lab for thoughtful discussions. This project was supported by NSF Career Award (IIS0845439) to TK and Carter Wallace Fellowship to AA.
Authors’ Affiliations
References
 She M, Ye X, Yan Y, Howit C, Belgard M, Ma W. Gene networks in the synthesis and deposition of protein polymers during grain development of wheat. Funct Integr Genomics. 2011; 11(1):23–35. doi:10.1007/s101420100196x.View ArticlePubMedGoogle Scholar
 Watson E, Walhout AJM. Caenorhabditis elegans metabolic gene regulatory networks govern the cellular economy. Trends Endocrinol Metab. 2014. doi:10.1016/j.tem.2014.03.004.Google Scholar
 Peter IS, Davidson EH. Evolution of gene regulatory networks controlling body plan development. Cell. 2011; 144(6):970–85. doi:10.1016/j.cell.2011.02.017.View ArticlePubMedPubMed CentralGoogle Scholar
 Ó’Maoiléidigh DS, Graciet E, Wellmer F. Gene networks controlling Arabidopsis thaliana flower development. New Phytol. 2014; 201(1):16–30. doi:10.1111/nph.12444.View ArticlePubMedGoogle Scholar
 Buckingham M, Rigby PWJ. Gene regulatory networks and transcriptional mechanisms that control myogenesis. Dev Cell. 2014; 28(3):225–38. doi:10.1016/j.devcel.2013.12.020.View ArticlePubMedGoogle Scholar
 Lander AD. How cells know where they are. Science. 2013; 339(6122):923–7. doi:10.1126/science.1224186.View ArticlePubMedPubMed CentralGoogle Scholar
 CsikászNagy A, Palmisano A, Zámborszky J. Molecular network dynamics of cell cycle control: transitions to start and finish. Methods Mol Biol. 2011; 761:277–91. doi:10.1007/9781617791826_19.View ArticlePubMedGoogle Scholar
 Belz GT, Nutt SL. Transcriptional programming of the dendritic cell network. Nat Rev Immunol. 2012; 12(2):101–13. doi:10.1038/nri3149.View ArticlePubMedGoogle Scholar
 Barabási AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004; 5:101–13. doi:10.1038/nrg1272.View ArticlePubMedGoogle Scholar
 Barabási A AR. Emergence of Scaling in Random Networks. Science. 1999; 286(5439):509–12. doi:10.1126/science.286.5439.509.View ArticlePubMedGoogle Scholar
 Yu H, Gerstein M. Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci USA. 2006; 103:14724–31. doi:10.1073/pnas.0508637103.View ArticlePubMedPubMed CentralGoogle Scholar
 Jothi R, Balaji S, Wuster A, Grochow JA, Gsponer J, Przytycka TM, et al. Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture. Mol Syst Biol. 2009; 5:294. doi:10.1038/msb.2009.52.View ArticlePubMedPubMed CentralGoogle Scholar
 Hartsperger ML, Strache R, Stümpflen V. HiNO: An approach for inferring hierarchical organization from regulatory networks. PLoS ONE. 2010; 5. doi:10.1371/journal.pone.0013698.Google Scholar
 Gulsoy G, Bandhyopadhyay N, Kahveci T. HIDEN: Hierarchical decomposition of regulatory networks. 2012. doi:10.1186/14712105 13250.Google Scholar
 Bhardwaj N, Kim PM, Gerstein MB. Rewiring of transcriptional regulatory networks: hierarchy, rather than connectivity, better reflects the importance of regulators. Sci Signaling. 2010; 3:79. doi:10.1126/scisignal.2001014.View ArticleGoogle Scholar
 Bhardwaj N, Yan KK, Gerstein MB. Analysis of diverse regulatory networks in a hierarchical context shows consistent tendencies for collaboration in the middle levels. Proc Nat Acad Sci USA. 2010; 107:6841–6846. doi:10.1073/pnas.0910867107.View ArticlePubMedPubMed CentralGoogle Scholar
 Ma HW, Buer J, Zeng AP. Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new topdown approach. BMC Bioinformatics. 2004; 5:199. doi:10.1186/147121055199.View ArticlePubMedPubMed CentralGoogle Scholar
 Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP. An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res. 2004; 32:6643–9. doi:10.1093/nar/gkh1009.View ArticlePubMedPubMed CentralGoogle Scholar
 Cosentino Lagomarsino M, Jona P, Bassetti B, Isambert H. Hierarchy and feedback in the evolution of the Escherichia coli transcription network. Proc Nat Acad Sci USA. 2007; 104:5516–20. doi:10.1073/pnas.0609023104. 0701035.View ArticlePubMedPubMed CentralGoogle Scholar
 FreyreGonzález JA, AlonsoPavón JA, TreviñoQuintanilla LG, ColladoVides J. Functional architecture of Escherichia coli: new insights provided by a natural decomposition approach. Genome Biol. 2008; 9:154. doi:10.1186/gb2008910r154.View ArticleGoogle Scholar
 Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell. 2012; 150:1274–86. doi:10.1016/j.cell.2012.04.040.View ArticlePubMedPubMed CentralGoogle Scholar
 Han JDJ, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, et al. Evidence for dynamically organized modularity in the yeast proteinprotein interaction network. Nature. 2004; 430:88–93. doi:10.1038/nature02795.View ArticlePubMedGoogle Scholar
 Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA. Structure and evolution of transcriptional regulatory networks. 2004. doi:10.1016/j.sbi.2004.05.004.Google Scholar
 Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, et al. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004; 431:99–104. doi:10.1038/nature02800.View ArticlePubMedPubMed CentralGoogle Scholar
 Scott J, Ideker T, Karp RM, Sharan R. Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks. J Comput Biol. 2006; 13(2):133–44.View ArticlePubMedGoogle Scholar
 Rohwedel J, Maltsev V, Bober E, Arnold HH, Hescheler J, Wobus AM. Muscle cell differentiation of embryonic stem cells reflects myogenesis in vivo: developmentally regulated expression of myogenic determination genes and functional expression of ionic currents. Dev Biol. 1994; 164:87–101. doi:10.1006/dbio.1994.1182.View ArticlePubMedGoogle Scholar
 Brunetti A, Goldfine ID. Role of myogenin is myoblast differentiation and its regulation by fibroblast growth factor. J Biol Chem. 1990; 265:5960–3.PubMedGoogle Scholar
 Arnold HH, Braun T. Targeted inactivation of myogenic factor genes reveals their role during mouse myogenesis: A review. Int J Dev Biol. 1996; 40:345–53.PubMedGoogle Scholar
 Hasty P, Bradley A, Morris JH, Edmondson DG, Venuti JM, Olson EN, et al. Muscle deficiency and neonatal death in mice with a targeted mutation in the myogenin gene. Nature. 1993; 364:501–6. doi:10.1038/364501a0.View ArticlePubMedGoogle Scholar
 Wilson KD, Hu S, Venkatasubrahmanyam S, Fu JD, Sun N, Abilez OJ, et al. Dynamic microRNA expression programs during cardiac differentiation of human embryonic stem cells: role for miR499. Circ Cardiovasc Genet. 2010; 3:426–35. doi:10.1161/CIRCGENETICS.109.934281.View ArticlePubMedPubMed CentralGoogle Scholar
 Gossett LA, Kelvin DJ, Sternberg EA, Olson EN. A new myocytespecific enhancerbinding factor that recognizes a conserved element associated with multiple musclespecific genes. Mol Cell Biol. 1989; 9:5022–33. doi:10.1128/MCB.9.11.5022.Updated.View ArticlePubMedPubMed CentralGoogle Scholar
 Potthoff MJ, Olson EN. MEF2: a central regulator of diverse developmental programs. Dev (Cambridge, England). 2007; 134:4131–40. doi:10.1242/dev.008367.View ArticleGoogle Scholar
 Kyba M, Perlingeiro RCR, Hoover RR, Lu CW, Pierce J, Daley GQ. Enhanced hematopoietic differentiation of embryonic stem cells conditionally expressing Stat5. Proc Nat Acad Sci USA. 2003; 100 Suppl 1:11904–10. doi:10.1073/pnas.1734140100.View ArticlePubMedGoogle Scholar
 Santos SC, Lacronique V, Bouchaert I, Monni R, Bernard O, Gisselbrecht S, et al. Constitutively active STAT5 variants induce growth and survival of hematopoietic cells through a PI 3kinase/Akt dependent pathway. Oncogene. 2001; 20:2080–90. doi:10.1038/sj.onc.1204308.01.View ArticlePubMedGoogle Scholar
 Schuringa JJ, Chung KY, Morrone G, Moore MAS. Constitutive activation of STAT5A promotes human hematopoietic stem cell selfrenewal and erythroid differentiation. J Exp Med. 2004; 200:623–35. doi:10.1084/jem.20041024.View ArticlePubMedPubMed CentralGoogle Scholar
 de Groot RP, Raaijmakers JA, Lammers JW, Jove R, Koenderman L. STAT5 activation by BCRAbl contributes to transformation of K562 leukemia cells. Blood. 1999; 94:1108–12.PubMedGoogle Scholar
 Lancrin C, Mazan M, Stefanska M, Patel R, Lichtinger M, Costa G, et al. GFI1 and GFI1B control the loss of endothelial identity of hemogenic endothelium during hematopoietic commitment. Blood. 2012; 120:314–22. doi:10.1182/blood201110386094.View ArticlePubMedGoogle Scholar
 Hock H, Hamblen MJ, Rooke HM, Schindler JW, Saleque S, Fujiwara Y, et al. Gfi1 restricts proliferation and preserves functional integrity of haematopoietic stem cells. Nature. 2004; 431:1002–7. doi:10.1038/nature02994.View ArticlePubMedGoogle Scholar
 Lidonnici MR, Audia A, Soliera AR, Prisco M, FerrariAmorotti G, Waldron T, et al. Expression of the transcriptional repressor Gfi1 is regulated by C/EBP α and is involved in its proliferation and colony formationinhibitory effects in p210BCR/ABLexpressing cells. Cancer Res. 2010; 70:7949–59. doi:10.1158/00085472.CAN101667.View ArticlePubMedPubMed CentralGoogle Scholar
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.