
Software

Open

 Published:
Efficient reconstruction of biological networks via transitive reduction on general purpose graphics processors
BMC Bioinformaticsvolume 13, Article number: 281 (2012)
Abstract
Background
Techniques for reconstruction of biological networks which are based on perturbation experiments often predict direct interactions between nodes that do not exist. Transitive reduction removes such relations if they can be explained by an indirect path of influences. The existing algorithms for transitive reduction are sequential and might suffer from too long run times for large networks. They also exhibit the anomaly that some existing direct interactions are also removed.
Results
We develop efficient scalable parallel algorithms for transitive reduction on general purpose graphics processing units for both standard (unweighted) and weighted graphs. Edge weights are regarded as uncertainties of interactions. A direct interaction is removed only if there exists an indirect interaction path between the same nodes which is strictly more certain than the direct one. This is a refinement of the removal condition for the unweighted graphs and avoids to a great extent the erroneous elimination of direct edges.
Conclusions
Parallel implementations of these algorithms can achieve speedups of two orders of magnitude compared to their sequential counterparts. Our experiments show that: i) taking into account the edge weights improves the reconstruction quality compared to the unweighted case; ii) it is advantageous not to distinguish between positive and negative interactions since this lowers the complexity of the algorithms from NPcomplete to polynomial without loss of quality.
Background
Techniques for the reconstruction of biological networks, such as genetic, metabolic or signaling networks, are used for getting insight into the inner mechanisms of the cell. They are usually based on perturbation experiments, e.g., gene knockouts or knockdowns, in which one or more network nodes (e.g. genes) are systematically perturbed and the effects on the other nodes are observed. More concretely, in the context of genetic regulatory networks with knockout experiments, the nodes of the networks are genes. Each gene is knocked out at least once and the expressions of the other genes are measured for each knockout experiment. The expression change with regard to the unperturbed wild type defines the influence of the knocked out gene on the other genes. Based on that, connections between the genes can be established. Using the difference in the expression between the perturbed and the wild type, weights and signs can be associated with the connections to quantify the influence and indicate over and underexpression, respectively.
An important problem in this kind of network reconstruction is that direct connections between genes might be established that are spurious, i.e., do not exist in reality. We illustrate this with the following example. Suppose that the transcription factor A, produced by gene a, is needed to activate gene b. The activation of b results in the production of transcription factor B which is encoded by b. Let B on its turn activate gene c, which is manifested by the production of the corresponding protein C. Also, assume that genes b and c cannot be activated by any other gene/transcription factor. Now, if we disable gene a, for instance by deleting it from the genome, the production of transcription factor A is prevented and as a result B and C will not be produced. Thus, our measurement of the expression of genes a, b, and c will show that a directly influences both b and c. In the reconstructed networks we prefer to directly relate two nodes only if there is a direct influence between them. So, a direct connection between a and c, resulting from the transitive influence via b, would be obsolete, and the process of removing such indirect influences is called transitive reduction (TR) [1].
There are several ways to remove spurious direct relations depending on the representation of the biological networks. For instance, in [2] this is done by comparing the measured influence strength between the direct and indirect interactions, i.e., an interaction chains that do not contain the direct interaction. The direct interaction is removed if it is weaker than the last edge of the indirect one. Similarly, in the genetic networks inference tool Aracne the method of data processing inequality is used to remove indirect interactions [3]. This method works for undirected networks. For each triple of genes their pairwise interactions are checked. The one with the lowest score is assumed to be a result of an indirect interaction of the other two. Other approaches to remove obsolete interactions can be found in [4].
Using TR for filtering out spurious connections was proposed by Wagner [5]. The rule is to remove the direct interaction for each gene pair g and g^{′}, if there is an alternative chain of interactions between g and g^{′}. Going back to our example above, TR would mean that the undesired edge between a and c, caused by the indirect interaction, is removed from the network.
The first algorithms for TR date from the seventies [1] and there are several other relevant publications on this subject [6, 7]. In all previous works on TR the networks were represented as directed graphs without weights on the edges. However, in the network inference algorithms interaction strengths between genes usually play an important role. Motivated by this fact, the concept of TR of weighted graphs was introduced [4, 8], where the weights correspond to interaction strengths. Both these papers take interaction signs (promoting or inhibiting) into account.
In [4] several types of TR are described depending on how individual edge weights are extended to paths to quantify the indirect interactions. One of the variants of TR coincides with the definition that we employ in this paper. However, the interaction strength along paths, which is actually used in [4], is defined quite differently from ours. Also, they do not use thresholds in their definitions.
With regard to the definition of transitive closure for weighted graphs and the general theoretical background, the closest to our work is [9]. Their work is also motivated by a biological application, in particular, the analysis of proteinprotein interaction networks. The authors define the notion of transitive closure of weighted graphs, but stop short of introducing TR. Originally, we were inspired by their ideas about the transitive measure of interaction along a path in the network.
Unlike the previous work in [4, 8], we adopt an approach that disregards the interaction signs. A benefit of this decision is that the TR algorithms are of polynomial complexity and amenable to parallelisation. In contrast, the algorithm in [8] is in the worst case NPcomplete, which means that its runtime grows exponentially with the size of the networks. Tests with the Dream challenges [10] show that the omission of signs does not incur any degradation in overall performance compared to the signed weighted TR of [8].
We present parallel versions of the TR algorithms for both unweighted and weighted directed graphs. These algorithms are developed for general purpose graphics processing units (GPUs). GPUs have been extensively used for various applications, including bioinformatics and systems biology. Since GPUs are standard in modern computers, parallel algorithms become an attractive possibility to speed up computations.
The crucial idea of TR on GPUs is to formulate the algorithm in terms of matrix operations. Since GPUs are very efficient in implementing the latter, this results in a remarkable speedup so that networks consisting of tens of thousands of nodes can be handled within seconds on a standard desktop computer.
Parallel algorithms for computing transitive closure of a graph, which is closely related to TR, have been developed before e.g., [11, 12]. However, the only work that we could find that deals explicitly with parallel algorithms for TR is [13]. Their algorithms are restricted to unweighted graphs and claim efficiency for the special case of sparse graphs. To the best of our knowledge, there does not exist a previous implementation of a TR algorithm on GPUs and thus, our work is novel in that sense, too.
Approach for unweighted graphs
We adopt a graphtheoretic framework to formally represent biological networks. A directed graph G=(V,E) is a pair of sets of nodes V and edges E⊆V×V, i.e., an edge e∈E is an ordered pair of nodes (v,v^{′})∈V×V. Without loss of generality, we identify V with an arbitrary but fixed order represented by the set of numbers $\underset{\u2013}{n}=\{1,2,\dots ,n\}$, where n is the number of nodes in V. For example, in genetic networks, we associate the nodes with genes and the edges with interactions between genes. More formally, a gene i is connected to a gene j via an edge e=(i,j) if and only if i influences j. The graph can be equivalently represented by an adjacency matrix A. For standard graphs which do not have weights associated with the edges, the elements A_{i,j} of the matrix have value 1, if there is an edge from nodes i to node j or 0, otherwise.
A path from node i to node j is a sequence P_{ ij }=(k_{0},k_{1},…,k_{ m }), where k_{0}=i, k_{ m }=j and (k_{l−1},k_{ l })∈E, for 0<l≤m. Nodes k_{ l }, where 1≤l≤m−1, are called intermediate nodes. A cycle is a path P_{ ij } whose first and last node coincide, i.e., i=j. A cycle that consists of one edge is called a selfloop. A graph is acyclic if it does not contain any cycles. We denote the set of edges of P_{ ij }with Edges(P_{ ij }). Let Paths(G) and Paths(i,j) denote the sets of all paths in the graph G and all paths from i to j, respectively.
Definition 1 (Unweighted Transitive Closure and Reduction)
The transitive closure of a graph G is the graph G^{T}=(V E^{T}) with (i j)∈E^{T}if and only if there exists a path from i to j in G. The transitive reduction of an acyclic graph G is the unique [1] smallest graph G^{t}=(V E^{t}), i.e., with the least number of edges, such that (G^{t})^{T}=G^{T}.
Intuitively, this means that the transitive closure is preserved by the reduction, i.e., no information about reachability is lost. For an acyclic graph G it can be shown [1] that the TR G^{t} can be obtained by removing each redundant edge (i j)∈E from the original graph G for which there is an indirect path, i.e., not including edge (i j), between i and j in G. An example of TR for acyclic graphs is given in Figure 1.
The definition of TR can be extended in a natural way to graphs with cycles. However, the reduced graphs are not unique and in general cannot be generated by deleting edges from the graph (see Figure 2). To solve this, Aho, Garey and Ullman [1] shrink the strongly connected components of the graph to single nodes and apply TR on the resulting acyclic graph. After the reduction, we expand the components as they were in the original graph, as is done in [14].
Extension to weighted graphs
We aim at modelling experiments that use node perturbations, e.g., gene knockouts, for the reconstruction of interactions between nodes. We already saw in the example above that spurious direct interactions are added if there exist an indirect path between two genes. Hence, the outcome of the experiments actually produces a transitive closure of the real (original) network. By applying TR as a kind of inverse operation of the transitive closure, we try to cancel these side effects by removing direct interactions between two nodes, if there is an alternative indirect path between them. Finding the TR amounts to reconstruction of the network as by removing those direct interactions we usually obtain a good approximation of the real network.
However, sometimes both direct and indirect interactions can exist at the same time. Examples of this are the feedforward loops that occur in the genetic networks of many organisms [15]. Unfortunately, in such cases, the TR as defined above will still remove the direct interaction. To avoid this anomaly, we use the notion of TR on weighted graphs where the weights represent interaction uncertainties. Knowing the interaction uncertainty allows to refine the edge removal criterion: an edge (i j) is removed from the original graph only if it is less certain than some indirect interaction between i and j. In other words, if an edge is at least as certain as all indirect interactions, then it is kept in the graph.
The interaction uncertainties are represented as weights of the edges. Formally, we associate to a directed graph G=(V,E) a function $w:E\to \mathbb{R}$, which maps each edge to a real number, to obtain a weighted directed graph G=(V,E,w). The adjacency matrix A becomes a matrix of weights where the special value ⊤>_{maxe∈E}{w(e)} denotes the absence of an edge between two nodes. There are several plausible ways to choose the weight function w. In this paper we assume that the weights are pvalues or similar measures, i.e., of the interval [0,1], which are obtained from the postprocessing of the perturbation experiments. Thus, the smaller the weight w, the less uncertain is the interaction between two genes.
Definition 2 ((Minimal) Transitive Interaction Uncertainty)
Transitive interaction uncertainty along a path is defined as a function $W:\mathrm{Paths}\left(G\right)\to \mathbb{R}$ that maps each path to a real number. We apply the principle of the weakest link and define W(P)=_{maxe∈Edges(P)}{w(e)} for a path P. Then, for a given edge (i,j)∈E we define the minimal transitive interaction uncertainty$h:E\to \mathbb{R}$ as the strongest weakest link, i.e., the minimal transitive path uncertainty over all paths between nodes i and j:
Note that if edge (i,j)∈E, then (i,j)∈Paths(i,j). This implies directly that h(i,j)≤w(i,j). Recall that the criterion for preserving an edge in the reduced graph is that its uncertainty is not greater than the minimal uncertainty of the indirect paths, i.e., w(i,j)≤h(i,j).
By putting the last two inequalities together, we can refine the edge preservation criterion to w(i,j)=h(i,j) and obtain the following definition:
Definition 3 (Weighted Transitive Reduction)
The transitive reduction of a weighted graph (TR) G=(V,E,w) is the graph G^{t}=(V,E^{t},w^{t}) with E^{t}={e∈E∣w(e)=h(e)} and w^{t}(e)=w(e) for all e∈E^{t}.
Informally, edge (i,j) is kept in E^{t} if and only if its weight/uncertainty equals the minimal transitive uncertainty of all paths between i and j. The edge weights remain the same in the reduced graph. It is worth noting that the above definition of TR for weighted graphs supports the presence of cycles. An example of TR for weighted graphs is given in Figure 3.
Of course, there are other possible options to define the path weights based on the edge weights. For instance, summing up the edge weights to obtain the weight of the path is in some cases even a more natural choice than the maxmin (weakest link) approach that we use. However, in the case when pvalues (or similar metrics in the interval [0,1], e.g., correlation) are used, this is not the best option. Summing up the pvalues of all edges in the path can produce a result which is greater than 1, i.e., something which is not a probability. Since a trivial path consisting of only one edge is also a path, it is preferable to have a weight for nontrivial paths that is of the same nature as the edge weight.
In general, the refined notion of TR with weights does not entirely resolve the anomaly of removing a direct interaction which exists in the network. One way to further improve the filtering of the edges is to use thresholds. We introduce a lower threshold t_{low} determining that any edge e with w(e)≤t_{low} is unconditionally kept in E^{t}, i.e., regardless of the existence of more certain indirect interactions. In this way we ensure that interactions which are measured with high certainty are not removed from the network. Similarly, we use an upper threshold t_{up} such that any edge e with w(e)≥t_{up} is unconditionally removed from the network. Hence, very uncertain connections are always removed from the original graph G. This filtering with threshold t_{up}is actually independent of the TR concept and it can be done as a pre or postprocessing step.
This can be shown by reasoning towards a contradiction. Assume that thresholding (TH) and TR are dependent, in other words, that the final result depends on the order in which TH and TR are performed. Say we have a graph G for which this holds. Then, there must be at least one edge e from a node v to a node v^{′} which is not present in either TH(G^{t}) or TH(G)^{t}, but it is in the other. Let it not be present in TH(G^{t}) (the case when it is not in TH(G)^{t}is similar). Then, either it was removed by TR (case 1), or by TH (case 2). Case 1: If e was removed by TR, there must exist a path P in G between v and v^{′} with W(P)<w(e). The assumption was that e is still present in TH(G)^{t}. Therefore, we must have w(e)<t_{up}. But then, also W(P)<t_{up}, in other words, P must exist in TH(G). The existence of P in TH(G) means that e must be removed from TH(G) when applying TR, leading to a contradiction. Case 2: If e was removed by TH, then w(e)>=t_{up}. But then, it would also be removed when applying TH on G, leading to a contradiction.
Using a threshold t_{low}splits the edge weights into two sets. Within the sets the difference between values does not play any role. For instance, assuming t_{low} = 0.5, the difference between the edge weights 0.8 and 0.7 is the same as between 0.16 and 0.06. An analogous remark holds also for t_{up}.
For many graph problems, the unweighted problem is often a special case of a more general weighted problem. For example, an algorithm to determine shortest paths in a weighted graph can be used to find shortest paths in unweighted graphs by assigning the same positive weight to every edge. For our problem of (weighted) TR, a similar analogy does not hold, i.e., we cannot use the algorithm for weighted cyclic graphs to calculate the TR of an unweighted cyclic graph. This fact results from the different natures of our definitions: For the weighted case, we choose the greatest weight on a path without considering its length. For the unweighted case, however, we will be adding up the edges of paths to obtain their length (cf. next section). The first approach is not affected by cycles – the transitive interaction uncertainty of a path with cycles is always greater or equal to the one for the same path with all cycles removed, thus, cycles are ignored “automatically” when searching a path with the minimal transitive interaction uncertainty. In contrast, for the second approach, cycles must be actively detected to ensure that only paths in which no node occurs more than once are considered (the longest simple path problem is NPhard). Figure 4 illustrates this with the path (a b a c) which has a length greater than one but still there is no indirect alternative path from a to c. For a similar reason, our algorithm for cyclic weighted graphs cannot be adapted to cyclic signed weighted graphs. The problems with cycles in signed graphs are discussed in [8].
Implementation
After briefly discussing the emergence of manycore processors and the resulting need for parallelisation of programs, this section presents our parallelised algorithms for transitive reduction: first, for unweighted acyclic graphs and afterwards the extension to weighted cyclic graphs. Finally, we discuss the problems with cycles in unweighted graphs.
Nvidia Cuda GPUs
Although Moore’s Law[16], stating that the number of transistors per chip doubles roughly every 18 months, still holds, the clock frequency of the processors does not increase exponentially anymore. Thus, to keep improving the processor performance, the manufacturers turned to processors with multiple cores. In particular, General Purpose Graphics Processing Units (GPUs) are an example of massively parallel manycore systems available in desktop computers for an affordable price. However, to benefit from these many cores, programs have to be redesigned for the new architectures.
The Compute Unified Device Architecture (Cuda) [17] is Nvidia’s Cbased approach to program GPUs. Sequential parts, including input and output operations, are executed “as usual” on the host CPU while parallel parts are executed on the GPU device by calling a special kind of functions, called kernels. The body of a kernel is executed by a configurable number of threads, each having its own ID. The ID determines the data, e.g., part of a matrix in memory, which is processed by the thread. As the GPU device has no access to the host’s main memory, the required data needs to be copied explicitly between them.
Algorithm for unweighted acyclic graphs
For obtaining the TR of an unweighted acyclic graph G, typically its transitive closure G^{T} would be calculated first. Then, in a second step, all edges e=(i j)∈E would be removed if there exists a node k, k≠i and k≠j, such that (i k)∈E^{T} and (k j)∈E^{T}, i.e., there exists an alternative path (i,…,k,…j) from i to j in G and, since the graph is acyclic, not using e. Consequently, e is redundant because of the alternative path. The second step is necessary since the transitive closure adds new shortcuts but does not identify shortcuts already existing in G. Instead of just calculating the transitive closure, we determine the length of the longest path connecting every pair of nodes (i j), i≠j. This can be done efficiently using a variant of the FloydWarshall algorithm [18–20] since the graph is acyclic. Afterwards, we delete all direct edges (i j)∈E for which a path from i to j of length at least two exists. Obviously, for this condition the considered lengths can be bounded to two as it does not matter whether the alternative path is exactly of length two or longer.
Algorithm 1 gives a pseudocode description of our approach. First, the integervalued adjacency matrix A is loaded on the host and copied to the GPU; this copy is called B (lines 1–2). Afterwards, in the nested forloops in lines 3–6, the length of the longest path for all pairs of nodes is calculated by a parallel version of the FloydWarshall algorithm. Whenever there is a path of length at least one from i to k, i.e., B_{i,k}≥1, and one from k to j, i.e., B_{k,j}≥1, this gives a path from i to j of length at least two. Since we are interested only in the existence of an indirect path and not in its actual length, we limit the length of the longest paths to two. Therefore, B_{i,j} is set to two, denoting that there exists an indirect path between i and j. The last parallel forloop deletes all edges (i,j)∈V×V for which such a detour exists, i.e., B_{i,j}=2, by setting B_{i,j}:=0 (lines 7–9). Finally, the transitively reduced matrix B is copied from the GPU to A on the host which stores it to a file.
Algorithm 1
Pseudocode description of parallelised transitive reduction of unweighted acyclic graphs Requires: Adjacency matrix A
1: read A from input file
2: copy A from host to B on GPU
3: for k=1,…,n do sequentially
4:
for $(i,j)\in \underset{\u2013}{n}\times \underset{\u2013}{n}$ do in parallel
5:
if B_{i,k}≥1 and B_{k,j}≥1then
6:
B_{i,j}:=2
7: for$(i;j)\in \underset{\u2013}{n}\times \underset{\u2013}{n}$do in parallel
8:
if B_{i,j}=2then
9:
B_{i,j}:=0
10: copy B from GPU to A on host
11: write A to output file
In the Cuda implementation, the parallel doloop over $(i,j)\in \underset{\u2013}{n}\times \underset{\u2013}{n}$ is realized by starting n^{2}threads of a kernel containing the loop’s body. Thus, according to its thread ID, each thread executes the loop’s body for one particular $(i,j)\in \underset{\u2013}{n}\times \underset{\u2013}{n}$. In contrast, the sequential doloop is performed “as usual” by the main program on the CPU.
Complexity
The algorithm iterates n + 1 times over the n^{2}elements of the matrix: n times in lines 3–6 and finally once in lines 7–9. However, different steps of the same iteration can run in parallel on different processors. Thus, the overall time complexity depending on the number of processors p is: $O\left(\left(n+1\right)\ast \u2308\frac{{n}^{2}}{p}\u2309\right)$. The PRAM model assumes an arbitrary large number of available processors – in particular we can assume p≥n^{2}. In this model, the time complexity becomes O(n) linear in the number of nodes. For the sequential case, i.e., p=1, we have time complexity O(n^{3}) as for the FloydWarshall algorithm. Similarly, the space (memory) complexity is O(n^{2}).
Correctness
We claim that the output matrix is the TR of the input and prove this for the sequential version, i.e., with sequential execution of the parallel loops, first, before considering the specifics of parallelisation. Thereby we can rely on two things: The correctness of the FloydWarshall algorithm (cf. [18]) and the fact that the TR of an unweighted acyclic directed graph can be constructed by successively removing redundant edges from the graph in any arbitrary order ([1], p. 133). It remains to show that our algorithm deletes exactly these redundant edges. An edge (i j) is deleted if B_{i,j}=2 holds before lines 7–9, i.e., there exists a path from i to j of length at least two. Since the graph is acyclic, this means that an indirect path from i to j without the direct edge (i j) exists. Thus, the edge (i j) is redundant and can be deleted. On the other hand, if an edge (i j) is redundant, an indirect path from i to j without the edge (i j) exists. This path must have length at least two. Consequently, we have B_{i,j}=2 and the edge will be deleted.
For the correctness of the parallelised algorithm, we have to show that the steps performed in parallel are independent and do not interfere. For the parallel doloop in lines 7–9 this is obvious as each iteration reads from and writes to its individual memory location. Analogously, all iterations of the inner loop of the FloydWarshall algorithm in lines 4–6 write to different memory locations. Furthermore, it cannot happen that for a fixed k_{0} the elements ${B}_{{i}_{0},{k}_{0}}$ and ${B}_{{k}_{0},{j}_{0}}$ read in the iteration for (i_{0},j_{0}) were already changed by the iterations for (i_{0},k_{0}) and (k_{0},j_{0}), respectively, since both check ${B}_{{k}_{0},{k}_{0}}\ge 1$ which is never true since the graph is acyclic. The outer loop cannot be parallelised as it is crucial for the correctness of the FloydWarshall algorithm that all calculations of the k^{th} iteration are completed before the (k + 1)^{th}iteration starts.
Algorithm for weighted (cyclic) graphs
In order to obtain the TR of a weighted graph, according to the definition in Section “Background”, all edges (i,j)∈E for which an alternative path from i to j with a better transitive interaction uncertainty, i.e., h(i,j)<w(i,j), exists have to be removed. Consequently, the core of our algorithm is the calculation of the minimal transitive interaction uncertainties h(i,j) for every pair of nodes (i,j)∈V^{2}. Again, we use a variant of the FloydWarshall algorithm whose pseudocode description can be found in lines 3–7 of Algorithm 2. The matrix B is initialised with the original adjacency matrix (lines 1–2), i.e., B_{i,j}=w(i,j), if (i,j)∈E, otherwise B_{i,j}=⊤. Then, for increasing k=1,…,n all paths between each pair of nodes are successively considered that use only 1,…,k as intermediate nodes. The idea behind this is as follows: Let B_{i,j}, B_{i,k}, and B_{k,j}contain the minimal transitive interaction uncertainty of all paths that go from i to j, from i to k, and from k to j, respectively, and that use only intermediate nodes up to k−1. Then, the minimal transitive interaction uncertainty of all paths from i to j that use intermediate nodes up to and including k is the maximum of the minimal transitive interaction uncertainties of the path segment from i to k and from k to j (line 5). If this value is smaller than the minimal transitive interaction uncertainty calculated so far, B_{i,j} is updated to it (lines 6–7). Figure 5 illustrates this principle. In the last step of the FloydWarshall algorithm, all paths with intermediate nodes up to n, which are in fact all paths, are considered. After this, the edges (i,j)∈E fulfilling the removal condition can be deleted by setting B_{i,j}:=⊤ (lines 8–10) and the reduced matrix can be copied to and stored by the host (lines 11–12)
Algorithm 2.
Pseudocode description of parallelised transitive reduction of weighted, potentially cyclic, graphs Requires: Adjacency matrix A, thresholds t_{low}and t_{up}
1: read A from input file
2: copy A from host to B on GPU
3: for k=1,…, n do sequentially
4:
for $(i,j)\in \underset{\u2013}{n}\times \underset{\u2013}{n}$ do in parallel
5:
viaK:=max(B_{i,k}),B_{k,j}
6:
if (B_{i,j}<0 or B_{i,j}>t_{low}) and viaK<B_{i,j}then
7:
B_{i,j}:=−viaK
8: for$(i,j)\in \underset{\u2013}{n}\times \underset{\u2013}{n}$do in parallel
9:
if is Negative(B_{i,j}) or B_{i,j}≥t_{up}then
10:
B_{i,j}: = T
11: copy B from GPU to A on host
12: write A to output file
However, two things are slightly more involved in the presented pseudocode: First, to save memory, we use just one matrix B on the GPU for the whole computation. The problem is that if a path from i to j with a smaller transitive interaction uncertainty than the interaction uncertainty of the direct edge (i,j) is found, the original value in B_{i,j}is overwritten. Thus, some necessary information to check the removal condition h(i,j)<w(i,j) is lost on the GPU. Yet, if this situation arises we already know that the edge (i,j) is to be deleted and we use the sign bit to indicate that. Consequently, instead of checking the original removal condition, we check for negative values in B and delete the corresponding edges in the post processing step (lines 8–10). Note that it is not possible to remove such edges (i,j) directly during the FloydWarshall algorithm since the absolute values for B_{i,j}might be needed for later iterations. If, for example, in the second iteration (k=2=c) in Figure 5, case (c), the edge 3^{→0. 8}4 was directly removed and not replaced by 3^{→0. 3}4, the algorithm could not find in the next iteration (k=3=b) that 1^{→0. 1}3^{→0. 3}4 has a better (transitive) interaction uncertainty, namely 0. 3, than the direct edge 1^{→0. 5}4.
The second aspect in the pseudocode is the incorporation of thresholding. The upper threshold t_{up} is applied in line 9 (B_{i,j}≥t_{up}), ensuring that all edges with this or a bigger weight are always deleted. Due to the onematrix representation, the lower threshold t_{low}cannot be applied in this post processing fashion. Instead, every edge with this or a lower weight is skipped in line 6 (B_{i,j}>t_{low}) so that its original weight is preserved in B_{i,j}.
Time complexity
The same as for Algorithm 1.
Correctness
We claim that the output of our algorithm is the thresholded weighted TR of the provided input, i.e., that in the end B_{i,j}=w(i,j), for (i,j)∈E^{t}, and otherwise B_{i,j}=⊤ holds. Again, we prove the correctness of the sequential algorithm first, before we argue about the issues that come up with its parallelisation. Furthermore, we rely on the correctness of the FloydWarshall algorithm, i.e., that it correctly determines for each pair of nodes the minimum of the maximal weights of all paths between them.
First, we show that all edges which are not in E^{t}are deleted from B (“ B ⊆ E^{t}”). Initially, every matrix element B_{i,j} is nonnegative. If a matrix element B_{i,j}is changed for the first time by the FloydWarshall algorithm, then a negative value (−viaK) is written to it. This can happen only in two cases: i) For the edge (i,j)∈E that is not forced to be kept by the lower threshold, i.e., w(i,j)>t_{low}, a path via some node k with a smaller transitive interaction uncertainty is found. Thus, this edge is to be deleted to obtain the TR. ii) A path from i to j via some node k was found and (i,j)∉E, i.e., no direct edge exists in the input graph. Of course, this edge must not be in the TR. Although the absolute value might change in later iterations if an even better path is found, once B_{i,j} becomes negative it stays negative. Consequently, in both cases, these edges will be removed in line 10. Note that for nonexisting edges (i,j)∉E for which no path from i to j could be found the value of B_{i,j}stays ⊤ during the whole algorithm. Finally, the post processing removes all of the remaining edges (i,j)∈E that are excluded from the TR by the upper threshold, i.e., w(v_{ i },v_{ j })≥t_{up}, as for them the condition B_{i,j}≥t_{up}holds.
One might wonder whether the protection of edges by the lower threshold affects the correctness of the FloydWarshall algorithm. Consider, for example, the situation depicted in Figure 6 and assume t_{low}=0. 5: The edge b^{→0. 5}c is preserved although the better path b^{→0. 2}a^{→0. 3}c exists. Consequently, for the direct edge d→c the best alternative path P=d^{→0. 1}b^{→0. 2}a^{→0. 3}c with W(P)=0. 3 is not found but only ${P}^{\prime}=d\stackrel{0.1}{\to}b\stackrel{0.5}{\to}c$ with W(P^{′})=0. 5. Nevertheless, that still leads to a correct final result: If w(d,c)≤t_{low}, e.g., 0. 4, this edge is not removed even though the better alternative path P exists, since it is protected by the lower threshold. In contrast, if w(d,c)>t_{low}, e.g., 0. 6, this edge is not protected by the lower threshold and should be removed. This indeed happens since the path found (P^{′}) is better.
It remains to show, that the edges that are in E^{t}and their weights are preserved in B (“ B ⊇ E^{t}”): As reasoned above, the matrix entry B_{i,j}for an edge (i,j)∈E is changed only if a better path can be found and the edge is not protected by the lower threshold. Conversely, it stays unchanged and thus still holds the values w(i,j) for all edges (i,j)∈E for which no better path exists or that are protected by the lower threshold.
Finally, we have to argue that the parallelisation of the algorithm does not break its correctness. For the post processing in lines 8–10 this is definitely the case since every iteration reads only from and writes only to its individual memory location B_{i,j}. As already discussed for the algorithm for unweighted acyclic graphs, it is crucial that the outer loop (line 6) of the FloydWarshall algorithm over k is executed sequentially. The inner loop (lines 7–8) however can be parallelised since for a fixed k_{0} of the outer loop every iteration (i_{0},j_{0}) writes only to its own memory location ${B}_{{i}_{0},{j}_{0}}$ and reads only the values from ${B}_{{i}_{0},{k}_{0}}$ and ${B}_{{k}_{0},{j}_{0}}$ which are not changed throughout the whole k_{0}th outer iteration. The values of ${B}_{{i}_{0},{k}_{0}}$ and ${B}_{{k}_{0},{j}_{0}}$ would be changed only if $max\left(\left{B}_{{i}_{0},{k}_{0}}\right,\left{B}_{{k}_{0},{k}_{0}}\right\right)<\left{B}_{{i}_{0},{k}_{0}}\right$ or $max\left(\left{B}_{{k}_{0},{k}_{0}}\right,\left{B}_{{k}_{0},{j}_{0}}\right\right)<\left{B}_{{i}_{0},{k}_{0}}\right$, respectively, holds. However, max(…,r,…)<r never holds for any $r\in \mathbb{R}$.
Results and discussion
We implemented our unweighted and weighted transitive reduction (TR) algorithms in the tools Cutteru and Cutterw, respectively. We performed two sets of experiments with these tools which were aimed at showing their scalability as well as their competitiveness and quality with regard to similar techniques.
Scalability experiments with SynTReN generated graphs
In this set of experiments we used networks of various sizes (1,000, 2,500, and 10,000 nodes), with and without weights, as inputs for the TR algorithms. The unweighted networks were generated using the Directed Scale Free Graph algorithm [21] and the ErdősRényi algorithm [22]. To obtain the weighted graphs from them, the networks were simulated using the tool SynTReN[23]^{a}. Using moderated ttest, the pvalues for the interactions between each node pair were generated and used as edge weights.
The goal of this set of experiments was to test the scalability of Cutterw and Cutteru. We were interested in the speedups achieved by the parallel versions of our TR algorithms with regard to various sequential counterparts. All these experiments have been performed on (one core of) a personal computer with a 2.67 GHz Intel Core i7 CPU 920, and 12 GB RAM, running Ubuntu 10.10. For the parallel executions, we used Cuda 4.0 and Nvidia driver version 4.0 with an Nvidia GeForce GTX 480 graphics card with 1.5 GB VRAM and 480 streaming multiprocessor cores, each running at 1.4 GHz.
For each graph size, we considered both acyclic unweighted and cyclic weighted graphs. We tested five implementations of TR algorithms. On acyclic unweighted graphs we applied Wagner’s algorithm [5], the sequential and the parallel implementations of Cutteru. For the weighted cyclic graphs we compared the sequential and parallel implementations of Cutterw.
The results of the algorithms are summarized in Tables 1 and 2. For the graphs of size 1,000 and 2,500 we considered five different graphs, whereas for the size 10,000 we used one scalefree graph^{b}. For each graph the absolute runtimes were measured five times. The results for all runs on all graphs of the same size and type were very similar (with a spread less than 5%) and the averages are shown in Tables 1 and 2. In each table, the first three rows correspond to Wagner’s algorithm, the sequential Cutteru, and the parallel Cutteru, respectively. The last three rows give the relative speedups. One can see that the speedups increase with the graph size. For instance, for a weighted scalefree graph of 10,000 nodes, the parallel implementation is 92 times faster than the sequential one, which makes a difference between one and a half hours versus two minutes of reconstruction time.
It is important to note that the structure and the density of an input graph has no significant effect on the performance of the TR algorithms. This can be concluded when comparing the results in Tables 1 and 2. Additional experiments that we performed with dense ErdősRényi graphs further confirm this^{c}. In the case of dense ErdősRényi graphs the average runtimes for the weighted graphs of size 1000 and 2500 nodes are 2.50 and 7.36, which are comparable with their counterparts in Table 2. For the weighted graphs, this is not surprising, since each pair of nodes has a pvalue assigned to it (e.g., recall from the Background section that “no edge” is represented by ⊤.). This means that for, e.g., a graph G with 10,000 nodes, the TR algorithms have to process 100,000,000 pvalues, no matter what the edge density of G is. Thus, one can conclude that Cutterw never requires significantly more than two minutes to reduce a graph of that size (Table 1).
Quality experiments with the Dream 4 benchmark
In recent years, the Dialogue of Reverse Engineering Assessments and Methods (Dream) [10] challenge on in silico generated networks reconstruction has become an important benchmark. We tested Cutterw, which implements our TR algorithm for weighted graphs, using the fourth challenge (Dream 4). The first goal of this set of experiments was to test the performance of Cutterw compared to other stateoftheart tools which had participated in the Dream 4 competition. Therefore, we compare Cutterw with Cutteru as well as with Transwesd[8], a tool which is also based on transitive reduction of weighted graphs. Since, besides the parallelisation, the main difference between Cutterw and Transwesd is that the latter takes interaction signs into account, it was important to see how disregarding them affects the quality of the reduction in our approach. Our second goal was to investigate to which extent using interaction signs and the weights contributes to the reduction quality.
In our evaluation with the Dream suite we repeated the tests which were applied to Transwesd in [8]. In this way we wanted to ensure maximally fair comparison between the tools. Thus, we tested Cutterw for the data set of the InSilico_Size_100 subchallenge which can be downloaded from [24]. The data set consists of five networks of 100 nodes which are parts of real networks from E.coli and yeast. Noisy measurements representing steady state mRNA expression levels are generated in silico using GenNet Weaver[25]. We used only the data from the simulated knockout and knockdown experiments, ignoring the time series data. The original networks (gold standards) are known. (In the real competition they were provided, of course, later.) This makes the evaluation of the results much more objective than in the case of real networks, for which usually there is no consensus among the experts. To simulate the conditions of the real competition we tuned the parameters for Cutterw (the thresholds) based on the results obtained with the Dream 3 challenge – in the same way like this was done for Transwesd in [8].
The network reconstruction was done in two steps, described in more detail in [8]. In the first step, the socalled perturbation graph is produced. In the second step, a (weighted) TR is applied on the graph to remove spurious edges. For all three reduction tools, Cutteru, Transwesd, and Cutterw we used as input the perturbation graphs produced by the first step of Transwesd. The edge weights were generated as conditional correlations from the knockout and knockdown data as in [8]. To generate those graphs we used the Matlab programs which were kindly provided by the authors of Transwesd. In the case of Cutteru we simply disregarded both the weights and the signs in the perturbation graphs. Besides that, before applying Cutteru, the possibly cyclic perturbation graph was transformed into an acyclic graph, whose nodes are strongly connected components, as described in Section “Background”.
The output files representing the reconstructed graphs were evaluated using the corresponding Matlab scripts provided by the Dream 4 challenge. The output files were formed by dividing the edges in three classes. The first class contains the accepted edges, the second, the edges accepted by the first step, but rejected in the second (TR) step, and the third one, the edges rejected already in the first step, i.e., already in the production of the perturbation graph. Each class is sorted based on the weights.
In the Transwesd tests in [8] the output files were composed by using only two classes of edges. The accepted edges after the second step were followed by the rejected ones. The edges within the same class were sorted in descending order according to their weights, like in our output files described above.
The results of our experiments are given in Table 3. For each of the five networks we give the results 1) without TR for the generated raw perturbation graphs, 2) with Cutteru, 3) with TR using Transwesd and 4) with Cutterw. The results with Transwesd are slightly different from the ones in [8]. This is due to a minor improvement in Transwesd as well as to the above mentioned different manner for producing the output files which were evaluated by the Dream 4 challenge.
AUPR (area under the precisionrecall curve) and AUROC (area under the receiveroperator curve) are quite standard scoring metrics for binary classifiers, computed using the TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives). For the definitions and a more detailed discussion on the scoring metrics for the Dream challenges see [10]. In all cases there is a clear gain from the weighted TR compared to the input perturbation graph. TR removes a significant number of false positives at the price of just a few false negatives. Cutterw has on average 6% better AUPR than the perturbation graph. Also there is a clear improvement in the results with the weighted graphs compared to Cutteru. Besides the significantly smaller number of false positives, Cutterw has on average 7% better AUPR than Cutteru with AUROCs being almost the same. The results of Cutterw and Transwesd are quite similar. Cutterw consistently generates significantly smaller number of false positives, which is compensated by Transwesd with more true positives, except for Network 1.
The overall Dream 4 score is calculated as $\mathrm{lo}{g}_{10}\sqrt{{P}_{1}\times {P}_{2}}$, where P_{1}and P_{2} are the overall pAUPR and pAUROC values, respectively. The latter are obtained as geometric means of the individual pAUPR and pAUROC values of each of the five networks. Intuitively, the pAUPR and pAUROC are pvalues that indicate how much our results, and in particular AUPR and AUROC, are better than randomly chosen networks. Some of these intermediate parameters were not included in the table, since they are summarized in the overall score. For the definitions of all of the above mentioned parameters see [10]. The overall Dream 4 score for Cutterw is 70.93, for Transwesd it is 70.59 and for the unreduced perturbation graphs, however, it is only 68.27. With that we would have been ranked third behind the two best teams who had participated in the Dream 4 challenge at the time, with the scores 71.59 and 71.30, respectively. Considering that Cutterw uses only part of the available data, i.e., only knockout and knockdown expression levels, this is quite an encouraging result. The improvement with regard to Transwesd is marginal. However, our intention was to show that disregarding the interaction signs, as we do in Cutterw in practice does not result in loss of quality compared to Transwesd.
The results with Cutterw can be even further improved by combining the methods of [8, 14]. When generating the perturbation graph we use both knockout and knockdown data and the same parameters as for Transwesd. However, in the statistics we use instead of the wild type the average expression, like in [14]. For the transitive reduction step we generate the weights as before with Transwesd. As a result, we achieve the overall Dream 4 score 73.33, which is better than the above mentioned winning performance.
At first sight, the fact that by omitting the signs there is virtually no loss of quality of reconstruction can be paradoxical. However, in many cases the relative loss of information by the omission of the edge signs is compensated by the lower threshold t_{low} in the TR algorithm. We illustrate this with the following example. Consider a subnetwork of three nodes a, b, and c. Let there be a direct positive edge between a and b and an indirect negative influence path consisting of a negative edge between a and c and a positive edge between c and b. Being of opposite sign, the indirect influence via c does not explain (a,b). Nevertheless, an unsigned algorithm without a lower threshold will still remove edge (a,b). However, one can assume that the direct influence between a and b will result in a strong original weight (small uncertainty) on the edge (a,b). This will prevent an unsigned algorithm with a lower threshold to remove the edge.
Conclusions
We presented parallel versions of algorithms for transitive reduction (TR) to reconstruct perturbation networks. The main improvement of our algorithms compared to the existing methods is the speedup and scalability without loss of reconstruction quality. Moreover, our algorithms are applicable to both weighted and unweighted networks. The gain of the TR is significant since it mostly removes spurious direct interactions, which are overlooked by the first filtering step that produces the socalled perturbation graph.
We implemented our algorithms in the tool Cutter. Compared to similar approaches, like Transwesd, Cutter provides the same reconstruction quality, as measured by the Dream challenge benchmark. This is achieved despite the fact that we do not use signs of network interactions (note that this applies only to the TR phase; after reconstruction, the signs can be restored). The gain of this simplification is that our method can be efficiently parallelised, hence reconstruction is much faster and scalable. The algorithm in Transwesd is NPcomplete, whereas our algorithms have a polynomial time complexity O(n^{3}), where n is the number of nodes. This will be of utmost importance in the future when reconstruction methods will have to deal with whole genome knockouts or “hybrid” networks of genes, proteins and signaling molecules, i.e., with networks containing tens or even hundreds of thousands of nodes.
Since the TR algorithms depend on the threshold, fine tuning of this parameter might require several experiments. The advantage of obtaining the results within seconds or in the worst case minutes, instead of hours, can be very significant. Currently, it takes less than a couple of minutes to process with Cutterw weighted graphs of 10,000 nodes and potentially 100 million edges.
Our weighted TR algorithm is independent of the nature of weights. Therefore, instead of correlations which were used for the Transwesd perturbation graphs, one can use pvalues or other statistical estimates of the interaction strengths. Also, it might be interesting to explore alternative definitions of path weights. For example, the max operator, which was crucial in the definition of the path weight, can be replaced by any associative operator, like addition or multiplication.
Finally, although our approach is already quite effective despite its simplicity, it is worth considering combining it with other reconstruction methods.
Availability and requirements
● Project name: Cutter
● Project home page:http://www.win.tue.nl/emcmc/cutter
● Operating system(s): Linux, Mac OS, Windows
● Programming language: C, CUDA
● Other requirements: CUDA
● License: none
● Any restrictions to use by nonacademics: none
Endnotes
^{a}A special version of SynTReN, kindly provided by its authors, was used that allows simulation of single gene perturbation experiments. For more information see [26].^{b}The reason for this is that, unfortunately, generating graphs of size 10,000 with SynTReN is very time and resource consuming.^{c}These additional experimental results can be obtained at [26].
References
 1.
Aho AV, Garey MR, Ullman JD: The transitive reduction of a directed graph. SIAM J Comput 1972, 1(2):131–137. 10.1137/0201008
 2.
Hu Z, Killion P, Iyer V: Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet 2007, 39(5):683–687. 10.1038/ng2012
 3.
Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, Califano A: Aracne: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006, 7(Suppl 1):S7. 10.1186/147121057S1S7
 4.
Tresch A, Beissbarth T, Sültmann H, Kuner R, Poustka A, Buness A: Discrimination of direct and indirect interactions in a network of regulatory effects. J Comput Biol 2007, 14(9):1217–1228. 10.1089/cmb.2007.0085
 5.
Wagner A: How to reconstruct a large genetic network from n gene perturbations in fewer than n2 easy steps. Bioinformatics 2001, 17(12):1183–1197. 10.1093/bioinformatics/17.12.1183
 6.
Gries D, Martin AJ, van de Snepscheut JL, Udding JT: An algorithm for transitive reduction of an acyclic graph. Sci Comp Progr 1989, 12(2):151–155. 10.1016/01676423(89)900397
 7.
Goralčíková A, Koubek V: A reductandclosure algorithm for graphs. In Mathematical Foundations of Computer Science 1979, Volume 74 of Lecture Notes in Computer Science. Edited by: Becvár J. Berlin/Heidelberg: Springer; 1979:301–307.
 8.
Klamt S, Flassig RJ, Sundmacher K: Transwesd: inferring cellular networks with transitive reduction. Bioinformatics 2010, 26: 2160–2168. 10.1093/bioinformatics/btq342
 9.
Ding C, He X, Xiong H, Peng H, Holbrook SR: Transitive closure and metric inequality of weighted graphs – detecting protein interaction modules using cliques. Int J Data Min Bioinformatics 2006, 1(2):162–177. 10.1504/IJDMB.2006.010854
 10.
Stolovitzky G, Prill RJ, Califano A: Lessons from the Dream2 Challenges. Ann New York Acad Sci 2009, 1158: 159–195. 10.1111/j.17496632.2009.04497.x
 11.
Hirschberg DS: Parallel algorithms for the transitive closure and the connected component problems. In Proceedings of the eighth annual ACM symposium on Theory of computing, STOC ’76. New York: ACM; 1976:55–57. http://doi.acm.org/10.1145/800113.803631
 12.
Wang BF, Chen GH: Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems. IEEE Trans Parallel Distrib Syst 1990, 1(4):500–507. 10.1109/71.80177
 13.
Chang P, Henschen L: Parallel transitive closure and transitive reduction algorithms. In Int Conference on Databases, Parallel Architectures and Their Applications, PARBASE90. Washington D.C.: IEEE Computer Society Press; 1990:152–154.
 14.
Pinna A, Soranzo N, de la Fuente A: From knockouts to networks: establishing direct causeeffect relationships through graph analysis. PLoS ONE 2010, 5(10):e12912. 10.1371/journal.pone.0012912
 15.
Mangan S, Alon U: Structure and function of the feedforward loop network motif. PNAS 2003, 100: 11980–11985. 10.1073/pnas.2133841100
 16.
Moore G: Cramming more components onto integrated circuits. Electronics 1965, 38(10):114–117.
 17.
Parallel Programming and Computing Platform — CUDA — NVIDIA http://www.nvidia.com/object/cuda_home_new.html
 18.
Warshall S: A theorem on boolean matrices. J ACM 1962, 9: 11–12. 10.1145/321105.321107
 19.
Floyd RW: Algorithm 97: Shortest path. CommunACM 1962, 5(6):345.
 20.
Micikevicius P: GPU computing for protein structure prediction. In GPU Gems 2: Programming Techniques for HighPerformance Graphics and GeneralPurpose Computation. Edited by: Pharr M, Fernando R. London: AddisonWesley; 2005:695–702. http://www.thedreamproject.org
 21.
Bollobás B, Borgs C, Chayes J, Riordan O: Directed scalefree graphs. In Proceedings of the Fourteenth Annual ACMSIAM Symposium on Discrete Algorithms, SODA ’03. Philadelphia: Society for Industrial and Applied Mathematics; 2003:132–139.
 22.
Erdős P, Rényi A: On the evolution of random graphs. Publications of the Math Ins Hungarian Acad Sci 1960, 5: 17–61.
 23.
Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, De Moor B, Marchal K: SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics 2006, 7: 43. http://www.biomedcentral.com/1471–2105/7/43 10.1186/14712105743
 24.
The DREAM project http://wiki.c2b2.columbia.edu/dream/index.php/The_DREAM_Project
 25.
Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating realistic in silico gene networks for performance assessment of Reverse engineering methods. J Comput Biol 2009, 16(2):229–239. 10.1089/cmb.2008.09TT
 26.
CUTTER  CUDA Technology based TransitivE Reduction of Networks http://www.win.tue.nl/emcmc/cutter
Acknowledgements
We would like to thank Perry Moerland, Barbera van Schaik, Piet Molenaar, and Mark van den Brand for the inspiring discussions, Steffen Klamt for the feedback on Transwesd, and Damian Sulewski and Stefan Edelkamp for the technical support with the GPUs.
Partially supported by the project Efficient MultiCore Model Checking (pr. nr. 612.063.816) funded by the Netherlands Organisation for Scientific Research (NWO) and by the BioRange program of The Netherlands Bioinformatics Centre (NBIC).
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DB, WL, and PH conceived the project. DB came up with the idea to use GPUs for TR. All authors contributed to the theoretical part of the paper. DB, MO, and AW drafted the manuscript. MO and AW implemented the TR algorithms, performed the experiments and evaluated them. WL, AW, MO and DB implemented part of the support programs. All authors read and approved the final manuscript.
Rights and permissions
About this article
Received
Accepted
Published
DOI
Keywords
 Adjacency Matrix
 Edge Weight
 Direct Edge
 Weighted Graph
 Transitive Closure