Given a graph G = (V, E) and a grid of h rows and w columns, we define a cost function for mappings of nodes to grid points and show an algorithm that finds the mapping of nodes, minimizing the cost function in a greedy manner. The cost function is defined by the weighted sum of four components:
(a) Attraction force F_a(d(P(v), P(u))) between pairs of adjacent nodes v and u in the graph G, where P(v) and P(u) are the grid points to which v and u are mapped, respectively, and d(P(v), P(u)) is the distance between the two grid points P(v) and P(u).
(b) Repulsion force F_r(d(P(v), P(u))) between any pair of nodes v and u.
(c) Number of edge-edge crossings ∑_{e,f∈E} I_e(e, f), where I_e(e, f) is a binary function that returns 1 if e and f cross each other and 0 otherwise.
(d) Number of node-edge crossings ∑_{v∈V, e∈E} I_n(v, e), where I_n(v, e) is a binary function that returns 1 if v and e cross each other and 0 otherwise.
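Formally, the cost function is given by
$$C = w_a \sum_{v \in V} \sum_{u \in N(v)} F_a(d(P(v), P(u))) + w_r \sum_{v, u \in V,\, v \neq u} F_r(d(P(v), P(u))) + w_e \sum_{e, f \in E} I_e(e, f) + w_n \sum_{v \in V,\, e \in E} I_n(v, e),$$
where N(v) is the set of adjacent nodes of v, and w_a, w_r, w_e, and w_n ∈ R+ are weights for the components.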
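To make the four components concrete, the following minimal Python sketch evaluates such a cost for a given mapping. It rests on several assumptions that are not in the text: the force profiles `f_attr` and `f_rep` are placeholders (F_a and F_r are not specified here), each unordered pair of nodes or edges is counted once, edges that share an endpoint are not counted as crossing, and a node-edge crossing means that the node lies on the straight segment of the edge.

```python
import math
from itertools import combinations

def dist(p, q):
    """Euclidean distance between two grid points given as (row, col)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def f_attr(d):
    # Hypothetical attraction profile (grows with distance).
    return d * d

def f_rep(d):
    # Hypothetical repulsion profile (decays with distance).
    return 1.0 / d

def orient(a, b, c):
    """Sign of the turn a -> b -> c (exact for integer grid coordinates)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def edges_cross(e, f, pos):
    """I_e(e, f): 1 if the two edge segments properly cross, else 0."""
    if set(e) & set(f):          # edges sharing a node are not counted
        return 0
    a, b, c, d = pos[e[0]], pos[e[1]], pos[f[0]], pos[f[1]]
    return int(orient(a, b, c) * orient(a, b, d) < 0
               and orient(c, d, a) * orient(c, d, b) < 0)

def node_on_edge(v, e, pos):
    """I_n(v, e): 1 if node v lies on the segment of edge e, else 0."""
    if v in e:
        return 0
    a, b, p = pos[e[0]], pos[e[1]], pos[v]
    if orient(a, b, p) != 0:     # not collinear with the edge
        return 0
    return int(min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
               and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def layout_cost(nodes, edges, pos, w_a=1.0, w_r=1.0, w_e=1.0, w_n=1.0):
    """Weighted sum of attraction, repulsion, and the two crossing counts."""
    attraction = sum(f_attr(dist(pos[u], pos[v])) for u, v in edges)
    repulsion = sum(f_rep(dist(pos[u], pos[v]))
                    for u, v in combinations(nodes, 2))
    edge_edge = sum(edges_cross(e, f, pos) for e, f in combinations(edges, 2))
    node_edge = sum(node_on_edge(v, e, pos) for v in nodes for e in edges)
    return (w_a * attraction + w_r * repulsion
            + w_e * edge_edge + w_n * node_edge)
```

For example, with two diagonal edges of a 3 × 3 square and a fifth node in the middle, the sketch reports one edge-edge crossing and two node-edge crossings.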
Search algorithm
In grid layout, nodes are mapped to different grid points, i.e., no grid point is occupied by more than one node. Our algorithm optimizes the cost function greedily by moving one node to an empty grid point at each step. When positional constraints are given, a node may be moved only to empty grid points that satisfy them; e.g., a node localized only in the cellular membrane can be mapped only to grid points corresponding to the cellular membrane. Each step is performed by calculating the delta cost, i.e., the change in cost caused by moving a node to a grid point, for all nodes and all vacant grid points. Although a naïve algorithm requires O(|V|^2·h·w) time to find the movement that reduces the cost most at each step, we devise an efficient method that requires O(|E|^2·min(h, w) + h·w) time for finding the movement, which is described below.
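The greedy step described above can be sketched as follows. This is the naïve variant that re-evaluates the full cost for every candidate move, not the efficient method of this paper; the names `best_move`, `greedy_layout`, `allowed`, and `max_steps` are hypothetical, and the cost is passed in as an arbitrary callable.

```python
from itertools import product

def best_move(pos, h, w, cost, allowed=None):
    """Return (delta, node, point) for the single move that lowers cost most.

    pos     : dict node -> (row, col), all values distinct
    cost    : callable taking a pos dict and returning a number
    allowed : optional callable (node, point) -> bool encoding positional
              constraints (hypothetical interface)
    Returns (0.0, None, None) when no move reduces the cost.
    """
    occupied = set(pos.values())
    base = cost(pos)
    best = (0.0, None, None)
    for v in pos:
        old = pos[v]
        for q in product(range(h), range(w)):
            if q in occupied:
                continue                      # only vacant grid points
            if allowed is not None and not allowed(v, q):
                continue                      # positional constraint
            pos[v] = q                        # tentatively move v
            delta = cost(pos) - base          # delta cost of this move
            if delta < best[0]:
                best = (delta, v, q)
        pos[v] = old                          # undo the tentative move
    return best

def greedy_layout(pos, h, w, cost, allowed=None, max_steps=1000):
    """Apply the best move repeatedly until no move improves the cost."""
    pos = dict(pos)
    for _ in range(max_steps):
        _, v, q = best_move(pos, h, w, cost, allowed)
        if v is None:
            break
        pos[v] = q
    return pos
```

Because every candidate move triggers a full cost evaluation, this sketch is only useful as a reference against which cached delta costs can be checked on small inputs.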
Efficient calculation of spring force
Repulsion force for a node v is given by
$$c_r(v) = \sum_{u \in V \setminus \{v\}} F_r(d(P(v), P(u))),$$
where the function P(i) returns the grid point to which i ∈ V is mapped. Checking the movement of a node to all the vacant grid points requires |V|·h·w calculations, and hence, O(|V|^2·h·w) time is required in total at each step.
Although the above naïve calculation has a higher time complexity than existing grid layout algorithms, we propose an efficient calculation. When v is moved from P(v) to q, the repulsion force for v is given by:
$$c_r(v) = \sum_{u \in V} F_r(d(q, P(u))) - F_r(d(q, P(v))).$$
Because the term ∑_{u∈V} F_r(d(q, P(u))) in the above equation depends on q, but not on v, by calculating ∑_{u∈V} F_r(d(q, P(u))) for all the vacant points q initially, the calculation of c_r(v) requires a constant time. Calculating the term ∑_{u∈V} F_r(d(q, P(u))) for all the vacant points requires O(|V|·h·w) time, and |V|·h·w movements are considered at each step. Therefore, in total, O(|V|·h·w) time is required at each step to calculate the repulsion force.
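The precomputation can be illustrated with a short Python sketch. It assumes a placeholder repulsion profile `f_rep` (F_r is not specified in this section) and Euclidean distance; `repulsion_table`, `repulsion_after_move`, and `repulsion_naive` are hypothetical names. The table stores the sum over all nodes for each vacant point q, so the repulsion felt by v after moving to q is obtained by subtracting v's own term.

```python
import math

def dist(p, q):
    """Euclidean distance between two grid points given as (row, col)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def f_rep(d):
    # Hypothetical repulsion profile; only evaluated for d > 0 here.
    return 1.0 / d

def repulsion_table(pos, h, w):
    """S[q] = sum over all nodes u of F_r(d(q, P(u))) for each vacant q.

    Costs O(|V| * h * w) once per step.
    """
    occupied = set(pos.values())
    table = {}
    for r in range(h):
        for c in range(w):
            q = (r, c)
            if q not in occupied:
                table[q] = sum(f_rep(dist(q, p)) for p in pos.values())
    return table

def repulsion_after_move(table, pos, v, q):
    """Repulsion on v once it sits on vacant q: S[q] minus v's own term.

    Constant time per (v, q) pair.
    """
    return table[q] - f_rep(dist(q, pos[v]))

def repulsion_naive(pos, v, q):
    """Reference value computed directly in O(|V|) per (v, q) pair."""
    return sum(f_rep(dist(q, pos[u])) for u in pos if u != v)
```

On small inputs, `repulsion_after_move` and `repulsion_naive` should agree up to floating-point rounding for every node and every vacant point.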
For the attraction force, the delta cost Δ_{v,p} induced by the movement of a node v to grid point p can be calculated by considering the attraction force between v and its adjacent nodes. In addition, the movement of a node v influences the delta costs only for v and its adjacent nodes, i.e., the delta costs for its non-adjacent nodes at the previous and current steps are the same. Thus, by using the cached delta costs obtained at the previous step, we can calculate the delta costs efficiently. If v is moved from p to q at the previous step, the delta cost for the movement of v to r can be updated by
$$\Delta_{v,r} \leftarrow \Delta_{v,r} - \Delta_{v,q},$$
and for a node u in N(v) to r,
$$\Delta_{u,r} \leftarrow \Delta_{u,r} + F_a(d(r, q)) - F_a(d(P(u), q)) - F_a(d(r, p)) + F_a(d(P(u), p)),$$
where each F_a term carries the weight with which the pair (u, v) enters the cost function.
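The caching idea can be checked numerically with a small Python sketch. It assumes a placeholder attraction profile `f_attr`, unit weight, and a delta cost defined as the change in the attraction felt by the moved node; `delta_table` and `update_after_move` are hypothetical names. After v moves from p to q, the deltas of v are shifted by the old value for q, and the deltas of each neighbor u are corrected by the change of the single term that involves v.

```python
import math

def dist(p, q):
    """Euclidean distance between two grid points given as (row, col)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def f_attr(d):
    # Hypothetical attraction profile.
    return d * d

def attraction_at(v, point, pos, adj):
    """Attraction felt by v if it were placed at `point`."""
    return sum(f_attr(dist(point, pos[u])) for u in adj[v])

def delta_table(pos, adj, points):
    """delta[(v, r)] = change in attraction if v moves from pos[v] to r."""
    table = {}
    for v in pos:
        here = attraction_at(v, pos[v], pos, adj)
        for r in points:
            table[(v, r)] = attraction_at(v, r, pos, adj) - here
    return table

def update_after_move(delta, pos, adj, v, p, q, points):
    """Refresh cached deltas after v moved from p to q.

    `pos` must already hold the new position of v. Only entries of v and
    of its neighbours are touched.
    """
    shift = delta[(v, q)]                 # old delta of the move just made
    for r in points:
        delta[(v, r)] -= shift
    for u in adj[v]:
        pu = pos[u]
        for r in points:
            delta[(u, r)] += ((f_attr(dist(r, q)) - f_attr(dist(pu, q)))
                              - (f_attr(dist(r, p)) - f_attr(dist(pu, p))))
```

Comparing the updated table with a table recomputed from scratch is a simple way to validate the update on small graphs.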
Efficient counting of edge-edge and node-edge crossings
The delta cost caching technique is used for counting crossings as well. When v is moved at the previous step, the following cases need to be considered for calculating the delta costs induced by the movement of node u.
(i) edge-edge crossing between e_u ∈ E_u and e_v ∈ E_v, where E_v and E_u are the sets of edges connected to v and u, respectively.
(ii) node-edge crossing between e_u ∈ E_u and v.
(iii) node-edge crossing between e_v ∈ E_v and u.
(iv) edge-edge crossing between edge e(u, v) and E\(E_u ∪ E_v) if edge e(u, v) exists.
(v) node-edge crossing between edge e(u, v) and V\{v, u} if edge e(u, v) exists.
In a naïve way, the crossings of the above cases are counted for each movement of a node to a grid point. The above cases (i), (ii), (iii), (iv), and (v) may respectively require O(|E_u||E_v|), O(|E_u|), O(|E_v|), O(|E|), and O(|V|) time. Because cases (iv) and (v) arise only when the edge e(u, v) exists, each movement of a node u requires O(|E_u||E_v|) time if u ∉ N(v) and O(|E_u||E_v| + |E|) time otherwise. Hence, in total, O(h·w·deg(v)|E|) time is required at each step, where deg(v) is the degree of v.
These time complexities can be reduced by using more sophisticated crossing counting algorithms [29–31]. In this study, we employ the sweep calculation algorithm [22], which is known to have a lower time complexity than even these sophisticated crossing counting algorithms under the assumptions that h and w are proportional to √|V| and that the average degree is bounded by O(|V|^{1/4}). The grid resolution in the former assumption is commonly employed in existing grid layout algorithms [19–22]. In addition, because the biological networks we are motivated to tackle can be modeled as scale-free networks whose average degree is bounded by a constant value [32], the latter assumption is reasonable.
Given an edge e, a node v connected with e, and a set of edges F ⊆ E on the grid, we consider the counting of crossings between e and edges in F for the movement of v to each grid point. Unlike conventional crossing counting algorithms, the sweep calculation can simultaneously count the crossings for all the movements of v in O(|F|·min(h, w) + h·w) time [22]. Because node-edge crossings can be counted in a manner similar to edge-edge crossings, the time complexity for counting node-edge crossings is obtained by replacing the number of edges with the number of nodes. Therefore, for the five cases mentioned above, the sweep calculation simultaneously counts crossings for mappings of u to q for all grid points q in O(|E_u||E_v|·min(h, w) + h·w), O(|E_u|·min(h, w) + h·w), O(|E_v|·min(h, w) + h·w), O(|E|·min(h, w) + h·w), and O(|V|·min(h, w) + h·w) time, respectively. Thus, the algorithm using sweep calculation requires O(deg(v)|E|·min(h, w) + h·w·|V|) time at each step.
Time complexity at the initial step
The calculation of delta costs at the initial step requires more computational time than at later steps because no cached delta costs are available. Here, the time complexity of the first step is analyzed for each component.
(a) Repulsion force: The computation of repulsion forces does not rely on the cached delta costs. Thus, O(|V|·h·w) time is required.
(b) Attraction force: Because attraction forces between a node v and its adjacent nodes N(v) are calculated, O(deg(v)) time is required for each movement of v. Thus, O(|E|·h·w) time is required in total.
(c) Edge-edge crossing: Because crossings between edges in E_v and other edges are checked for the movement of a node v, O(|E|^2·min(h, w) + h·w) time is required by sweep calculation.
(d) Node-edge crossing: When a node v is moved, we need to consider two cases: (i) crossings between edges in E_v and all nodes other than v, and (ii) crossings between v and all the edges other than edges in E_v. Thus, O(|E||V|·min(h, w) + h·w) time is required by sweep calculation.
From the above analysis, the proposed algorithm requires O(|E|^2·min(h, w) + h·w) time at the initial step.
Procedures for resizing and repositioning of compartments
The resizing and repositioning of compartments mainly consist of the following procedures:
(i) The size of each compartment is updated according to the distribution range of nodes localized in the compartment.
(ii) The position of the compartment is updated in such a way that the center of the compartment is close to the center of gravity of nodes localized to it.
For the resizing of each compartment in step (i), we first calculate s_v and s_h from d_v(v_c, b_c) and d_h(v_c, b_c), respectively, where v_c is a node localized to the compartment c, b_c is the center of gravity of the nodes localized to c, and d_v(·,·) and d_h(·,·) return the vertical and horizontal distances between v_c and b_c, respectively. Then, if s_v < 0.4 × the width of the compartment and s_h < 0.4 × the height of the compartment, the compartment is shrunk to one level smaller size (0.95 times as large as the current size, in our setting). On the other hand, if s_v > 0.9 × the width of the compartment and s_h > 0.9 × the height of the compartment, the compartment is enlarged to one level larger size (1/0.95 times as large as the current size). To limit the scaling, the compartment cannot be shrunk if its current size is smaller than 0.6 times its original size, and it cannot be enlarged if its size is larger than 1.5 times its original size.
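The resizing rule can be written as a small Python function. This is a sketch under stated assumptions: the spread values `s_v` and `s_h` are taken as inputs because their exact definition is not reproduced here, the enlargement test is assumed to fire when the spread exceeds 0.9 times the compartment size, the scale relative to the original size is measured on the width, and rounding of sizes to whole grid cells is ignored. The function name and parameter names are hypothetical.

```python
def resize_compartment(width, height, s_v, s_h, orig_width,
                       step=0.95, shrink_at=0.4, grow_at=0.9,
                       min_scale=0.6, max_scale=1.5):
    """Return the (width, height) of the compartment after one resizing step.

    s_v, s_h   : spread of the nodes around their centre of gravity
    orig_width : width of the compartment before any resizing
    """
    scale = width / orig_width            # current size relative to original
    if (s_v < shrink_at * width and s_h < shrink_at * height
            and scale >= min_scale):
        factor = step                     # one level smaller
    elif (s_v > grow_at * width and s_h > grow_at * height
            and scale <= max_scale):
        factor = 1.0 / step               # one level larger
    else:
        factor = 1.0                      # keep the current size
    return width * factor, height * factor
```

With the default settings, a 10 × 10 compartment whose nodes have a spread of 3 in both directions shrinks to 9.5 × 9.5, while a compartment already below 0.6 times its original size is left unchanged.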
For step (ii), the position of the compartment that minimizes the distance between the center of the compartment and the center of gravity of its nodes is searched. For an easier implementation, we discretized the center of the compartment and the center of gravity of the nodes to grid points and employed the Manhattan distance as the distance measure. The position is searched within a limited distance from the center of gravity, which is set to 10 in our setting, if the compartment is resized. Also, for the search procedure, the following two conditions must be satisfied:
(a) The localization of the nodes is not violated by the new position of the compartment.
(b) The compartment does not overlap any other compartment.
For the efficiency and simplicity of checking the second condition, we only consider overlapping of the rectangles that surround the compartments. Overlapping of these rectangles can be detected by checking whether at least one of the four corners of one rectangle lies in the other rectangle. If no valid position can be found in the above procedure, the size of the compartment is reverted to its size before step (i), and then step (ii) is applied again. If a valid position is still not found, the current size and position are used for the next step.
When several nodes are located close to the surface of a compartment, its size and position cannot be updated to a better condition because resizing and repositioning of the compartment would violate the localization of these nodes. To avoid this case, we introduce the following cost function for nodes located within one grid distance from the surface of the compartments, defined as α·exp(-βl), where α and β are respectively set to 20·(w + r) and 0.002 from an empirical rule and l is the number of update steps. Owing to this cost function, the placement of nodes close to the surface of the compartments is avoided, and the compartments can then be updated to a better size and position with higher probability. In addition, since the cost function converges to zero as the number of update steps l increases, the convergence of the search is guaranteed.
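The corner-based overlap test and the limited-distance position search can be sketched in Python as follows. The representation of a rectangle as `(top, left, height, width)` in grid cells, the callback `nodes_ok` that checks the localization of nodes, and all function names are assumptions made for illustration; ties between equally distant candidates are broken in row-major order.

```python
def rect_corners(rect):
    """Four corner cells of a rectangle given as (top, left, height, width)."""
    top, left, height, width = rect
    bottom, right = top + height - 1, left + width - 1
    return [(top, left), (top, right), (bottom, left), (bottom, right)]

def contains(rect, point):
    """True if the grid cell `point` lies inside `rect`."""
    top, left, height, width = rect
    return top <= point[0] < top + height and left <= point[1] < left + width

def overlap_by_corners(a, b):
    """True if a corner of one rectangle lies inside the other."""
    return (any(contains(b, c) for c in rect_corners(a))
            or any(contains(a, c) for c in rect_corners(b)))

def candidate_centers(center, radius, h, w):
    """Grid points within Manhattan distance `radius`, nearest first."""
    r0, c0 = center
    points = [(r, c) for r in range(h) for c in range(w)
              if abs(r - r0) + abs(c - c0) <= radius]
    return sorted(points, key=lambda p: abs(p[0] - r0) + abs(p[1] - c0))

def reposition(size, gravity, others, h, w, nodes_ok, radius=10):
    """Closest valid rectangle to the centre of gravity, or None.

    size     : (height, width) of the compartment
    gravity  : centre of gravity of its nodes, discretized to a grid point
    others   : rectangles of the other compartments
    nodes_ok : hypothetical callback rect -> bool (localization check)
    """
    height, width = size
    for r, c in candidate_centers(gravity, radius, h, w):
        top, left = r - height // 2, c - width // 2
        if top < 0 or left < 0 or top + height > h or left + width > w:
            continue                      # rectangle must stay on the grid
        rect = (top, left, height, width)
        if any(overlap_by_corners(rect, o) for o in others):
            continue                      # overlaps another compartment
        if nodes_ok(rect):
            return rect
    return None
```

Taken literally, the corner test does not detect two rectangles that cross without either containing a corner of the other (for example, a wide flat rectangle crossing a tall narrow one), so an implementation that allows such shapes may prefer an interval comparison on rows and columns.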
Next, we consider the time complexity of the dynamic compartment update. For step (i), the calculation of s_h and s_v requires O(|V_c|) time for a compartment c, where V_c is the set of nodes localized to c. Resizing the compartment c requires O(w_c·h_c) time, where w_c and h_c are the width and height of the compartment c. Thus, in total, O(|V| + w·h) = O(w·h) time is required for step (i). For step (ii), checking the violation of the localization information of every node requires O(|V|) time for each movement of a compartment, even in a naïve way. In addition, at worst, each compartment is moved to all the grid points within the limited distance from the center of gravity, and the number of these points is obviously less than the number of grid points. Checking the overlapping of a pair of compartments requires constant time. Since the number of compartments is limited (in our setting, at most three) and can be considered a constant, step (ii) requires O(w·h·|V|) time in the worst case. In practice, since the number of grid points searched for the repositioning of compartments is limited, the dynamic compartment update is not computationally heavy, which is supported by the comparison of the running time of the proposed algorithm with and without the dynamic compartment update in Figures 10, 11, and 12.