We assume that two paired biological networks have been observed, each represented by a graph, denoted by \(\mathcal {A}=(V,E_{A})\) and \(\mathcal {B}=(V,E_{B})\), respectively. Both graphs are defined on a common node set *V*={1,2,…,*N*}, and the respective edge sets *E*_{A} and *E*_{B} indicate the connections between the nodes in the two graphs. We also let *A*=(*A*_{ij}) and *B*=(*B*_{ij}) denote the two (*N*×*N*) adjacency matrices associated with graphs \(\mathcal {A}\) and \(\mathcal {B}\), respectively.

The Hamming distance (HD) between \(\mathcal {A}\) and \(\mathcal {B}\) provides a commonly used metric to quantify the difference between the networks, and is defined by \(\frac {1}{2} \text {tr}\left [(A-B)^{2}\right ]\), where tr[ ·] denotes the trace of a matrix. This distance takes into account the number of edges that are in common between the two networks. Here we propose an extension of this metric, which we call the Generalised Hamming Distance (GHD), defined as

$$ \text{GHD}(\mathcal{A},\mathcal{B}) = \frac {1}{N(N-1)} \sum\limits_{i,j} \left(a'_{ij} - b'_{ij} \right)^{2}, $$

(1)

where *a*′_{ij} and *b*′_{ij} are mean-centred edge weights defined as

$$a'_{ij} = a_{ij} - \frac{1}{N(N-1)}\sum\limits_{i,j}a_{ij}, \quad b'_{ij} = b_{ij} - \frac{1}{N(N-1)}\sum\limits_{i,j}b_{ij} $$

and \(\sum _{i,j}\) denotes summation over distinct *i* and *j*. The edge weights, which depend on the topology of the networks, provide a measure of connectivity between every pair of nodes *i* and *j* in \(\mathcal {A}\) and \(\mathcal {B}\), respectively. When *a*_{ij} and *b*_{ij} are binary values indicating the presence or absence of an edge, i.e. are the elements of *A* and *B*, \(\text {GHD}(\mathcal {A},\mathcal {B})\) is related to the HD. The specific edge weights we propose here instead quantify the topological overlap (TO) between a pair of nodes by taking into account the local neighbourhood structure around those nodes [21]. In the literature, the TO measure has been successfully applied for the detection of communities in biological networks, and there is empirical evidence that it carries biological meaning [7, 22].

We use the one-step TO between nodes *i* and *j*, which indicates the extent to which they share direct connections to other nodes. The weights are obtained from the adjacency matrix as follows:

$$ a_{ij} = \frac{\sum_{l\ne i,j}A_{il}A_{lj}+A_{ij}}{\min\left(\sum_{l\ne i}A_{il}-A_{ij},\sum_{l\ne j}A_{jl}-A_{ij}\right) +1}, $$

(2)

when *i*≠*j*, and otherwise *a*_{ij}=1, and analogously for *b*_{ij}. The GHD sums the squared differences \((a'_{ij}-b'_{ij})^{2}\) over all pairs of nodes in the network. Note that the term \(\sum _{l\ne i,j} A_{il}A_{lj}\) counts the paths (*i*,*l*,*j*) that connect the node pair (*i*,*j*) through a common neighbour *l*; together with *A*_{ij}, the numerator therefore measures the direct connection between each (*i*,*j*) pair plus their common one-step neighbours. The denominator in (2) can be written as min(*d*_{i},*d*_{j})+1−*A*_{ij}, where *d*_{i} and *d*_{j} represent the node degrees of *i* and *j*, respectively. It is roughly equal to the smaller of *d*_{i} and *d*_{j}, and normalises *a*_{ij} such that 0≤*a*_{ij}≤1. A large discrepancy between \(a'_{ij}\) and \(b'_{ij}\) indicates a topological difference localised around that pair of nodes.
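The weights in (2) and the distance in (1) can be computed directly from the adjacency matrices. The following sketch (Python/NumPy; the function names are ours, not those of any published implementation) illustrates one possible computation:

```python
import numpy as np

def topological_overlap(A):
    """One-step topological overlap weights of eq. (2) from a binary
    adjacency matrix with zero diagonal."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                          # node degrees d_i
    shared = A @ A                               # entry (i,j): sum_{l != i,j} A_il A_lj
    num = shared + A
    den = np.minimum.outer(deg, deg) - A + 1.0   # min(d_i, d_j) + 1 - A_ij, always >= 1
    W = num / den
    np.fill_diagonal(W, 1.0)                     # a_ii = 1 by definition
    return W

def ghd(A, B):
    """Generalised Hamming Distance of eq. (1) with TO edge weights."""
    a, b = topological_overlap(A), topological_overlap(B)
    N = a.shape[0]
    off = ~np.eye(N, dtype=bool)                 # sums run over distinct i, j
    ac = a[off] - a[off].mean()                  # mean-centred weights a'_ij
    bc = b[off] - b[off].mean()
    return np.sum((ac - bc) ** 2) / (N * (N - 1))
```

For two identical networks the GHD is exactly zero, and it grows as the local neighbourhood structures diverge.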

By exploring the neighbourhood of each node, the proposed GHD can detect subtle topological changes with higher sensitivity than the HD. A simple illustration of this is given in Fig. 1, where four simple networks are shown: the network labelled (a) is taken as reference, while the three paired networks (b), (c), and (d) have been generated by changing the position of a single edge in (a). The two distances, HD and GHD, have been computed to quantify the difference between (a) and each of the other three networks. It can be observed that, whereas the HD is unable to distinguish between the three networks, the GHD score is more sensitive to subtle topological variations and can discriminate between them.

### A non-parametric test for network comparison

For inferential purposes, we need to compute the probability that a distance as extreme as, or more extreme than, the observed GHD value could have arisen by chance alone. By treating the GHD as a random variable with unknown sampling distribution, this probability can be estimated non-parametrically via permutation testing. First, we specify the null hypothesis as

$$ \mathcal{H}_{0} : \text{networks }\mathcal{A} \text{ and } \mathcal{B} \text{ are independent.} $$

(3)

By taking \(\mathcal {B}\) as reference, each permutation consists of shuffling the labels of the nodes in \(\mathcal {A}\) while keeping the edges unchanged. This generates a permuted network \(\mathcal {A}_{\pi }\) that is isomorphic to \(\mathcal {A}\), and the exchangeability property holds. In turn, this signifies that the original and permuted networks are generated from the same underlying, but unspecified, model [23, 24]. Since all permuted networks are isomorphic, permuting the labels of the network is equivalent to shuffling rows and columns of the adjacency matrix, an approach that bears some similarity with Mantel’s test [25] for the comparison of two distance matrices. All the *N*! possible permutations are then collected in a set *Π*, and for each *π*∈*Π* a permuted GHD value is denoted as

$$ \text{GHD}_{\pi} (\mathcal{A}_{\pi},\mathcal{B}) =\frac {1}{N(N-1)} \sum_{i,j}\left(a'_{\pi{(i)}\pi (j)} - b'_{ij}\right)^{2}, $$

and is calculated from the edge weights \(a'_{\pi (i)\pi (j)}\) after permutation. The exact permutation distribution is obtained by carrying out an exhaustive calculation of all GHD_{π} values, and *p*-values can then be evaluated as usual. In practice, however, doing so is computationally infeasible because the cardinality of *Π* is generally extremely large, even for relatively small networks. The exhaustive evaluation over all permutations in *Π* could be replaced by a Monte Carlo approach whereby only a smaller number of random permutations is explored. Nevertheless, the overall computational costs remain high for networks of the moderately large sizes observed in applications, or when this procedure has to be repeated several times, for instance when searching for a differential subnetwork as in Section Differential subnetwork detection.
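As a baseline, the Monte Carlo variant of this permutation test can be sketched as follows (Python/NumPy; an illustrative helper, not the authors' implementation). The weight matrices may be adjacency matrices, as in the HD case, or the TO weights of eq. (2); the centring is done once, since it is permutation-invariant, and the lower tail is used on the premise that a GHD smaller than expected under the null signals dependence:

```python
import numpy as np

def ghd_permutation_pvalue(Wa, Wb, n_perm=1000, seed=None):
    """Monte Carlo permutation p-value for the GHD test on two symmetric
    edge-weight matrices defined on a common node set."""
    rng = np.random.default_rng(seed)
    N = Wa.shape[0]
    off = ~np.eye(N, dtype=bool)                 # sums over distinct i, j
    ac = np.asarray(Wa, float) - np.asarray(Wa, float)[off].mean()
    bc = np.asarray(Wb, float) - np.asarray(Wb, float)[off].mean()

    def ghd(x):
        return np.sum((x[off] - bc[off]) ** 2) / (N * (N - 1))

    obs = ghd(ac)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(N)                   # shuffle node labels of A
        if ghd(ac[np.ix_(p, p)]) <= obs:         # lower tail: small GHD = similarity
            hits += 1
    return (hits + 1) / (n_perm + 1)             # add-one correction
```

For identical networks the observed GHD is zero, and only permutations corresponding to graph automorphisms can match it, so the estimated p-value is small.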

In what follows, we propose an alternative approach that removes the need to carry out computationally expensive permutation testing altogether. We demonstrate that, under our null hypothesis, the exact GHD permutation distribution can be approximated well by a normal distribution with moments that can be obtained analytically, in closed form. First, we notice that the GHD can be rewritten in an equivalent form in terms of a generalised correlation coefficient as follows:

$$\begin{array}{@{}rcl@{}} \text{GHD}_{\pi} (\mathcal{A}_{\pi},\mathcal{B}) = c - \frac {2}{N(N-1)} \sum\limits_{i,j} a'_{\pi(i)\pi(j)} b'_{ij}, \end{array} $$

(4)

where *c* is a constant that does not change under permutations. By making use of this alternative representation, we are able to exploit well-known sufficient conditions for asymptotic normality, which can also be easily checked in practice. For a generalised correlation coefficient of this form, the exact permutation distribution is asymptotically normal under two sufficient conditions [26–28]:

$$ \sum\limits_{i,j} a'_{ij} = \sum\limits_{i,j} b'_{ij} = 0 $$

(5a)

and

$$ \lim_{N\to\infty} \frac{\left[\sum_{ijkl}a'_{ij}a'_{ik}a'_{il}\right]^{2}}{\left[\sum_{ijk}a'_{ij}a'_{ik}\right]^{3}}=\lim_{N\to\infty} \frac{\left[\sum_{ijkl}b'_{ij}b'_{ik}b'_{il}\right]^{2}}{\left[\sum_{ijk}b'_{ij}b'_{ik}\right]^{3}}=0. $$

(5b)

Condition (5a) follows directly from the definition of \(a'_{ij}\) and \(b'_{ij}\) as being mean-centred. In order to gain some insight into the meaning of condition (5b) in our context, it is instructive to consider the case where *a*_{ij} and *b*_{ij} are elements of the two adjacency matrices, i.e. they indicate the presence of an edge. On defining \(a_{i\cdot }=\sum _{j \neq i} a_{ij}\) and \(\bar a = \frac {1}{N}\sum _{i}a_{i\cdot }\), we have

$$ a'_{\textit{i}\cdot} = \sum\limits_{j \neq i} a'_{ij} = a_{\textit{i}\cdot}-\bar a, $$

(6)

and condition (5b), with reference to network \(\mathcal {A}\), can be written as

$$ {\lim}_{\textit{N}\to\infty} \frac{\left[\sum_{i}(Na'_{\textit{i}\cdot})^{3}\right]^{2}}{\left[\sum_{i}(Na'_{\textit{i}\cdot})^{2}\right]^{3}}= {\lim}_{\textit{N}\to\infty} \frac{\left[\sum_{i}(a_{\textit{i}\cdot} - \bar a)^{3}\right]^{2}}{\left[\sum_{i}(a_{\textit{i}\cdot} - \bar a)^{2}\right]^{3}}=0, $$

(7)

and analogously for \(\mathcal {B}\). It can be observed that, when using the adjacency matrix, *a*_{i·} represents the degree of the *i*^{th} node. Therefore, checking (5b) amounts to computing the degree of each node in the two networks, and assessing the limiting behaviour. When the TO measure is used instead, as in the GHD, the coefficient *a*_{i·} represents the overall topological overlap information at node *i*, and can also be computed using (6).
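In practice, condition (5b) can therefore be checked empirically from the data via (6) and (7). A minimal sketch (Python/NumPy; the function name is ours) computes the ratio in (7) from either an adjacency or a TO weight matrix:

```python
import numpy as np

def normality_condition_ratio(W):
    """Ratio of condition (7): [sum_i (a_i. - abar)^3]^2 / [sum_i (a_i. - abar)^2]^3.
    Small values support the normal approximation; for a 0/1 adjacency matrix
    the row sums a_i. are the node degrees."""
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)                 # sums in (6) run over j != i
    dev = W.sum(axis=1) - W.sum(axis=1).mean()
    # undefined (0/0) for perfectly regular graphs, where all a_i. coincide
    return (dev ** 3).sum() ** 2 / ((dev ** 2).sum() ** 3)
```

For a path graph on six nodes, for example, the degree sequence (1,2,2,2,2,1) gives a ratio of 1/12.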

When both (5a) and (5b) hold true, under the null hypothesis, the permutation distribution of \(\text {GHD}(\mathcal {A},\mathcal {B})\) is approximately normal. We then standardise the GHD value by mean-centring and normalising it, so that it follows a standard normal distribution asymptotically,

$$ \frac{\text{GHD}_{\pi}(\mathcal{A}_{\pi},\mathcal{B}) - \mu_{\pi}} {\sigma_{\pi}} \sim N(0,1) $$

(8)

where *μ*_{π} and *σ*_{π} are the mean and standard deviation of the GHD under the exact permutation distribution, respectively. These two moments can be computed exactly and in closed form by enumerative combinatorics; the calculations follow developments described in the context of related permutation-based testing procedures [25], and can also be found in [29]. Here we provide explicit formulae for both *μ*_{π} and \(\sigma ^{2}_{\pi }\). First, we need to define

$$\begin{array}{@{}rcl@{}} \!{~}^{t}S_{a} = \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} a^{t}_{ij}, t=1,2 &\text{and} & T_{a} = \sum\limits_{i=1}^{N}\left(\sum\limits_{j=1}^{N} a_{ij}\right)^{2}\\ \!{~}^{t}S_{b} = \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} b^{t}_{ij}, t=1,2 &\text{and}& T_{b} = \sum\limits_{i=1}^{N}\left(\sum\limits_{j=1}^{N} b_{ij}\right)^{2} \end{array} $$

where \(a^{t}_{ij}\) and \(b^{t}_{ij}\) are edge weights raised to the power *t*. Here \(\frac {\!{~}^{1}S_{a}}{N(N-1)}\) and \(\frac {\!{~}^{2}S_{a}}{N(N-1)}\) are the empirical raw moments of the edge weights *a*_{ij}, and analogously for *b*_{ij}. Furthermore, we need to introduce the following quantities,

$$\begin{array}{@{}rcl@{}} &&A_{a} = \left(\!{~}^{1}S_{a}\right)^{2}, B_{a} = T_{a} - \left(\!{~}^{2}S_{a}\right),\\ &&~~\text{and}~~ C_{a} = A_{a}+2\left(\!{~}^{2}S_{a}\right) -4T_{a} \\ [5pt] &&A_{b} = \left(\!{~}^{1}S_{b}\right)^{2}, B_{b} = T_{b} - \left(\!{~}^{2}S_{b}\right),\\ &&~~\text{and}~~ C_{b} = A_{b}+2\left(\!{~}^{2}S_{b}\right) -4T_{b} \end{array} $$

Then, closed-form expressions for the mean *μ*_{π} and variance \(\sigma ^{2}_{\pi }\) are

$$\begin{aligned} \mu_{\pi} &= \frac{{\!{~}^{2}S_{a}} + {\!{~}^{2}S_{b}}}{N(N-1)} - \frac{2\left({\!{~}^{1}S_{a}}\right)\left({\!{~}^{1}S_{b}}\right)}{N^{2}\left(N-1\right)^{2}} \,, \\ \sigma^{2}_{\pi} &= \frac{4}{N^{3}(N-1)^{3}}\left[2\left({\!{~}^{2}S_{a}}\right)\left({\!{~}^{2}S_{b}}\right)+\frac{4B_{a}B_{b}}{N-2} + \frac{C_{a}C_{b}}{(N-2)(N-3)} - \frac{A_{a}A_{b}}{N(N-1)}\right]. \end{aligned} $$

With the expressions for the first two exact moments, a corresponding *p*-value can therefore be efficiently computed from the normal approximation, even for very large networks. We will exploit the computational efficiency gained here in Section Differential subnetwork detection, where we apply the test repeatedly on networks of increasingly smaller size in order to detect differential subnetworks.
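To make the closed-form test concrete, the following sketch (Python/NumPy; all names are ours) implements *μ*_{π} and \(\sigma^2_{\pi}\) from the *S*, *T*, *A*, *B*, *C* quantities above, together with an exhaustive check against the full permutation distribution for small *N*. For simplicity it uses the uncentred statistic \(\frac{1}{N(N-1)}\sum_{i,j}(a_{ij}-b_{ij})^2\); under permutations this differs from (1) only by a constant, so the standardised value in (8) is unchanged:

```python
import itertools
import math
import numpy as np

def ghd_raw(a, b):
    """Uncentred GHD; differs from eq. (1) by a permutation-invariant constant."""
    N = a.shape[0]
    return np.sum((a - b) ** 2) / (N * (N - 1))

def ghd_moments(a, b):
    """Closed-form permutation mean and variance via the S, T, A, B, C
    quantities of the text (symmetric weights, zero diagonal assumed)."""
    N = a.shape[0]
    n = N * (N - 1)
    S1a, S2a, Ta = a.sum(), (a ** 2).sum(), (a.sum(axis=1) ** 2).sum()
    S1b, S2b, Tb = b.sum(), (b ** 2).sum(), (b.sum(axis=1) ** 2).sum()
    Aa, Ba, Ca = S1a ** 2, Ta - S2a, S1a ** 2 + 2 * S2a - 4 * Ta
    Ab, Bb, Cb = S1b ** 2, Tb - S2b, S1b ** 2 + 2 * S2b - 4 * Tb
    mu = (S2a + S2b) / n - 2 * S1a * S1b / n ** 2
    var = (4 / n ** 3) * (2 * S2a * S2b + 4 * Ba * Bb / (N - 2)
                          + Ca * Cb / ((N - 2) * (N - 3)) - Aa * Ab / n)
    return mu, var

def ghd_pvalue(a, b):
    """Lower-tail normal p-value: a GHD smaller than expected under the
    permutation null is taken as evidence of dependence."""
    mu, var = ghd_moments(a, b)
    z = (ghd_raw(a, b) - mu) / math.sqrt(var)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def exact_permutation_moments(a, b):
    """Exhaustive mean and variance over all N! permutations (small N only)."""
    N = a.shape[0]
    vals = [ghd_raw(a[np.ix_(p, p)], b) for p in itertools.permutations(range(N))]
    return float(np.mean(vals)), float(np.var(vals))
```

For *N*=5 the closed-form moments agree with the exhaustive enumeration over all 120 permutations to machine precision.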

### Validation of asymptotic normality on scale-free networks

The closed-form approximation for the computation of *p*-values only requires that conditions (5a) and (5b) are satisfied, and does not need any random network model to be specified. These two conditions can also be verified analytically in special cases where certain random network models are assumed. For instance, in [29] it was proved that these conditions hold true for scale-free (SF), random geometric (RG) and Erdős-Rényi (ER) network models when using both the HD and GHD distances. In this section we provide a simplified proof for the case of SF networks using the Hamming distance. This proof should serve as an illustration of how these derivations can be carried out analytically, and as a simple validation of the methodology described in Section A non-parametric test for network comparison for SF networks. An analogous proof using the GHD distance can be found in the Supplementary Material, and we refer the reader to [29] for the other models.

A SF network is a network whose node degree distribution follows a power law, at least asymptotically, and has often been used to describe real biological networks [30–32]. The degree of each node is assumed to be an independent and identically distributed (IID) random variable with probability mass function defined as

$$ P(d_{i} = k) = ck^{-\alpha}, \quad k = m, m+1, \dots, K, $$

(9)

where *m* and *K* are the lower and upper cut-offs for the node degree, respectively, *c* is a normalising constant, and *α* represents a power exponent. It is generally assumed that *α* is greater than 1, and the lower cut-off *m* is generally taken to be 1. For *α*>2, the upper cut-off *K* is conventionally specified as \(K = N^{\frac {1}{\alpha -1}}\) [33], while *K*=*N*−1 is generally used for 1<*α*≤2. Values of *α* for different biological networks have been characterised, and mostly vary between 1.4 and 1.7 [30].
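For simulation purposes, IID node degrees can be drawn from the truncated power law (9). A brief sketch (Python/NumPy; an illustrative helper of our own) with the cut-off conventions above:

```python
import numpy as np

def sample_sf_degrees(N, alpha, m=1, seed=None):
    """Draw N IID node degrees from the truncated power law of eq. (9),
    using the cut-off conventions described in the text."""
    rng = np.random.default_rng(seed)
    K = int(N ** (1.0 / (alpha - 1))) if alpha > 2 else N - 1
    k = np.arange(m, K + 1)
    p = k ** (-float(alpha))
    p = p / p.sum()                  # the normalising constant c
    return rng.choice(k, size=N, p=p)
```

With *α*=1.6, a value typical of biological networks, the upper cut-off is *K*=*N*−1.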

On defining the weights *a*_{ij} and *b*_{ij} as elements of *A* and *B*, respectively, (7) becomes

$$\begin{array}{@{}rcl@{}} {\lim}_{\textit{N}\to\infty} \frac{\left[\sum_{i}(d_{i} - \bar{d{\,}})^{3}\right]^{2}}{\left[\sum_{i}(d_{i} - \bar{d\,})^{2}\right]^{3}}=0, \end{array} $$

(10)

where \(\bar {d\,}\) is the average node degree. In order to study this limiting behaviour, we exploit the fact that both the numerator and the denominator are powers of the centralised empirical moments of the node degree distribution. We let \(\mu _{s} = c \sum _{d=1}^{K} d^{s-\alpha }\) denote the *s*^{th} theoretical moment and \(m_{s}=\frac {1}{N}\sum _{i=1}^{N}{d_{i}^{s}}\) the corresponding empirical moment of this distribution. In order to study the limit above we need to characterise the order of *m*_{s}, for *s*=1,2,3, as *N* increases. Our strategy here consists of first characterising the order of *μ*_{s} asymptotically, for the first three moments, and then establishing a correspondence with *m*_{s}.

We start by examining the order of *μ*_{s}, for *s*=1,2,3, in the limit. Since this depends on *s*, we consider three distinct cases: (a) *s*−*α*+1<0, (b) *s*−*α*+1=0 and (c) *s*−*α*+1>0. For (a), the sum \(\sum _{d=1}^{K} d^{s-\alpha }\) converges as *K*→∞, hence *μ*_{s} is *O*(1). For (b), the order of *μ*_{s} is \(\sum _{d=1}^{K} d^{-1} = O(\ln (K))\). Finally, for (c), we need to study how *μ*_{s} increases with *K*. First, we apply the Euler-Maclaurin formula,

$$ \sum\limits_{d=1}^{K} d^{s-\alpha} = K^{s-\alpha+1} + (\alpha-s){\int_{1}^{K}} \frac{\left\lfloor x \right\rfloor}{x^{\alpha-s+1}} dx + O(1), $$

where ⌊*x*⌋ denotes the largest integer that is not greater than *x*. To compute the order of \(\sum _{d=1}^{K} d^{s-\alpha } \), we need to know which one of the two terms in the sum dominates in order. By applying l’Hospital’s rule we have

$$\begin{array}{@{}rcl@{}} {\lim}_{K\to\infty}\frac{(\alpha-s){\int_{1}^{K}} \frac{\left\lfloor x \right\rfloor}{x^{\alpha -s+1}} dx} {K^{s-\alpha+1}} &=& \frac{\alpha-s}{s-\alpha+1}, \end{array} $$

which is a finite constant, and hence *μ*_{s} has the same order as *K*^{s−α+1}. For a SF network, the condition for asymptotic normality also depends on the values taken by the exponent. In the case where 1<*α*<2, for which *K*=*N*−1, the calculation of the *s*^{th} moment falls under case (c), hence we conclude that the orders of the first three theoretical moments are, respectively, *O*(*N*^{2−α}), *O*(*N*^{3−α}) and *O*(*N*^{4−α}).

We now turn to the direct comparison of the orders of *μ*_{s} and *m*_{s} in the limit. Specifically, we assess whether the order of each *μ*_{s} established above also holds true for the corresponding *m*_{s}. This can be verified by checking that

$$\begin{array}{@{}rcl@{}} {\lim}_{N \to \infty} \frac{m_{s}}{\mu_{s}} = c_{s}, \end{array} $$

(11)

for *s*=1,2,3, and for some positive constants *c*_{s}. To study the above limit, we apply the Weak Law of Large Numbers (WLLN). For the WLLN to hold, *μ*_{s} must be finite. Hence we first transform *d*_{i} so that *μ*_{s}, after the transformation, is finite. We let \(N_{s} = N^{\frac {s+1-\alpha } {s}}\), and define \(z_{si} = \frac {d_{i}}{N_{s}}\). The distribution of *z*_{si} is

$$P(z_{si}=z) = c'z^{-\alpha}, \qquad z = \frac {1}{N_{s}},\frac {2}{N_{s}},\dots,\frac {K}{N_{s}}, $$

where \(c' = cN_{s}^{-\alpha }\). Thus the *s*^{th} theoretical moment of *z*_{si} is

$$\begin{array}{@{}rcl@{}} \mu_{zs} = c'\sum\limits_{z} z^{s-\alpha}= c' \sum\limits_{d}\left[\frac{d}{N_{s}}\right]^{s-\alpha} d^{\,0}\cdot\frac{d^{s-\alpha}}{d^{s-\alpha}}\,N_{s}^{\alpha-s}\,\frac{c}{c'}\cdot\frac{c'}{c}= \frac{\mu_{s}}{N^{s+1-\alpha}}, \end{array} $$

which is finite. Denoting by *m*_{zs} the *s*^{th} empirical moment of *z*_{si}, *i*=1,...,*N*, we have

$$\begin{array}{@{}rcl@{}} m_{zs} = \frac{1}{N} \sum\limits_{i=1}^{N} z_{si}^{s} = \frac{1}{N} \sum\limits_{i=1}^{N} \left(\frac{d_{i}}{{N_{s}}}\right)^{s} = \frac{m_{s}}{N^{s+1-\alpha}}. \end{array} $$

Now, since *μ*_{zs} is finite and since *d*_{1},*d*_{2},...,*d*_{N} are assumed IID, *z*_{s1},*z*_{s2},...,*z*_{sN} are also IID, and according to the WLLN, *m*_{zs} converges to *μ*_{zs} in probability. Hence we have

$$\begin{array}{@{}rcl@{}} 1 = {\lim}_{N \to \infty} \frac{m_{zs}}{\mu_{zs}} = {\lim}_{N \to \infty} \frac{\frac{m_{s}}{N^{s+1-\alpha}} }{\frac{\mu_{s}}{N^{s+1-\alpha}}} = {\lim}_{N \to \infty} \frac{m_{s}}{\mu_{s}}, \end{array} $$

indicating that *m*_{s} and *μ*_{s} are of the same order asymptotically. Using this result, we are able to approximate the orders of the numerator and denominator of condition (7): \(\sum _{i}\left (d_{i} - \bar d\right)^{3} = N\left (m_{3}-3m_{2}m_{1}+2{m_{1}^{3}}\right)\) is *O*(*N*^{4−α+1}), and \( \sum _{i}\left (d_{i} - \bar d\right)^{2} = N(m_{2} - {m_{1}^{2}}) \) is *O*(*N*^{3−α+1}). Substituting into (7), we see that the numerator is of order *O*(*N*^{8−2α+2}), the denominator is of order *O*(*N*^{9−3α+3}), and therefore the ratio is of order *O*(*N*^{α−2}). Hence for 1<*α*<2, the limit in (10) is 0. By following a similar procedure, it can be proved that the normality condition is also satisfied when *α*≥3.
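The limiting behaviour derived above can also be checked numerically: sampling IID degrees from (9) with *K*=*N*−1 and evaluating the ratio in (10) gives values that shrink roughly like *N*^{α−2}. A small sketch (Python/NumPy; illustrative only, with function names of our own):

```python
import numpy as np

def condition_ratio(d):
    """The ratio appearing in (10), computed from a vector of node degrees."""
    dev = d - d.mean()
    return (dev ** 3).sum() ** 2 / ((dev ** 2).sum() ** 3)

def sf_ratio(N, alpha, seed=None):
    """Sample N IID degrees from (9) with m = 1 and K = N - 1 (the 1 < alpha < 2
    convention) and evaluate the ratio of condition (10)."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, N)
    p = k ** (-float(alpha))
    d = rng.choice(k, size=N, p=p / p.sum()).astype(float)
    return condition_ratio(d)
```

By a Cauchy-Schwarz argument the ratio always lies in [0,1]; for heavy-tailed samples the convergence towards 0 can be slow, which is consistent with the *O*(*N*^{α−2}) rate.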

### Differential subnetwork detection

In this section we leverage the test statistic of Section A non-parametric test for network comparison to detect a differential subnetwork. When comparing the two networks, the expectation is that only a subset of edges will present altered interaction patterns. This task is formulated here as the problem of detecting a subset *V*^{∗}⊂*V* for which there is not sufficient evidence to reject the null hypothesis that the corresponding subnetworks \(\mathcal {A}^{*}(V^{*},E_{A^{*}})\) and \(\mathcal {B}^{*}(V^{*},E_{B^{*}})\) are statistically independent.

An algorithm for the detection of *V*^{∗} should take into account the fact that a certain degree of topological difference between \(\mathcal {A}\) and \(\mathcal {B}\) is always bound to be observed, even when the two population networks are the same, due to finite sample variability. The GHD test provides an efficient way to assess the statistical significance of any observed discrepancy between two paired networks, and is used as a building block to derive an algorithm that identifies differential subnetworks.

We indicate by *V*_{K} a subset of *V* of size *K*≤*N*, and define the centralised GHD test statistic obtained by comparing \(\mathcal {A}=(V_{K},E_{A})\) and \(\mathcal {B}=(V_{K},E_{B})\) as

$$ \Delta_{V_{K}} = \text{GHD}(\mathcal{A}(V_{K},E_{A}),\mathcal{B}(V_{K},E_{B})) - \mu_{V_{K}}, $$

(12)

where \(\mu _{V_{K}}\) is the mean of the permutation distribution for node set *V*_{K}. Furthermore, we define \(\Delta _{V_{K}|i}\) to be the centralised GHD value computed by comparing the two networks after removal of node *i*. The quantity

$$\delta_{i} = \Delta_{V_{K}|i} - \Delta_{V_{K}}, $$

measures the influence that node *i* has on the mean-centred GHD test when comparing two subnetworks defined on set *V*_{K}. We propose an iterative procedure which removes a node or set of nodes at each step, and generates a sequence of node sets of decreasing size, i.e.

$$V_{N} \supset V_{N-1} \supset \ldots \supset V_{N_{\text{min}}}, $$

where *N*_{min}<*N* is a constant indicating the smallest allowed subnetwork size. Starting with *V*_{N}, the two corresponding networks are compared by the GHD test, and a *p*-value is computed, as described previously. For each node indexed by *i*=1,…,*N*, the corresponding *δ*_{i} is computed, and the node associated with the largest positive *δ*_{i} value is removed. Given the new set *V*_{N−1}, the process is then repeated until the specified minimal set size is reached.

This simple algorithm produces a monotonic sequence of *p*-values that increases as the subnetwork size decreases (e.g. see Fig. 2). The *p*-values should be adjusted for multiple testing, e.g. by controlling the false discovery rate [34]. In the presence of a differential subnetwork, the sequence is expected to feature a peak corresponding to the size of that subnetwork. Specifically, for a given desired significance level *α*, the algorithm finds the largest *K*, with *N*≥*K*≥*N*_{min}, such that the adjusted *p*-value exceeds *α*. Clearly, the algorithm benefits from the fact that *p*-values at each iteration can be computed very quickly in closed form.
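The backward-elimination search can be sketched as follows (Python/NumPy; a simplified illustration of our own rather than the authors' implementation: for brevity it tracks only the centred statistic Δ of eq. (12), whereas the full procedure also computes an adjusted *p*-value from the normal approximation at each step):

```python
import numpy as np

def centred_ghd(a, b):
    """Delta of eq. (12): GHD minus its permutation mean, here for uncentred
    symmetric weights with zero diagonal (mu_pi from the closed form above)."""
    N = a.shape[0]
    n = N * (N - 1)
    ghd = np.sum((a - b) ** 2) / n
    mu = ((a ** 2).sum() + (b ** 2).sum()) / n - 2.0 * a.sum() * b.sum() / n ** 2
    return ghd - mu

def backward_elimination(a, b, n_min=3):
    """Greedy sketch: at each step remove the node with the largest positive
    influence delta_i, producing a nested sequence of node sets."""
    keep = list(range(a.shape[0]))
    trace = []                                    # (node set, Delta) at each size
    while len(keep) > n_min:
        delta0 = centred_ghd(a[np.ix_(keep, keep)], b[np.ix_(keep, keep)])
        trace.append((tuple(keep), delta0))
        deltas = []                               # delta_i = Delta_{V|i} - Delta_V
        for i in keep:
            rest = [j for j in keep if j != i]
            deltas.append(centred_ghd(a[np.ix_(rest, rest)],
                                      b[np.ix_(rest, rest)]) - delta0)
        keep.remove(keep[int(np.argmax(deltas))])
    trace.append((tuple(keep),
                  centred_ghd(a[np.ix_(keep, keep)], b[np.ix_(keep, keep)])))
    return trace
```

The returned trace corresponds to the nested sequence *V*_{N}⊃*V*_{N−1}⊃…⊃*V*_{N_min}; in the full method a *p*-value would be attached to each element of this sequence.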