 Methodology Article
 Open access
A new method to measure complexity in binary or weighted networks and applications to functional connectivity in the human brain
BMC Bioinformatics volume 17, Article number: 87 (2016)
Abstract
Background
Networks or graphs play an important role in the biological sciences. Protein interaction networks and metabolic networks support the understanding of basic cellular mechanisms. In the human brain, networks of functional or structural connectivity model the information flow between cortex regions. In this context, measures of network properties are needed. We propose a new measure, Ndim, estimating the complexity of arbitrary networks. This measure is based on a fractal dimension, which is similar to recently introduced box-covering dimensions. However, box-covering dimensions are only applicable to fractal networks. The construction of these network dimensions relies on concepts proposed to measure fractality or complexity of irregular sets in \(\mathbb {R}^{n}\).
Results
The network measure Ndim grows with increasing network connectivity and is essentially determined by the cardinality of a maximum k-clique, where k is the characteristic path length of the network. Numerical applications to lattice graphs and to fractal and non-fractal graph models, together with formal proofs, show that Ndim estimates a dimension of complexity for arbitrary graphs. Box-covering dimensions for fractal graphs rely on a linear log-log plot of minimum numbers of covering subgraph boxes versus the box sizes. We demonstrate the affinity between Ndim and the fractal box-covering dimensions but also that Ndim extends the concept of a fractal dimension to networks with non-linear log-log plots. Comparisons of Ndim with topological measures of complexity (cost and efficiency) show that Ndim has larger informative power. Three different methods to apply Ndim to weighted networks are finally presented and exemplified by comparisons of functional brain connectivity of healthy and depressed subjects.
Conclusion
We introduce a new measure of complexity for networks. We show that Ndim has the properties of a dimension and overcomes several limitations of presently used topological and fractal complexity measures. It allows the comparison of the complexity of networks of different type, e.g., between fractal graphs characterized by hub repulsion and small-world graphs with strong hub attraction. The large informative power and a convenient computational CPU time for moderately sized networks may make Ndim a valuable tool for the analysis of biological networks.
Background
Network or graph theory is of increasing importance for the analysis of biological systems. This may comprise protein interaction networks [1], metabolic networks [2] or neuronal networks in the human brain. Task-induced and resting-state functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) established the analysis of functional and structural connectivity networks in the human brain [3, 4]. For quantitative analyses, local and global topological measures are employed; see [3, 5, 6] for a description and a critical discussion. In this paper we introduce a new concept, which may be called a “regional quantity” and which measures the complexity of networks. The problem of how to measure the complexity of a graph has been approached in different ways. For example, the complexity of a graph has been defined by the number of its spanning trees [7]. It has been defined as the number of Boolean operations needed to construct the graph from generating graphs [8]; this type of complexity is frequently called computational complexity. Or it has been defined by a combination of the number of vertices, edges and proper paths [9]. Our concept of complexity is based on the connectivity of a graph. Simple examples of this type of complexity are the cost or the efficiency; more involved examples are the recently introduced box-counting dimensions [10, 11]. Cost and efficiency are easily calculated; however, both concepts fail to discriminate regular grids with different dimensions, see the section “Ndim and manifolds”. Box-counting dimensions can detect a novel type of graph, the fractal graph; however, these dimensions need NP-complete algorithms and become unstable if fractality is distorted [12]. In our approach we try to overcome these shortcomings by a new concept, which is applicable to any undirected graph. With our concept we achieve more flexibility and enhanced informative power; the price, however, is again an NP-complete algorithm with high CPU times for some large networks. Practical applicability of Ndim to such cases will be demonstrated by the introduction of convenient lower bounds.
Our concept is based on a definition of a fractal dimension introduced for sets in \(\mathbb {R}^{n}\) by Sandau [13] and Sandau & Kurz [14]. Fractal analysis in \(\mathbb {R}^{n}\) allows the quantification of irregularity or complexity of point sets where the concepts of traditional geometry usually fail. Fractals possess a fine structure that exhibits details at different scales of resolution. Fractals appear in numerous disciplines, e.g., plasma physics, biological systems, or neuroscience [15–17].
The degree of complexity is measured by a fractal dimension FD although many nonequivalent definitions of FDs exist. However, all such concepts of FD satisfy (at least approximately) the following unifying conditions, see [13, 15].

C1.
Invariance under Euclidean motions;

C2.
Invariance under affine transformations;

C3.
Monotonicity for set inclusion;

C4.
Maximum property for set union;

C5.
FDs are extensions of the topological dimension for smooth manifolds, and

C6.
All FDs are equivalent for self-similar fractals.
A picturesque definition of self-similarity is due to Mandelbrot [18]: A fractal is called self-similar if a subset, magnified to the size of the whole set, is congruent to the whole. (See [15] for a formal definition and extensions.) The numerical calculation of FDs for real data with scale of resolution >0 is, in general, nontrivial as the details of scales tending to zero are important for the characterization of a fractal [17]. In [13, 14] an FD definition, called xdim, was proposed that is applicable both to self-similar and to more general fractals. This definition appears to be numerically more robust than the frequently used box-counting dimension for self-similar fractals.
Using a procedure similar to the calculation of a box-counting dimension in \(\mathbb {R}^{n}\), a novel FD for networks or graphs was introduced by Song et al. [10, 11]. For this purpose, a covering of the graph by subgraph (“box”) systems differing in their linear size is proposed. The FD is finally determined by the slope of a linear log-log plot, where the minimal number of covering subgraphs is plotted versus their linear size.
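The slope-based definition can be illustrated with a short least-squares fit. This is a minimal sketch, not the procedure of Song et al.: the function name `loglog_fit` and the sample box counts are hypothetical, and it assumes the usual convention that the number of boxes scales as a negative power of the box size, so the dimension is the negative slope.

```python
import math

def loglog_fit(box_sizes, box_counts):
    """Fit a line through (log l_B, log N_B); return (dimension, R^2)."""
    xs = [math.log(size) for size in box_sizes]
    ys = [math.log(count) for count in box_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Coefficient of determination of the linear fit.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    # Assumed convention: N_B ~ l_B ** (-dimension).
    return -slope, r_squared

# Hypothetical box counts that follow N_B = 4096 * l_B ** (-2) exactly.
sizes = [1, 2, 4, 8]
counts = [4096, 1024, 256, 64]
dimension, r_squared = loglog_fit(sizes, counts)
```

A value of R^2 close to 1 indicates a linear log-log plot; a clearly lower value signals that a box-covering dimension is not well defined for the data.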
Networks have in general no well defined geometric patterns that allow the definition of selfsimilarity. But selfsimilarity can be defined by internal properties of the graph, e.g., by the invariance of the degree distribution under scale transformations. To obtain graphs on different length scales, a subgraph covering with fixed linear size is iterated, blurring the initial graph more and more, thus increasing the scale; see Song et al. [10, 11]. If the degree distribution is invariant under the different steps of this renormalization procedure, the graph is called selfsimilar [19, 20]. A similar scaling invariance for edge densities was discussed by Blagues et al. [21]. Note the following difference to fractals in \(\mathbb {R}^{n}\): There exist selfsimilar graphs which are not fractals, i.e., graphs for which the log− log plot is nonlinear [11, 20].
Inverting this iterated renormalization procedure allows the design of models for fractal and nonfractal scale free graphs [11, 12]. Transitions between both classes depend on the strength of the module or hub repulsion which is controlled by the model parameters; minimal hub repulsion (attraction) produces a model with small world properties, maximal hub repulsion a fractal graph [11]. It is shown in [12], that the addition of noise by adding random edges can also initiate the transition from a fractal to a small world graph. Many real data networks are fractal graphs; examples are the worldwideweb (WWW), social networks, proteinprotein interaction graphs (PIN), and cellular networks [10, 11]. More recently, Gallos et al. [22, 23] detected fractal fMRI networks in the human brain for high percolation thresholds. An alternative to the boxcovering methods is discussed in [24]; they use a random walker through the network, to derive a correlation dimension using a convenient log− log plot.
In this paper, we propose an extension of Sandau's fractal dimension xdim in \(\mathbb {R}^{n}\) to networks or graphs, which we call Ndim. This new concept is based essentially on the maximum k-clique cardinality and allows the quantification of complexity not only for fractal graphs, but also for graphs with nonlinear log-log plots. We prove that this concept satisfies the graph-specific modifications of the conditions C2, C3, and C4; condition C1 is irrelevant for graphs. The validity of conditions C5 and C6 is demonstrated numerically by applications of Ndim to regular lattice or grid graphs and to binary fractal models. As weighted graphs are frequently used to quantify functional or structural connectivity, we present three procedures to apply Ndim to weighted graphs. To evaluate the complexity Ndim, we compare it with the connectivity measures cost, efficiency, and box-counting dimensions by applications to resting-state and task-induced fMRI data. We find several advantages of our new measure. Compared to cost and efficiency, Ndim has stronger informative power, as complexity differences in fMRI correlation networks between healthy and depressed subjects are increased for Ndim. In contrast to cost and efficiency, Ndim is a nonglobal, regional metric. We show that this feature enables the localization of special hub nodes in the networks which are characteristic for depressed subjects. Comparing Ndim and some box-counting metrics, we find a strong similarity as long as the networks are fractals; applying these concepts to experimental dual-task fMRI networks, which lose fractality when a correlation threshold is lowered, we find that Ndim has an essentially enhanced scope of application compared to box-counting dimensions. Though the algorithm to calculate Ndim is NP-complete, we find that CPU time is quite low for moderate network sizes; for large networks we discuss the introduction of lower bound constraints.
Methods
Extended counting method in Euclidean space
For a point set in \(\mathbb {R}^{n}\), the fractal dimension xdim is numerically calculated using the extended counting method [13]; see Fig. 1 a, b for an illustration.
In Fig. 1 a, b, the point set F (Sierpiński triangle) is covered by a fine uniform grid with grid scale \(e_{1}\), where \(e_{1}\) is at least the resolution of the data points given in Fig. 1 a. An additional large window with size \(e_{w}\) = (minimal side length of the wrapping box of F)/2 is slid along the corner points of the fine grid to find the maximum intersection between F and the fine grid within the window; see Fig. 1 b. Denoting the maximum number of intersecting fine boxes by N, xdim is then estimated by
$$\text{xdim} (F) = \frac{\log N}{\log (e_{w}/e_{1})} $$(see [13, 17]). For xdim estimates of MRI cortex surfaces, fMRI time series in the human brain, and fractional Brownian motion, we refer to [25–27].
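The sliding-window count described above can be sketched for a planar point set. This is a minimal illustration under stated assumptions, not the implementation of [13]: points are integer pixel coordinates, the function name `xdim` and the parameter `e1` are hypothetical, and the estimate is taken as log N divided by the logarithm of the ratio of window size to grid scale.

```python
import math
from collections import Counter

def xdim(points, e1=1):
    """Extended counting estimate for 2D integer pixel coordinates."""
    # Fine grid: every occupied cell of side e1 counts once.
    cells = {(x // e1, y // e1) for x, y in points}
    xs = [cx for cx, _ in cells]
    ys = [cy for _, cy in cells]
    # Window size in fine cells: half the shorter side of the wrapping box.
    w = min(max(xs) - min(xs) + 1, max(ys) - min(ys) + 1) // 2
    if w < 2:
        raise ValueError("point set too small for a window of at least 2 cells")
    # An occupied cell lies in every window whose lower-left corner is
    # at most w - 1 cells to its left and at most w - 1 cells below it.
    hits = Counter()
    for cx, cy in cells:
        for i in range(cx - w + 1, cx + 1):
            for j in range(cy - w + 1, cy + 1):
                hits[(i, j)] += 1
    n_max = max(hits.values())
    # Assumption: window size divided by grid scale equals w.
    return math.log(n_max) / math.log(w)

filled_square = [(x, y) for x in range(16) for y in range(16)]
diagonal_line = [(i, i) for i in range(16)]
```

For the filled square the window of 8×8 cells is fully occupied, and for the diagonal line it holds 8 cells, so the two estimates recover the topological dimensions 2 and 1.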
Extended counting method for networks
For a connected binary network or graph G, we define an extended counting method similar to that for a point set in \(\mathbb {R}^{n}\). The fractal dimension Ndim of G is computed in several steps:

1.
Compute the average distance μ of G. The average (chemical) distance, characteristic path length, or normalized Wiener index is a natural measure for the compactness of a graph [28, 29]. It is defined by
$$\mu := \frac{{\sum\limits_{1\leq i < j \leq n}} d(i,j)}{\binom{n}{2}}, $$where n denotes the number of vertices and d(i,j) the chemical distance or shortest path between vertices i and j.

2.
Reduce μ to an integer k:=⌊μ⌋. Here ⌊·⌋ denotes the floor function.

3.
Compute a vertex set that is defined by a maximum k-clique of G. A maximum k-clique is a largest set of vertices with distance ≤k in G to each other. Such a clique can be interpreted as a cluster, module, or cohesive subgroup of vertices [30].

4.
We define Ndim for a finite graph G by
$${\text{Ndim}} (G) = \frac{\log |\max k\text{-clique}|}{\log(k+1)}, $$where \(|\max k\text{-clique}|\) denotes the cardinality of a maximum k-clique. Following the convention of Song et al. [10], we added a +1 to the chemical distance in the denominator.

5.
For infinite graphs G with arbitrarily large k, a fractal dimension Ndim can be defined by
$${\text{Ndim}} (G) = \lim_{k\to\infty} \frac{\log |\max k\text{-clique}|}{\log(k+1)}, \quad \text{provided this limit exists.} \qquad (1) $$The above definition is similar to the definition of fractal dimensions for sets in \(\mathbb {R}^{n}\) where an infinitesimal process is applied [15]; see the sections “Ndim and manifolds” and “Application of Ndim to fractal and nonfractal models” for examples of graphs where the number of vertices and k go to ∞.
See Fig. 1 c, d for an illustration of a submaximum and a maximum k-clique in a fractal graph. As will be shown in later sections, Ndim is a fractal dimension FD in the sense that it satisfies the conditions C2 – C6 for graphs given in the Background. For simplicity, the notion of fractal dimension is also used for Ndim when applied to finite graphs.
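The steps of the extended counting method for a finite connected binary graph can be sketched as follows. This is a minimal illustration, not the Carraghan & Pardalos or Tomita et al. implementations used in the paper: the function names are hypothetical, the graph is assumed to be a dict mapping each vertex to a set of neighbours, and the maximum k-clique is found as a maximum clique of the graph that links all vertex pairs with distance ≤k, using a plain Bron-Kerbosch search with pivoting.

```python
import math
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Chemical (shortest-path) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def max_clique_size(adj):
    """Size of a largest clique (Bron-Kerbosch with pivoting and a size bound)."""
    best = 0
    def expand(r_size, p, x):
        nonlocal best
        if not p and not x:
            best = max(best, r_size)
            return
        if r_size + len(p) <= best:
            return  # this branch cannot beat the best clique found so far
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r_size + 1, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    expand(0, set(adj), set())
    return best

def ndim(adj):
    """Return (Ndim, k, |max k-clique|) for a connected graph with >= 2 vertices."""
    nodes = list(adj)
    n = len(nodes)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    if any(len(d) != n for d in dist.values()):
        raise ValueError("graph must be connected")
    # Step 1: average distance over all vertex pairs.
    mu = sum(dist[u][v] for u, v in combinations(nodes, 2)) / math.comb(n, 2)
    # Step 2: integer part of the average distance.
    k = math.floor(mu)
    # Step 3: a k-clique of G is a clique of the graph linking pairs at distance <= k.
    power = {u: {v for v in nodes if v != u and dist[u][v] <= k} for u in nodes}
    size = max_clique_size(power)
    # Step 4: ratio of logarithms.
    return math.log(size) / math.log(k + 1), k, size

def path_graph(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def complete_graph(n):
    return {i: set(range(n)) - {i} for i in range(n)}
```

For a path with 10 vertices the average distance is 11/3, so k = 3 and a maximum 3-clique consists of 4 consecutive vertices, giving Ndim = log 4 / log 4 = 1. The search is exponential in the worst case, so this sketch is only suitable for small graphs.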
The numerically most difficult part in the computation of Ndim(G) is the calculation of the maximum k-clique, where a scan through G is necessary. This is an NP-complete problem [30], just as the computation of box coverings for the FD of [10]. For the clique computations, an algorithm by Carraghan & Pardalos [31] and an algorithm by Tomita et al. [32] applied in the commercial software package Mathematica are used. For a comparison, the box-covering dimensions defined via log-log plots are also computed. We employ the maximum-excluded-mass-burning (MEMB) algorithm and a compact-box-burning (bcm) algorithm as described in [33]. The MEMB algorithm is published online and can be found at http://wwwlevich.engr.ccny.cuny.edu/webpage/hmakse/brain/.
Application of Ndim to weighted graphs
The FD Ndim was introduced for a binary graph. However, numerous problems in biology or brain research deal with weighted graphs [34]. (Note that all graphs considered in this article are undirected and have no self loops).
In the following, we describe three procedures of how to apply Ndim to weighted graphs with nonnegative weights. Elaborate examples for human brain data and the conditions for applicability are discussed in Results. To obtain a reasonable distance measure d(i,j) for weighted graphs, the interpretation of the weights w(i,j) in the weighted adjacency matrix must be considered. For better readability, we do not delineate in the following between a graph and its adjacency matrix.
For “transportation networks,” distances increase with the weights; for “communication networks,” distances decrease with increasing weights [35, 36]. For brain mapping, “communication networks” are frequently used. Here the weights may be given by w(i,j)=correlation(i,j) for fMRI time series in two different ROIs i and j [37], or by w(i,j)=connectivitystrength(i,j) for DTI fiber connection between functional ROIs i and j [38]. In these cases Ndim can be computed as follows.

1.
Transition to a binary adjacency matrix using a cutoff: A mapping from the weighted adjacency matrix \(W_{n\times n}\) of a graph with n vertices to a binary adjacency matrix \(A_{n\times n}\) with components a(i,j) is defined by thresholding with some quantity τ>0, as follows. If w(i,j)>τ for an edge, then w(i,j) is replaced by a(i,j)=1; otherwise w(i,j) is replaced by a(i,j)=0. This implies that, e.g., for correlation(i,j)>τ, d(i,j)=1, and for correlation(i,j)<τ, d(i,j)>1, as the connecting paths are no longer direct links.
If the binary graph A is disconnected then Ndim is computed for every connected binary component \(A_{k_{l}}\), where k _{ l } is the average distance in component A _{ l }. The list of FDs (Ndim_{1}, Ndim_{2}…) is combined into a weighted average
$${\text{Ndim}} (W,\tau) = \sum_{l} \text{weight}(l) \,\cdot\,{\text{Ndim}} (A_{k_{l}}), $$describing the complexity of the disconnected graph. As weights, we choose weight (l)=(number of nodes in component A _{ l })/n, where n denotes the size of the entire graph, which reduces the weight for smaller components as they carry less information. Components with only a few nodes should be excluded from the averaging process but the normalization \(\sum _{l} \text {weight}(l) = 1\) should be maintained.

2.
A Monte Carlo ensemble method: The weighted adjacency matrix W _{ n×n } is normalized by setting \(\widetilde {w}(i,j) := w(i,j)/\max \{w(i,j) : i,j = 1, \ldots, n\}\) and these weights are interpreted as probabilities. A uniform random number generator assigns to each edge a random number p∈(0,1] and defines a mapping to a binary random matrix via the following condition: If \(p\leq \widetilde {w}(i,j)\) then a(i,j)=1; else a(i,j)=0. By this procedure, an ensemble of binary random graphs is produced, where for large \(\widetilde {w}(i,j)\) short distances are frequently randomly generated and for small \(\widetilde {w}(i,j)\) mainly large distances [39, 40].
For every such binary random graph the FD Ndim can be computed. In case a random graph is disconnected, a weighted averaging as in (1) can be applied.

3.
Calculation of Ndim via functional distances: To calculate statistical measures of weighted connected communication networks W _{ n×n }, a functional (physical) distance network \(\widetilde {W}_{n\times n}\) is often introduced in order to avoid long pathways for strong connections [35, 36, 41, 42]. In the following we adapt this approach for the computation of the complexity Ndim.

(a)
The w(i,j) coefficients (edges of a weighted graph) are mapped to functional distance coefficients via, e.g., \(\widetilde {w}(i,j) = 1/w(i,j)\).

(b)
The graph \(\widetilde {W}_{n\times n}\) is mapped to an approximating multigraph M _{ n×n } with integer weights by scaling by a large factor c and rounding [35]: \(M_{n\times n} = \lfloor c \widetilde {W}_{n\times n}\rfloor \). Finally, the coefficients m(i,j)=∞ are mapped to m(i,j)=0; see Results for a numerical example of scaling. Then compute, for instance by the Dijkstra algorithm [43], the distance matrix D _{ n×n } with coefficients d(i,j) for M _{ n×n }, the average distance μ, and the minimum positive distance m.

(c)
Transform the multigraph M _{ n×n } to a binary distance graph G _{ n×n } [44] using the following condition: If d(i,j)≤⌊μ⌋ set g(i,j)=1; otherwise set g(i,j)=0 (g(i,i)=0). For this binary graph calculate a maximum clique if G _{ n×n } is connected; otherwise compute a maximum clique in each connected component and use from these maximum cliques the one with maximal vertex cardinality; see an example in the comment for Fig. 8.

(d)
The complexity Ndim for W _{ n×n } is finally estimated by
$${\text{Ndim}} (W) = \frac{\log |\text{maximum clique}|}{\log [(\lfloor\mu\rfloor + m)/m]}. $$In case \(M_{n\times n}\) is a binary graph, Ndim agrees with the definition given in the section “Extended counting method for networks”.
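The chain of transformations (a) to (d) of the functional distance procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the default scaling factor `c` are hypothetical, the weighted graph is a symmetric dict of dicts with positive weights, the scaled graph is assumed to be connected, and edges whose scaled integer length rounds down to 0 are treated as absent.

```python
import heapq
import math

def dijkstra(lengths, source):
    """Shortest path lengths from source for positive edge lengths."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in lengths[u].items():
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def largest_clique(adj):
    """Size of a largest clique by simple branch and bound."""
    best = 0
    def grow(size, candidates):
        nonlocal best
        best = max(best, size)
        while candidates and size + len(candidates) > best:
            v = candidates.pop()
            grow(size + 1, candidates & adj[v])
    grow(0, set(adj))
    return best

def ndim_weighted(w, c=1000):
    """Ndim of a connected weighted graph w[u][v] > 0 via functional distances."""
    nodes = list(w)
    # (a) and (b): functional distance 1/w, scaled by c and rounded down.
    m_graph = {u: {} for u in nodes}
    for u in nodes:
        for v, weight in w[u].items():
            length = math.floor(c / weight)
            if length > 0:  # assumption: length 0 means no edge
                m_graph[u][v] = length
    dist = {u: dijkstra(m_graph, u) for u in nodes}
    pair_dist = [dist[u][v] for i, u in enumerate(nodes) for v in nodes[i + 1:]]
    mu = sum(pair_dist) / len(pair_dist)   # average distance
    m = min(pair_dist)                     # minimum positive distance
    k = math.floor(mu)
    # (c): binary distance graph; a maximum clique over the whole graph equals
    # the largest of the maximum cliques of its connected components.
    g = {u: {v for v in nodes if v != u and dist[u][v] <= k} for u in nodes}
    size = largest_clique(g)
    # (d): ratio of logarithms.
    return math.log(size) / math.log((k + m) / m)

# Hypothetical unit-weight path with 10 vertices.
unit_path = {i: {j: 1.0 for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
```

With unit weights and c = 1 the scaled graph is binary, so the result coincides with the binary definition: for the path above the value is log 4 / log 4 = 1.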

Data
An application of thresholding and of the Monte Carlo ensemble method to compare the complexity of weighted networks relies on data from [45]: For patients with recurrent depression and for healthy controls, a whole brain functional connectivity network was derived from preprocessed restingstate functional MRI data. By anatomical parcellation of the whole brain using the HarvardOxford atlas (FSL, Oxford University) 112 regions of interest (ROIs) or network nodes were defined. Time series of functional MRI signals were extracted from each voxel and subsequently averaged within each of the 112 ROIs. A maximal overlap discrete wavelet transform was applied to decompose the regional time series into different frequency scales [46]. Absolute wavelet correlation coefficients at the lowfrequency scale (0.060–0.125 Hz) were used to obtain a 112×112 weighted connectivity matrix representing an individual wholebrain functional correlation network for each subject; a similar procedure is used in [37].
A second data set from [22, 23] uses visual and auditory stimuli with four different onset delays to achieve whole-brain dual-task fMRI time series. The phase of the BOLD signal was computed for each voxel i and for 40 trials [47]. The phase-based correlations between different voxels i and j were averaged over the trials. This resulted in a whole-brain correlation matrix representing a weighted subject-specific functional network. This whole-brain network was reduced by a mask of ∼60,000 voxels, where only voxels with high activation probability were kept. The data are published online at http://wwwlevich.engr.ccny.cuny.edu/webpage/hmakse/brain/. The variation of a correlation threshold produces a percolation process: highly correlated voxels indicate separated modules with strong functional internal links; for lower thresholds the modules are merged and the network approaches a small-world topology.
Seven binary connectivity networks, taken from http://www.brainconnectivitytoolbox.net/, are tested for computational cost. Autobahn has a connection 1, if two locations are directly connected by the highway [48]. Air500 summarizes the flight connections between 500 air ports [49]. Three biological networks are: Celegans describing neuronal connections [50], Mac95 and Macaque summarizing corticocortical connections [51].
Results
In the next section, we numerically explore how well Ndim and a box-covering dimension approximate the dimensions of grid graphs (condition C5). Then, the equivalence between Ndim and several box-covering dimensions is explored for models of self-similar fractal graphs (condition C6). An examination of nonfractal graphs demonstrates that Ndim is applicable beyond fractality, i.e., to graphs with nonlinear log-log plots for minimum numbers of covering boxes versus box size. Three methods of how to apply Ndim to weighted graphs on the basis of analyses of functional connectivity in the human brain are explained. Resting-state fMRI data of healthy and depressed subjects and dual-task fMRI data are used to evaluate Ndim.
Ndim and manifolds
To demonstrate that the proposed FD Ndim is an extension of the topological dimension for smooth manifolds (condition C5 in the introduction), we applied Ndim to regular one- and two-dimensional finite lattice graphs of size 100, 1500, 30×30, 40×40, 80×80, and 100×100. For comparison, a box-covering dimension was calculated applying the method MEMB [33], together with the measures cost (global normalized edge density) and global efficiency, which are in use as simple measures of complexity. We refer to Table 1 for the results, which show a convergence of the FDs to the topological dimensions as size increases. Cost and efficiency tend to zero for both grids as their sizes increase.
The log− log plots for MEMB showed a high linearity in all cases. The coefficient of determination R ^{2} was found to satisfy R ^{2}>0.99. The box sizes for the log− log plots were averaged, as proposed by Song et al. [33], thus increasing the R ^{2} values. To keep computation time within limits, we did not use adjacency matrices with more than 10,000 nodes. These results show the enhanced informative power of the fractal dimensions as compared to the topological measures.
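The two topological measures used for comparison can be computed with a few lines of code. This is a minimal sketch with hypothetical function names; it assumes the common definitions of cost as the fraction of realized edges among all vertex pairs and of global efficiency as the mean inverse shortest-path distance over all vertex pairs.

```python
from collections import deque

def cost_and_efficiency(adj):
    """Return (cost, global efficiency) of an unweighted undirected graph."""
    nodes = list(adj)
    n = len(nodes)
    n_pairs = n * (n - 1) / 2
    n_edges = sum(len(adj[u]) for u in nodes) / 2
    inverse_sum = 0.0
    for source in nodes:
        # Breadth-first search for the distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        inverse_sum += sum(1.0 / d for d in dist.values() if d > 0)
    # Every unordered pair was visited twice, once from each endpoint.
    return n_edges / n_pairs, (inverse_sum / 2) / n_pairs

def path_graph(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def grid_graph(side):
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(i, j): {(i + di, j + dj) for di, dj in steps
                     if 0 <= i + di < side and 0 <= j + dj < side}
            for i in range(side) for j in range(side)}

for graph in (path_graph(10), path_graph(100), grid_graph(5), grid_graph(10)):
    cost, efficiency = cost_and_efficiency(graph)
    print(len(graph), round(cost, 4), round(efficiency, 4))
```

Both values shrink as the lattices grow, for the one-dimensional as well as the two-dimensional case, which is why they carry no information about the lattice dimension.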
Application of Ndim to fractal and nonfractal models
A model for scale-free fractal networks was introduced by [11]; it is shown in the Supplement of this paper that this model is in addition self-similar. An explicit algorithm, calculating the model by inversion of a renormalization process, can be found in [11, 12]. In this algorithm, the network grows iteratively in size (number of nodes) whereby the diameter and the degree per node approach the full model. To explore whether Ndim satisfies condition C6 for binary graphs, the network dimensions Ndim, MEMB, and bcm are compared at different stages of a growing network. Starting the network construction with a single node, these dimensions are plotted against the size of the model f(g,n,m,e) for g=4, 5, 6, and 7 iterations; see Fig. 2. The growth factor of the network size is n=4 per iteration, the growth factor of the degree is m=2, and the probability for hub attraction is e=0; see [11] and [12] for a detailed discussion of the parameters. The FD of a full model (g→∞) can be calculated by FD = ln(n)/ln(m).
Figure 2 indicates that all numerically computed dimensions approach the value FD = 2 with increasing g. For the complexity Ndim, an analogous convergence behaviour was observed for the fractal models f(g,6,2,0), f(g,5,2,0), f(g,3,2,0), f(g,5,3,0), and f(g,6,3,0). To keep computer time within limits the size of the graphs was restricted to 4^{7} nodes.
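As a worked check of the full-model formula FD = ln(n)/ln(m), the following values result for the parameter pairs (n, m) named above; the decimal values are rounded and are obtained from the formula only, not taken from the numerical experiments.

$$\begin{aligned} f(g,4,2,0):\quad & \frac{\ln 4}{\ln 2} = 2, & f(g,3,2,0):\quad & \frac{\ln 3}{\ln 2} \approx 1.585, \\ f(g,5,2,0):\quad & \frac{\ln 5}{\ln 2} \approx 2.322, & f(g,6,2,0):\quad & \frac{\ln 6}{\ln 2} \approx 2.585, \\ f(g,5,3,0):\quad & \frac{\ln 5}{\ln 3} \approx 1.465, & f(g,6,3,0):\quad & \frac{\ln 6}{\ln 3} \approx 1.631. \end{aligned} $$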
Applying the hub attraction parameter 0<e≤1, we can construct nonfractal models with nonlinear log-log plots for box-covering dimensions; increasing e converts the network more and more to a small-world network [19]. Although box-covering dimensions are not well defined for the entire range of log-log plots, Ndim can still be applied in these cases to quantify the complexity of such graphs. The notion of complexity is introduced as an extension of the notion of fractality; see Discussion for an in-depth explanation. Depending on the size, the complexity of a binary network is limited by the complexity of the corresponding complete graph, where Ndim = log(size)/log 2. See Fig. 3 a, b for the models f(6,4,2,0) and f(6,4,2,1).
In Fig. 3 c, we show log− log plots for the minimum number of covering boxes N(l _{ B }) against the averaged box sizes l _{ B }, calculated by the MEMB algorithm for models with e=0,0.25,0.5,0.75, and 1. For e>0, deviations from linearity are clearly visible. The computed complexities Ndim of these models are presented in Fig. 3 d for the iterations g=4,5,6,7. It is shown in [12], that the addition of small fractions of random edges to the fractal models (e=0) produces similar deviations from linearity in the log− log plots.
Application of Ndim to weighted graphs by thresholding
Abnormal functional brain connectivity is reported for recurrent depression patients compared to the connectivity of healthy subjects [45, 52]. We use such connectivity data to exemplify the informative power of Ndim and to demonstrate how to apply this measure to weighted networks; a detailed clinical study is beyond the scope of this paper. In this section, resting-state fMRI correlation networks of size 112, as described in Data, are compared for a healthy subject and a strongly affected recurrent depression subject after 9 episodes of depression and with a high clinical Hamilton Rating Scale score HAMD=23 [45]. To calculate Ndim and the frequently used topological complexity measures cost (global edge density) and normalized global efficiency, the two weighted absolute correlation networks are transformed to binary graphs by thresholding the correlations w(i,j), following the procedure outlined in Methods. This procedure is applied for τ=0.2, 0.3, and 0.4. For τ=0.2, the low-correlation edges are also selected using the following inverse condition: If w(i,j)<τ, then w(i,j) is replaced by a(i,j)=1; otherwise by a(i,j)=0 (a(i,i)=0). The results are shown in Fig. 4; broken lines belong to the healthy subject, solid lines to the subject with depression. For w(i,j)>τ, cost, efficiency and Ndim increase with decreasing τ; all measures are increased for the depressed subject as compared to the healthy subject. Vice versa, for the low-correlation graphs w(i,j)<0.2, the network of the healthy subject is more complex. In general, we find that the differences are more pronounced for Ndim than for the two global measures, indicating larger informative power of Ndim. For the depressed subject, Ndim decreases from the case w(i,j)>0.2 to the case w(i,j)<0.2, whereas cost and efficiency increase. This dependency excludes a simple correlation or redundancy between Ndim and the topological measures. See Fig. 5 a for binary networks derived by the condition w(i,j)>0.2. The left-hand side shows the graph of the healthy subject, the right-hand side the graph of the subject with depression; maximum k-cliques are indicated in red.
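The thresholding step and the size-weighted averaging over connected components described in Methods can be sketched as follows. This is a minimal illustration: the function names, the `min_size` cutoff for small components, and the `measure` callback (standing in for an Ndim routine applied to one component) are hypothetical, and the weights are renormalized over the kept components so that they sum to 1.

```python
def threshold(w, tau):
    """Binary adjacency matrix with a(i,j) = 1 if w(i,j) > tau, no self loops."""
    n = len(w)
    return [[1 if i != j and w[i][j] > tau else 0 for j in range(n)]
            for i in range(n)]

def components(a):
    """Connected components of a binary adjacency matrix as sorted vertex lists."""
    n = len(a)
    seen = set()
    found = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if a[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        found.append(sorted(component))
    return found

def weighted_average(a, measure, min_size=3):
    """Average measure(a, component) with weights proportional to component size."""
    kept = [c for c in components(a) if len(c) >= min_size]
    total = sum(len(c) for c in kept)  # renormalization: weights sum to 1
    return sum(len(c) / total * measure(a, c) for c in kept)

# Hypothetical 5x5 correlation matrix: a triangle {0, 1, 2} and a pair {3, 4}.
w = [[0.0, 0.6, 0.5, 0.1, 0.0],
     [0.6, 0.0, 0.7, 0.0, 0.1],
     [0.5, 0.7, 0.0, 0.1, 0.1],
     [0.1, 0.0, 0.1, 0.0, 0.9],
     [0.0, 0.1, 0.1, 0.9, 0.0]]
a = threshold(w, 0.2)
```

Here `measure` receives the full matrix and the vertex list of one component; using the component size as a stand-in measure makes the weighting easy to follow by hand.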
Monte Carlo ensemble method
The same two data sets used in the last section and extensions to groups of 10 healthy controls and 6 depressed subjects (more than 8 episodes and HAMD> 15) are now analyzed for Ndim by the Monte Carlo ensemble method. To provide a closer comparison with the results of the thresholding method, the weighted networks are also thresholded for a given τ before transforming them to the binary random graphs, as is described in Methods. Thresholding is performed like follows, in case w(i,j)>τ, w(i,j) remains unchanged, otherwise w(i,j)=0; or, for the low correlation case, if w(i,j)<τ, w(i,j) remains unchanged, and set otherwise w(i,j)=0. For a quantitative comparison between the binary ensembles the weighted networks are normalized to probabilities by the total maximum weight collected over all networks involved in the comparison. In Fig. 5 b, samples of the binary Monte Carlo ensembles after identical thresholding are presented. The graphs in panel b increase in complexity from healthy to disease as in the cases shown in Fig. 5 a. A reduction of connectivity in the Monte Carlo samples compared to panel a, is due to the fact that this method transforms lower w(i,j) less frequently into the edges of the binary networks. For every weighted network an ensemble of 1,000 binary random networks is calculated and Ndim of a weighted graph is thus quantified by a distribution of complexities; see Fig. 5 c–h for examples. The blue distributions belong to the healthy subject, the red ones to the subject with depression. The aforementioned thresholding of the weighted graph is indicated in the Figure. Panels c), d), e), h) present comparisons of the two subjects, panel f) a comparison between the control group and the depressed subject, and panel g) the groupgroup comparison. Statistical subjectsubject comparisons may be helpful in personalized medical analysis. 
The group distributions are involved to reduce subjective random variability, their distributions are computed pooling all samples of the group members. The onesided significance of the pairwise distribution shifts (blue to red, same threshold) can be quantified using a nonparametric statistics of Brunner and Munzel [53]. This method is free of any assumption regarding the shape of the distributions. For the case with w(i,j)<0.2 the Ndim distribution of the healthy subject is significantly shifted to higher complexities compared to the distribution of the depressed subject. For the other cases (w(i,j)>0.2,0.3,0.4) the upwards shift is reversed ending at higher Ndim values for the depressed subject. For the subjectgroup and groupgroup comparisons, panels f),g), this upwards shift is weakened, we find for the Pvalues P _{subjectsubject}<P _{subjectgroup}<P _{groupgroup}<10^{−10}. The subjectsubject comparisons are in line with that of the thresholding method, but more instructive, as their statistical significance can be quantified; see [38] for applications of this statistical technique to edge distributions of weighted networks.
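The generation of a binary Monte Carlo ensemble from a weighted network can be sketched as follows. This is a minimal illustration with hypothetical function names; the optional argument `w_max` stands for the common maximum weight used when several networks are compared, and the seed is only there to make the sketch reproducible.

```python
import random

def sample_binary_graph(w, w_max, rng):
    """One binary realization: edge (i, j) is kept if p <= w(i,j) / w_max."""
    n = len(w)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 - rng.random()  # uniform random number in (0, 1]
            if p <= w[i][j] / w_max:
                a[i][j] = a[j][i] = 1
    return a

def ensemble(w, size, seed=0, w_max=None):
    """List of binary random graphs drawn from the weighted matrix w."""
    rng = random.Random(seed)
    if w_max is None:
        # Default: normalize by the maximum weight of this network only.
        w_max = max(max(row) for row in w)
    return [sample_binary_graph(w, w_max, rng) for _ in range(size)]

# Hypothetical 3x3 weight matrix.
w = [[0.0, 0.8, 0.4],
     [0.8, 0.0, 0.0],
     [0.4, 0.0, 0.0]]
graphs = ensemble(w, 400)
```

An edge with the maximum weight appears in every realization, an edge with weight 0 never does, and an edge with half the maximum weight appears in roughly half of the realizations. A complexity measure evaluated on each realization then yields a distribution rather than a single number.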
The calculation of Ndim is based on the maximum k-cliques, that is, on the largest sets of nodes with pairwise distances ≤k. Maximum k-cliques include nodes with high functional connectivity w(i,j) or, equivalently, with similar resting-state fMRI signals in the corresponding grey matter ROIs. By identifying the spatial locations of maximum k-clique ROIs in the brain, we can deduce localized neural information. This is complicated by the fact that several maximum k-cliques may exist in any binary random realization of the weighted network. For the cases with w(i,j)>0.2, we find that this multiplicity is higher for depressed subjects (median = 11) than for healthy subjects (median = 6), whereas the characteristic path length k (median = 2) is rather stable. This indicates, as Ndim increases, an increasing connectivity for depression in some regions of the brain. For further analysis we focus on nodes which are contained in the intersection of all maximum k-cliques of a binary random realization and call these nodes core nodes. A core node is a hub node with distances ≤k to all maximum k-clique nodes of a binary network. For an ensemble of binary realizations we can quantify the probability of any node to be a core node. In Fig. 6 a, b this probability is plotted for the healthy (blue) and the depressed (red) subject of Fig. 5 e versus the node labels 1 to 56. A complete mapping of these labels to grey matter ROIs can be found in the Supplement of [45]; the ROIs are derived from the Harvard-Oxford brain atlas. This labeling is symmetric for the left and the right hemisphere of the brain. In Fig. 6 c, d core node probabilities are given for the healthy group and for the group of depressed subjects of Fig. 5 g. For nodes with higher probabilities, the grey matter ROIs for depressed patients are given in Table 2. Compared to the healthy subject, the probability of these nodes is strongly increased for the subject with depression (Fig. 6 a, b); some of these nodes have probabilities ∼1, i.e., they play the role of a core node in nearly all binary realizations or, put differently, the corresponding weighted network favors these core nodes. The clusters of core nodes with enhanced probabilities are nearly symmetric in both hemispheres, indicating nonrandom (systematic) modifications of connectivity under depression (Fig. 6 a, b). For the group-group comparison, the patterns of the node probabilities (Fig. 6 c, d) are similar to the subject-subject case, but the probabilities for the depressed case are weakened. The core nodes for the group-subject comparison (see Fig. 5 f) are not shown; the situation is quite similar to the subject-subject case, as can be easily inferred. Summarizing, we find that depression enhances the number of core nodes and consequently the functional connectivity between the maximum k-cliques. The grey matter core ROIs under depression which are active for all comparisons are: frontal pole (1, left, right), insular cortex (2, left, right), superior frontal gyrus (3, left), middle frontal gyrus (4, left), and cingulate gyrus, anterior division (29, left); see Table 2 and Fig. 6.
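The definition of core nodes and of their probability over an ensemble can be sketched as follows. The sketch assumes that the maximum k-cliques of each binary realization have already been computed elsewhere; the function names and the toy ensemble are hypothetical and serve only to illustrate the intersection and counting steps.

```python
from collections import Counter


def core_nodes(max_k_cliques):
    """Intersection of all maximum k-cliques of one binary realization."""
    cliques = [set(c) for c in max_k_cliques]
    if not cliques:
        return set()
    return set.intersection(*cliques)


def core_node_probability(ensemble, nodes):
    """Fraction of realizations in which each node is a core node.

    `ensemble` is a list of realizations; each realization is the list of
    its maximum k-cliques (given as collections of node labels).
    """
    counts = Counter()
    for max_k_cliques in ensemble:
        counts.update(core_nodes(max_k_cliques))
    n = len(ensemble)
    return {v: counts[v] / n for v in nodes}


# Hypothetical toy ensemble with three realizations on nodes 1..5.
ensemble = [
    [{1, 2, 3}, {2, 3, 4}],  # two maximum cliques, core nodes {2, 3}
    [{2, 3, 5}],             # unique maximum clique, core nodes {2, 3, 5}
    [{1, 2, 4}, {1, 3, 4}],  # core nodes {1, 4}
]
prob = core_node_probability(ensemble, nodes=range(1, 6))
```

In this toy ensemble, nodes 2 and 3 are core nodes in two of the three realizations, so their estimated probability is 2/3.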
Applications of functional distances
An application of functional distances is performed on correlation networks based on dual-task fMRI measurements; see Data. To keep the CPU time within limits, we reduced the number of voxels in the data mask of a subject from 60,000 to 1,208, thus eliminating coefficients in the correlation matrix. This lowers the percolation thresholds p and the complexity, but an essential result in the work by Gallos et al. [22, 23], where separated fractal brain modules collapse for lower p into weakly connected nonfractal components, is still approximately valid. To compare the complexity Ndim with box-counting dimensions in such a situation, we extended the box-counting algorithm for bcm to be directly applicable to the weighted network \(\widetilde {W}_{n\times n}\) (path length = sum of weighted edges). Following the procedure of Gallos et al., the correlation network W _{ n×n } of a subject is thresholded by p in the following way: if w(i,j)<p, then w(i,j):=0. Then we apply the chain of transformations as described in ‘Methods’. The scaling factor c is calculated by
for k=10^{2}. In order to guarantee that \(c\cdot \min \{\widetilde {w}(i,j) > 0 : i,j = 1, \ldots n\} \geq 1\) and to obtain an approximating multigraph M _{ n×n } of \(c\,\widetilde {W}_{n\times n}\), k must satisfy the inequality
For the threshold p=0.885, we find two large connected components of \(\widetilde {W}_{n\times n}\) of nearly equal size (size _{1}=511 and size _{2}=499); see Fig. 7 a, b. For p=0.85, we find a collapsed large component in \(\widetilde {W}_{n\times n}\) of size = 1147; see Fig. 7 c. The corresponding binary distance graphs G _{ n×n } including maximum cliques are shown in Fig. 7 d–f. The complexities of W _{ n×n } for the two components are Ndim_{1}=1.51 and Ndim_{2}=1.76, respectively. For the large collapsed component we find Ndim=2.06. The corresponding mean topological distances of \(\widetilde {W}_{n\times n}\) are μ _{1}=35, μ _{2}=23, and μ=17. The log-log plots for bcm (weighted box sizes are averaged as in [33]) are shown in Fig. 7 g–i. The corresponding R ^{2} values are R ^{2}=0.99 for bcm _{1}=1.43 and R ^{2}=0.98 for bcm _{2}≈1.41. For the collapsed component we obtain R ^{2}=0.95; the corresponding log-log plot is too nonlinear to estimate a unique FD. These results indicate that the component of size _{1} is a fractal, the component of size _{2} may be close to a fractal, and the collapsed component is a nonfractal graph. Please note that, in contrast to [22, 23], bcm is calculated by box-covering on the weighted graph \(\widetilde {W}_{n\times n}\).
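The thresholding rule (w(i,j):=0 if w(i,j)<p) and the inspection of connected component sizes can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the small correlation matrix are hypothetical, the diagonal is ignored, and the subsequent chain of transformations from ‘Methods’ is not reproduced.

```python
from collections import deque


def threshold_weights(w, p):
    """Set w[i][j] to 0 wherever w[i][j] < p; the diagonal is set to 0."""
    n = len(w)
    return [[w[i][j] if i != j and w[i][j] >= p else 0.0 for j in range(n)]
            for i in range(n)]


def component_sizes(w):
    """Sizes of connected components (edges = positive weights), largest first."""
    n = len(w)
    seen = [False] * n
    sizes = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            size += 1
            for v in range(n):
                if not seen[v] and w[u][v] > 0:
                    seen[v] = True
                    queue.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical correlation matrix: two tight modules joined by one weaker tie.
w = [
    [1.00, 0.95, 0.90, 0.10, 0.10],
    [0.95, 1.00, 0.92, 0.20, 0.10],
    [0.90, 0.92, 1.00, 0.86, 0.30],
    [0.10, 0.20, 0.86, 1.00, 0.93],
    [0.10, 0.10, 0.30, 0.93, 1.00],
]
high = component_sizes(threshold_weights(w, 0.885))  # modules stay separated
low = component_sizes(threshold_weights(w, 0.85))    # modules merge
```

In this toy matrix the weaker tie (0.86) survives only the lower threshold, so the two modules merge into a single component, mirroring the collapse described above on a very small scale.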
Monotonicity and the maximum property
Monotonicity and the maximum property (see C3 and C4 in Background) are essential properties of topological or fractal dimensions [15, 54, 55]. We show that Ndim satisfies both criteria for fractal and nonfractal graphs in an appropriate way, just as xdim does for sets in \(\mathbb {R}^{n}\) [13]. The detailed proofs of both theorems are given in the next section. We refer to [56] for graph-theoretic definitions and notation, and to [34] for details about weighted graphs. In the following we add parameters to Ndim in order to clarify the interpretation.
Monotonicity: For induced connected subgraphs G _{1} and G _{2} of a binary or weighted connected finite graph G, we have the following implication.
where μ is the average and m is the minimum positive distance. The parameters μ and m may be derived from G or from G _{ i }. (The proofs in the next section do not depend on a specific choice of μ and m). This relation is an approximate version of Condition C3.
For infinite graphs G _{1} and G _{2} with μ _{1}<μ _{2} and μ _{1},μ _{2}→∞, the exact version of Condition 3 reads as follows:
as
Here it was assumed that all limits exist.
Maximum property: For induced connected subgraphs G _{1} and G _{2} of a binary or weighted finite connected graph G and for
where E(G;G _{1},G _{2}) denotes the set of all edges of G directly connecting vertices V of G _{1} and G _{2}, we obtain the following estimates:
for fixed μ and m. Here we set lb:= log2.
The above relation for Ndim is also called the quasi maximum property in [13], implying that the complexity Ndim of a finite graph is determined approximately by a subgraph with maximum clique cardinality. The error of this approximation reduces as the average distance μ increases, thus approaching the exact maximum property (C4) for infinite graphs G _{1}, G _{2}, G _{1}⊎G _{2} when μ _{1},μ _{2}→∞:
The existence of all limits was assumed.
Proofs
In this section, we prove the monotonicity and maximum properties of Ndim for binary and weighted connected finite graphs W. We use the transformation W → multigraph M → distance graph D → binary distance graph G, as described in Application of Ndim to weighted graphs /3, and present the proofs for the maximum cliques \({\text {cl}_{\max }} (G)\). The complexity Ndim for W is finally estimated by
where μ denotes the average distance and m the minimum positive distance derived from D. For graph theoretic definitions and notation, see [34, 56].
Monotonicity: For induced subgraphs G _{1} and G _{2} of G, we have the implication:
for connected as well as disconnected subgraphs G _{1} and G _{2}.
Proof.
If G _{1}=G _{2}, the identity is true. If, w.l.o.g., G _{1}⊂G _{2}, then the vertex cardinality \({\text {cl}_{\max }}(G_{1})\) is a lower bound for \({\text {cl}_{\max }}(G_{2})\), implying the right-hand side of (5).
Assume now that the induced connected subgraphs W _{ i }⊆W have the property that W _{1}⊆W _{2}. They are transformed to induced subgraphs G _{ i } in the distance graph G with G _{1}⊆G _{2}. By (5), we have Ndim(W _{1})≤Ndim(W _{2}), for fixed μ and m based on W.
To clarify the proof of the maximum property, a binary graph (A) with k=⌊μ⌋=2 and its binary distance graph (B) are shown in Fig. 8. The maximum 2-clique in (A) and the corresponding maximum clique in (B) have identical vertex cardinality and are indicated in red. If the weight m(11,10)=1 in (A) is modified to m(11,10)=3, we still have k=2 and the binary distance graph (B) becomes disconnected. Edge-eliminating cuts (green and blue lines) are indicated in (A) and (B). Any cut separates a graph G into a pair of disjoint induced subgraphs G _{1} and G _{2} with G=G _{1}⊎G _{2}. Note that in a graph G several maximum cliques with identical vertex cardinalities \({\text {cl}_{\max }} \) may exist. Due to their distance dependence, k-cliques of subgraphs G _{ i } may involve paths outside of G _{ i }, as seen by the green cut in (A). If the distances in (A) are artificially restricted to G _{ i }, statement (b) in the following Lemma may not be fulfilled: apply, e.g., the modification m(11,10)=3 and the green cut. Then \(\left ({\text {cl}_{\max }}(G) = 10\right) \!>\! \left ({\text {cl}_{\max }}(G_{1}) = 2\right) + \left ({\text {cl}_{\max }}(G_{2}) = 7\right)\) in (B), where the maximum clique of G _{2} consists of the vertices \(\{2,3,4,6,7,9,13\}\).
Maximum property: For induced connected subgraphs W _{1} and W _{2} of a binary or weighted finite connected graph W and for
where E(W;W _{1},W _{2}) denotes the set of all edges of W directly connecting vertices V of W _{1} and W _{2}, we obtain the following estimates:
for fixed μ and m. Here we set lb:= log2.
We prove the maximum property in several steps.
Lemma.
Assume that G _{1} and G _{2} are induced connected subgraphs of a finite binary distance graph G. If G _{1}∩G _{2}=∅ and G _{1}⊎G _{2} is connected, then we have for the vertex cardinalities the following estimates:

(a) \(\max \left \{{\text {cl}_{\max }}(G_{1}), {\text {cl}_{\max }}(G_{2})\right \} \leq {\text {cl}_{\max }}(G_{1}\uplus G_{2})\);
(b) \({\text {cl}_{\max }}(G_{1} \uplus G_{2}) \leq {\text {cl}_{\max }}(G_{1})+ {\text {cl}_{\max }}(G_{2})\).
Proof of (a): Applying the operation ⊎ to G _{1} and G _{2} implies that G _{ i }⊆G _{1}⊎G _{2}, for i=1,2; see Fig. 8 for an illustration. By monotonicity, this set containment yields that \({\text {cl}_{\max }}(G_{i}) \leq {\text {cl}_{\max }}(G_{1}\uplus G_{2})\), which gives statement (a).
Proof of (b):
1) Assume that no eliminated edge connecting G _{1} and G _{2} is contained in any of the n≥1 maximum cliques of G _{1}⊎G _{2}. Then, (b) holds as an inequality; see the green cut in Fig. 8.
2a) Assume that only one maximum clique \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\) exists and that eliminated edges connecting G _{1} and G _{2} are contained in this maximum clique \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\); see blue cut in Fig. 8. Using the notation \({\text {cl}_{\max,{i}}} : = {\text {cl}_{\max }}(G_{1}\uplus G_{2})\) restricted to G _{ i }, we have \({\text {cl}_{\max }}(G_{1}\uplus G_{2}) = {\text {cl}_{\max,{1}}}+{\text {cl}_{\max,{2}}}\) and (b) holds as an identity in case \({\text {cl}_{\max,{i}}} = {\text {cl}_{\max }}(G_{i})\), for i=1,2. On the other hand, (b) is satisfied as an inequality if \({\text {cl}_{\max,{i}}} < {\text {cl}_{\max }} (G_{i})\), for i=1 or i=2. If \({\text {cl}_{\max,{i}}}\) contains only a single vertex then \({\text {cl}_{\max,{i}}} = 1\).
2b) First assume that we have n>1 maximum cliques \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\). For a cut that eliminates edges of m (m<n) maximum cliques \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\), the inequality is true, as there exists at least one maximum clique \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\subseteq G_{i}\) for i=1 or i=2. If the edges of all n maximum cliques \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\) are removed, then we have for every maximum clique \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\) the identity \({\text {cl}_{\max }}(G_{1}\uplus G_{2}) = {\text {cl}_{\max,{1}}}+{\text {cl}_{\max,{2}}}\). Suppose now that there exists a maximum clique \({\text {cl}_{\max }}(G_{1}\uplus G_{2})\) with \({\text {cl}_{\max,{1}}} = {\text {cl}_{\max }} (G_{1})\). Then, \({\text {cl}_{\max }} (G_{2}) \geq {\text {cl}_{\max,{2}}}\) which yields (b). On the contrary, if we assume that for all n maximum cliques \({\text {cl}_{\max,{1}}} < {\text {cl}_{\max }} (G_{1})\), then inequality holds in (b). \(\square \)
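Statements (a) and (b) of the Lemma can be checked numerically on small graphs. The following brute-force sketch is an illustration only and not part of the proof: the function names and the example graph are hypothetical, cliques are evaluated in the subgraphs induced by each vertex subset of a given binary graph (playing the role of the binary distance graph G), and all vertex bipartitions are tested, including those with disconnected parts.

```python
from itertools import combinations


def max_clique_size(vertices, edges):
    """Brute-force maximum clique cardinality of the subgraph induced by `vertices`."""
    vertices = sorted(vertices)
    adjacent = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if all(frozenset(pair) in adjacent for pair in combinations(cand, 2)):
                return size
    return 0


def lemma_holds(vertices, edges):
    """Check max{cl(G1), cl(G2)} <= cl(G) <= cl(G1) + cl(G2) for all bipartitions."""
    vertices = sorted(vertices)
    whole = max_clique_size(vertices, edges)
    for r in range(1, len(vertices)):
        for part in combinations(vertices, r):
            g1 = set(part)
            g2 = set(vertices) - g1
            c1 = max_clique_size(g1, edges)
            c2 = max_clique_size(g2, edges)
            if not (max(c1, c2) <= whole <= c1 + c2):
                return False
    return True


# Hypothetical graph: a 4-clique {1,2,3,4} sharing vertex 4 with a triangle {4,5,6}.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
ok = lemma_holds(range(1, 7), edges)
```

For the cut G _{1}={1,2}, G _{2}={3,4,5,6} the three cardinalities are 2, 3 and 4, so (a) reads 3≤4 and (b) reads 4≤5.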
Now overlapping subgraphs are analyzed. Two cuts are involved to separate G _{1} and G _{2}.
Remark 1.
Statements (a) and (b) in the Lemma also hold when G _{1}∩G _{2}≠∅ for connected subgraphs G _{1} and G _{2}.
Proof of (a): Follows the same arguments as those given in the proof of the lemma.
Proof of (b): Assume w.l.o.g. that \(\widetilde {G}_{1} := (G_{1} \uplus G_{2}) \setminus G_{2}\) is connected, where \(\widetilde {G}_{1} \subset G_{1}\), \(\widetilde {G}_{1}\cap G_{2} = \emptyset \), and \(\widetilde {G}_{1}\uplus G_{2} = G_{1}\uplus G_{2}\). Then statement (b) in the Lemma holds. By monotonicity, (b) is then also true for the subgraphs G _{1} and G _{2}. If \(\widetilde {G}_{1}\) is not connected, we follow the same argumentation, applying the definition of a maximum clique for disconnected binary graphs as given in Application of Ndim to weighted graphs /3c. \(\square \)
Remark 2.
Both statements in the Lemma are also valid if G _{1} and G _{2} are connected, G _{1}∩G _{2}=∅, and G _{1}⊎G _{2} is disconnected.
Proof.
Statements (a) and (b) hold trivially as an equality, respectively, an inequality.
Remark 3.
If the induced subgraphs G _{1} and G _{2} are disconnected, then statements (a) and (b) in the Lemma are valid.
Proof.
Apply the argumentation given in the above proofs with trivial modifications.
Remark 4.
Statement (b) in the Lemma and Remarks 1, 2, or 3 imply that
Combining (a) and (b ^{′}) and taking the logarithm to base 2 yields the maximum property.
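The final step can be made explicit in a short worked sketch. It is one reading consistent with statements (a) and (b) of the Lemma, not a reproduction of the authors' displayed formulas: \(L(\mu,m) > 0\) is a placeholder introduced here for the normalizing logarithm in the definition of Ndim, and lb denotes the logarithm to base 2.

```latex
% Sketch only: L(\mu,m) > 0 is a placeholder for the normalizing logarithm
% in the definition of Ndim; it is not the authors' notation.
\begin{align*}
\text{cl}_{\max}(G_1 \uplus G_2)
  &\le \text{cl}_{\max}(G_1) + \text{cl}_{\max}(G_2)
   \le 2 \max\{\text{cl}_{\max}(G_1), \text{cl}_{\max}(G_2)\}, \\
\operatorname{lb} \text{cl}_{\max}(G_1 \uplus G_2)
  &\le 1 + \max\{\operatorname{lb} \text{cl}_{\max}(G_1),
                 \operatorname{lb} \text{cl}_{\max}(G_2)\}, \\
\max_{i=1,2} \frac{\operatorname{lb} \text{cl}_{\max}(G_i)}{L(\mu,m)}
  &\le \frac{\operatorname{lb} \text{cl}_{\max}(G_1 \uplus G_2)}{L(\mu,m)}
   \le \max_{i=1,2} \frac{\operatorname{lb} \text{cl}_{\max}(G_i)}{L(\mu,m)}
     + \frac{1}{L(\mu,m)}.
\end{align*}
```

The left inequality in the last line is statement (a) after taking logarithms. Under this reading, the additive error term \(1/L(\mu,m)\) shrinks as the normalizing logarithm grows, which matches the remark that the approximation improves as the average distance increases.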
Discussion
We introduced in this paper a new measure of complexity, Ndim, for binary and weighted graphs. Numerical applications to grids, fractal and nonfractal models, and to human brain data are complemented by the proofs of some general mathematical properties of Ndim. The term complexity can be illustrated by complete graphs of finite size with constant weights w(i,j)=c>0. In this case, Ndim = log(size)/log 2. Such graphs combine maximum complexity with maximum cost, indicating that, for a fixed size, increasing complexity may be due to a proliferation of network connectivity. The concept Ndim is closely related to box-covering dimensions for fractal graphs derived from log-log plots [10, 11]. Although Ndim, if applied to fractal graphs, can be interpreted as a fractal dimension, Ndim does not rely on log-log plots, which quickly become uninformative when hub attraction or noise destroys their linearity.
The construction of Ndim is motivated by the extended counting method proposed by Sandau and Kurz [13, 14] to calculate a fractal dimension for point sets in \(\mathbb {R}^{n}\), and relies for graphs essentially on the computation of maximum k-cliques in binary graphs or multigraphs. The use of distance-based k-cliques to count the maximum vertex cardinality of “tightly knit vertex groups” is somewhat arbitrary; diameter-based k-clubs [30] might also be reasonable candidates but would require substantially longer CPU times for some of our applications. For the measure of cohesion we used k=⌊average distance⌋, where the average distance of a graph is a measure of its compactness. The alternative k=⌊diameter/2⌋, which is closer to Sandau’s original proposal, was also tested numerically. We found that for graphs with small distances, Ndim computed with k=⌊diameter/2⌋ lost robustness when short path graphs were added randomly.
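The construction for a connected binary graph can be sketched in a few lines. The sketch rests on an assumption: it takes Ndim = log(maximum k-clique cardinality)/log(k+1) with k=⌊average distance⌋, a form inferred from the complete-graph value log(size)/log 2 and from the lower-bound rule exp(Ndim·log(k+1)) quoted in this Discussion, not copied from the Methods. The function names are hypothetical, and the clique search is a basic Bron-Kerbosch recursion suitable only for small graphs.

```python
import math
from collections import deque


def all_pairs_distances(adj):
    """Hop distances between all node pairs of a connected binary graph (BFS)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist


def max_clique_size(adj):
    """Maximum clique cardinality via Bron-Kerbosch with pivoting and a size bound."""
    best = 0

    def expand(r_size, p, x):
        nonlocal best
        if not p and not x:
            best = max(best, r_size)
            return
        if r_size + len(p) <= best:
            return  # this branch cannot beat the current best
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r_size + 1, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(0, set(adj), set())
    return best


def ndim_binary(adj):
    """Return (Ndim, k) under the assumed form log(cl_max) / log(k + 1)."""
    dist = all_pairs_distances(adj)
    nodes = list(adj)
    pair_d = [dist[u][v] for i, u in enumerate(nodes) for v in nodes[i + 1:]]
    k = math.floor(sum(pair_d) / len(pair_d))
    # k-distance graph: join nodes whose distance in the original graph is <= k
    kadj = {u: {v for v in nodes if v != u and dist[u][v] <= k} for u in nodes}
    return math.log(max_clique_size(kadj)) / math.log(k + 1), k


# Complete graph on 8 nodes: k = 1 and Ndim = log(8)/log(2).
complete8 = {i: set(range(8)) - {i} for i in range(8)}
ndim_k8, k_k8 = ndim_binary(complete8)
```

For the complete graph the sketch reproduces the value log(size)/log 2 stated above; the input must be a connected graph with at least two nodes, given as a dictionary mapping each node to the set of its neighbours.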
Fractal dimensions (FDs) should satisfy, at least approximately, a list of conditions given in Background. For graphs, the following conditions are relevant (C2–C6): FDs are invariant to multiplicative factors, FDs are extensions of topological dimensions, different FDs for self-similar fractal graphs are equivalent, and the monotonicity and maximum properties are satisfied. According to the definitions of Ndim for binary and weighted graphs, Ndim satisfies factor invariance (C2). In Ndim and manifolds, we showed that Ndim approaches the topological dimensions of grid graphs (C5) as their size increases. In addition, we quantified that Ndim converges faster than the box-covering dimension MEMB to the topological dimension of a grid. Applying cost and global efficiency to the regular lattices, we find that both topological measures of complexity tend to zero as the size increases, thereby losing complexity information; see Table 1. Models of self-similar fractal binary graphs are explored with Ndim and the box-covering dimensions MEMB and bcm in order to study the agreement between FDs for self-similar fractal graphs (C6). These models are generated iteratively and increase in size, diameter, and node degree with an increasing generation parameter g. In the limit g→∞, a box-covering dimension of the model can be calculated analytically. We found that with increasing parameter g the quantities Ndim, MEMB, and bcm approach the model dimension (see Fig. 2). CPU-time limitations prevented the investigation of graphs of size >4^{7}. Different degrees of hub attraction or noise destroy the fractality of a graph. In such cases the box-covering dimensions are no longer well defined, in contrast to Ndim, which is applicable beyond fractality (see Fig. 3 for an exploration of nonfractal models with hub attraction).
The iteration of the graph models applies an inverted renormalization process; the growth of the parameter g reduces the length scale and increases the resolution of a graph. Accordingly, the iteration process can be interpreted as an increased network parcellation. We see in Fig. 3 that for e>0.25 Ndim is not scale invariant, especially for small-world graphs (e=1); however, for fractal graphs or for graphs close to fractality (e≤0.25), Ndim is quite stable for g>5; see also Fig. 2. Studying the effect of different spatial parcellations of the human brain on graph metrics is an important clinical issue [57]; it relates to the problem of comparability of studies with different brain parcellations. It is shown in [57] that topological metrics of resting-state fMRI networks vary with the spatial parcellation scales; however, some qualitative properties, like small-worldness or scale-freeness, are stable. This differs from our results, where the metric Ndim is increasingly stable for fractal graphs. Fractal graphs were detected in the human brain by Gallos et al. [22, 23] for dual-task fMRI data and convenient thresholding (see Applications of functional distances and Fig. 7). For two different spatial brain parcellations the results were similar; see the Supplement of [22]. To our knowledge, however, a systematic investigation of the dependence of their findings on spatial brain parcellations was not performed. Studying the dependence of the complexity Ndim for real data on different brain parcellations is beyond the scope of the present paper and may be a topic of future research. At present we can only say that scale invariance of Ndim can be achieved if the spatial brain parcellation induces a network parcellation of a fractal network.
The conditions monotonicity and maximum property (C3, C4) are outlined and proven in Proofs. Monotonicity asserts that the complexity Ndim cannot decrease for graphs G _{1}, G _{2} with G _{1}⊆G _{2}. The maximum property claims that the complexity Ndim of a graph G can be estimated by the complexity of a subgraph of G. We showed that for binary and weighted, fractal and nonfractal graphs monotonicity is approximately satisfied for finite graphs. The same conclusion holds for the maximum property. These approximations improve as the average distance increases. In the limit, as the average distance approaches infinity, conditions C3 and C4 hold exactly. Conditions C3 and C4 are essential properties of any concept of dimension [15, 54, 55], be it topological or fractal. As Ndim satisfies conditions C2–C6 at least approximately, we may interpret Ndim, when applied to fractal graphs, as an FD similar to MEMB or bcm. More generally, without restricting to fractal graphs, we may interpret Ndim as a dimension measuring complexity.
Biomedical networks are frequently modeled by weighted graphs. Methods to apply Ndim to weighted graphs were presented in Application of Ndim to weighted graphs. All three methods map the weighted graph to convenient binary graphs, for which a maximum k-clique is computed. Thresholding proved to be the fastest method, but it reduces any weight w(i,j)>0 to a yes/no edge a(i,j)=1 or 0. Compared with cost and efficiency, Ndim has enhanced power to differentiate connectivity in a healthy and a depressed subject; see Fig. 4. More information about the weights is maintained in the Monte Carlo method, where w(i,j) is mapped to an ensemble of binary edges; the higher w(i,j), the more frequently a(i,j)=1 is sampled. Mapping a weighted graph to an ensemble of binary random graphs implies that the complexity of the weighted graph is described by a distribution of complexities Ndim of the corresponding binary graphs; see Fig. 5. Therefore, the difference in complexity between two weighted graphs can be assessed by significance testing, analyzing the shift between the two Ndim distributions [53]. This may be of importance for a personalized clinical analysis of the abnormal functional connectivity under depression. As Ndim is a regional measure, core nodes within the maximum k-cliques can be detected to localize ROIs which act as communication centers or hubs. Our analysis shows that the probability for such hubs is increased for subjects under depression; see Fig. 6.
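The Monte Carlo mapping of weights to an ensemble of binary edges can be sketched as follows. The sketch makes an assumption that the text does not state explicitly: weights are taken to lie in [0,1] and are used directly as the probability that a(i,j)=1 is sampled. The function names, the toy weights and the seed are hypothetical.

```python
import random


def sample_binary_realization(w, rng):
    """Draw one symmetric binary adjacency matrix.

    Assumption: edge (i, j) is set to 1 with probability w[i][j].
    """
    n = len(w)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < w[i][j]:
                a[i][j] = a[j][i] = 1
    return a


def edge_frequencies(w, n_samples, seed=0):
    """Relative frequency of a(i,j) = 1 over an ensemble of realizations."""
    rng = random.Random(seed)
    n = len(w)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_samples):
        a = sample_binary_realization(w, rng)
        for i in range(n):
            for j in range(n):
                counts[i][j] += a[i][j]
    return [[c / n_samples for c in row] for row in counts]


# Hypothetical weights: a certain edge, an absent edge, and an uncertain edge.
w = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
freq = edge_frequencies(w, n_samples=500, seed=1)
```

Each realization returned by `sample_binary_realization` is a binary graph for which Ndim can be computed; repeating this over the ensemble yields the distribution of Ndim values discussed above.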
In a third approach, using functional distances, the weighted graph was transformed to a binary k-distance graph and Ndim was calculated via the cardinality of its maximum 1-clique. We applied this method to a dual-task fMRI data set [22, 23] and compared Ndim with the box-covering dimension bcm, which was modified for a direct application to weighted graphs. For a high percolation threshold (p=0.885) we found, similar to Gallos et al. [22, 23], large connected components which are at least close to a fractal, with rather low complexities Ndim and bcm. For a lower threshold (p=0.85), the components collapse into a nonfractal network with lower average distance and higher complexity Ndim. Due to the nonlinear log-log plot for p=0.85, bcm can no longer be calculated reliably over the entire range of box sizes. This demonstrates the advantage of Ndim, which can quantify complexity beyond fractality. The limited applicability of the box-covering dimension bcm is apparent already for R ^{2} coefficients with R ^{2}<0.98; see Fig. 7.
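How a box-covering dimension and its R ^{2} value are read off a log-log plot can be sketched with an ordinary least-squares fit. This is a generic illustration, not the bcm algorithm itself: it assumes that box sizes and the corresponding minimum numbers of boxes are already available, and the function name and the synthetic power-law data are hypothetical.

```python
import math


def loglog_fit(box_sizes, box_counts):
    """Least-squares line through (log box size, log box count).

    Returns (dimension, r_squared), where dimension is the negative slope.
    Requires at least two distinct box sizes and non-constant counts.
    """
    xs = [math.log(size) for size in box_sizes]
    ys = [math.log(count) for count in box_counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic, exactly fractal scaling N_B = 1000 * l_B^(-1.5).
sizes = [1, 2, 4, 8]
counts = [1000.0 * size ** -1.5 for size in sizes]
dimension, r_squared = loglog_fit(sizes, counts)
```

For exact power-law data the fit recovers the exponent with R ^{2}=1; curvature in the log-log plot lowers R ^{2} and makes the slope depend on the range of box sizes used, which is the limitation described above.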
The weights of a network can be altered by the application of thresholding techniques or by noise. Direct thresholding is applied in Figs. 4, 5 and 7. In all cases correlation thresholding modifies the network complexity, and in Fig. 7 even the type of network: it separates strongly connected fractal subgraphs from a mixed compound. This influence of the threshold on complexity is intuitively clear: different correlation thresholds in, e.g., fMRI data focus on signals with a different degree of similarity; the lower the threshold, the higher the complexity due to monotonicity (see Fig. 4). In some studies correlation thresholding is additionally constrained by the condition of identical cost for the pair to be compared. We reparametrized the results of Fig. 4 accordingly and obtained, for cost = 0.1, 0.2, 0.3, 0.4, smaller differences between healthy and depressed patients in both efficiency and Ndim than with direct correlation thresholding; all differences of efficiency were <0.01, and the differences of Ndim were 0.08, 0.1, 0.5, and 0.58, respectively. We still find that Ndim has more informative power than efficiency. Next we look at the robustness or noise sensitivity of Ndim. To test this, the weights of the correlation graph for Fig. 4 (threshold >0.2) were modified by adding uniformly distributed noise. For w(i,j)→w(i,j)+ε(i,j), ε(i,j)∈[−w(i,j)·α,w(i,j)·α], where α∈{0.1,0.2,0.3,0.5}, Ndim changed from 3.7 to 3.7, 3.6, 3.6, and 3.3, respectively. Under the influence of such noise, Ndim is quite robust. The effect of noise on binary graphs, introduced by randomly adding connections (edges set to 1), is different; e.g., the complexity Ndim of the fractal model e=0, g=6 of Fig. 3 increases by 10, 20, 30 % if only 1, 2, 3 % random connections are added; see also [12].
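The weight perturbation used in the robustness test, w(i,j)→w(i,j)+ε(i,j) with ε(i,j) uniform in [−w(i,j)·α, w(i,j)·α], can be sketched as follows. The sketch assumes nonnegative symmetric weights and applies the same noise value to (i,j) and (j,i) so that the matrix stays symmetric; this symmetry handling, the function name and the seed are assumptions.

```python
import random


def add_relative_noise(w, alpha, seed=0):
    """Return a symmetric copy of w with uniform relative noise of amplitude alpha."""
    rng = random.Random(seed)
    n = len(w)
    out = [row[:] for row in w]
    for i in range(n):
        for j in range(i + 1, n):
            # eps is drawn uniformly from [-alpha * w(i,j), alpha * w(i,j)]
            eps = rng.uniform(-alpha * w[i][j], alpha * w[i][j])
            out[i][j] = out[j][i] = w[i][j] + eps
    return out


# Hypothetical 3 x 3 correlation weights perturbed with alpha = 0.2.
w = [
    [0.0, 0.5, 0.3],
    [0.5, 0.0, 0.8],
    [0.3, 0.8, 0.0],
]
noisy = add_relative_noise(w, alpha=0.2, seed=3)
```

Recomputing Ndim on `noisy` for several values of α and several seeds gives the kind of sensitivity check reported above.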
Computational cost (CPU time) for large networks is a problem for any algorithm solving an NP-complete problem [30]. In our calculations of Ndim for the synthetic networks of Fig. 3, CPU times depend on the type, characterized by the parameter e, and on the generation parameter g or size. For every type we find that Ndim is monotone in size. This agrees with intuitive expectation, as renormalization (transition g→g−1) induces blurring, reducing vertices and low-distance connectivities; if renormalization produces a subgraph (g−1) of the graph (g), monotonicity of Ndim could be proven formally. The worst-case (g=7, size = 16,384) CPU times for the graphs with e=1, 0.75, 0.5 are 86, 69, and 590 sec, respectively; for g=5 (size = 1,024) the CPU time is ∼1 sec in all cases. The cases e=0.25 and e=0 are critical. We find, e.g., for the case e=0.25 and g=7 a CPU time of more than 6 hours, but for g=4, 5, 6 CPU times of only 1, 1, and 10 sec. The mentioned CPU times were obtained without any constraints on the clique cardinalities; constraining the lower bound in particular can improve speed [31]. Monotonicity enables a simple rule to estimate this lower bound: exp(Ndim(g−1)·log(k(g)+1)) = lower bound of max k-clique(g). If we incorporate this information as a constraint in our calculations, we find the following CPU times: for e=0.25 and g=7, 1650 sec; for e=0 (fractals) and g=5, 6, 7, 1, 30, and 1560 sec, respectively. For fractal networks based on real data, as used in Fig. 7 a, b, we needed only ∼2 sec of CPU time without constraint. This may be due to inherent type mixing or noise in real data networks. We tested the computational cost for seven additional connectivity networks taken from the brain-connectivity-toolbox; see Data. We had to symmetrize some of the adjacency matrices using the triangle above the diagonal of the matrix, as Ndim is defined for undirected networks only. In some cases the graphs were disconnected; in such cases the largest connected component was used.
For these 7 networks complexity was calculated without constraints; CPU time was moderate in all cases, see Table 3. All CPU times were produced on a single processor (2.8 GHz) with the commercial software Mathematica 9 (constraining is implemented) and should be regarded only as rough estimates, which may be improved by faster hardware or software. A more basic improvement is possible by parallel computing, as was demonstrated in [58], where fast algorithms for the calculation of the maximum clique were developed and tested; we showed in Application of Ndim to weighted graphs that Ndim can always be calculated by a maximum clique (k=1). Applying 128 processors to the maximum clique calculation for a network, a speed-up factor of ∼20 could be achieved. Implementation of this involved method was beyond our scope.
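Two of the preprocessing steps mentioned above are simple enough to sketch: the lower-bound rule exp(Ndim(g−1)·log(k(g)+1)) for the maximum k-clique cardinality, and the symmetrization of an adjacency matrix from its upper triangle. The function names are hypothetical, and setting the diagonal to zero in the symmetrized matrix is an assumption.

```python
import math


def clique_lower_bound(ndim_prev, k):
    """Lower bound exp(Ndim(g-1) * log(k(g) + 1)) for the max k-clique cardinality at g."""
    return math.exp(ndim_prev * math.log(k + 1))


def symmetrize_upper(a):
    """Build an undirected adjacency matrix from the triangle above the diagonal."""
    n = len(a)
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s[i][j] = s[j][i] = a[i][j]
    return s


# Hypothetical values: Ndim(g-1) = 2.0 and k(g) = 3 give a bound of 4^2 = 16.
bound = clique_lower_bound(2.0, 3)

# Hypothetical directed adjacency matrix; entries below the diagonal are discarded.
directed = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
undirected = symmetrize_upper(directed)
```

The bound can be passed to a clique search as a starting value for the best known cardinality, so that branches which cannot reach it are pruned early.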
Conclusion
We presented a new measure, Ndim, to quantify complexity arising from the proliferation of edge connectivity in binary or weighted networks. Ndim is essentially determined by the cardinality of a maximum k-clique of the graph and fulfills the conditions of a dimension. These dimensional properties give Ndim a large informative power compared to cost and efficiency. In addition, for a fractal graph Ndim estimates its fractal dimension, like the recently proposed box-covering dimensions. However, box-covering dimensions cannot be calculated uniquely for fractals perturbed by noise caused by the addition of random edges, or for graphs with hub attraction such as small-world graphs. For Ndim there is no such limitation; comparisons of complexity between all types of finite graphs can be performed. These features were demonstrated for model calculations and by comparisons of functional brain connectivity for healthy and depressed subjects. Due to this informative power and flexibility, Ndim may become a useful tool in biomedical studies performing comparisons of complexity of finite networks.
References
Vazquez A. Protein interaction networks. In: Alzate O, editor. Neuroproteomics. Boca Raton: CRC Press: 2010. p. 1–14.
Jing LS, Shah FFM, Mohamed MS, Hamram NL, Salleh AHM, Deris S, et al. Database and tools for metabolic analysis. Biotech Bioproc Eng. 2014; 19:568–85.
Bullmore ED, Sporns O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci. 2009; 10:186–98.
Sporns O. From simple graphs to the connectome: Networks in neuroimaging. NeuroImage. 2012; 62:881–6.
Rubinov M, Sporns O. Complex network measures of brain connectivity: Uses and interpretations. NeuroImage. 2010; 52:1059–69.
Van Wijk BCM, Stam CJ, Daffertshofer A. Comparing brain networks of different size and connectivity density using graph theory. PLoS ONE. 2010; 5:1–13.
Constantine G. Graph complexity and the Laplacian matrix in blocked experiments. Linear Multilinear Algebra. 1990; 28(1–2):49–56.
Pudlak P, Roedl V, Savicky P. Graph complexity. Acta Inform. 1988; 25(5):515–35.
Minoli D. Combinatorial graph complexity. Atti Accad Naz Lincei Rend Cl Sci Fis Mat Nat. (8). 1975; 59(6):651–61.
Song C, Havlin S, Makse HA. Self-similarity of complex networks. Nature. 2005; 433:392–5.
Song C, Havlin S, Makse HA. Origins of fractality in the growth of complex networks. Nat Phys. 2006; 2:275–81.
Kitsak M, Havlin S, Paul G, Riccaboni M, Pammolli F, Stanley HE. Betweenness centrality of fractal and nonfractal scale-free model networks and tests on real networks. Phys Rev E. 2007; 75:1–8.
Sandau K. A note on fractal sets and the measurement of fractal dimension. Physica A. 1996; 233:1–18.
Sandau K, Kurz H. Measuring fractal dimension and complexity – an alternative approach with an application. J Microscopy. 1996; 186:164–76.
Falconer K. Fractal Geometry, Second ed. New York: Wiley & Sons; 2005.
Lopes R, Betrouni N. Fractal and multifractal analysis: A review. Med Im An. 2009; 13:634–49.
Prigarin S, Sandau K, Kazmierczak M, Hahn K. Estimation of fractal dimensions: a survey with numerical experiments and software description. Int J Biomath Biostat. 2014; 2:167–80.
Mandelbrot B. How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science. 1967; 156:636–8.
Gallos LK, Song C, Makse HA. A review of fractality and self-similarity in complex networks. Physica A. 2007; 386:686–91.
Kim JS, Goh KI, Kahng B, Kim D. Fractality and self-similarity in scale-free networks. New J Phys. 2007; 9. doi:10.1088/1367-2630/9/6/177.
Blagus N, Subelj L, Bajec M. Self-similar scaling of density in complex real-world networks. Physica A. 2012; 391:2798–802.
Gallos LK, Makse HA, Sigman M. A small world of weak ties provides optimal global integration of self-similar modules in functional brain networks. PNAS. 2012; 109:2825–30.
Gallos LK, Sigman M, Makse HA. The conundrum of functional brain networks: small-world efficiency or fractal modularity. Frontiers Phys. 2012; 3:1–9.
Lacasa L, Gomez-Gardenes J. Correlation dimension of complex networks. Phys Rev Lett. 2013; 110:1–5.
Hahn K, Sandau K, Rodenacker K, Prigarin S. Novel algorithms to measure complexity in the human brain and to detect statistically significant complexity-differences. Electronic Supplement of Journal MAGMA, vol. 19, Suppl 1: Springer Link; 2006. http://dx.doi.org/10.1007/s10334-006-0043-1.
Hahn K, Prigarin S, Rodenacker K, Sandau K. A fractal dimension for exploratory fMRI analysis. Proc. Intl. Soc. Magn. Reson. Med. 2007; 15:1858.
Prigarin S, Hahn K, Winkler G. Comparative analysis of two numerical methods to measure Hausdorff dimension of the fractional Brownian motion. Num Anal and Appl. 2008; 1:163–78.
Doyle JK, Graver JE. Mean distance in a graph. Discr Math. 1977; 17:147–54.
Goddard W, Oellermann OR. Distance in Graphs. In: Dehmer M, editor. Structural Analysis of Complex Networks. New York: Springer Verlag: 2011. p. 49–72.
Balasundaram B. Graph Theoretic Generalizations of Clique: Optimization and Extensions. PhD Thesis: Texas A&M University; 2007.
Carraghan R, Pardalos PM. An exact algorithm for the maximum clique problem. Oper Res Lett. 1990; 9:375–82.
Tomita E, Tanaka A, Takahashi H. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comp Sc. 2006; 363:28–42.
Song C, Gallos LK, Havlin S, Makse HA. How to calculate the fractal dimension of a complex network: the box-covering algorithm. J Stat Mech Theory Exp. 2007. doi:10.1088/1742-5468/2007/03/P03006.
Balakrishnan R, Ranganathan K. A Textbook of Graph Theory, Second ed. New York: Springer Verlag; 2012.
Newman MEJ. Analysis of weighted networks. Phys Rev E. 2004; 70:1–9.
Antoniou IE, Tsompa ET. Statistical analysis of weighted networks. Discret Dyn Nat Soc. 2008. doi:10.1155/2008/375452.
Alexander-Bloch AF, Gogtay N, Meunier D, Birn R, Clasen L, Lalonde F, et al. Disrupted modularity and local connectivity of brain functional networks in childhood-onset schizophrenia. Front Syst Neurosci. 2010; 4/147:1–16.
Hahn K, Myers N, Prigarin S, Rodenacker K, Kurz A, Förstl H, et al. Selectively and progressively disrupted structural connectivity of functional brain networks in Alzheimer’s disease – Revealed by a novel framework to analyze edge distributions of networks detecting disruptions with strong statistical evidence. NeuroImage. 2013; 81:96–109.
Ahnert SE, Garlaschelli D, Fink TMA, Caldarelli G. Ensemble approach to the analysis of weighted networks. Phys Rev E. 2007; 76.016101:1–5.
Ahnert SE, Garlaschelli D, Fink TMA, Caldarelli G. Applying weighted network measures to microarray distance matrices. J Phys A. 2008; 41:1–6.
Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Phys Rep. 2006; 424:175–308.
Iturria-Medina Y, Sotero RS, Canales-Rodriguez EJ, Aleman-Gomez Y, Melie-Garcia L. Studying the human brain anatomical network via diffusion-weighted MRI and Graph Theory. NeuroImage. 2008; 40:1064–76.
Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959; 1:269–71.
Cavique L, Mendes AB, Santos JMA. An Algorithm to Discover the k-Clique Cover in Networks. Lecture Notes in Computer Science. Vol. 5816: Springer Link; 2009, pp. 363–73. http://link.springer.com/chapter/10.1007%2F978-3-642-04686-5_30#page-1.
Meng C, Brandl F, Tahmasian M, Shao J, Manoliu A, Scherr M, et al. Aberrant topology of striatum’s connectivity is associated with the number of episodes in depression. Brain. 2014; 137:598–609.
Percival DB, Walden AT. Wavelet Methods for Time Series Analysis. Cambridge, UK: Cambridge University Press; 2002.
Sigman M, Jobert A, LeBihan D, Dehaene S. Parsing a sequence of brain activations at psychological times using fMRI. NeuroImage. 2007; 35:655–68.
Kaiser M, Hilgetag CC. Spatial growth of real-world networks. Phys Rev E. 2004; 69:036103.
Marcelino J, Kaiser M. Critical paths in a metapopulation model of H1N1: Efficiently delaying influenza spreading through flight cancellation. PLoS Currents Influenza. 2012; 4:e4f8c9a2e1fca8. doi:10.1371/4f8c9a2e1fca8.
Choe Y, McCormick BH, Koh W. Network connectivity analysis on the temporally augmented C. elegans web: A pilot study. Soc Neurosci Abstracts. 2004; 30:921–9.
Kötter R. Online retrieval, processing, and visualization of primate connectivity data from the CoCoMac database. Neuroinformatics. 2004; 2:127–44.
Zhang J, Wang J, Wu Q, Kuang W, Huang X, He Y, et al. Disrupted brain connectivity networks in drug-naive, first-episode major depressive disorder. Biol Psych. 2011; 70:334–42.
Brunner E, Munzel U. Nonparametric Behrens–Fisher problem: asymptotic theory and a small-sample approximation. Biom J. 2000; 42:17–25.
Aarts JM, Nishiura T. Dimensions and extensions. Amsterdam: North-Holland Publishing Co; 1993.
Edgar GA. Measure, Topology, and Fractal Geometry, Second ed. New York: Springer-Verlag; 2008.
Diestel R. Graph Theory, Fourth ed. New York: Springer-Verlag; 2010.
Fornito A, Zalesky A, Bullmore ET. Network scaling effects in graph analytic studies of human resting-state fMRI data. Front Syst Neurosci. 2010; 4/22:1–16.
Eblen JD. The Maximum Clique Problem: Algorithms, Applications and Implementations. PhD Thesis: University of Tennessee; 2010.
Acknowledgements
We thank Prof. Sandau for helpful comments and his continuing interest in the problem. We also thank Dr. Meng for permitting us to use his data and for helpful discussions.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
KH proposed the new concept of a complexity measure, performed several of the numerical tests, contributed to the proofs and the interpretation of the results, and helped to draft the manuscript. PM contributed to the proofs and to the analyses of the new advanced mathematical methods involved in this study. He contributed to drafting the manuscript and was responsible for its LaTeX formulation. SP designed and implemented numerical algorithms for the calculation of boxcovering dimensions. In addition, he used these algorithms to perform analyses of model and real data. All authors read, discussed, and finally approved the manuscript. Every author contributed substantially to the presented work; it is the result of a cooperation.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Hahn, K., Massopust, P.R. & Prigarin, S. A new method to measure complexity in binary or weighted networks and applications to functional connectivity in the human brain. BMC Bioinformatics 17, 87 (2016). https://doi.org/10.1186/s12859-016-0933-9
DOI: https://doi.org/10.1186/s12859-016-0933-9