3off2: A network reconstruction algorithm based on 2point and 3point information statistics
 Séverine Affeldt^{1, 2},
 Louis Verny^{1, 2} and
 Hervé Isambert^{1, 2}Email author
https://doi.org/10.1186/s128590150856x
© Affeldt et al. 2016
Published: 20 January 2016
Abstract
Background
The reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking of alternative Bayesian networks, or rely on the identification of structural independencies, which correspond to missing edges in the underlying network. Bayesian inference methods typically require heuristic search strategies, such as hillclimbing algorithms, to sample the superexponential space of possible networks. By contrast, constraintbased methods, such as the PC and IC algorithms, are expected to run in polynomial time on sparse underlying graphs, provided that a correct list of conditional independencies is available. Yet, in practice, conditional independencies need to be ascertained from the available observational data, based on adjustable statistical significance levels, and are not robust to sampling noise from finite datasets.
Results
We propose a more robust approach to reconstruct graphical models from finite datasets. It combines constraintbased and Bayesian approaches to infer structural independencies based on the ranking of their most likely contributing nodes. In a nutshell, this local optimization scheme and corresponding 3off2 algorithm iteratively “take off” the most likely conditional 3point information from the 2point (mutual) information between each pair of nodes. Conditional independencies are thus derived by progressively collecting the most significant indirect contributions to all pairwise mutual information. The resulting network skeleton is then partially directed by orienting and propagating edge directions, based on the sign and magnitude of the conditional 3point information of unshielded triples. The approach is shown to outperform both constraintbased and Bayesian inference methods on a range of benchmark networks. The 3off2 approach is then applied to the reconstruction of the hematopoiesis regulation network based on recent single cell expression data and is found to retrieve more experimentally ascertained regulations between transcription factors than with other available methods.
Conclusions
The novel informationtheoretic approach and corresponding 3off2 algorithm combine constraintbased and Bayesian inference methods to reliably reconstruct graphical models, despite inherent sampling noise in finite datasets. In particular, experimentally verified interactions as well as novel predicted regulations are established on the hematopoiesis regulatory networks based on single cell expression data.
Keywords
Background
Two types of reconstruction method for directed networks have been developed and applied to a variety of experimental datasets. These methods are either based on Bayesian scores [1], [2] or rely on the identification of structural independencies, which correspond to missing edges in the underlying network [3], [4].
Bayesian inference approaches have the advantage of allowing for quantitative comparisons between alternative networks through their Bayesian scores but they are limited to rather small causal graphs due to the superexponential space of possible directed graphs to sample [1], [5], [6]. Hence, Bayesian inference methods typically require either suitable prior restrictions on the structures [7], [8] or heuristic search strategies such as hillclimbing algorithms [9]–[11].
By contrast, structure learning algorithms based on the identification of structural constraints typically run in polynomial time on sparse underlying graphs. These socalled constraintbased approaches, such as the PC [12] and IC [13] algorithms, do not score and compare alternative networks. Instead they aim at ascertaining conditional independencies between variables to directly infer the Markov equivalent class of all causal graphs compatible with the available observational data. Yet, these methods are not robust to sampling noise in finite datasets as early errors in removing edges from the complete graph typically trigger the accumulation of compensatory errors later on in the pruning process. This cascading effect makes the constraintbased approaches sensitive to the adjustable significance level α, required for the conditional independence tests. In addition, traditional constraintbased methods are not robust to the order in which the conditional independence tests are processed, which prompted recent algorithmic improvements intending to achieve orderindependence [14].
In this paper, we report a novel network reconstruction method, which exploits the best of these two types of structure learning approaches. It combines constraintbased and Bayesian frameworks to reliably reconstruct graphical models despite inherent sampling noise in finite observational datasets. To this end, we have developed a robust informationtheoretic method to confidently ascertain structural independencies in causal graphs based on the ranking of their most likely contributing nodes. Conditional independencies are derived using an iterative search approach that identifies the most significant indirect contributions to all pairwise mutual information between variables. This local optimization algorithm, outlined below, amounts to iteratively subtracting the most likely conditional 3point information from 2point information between each pair of nodes. The resulting network skeleton is then partially directed by orienting and propagating edge directions, based on the sign and magnitude of the conditional 3point information of unshielded triples. Identifying structural independencies within such a maximum likelihood framework circumvents the need for adjustable significance levels and is found to be more robust to sampling noise from finite observational data, even when compared to constraintbased methods intending to resolve the orderdependence on the variables [14].
Constraintbased methods
Constraintbased approaches, such as the PC [12] and IC [13] algorithms, infer causal graphs from observational data, by searching for conditional independencies among variables. Under the Markov and Faithfulness assumptions, these algorithms return a Complete Partially Directed Acyclic Graph (CPDAG) that represents the Markov equivalent class of the underlying causal structure [3], [4]. They proceed in three steps detailed in Algorithm 1:

1) inferring unnecessary edges and associated separation sets to obtain an undirected skeleton.

2) orienting unshielded triples as vstructures if their middle node is not in the separation set (R _{0}).

3) propagating as many orientations as possible following propagation rules (R _{1 − 3}), which prevents the orientation of additional vstructures (R _{3}) and directed cycles (R _{2 − 3}) [15].
However, as previously stated, the sensitivity of the constraintbased methods to the adjustable significance level α used for the conditional independence tests and to the order in which the variables are processed (step 1) favors the accumulation of errors when the search procedure relies on finite observational data.
In this paper, we aim at improving constraintbased methods, Algorithm 1, by uncovering the most reliable conditional independencies supported by the (finite) available data, based on a quantitative information theoretic framework.
Maximum likelihood methods
Note, in particular, that the significance level of the Maximum likelihood approach is set by the number N of independent observational data points, as detailed in the Methods Section below.
Methods
Information theoretic framework
Inferring isolated vstructures vs nonvstructures from 3point and 2point information
As long recognized in the field [17], [18], 3point information, I(x;y;z), can be positive or negative (if I(x;y)<I(x;yz)), unlike 2point mutual information, which are always positive, $I(x;y)\ge 0$.
Hence, a significantly negative 3point information, I(x;y;z)<0, implies that a vstructure is more likely than a nonvstructure given the observed correlation data. Conversely, a significantly positive 3point information, I(x;y;z)>0, implies that a nonvstructure model is more likely than a vstructure model.
Note, in particular, that choosing the base with the lowest mutual information is consistent with the Data Processing Inequality expected for nonvstructures, Fig. 1b–d.
Hence, combining 3point and 2point information allows to determine the likelihood and the base of isolated vstructures versus nonvstructures. But how to extend such simple results to identify local vstructures and nonvstructures embedded within an entire graph $\mathcal{G}$?
Inferring embedded vstructures vs nonvstructures from conditional 3point and 2point information
To go from isolated to embedded vstructures and nonvstructures within a DAG $\mathcal{G}$, we will consider the Markov equivalent CPDAG of $\mathcal{G}$ and introduce generalized vstructures and nonvstructures, Fig. 1e–h. We will demonstrate that their relative likelihood, given the available observational data, can be estimated from the sign and magnitude of a conditional 3point information, I(x;y;z{u _{ i } }), Eq. 11. This will extend our initial result valid for isolated vstructures and nonvstructures, Eq. 7.
Let’s consider a pair of nonneighbor nodes x,y with a set of upstream nodes {u _{ i } }_{ n } , where each node u _{ i } has at least one direct connection to x (u _{ i } →x) or y (u _{ i } →y) or to another upstream node u _{ j } ∈{u _{ i } }_{ n } (u _{ i } →u _{ j } ) or only undirected links to these nodes (u _{ i } −x, u _{ i } −y or u _{ i } −u _{ j } ). Thus, given x,y and a set of upstream nodes {u _{ i } }_{ n } , any additional node zcan either be:

i) at the apex of a generalized vstructure, if all existing connections between x, y, {u _{ i } }_{ n } and z are directed and point towards z, Fig. 1e, or else,

ii) z has at least one undirected link with x, y or one of the upstream nodes u _{ i } (z−x, z−y or z−u _{ i } ) or at least one directed link pointing towards these nodes (z→x, z→y or z→u _{ i } ), Fig. 1f–h. In such a case, z might contribute to the mutual information I(x;y) and should be included in the set of upstream nodes {u _{ i } }_{ n } , thereby defining a generalized nonvstructure, Figs. 1f–h.
where I(x;y{u _{ i } }) is the conditional mutual information between x and y given {u _{ i } }, I(x;y{u _{ i } })=H(x{u _{ i } })+H(y{u _{ i } })−H(x,y{u _{ i } })−H({u _{ i } }).
where we introduced the conditional 3point information, I(x;y;z{u _{ i } })=I(x;y{u _{ i } })−I(x;yz,{u _{ i } }).
Hence, a significantly negative conditional 3point information, I(x;y;z{u _{ i } })<0, implies that a generalized vstructure is more likely than a generalized nonvstructure given the available observational data. Conversely, a significantly positive conditional 3point information, I(x;y;z{u _{ i } })>0, implies that a generalized nonvstructure model is more likely than a generalized vstructure model.
Yet, as the conditional 3point information, I(x;y;z{u _{ i } }), is in fact invariant upon permutations between x,y and z, it cannot indicate how to orient embedded vstructures or nonvstructures over the xyz triple, as already noted in the case of isolated vstructures and nonvstructures, above.
where we have used the chain rule, I(x;z,{u _{ i } }y)=I(x;z{u _{ i } },y)+I(x;{u _{ i } }y), before adding I(x;y;z,{u _{ i } }) on each side of the inequality. The corresponding inequality holds between I(x;y) and I(z,{u _{ i } };y), implying the Data Processing Inequality.
Finite size corrections of maximum likelihood
where I(x;y{Pa_{ y } }_{∖}_{ x } )=H(y{Pa_{ y } }_{∖}_{ x } )−H(y{Pa_{ y } }).
where r _{ x } ,r _{ y } and ${r}_{{u}_{i}}$ are the number of levels of the corresponding variables. The MDL complexity, Eq. 18, is simply related to the normalisation constant of the distribution reached in the asymptotic limit of a large dataset N→∞ (Laplace approximation). However, this limit distribution is only reached for very large datasets in practice.
Alternatively, the normalisation of the maximum likelihood can also be done over all possible datasets including the same number of data points to yield a (universal) Normalized Maximum Likelihood (NML) criteria [21], [22] and its decomposable [23], [24] and xysymmetric version, ${k}_{x;y\left\right\{{u}_{i}\}}^{{}^{\mathit{\text{NML}}}}$, defined in the Appendix.
However, unlike 3point information, I(x;y;z{u _{ i } }), 3point complexities are always positive, ${k}_{x;y;z\left\right\{{u}_{i}\}}>0$, provided that there are at least two levels for each implicated node ℓ∈x,y,z,{u _{ i } }, i.e. ${r}_{\ell}\ge 2$.
Hence, given a finite dataset, a significantly negative conditional 3point information, corresponding to I ^{′}(x;y;z{u _{ i } })<0, implies that a vstructure x→z←y is more likely than a nonvstructure provided that the structural independency, x ⊥ ⊥ y{u _{ i } }, is also confidently established as, I ^{′}(x;y{u _{ i } })<0. By contrast, a significantly positive conditional 3point information corresponds to I ^{′}(x;y;z{u _{ i } })>0 and implies that a nonvstructure model is more likely than a vstructure model, given the available observational data.
Probability estimate of indirect contributions to mutual information
and z can be iteratively added to the set of contributing nodes (i.e. {u _{ i } }←{u _{ i } }+z) of the top link x y=argmax_{ xy } R(x y;z{u _{ i } }) to progressively recover the most significant indirect contributions to all pairwise mutual information in a causal graph, as outlined below.
Robust inference of conditional independencies using the 3off2 scheme
The previous results can be used to provide a robust inference method to identify conditional independencies and, hence, reconstruct the skeleton of underlying causal graphs from finite available observational data. The approach follows the spirit of constraintbased methods, such as the PC or IC algorithms, but recovers conditional independencies following an evolving ranking of the network edges, R(x y;z{u _{ i } }), defined in Eq. 30.
3off2 algorithm
The 3off2 scheme can be used to devise a twostep algorithm (see Algorithm 2), inspired by constraintbased approaches, to first reconstruct network skeleton (Algorithm 2, step 1) before combining orientation and propagation of edges in a single step based on likelihood ratios (Algorithm 2, step 2).
Reconstruction of network skeleton
The 3off2 scheme will first be applied to iteratively remove edges with maximum positive contributions, I ^{′}(x;y;u _{ k } {u _{ i } }_{ k }_{−1})>0, corresponding to the most likely generalized nonvstructures (Eq. 23), while minimizing simultaneously the remaining 2point information, I ^{′}(x;y{u _{ i } }_{ k } ) (Eq. 24), consistently with the data processing inequality. Such 3off2 scheme (Algorithm 2, step 1) will therefore progressively lower the conditional 2point information terms, I ^{′}(x;y)>⋯>I ^{′}(x;y{u _{ i } }_{ k }_{−1})>I ^{′}(x;y{u _{ i } }_{ k } ) and might ultimately result in the removal of the corresponding edge, xy, but only when a structural independency is actually found, i.e. I ^{′}(x;y{u _{ i } }_{ n } )<0, as in constraintbased algorithms for a given significance level α. Yet, the skeleton obtained with the 3off2 scoring approach is expected to be more robust to finite observational data than the skeleton obtained with PC or IC algorithms, as the former results only from statistically significant 3point contributions, I ^{′}(x;y;u _{ k } {u _{ i } }_{ k }_{−1})>0, based on their quantitative 3off2 ranks, R(x y;u _{ k } {u _{ i } }_{ k }_{−1}).
The best results on benchmark networks using these quantitative 3off2 ranks are obtained with the NML score (see Results and discussion Section below). The MDL score leads to equivalent results, as expected, in the limit of very large datasets (see Appendix). However, with smaller datasets, the most reliable results with the MDL score are obtained using nonshifted instead of shifted 2point and 3point information terms in the 3off2 rank of individual edges, Eq. 30. This is because the MDL complexity tends to underestimate the importance of edges between nodes with many levels (see Appendix). For finite datasets, it easily leads to spurious conditional independencies, I ^{′}(x;y{u i})<0, when using shifted 2point and 3point information, Eq. 33, whereas using nonshifted information in the 3off2 ranks (Eq. 30) tends to limit the number of false negatives as early errors in {u _{ i } } can only increase $I(x;y\left\{\mathit{\text{ui}}\right\})\ge 0$, in the end, in Eq. 34.
Orientation of network skeleton
The skeleton and the separation sets resulting from the 3off2 iteration step (Algorithm 2, step 1) can then be used to orient the edges and to propagate orientations to the unshielded triples. However, while the constraintbased methods distinguish the vstructures orientation step (Algorithm 2, step 2) from the propagation procedure (Algorithm 1, step 3), the 3off2 algorithm intertwines these two steps based on the respective likelihood scores of individual vstructures and nonvstructures (Algorithm 2, step 2).
As stated earlier, the magnitude and sign of the conditional 3point information, I(x;y;z{u _{ i } }) (or equivalently the shifted 3point information, Eq. 23), indicate if a non vstructure is more likely than a vstructure. Hence, all the unshielded triples can be ranked by the absolute value of their conditional 3point information, that is, in decreasing order of their likelihood of being either a vstructure or a nonvstructure. As detailed in the step 2 of Algorithm 2, the most likely vstructure is used to set the first orientations, following R _{0} orientation rule. The possible propagations are then performed, following R _{1} propagation rule, starting from the unshielded triple having the most positive conditional 3point information. The following most likely vstructure is considered when no further propagation is possible on unshielded triples with greater absolute 3point information. If conflicting orientations arise (such as a→b←c & b→c←d), the less likely vstructure and its possible propagations are ignored.
Note that we only implement the R _{0} and R _{1} propagation rules, which are applied in decreasing order of likelihood. In particular, we do not consider propagation rules R _{2} and R _{3} which are not associated to likelihood scores but enforce the hypothesis of acyclic constraint.
As for the 3off2 skeleton reconstruction, the orientation/propagation step of 3off2 allows for a robust discovery of orientations from finite observational data as it relies on a quantitative framework of likelihood ratios taken in decreasing order of their statistical significance. During this step, 3off2 recovers and propagates as many orientations as possible in an iterative procedure following the decreasing ranks of the unshielded triples based on the absolute value of their conditional 3point information, I ^{′}(x;y;z{u _{ i } }).
Results and discussion
Tests on benchmark graphs
We also define additional Precision, Recall and Fscores taking into account the edge orientations of the predicted networks against the corresponding CPDAG of the benchmark networks. This amounts to label as false positives, all true positive edges of the skeleton with different orientation/nonorientation status as the CPDAG reference, T P _{misorient}, leading to the orientationdependent definitions T P ^{′}=T P−T P _{misorient} and F P ^{′}=F P+T P _{misorient} with the corresponding CPDAG Precision, Recall and Fscores taking into account edge orientations.
The alternative inference methods used for comparison with 3off2 are the PC algorithm [12] implemented in the pcalg package [25], [26] and Bayesian inference using the hillclimbing heuristics implemented in the bnlearn package [27]. In addition, we also compare the skeleton of 3off2 to the unoriented output of Aracne [28], an informationbased inference approach, which iteratively prunes links with the weakest mutual information based on the Data Processing Inequality. We have used the Aracne implementation of the minet package [29]. For each sample size, 3off2, Aracne, PC and the Bayesian inference methods have been tested on 50 replicates. Figures 2, 3, 4, 5 and 6 give the average results over these multiple replicates when comparing the CPDAG (solid lines) of the reconstructed network (or its skeleton, dashed lined) to the CPDAG (or the skeleton) of the benchmark network.
For each method, the plots presented in Figs. 2, 3, 4, 5 and 6 are those obtained for the parameters that give overall the best results over the five reconstructed benchmark networks (see Additional file 1, Figures S1S20). In particular, we used the stable implementation of the PC algorithm, as well as the majority rule for the orientation and propagation steps [14]. PC’s results are shown on Figs. 2, 3, 4, 5 and 6 for α=0.1. Decreasing α tends to improve the skeleton Precision at the expense of the skeleton Recall, leading in fact to worse skeleton Fscores for finite datasets, e.g. N≤1000 (see Additional file 1, Figures S1S5). The same trend is observed for CPDAG Fscores taking into account edge orientations, with best CPDAG scores at small sample sizes, obtained for larger α, e.g. N≤1000. Aracne threshold parameters for minimum difference in mutual information is set to ε=0, as small positive values typically worsen Fscores (see Additional file 1, Figures S6S10). Bayesian inference are obtained using BIC/MDL scores and hillclimbing heuristics with 100 random restarts [9] (see Additional file 1, Figures S11S15). Finally, the best 3off2 network reconstructions are obtained using NML scores with shifted 2point and 3point information terms in the rank of individual edges, see Methods. Using MDL scores, instead, leads to equivalent results, as expected, in the limit of very large datasets (see Appendix). However, with smaller datasets, the most reliable results with MDL scores are obtained using nonshifted instead of shifted 2point and 3point information terms in the 3off2 rank of individual edges, as discussed in Methods (see Additional file 1, Figures S16S20).
All in all, we found that the 3off2 inference approach typically reaches better or equivalent Fscores for all dataset sizes as compared to all other tested methods, i.e. Aracne, PC and Bayesian inference, as well as the MaxMin HillClimbing (MMHC) hybrid method [30] (see Additional file 1, Figures S21S25). This is clearly observed both on the skeletons (Figs. 2, 3, 4, 5 and 6 dashed lines) and even more clearly when taking the predictions of orientations into account (Figures 2, 3, 4, 5 and 6 solid lines).
Applications to the hematopoiesis regulation network
The reconstruction or reverseengineering of real regulatory networks from actual expression data has already been performed on a number of biological systems (see e.g. [28], [31]–[33]). Here, we apply the 3off2 approach on a real biological dataset related to hematopoiesis. Transcription factors play a central role in hematopoiesis, from which derive the blood cell lineages. As suggested in previous studies, changes in the regulatory interactions among transcription factors [34] or their overexpression [35] might be involved in the development of Tacute lymphoblastic leukaemia (TALL). The key role of the hematopoiesis and the potentially serious consequences of its disregulations emphasize the need to accurately establish the complex interactions between the transcription factors involved in this critical biological process.
Interactions reconstructed by 3off2 and alternative methods for a subnetwork of hematopoiesis regulation. → indicates a successfully recovered interaction including its direction as reported in the literature (see References). $\nRightarrow $ corresponds to a successfully recovered interaction, however, with an opposite direction as reported in the literature. ⌿ stipulates that no direct regulatory interaction has been inferred, while — corresponds to an undirected link. Note in particular that Aracne does not infer edge direction. See Additional file 1: Table S1 for supplementary statistics
11 known Regulatory  3off2  PC  PC  MMHC  MMHC  Bayes hc  Bayes hc  Aracne  

interactions  References  NML  α = 10 ^{ −1 }  α = 10 ^{ −2 }  BDe  BIC  BDe  BIC  ε=0 
Gata2 → Gfi1b  [36]  →  $\nRightarrow $  —  ⌿  ⌿  →  ⌿  ⌿ 
Gfi1 → Gata2  [36]  →  →  —  →  $\nRightarrow $  →  $\nRightarrow $  — 
Gfi1b $\leftrightarrows $ Gfi1  [36]  $\nRightarrow $  $\nRightarrow $  —  $\nRightarrow $  $\nRightarrow $  $\nRightarrow $  $\nRightarrow $  — 
Gfi1 → PU.1  [37]  →  →  ⌿  ⌿  ⌿  →  →  — 
Lyl1 → Gfi1  [38]  →  $\nRightarrow $  ⌿  ⌿  ⌿  →  $\nRightarrow $  — 
Ldb1 → Meis1  [39]  →  ⌿  ⌿  ⌿  ⌿  $\nRightarrow $  ⌿  ⌿ 
Ldb1 → Lyl1  [39]  →  ⌿  ⌿  ⌿  ⌿  ⌿  ⌿  ⌿ 
Erg → Lyl1  [40]  →  $\nRightarrow $  —  →  →  →  $\nRightarrow $  — 
Gata2 → Scl  [40]  $\nRightarrow $  →  —  →  →  →  →  — 
Gfi1b → Meis1  [41]  $\nRightarrow $  $\nRightarrow $  —  →  →  →  →  — 
Gata1 → Gata2  [42]  $\nRightarrow $  $\nRightarrow $  —  →  →  →  →  — 
Correct edges (out of 11)  (→/$\nRightarrow $/—)  11  9  7  6  6  10  8  8 
 Correct orientations  (→)  7  3  0  5  4  8  4  0 
 Mis/nonorientations  ($\nRightarrow $/ —)  4  6  7  1  2  2  4  8 
Missing links  (⌿)  0  2  4  5  5  1  3  3 
3off2 uncovers all 11 interactions for which specific experimental evidence has been reported in the literature (Fig. 7, red links: known activations; blue links: known repressions) as well as 30 additional links (Fig. 7, grey links: unknown regulatory interactions). By contrast, randomization of the actual data across samples for each TF leads to only 5.25 spurious interactions on average between the 18 TFs, instead of the 41 inferred edges from the actual data, and 1.62 spurious interactions on average, instead of the 16 interactions predicted among the 10 TFs involved in known regulatory interactions, Fig. 7. This suggests that around 10–13 % of the predicted edges might be spurious, due to inevitable sampling noise in the finite dataset. In particular, the 3off2 inference approach successfully recovers the relationships of the regulatory triad between Gata2, Gfi1b and Gfi1 as described in [36] and reports correct orientations for the edges involving Gata2 (Gfi1b and Gfi1 crossregulate in fact one another [36], Table 1). The network reconstructed by 3off2 also correctly infers the regulations of PU.1 by Gfi1 [37], Gfi1 by Lyl1 [38], Meis1 by Ldb1 [39], and the regulations of Lyl1 by Ldb1 [39] and Erg [40]. Finally, the interactions (Gata2 −SCL) [40], (Gfi1b −Meis1) [41] and (Gata1 −Gata2) [42] are correctly inferred, however, with opposite directions as reported in the literature. Yet, overall 3off2 outperforms most of the other methods tested for the reconstruction of the hematopoietic regulatory subnetwork (Table 1 and Additional file 1: Table S1). Only the Bayesian hillclimbing method using a BDe score leads to comparable results by retrieving 10 out of 11 interactions and correctly orienting 8 of them. These encouraging results from the 3off2 reconstruction method on experimentally proven regulatory interactions (red edges in Fig. 7) could motivate further investigations on novel regulatory interactions awaiting to be tested for their possible role in hematopoiesis (e.g. grey edges in Fig. 7).
Conclusions
In this paper, we propose to improve constraintbased network reconstruction methods by identifying structural independencies through a robust quantitative scorebased scheme limiting the accumulation of early FN errors and subsequent FP compensatory errors. In brief, 3off2 relies on information theoretic scores to progressively uncover the best supported conditional independencies, by iteratively “taking off” the most likely indirect contributions of conditional 3point information from every 2point (mutual) information of the causal graph.
Earlier hybrid methods have also attempted to improve network reconstruction by combining the concepts of constraintbased approaches with the robustness of Bayesian scores [30], [43]–[45]. In particular [43], have proposed to exploit an intrinsic weakness of the PC algorithm, its sensitivity to the order in which conditional independencies are tested on finite data, to rank these different orderdependent PC predictions with Bayesian scores. More recently [30], have also combined constraintbased and Bayesian approaches by first identifying both parents and children of each node of the underlying graphical model and then performing a greedy Bayesian hillclimbing search restricted to the identified parents and children of each node. This MaxMin HillClimbing (MMHC) approach tends to have a high precision in terms of skeleton but a more limited sensibility, leading overall to lower skeleton and CPDAG Fscores than 3off2 and Bayesian hill climbing methods on the same benchmark networks, Figures S21S25. Interestingly, however, the MMHC approach is among the fastest network reconstruction approaches, Figure S26, allowing for scalability to large network sizes [30].
The 3off2 algorithm is expected to run in polynomial time on typical sparse causal networks with low indegree, just like constraintbased algorithms.However, in practice and despite the additional computation of conditional 2point and 3point information terms, we found that the 3off2 algorithm runs typically faster than constraintbased algorithms for large enough samples, by avoiding the cascading accumulation of errors that inflate the combinatorial search of conditional independencies in traditional constraintbased approaches. Instead, we found that 3off2 running time displays a similar trend as Bayesian hillclimbing heuristic methods, Figs. 2, 3, 4, 5 and 6.
All in all, the main computational bottleneck of the present 3off2 scheme pertains to the identification of the best contributing nodes at each iteration. In the future, it could be interesting to investigate whether a more stochastic version of this 3off2 method, based on choosing one significant conditional 3point information instead of the best one, might simultaneously accelerate the network reconstruction and circumvent possible locally trapped suboptimal predictions through stochastic resampling.
Finally, another perspective for practical applications will be to include the possibility of latent variables and bidirected edges in reconstructed networks.
Appendix
Complexity of graphical models
where r _{ x } ,r _{ y } and r _{ j } are the number of levels of each variable, x, y and j. The MDL complexity, Eq. 40, is simply related to the normalisation constant reached in the asymptotic limit of a large dataset N→∞ (Laplace approximation). The MDL complexity can also be derived from the Stirling approximation on the Bayesian measure [46], [47]. Yet, in practice, this limit distribution is only reached for very large datasets, as some of the leastlikely $({r}_{y}1)\prod _{j}{r}_{j}$ combinations of states of variables are in fact rarely (if ever) sampled in typical finite datasets. As a result, the MDL complexity criteria tends to underestimate the relevance of edges connecting variables with many levels, r _{ i } , leading to the removal of false negative edges.
Note, in particular, that the MDL complexity term in Eq. 54 is readily obtained from Eq. 41 due to the Markov equivalence of the MDL score, corresponding to its xysymmetry whenever {Pa_{ y } }_{∖}_{ x } ={Pa_{ x } }_{∖}_{ y } . By contrast, the factorized NML score, Eq. 43, is not a Markovequivalent score (although its nonfactorized version, Eq. 42, is Markov equivalent by definition). To circumvent this nonequivalence of factorized NML score, we propose to recover the expected xysymmetry of ${k}_{x;y\left\right\{{u}_{i}\}}^{{}^{\mathit{\text{NML}}}}$ through the simple xysymmetrization of Eq. 44, leading to Eq. 55.
Publication costs
Publication costs for this article were funded by the Région IledeFrance.
Declarations
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 2, 2016: Bringing Maths to Life (BMTL). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements.
Additional file
Declarations
Acknowledgements
S.A. acknowledges a PhD fellowship from the Ministry of Higher Education and Research and support from Fondation ARC pour la recherche sur le cancer. L.V. acknowledges a PhD fellowship from the Région IledeFrance (DIM Institut des Systèmes Complexes) and H.I. acknowledges funding from CNRS, Institut Curie, Foundation PierreGilles de Gennes and Région IledeFrance.
Authors’ Affiliations
References
 Cooper GF, Herskovits E: A bayesian method for the induction of probabilistic networks from data. Mach Learn. 1992, 9 (4): 30947.Google Scholar
 Heckerman D, Geiger D, Chickering DM: Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Mach Learn. 1995, 20 (3): 197243.Google Scholar
 Spirtes P, Glymour C, Scheines R: Causation, Prediction, and Search. 2000, MIT press, Cambridge, MAGoogle Scholar
 Pearl J. Causality: Models, Reasoning and Inference, 2nd edn: Cambridge University Press; 2009.Google Scholar
 Chickering DM: Learning equivalence classes of bayesiannetwork structures. J Mach Learn Res. 2002, 2: 44598.Google Scholar
 Friedman N, Koller D: Being bayesian about network structure. a bayesian approach to structure discovery in bayesian networks. Mach Learn. 2003, 50 (1–2): 95125. 10.1023/A:1020249912095.View ArticleGoogle Scholar
 Koivisto M, Sood K: Exact bayesian structure discovery in bayesian networks. J Mach Learn Res. 2004, 5: 54973.Google Scholar
 Silander T, Myllymaki P: A simple approach for finding the globally optimal bayesian network structure. Proceedings of the TwentySecond Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI06). 2006, AUAI Press, Arlington, VirginiaGoogle Scholar
 Chickering DM, Geiger D, Heckerman D. Learning Bayesian networks: Search methods and experimental results. In: Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics: 1995. p. 112–28.Google Scholar
 Bouckaert RR: Properties of bayesian belief network learning algorithms. Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence. UAI’94. 1994, Morgan Kaufmann Publishers Inc., San Francisco, CA, USAGoogle Scholar
 Friedman N, Nachman I, Pe’er D: Learning bayesian network structure from massive datasets: The “sparse candidate”; algorithm. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. UAI’99. 1999, Morgan Kaufmann Publishers Inc., San Francisco, CA, USAGoogle Scholar
 Spirtes P, Glymour C: An algorithm for fast recovery of sparse causal graphs. Soc Sci Comput Rev. 1991, 9: 6272. 10.1177/089443939100900106.View ArticleGoogle Scholar
 Pearl J, Verma T: A theory of inferred causation. In Knowledge Representation and Reasoning: Proc. of the Second Int. Conf. 1991, Morgan Kaufmann, San Mateo, CAGoogle Scholar
 Colombo D, Maathuis MH: Orderindependent constraintbased causal structure learning. J Mach Learn Res. 2014, 15: 3741782.Google Scholar
 Meek C: Causal inference and causal explanation with background knowledge. Proceedings of Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, QU. 1995, Morgan Kaufmann, San Francisco, CAGoogle Scholar
 Sanov IN: On the probability of large deviations of random variables. Mat Sbornik. 1957, 42: 1144.Google Scholar
 McGill WJ: Multivariate information transmission. Trans IRE Prof Group on Inf Theory (TIT). 1954, 4: 93111. 10.1109/TIT.1954.1057469.View ArticleGoogle Scholar
 Han TS: Multiple mutual informations and multiple interactions in frequency data. Inf Control. 1980, 46 (1): 2645. 10.1016/S00199958(80)904787.View ArticleGoogle Scholar
 Rissanen J: Modeling by shortest data description. Automatica. 1978, 14: 46571. 10.1016/00051098(78)900055.View ArticleGoogle Scholar
 Hansen MH, Yu B: Model selection and the principle of minimum description length. J Am Stat Ass. 2001, 96: 74674. 10.1198/016214501753168398.View ArticleGoogle Scholar
 Shtarkov YM: Universal sequential coding of single messages. Probl Inf Transm (Translated from). 1987, 23 (3): 317.Google Scholar
 Rissanen J, Tabus I: Kolmogorov’s structure function in mdl theory and lossy data compression. Adv. Min. Descrip. Length Theory Appl. 2005, MIT Press, Cambridge, MAGoogle Scholar
 Kontkanen P, Myllymäki P: A lineartime algorithm for computing the multinomial stochastic complexity. Inf Process Lett. 2007, 103 (6): 22733. 10.1016/j.ipl.2007.04.003.View ArticleGoogle Scholar
 Roos T, Silander T, Kontkanen P, Myllymäki P. Bayesian network structure learning using factorized nml universal models. In: Proc. 2008 Information Theory and Applications Workshop (ITA2008). IEEE Press: 2008.Google Scholar
 Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P: Causal inference using graphical models with the r package pcalg. J Stat Soft. 2012, 47 (11): 126. 10.18637/jss.v047.i11.View ArticleGoogle Scholar
 Kalisch M, Bühlmann P: Robustification of the pcalgorithm for directed acyclic graphs. J Comput Graph Stat. 2008, 17 (4): 77389. 10.1198/106186008X381927.View ArticleGoogle Scholar
 Scutari M: Learning Bayesian Networks with the bnlearn R Package. J Stat Soft. 2010, 35 (3): 122. 10.18637/jss.v035.i03.View ArticleGoogle Scholar
 Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, et al: ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinforma. 2006, 7 (Suppl 1): 710.1186/147121057S1S7.View ArticleGoogle Scholar
 Meyer PE, Lafitte F, Bontempi G: minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinforma. 2008, 9: 46110.1186/147121059461.View ArticleGoogle Scholar
 Tsamardinos I, Brown LE, Aliferis CF: The MaxMin HillClimbing Bayesian Network Structure Learning Algorithm. Mach Learn. 2006, 65 (1): 3178. 10.1007/s1099400668897.View ArticleGoogle Scholar
 Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP: Causal proteinsignaling networks derived from multiparameter singlecell data. Science. 2005, 308 (5721): 52310.1126/science.1105809.View ArticlePubMedGoogle Scholar
 Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles. Mol Syst Biol. 2007, 3: 7810.1038/msb4100120.View ArticlePubMedPubMed CentralGoogle Scholar
 Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, et al: A yeast synthetic network for in vivo assessment of reverseengineering and modeling approaches. Cell. 2009, 137 (1): 17281. 10.1016/j.cell.2009.01.055.View ArticlePubMedGoogle Scholar
 Oram SH, Thoms JAI, Pridans C, Janes ME, Kinston SJ, Anand S, et al: A previously unrecognized promoter of lmo2 forms part of a transcriptional regulatory circuit mediating lmo2 expression in a subset of tacute lymphoblastic leukaemia patients. Oncogene. 2010, 29: 57965808. 10.1038/onc.2010.320.View ArticlePubMedGoogle Scholar
 Cleveland S, Smith S, Tripathi R, Mathias E, Goodings C, Elliott N, et al: Lmo2 induces hematopoietic stem cell like features in tcell progenitor cells prior to leukemia. Stem Cells. 2013, 31 (4): 88294. 10.1002/stem.1345.View ArticlePubMedPubMed CentralGoogle Scholar
 Moignard V, Macaulay I, Swiers G, Buettner F, Schütte J, CaleroNieto F, et al: Characterization of transcriptional networks in blood stem and progenitor cells using highthroughput singlecell gene expression analysis. Nat Cell Biol. 2013, 15: 36372. 10.1038/ncb2709.View ArticlePubMedPubMed CentralGoogle Scholar
 Spooner CJ, Cheng JX, Pujadas E, Laslo P, Singh H: A recurrent network involving the transcription factors pu.1 and gfi1 orchestrates innate and adaptive immune cell fates. Immunity. 2009, 31 (4): 57686. 10.1016/j.immuni.2009.07.011.View ArticlePubMedPubMed CentralGoogle Scholar
 Zohren F, Souroullas G, Luo M, Gerdemann U, Imperato M, et al: The transcription factor lyl1 regulates lymphoid specification and the maintenance of early t lineage progenitors. Nat Immunol. 2012, 13 (8): 7619. 10.1038/ni.2365.View ArticlePubMedPubMed CentralGoogle Scholar
 Li L, Jothi R, Cui K, Lee J, Cohen T, M. Gorivodsky IT, et al: Nuclear adaptor ldb1 regulates a transcriptional program essential for the maintenance of hematopoietic stem cells. Nat Immunol. 2011, 12: 129136. 10.1038/ni.1978.View ArticlePubMedGoogle Scholar
 Chan WYI, Follows GA, Lacaud G, Pimanda JE, Landry JR, Kinston S, et al: The paralogous hematopoietic regulators lyl1 and scl are coregulated by ets and gata factors, but lyl1 cannot rescue the early scl–/– phenotype. Blood. 2006, 109 (5): 19081916. 10.1182/blood200605023226.View ArticlePubMedGoogle Scholar
 Chowdhury AH, Ramroop JR, Upadhyay G, Sengupta A, Andrzejczyk A, Saleque S: Differential transcriptional regulation of meis1 by gfi1b and its cofactors lsd1 and corest. PLoS ONE. 2013, 8 (1): 53666View ArticleGoogle Scholar
 Göttgens B, Nastos A, Kinston S, Piltz S, Delabesse ECM, Stanley M, et al: Establishing the transcriptional programme for blood: the scl stem cell enhancer is regulated by a multiprotein complex containing ets and gata factors. The EMBO J. 2002, 21 (12): 3039050.View ArticlePubMedGoogle Scholar
 Dash D, Druzdzel MJ: A hybrid anytime algorithm for the construction of causal models from sparse data. Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence. 1999, Morgan Kaufmann, San Francisco, CAGoogle Scholar
 Cano A, GomezOlmedo M, Moral S. A score based ranking of the edges for the pc algorithm. In: Proceedings of the European Workshop on Probabilistic Graphical Models (PGM): 2008. p. 41–8.Google Scholar
 Claassen T, Heskes T: A bayesian approach to constraint based causal inference. In Proc. of the 28th Conference on Uncertainty in Artificial Intelligence (UAI). 2012, Morgan Kaufmann, Burlington, MAGoogle Scholar
 Schwarz G: Estimating the dimension of a model. Ann Stat. 1978, 6: 4614. 10.1214/aos/1176344136.View ArticleGoogle Scholar
 Bouckaert RR: Probabilistic network construction using the minimum description length principle. Symbolic and Quantitative Approaches to Reasoning and Uncertainty (Clarke M, Kruse R, Moral S, eds). 1993, Springer, Berlin, GermanyGoogle Scholar
 Szpankowski W: Average Case Analysis of Algorithms on Sequences. 2001, John Wiley & Sons, New York, NYView ArticleGoogle Scholar
 Kontkanen P, Buntine W, Myllymäki P, Rissanen J, Tirri H. Efficient computation of stochastic complexity In: C. Bishop, B. Frey, editors. Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics, Society for Artificial Intelligence and Statistics: 2003. p. 233–8.Google Scholar
 Kontkanen P. Computationally efficient methods for mdloptimal density estimation and data clustering. 2009. PhD thesis. Helsinki University Print. Finland.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.