PROCEEDINGS  Open  Published:
3off2: A network reconstruction algorithm based on 2point and 3point information statistics
BMC Bioinformaticsvolume 17, Article number: S12 (2016)
Abstract
Background
The reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking of alternative Bayesian networks, or rely on the identification of structural independencies, which correspond to missing edges in the underlying network. Bayesian inference methods typically require heuristic search strategies, such as hillclimbing algorithms, to sample the superexponential space of possible networks. By contrast, constraintbased methods, such as the PC and IC algorithms, are expected to run in polynomial time on sparse underlying graphs, provided that a correct list of conditional independencies is available. Yet, in practice, conditional independencies need to be ascertained from the available observational data, based on adjustable statistical significance levels, and are not robust to sampling noise from finite datasets.
Results
We propose a more robust approach to reconstruct graphical models from finite datasets. It combines constraintbased and Bayesian approaches to infer structural independencies based on the ranking of their most likely contributing nodes. In a nutshell, this local optimization scheme and corresponding 3off2 algorithm iteratively “take off” the most likely conditional 3point information from the 2point (mutual) information between each pair of nodes. Conditional independencies are thus derived by progressively collecting the most significant indirect contributions to all pairwise mutual information. The resulting network skeleton is then partially directed by orienting and propagating edge directions, based on the sign and magnitude of the conditional 3point information of unshielded triples. The approach is shown to outperform both constraintbased and Bayesian inference methods on a range of benchmark networks. The 3off2 approach is then applied to the reconstruction of the hematopoiesis regulation network based on recent single cell expression data and is found to retrieve more experimentally ascertained regulations between transcription factors than with other available methods.
Conclusions
The novel informationtheoretic approach and corresponding 3off2 algorithm combine constraintbased and Bayesian inference methods to reliably reconstruct graphical models, despite inherent sampling noise in finite datasets. In particular, experimentally verified interactions as well as novel predicted regulations are established on the hematopoiesis regulatory networks based on single cell expression data.
Background
Two types of reconstruction method for directed networks have been developed and applied to a variety of experimental datasets. These methods are either based on Bayesian scores [1], [2] or rely on the identification of structural independencies, which correspond to missing edges in the underlying network [3], [4].
Bayesian inference approaches have the advantage of allowing for quantitative comparisons between alternative networks through their Bayesian scores but they are limited to rather small causal graphs due to the superexponential space of possible directed graphs to sample [1], [5], [6]. Hence, Bayesian inference methods typically require either suitable prior restrictions on the structures [7], [8] or heuristic search strategies such as hillclimbing algorithms [9]–[11].
By contrast, structure learning algorithms based on the identification of structural constraints typically run in polynomial time on sparse underlying graphs. These socalled constraintbased approaches, such as the PC [12] and IC [13] algorithms, do not score and compare alternative networks. Instead they aim at ascertaining conditional independencies between variables to directly infer the Markov equivalent class of all causal graphs compatible with the available observational data. Yet, these methods are not robust to sampling noise in finite datasets as early errors in removing edges from the complete graph typically trigger the accumulation of compensatory errors later on in the pruning process. This cascading effect makes the constraintbased approaches sensitive to the adjustable significance level α, required for the conditional independence tests. In addition, traditional constraintbased methods are not robust to the order in which the conditional independence tests are processed, which prompted recent algorithmic improvements intending to achieve orderindependence [14].
In this paper, we report a novel network reconstruction method, which exploits the best of these two types of structure learning approaches. It combines constraintbased and Bayesian frameworks to reliably reconstruct graphical models despite inherent sampling noise in finite observational datasets. To this end, we have developed a robust informationtheoretic method to confidently ascertain structural independencies in causal graphs based on the ranking of their most likely contributing nodes. Conditional independencies are derived using an iterative search approach that identifies the most significant indirect contributions to all pairwise mutual information between variables. This local optimization algorithm, outlined below, amounts to iteratively subtracting the most likely conditional 3point information from 2point information between each pair of nodes. The resulting network skeleton is then partially directed by orienting and propagating edge directions, based on the sign and magnitude of the conditional 3point information of unshielded triples. Identifying structural independencies within such a maximum likelihood framework circumvents the need for adjustable significance levels and is found to be more robust to sampling noise from finite observational data, even when compared to constraintbased methods intending to resolve the orderdependence on the variables [14].
Constraintbased methods
Constraintbased approaches, such as the PC [12] and IC [13] algorithms, infer causal graphs from observational data, by searching for conditional independencies among variables. Under the Markov and Faithfulness assumptions, these algorithms return a Complete Partially Directed Acyclic Graph (CPDAG) that represents the Markov equivalent class of the underlying causal structure [3], [4]. They proceed in three steps detailed in Algorithm 1:

1) inferring unnecessary edges and associated separation sets to obtain an undirected skeleton.

2) orienting unshielded triples as vstructures if their middle node is not in the separation set (R _{0}).

3) propagating as many orientations as possible following propagation rules (R _{1 − 3}), which prevents the orientation of additional vstructures (R _{3}) and directed cycles (R _{2 − 3}) [15].
However, as previously stated, the sensitivity of the constraintbased methods to the adjustable significance level α used for the conditional independence tests and to the order in which the variables are processed (step 1) favors the accumulation of errors when the search procedure relies on finite observational data.
In this paper, we aim at improving constraintbased methods, Algorithm 1, by uncovering the most reliable conditional independencies supported by the (finite) available data, based on a quantitative information theoretic framework.
Maximum likelihood methods
The maximum likelihood ${\mathcal{\mathcal{L}}}_{\mathcal{G}}$ is related to the cross entropy $H(\mathcal{G},\mathcal{D})=\sum _{\left\{{x}_{i}\right\}}p\left(\right\{{x}_{i}\left\}\right)log\left(q\right(\left\{{x}_{i}\right\}\left)\right)$ between the “true” probability distribution p({x _{ i } }) from the data $\mathcal{D}$ and the approximate probability distribution $q\left(\right\{{x}_{i}\left\}\right)=\prod _{i}p\left({x}_{i}\right\left\{P{a}_{{x}_{i}}\right\})$ generated by the Bayesian network $\mathcal{G}$ with specific parent nodes $\left\{P{a}_{{x}_{i}}\right\}$ for each node x _{ i } , leading to [16],
where $\sum _{i}H\left({x}_{i}\right\left\{\phantom{\rule{0.3em}{0ex}}{\mathit{\text{Pa}}}_{{x}_{i}}\phantom{\rule{0.3em}{0ex}}\right\})$ is the (conditional) entropy of the underlying causal graph. This enables to score and compare alternative models through their maximum likelihood ratio as,
Note, in particular, that the significance level of the Maximum likelihood approach is set by the number N of independent observational data points, as detailed in the Methods Section below.
Methods
Information theoretic framework
Inferring isolated vstructures vs nonvstructures from 3point and 2point information
Applying the previous likelihood definition, Eq. 1, to isolated vstructures (Fig. 1a) and Markov equivalent nonvstructures (Fig. 1b–d), one obtains,
where I(x;y)=H(x)+H(y)−H(x,y) is the 2point mutual information between x and y, and,
where I(x;yz)=H(xz)+H(yz)−H(x,yz) is the conditional mutual information between x and y given z. Hence, one obtains the likelihood ratio,
where we introduced the 3point information function, I(x;y;z)=I(x;y)−I(x;yz), which is in fact invariant upon permutations between x,y and z, as seen in terms of entropy functions,
As long recognized in the field [17], [18], 3point information, I(x;y;z), can be positive or negative (if I(x;y)<I(x;yz)), unlike 2point mutual information, which are always positive, $I(x;y)\ge 0$.
More precisely, Eq. 5 demonstrates that the sign and magnitude of 3point information provide a quantitative estimate of the relative likelihoods of isolated vstructures versus nonvstructures, which are in fact independent of their actual nonconnected bases xy, xz or yz,
Hence, a significantly negative 3point information, I(x;y;z)<0, implies that a vstructure is more likely than a nonvstructure given the observed correlation data. Conversely, a significantly positive 3point information, I(x;y;z)>0, implies that a nonvstructure model is more likely than a vstructure model.
Yet, as noted above, 3point information, I(x;y;z), being symmetric by construction, it cannot indicate how to orient vstructures or nonvstructures over the xyz triple. To this end, it is however straightforward to show that the most likely base (xy, xz or yz) of the local vstructure or nonvstructure corresponds to the pair with lowest mutual information, e.g., $I(x;y)=\underset{\mathit{\text{xyz}}}{min}\left(I(s;t)\right)$, as shown by the likelihood ratios,
Note, in particular, that choosing the base with the lowest mutual information is consistent with the Data Processing Inequality expected for nonvstructures, Fig. 1b–d.
Hence, combining 3point and 2point information allows to determine the likelihood and the base of isolated vstructures versus nonvstructures. But how to extend such simple results to identify local vstructures and nonvstructures embedded within an entire graph $\mathcal{G}$?
Inferring embedded vstructures vs nonvstructures from conditional 3point and 2point information
To go from isolated to embedded vstructures and nonvstructures within a DAG $\mathcal{G}$, we will consider the Markov equivalent CPDAG of $\mathcal{G}$ and introduce generalized vstructures and nonvstructures, Fig. 1e–h. We will demonstrate that their relative likelihood, given the available observational data, can be estimated from the sign and magnitude of a conditional 3point information, I(x;y;z{u _{ i } }), Eq. 11. This will extend our initial result valid for isolated vstructures and nonvstructures, Eq. 7.
Let’s consider a pair of nonneighbor nodes x,y with a set of upstream nodes {u _{ i } }_{ n } , where each node u _{ i } has at least one direct connection to x (u _{ i } →x) or y (u _{ i } →y) or to another upstream node u _{ j } ∈{u _{ i } }_{ n } (u _{ i } →u _{ j } ) or only undirected links to these nodes (u _{ i } −x, u _{ i } −y or u _{ i } −u _{ j } ). Thus, given x,y and a set of upstream nodes {u _{ i } }_{ n } , any additional node zcan either be:

i) at the apex of a generalized vstructure, if all existing connections between x, y, {u _{ i } }_{ n } and z are directed and point towards z, Fig. 1e, or else,

ii) z has at least one undirected link with x, y or one of the upstream nodes u _{ i } (z−x, z−y or z−u _{ i } ) or at least one directed link pointing towards these nodes (z→x, z→y or z→u _{ i } ), Fig. 1f–h. In such a case, z might contribute to the mutual information I(x;y) and should be included in the set of upstream nodes {u _{ i } }_{ n } , thereby defining a generalized nonvstructure, Figs. 1f–h.
Then, similarly to the case of an isolated vstructure (Eq. 3), the maximum likelihood ${\mathcal{\mathcal{L}}}_{v}\left(\mathit{\text{xy}}\right)$ of a generalized vstructure pointing towards z from a base xy with upstream nodes {u _{ i } }_{ n } can be expressed as,
where I(x;y{u _{ i } }) is the conditional mutual information between x and y given {u _{ i } }, I(x;y{u _{ i } })=H(x{u _{ i } })+H(y{u _{ i } })−H(x,y{u _{ i } })−H({u _{ i } }).
Likewise, the maximum likelihood ${\mathcal{\mathcal{L}}}_{\mathit{\text{nv}}}\left(\mathit{\text{xy}}\right)$ of a generalized nonvstructure of base xy with upstream nodes {u _{ i } }_{ n } and z can be expressed as,
where I(x;yz,{u _{ i } })=H(xz,{u _{ i } })+H(yz,{u _{ i } })−H(x,yz, {u _{ i } })−H(z,{u _{ i } }) is the conditional mutual information between x and y given z and {u _{ i } }. Hence,
where we introduced the conditional 3point information, I(x;y;z{u _{ i } })=I(x;y{u _{ i } })−I(x;yz,{u _{ i } }).
Hence, a significantly negative conditional 3point information, I(x;y;z{u _{ i } })<0, implies that a generalized vstructure is more likely than a generalized nonvstructure given the available observational data. Conversely, a significantly positive conditional 3point information, I(x;y;z{u _{ i } })>0, implies that a generalized nonvstructure model is more likely than a generalized vstructure model.
Yet, as the conditional 3point information, I(x;y;z{u _{ i } }), is in fact invariant upon permutations between x,y and z, it cannot indicate how to orient embedded vstructures or nonvstructures over the xyz triple, as already noted in the case of isolated vstructures and nonvstructures, above.
However, the most likely base (xy, xz or yz) of the embedded vstructure or nonvstructure corresponds to the least correlated pair conditioned on {u _{ i } }, e.g., $I(x;y\left\{{u}_{i}\right\})=\underset{\mathit{\text{xyz}}}{min}\left(I(s;t\left\{{u}_{i}\right\})\right)$, as shown with the following likelihood ratios,
Note, in particular, that choosing the base with the lowest conditional mutual information, e.g., $I(x;y\left\{{u}_{i}\right\})=\underset{\mathit{\text{xyz}}}{min}\left(I(s;t\left\{{u}_{i}\right\})\right)$, is consistent with the Data Processing Inequality expected for the generalized nonvstructure of Fig. 1f–h, $I(x;y)\le min\left(I(x;z,\{{u}_{i}\left\}\right),I(z,\{{u}_{i}\};y)\right)$, as shown below for I(x;y) and I(x;z,{u _{ i } }), by subtracting I(x;y;z{u _{ i } }) on each side of the inequality I(x;y{u _{ i } })≤I(x;z{u _{ i } }), leading to,
where we have used the chain rule, I(x;z,{u _{ i } }y)=I(x;z{u _{ i } },y)+I(x;{u _{ i } }y), before adding I(x;y;z,{u _{ i } }) on each side of the inequality. The corresponding inequality holds between I(x;y) and I(z,{u _{ i } };y), implying the Data Processing Inequality.
Finite size corrections of maximum likelihood
Maximum likelihood ratios, such as Eq. 2, suggest that 1/N sets the significance level of the maximum likelihood approach, as $H(\mathcal{G},\mathcal{D})H({\mathcal{G}}^{\prime},\mathcal{D})\gg 1/N$ should imply a significant improvement of the underlying model ${\mathcal{G}}^{\prime}$ over $\mathcal{G}$. In practice, however, there are $\mathcal{O}(log(N)/N)$ corrections coming from the proper normalization of maximum likelihoods (see Appendix),
The model $\mathcal{G}$ can then be compared to the alternative model ${\mathcal{G}}_{\setminus x\to y}$ with one missing edge x→y using the maximum likelihood ratio,
where I(x;y{Pa_{ y } }_{∖}_{ x } )=H(y{Pa_{ y } }_{∖}_{ x } )−H(y{Pa_{ y } }).
Then, following the rationale of constraintbased approaches, Eq. 15 can be reformulated by replacing the parent nodes {Pa_{ y } }_{∖}_{ x } with an unknown separation set {u _{ i } } to be learnt simultaneously with the missing edge candidate xy,
where the factor ${k}_{x;y\left\right\{{u}_{i}\}}>0$ tends to limit the complexity of the models by favoring fewer edges. Namely, the condition, $I(x;y\left\{{u}_{i}\right\})<{k}_{x;y\left\right\{{u}_{i}\}}/N$, implies that simpler models compatible with the structural independency, x ⊥ ⊥ y{u _{ i } }, are more likely than model $\mathcal{G}$, given the finite available dataset. This replaces the ‘perfect’ conditional independency condition, I(x;y{u _{ i } })=0, valid in the limit of an infinite dataset, N→∞. A common complexity criteria in model selection is the Bayesian Information Criteria (BIC) or Minimal Description Length (MDL) criteria [19], [20],
where r _{ x } ,r _{ y } and ${r}_{{u}_{i}}$ are the number of levels of the corresponding variables. The MDL complexity, Eq. 18, is simply related to the normalisation constant of the distribution reached in the asymptotic limit of a large dataset N→∞ (Laplace approximation). However, this limit distribution is only reached for very large datasets in practice.
Alternatively, the normalisation of the maximum likelihood can also be done over all possible datasets including the same number of data points to yield a (universal) Normalized Maximum Likelihood (NML) criteria [21], [22] and its decomposable [23], [24] and xysymmetric version, ${k}_{x;y\left\right\{{u}_{i}\}}^{{}^{\mathit{\text{NML}}}}$, defined in the Appendix.
Then, incrementing the separation set of xy from {u _{ i } } to {u _{ i } }+z leads to the following likelihood ratio,
with I(x;y;z{u _{ i } })=I(x;y{u _{ i } })−I(x;y{u _{ i } },z) and where we introduced a 3point conditional complexity, ${k}_{x;y;z\left\right\{{u}_{i}\}}$, defined similarly as the difference between the 2point conditional complexities,
However, unlike 3point information, I(x;y;z{u _{ i } }), 3point complexities are always positive, ${k}_{x;y;z\left\right\{{u}_{i}\}}>0$, provided that there are at least two levels for each implicated node ℓ∈x,y,z,{u _{ i } }, i.e. ${r}_{\ell}\ge 2$.
Hence, we can define the shifted 2point and 3point information in Eqs. 16 & 19 for finite datasets as,
This leads to the following maximum likelihood ratios equivalent to Eqs. 11 & 12 for vstructure over nonvstructure and between alternative bases,
Hence, given a finite dataset, a significantly negative conditional 3point information, corresponding to I ^{′}(x;y;z{u _{ i } })<0, implies that a vstructure x→z←y is more likely than a nonvstructure provided that the structural independency, x ⊥ ⊥ y{u _{ i } }, is also confidently established as, I ^{′}(x;y{u _{ i } })<0. By contrast, a significantly positive conditional 3point information corresponds to I ^{′}(x;y;z{u _{ i } })>0 and implies that a nonvstructure model is more likely than a vstructure model, given the available observational data.
Probability estimate of indirect contributions to mutual information
The previous results enable us to estimate the probability of a node z to contribute to the conditional mutual information I(x;y{u _{ i } }), by combining the probability, P _{ n } v (x y z{u _{ i } }), that the triple xyz is a generalized nonvstructure conditioned on {u _{ i } } and the probability, P _{ b } (x y{u _{ i } }), that its base is xy, where,
that is, using Eqs. 23 & 24 including finite size corrections of the maximum likelihoods,
Then, various alternatives to combine P _{ n } v (x y z{u _{ i } }) and P _{ b } (x y{u _{ i } }) exist to estimate the overall probability that the additional node z indirectly contributes to I(x;y{u _{ i } }). One possibility is to choose the lower bound S _{ l } b (z;x y{u _{ i } }) of P _{ n } v (x y z{u _{ i } }) and P _{ b } (x y{u _{ i } }), since both conditions need to be fulfilled to warrant that z indeed contributes to I(x;y{u _{ i } }),
The pair of nodes xy with the most likely contribution from a third node z can then be ordered according to their rank R(x y;z{u _{ i } }) defined as,
and z can be iteratively added to the set of contributing nodes (i.e. {u _{ i } }←{u _{ i } }+z) of the top link x y=argmax_{ xy } R(x y;z{u _{ i } }) to progressively recover the most significant indirect contributions to all pairwise mutual information in a causal graph, as outlined below.
Robust inference of conditional independencies using the 3off2 scheme
The previous results can be used to provide a robust inference method to identify conditional independencies and, hence, reconstruct the skeleton of underlying causal graphs from finite available observational data. The approach follows the spirit of constraintbased methods, such as the PC or IC algorithms, but recovers conditional independencies following an evolving ranking of the network edges, R(x y;z{u _{ i } }), defined in Eq. 30.
All in all, this amounts to perform a generic decomposition for each mutual information term, I(x;y), by introducing a succession of node candidates, u _{1},u _{2},…, u _{ n } , that are likely to contribute to the overall mutual information between the pair x and y, as,
or equivalently between the shifted 2point and 3point information terms including finite size corrections (Eq. 22),
Hence, given a significant mutual information between x and y, I ^{′}(x;y)>0, we will search for possible structural independencies, i.e. I ^{′}(x;y{u _{ i } }_{ n } )<0, by iteratively “taking off” conditional 3point information terms from the initial 2point (mutual) information, I ^{′}(x;y), as
and similarly with nonshifted 2point and 3point information,
3off2 algorithm
The 3off2 scheme can be used to devise a twostep algorithm (see Algorithm 2), inspired by constraintbased approaches, to first reconstruct network skeleton (Algorithm 2, step 1) before combining orientation and propagation of edges in a single step based on likelihood ratios (Algorithm 2, step 2).
Reconstruction of network skeleton
The 3off2 scheme will first be applied to iteratively remove edges with maximum positive contributions, I ^{′}(x;y;u _{ k } {u _{ i } }_{ k }_{−1})>0, corresponding to the most likely generalized nonvstructures (Eq. 23), while minimizing simultaneously the remaining 2point information, I ^{′}(x;y{u _{ i } }_{ k } ) (Eq. 24), consistently with the data processing inequality. Such 3off2 scheme (Algorithm 2, step 1) will therefore progressively lower the conditional 2point information terms, I ^{′}(x;y)>⋯>I ^{′}(x;y{u _{ i } }_{ k }_{−1})>I ^{′}(x;y{u _{ i } }_{ k } ) and might ultimately result in the removal of the corresponding edge, xy, but only when a structural independency is actually found, i.e. I ^{′}(x;y{u _{ i } }_{ n } )<0, as in constraintbased algorithms for a given significance level α. Yet, the skeleton obtained with the 3off2 scoring approach is expected to be more robust to finite observational data than the skeleton obtained with PC or IC algorithms, as the former results only from statistically significant 3point contributions, I ^{′}(x;y;u _{ k } {u _{ i } }_{ k }_{−1})>0, based on their quantitative 3off2 ranks, R(x y;u _{ k } {u _{ i } }_{ k }_{−1}).
The best results on benchmark networks using these quantitative 3off2 ranks are obtained with the NML score (see Results and discussion Section below). The MDL score leads to equivalent results, as expected, in the limit of very large datasets (see Appendix). However, with smaller datasets, the most reliable results with the MDL score are obtained using nonshifted instead of shifted 2point and 3point information terms in the 3off2 rank of individual edges, Eq. 30. This is because the MDL complexity tends to underestimate the importance of edges between nodes with many levels (see Appendix). For finite datasets, it easily leads to spurious conditional independencies, I ^{′}(x;y{u i})<0, when using shifted 2point and 3point information, Eq. 33, whereas using nonshifted information in the 3off2 ranks (Eq. 30) tends to limit the number of false negatives as early errors in {u _{ i } } can only increase $I(x;y\left\{\mathit{\text{ui}}\right\})\ge 0$, in the end, in Eq. 34.
Orientation of network skeleton
The skeleton and the separation sets resulting from the 3off2 iteration step (Algorithm 2, step 1) can then be used to orient the edges and to propagate orientations to the unshielded triples. However, while the constraintbased methods distinguish the vstructures orientation step (Algorithm 2, step 2) from the propagation procedure (Algorithm 1, step 3), the 3off2 algorithm intertwines these two steps based on the respective likelihood scores of individual vstructures and nonvstructures (Algorithm 2, step 2).
As stated earlier, the magnitude and sign of the conditional 3point information, I(x;y;z{u _{ i } }) (or equivalently the shifted 3point information, Eq. 23), indicate if a non vstructure is more likely than a vstructure. Hence, all the unshielded triples can be ranked by the absolute value of their conditional 3point information, that is, in decreasing order of their likelihood of being either a vstructure or a nonvstructure. As detailed in the step 2 of Algorithm 2, the most likely vstructure is used to set the first orientations, following R _{0} orientation rule. The possible propagations are then performed, following R _{1} propagation rule, starting from the unshielded triple having the most positive conditional 3point information. The following most likely vstructure is considered when no further propagation is possible on unshielded triples with greater absolute 3point information. If conflicting orientations arise (such as a→b←c & b→c←d), the less likely vstructure and its possible propagations are ignored.
Note that we only implement the R _{0} and R _{1} propagation rules, which are applied in decreasing order of likelihood. In particular, we do not consider propagation rules R _{2} and R _{3} which are not associated to likelihood scores but enforce the hypothesis of acyclic constraint.
As for the 3off2 skeleton reconstruction, the orientation/propagation step of 3off2 allows for a robust discovery of orientations from finite observational data as it relies on a quantitative framework of likelihood ratios taken in decreasing order of their statistical significance. During this step, 3off2 recovers and propagates as many orientations as possible in an iterative procedure following the decreasing ranks of the unshielded triples based on the absolute value of their conditional 3point information, I ^{′}(x;y;z{u _{ i } }).
Results and discussion
Tests on benchmark graphs
We have tested the 3off2 network reconstruction approach to learn benchmark causal graphs containing 20 to 70 nodes, Figs. 2, 3, 4, 5 and 6. The results are evaluated against other methods in terms of Precision (or positive predictive value), P r e c=T P/(T P+F P), Recall or Sensitivity (true positive rate), R e c=T P/(T P+F N), as well as Fscore =2×P r e c×R e c/(P r e c+R e c) for increasing sample size N=10 to 50,000 data points.
We also define additional Precision, Recall and Fscores taking into account the edge orientations of the predicted networks against the corresponding CPDAG of the benchmark networks. This amounts to label as false positives, all true positive edges of the skeleton with different orientation/nonorientation status as the CPDAG reference, T P _{misorient}, leading to the orientationdependent definitions T P ^{′}=T P−T P _{misorient} and F P ^{′}=F P+T P _{misorient} with the corresponding CPDAG Precision, Recall and Fscores taking into account edge orientations.
The alternative inference methods used for comparison with 3off2 are the PC algorithm [12] implemented in the pcalg package [25], [26] and Bayesian inference using the hillclimbing heuristics implemented in the bnlearn package [27]. In addition, we also compare the skeleton of 3off2 to the unoriented output of Aracne [28], an informationbased inference approach, which iteratively prunes links with the weakest mutual information based on the Data Processing Inequality. We have used the Aracne implementation of the minet package [29]. For each sample size, 3off2, Aracne, PC and the Bayesian inference methods have been tested on 50 replicates. Figures 2, 3, 4, 5 and 6 give the average results over these multiple replicates when comparing the CPDAG (solid lines) of the reconstructed network (or its skeleton, dashed lined) to the CPDAG (or the skeleton) of the benchmark network.
For each method, the plots presented in Figs. 2, 3, 4, 5 and 6 are those obtained for the parameters that give overall the best results over the five reconstructed benchmark networks (see Additional file 1, Figures S1S20). In particular, we used the stable implementation of the PC algorithm, as well as the majority rule for the orientation and propagation steps [14]. PC’s results are shown on Figs. 2, 3, 4, 5 and 6 for α=0.1. Decreasing α tends to improve the skeleton Precision at the expense of the skeleton Recall, leading in fact to worse skeleton Fscores for finite datasets, e.g. N≤1000 (see Additional file 1, Figures S1S5). The same trend is observed for CPDAG Fscores taking into account edge orientations, with best CPDAG scores at small sample sizes, obtained for larger α, e.g. N≤1000. Aracne threshold parameters for minimum difference in mutual information is set to ε=0, as small positive values typically worsen Fscores (see Additional file 1, Figures S6S10). Bayesian inference are obtained using BIC/MDL scores and hillclimbing heuristics with 100 random restarts [9] (see Additional file 1, Figures S11S15). Finally, the best 3off2 network reconstructions are obtained using NML scores with shifted 2point and 3point information terms in the rank of individual edges, see Methods. Using MDL scores, instead, leads to equivalent results, as expected, in the limit of very large datasets (see Appendix). However, with smaller datasets, the most reliable results with MDL scores are obtained using nonshifted instead of shifted 2point and 3point information terms in the 3off2 rank of individual edges, as discussed in Methods (see Additional file 1, Figures S16S20).
All in all, we found that the 3off2 inference approach typically reaches better or equivalent Fscores for all dataset sizes as compared to all other tested methods, i.e. Aracne, PC and Bayesian inference, as well as the MaxMin HillClimbing (MMHC) hybrid method [30] (see Additional file 1, Figures S21S25). This is clearly observed both on the skeletons (Figs. 2, 3, 4, 5 and 6 dashed lines) and even more clearly when taking the predictions of orientations into account (Figures 2, 3, 4, 5 and 6 solid lines).
Applications to the hematopoiesis regulation network
The reconstruction or reverseengineering of real regulatory networks from actual expression data has already been performed on a number of biological systems (see e.g. [28], [31]–[33]). Here, we apply the 3off2 approach on a real biological dataset related to hematopoiesis. Transcription factors play a central role in hematopoiesis, from which derive the blood cell lineages. As suggested in previous studies, changes in the regulatory interactions among transcription factors [34] or their overexpression [35] might be involved in the development of Tacute lymphoblastic leukaemia (TALL). The key role of the hematopoiesis and the potentially serious consequences of its disregulations emphasize the need to accurately establish the complex interactions between the transcription factors involved in this critical biological process.
The dataset we have used for this analysis [36] consists of the single cell expressions of 18 transcription factors, known for their role in hematopoiesis. Five hundred ninety seven single cells representing 5 different types of hematopoietic progenitors have been included in the analysis (N=597). We reconstructed the corresponding network with the 3off2 inference method, Fig. 7, and four other available approaches, namely, PC [12] implemented in the pcalg package [25], [26], Bayesian inference using hillclimbing heuristics as well as the MaxMin HillClimbing (MMHC) hybrid method [30], both implemented in the bnlearn package [27], and, finally, Aracne [28] implemented in the minet package [29] (Table 1 and Additional file 1: Table S1).
3off2 uncovers all 11 interactions for which specific experimental evidence has been reported in the literature (Fig. 7, red links: known activations; blue links: known repressions) as well as 30 additional links (Fig. 7, grey links: unknown regulatory interactions). By contrast, randomization of the actual data across samples for each TF leads to only 5.25 spurious interactions on average between the 18 TFs, instead of the 41 inferred edges from the actual data, and 1.62 spurious interactions on average, instead of the 16 interactions predicted among the 10 TFs involved in known regulatory interactions, Fig. 7. This suggests that around 10–13 % of the predicted edges might be spurious, due to inevitable sampling noise in the finite dataset. In particular, the 3off2 inference approach successfully recovers the relationships of the regulatory triad between Gata2, Gfi1b and Gfi1 as described in [36] and reports correct orientations for the edges involving Gata2 (Gfi1b and Gfi1 crossregulate in fact one another [36], Table 1). The network reconstructed by 3off2 also correctly infers the regulations of PU.1 by Gfi1 [37], Gfi1 by Lyl1 [38], Meis1 by Ldb1 [39], and the regulations of Lyl1 by Ldb1 [39] and Erg [40]. Finally, the interactions (Gata2 −SCL) [40], (Gfi1b −Meis1) [41] and (Gata1 −Gata2) [42] are correctly inferred, however, with opposite directions as reported in the literature. Yet, overall 3off2 outperforms most of the other methods tested for the reconstruction of the hematopoietic regulatory subnetwork (Table 1 and Additional file 1: Table S1). Only the Bayesian hillclimbing method using a BDe score leads to comparable results by retrieving 10 out of 11 interactions and correctly orienting 8 of them. These encouraging results from the 3off2 reconstruction method on experimentally proven regulatory interactions (red edges in Fig. 7) could motivate further investigations on novel regulatory interactions awaiting to be tested for their possible role in hematopoiesis (e.g. grey edges in Fig. 7).
Conclusions
In this paper, we propose to improve constraintbased network reconstruction methods by identifying structural independencies through a robust quantitative scorebased scheme limiting the accumulation of early FN errors and subsequent FP compensatory errors. In brief, 3off2 relies on information theoretic scores to progressively uncover the best supported conditional independencies, by iteratively “taking off” the most likely indirect contributions of conditional 3point information from every 2point (mutual) information of the causal graph.
Earlier hybrid methods have also attempted to improve network reconstruction by combining the concepts of constraintbased approaches with the robustness of Bayesian scores [30], [43]–[45]. In particular [43], have proposed to exploit an intrinsic weakness of the PC algorithm, its sensitivity to the order in which conditional independencies are tested on finite data, to rank these different orderdependent PC predictions with Bayesian scores. More recently [30], have also combined constraintbased and Bayesian approaches by first identifying both parents and children of each node of the underlying graphical model and then performing a greedy Bayesian hillclimbing search restricted to the identified parents and children of each node. This MaxMin HillClimbing (MMHC) approach tends to have a high precision in terms of skeleton but a more limited sensibility, leading overall to lower skeleton and CPDAG Fscores than 3off2 and Bayesian hill climbing methods on the same benchmark networks, Figures S21S25. Interestingly, however, the MMHC approach is among the fastest network reconstruction approaches, Figure S26, allowing for scalability to large network sizes [30].
The 3off2 algorithm is expected to run in polynomial time on typical sparse causal networks with low indegree, just like constraintbased algorithms.However, in practice and despite the additional computation of conditional 2point and 3point information terms, we found that the 3off2 algorithm runs typically faster than constraintbased algorithms for large enough samples, by avoiding the cascading accumulation of errors that inflate the combinatorial search of conditional independencies in traditional constraintbased approaches. Instead, we found that 3off2 running time displays a similar trend as Bayesian hillclimbing heuristic methods, Figs. 2, 3, 4, 5 and 6.
All in all, the main computational bottleneck of the present 3off2 scheme pertains to the identification of the best contributing nodes at each iteration. In the future, it could be interesting to investigate whether a more stochastic version of this 3off2 method, based on choosing one significant conditional 3point information instead of the best one, might simultaneously accelerate the network reconstruction and circumvent possible locally trapped suboptimal predictions through stochastic resampling.
Finally, another perspective for practical applications will be to include the possibility of latent variables and bidirected edges in reconstructed networks.
Appendix
Complexity of graphical models
The complexity ${k}_{\mathcal{G},\mathcal{D}}$ of a graphical model is related to the normalization constant $Z(\mathcal{G},\mathcal{D})$ of its maximum likelihood as ${k}_{\mathcal{G},\mathcal{D}}=logZ(\mathcal{G},\mathcal{D})$,
For Bayesian networks with decomposable entropy, i.e. $H(\mathcal{G},\mathcal{D})=\sum _{i}H\left({x}_{i}\right\left\{{\text{Pa}}_{{x}_{i}}\right\})$, it is convenient to use decomposable complexities, ${k}_{\mathcal{G},\mathcal{D}}=\sum _{i}{k}_{{x}_{i}\left\{{\text{Pa}}_{{x}_{i}}\right\}}$,
such that the comparison between alternative models $\mathcal{G}$ and ${\mathcal{G}}_{\setminus x\to y}$ (i.e. $\mathcal{G}$ with one missing edge x→y) leads to a simple local increment of the score,
A common complexity criteria in model selection is the Bayesian Information Criteria (BIC) or Minimal Description Length (MDL) criteria [19], [20],
where r _{ x } ,r _{ y } and r _{ j } are the number of levels of each variable, x, y and j. The MDL complexity, Eq. 40, is simply related to the normalisation constant reached in the asymptotic limit of a large dataset N→∞ (Laplace approximation). The MDL complexity can also be derived from the Stirling approximation on the Bayesian measure [46], [47]. Yet, in practice, this limit distribution is only reached for very large datasets, as some of the leastlikely $({r}_{y}1)\prod _{j}{r}_{j}$ combinations of states of variables are in fact rarely (if ever) sampled in typical finite datasets. As a result, the MDL complexity criteria tends to underestimate the relevance of edges connecting variables with many levels, r _{ i } , leading to the removal of false negative edges.
To avoid such biases with finite datasets, the normalisation of the maximum likelihood can be done over all possible datasets with the same number N of data points. This corresponds to the (universal) Normalized Maximum Likelihood (NML) criteria [21]–[24],
We introduce here the factorized version of the NML criteria [23], [24] which corresponds to a decomposable NML score, ${k}_{\mathcal{G},\mathcal{D}}^{{}^{\mathit{\text{NML}}}}=\sum _{{x}_{i}}{k}_{{x}_{i}\left\right\{{\text{Pa}}_{{x}_{i}}\}}^{{}^{\mathit{\text{NML}}}}$, defined as,
where N _{ yj } is the number of data points corresponding to the jth state of the parents of y, {Pa_{ y } }, and ${N}_{y{j}^{\prime}}$ the number of data points corresponding to the j ^{′}th state of the parents of y, excluding x, {Pa_{ y } }_{∖}_{ x } . Hence, the factorized NML score for each node x _{ i } corresponds to a separate normalisation for each state j=1,…,q _{ i } of its parents and involving exactly N _{ ij } data points of the finite dataset,
where N _{ ijk } corresponds to the number of data points for which the ith node is in its kth state and its parents in their jth state, with ${N}_{\mathit{\text{ij}}}=\sum _{k}^{{r}_{i}}{N}_{\mathit{\text{ijk}}}$. The universal normalization constant ${\mathcal{C}}_{n}^{r}$ is then obtained by averaging over all possible partitions of the n data points into a maximum of r subsets, ℓ _{1}+ℓ _{2}+⋯+ℓ _{ r } =n with ${\ell}_{k}\ge 0$,
which can in fact be computed in lineartime using the following recursion [23],
with ${\mathcal{C}}_{0}^{r}=1$ for all r, ${\mathcal{C}}_{n}^{1}=1$ for all n and applying the general formula Eq. 48 for r=2,
or its Szpankowski approximation for large n (needed for n>1000 in practice) [48]–[50],
Then, following the rationale of constraintbased approaches, we can reformulate the likelihood ratio of Eq. 37 by replacing the parent nodes {Pa_{ y } }_{∖}_{ x } in the conditional mutual information, I(x;y{Pa_{ y } }_{∖}_{ x } ), with an unknown separation set {u _{ i } } to be learnt simultaneously with the missing edge candidate xy,
where we have also transformed the asymmetric parentdependent complexity difference, $\Delta {k}_{y{\left\{{\text{Pa}}_{y}\right\}}_{\setminus x}}$, into a {u _{ i } }dependent complexity term, ${k}_{x;y\left\right\{{u}_{i}\}}$, with the same xysymmetry as I(x;y{u _{ i } }),
Note, in particular, that the MDL complexity term in Eq. 54 is readily obtained from Eq. 41 due to the Markov equivalence of the MDL score, corresponding to its xysymmetry whenever {Pa_{ y } }_{∖}_{ x } ={Pa_{ x } }_{∖}_{ y } . By contrast, the factorized NML score, Eq. 43, is not a Markovequivalent score (although its nonfactorized version, Eq. 42, is Markov equivalent by definition). To circumvent this nonequivalence of factorized NML score, we propose to recover the expected xysymmetry of ${k}_{x;y\left\right\{{u}_{i}\}}^{{}^{\mathit{\text{NML}}}}$ through the simple xysymmetrization of Eq. 44, leading to Eq. 55.
Publication costs
Publication costs for this article were funded by the Région IledeFrance.
Declarations
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 2, 2016: Bringing Maths to Life (BMTL). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements.
Additional file
References
 1.
Cooper GF, Herskovits E: A bayesian method for the induction of probabilistic networks from data. Mach Learn. 1992, 9 (4): 30947.
 2.
Heckerman D, Geiger D, Chickering DM: Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Mach Learn. 1995, 20 (3): 197243.
 3.
Spirtes P, Glymour C, Scheines R: Causation, Prediction, and Search. 2000, MIT press, Cambridge, MA
 4.
Pearl J. Causality: Models, Reasoning and Inference, 2nd edn: Cambridge University Press; 2009.
 5.
Chickering DM: Learning equivalence classes of bayesiannetwork structures. J Mach Learn Res. 2002, 2: 44598.
 6.
Friedman N, Koller D: Being bayesian about network structure. a bayesian approach to structure discovery in bayesian networks. Mach Learn. 2003, 50 (1–2): 95125. 10.1023/A:1020249912095.
 7.
Koivisto M, Sood K: Exact bayesian structure discovery in bayesian networks. J Mach Learn Res. 2004, 5: 54973.
 8.
Silander T, Myllymaki P: A simple approach for finding the globally optimal bayesian network structure. Proceedings of the TwentySecond Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI06). 2006, AUAI Press, Arlington, Virginia
 9.
Chickering DM, Geiger D, Heckerman D. Learning Bayesian networks: Search methods and experimental results. In: Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics: 1995. p. 112–28.
 10.
Bouckaert RR: Properties of bayesian belief network learning algorithms. Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence. UAI’94. 1994, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
 11.
Friedman N, Nachman I, Pe’er D: Learning bayesian network structure from massive datasets: The “sparse candidate”; algorithm. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. UAI’99. 1999, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
 12.
Spirtes P, Glymour C: An algorithm for fast recovery of sparse causal graphs. Soc Sci Comput Rev. 1991, 9: 6272. 10.1177/089443939100900106.
 13.
Pearl J, Verma T: A theory of inferred causation. In Knowledge Representation and Reasoning: Proc. of the Second Int. Conf. 1991, Morgan Kaufmann, San Mateo, CA
 14.
Colombo D, Maathuis MH: Orderindependent constraintbased causal structure learning. J Mach Learn Res. 2014, 15: 3741782.
 15.
Meek C: Causal inference and causal explanation with background knowledge. Proceedings of Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, QU. 1995, Morgan Kaufmann, San Francisco, CA
 16.
Sanov IN: On the probability of large deviations of random variables. Mat Sbornik. 1957, 42: 1144.
 17.
McGill WJ: Multivariate information transmission. Trans IRE Prof Group on Inf Theory (TIT). 1954, 4: 93111. 10.1109/TIT.1954.1057469.
 18.
Han TS: Multiple mutual informations and multiple interactions in frequency data. Inf Control. 1980, 46 (1): 2645. 10.1016/S00199958(80)904787.
 19.
Rissanen J: Modeling by shortest data description. Automatica. 1978, 14: 46571. 10.1016/00051098(78)900055.
 20.
Hansen MH, Yu B: Model selection and the principle of minimum description length. J Am Stat Ass. 2001, 96: 74674. 10.1198/016214501753168398.
 21.
Shtarkov YM: Universal sequential coding of single messages. Probl Inf Transm (Translated from). 1987, 23 (3): 317.
 22.
Rissanen J, Tabus I: Kolmogorov’s structure function in mdl theory and lossy data compression. Adv. Min. Descrip. Length Theory Appl. 2005, MIT Press, Cambridge, MA
 23.
Kontkanen P, Myllymäki P: A lineartime algorithm for computing the multinomial stochastic complexity. Inf Process Lett. 2007, 103 (6): 22733. 10.1016/j.ipl.2007.04.003.
 24.
Roos T, Silander T, Kontkanen P, Myllymäki P. Bayesian network structure learning using factorized nml universal models. In: Proc. 2008 Information Theory and Applications Workshop (ITA2008). IEEE Press: 2008.
 25.
Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P: Causal inference using graphical models with the r package pcalg. J Stat Soft. 2012, 47 (11): 126. 10.18637/jss.v047.i11.
 26.
Kalisch M, Bühlmann P: Robustification of the pcalgorithm for directed acyclic graphs. J Comput Graph Stat. 2008, 17 (4): 77389. 10.1198/106186008X381927.
 27.
Scutari M: Learning Bayesian Networks with the bnlearn R Package. J Stat Soft. 2010, 35 (3): 122. 10.18637/jss.v035.i03.
 28.
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, et al: ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinforma. 2006, 7 (Suppl 1): 710.1186/147121057S1S7.
 29.
Meyer PE, Lafitte F, Bontempi G: minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinforma. 2008, 9: 46110.1186/147121059461.
 30.
Tsamardinos I, Brown LE, Aliferis CF: The MaxMin HillClimbing Bayesian Network Structure Learning Algorithm. Mach Learn. 2006, 65 (1): 3178. 10.1007/s1099400668897.
 31.
Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP: Causal proteinsignaling networks derived from multiparameter singlecell data. Science. 2005, 308 (5721): 52310.1126/science.1105809.
 32.
Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles. Mol Syst Biol. 2007, 3: 7810.1038/msb4100120.
 33.
Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, et al: A yeast synthetic network for in vivo assessment of reverseengineering and modeling approaches. Cell. 2009, 137 (1): 17281. 10.1016/j.cell.2009.01.055.
 34.
Oram SH, Thoms JAI, Pridans C, Janes ME, Kinston SJ, Anand S, et al: A previously unrecognized promoter of lmo2 forms part of a transcriptional regulatory circuit mediating lmo2 expression in a subset of tacute lymphoblastic leukaemia patients. Oncogene. 2010, 29: 57965808. 10.1038/onc.2010.320.
 35.
Cleveland S, Smith S, Tripathi R, Mathias E, Goodings C, Elliott N, et al: Lmo2 induces hematopoietic stem cell like features in tcell progenitor cells prior to leukemia. Stem Cells. 2013, 31 (4): 88294. 10.1002/stem.1345.
 36.
Moignard V, Macaulay I, Swiers G, Buettner F, Schütte J, CaleroNieto F, et al: Characterization of transcriptional networks in blood stem and progenitor cells using highthroughput singlecell gene expression analysis. Nat Cell Biol. 2013, 15: 36372. 10.1038/ncb2709.
 37.
Spooner CJ, Cheng JX, Pujadas E, Laslo P, Singh H: A recurrent network involving the transcription factors pu.1 and gfi1 orchestrates innate and adaptive immune cell fates. Immunity. 2009, 31 (4): 57686. 10.1016/j.immuni.2009.07.011.
 38.
Zohren F, Souroullas G, Luo M, Gerdemann U, Imperato M, et al: The transcription factor lyl1 regulates lymphoid specification and the maintenance of early t lineage progenitors. Nat Immunol. 2012, 13 (8): 7619. 10.1038/ni.2365.
 39.
Li L, Jothi R, Cui K, Lee J, Cohen T, M. Gorivodsky IT, et al: Nuclear adaptor ldb1 regulates a transcriptional program essential for the maintenance of hematopoietic stem cells. Nat Immunol. 2011, 12: 129136. 10.1038/ni.1978.
 40.
Chan WYI, Follows GA, Lacaud G, Pimanda JE, Landry JR, Kinston S, et al: The paralogous hematopoietic regulators lyl1 and scl are coregulated by ets and gata factors, but lyl1 cannot rescue the early scl–/– phenotype. Blood. 2006, 109 (5): 19081916. 10.1182/blood200605023226.
 41.
Chowdhury AH, Ramroop JR, Upadhyay G, Sengupta A, Andrzejczyk A, Saleque S: Differential transcriptional regulation of meis1 by gfi1b and its cofactors lsd1 and corest. PLoS ONE. 2013, 8 (1): 53666
 42.
Göttgens B, Nastos A, Kinston S, Piltz S, Delabesse ECM, Stanley M, et al: Establishing the transcriptional programme for blood: the scl stem cell enhancer is regulated by a multiprotein complex containing ets and gata factors. The EMBO J. 2002, 21 (12): 3039050.
 43.
Dash D, Druzdzel MJ: A hybrid anytime algorithm for the construction of causal models from sparse data. Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence. 1999, Morgan Kaufmann, San Francisco, CA
 44.
Cano A, GomezOlmedo M, Moral S. A score based ranking of the edges for the pc algorithm. In: Proceedings of the European Workshop on Probabilistic Graphical Models (PGM): 2008. p. 41–8.
 45.
Claassen T, Heskes T: A bayesian approach to constraint based causal inference. In Proc. of the 28th Conference on Uncertainty in Artificial Intelligence (UAI). 2012, Morgan Kaufmann, Burlington, MA
 46.
Schwarz G: Estimating the dimension of a model. Ann Stat. 1978, 6: 4614. 10.1214/aos/1176344136.
 47.
Bouckaert RR: Probabilistic network construction using the minimum description length principle. Symbolic and Quantitative Approaches to Reasoning and Uncertainty (Clarke M, Kruse R, Moral S, eds). 1993, Springer, Berlin, Germany
 48.
Szpankowski W: Average Case Analysis of Algorithms on Sequences. 2001, John Wiley & Sons, New York, NY
 49.
Kontkanen P, Buntine W, Myllymäki P, Rissanen J, Tirri H. Efficient computation of stochastic complexity In: C. Bishop, B. Frey, editors. Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics, Society for Artificial Intelligence and Statistics: 2003. p. 233–8.
 50.
Kontkanen P. Computationally efficient methods for mdloptimal density estimation and data clustering. 2009. PhD thesis. Helsinki University Print. Finland.
Acknowledgements
S.A. acknowledges a PhD fellowship from the Ministry of Higher Education and Research and support from Fondation ARC pour la recherche sur le cancer. L.V. acknowledges a PhD fellowship from the Région IledeFrance (DIM Institut des Systèmes Complexes) and H.I. acknowledges funding from CNRS, Institut Curie, Foundation PierreGilles de Gennes and Région IledeFrance.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
SA, LV and HI conceived and performed the research. SA, LV and HI wrote the manuscript. All authors read and approved the final manuscript.
Authors’ original submitted files for images
Rights and permissions
About this article
Published
DOI
Keywords
 Network reconstruction
 Hybrid inference method
 Information theory
 Hematopoiesis