BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference

Background: Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulatory relationships in organism cells. Despite the large number of available Gene Regulatory Network (GRN) inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions.
Methods: Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate the regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional coding for a priori structures. The optimization is carried out with Graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks to results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge.
Results: Our BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6 % to 11 %). On a real Escherichia coli compendium, an improvement of 11.8 % compared to CLR and 3 % compared to GENIE3 is obtained in terms of Area Under Precision-Recall curve. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, our method achieves a performance similar to that of GENIE3, while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html.
Conclusions: BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods. It is applicable as a generic network inference post-processing, due to its computational efficiency.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0754-2) contains supplementary material, which is available to authorized users.


Choice of the lambda parameter
To study the impact of the threshold parameters on the results, we display in Figure 1 the Precision-Recall curves obtained with the different possible combinations of $\lambda_{TF}$ and $\lambda_{\overline{TF}}$ on the DREAM4 dataset.
More specifically, we used the GENIE3 weights as an input to BRANE Cut. Variables µ and γ are set to 3 and to the (G − 1)th quantile of the normalized weights, respectively. The Precision-Recall curve is computed by varying the $\lambda_{\overline{TF}}$ parameter in a limited interval [0, 0.1], so that a full Precision-Recall curve can be plotted. For each $\lambda_{\overline{TF}}$, we vary the $\lambda_{TF}$ parameter with a linear sampling between the current $\lambda_{\overline{TF}}$ value and 1.
The results in terms of Area Under the Precision-Recall curve are provided in Table 1. Many different choices of $\lambda_{\overline{TF}} \le \lambda_{TF}$ are possible. These results show that such an extended choice for $\lambda_{TF}$ is not critical to improve the performance over GENIE3. This validates our choice to reduce the initial number of thresholds to only one, similarly to the compared methods, as explained in the paper.
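The two-level sampling described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the number of samples per axis, and the use of a plain linear grid for the first threshold are hypothetical choices, not values taken from the paper.

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop (inclusive)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


def threshold_grid(n_low=11, n_high=5, low_max=0.1):
    """Enumerate (smaller threshold, larger threshold) pairs.

    The smaller threshold is sampled in [0, low_max]; for each of its
    values, the larger threshold is sampled linearly between that value
    and 1. The sample counts n_low and n_high are assumptions.
    """
    grid = []
    for lam_low in linspace(0.0, low_max, n_low):
        for lam_high in linspace(lam_low, 1.0, n_high):
            grid.append((lam_low, lam_high))
    return grid


grid = threshold_grid()
```

Each pair in `grid` would then be passed to the inference step to produce one point of the corresponding Precision-Recall curve.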

Performance comparison of various methods on DREAM4
In this section, we present detailed results obtained using BRANE Cut with CLR, GENIE3, ND-CLR or ND-GENIE3 weights.
Figures 2 to 6 show the resulting Precision-Recall curves. They were obtained by varying only the λ TF threshold as specified in the paper.
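As a rough illustration of how a Precision-Recall curve and its area can be obtained by sweeping a single threshold over edge weights, consider the minimal sketch below. It assumes a plain thresholding rule (an edge is kept when its weight exceeds the threshold) and trapezoidal integration; the function names and the toy data are hypothetical, and the official DREAM evaluation scripts may interpolate the curve differently.

```python
def pr_points(weights, truth, thresholds):
    """Return sorted (recall, precision) points, one per usable threshold.

    weights: dict mapping an edge (pair of gene names) to its score.
    truth:   set of edges present in the reference network.
    """
    points = []
    for lam in thresholds:
        predicted = {edge for edge, w in weights.items() if w > lam}
        if not predicted:
            continue  # precision is undefined when nothing is predicted
        tp = len(predicted & truth)
        points.append((tp / len(truth), tp / len(predicted)))
    return sorted(points)


def aupr(points):
    """Trapezoidal area under the (recall, precision) points."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Toy example with three candidate edges, two of which are true.
toy_weights = {("a", "b"): 0.9, ("a", "c"): 0.6, ("b", "c"): 0.3}
toy_truth = {("a", "b"), ("b", "c")}
toy_area = aupr(pr_points(toy_weights, toy_truth, [0.0, 0.5, 0.8]))
```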

Contribution of each term of the Graph cut functional
To determine the influence of the different biological a priori introduced in our model (as expressed by Equation (2)), we present three Precision-Recall (PR) curves obtained using the E. coli dataset described in the paper. The first PR curve is obtained without any a priori, the second one using the first a priori only, and the third one using all of them. The curves are displayed in Figure 7. The employed weights ω are those computed with CLR using the default parameters. We set $\lambda_{TF} = \beta \lambda_{\overline{TF}}$, β ≥ 1, and incrementally assess the effectiveness of both β and µ choices.
The first curve is obtained using β = 1 and µ = 0. These parameters make our method (thus, without a priori) equivalent to CLR, with λ = $\lambda_{TF}$ acting as a unique threshold on the ω weights. The second curve is obtained with µ = 0 and β = |V|/|T|, as specified in the paper. The third curve is obtained by taking full advantage of the capabilities offered by our model: β = |V|/|T| and µ = 1000. The corresponding AUPR (Area Under the Precision-Recall curve) values are given in Table 2.
                 CLR       BC-CLR (µ = 0)    BC-CLR (µ ≠ 0)
AUPR             0.0786    0.0870            0.0879
Gain vs CLR      -         10.7 %            11.8 %
Table 2: AUPR for BRANE Cut used with different parameters µ and λ: µ = 0 and $\lambda_{TF} = \lambda_{\overline{TF}}$ (equivalent to CLR), µ = 0 and $\lambda_{TF} \ge \lambda_{\overline{TF}}$ (equivalent to treating the problem without the co-regulation property), and µ > 0 and $\lambda_{TF} \ge \lambda_{\overline{TF}}$ (all the a priori are taken into account).
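The gains reported in Table 2 can be checked from the AUPR values with a few lines of arithmetic. The sketch below assumes that the gain is the relative AUPR difference with respect to CLR, expressed in percent; the function and variable names are illustrative.

```python
def relative_gain(aupr_method, aupr_reference):
    """Relative AUPR improvement over a reference method, in percent."""
    return 100.0 * (aupr_method - aupr_reference) / aupr_reference


aupr_clr = 0.0786  # CLR alone (Table 2)
aupr_bc = {
    "BC-CLR, mu = 0": 0.0870,   # first a priori only
    "BC-CLR, mu != 0": 0.0879,  # all a priori
}
gains = {name: round(relative_gain(value, aupr_clr), 1)
         for name, value in aupr_bc.items()}
```

Under this assumption the computed gains agree with the 10.7 % and 11.8 % figures of the table.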
Following the procedure presented in Figure 4 of the paper, we also compare the AUPR computed for different parts of the whole PR curves. For each range of Precision values, the relative improvement is computed as AUPR(BC-CLR)/AUPR(CLR), for BC-CLR with µ = 0 (dotted purple line) and for BC-CLR with µ ≠ 0 (solid purple line), see Figure 8.
Weights ω are employed to compute the co-regulation probabilities ρ_{i,j,j′}. Different ω distributions lead to different sets of non-zero co-regulation probabilities. Consequently, they impact the optimal choice for µ. This is observed in the different µ values chosen for the tested networks. For practically useful inference, we consider it important to obtain a simple estimate of µ for a given network. This estimate should also have a low sensitivity. For a given set of weights, we denote by C_r the number of identified couples of genes (j, j′) ∈ T² co-regulating at least one gene. The total number of co-regulator couples is denoted by C_t. We experimentally observe that an accurate order of magnitude, close to the optimal µ, is given by a cardinality-based ratio built from these quantities. This heuristic is consistent with the biological viewpoint, where a small proportion of co-regulator couples is expected. Results on DREAM4 using the proposed heuristic for the µ parameter are given in Table 3 and are consistent with those presented in the article. In addition, this choice of parameter allows us to obtain an AUPR of 0.0917 (resp. 0.0873) for BRANE Cut initialized with GENIE3 (resp. CLR) weights on the E. coli dataset.
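To make the counting of co-regulator couples concrete, a minimal sketch is given below. It assumes, purely for illustration, that co-regulation is read off a binary relation between transcription factors and their candidate targets, and that couples are unordered; the co-regulation probabilities ρ used in the paper are not reproduced here, and all names are hypothetical.

```python
from itertools import combinations


def count_coregulator_couples(targets_by_tf):
    """Count unordered TF couples that share at least one target gene.

    targets_by_tf: dict mapping a TF name to the set of its candidate
    target genes (an assumed, simplified input).
    """
    count = 0
    for tf_a, tf_b in combinations(sorted(targets_by_tf), 2):
        if targets_by_tf[tf_a] & targets_by_tf[tf_b]:
            count += 1
    return count


toy_targets = {
    "tf1": {"g1", "g2"},
    "tf2": {"g2"},
    "tf3": {"g3"},
}
toy_count = count_coregulator_couples(toy_targets)
```

In this toy example only tf1 and tf2 share a target (g2), so a single couple is counted.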

Effects of the µ parameter
From the above observation, we analyze the impact of this parameter on the results in terms of AUPR. We perform this analysis on the DREAM4 dataset with the CLR and the GENIE3 weights as initial weights. We compute the mean and median AUPR, as well as the corresponding deviation measures (standard deviation and median absolute deviation), over 50 values between 0.1µ and 10µ, where µ is first computed using the proposed heuristic. Results in Table 4 show a low variability in the impact of parameter µ. We observe a difference between the standard deviation and the median absolute deviation; however, the highest observed variability remains acceptable.
5.8e-4 5.4e-4 1.86e-4 3.06e-4 7.2e-4
Table 4 (b): Mean (resp. median) and standard deviation (resp. median absolute deviation) of the AUPR for various µ values with BRANE Cut initialized with GENIE3.
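The sensitivity analysis described above can be sketched as follows. This is a minimal illustration, assuming a linear sampling of the 50 values between 0.1µ and 10µ (the sampling scheme is not stated in the text) and a hypothetical callable `aupr_of_mu` that runs the inference for a given µ and returns the resulting AUPR.

```python
import statistics


def mu_sensitivity(aupr_of_mu, mu0, n=50):
    """Summarize the AUPR over n values of mu in [0.1 * mu0, 10 * mu0].

    aupr_of_mu: hypothetical callable returning the AUPR for a given mu.
    mu0:        value of mu given by the heuristic.
    """
    step = (10.0 * mu0 - 0.1 * mu0) / (n - 1)
    mus = [0.1 * mu0 + k * step for k in range(n)]  # assumed linear sampling
    scores = [aupr_of_mu(mu) for mu in mus]
    median = statistics.median(scores)
    return {
        "mus": mus,
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
        "median": median,
        "mad": statistics.median([abs(s - median) for s in scores]),
    }


# Toy stand-in for the inference: an AUPR that does not depend on mu.
summary = mu_sensitivity(lambda mu: 0.09, mu0=3.0)
```

With a real inference routine in place of the toy lambda, small values of `std` and `mad` relative to `mean` and `median` would indicate the low sensitivity reported in Table 4.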

Additional validation of heuristics on DREAM5
We evaluate BRANE Cut using both CLR and GENIE3 weights as input on the three networks of DREAM5 for which a ground truth is available. All the BRANE Cut parameters (β, γ, µ) are chosen using the proposed heuristics. The evaluation is performed using the same procedure as described in the paper. The AUPR is reported in Table 5.
For the first network, we observe an improvement of over 7 % and 5 % with respect to CLR and GENIE3, respectively. On the third network, the improvement over CLR and GENIE3 reaches 2.8 % and 2.1 %, respectively. Regarding the fourth network, the AUPR computed with every method is exceptionally low. As such, the relative AUPR differences are insignificant, within the numerical precision.
These results show that, with the proposed heuristics, BRANE Cut significantly outperforms CLR and GENIE3 on the first network, and matches the performance achieved by CLR and GENIE3 on the third and fourth networks.