
EnsInfer: a simple ensemble approach to network inference outperforms any single method


This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.


Background

Network inference

A gene regulatory network (GRN) consists of molecular regulators (including DNA segments, messenger RNAs, and transcription factors) in a cell and the causal links between regulators and gene targets. Causality here means that the regulator influences the RNA expression of the target gene. Network inference is the problem of identifying such causal links. In machine learning terms, since the sets of regulator genes and target genes are given, network inference can be viewed as a binary classification task: for each pair of regulator and target gene, determine whether a regulatory edge between them exists.

Because network inference facilitates the understanding of biological systems at the level of molecular interactions, it potentially enables the designed repression or enhancement of groups of molecules. This has applications ranging from drug design and medical treatment to reduced fertilizer use in agriculture. Accurate network inference and functional validation remain an ongoing challenge for systems biology. Over the past two decades, numerous gene regulatory network inference technologies have been proposed to tackle this problem from different perspectives [1,2,3,4,5,6].

Individual methods feed an ensemble method

Pratapa et al. [7] presented a framework called BEELINE to evaluate state-of-the-art network inference algorithms. The vast majority of inference algorithms, including the ones we incorporate into the ensemble approach, can be roughly categorized into three types:

  1. Pairwise correlation models use various kinds of correlation between a target gene's expression and the expression of potentially causal transcription factors. PPCOR [8] computes the partial and semi-partial correlation coefficients for each gene pair. LEAP [9] calculates the Pearson correlation coefficient of each gene pair over time-ordered (e.g., pseudotime-ordered) data, allowing for a time delay in the regulatory response. PIDC [10] examines the distribution of gene expression and calculates pairwise mutual information between the genes' distributions. SCRIBE [11] also uses mutual information between gene expression distributions and, like LEAP, considers time-lagged correlation in time-series data. Finally, plain correlation can be computed on any set of steady-state data.

  2. Tree-based models use random forests (or close variants) to predict the expression of each target gene from the expression of regulator genes (transcription factors). These models then use feature importance to weight each regulator-target interaction; high weights correspond to regulatory edges. Examples include GENIE3 [2], GRNBoost2 [12] (a faster alternative to GENIE3), and OutPredict [13]. OutPredict also takes prior information (e.g., binding data) into account during training and testing.

  3. Ordinary differential equation (ODE)-based regression approaches model the time derivative of a target gene's expression as a function of the expression of its regulatory genes. Inferelator [1] is a regularized regression model that focuses on feature selection. Its latest iteration, Inferelator 3.0 [14], uses single-cell data to learn regulatory networks. SCODE [3] is a direct application of fast ODE-based regression. SINCERITIES [15] uses Kolmogorov-Smirnov test-based ridge regression. GRISLI [16] is an ODE solver that accounts for gene expression velocity.
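To make the three families concrete, here is a minimal sketch on a toy time series, assuming numpy and scikit-learn are available. The functions `correlation_scores`, `tree_scores` and `ode_scores` are hypothetical illustrations, not the PPCOR, GENIE3 or Inferelator implementations: the tree variant mimics GENIE3's use of random-forest feature importance, and the ODE variant regresses a finite-difference estimate of each target's time derivative on TF expression with ridge regression (our choice of regularizer), omitting the decay term that such models commonly include. Each function returns a TF x target matrix of edge confidence scores, which is the kind of output an ensemble can consume.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def correlation_scores(expr, tf_idx, target_idx):
    """Pairwise family: |Pearson r| between each TF (row) and target (column)."""
    r = np.corrcoef(expr, rowvar=False)  # genes x genes correlation matrix
    return np.abs(r[np.ix_(tf_idx, target_idx)])


def tree_scores(expr, tf_idx, target_idx, seed=0):
    """Tree family (GENIE3-like): random-forest importance of each TF for each target.
    Assumes TFs and targets are disjoint, so no target is used to predict itself."""
    scores = np.zeros((len(tf_idx), len(target_idx)))
    for j, t in enumerate(target_idx):
        rf = RandomForestRegressor(n_estimators=100, random_state=seed)
        rf.fit(expr[:, tf_idx], expr[:, t])
        scores[:, j] = rf.feature_importances_
    return scores


def ode_scores(expr, times, tf_idx, target_idx, alpha=1.0):
    """ODE family (simplified): regress d(target)/dt on TF expression and use
    |coefficient| as the edge score. No decay term is modelled here."""
    dxdt = np.diff(expr, axis=0) / np.diff(times)[:, None]  # forward differences
    X = expr[:-1][:, tf_idx]  # TF levels at the start of each interval
    scores = np.zeros((len(tf_idx), len(target_idx)))
    for j, t in enumerate(target_idx):
        scores[:, j] = np.abs(Ridge(alpha=alpha).fit(X, dxdt[:, t]).coef_)
    return scores


# Toy time series: 3 TFs (the third is pure noise) and 2 targets.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 10.0, 200)
tfs = np.column_stack([np.sin(times), np.cos(1.7 * times), rng.normal(size=times.size)])
# Target 0 integrates TF 0 (d/dt = sin t); target 1 integrates TF 1 (d/dt = cos 1.7t).
targets = np.column_stack([1.0 - np.cos(times), np.sin(1.7 * times) / 1.7])
expr = np.hstack([tfs, targets])  # time points x genes
tf_idx, target_idx = [0, 1, 2], [3, 4]

corr = correlation_scores(expr, tf_idx, target_idx)
tree = tree_scores(expr, tf_idx, target_idx)
ode = ode_scores(expr, times, tf_idx, target_idx)
```

On this toy, `ode` is expected to rank TF 0 highest for target 0 and TF 1 for target 1, because each target is built as the integral of that TF; the correlation and tree scores are not guaranteed to recover this, which illustrates why methods from different families can disagree.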

The BEELINE benchmark of 12 different inference algorithms showed that, while some algorithms generally perform better than others, no single algorithm is best for all datasets. Our approach complements theirs: in addition to studying the performance of individual algorithms (including some promising ones that BEELINE did not study), we show that an ensemble method, which we call EnsInfer, obtains results as good as or better than any single method and improves upon previous ensemble methods [17,18,19]. In vision and language applications, some work, such as [20, 21], uses clustering-based ensembles on large data to create balanced sets that are then sent to distinct learners. Beyond showing the benefits of combining multiple inference methods, our pipeline also provides a practical combination strategy.

Materials and methods

Underlying network inference algorithms

Here we introduce the inference algorithms used in this ensemble workflow. Our workflow and the open-source code we provide allow new inference algorithms to be incorporated easily.

Experimental setup: the data

All level 1 network inference algorithms take gene expression data as input. There are two main sources of such data: synthetic data generated by simulation software from a given regulatory network, and transcriptome-wide RNA sequencing (RNA-seq) data from living organisms. These data can be measured over time to constitute time-series data, or measured in temporally unrelated discrete states to constitute steady-state data. RNA-seq data can also be classified into two categories: bulk RNA-seq data, which is obtained from all cells in a tissue sample, and single-cell RNA-seq data, which captures the transcriptome of individual cells [22]. Details of the gene expression datasets used in our experiments are listed below:

  1. Synthetic data from the DREAM3 and DREAM4 in silico challenges consist of ten datasets, each with 100 genes and varying regulatory network structures [23, 24]. The gene expression data were generated by GeneNetWeaver, the software that provided the data for the DREAM3 and DREAM4 challenges. Simulation settings were kept at the default DREAM4 challenge settings, except that we generated five different time intervals between data points: 10 min, 20 min, 25 min, 50 min (the default), and 100 min. The benefit of using synthetic data is that the underlying network is precisely known by construction.

  2. Bacterial experimental RNA-seq data from B. subtilis (bulk RNA) containing 4218 genes and 239 TFs. The training and testing sets came from a network consisting of 154 TFs and 3144 regulatory edges [25].

  3. Plant experimental RNA-seq data (bulk RNA, time-series) from Arabidopsis shoot tissue consisting of 2286 genes and 263 transcription factors (TFs). Both the training and testing sets came from a network consisting of 29 TFs and 4247 regulatory edges [26].

  4. Mouse Embryonic Stem Cell (mESC) experimental single-cell RNA-seq data containing 500 genes and 47 TFs. The training and testing sets came from a functional interaction network consisting of 47 TFs and 3226 regulatory edges [27].

  5. Human Embryonic Stem Cell (hESC) experimental single-cell RNA-seq data containing 1115 genes and 130 TFs. The training and testing sets came from a ChIP-seq network consisting of 130 TFs and 3144 regulatory edges [28].

In this work, we have focused on either time-series bulk RNA-seq data or single-cell RNA-seq data for which pseudotime information is available. One reason is that some of the inference algorithms in the BEELINE framework require temporal information as input. The other is the well-known epistemological reason: steady-state data give simultaneous correlation information but do not clarify the causal relationship. By contrast, because causes precede their effects in time, time-series datasets are more useful for causal network inference.

Ensemble approach

Because no single inference method suits all scenarios (and, in fact, none does in our experiments), we propose EnsInfer, an ensemble approach to the network inference problem: each individual network inference method acts as a first-level learning algorithm that produces a set of predictions from the gene expression input. We then train a second-level ensemble learning algorithm that combines the results of those first-level learners. Because the first-level inference methods all differ from one another, this forms a heterogeneous stacking ensemble [29, 30]. The end goal is the binary classification task of determining whether a potential regulatory edge from a transcription factor gene to a target gene exists.

Thus, base network inference methods such as GENIE3 or Inferelator act as level 1 inference methods and individually predict whether some transcription factor TF regulates some target gene g by giving each possible edge a confidence score. The resulting edge predictions of all the level 1 inference methods are fed into the second-level ensemble learner. Previous ensemble approaches to network inference include voting [17, 18]; other second-level learners, such as a random forest classifier or a Naive Bayes classifier, have been used in other applications. The pipeline is shown in Fig. 1.

Fig. 1

A diagram showing how EnsInfer works: (i) all level 1 network inference algorithms are executed on time-series expression data; (ii) every level 1 inference method assigns confidence values to all possible edges in the network, and the outputs are curated into tabular form with each algorithm's prediction as a feature column; (iii) the outputs of the level 1 inference methods are then used as input data for the level 2 ensemble model, which predicts regulatory edges.

Each level 1 inference method infers regulation from all the given gene expression data. By contrast, the ensemble learner takes a training set consisting of a randomly chosen subset of regulators and their gold standard (normally, experimentally verified present/absent) edges, and creates a model whose input is the confidence score output of each level 1 inference method and whose output is a prediction of whether each potential edge is a true regulatory edge. Note that, for consistency across methods, we use each level 1 inference method's confidence scores on all potential regulatory edges, not just the highly confident ones. This benefits the level 2 ensemble, because all information inferred by the level 1 methods is preserved for the level 2 models.
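The construction of the level 2 training table can be sketched as follows, assuming numpy and scikit-learn. `build_feature_table`, the method names and the simulated scores are hypothetical; filling a missing score with 0.0 is our assumption, and scikit-learn's `GaussianNB` stands in for the Naive Bayes learner. Rows are candidate TF-target edges, columns are level 1 methods, and training rows come only from training TFs, so the test TFs are unseen by the level 2 model.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def build_feature_table(method_scores, edges):
    """Stack level 1 outputs into an (edges x methods) matrix.

    method_scores: dict mapping method name -> {(tf, target): confidence}.
    All candidate edges are kept, not only high-confidence ones; a missing
    score is filled with 0.0 (an assumption made for this sketch).
    """
    methods = sorted(method_scores)
    X = np.array([[method_scores[m].get(e, 0.0) for m in methods] for e in edges])
    return X, methods


# Hypothetical toy data: 6 TFs x 50 targets, about 10% true edges (gold standard).
rng = np.random.default_rng(1)
tfs = [f"TF{i}" for i in range(6)]
edges = [(tf, f"g{j}") for tf in tfs for j in range(50)]
y = (rng.random(len(edges)) < 0.1).astype(int)
# Two simulated level 1 methods whose scores are shifted upward on true edges.
method_scores = {
    "method_a": {e: 1.5 * y[k] + rng.normal() for k, e in enumerate(edges)},
    "method_b": {e: 1.0 * y[k] + rng.normal() for k, e in enumerate(edges)},
}
X, methods = build_feature_table(method_scores, edges)

# Train on edges leaving 4 TFs; predict edges leaving the 2 held-out TFs.
train_tfs = set(tfs[:4])
is_train = np.array([tf in train_tfs for tf, _ in edges])
nb = GaussianNB().fit(X[is_train], y[is_train])
p_edge = nb.predict_proba(X[~is_train])[:, 1]  # posterior P(edge exists | scores)
```

Ranking the held-out edges by `p_edge` gives the ensemble's edge predictions for the untested TFs.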

The ensemble method uses this model and the outputs of the level 1 inference methods to predict, for each transcription factor in the test set, whether a given possible edge leaving that transcription factor corresponds to a true regulatory relation. This process translates well to real-world applications, where EnsInfer learns from the known regulatory relations within an organism or tissue structure and makes predictions for untested transcription factors.

Using synthetic data, we evaluated eight different models as level 2 ensemble models, in addition to the voting approach of [17]: logistic regression, logistic regression with stochastic gradient descent (SGD), Gaussian Naive Bayes (which models each input score with a Gaussian class-conditional likelihood), support vector machines, k-nearest neighbors, random forest, adaptive boosting (AdaBoost) trees, and XGBoost [31]. All of these models except XGBoost are provided by the scikit-learn Python package [32]. We used a separate DREAM4 dataset with 100 genes to perform hyper-parameter tuning for all level 2 ensemble models. For each tunable ensemble model, a discrete grid of hyper-parameter combinations, spanned by common choices of the core model parameters, was cross-validated on this DREAM4 dataset. For each method, the best-performing hyper-parameter combination was used in the subsequent level 2 comparison experiments. Details of the hyper-parameter grid search and the resulting best parameter settings for each model can be found in Additional file 1: Table S1.
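A per-model grid search of this kind might look as follows with scikit-learn's `GridSearchCV`. The grids are illustrative placeholders (the grids actually used are in Additional file 1: Table S1), `make_classification` stands in for the DREAM4 tuning data, and scoring by `average_precision` (a step-wise AUPRC estimate) is our assumption about the tuning criterion. For brevity the folds are drawn over edges; folds grouped by TF would mirror the evaluation protocol more closely.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB

# Stand-in tuning set: rows = candidate edges, columns = 9 level 1 confidence
# scores, with roughly 10% positive edges.
X_tune, y_tune = make_classification(
    n_samples=600, n_features=9, n_informative=4, weights=[0.9], random_state=0
)

# Illustrative grids only; not the settings reported in Table S1.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0, 10.0]}),
    "naive_bayes": (GaussianNB(), {"var_smoothing": [1e-9, 1e-6, 1e-3]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
    "adaboost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.1, 1.0]},
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
best_params = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="average_precision", cv=cv)
    search.fit(X_tune, y_tune)
    best_params[name] = search.best_params_  # reused in the later comparison runs
```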

We compare the area under the precision-recall curve on the test data achieved by the ensemble learner against that of the level 1 inference methods that have access to the same training data.

Algorithmic workflow of the ensemble approach

All underlying inference algorithms were executed through the BEELINE framework [7], to which we added OutPredict and Inferelator, which were not included in the original BEELINE package.

The confidence scores of the underlying algorithms for each potential edge in the regulatory network became inputs to the level 2 ensemble model, as illustrated in Fig. 1. To compare the performance of different inference methods, we use the Area Under the Precision-Recall Curve (AUPRC) as the primary metric in all experiments. We chose AUPRC because gold standard networks are sparse, so true edges are vastly outnumbered by absent ones, and precision among the top-ranked predictions is what matters in practice: experimentalists can choose a high confidence cutoff to identify the most likely causal transcription factors for a given target gene. A comprehensive summary of the results can be found in Tables 1 and 2 for the experiments on the DREAM in silico datasets and in Fig. 2 for the four real-world species.
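For reference, AUPRC and the AUPRC ratio reported for the real-world datasets can be computed as below. This pure-Python sketch uses the common step-wise estimate (mean precision at the rank of each true edge), which matches scikit-learn's `average_precision_score` when there are no tied scores; taking the random-predictor baseline to be the fraction of true edges (the expected precision of a random ranking) is our assumption about how the ratio is normalized. Both function names are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC: average of precision@k over the ranks k of true edges.

    labels: 1 for a gold-standard edge, 0 otherwise; scores: higher = more confident.
    Ties are broken by input order (an arbitrary choice in this sketch).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def auprc_ratio(labels, scores):
    """AUPRC divided by the AUPRC expected from a random ranking (the prevalence).
    Requires at least one true edge in labels."""
    prevalence = sum(labels) / len(labels)
    return average_precision(labels, scores) / prevalence


labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, scores)  # (1/1 + 2/3) / 2 = 5/6
ratio = auprc_ratio(labels, scores)     # (5/6) / 0.2 = 25/6, about 4.17
```

A ratio of 1 means no better than random, which is why the metric is convenient for comparing networks with very different edge densities.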

Table 1 Summary of the different gene regulatory networks used in 10 DREAM simulation experiments
Table 2 Relative performance of different ensemble methods, using as ensemble inputs either all level 1 inference methods' results (i.e., regardless of kurtosis) or only the results of level 1 inference methods with positive kurtosis (models marked with a plus sign)

For the in silico DREAM datasets, the underlying gold standard priors that define each regulatory network were divided into a 2:1 training/testing split, so there were twice as many regulators in training as in testing. Because the split was done with respect to the regulators, the training and testing sets share no common transcription factors. We believe splitting based on transcription factors is the correct approach, because experimental assays commonly over-express or repress particular transcription factors. The practical goal is that if a species has some TFs with experimentally validated edges, then edges from untested TFs can be inferred.
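The regulator-based 2:1 split can be sketched in plain Python as follows; `split_by_regulator` and its seeding are hypothetical. Because whole TFs, not individual edges, are assigned to one side, no TF appears in both sets; repeating the call with different seeds would give multiple random training/testing splits.

```python
import random


def split_by_regulator(edges, train_fraction=2 / 3, seed=0):
    """Split (tf, target) edges so that no TF appears in both training and testing."""
    tfs = sorted({tf for tf, _ in edges})
    random.Random(seed).shuffle(tfs)
    train_tfs = set(tfs[: round(len(tfs) * train_fraction)])
    train = [e for e in edges if e[0] in train_tfs]
    test = [e for e in edges if e[0] not in train_tfs]
    return train, test


# Hypothetical gold standard: 9 TFs, 5 candidate targets each.
edges = [(f"TF{i}", f"g{j}") for i in range(9) for j in range(5)]
train, test = split_by_regulator(edges, seed=42)  # 6 training TFs, 3 test TFs
```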

For each dataset, we first applied 11 level 1 inference methods to the training data, both to determine a promising single method to apply to the test data and as input to the construction of the ensemble model. Of the 12 methods included in BEELINE, SINCERITIES, SINGE and SCNS either produced no output or exceeded the time limit of one week on one or more of the datasets. We applied all of the individual level 1 inference methods (not only the most promising ones on the training data), as well as the level 2 heterogeneous ensemble models, to the test set.

To assess the ensemble models, we compared them with one another and with the best level 1 inference methods in both training and testing evaluations. As Marbach et al. [17] pointed out for the DREAM challenge, one simple yet (in DREAM at least) effective way to integrate multiple inference results is to rank potential edges by their average rank across all inference methods. We also include this “community” method as a reference point for our ensemble models.
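A minimal sketch of this average-rank (“community”) baseline follows. It assumes that every method scores the same set of candidate edges and breaks ties by edge order; both choices may differ from the procedure of [17], and `average_rank` is a hypothetical name.

```python
def average_rank(method_scores):
    """method_scores: dict method -> {edge: confidence}, higher = more confident.
    Returns {edge: mean rank across methods}, where rank 1 is the most confident."""
    edges = sorted(next(iter(method_scores.values())))
    rank_sum = {e: 0 for e in edges}
    for scores in method_scores.values():
        ranked = sorted(edges, key=lambda edge: -scores[edge])
        for rank, e in enumerate(ranked, start=1):
            rank_sum[e] += rank
    return {e: rank_sum[e] / len(method_scores) for e in edges}


toy_scores = {
    "m1": {"A": 0.9, "B": 0.5, "C": 0.1},  # ranks A=1, B=2, C=3
    "m2": {"A": 0.2, "B": 0.8, "C": 0.4},  # ranks B=1, C=2, A=3
}
mean_rank = average_rank(toy_scores)              # A=2.0, B=1.5, C=2.5
community = sorted(mean_rank, key=mean_rank.get)  # ['B', 'A', 'C']
```

Unlike a trained level 2 model, this baseline needs no gold standard edges, which makes it a natural reference point.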

The experiments on the DREAM in silico datasets focused on three objectives: (i) for each dataset, how well did the level 1 inference methods that performed best on the training set perform on the test set? (ii) how well did the ensemble learners perform on the test set? (iii) how did the level 1 inference method that performed best on the test set compare to the level 2 ensemble models? Note that the comparison of (iii) is unfair to the ensemble models, because there is no way to know a priori which level 1 inference method will perform best on a given test set, so choosing the best one gives an unfair advantage to the level 1 inference methods.

On the experimental datasets from real-world species, four level 1 methods from BEELINE (GRNVBEM, GRISLI, SINGE and SCNS) similarly failed to produce proper inference results due to time or memory constraints on the larger datasets (e.g., they did not finish after a week), and were therefore not included in the ensemble approach. We then applied the best-performing level 2 ensemble models from the DREAM experiments to the outputs of the 10 remaining base inference methods. Furthermore, we varied the input to the level 2 ensemble models by including or excluding the results of the three most poorly performing level 1 inference methods.

Results

Base inference method performance

On the DREAM datasets, the performance of the algorithms featured in the BEELINE framework is consistent with the original paper [7]: GENIE3, GRNBoost2 and PIDC performed best among the algorithms the BEELINE authors tested. As it happened, the methods we added to the framework (Inferelator and OutPredict) outperformed those methods in many cases. Nevertheless, no individual level 1 inference method dominated the others, as seen in Table 1. We also note that while the best level 1 inference method in training is often the best in testing, that is not always the case.

Ensemble performance

The Naive Bayes model we used assumes that, given the presence or absence of an edge, each input's confidence score follows a Gaussian-like distribution. On the DREAM datasets, two of the eleven level 1 inference methods (GRISLI and PIDC) produced outputs whose distributions were not Gaussian-like (reflected as negative kurtosis). We therefore experimented both with all level 1 inference methods as inputs and with only those level 1 inference methods whose output distribution has positive kurtosis, to better accommodate the Naive Bayes model. The combined results are presented in Table 2. For most ensemble methods, using all 11 level 1 inference methods rather than 9 does not change the result. However, for Naive Bayes and for logistic regression with stochastic gradient descent, eliminating the level 1 inference methods that produce non-Gaussian-like output helps. In fact, Naive Bayes is the overall winner across all tested models and configurations when the input is limited by the positive kurtosis filter. Logistic regression, random forest and adaptive boosting also performed favorably compared with both the best-performing level 1 inference methods in training and the average-rank method of [17].
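The kurtosis filter can be sketched as follows, assuming numpy. We assume that “kurtosis” here means Fisher's excess kurtosis (0 for a Gaussian), which is the default of `scipy.stats.kurtosis`; `positive_kurtosis_methods` and the simulated score vectors are hypothetical.

```python
import numpy as np


def excess_kurtosis(x):
    """Fisher (excess) kurtosis m4 / m2**2 - 3: 0 for a Gaussian, negative for
    flatter-than-Gaussian distributions (same as scipy.stats.kurtosis defaults)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0


def positive_kurtosis_methods(method_outputs):
    """Keep only level 1 methods whose confidence scores have positive excess kurtosis."""
    return [name for name, s in method_outputs.items() if excess_kurtosis(s) > 0]


rng = np.random.default_rng(0)
outputs = {
    "flat_scores": rng.uniform(size=10_000),          # excess kurtosis near -1.2
    "heavy_tailed_scores": rng.laplace(size=10_000),  # excess kurtosis near 3
}
kept = positive_kurtosis_methods(outputs)  # expected: ['heavy_tailed_scores']
```

In this sketch the filter is applied to each method's pooled scores over all candidate edges, which requires no gold standard labels.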

Hence, for the real-world experimental datasets, four models (logistic regression, Naive Bayes, random forest and adaptive boosting) were selected as level 2 ensemble models for analysis (see Fig. 2). Here the score distributions of all level 1 inference methods have positive kurtosis, so all 10 of them were used as inputs to the level 2 ensemble methods.

Fig. 2

The performance of various network inference methods on four different species, from top to bottom: a B. subtilis gene regulatory network using bulk RNA-seq expression data [25]; an Arabidopsis network using bulk RNA-seq expression data [26]; a mouse Embryonic Stem Cell functional interaction network using single-cell RNA-seq data [27]; and a human Embryonic Stem Cell ChIP-seq network using single-cell RNA-seq expression data [28]. Inference performance was measured as the ratio of the AUPRC of each inference method to that of a random predictor. Gold standard priors from each of the four species were split into a random 2:1 training/testing configuration. Ensemble models, along with base inference methods that can incorporate prior information, were trained using the training gold standard priors. All inference results were then evaluated on the testing subset of the gold standard data, yielding an AUPRC ratio for each of 20 random training/testing splits. The bar chart shows the mean AUPRC ratio of each method on the test data across these 20 experiments. All four ensemble models were evaluated with the base inference methods, and each was trained either with the three worst-performing base inference methods in the training set (A series) or without them (B series). Asterisks indicate a statistically significant improvement (p-value below 0.05 in non-parametric paired tests) compared with the best level 1 inference method (and compared with the average ranking approach). Overall, Naive Bayes performs best, but in some cases adaptive boosting and random forests do almost as well.

Figure 2 shows that

  • The Naive Bayes approach on inputs having positive kurtosis outperforms the other three ensemble methods, so our system EnsInfer uses Naive Bayes as the default option.

  • Including results from weak learners has a marginal impact (sometimes positive and sometimes negative) on the final ensemble performance. For the sake of simplicity, therefore, EnsInfer includes inputs from all available inference methods having positive kurtosis, even the weak ones.

Figure 3 shows that the Naive Bayes ensemble approach significantly (p-value < 0.05) outperformed the best level 1 method on B. subtilis and Arabidopsis. The ensemble method with all level 1 methods still had an advantage on the mESC data, although the gain was not statistically significant (p-value of 0.133). To calculate p-values, we conservatively chose a non-parametric paired resampling approach [33], because we did not want to assume any particular distribution of the data. We used a paired test because we measured the AUPRC gain for each training/testing split. (That is, the set of training/testing splits was established randomly at the outset, and every method was then evaluated on that same set.) In the hESC case, the Naive Bayes ensemble method achieved approximately the same performance as the best level 1 method on the test set. As noted above, the best level 1 inference method for the test set cannot be known a priori (not even by looking at each method's performance in training), so using an ensemble method gives high performance without having to know which level 1 inference method is best. The Naive Bayes approach also consistently outperformed the average-rank voting ensemble approach [17].
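As one common example of a non-parametric paired resampling test (not necessarily the exact procedure of [33]), the sketch below applies a sign-flip, or paired permutation, test to the per-split AUPRC differences. The function name, the one-sided alternative and the number of resamples are our assumptions, and the AUPRC values are hypothetical.

```python
import random


def paired_sign_flip_test(ensemble_auprc, baseline_auprc, n_resamples=10_000, seed=0):
    """One-sided p-value for 'the ensemble beats the baseline on average'.

    Under the null hypothesis each per-split difference is equally likely to be
    positive or negative, so we randomly flip signs and count how often the
    resampled mean difference is at least as large as the observed one."""
    diffs = [a - b for a, b in zip(ensemble_auprc, baseline_auprc)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_resamples):
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if resampled >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_resamples + 1)  # add-one correction


# Hypothetical AUPRCs on 20 shared training/testing splits.
baseline = [0.30 + 0.01 * i for i in range(20)]
ensemble = [x + 0.02 for x in baseline]
p_better = paired_sign_flip_test(ensemble, baseline)  # small: gain on every split
p_same = paired_sign_flip_test(baseline, baseline)    # 1.0: no difference at all
```

Pairing matters because split-to-split variation in difficulty affects every method similarly; testing the per-split differences removes that shared variation.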

Fig. 3

The AUPRC improvement of the Naive Bayes ensemble model (restricting inputs to those with positive kurtosis, but including weak learners) compared with the single best base inference method. In the B. subtilis and Arabidopsis datasets the improvement had p-value < 0.05 using a non-parametric paired test [33]. The test should be paired, because the same set of training and testing splits was used for every method. In the human dataset, the ensemble method was about equal to the best base inference method. As noted in the text, the best base method cannot be known a priori, so these comparisons understate the advantage of the ensemble method.

Compared with running a single inference method, EnsInfer requires total computation time equal to the sum of the times needed to run all base inference algorithms, plus the ensemble step itself. However, all base inference methods can be executed in parallel, so the wall-clock time of the level 1 inference stage is just the time of the slowest method, which is often single-threaded. The level 2 ensemble step itself takes less than one tenth of the time of the slowest base method, as shown in Additional file 2: Table S2. We can therefore conclude that EnsInfer's wall-clock time is close to that of the slowest base inference method.
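The parallel execution of level 1 methods can be sketched with Python's `concurrent.futures`. The fake methods below only sleep; in practice each callable would wrap one external inference run (for example a container or subprocess launch), which is why threads suffice in this sketch. All names are hypothetical. Wall-clock time is then roughly that of the slowest method rather than the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_level1_in_parallel(methods):
    """methods: dict name -> zero-argument callable returning that method's edge scores.

    Runs every method concurrently and returns (results, elapsed_seconds).
    Threads are adequate when each callable mostly waits on an external process;
    CPU-bound pure-Python methods would need a ProcessPoolExecutor instead."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(methods)) as pool:
        futures = {name: pool.submit(fn) for name, fn in methods.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    return results, time.perf_counter() - start


def fake_method(name, seconds):
    """Hypothetical stand-in for one level 1 inference run."""
    def run():
        time.sleep(seconds)  # pretend to infer edges
        return f"{name} edge scores"
    return run


methods = {n: fake_method(n, s) for n, s in [("fast", 0.1), ("medium", 0.2), ("slow", 0.3)]}
results, elapsed = run_level1_in_parallel(methods)
# Sequential execution would take about 0.6 s; here elapsed is about 0.3 s.
```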

Discussion

Consistent with [7], we find that no single inference method is best for all datasets tested in our study. However, a Naive Bayes level 2 ensemble model built from level 1 inference methods with positive kurtosis holds great promise as a general ensemble network inference approach and is thus the basis of EnsInfer. Naive Bayes may work better than more sophisticated Bayesian methods because, at the core of any Bayesian method, we need to estimate the likelihood p(x | e), where x is the score given by a level 1 inference method and e indicates the existence of an edge. Since this generative process varies dramatically across datasets and inference methods, the Gaussian assumption used by Naive Bayes is as good as any and keeps the model simple.
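For concreteness, the posterior that a Gaussian Naive Bayes level 2 model computes for one candidate edge can be written as below. Here x_1, ..., x_M are the confidence scores of the M level 1 methods for that edge, P(e = c) is the fraction of training edges in class c (c = 1 for a present edge, c = 0 for an absent one), and μ_{m,c}, σ²_{m,c} are the mean and variance of method m's scores over training edges of class c. This is the standard Gaussian Naive Bayes form, with conditional independence across methods as the “naive” assumption; it is a sketch of the general technique, not an excerpt from the EnsInfer source.

```latex
P(e = 1 \mid x_1, \dots, x_M)
  = \frac{P(e = 1) \prod_{m=1}^{M} \mathcal{N}\!\left(x_m;\, \mu_{m,1}, \sigma_{m,1}^{2}\right)}
         {\sum_{c \in \{0,1\}} P(e = c) \prod_{m=1}^{M} \mathcal{N}\!\left(x_m;\, \mu_{m,c}, \sigma_{m,c}^{2}\right)},
\qquad
\mathcal{N}(x;\, \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,
  \exp\!\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right)
```

Each method contributes only one mean and one variance per class, so the number of fitted parameters grows linearly with the number of level 1 methods, which is consistent with the model staying simple when training edges are scarce.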

Note, however, that there are cases in which Naive Bayes does not improve on the best single inference method. This happens primarily when the results are little better than random. For example, the regulatory networks inferred from the single-cell human embryonic stem cell data of [7] were barely better than random using any base method in BEELINE, and the ensemble does not improve on that.

Naive Bayes works particularly well in a sparse data environment, which is common when experimental data are hard to come by. For example, there are only 29 experimentally validated transcription factors for Arabidopsis and 154 for B. subtilis. Another point in favor of Naive Bayes is that the size of the feature space (the number of outputs of the level 1 inference methods) is small. If the training dataset and feature space were larger, random forest-based approaches might do better. Our current investigation used roughly a dozen level 1 inference methods. Other promising new inference methods could be added, such as BiXGBoost and DeepSEM [4, 5]. A level 2 ensemble method might require a feature selection step if many more inference algorithms were included.

Conclusions

The main overall benefit of the ensemble method EnsInfer is its robust and flexible nature. Instead of picking a network inference method and hoping that it will perform well on a dataset, EnsInfer uses a combination of state-of-the-art inference approaches and combines them using a simple Naive Bayes ensemble model. Because the ensemble approach essentially turns all the predictions from different inference algorithms into priors about each edge in the network, EnsInfer easily allows the integration of diverse kinds of data (e.g. bulk RNA-seq, single cell RNA-seq) as well as new inference methods.

Availability of data and materials

All experimental data and source code for the ensemble process can be found at our GitHub repository:


  1. Bonneau R, Reiss DJ, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V. The inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 2006;7(5):1–16.

    Article  Google Scholar 

  2. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE. 2010;5(9):12776.

    Article  Google Scholar 

  3. Matsumoto H, Kiryu H, Furusawa C, Ko MS, Ko SB, Gouda N, Hayashi T, Nikaido I. Scode: an efficient regulatory network inference algorithm from single-cell rna-seq during differentiation. Bioinformatics. 2017;33(15):2314–21.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Zheng R, Li M, Chen X, Wu F-X, Pan Y, Wang J. Bixgboost: a scalable, flexible boosting-based method for reconstructing gene regulatory networks. Bioinformatics. 2019;35(11):1893–900.

    Article  CAS  PubMed  Google Scholar 

  5. Shu H, Zhou J, Lian Q, Li H, Zhao D, Zeng J, Ma J. Modeling gene regulatory networks using neural network architectures. Nat Comput Sci. 2021;1(7):491–501.

    Article  Google Scholar 

  6. Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform. 2021;22(5):009.

    Article  Google Scholar 

  7. Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali T. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020;17(2):147–54.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Kim S. ppcor: an r package for a fast calculation to semi-partial correlation coefficients. Commun Stat Appl Methods. 2015;22(6):665.

    PubMed  PubMed Central  Google Scholar 

  9. Specht AT, Li J. Leap: constructing gene co-expression networks for single-cell rna-sequencing data using pseudotime ordering. Bioinformatics. 2017;33(5):764–6.

    Article  CAS  PubMed  Google Scholar 

  10. Chan TE, Stumpf MP, Babtie AC. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 2017;5(3):251–67.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Qiu X, Rahimzamani A, Wang L, Mao Q, Durham T, McFaline-Figueroa JL, Saunders L, Trapnell C, Kannan S: Towards inferring causal gene regulatory networks from single cell expression measurements. BioRxiv, 426981 (2018)

  12. Moerman T, Aibar Santos S, Bravo González-Blas C, Simm J, Moreau Y, Aerts J, Aerts S. Grnboost2 and arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics. 2019;35(12):2159–61.

    Article  CAS  PubMed  Google Scholar 

  13. Cirrone J, Brooks MD, Bonneau R, Coruzzi GM, Shasha DE. Outpredict: multiple datasets can improve prediction of expression and inference of causality. Sci Rep. 2020;10(1):1–9.

    Google Scholar 

  14. Gibbs CS, Jackson CA, Saldi G-A, Shah A, Tjärnberg A, Watters A, De Veaux N, Tchourine K, Yi R, Hamamsy T, et al.: Single-cell gene regulatory network inference at scale: The inferelator 3.0. BioRxiv (2021)

  15. Papili Gao N, Ud-Dean SM, Gandrillon O, Gunawan R. Sincerities: inferring gene regulatory networks from time-stamped single cell transcriptional expression profiles. Bioinformatics. 2018;34(2):258–66.

    Article  PubMed  Google Scholar 

  16. Aubin-Frankowski P-C, Vert J-P. Gene regulation inference from single-cell rna-seq data with linear differential equations and velocity inference. Bioinformatics. 2020;36(18):4774–80.

    Article  CAS  PubMed  Google Scholar 

  17. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK. Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods. 2016;13(4):310–8.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Saint-Antoine MM, Singh A. Network inference in systems biology: recent developments, challenges, and applications. Curr Opin Biotechnol. 2020;63:89–98.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Jan Z, Verma B. Multiple strong and balanced cluster-based ensemble of deep learners. Pattern Recogn. 2020;107:107420.

    Article  Google Scholar 

  21. Shahabadi MSE, Tabrizchi H, Rafsanjani MK, Gupta B, Palmieri F. A combination of clustering-based under-sampling with ensemble methods for solving imbalanced class problem in intelligent systems. Technol Forecast Soc Chang. 2021;169:120796.

    Article  Google Scholar 

  22. Stark R, Grzelak M, Hadfield J. Rna sequencing: the teenage years. Nat Rev Genet. 2019;20(11):631–56.

    Article  CAS  PubMed  Google Scholar 

  23. Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, Clarke ND, Altan-Bonnet G, Stolovitzky G. Towards a rigorous assessment of systems biology models: the dream3 challenges. PLoS ONE. 2010;5(2):9202.

  24. Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011;27(16):2263–70.

  25. Arrieta-Ortiz ML, Hafemeister C, Bate AR, Chu T, Greenfield A, Shuster B, Barry SN, Gallitto M, Liu B, Kacmarczyk T. An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network. Mol Syst Biol. 2015;11(11):839.

  26. Varala K, Marshall-Colón A, Cirrone J, Brooks MD, Pasquino AV, Léran S, Mittal S, Rock TM, Edwards MB, Kim GJ. Temporal transcriptional logic of dynamic regulatory networks underlying nitrogen signaling and use in plants. Proc Natl Acad Sci. 2018;115(25):6494–9.

  27. Hayashi T, Ozaki H, Sasagawa Y, Umeda M, Danno H, Nikaido I. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat Commun. 2018;9(1):1–16.

  28. Chu L-F, Leng N, Zhang J, Hou Z, Mamott D, Vereide DT, Choi J, Kendziorski C, Stewart R, Thomson JA. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 2016;17(1):1–20.

  29. Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241–59.

  30. Aburomman AA, Reaz MBI. A survey of intrusion detection systems based on ensemble and hybrid classifiers. Comput Secur. 2017;65:135–52.

  31. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. p. 785–94.

  32. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.

  33. Shasha D, Wilson M. Statistics is easy! Synth Lect Math Stat. 2010;3(1):1–174.


Acknowledgements

We would like to acknowledge the suggestions and help of Jacopo Cirrone and Ji Huang.


Funding

This work was supported by the U.S. National Institutes of Health grant 1R01GM121753-01A1; the U.S. National Science Foundation under Grants MCB-1412232, IOS-1339362, and MCB-0929338; and NYU WIRELESS. That support is greatly appreciated.

Author information

Authors and Affiliations



Contributions

BS and DS wrote the main manuscript text, and GC helped revise the biology sections of the manuscript. BS carried out all the experiments and analyses for the paper and prepared all figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dennis Shasha.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

Parameter search space for all ensemble methods used in our experiments, and the optimal parameters obtained from tuning on a DREAM dataset.

Additional file 2: Table S2.

Execution time of the various inference algorithms and ensemble methods used in this research. Time was measured on the mESC, hESC, Arabidopsis and B. subtilis datasets, on an Ubuntu 20.04 system with an AMD Ryzen™ 9 5900X CPU.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Shen, B., Coruzzi, G. & Shasha, D. EnsInfer: a simple ensemble approach to network inference outperforms any single method. BMC Bioinformatics 24, 114 (2023).


Keywords

  • Gene regulatory networks
  • Machine learning
  • Transcriptional regulation
  • Non-homogeneous ensemble