CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data
- Guangyong Zheng†1 (corresponding author),
- Yaochen Xu†1, 2,
- Xiujun Zhang3,
- Zhi-Ping Liu3,
- Zhuo Wang4,
- Luonan Chen3 (corresponding author) and
- Xin-Guang Zhu1 (corresponding author)
© The Author(s). 2016
Published: 23 December 2016
A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertices and edges stand for genes and their regulatory interactions respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive: they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but have difficulty constructing GRNs at genome scale.
Here, we present a software package for gene regulatory network reconstruction at the genomic level, in which gene interactions are measured by conditional mutual information within a parallel computing framework (hence the name CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. CMIP provides both an automatic threshold determination method and an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to that of most current network inference methods. Moreover, running-time tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package.
This new software package provides the biological community with a powerful tool for genomic network reconstruction. The software can be accessed at http://www.picb.ac.cn/CMIP/.
In the post-genome era, an important task of molecular biology is to reconstruct gene regulatory networks (GRNs), which represent interactions between genes inside a cell or tissue. A GRN describes the molecular interactions and regulatory effects of the components involved in a biological process, and hence provides insights into the molecular mechanism of that process [1, 2]. In detail, GRNs can be used to interpret biological processes by studying the topological structure of the sub-networks related to these processes, in which genes jointly carry out specific biological functions [3, 4]. GRNs can help annotate genes clustered in modules and motifs, since genes in the same module or motif have similar functions [5, 6]. If stage-wise data are available, GRNs can also be used to identify dynamical network biomarkers (DNBs) at the critical states of biological processes, which helps biologists better understand the mechanisms of these processes [7, 8]. Therefore, reconstruction of GRNs not only supports investigation of the roles of genes and components involved in a biological process, but also helps reveal how the process develops and is maintained.
In the last decade, many algorithms have been developed to infer GRNs based on reverse-engineering methods, such as Bayesian networks [9–11], Boolean networks [12, 13], linear and non-linear regression [14–18], differential equations [19, 20], information-theoretic approaches [21–26], probabilistic phylogeny networks, part mutual information networks, and probabilistic graphical models [29–32]. In 2011, we proposed a GRN inference algorithm, named PCA-CMI, which can distinguish direct interactions of gene pairs from indirect ones based on the conditional mutual information (CMI) measurement [33–35]. However, two limitations of the algorithm hinder its wide application. First, a threshold for judging direct interactions must be assigned in advance, which is difficult for users because an appropriate threshold is hard to select before GRN reconstruction. Second, the method is time-consuming, especially for genomic network reconstruction, which is a common restriction of most current GRN inference methods.
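To illustrate how a conditional measure can separate direct from indirect interactions, the following minimal Python sketch assumes Gaussian-distributed expression values, under which mutual information and CMI can be written as ratios of covariance determinants. The function names, the simulated regulatory chain (x regulates z, z regulates y) and the noise levels are illustrative assumptions and are not taken from the CMIP code.

```python
import numpy as np

def _det_cov(*rows):
    """Determinant of the covariance matrix of the stacked variables."""
    return np.linalg.det(np.atleast_2d(np.cov(np.vstack(rows))))

def gaussian_mi(x, y):
    """Mutual information I(x; y) under a Gaussian assumption."""
    # I(x; y) = 0.5 * log(|C(x)| * |C(y)| / |C(x, y)|)
    return 0.5 * np.log(_det_cov(x) * _det_cov(y) / _det_cov(x, y))

def gaussian_cmi(x, y, z):
    """Conditional mutual information I(x; y | z) under a Gaussian assumption.

    z may be one conditioning gene (1-D) or several (shape: genes x samples).
    """
    z = np.atleast_2d(z)
    # I(x; y | z) = 0.5 * log(|C(x, z)| * |C(y, z)| / (|C(z)| * |C(x, y, z)|))
    return 0.5 * np.log(
        _det_cov(x, z) * _det_cov(y, z) / (_det_cov(z) * _det_cov(x, y, z))
    )

# Simulated chain: x influences y only through z (an indirect interaction).
rng = np.random.default_rng(0)
n_samples = 2000
x = rng.normal(size=n_samples)
z = x + 0.5 * rng.normal(size=n_samples)
y = z + 0.5 * rng.normal(size=n_samples)

mi_xy = gaussian_mi(x, y)          # clearly above zero
cmi_xy_z = gaussian_cmi(x, y, z)   # close to zero once z is conditioned on
print(f"MI(x, y) = {mi_xy:.3f}, CMI(x, y | z) = {cmi_xy_z:.4f}")
```

In this toy chain, x and y share substantial mutual information, yet conditioning on z drives the CMI toward zero, so a CMI-based method would treat the x-y edge as indirect.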
In this report, we describe a new software package, CMIP, which implements the PCA-CMI algorithm with the goal of enabling biologists to build genomic networks easily. The CMIP package incorporates a threshold determination method and a parallel computing process for network inference. The threshold determination method chooses an appropriate cutoff on the fly for judging gene interactions. The computing procedure of the CMI measurement is optimized to make the algorithm robust, and parallel computing strategies are applied to accelerate the calculation. This paper describes the algorithm details, program implementation, prediction performance, and practical application of the CMIP package.
Workflow of the CMIP package
Correlation calculation of the CMIP algorithm
Threshold determination of gene interaction
Parallelization of the CMIP programs
In CMIP, parallel strategies were applied to speed up the correlation computation. In practice, a CPU version and a GPU version of the CMIP algorithm were developed so that users can choose between them according to their computational environment. The CPU version is implemented with the OpenMP framework, where loop calculations are accelerated with multi-threading. In detail, the total correlation computing task is first calculated from the number of genes, and the task is then partitioned equally among the CPU nodes. The GPU version is implemented with the CUDA framework, where serial and parallel computing tasks are undertaken by CPU and GPU cores respectively. In detail, a production-consumption strategy is used in the GPU version: the gene expression data used in the correlation calculation are first pre-processed by the CPU cores (production); the pre-processed data are then delivered to the GPU cores for correlation calculation in parallel (consumption); finally, the results are transferred from the GPU back to the CPU cores for aggregation.
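The equal-partition idea described for the CPU version can be sketched in Python as follows. This is only an illustration under stated assumptions: a thread pool stands in for OpenMP threads, Pearson correlation stands in for the CMI-based measure, and the helper names (split_evenly, correlate_chunk, parallel_correlation) are hypothetical rather than part of the CMIP programs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

import numpy as np

def split_evenly(tasks, n_workers):
    """Partition a task list into n_workers chunks whose sizes differ by at most one."""
    base, extra = divmod(len(tasks), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(tasks[start:start + size])
        start += size
    return chunks

def correlate_chunk(expr, pairs):
    """Pearson correlation for every (i, j) gene pair assigned to one worker."""
    return [(i, j, float(np.corrcoef(expr[i], expr[j])[0, 1])) for i, j in pairs]

def parallel_correlation(expr, n_workers=4):
    """Compute all pairwise correlations with the work split equally among workers."""
    n_genes = expr.shape[0]
    # The total task is derived from the gene number: n * (n - 1) / 2 pairs.
    pairs = list(combinations(range(n_genes), 2))
    chunks = split_evenly(pairs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_worker = pool.map(lambda chunk: correlate_chunk(expr, chunk), chunks)
    # Aggregate the per-worker results into one list, preserving pair order.
    return [item for chunk in per_worker for item in chunk]

# Toy expression matrix: 6 genes (rows) measured in 20 samples (columns).
expression = np.random.default_rng(1).normal(size=(6, 20))
correlations = parallel_correlation(expression, n_workers=3)
print(len(correlations), "gene pairs processed")
```

The sketch shows only how the pairwise workload is sized and divided; it makes no claim about speed-up, which in the actual package depends on the OpenMP and CUDA implementations.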
Evaluation of network inference methods
Results and discussion
Efficiency of the threshold determination method
Table: Effectiveness of the threshold determination method under different criteria (columns: offset less than 5%, offset less than 10%, offset less than 20%)
Parameter selection of the CMIP software
Performance evaluation of the CMIP package on benchmark datasets
Scores of various network inference methods on benchmark datasets
Application of the CMIP package on real biological datasets
We further applied the CMIP software to real transcriptome data to check its practical applicability. The CMIP software was first used to build GRNs of pineapple leaves. In detail, a GRN of the leaf base and a GRN of the leaf tip were constructed from genome-scale expression data. In total, 15,483 genes (201,537 interactions) and 13,543 genes (188,391 interactions) were included in the base and tip GRNs respectively. Analysis of the node degree distribution suggested that both the tip and the base networks showed small-world properties. We then extracted genes linked to metabolic enzymes of Crassulacean Acid Metabolism (CAM) in the base and tip networks. Genes linked to metabolic enzymes in the tip network but absent from the base network were identified as potential recruited regulators of CAM photosynthesis. Subsequent experimental study showed that the regulators identified from this network comparison do play important roles in photosynthesis differentiation. This application of the CMIP software to a real dataset shows its effectiveness and efficiency for genomic GRN reconstruction.
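The network comparison step described above amounts to a set difference between the enzyme-linked genes of the two networks. A minimal Python sketch is given below; the gene identifiers, the toy edge lists and the function names are hypothetical and serve only to illustrate the logic, not to reproduce the pineapple analysis.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Adjacency sets of an undirected network given as (gene_a, gene_b) pairs."""
    adjacency = defaultdict(set)
    for gene_a, gene_b in edges:
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return adjacency

def linked_to(edges, targets):
    """Genes directly connected to any target gene, excluding the targets themselves."""
    adjacency = build_adjacency(edges)
    linked = set()
    for target in targets:
        linked |= adjacency.get(target, set())
    return linked - set(targets)

def tip_specific_regulators(base_edges, tip_edges, enzymes):
    """Genes linked to the enzymes in the tip network but not in the base network."""
    return linked_to(tip_edges, enzymes) - linked_to(base_edges, enzymes)

# Hypothetical toy networks; ENZ1 and ENZ2 stand in for CAM metabolic enzymes.
cam_enzymes = {"ENZ1", "ENZ2"}
base_network = [("G1", "ENZ1"), ("G2", "G3")]
tip_network = [("G1", "ENZ1"), ("G4", "ENZ2"), ("G5", "ENZ1"), ("G2", "G3")]

candidates = tip_specific_regulators(base_network, tip_network, cam_enzymes)
print(sorted(candidates))
```

Here G1 is linked to an enzyme in both networks and is therefore excluded, whereas G4 and G5 are linked only in the tip network and are returned as candidates.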
Effectiveness of parallel computing framework of CMIP programs
Running time of different network inference programs
Table: Running time of the CMIP programs in pineapple GRNs reconstruction (columns: leaf base network, leaf tip network)
Usage of the CMIP package
In this study, we provide a new software package for network inference, which can reconstruct genomic GRNs within a short time period. The software package has a number of novel features compared with other GRN inference methods. First, CMIP can distinguish direct gene interactions from indirect ones with high accuracy based on the CMI measurement. Performance evaluation on benchmark datasets shows that the precision and accuracy of the CMIP algorithm are comparable to those of most currently used methods. Second, an automatic threshold determination method is incorporated into the CMIP algorithm, so users do not need to specify a predefined cutoff for judging gene interactions; an appropriate threshold is provided on the fly. Numerical experiments confirm the efficiency of the threshold determination method. Last but not least, the OpenMP and CUDA frameworks are applied in the software to speed up the computation, which enables the software to build GRNs with less running time. With this feature, the software is suitable for reconstructing genomic GRNs. One area of CMIP that needs future development is that it cannot assign directionality to the edges of gene regulatory networks, a limitation shared by many current methods, such as CLR and minet. This limitation can be addressed by a two-step routine: first, use the CMIP software to build a gene regulatory network as a background model; then, assign directionality to the edges of the network according to the results of biochemical perturbation experiments, or predict directionality based on time series expression data.
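The second step of this routine, assigning directionality from perturbation results, could look like the following Python sketch. It assumes that perturbation experiments are summarized as a mapping from each perturbed gene to the set of genes whose expression responded; this data layout, the gene names and the function orient_edges are hypothetical illustrations rather than a feature of CMIP.

```python
def orient_edges(undirected_edges, perturbation_effects):
    """Orient edges of a background network using perturbation responses.

    perturbation_effects maps a perturbed gene to the set of genes whose
    expression changed after that perturbation. An edge (a, b) is oriented
    a -> b when perturbing a affects b but not the reverse; edges with no
    evidence, or with evidence in both directions, are left unresolved.
    """
    directed, unresolved = [], []
    for gene_a, gene_b in undirected_edges:
        a_affects_b = gene_b in perturbation_effects.get(gene_a, set())
        b_affects_a = gene_a in perturbation_effects.get(gene_b, set())
        if a_affects_b and not b_affects_a:
            directed.append((gene_a, gene_b))
        elif b_affects_a and not a_affects_b:
            directed.append((gene_b, gene_a))
        else:
            unresolved.append((gene_a, gene_b))
    return directed, unresolved

# Hypothetical background network and knockout results.
background_edges = [("TF1", "G1"), ("G2", "TF1"), ("G1", "G3")]
knockout_effects = {"TF1": {"G1", "G2"}, "G1": set()}

directed_edges, unresolved_edges = orient_edges(background_edges, knockout_effects)
print(directed_edges, unresolved_edges)
```

Edges left unresolved in this way would remain candidates for the alternative mentioned above, namely predicting directionality from time series expression data.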
We thank the anonymous reviewers for their constructive suggestions, which helped improve the quality of the manuscript.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 17, 2016: Proceedings of the 27th International Conference on Genome Informatics: bioinformatics. The full contents of the supplement are available online at http://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-17.
This study was supported by the Shanghai Municipal Natural Science Foundation [grant number 14ZR1446700], SA-SIBS Scholarship Program, National 863 Program Green Super Rice [grant number 2014AA101601], National Basic Research and Development Plan of China [grant number 2015CB150104], CAS Strategic Research Project [grant number XDA08020301], and the National Natural Science Foundation of China [grant numbers 91529303, 91439103, 61134013, 81471047]. Publication costs for this study were funded by the foundations mentioned above.
Availability of data and material
The datasets analyzed during the current study are available in the DREAM3 repository http://dreamchallenges.org/project-list/dream3-2008/.
GYZ, LNC, XJZ, ZPL, ZW, and XGZ conceived and designed the experiments. GYZ and YCX performed the experiments and analyzed the data. GYZ, LNC and XGZ wrote the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Barabasi AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5(2):101–13.
- Gardner TS, di Bernardo D, Lorenz D, Collins JJ. Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003;301(5629):102–5.
- Artzy-Randrup Y, Fleishman SJ, Ben-Tal N, Stone L. Comment on “Network motifs: simple building blocks of complex networks” and “Superfamilies of evolved and designed networks”. Science. 2004;305(5687):1107. author reply 1107.
- Braha D, Bar-Yam Y. Topology of large-scale engineering problem-solving networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;69(1 Pt 2):016113.
- Angeli D, Ferrell Jr JE, Sontag ED. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. Proc Natl Acad Sci U S A. 2004;101(7):1822–7.
- Ma W, Trusina A, El-Samad H, Lim WA, Tang C. Defining network topologies that can achieve biochemical adaptation. Cell. 2009;138(4):760–73.
- Chen L, Liu R, Liu ZP, Li M, Aihara K. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci Rep. 2012;2:342.
- Liu R, Wang X, Aihara K, Chen L. Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers. Med Res Rev. 2013;34(3):455–78.
- Liu F, Zhang SW, Guo WF, Wei ZG, Chen L. Inference of Gene Regulatory Network Based on Local Bayesian Networks. PLoS Comput Biol. 2016;12(8):e1005024.
- Zou M, Conzen SD. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics. 2005;21(1):71–9.
- Brown LE, Tsamardinos I, Aliferis CF. A novel algorithm for scalable and accurate Bayesian network learning. Stud Health Technol Inform. 2004;107(Pt 1):711–5.
- Kauffman S, Peterson C, Samuelsson B, Troein C. Random Boolean network models and the yeast transcriptional network. Proc Natl Acad Sci U S A. 2003;100(25):14796–9.
- Grieb M, Burkovski A, Strang JE, Kraus JM, Gross A, Palm G, Kuhl M, Kestler HA. Predicting Variabilities in Cardiac Gene Expression with a Boolean Network Incorporating Uncertainty. PLoS One. 2015;10(7):e0131832.
- Haury AC, Mordelet F, Vera-Licona P, Vert JP. TIGRESS: Trustful Inference of Gene REgulation using Stability Selection. BMC Syst Biol. 2012;6:145.
- Wang Y, Joshi T, Zhang XS, Xu D, Chen LN. Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics. 2006;22(19):2413–20.
- Marbach D, Costello JC, Kuffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804.
- Bonneau R, Reiss DJ, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 2006;7(5):R36.
- Brooks AN, Reiss DJ, Allard A, Wu WJ, Salvanha DM, Plaisier CL, Chandrasekaran S, Pan M, Kaur A, Baliga NS. A system-level model for the microbial regulatory genome. Mol Syst Biol. 2014;10:740.
- Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma MP. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell. 2009;137(1):172–81.
- Honkela A, Girardot C, Gustafson EH, Liu YH, Furlong EE, Lawrence ND, Rattray M. Model-based method for transcription factor target identification with limited data. Proc Natl Acad Sci U S A. 2010;107(17):7793–8.
- Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7 Suppl 1:S7.
- Meyer PE, Lafitte F, Bontempi G. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics. 2008;9:461.
- Yu X, Zheng G, Shan L, Meng G, Vingron M, Liu Q, Zhu XG. Reconstruction of gene regulatory network related to photosynthesis in Arabidopsis thaliana. Front Plant Sci. 2014;5(3):273.
- Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007;5(1):e8.
- Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ. Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ. 2009;32(12):1633–51.
- Chevalier M, Venturelli O, El-Samad H. The Impact of Different Sources of Fluctuations on Mutual Information in Biochemical Networks. PLoS Comput Biol. 2015;11(10):e1004462.
- Zhang X, Moret BM. Refining regulatory networks through phylogenetic transfer of information. IEEE/ACM Trans Comput Biol Bioinform. 2012;9(4):1032–45.
- Zhao J, Zhou Y, Zhang X, Chen L. Part mutual information for quantifying direct associations in networks. Proc Natl Acad Sci U S A. 2016;113(18):5130–5.
- Weaver DC, Workman CT, Stormo GD. Modeling regulatory networks with weight matrices. Pac Symp Biocomput. 1999:112–23.
- Kramer N, Schafer J, Boulesteix AL. Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics. 2009;10:384.
- Friedman N. Inferring cellular networks using probabilistic graphical models. Science. 2004;303(5659):799–805.
- Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010;5(9).
- Zhang X, Liu K, Liu ZP, Duval B, Richer JM, Zhao XM, Hao JK, Chen L. NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference. Bioinformatics. 2012;29(1):106–13.
- Zhang X, Zhao J, Hao JK, Zhao XM, Chen L. Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic Acids Res. 2014;43:e31.
- Zhang X, Zhao XM, He K, Lu L, Cao Y, Liu J, Hao JK, Liu ZP, Chen L. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2011;28(1):98–104.
- Rabenseifner R, Hager G, Jost G. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes. In: Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on: 18–20 Feb. 2009. 2009. p. 427–36.
- Nickolls J, Buck I, Garland M, Skadron K. Scalable Parallel Programming with CUDA. Queue. 2008;6(2):40–53.
- Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci U S A. 2010;107(14):6286–91.
- Marbach D, Schaffter T, Mattiussi C, Floreano D. Generating Realistic "in silico" Gene Networks for Performance Assessment of Reverse Engineering Methods. J Comput Biol. 2009;16(2):229–39.
- Ming R, VanBuren R, Wai CM, Tang H, Schatz MC, Bowers JE, Lyons E, Wang ML, Chen J, Biggers E, et al. The pineapple genome and the evolution of CAM photosynthesis. Nat Genet. 2015;47(12):1435–42.
- Margolin AA, Wang K, Lim WK, Kustagi M, Nemenman I, Califano A. Reverse engineering cellular networks. Nat Protoc. 2006;1(2):662–71.