TimeDelayARACNE: Reverse engineering of gene networks from timecourse data by an information theoretic approach
 Pietro Zoppoli^{1, 2},
 Sandro Morganella^{1, 2} and
 Michele Ceccarelli^{1, 2}Email author
https://doi.org/10.1186/1471210511154
© Zoppoli et al; licensee BioMed Central Ltd. 2010
Received: 27 July 2009
Accepted: 25 March 2010
Published: 25 March 2010
Abstract
Background
One of main aims of Molecular Biology is the gain of knowledge about how molecular components interact each other and to understand gene function regulations. Using microarray technology, it is possible to extract measurements of thousands of genes into a single analysis step having a picture of the cell gene expression. Several methods have been developed to infer gene networks from steadystate data, much less literature is produced about timecourse data, so the development of algorithms to infer gene networks from timeseries measurements is a current challenge into bioinformatics research area. In order to detect dependencies between genes at different time delays, we propose an approach to infer gene regulatory networks from timeseries measurements starting from a well known algorithm based on information theory.
Results
In this paper we show how the ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) algorithm can be used for gene regulatory network inference in the case of timecourse expression profiles. The resulting method is called TimeDelayARACNE. It just tries to extract dependencies between two genes at different time delays, providing a measure of these dependencies in terms of mutual information. The basic idea of the proposed algorithm is to detect timedelayed dependencies between the expression profiles by assuming as underlying probabilistic model a stationary Markov Random Field. Less informative dependencies are filtered out using an auto calculated threshold, retaining most reliable connections. TimeDelayARACNE can infer small local networks of time regulated genegene interactions detecting their versus and also discovering cyclic interactions also when only a mediumsmall number of measurements are available. We test the algorithm both on synthetic networks and on microarray expression profiles. Microarray measurements concern S. cerevisiae cell cycle, E. coli SOS pathways and a recently developed network for in vivo assessment of reverse engineering algorithms. Our results are compared with ARACNE itself and with the ones of two previously published algorithms: Dynamic Bayesian Networks and systems of ODEs, showing that TimeDelayARACNE has good accuracy, recall and Fscore for the network reconstruction task.
Conclusions
Here we report the adaptation of the ARACNE algorithm to infer gene regulatory networks from timecourse data, so that, the resulting network is represented as a directed graph. The proposed algorithm is expected to be useful in reconstruction of small biological directed networks from time course data.
Background
In order to understand cellular complexity much attention is placed on large dynamic networks of coregulated genes at the base of phenotype differences. One of the aims in molecular biology is to make sense of highthroughput data like that from microarray of gene expression experiments. Many important biological processes (e.g., cellular differentiation during development, aging, disease aetiology etc.) are very unlikely controlled by a single gene instead by the underlying complex regulatory interactions between thousands of genes within a fourdimension space. In order to identify these interactions, expression data over time can be exploited. An important open question is related to the development of efficient methods to infer the underlying gene regulation networks (GRN) from temporal gene expression profiles. Inferring, or reverseengineering, gene networks can be defined as the process of identifying gene interactions from experimental data through computational analysis. A GRN can be modelled as a graph G = (V, U, D), where V is the set of nodes corresponding to genes, U is the set of unordered pair (undirected edges) and D is the set of ordered pairs D (directed edges). A directed edge d_{ ij }from v_{ i }to v_{ j }is present iff there is a causal effect from node v_{ i }to node v_{ j }. An undirected edge u_{ ij }represents the mutual association between nodes v_{ i }and v_{ j }. Gene expression data from microarrays are typically used for this purpose. There are two broad classes of reverseengineering algorithms [1]: those based on the physical interaction approach which aim at identifying interactions among transcription factors and their target genes (genetosequence interaction) and those based on the influence interaction approach that try to relate the expression of a gene to the expression of the other genes in the cell (genetogene interaction), rather than relating it to sequence motifs found in the promoters. We will refer to the ensemble of these influence interactions as gene networks. Many algorithms have been proposed in the literature to model gene regulatory networks [2] and solve the network inference problem [3].
Ordinary Differential Equations
Reverseengineering algorithms based on ordinary differential equations (ODEs) relate changes in gene transcript concentration to each other and to an external perturbation.
Typical perturbations can be for example the treatment with a chemical compound (i.e. a drug), or the over expression or down regulation of particular genes. A set of ODEs, one for each gene, describes gene regulation as a function of other genes. As ODEs are deterministic, the interactions among genes represent causal interactions, rather than statistical dependencies. The ODEbased approaches yield signed directed graphs and can be applied to both steadystate and timeseries expression profiles [3, 4].
Bayesian Networks
A Bayesian network [5] is a graphical model for representing probabilistic relationships among a set of random variables X_{ i }, where i = 1, ⋯, n. These relationships are encoded in the structure of a directed acyclic graph G, whose vertexes (or nodes) are the random variables X_{ i }. The relationships between the variables are described by a joint probability distribution P(X_{1}, ⋯, X_{ n }). The genes, on which the probability is conditioned, are called the parents of gene i and represent its regulators, and the joint probability density is expressed as a product of conditional probabilities. Bayesian networks cannot contain cycles (i.e. no feedback loops). This restriction is the principal limitation of the Bayesian network model [6]. Dynamic Bayesian networks overcome this limitation [7]. Dynamic Bayesian networks are an extension of Bayesian networks able to infer interactions from a data set consisting of timeseries rather than steadystate data.
Graphical Gaussian Model
Graphical Gaussian model, also known as covariance selection or concentration graph models, assumes multivariate normal distribution for underlying data. The independence graph is defined by a set of pairwise conditional independence relationships calculated using partial correlations as a measure of independence of any two genes that determine the edgeset of the graph [8]. Partial cross correlation has been also used to deal with time delays [9].
Gene Relevance Network
Gene relevance networks are based on the covariance graph model. Given a measure of association and defined a threshold value, for all pairs of domain variables (X, Y), association A(X, Y) is computed. Variables X and Y are connected by an undirected edge when association A(X, Y) exceeds the predefined threshold value. One of the measures of association is the mutual information (MI) [10, 11], one of the information theory (IT) main tools. In IT approaches, the expression level of a gene is considered as a random variable. MI is the main tool for measuring if and how two genes influence each other. MI between two variables X and Y is also defined as the reduction in uncertainty about a variable X after observing a second random variable Y. Edges in networks derived by informationtheoretic approaches represent statistical dependencies among gene expression profiles. As in the case of Bayesian network, the edge does not represent a direct causal interaction between two genes, but only a statistical dependency. It is possible to derive the informationtheoretic approach as a method to approximate the joint probability density function of gene expression profiles, as it is performed for Bayesian networks [12–14].
TimeCourse Reverse Engineering
Summary of the Proposed Algorithm
TimeDelayARACNE tries to extend to timecourse data ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) retrieving time statistical dependency between gene expression profiles. The idea on which TimeDelayARACNE is based comes from the consideration that the expression of a gene at a certain time could depend by the expression level of another gene at previous time point or at very few time points before. TimeDelayARACNE is a threesteps algorithm: first it detects, for all genes, the time point of the initial changes in the expression, secondly there is network construction and finally a network pruning step. Is is worth noticing that, the analytical tools for time series often require conditions such as stability and stationarity (see [24]). Although it is not possible to state that these conditions hold in general for microarray data, this is due to the limited number of samples and to the particular experimental setup producing the data, nevertheless time series analysis methods have been demonstrated to be useful tools in many applications of time course gene expression data analysis, for example Ramoni et al. [25], used an autoregressive estimation step as feature extraction prior to classification, while Holter et al., [26] use the characteristic modes obtained by singular value decomposition to model a linear framework resulting in a time translational matrix. In particular TimeDelayARACNE, just as many related works (see for example the paper of [27]) implicitly assumes stationarity and stability conditions in the kernel regression estimator used for the computation of the mutual information, as described in the section Methods. Indeed, the synthetic data generation model (4) and (5) assumes a weakly stationary linear autoregressive time series. We do not attempt removal of the trend because of the short length of the data and the wide variability of the periodicity of the cell division cycle.
Results and Discussion
Algorithm Evaluation
TimeDelayARACNE was evaluated first alone than against ARACNE, dynamical Bayesian Networks implemented in the Banjo package [28] (a software application and framework for structure learning of static and dynamic Bayesian networks) and ODE implemented in the TSNI package [29] (Time Series Network Identification) with both simulated gene expression data and real gene expression data related to yeast cell cycle [30], SOS signaling pathway in E. coli[31] and an in vivo synthetic network, called IRMA [32]. Details on the gene expression data and the construction of the simulated networks are presented in Methods section.
Synthetic Data
In order to quantitatively evaluate the performance of the algorithm reported here over a dataset with a simple and fair "gold standard" and also to evaluate how the performance depend of the size of the problem at the hand, such as network dimension, number of time points, and other variables we generated different synthetic datasets. Our profile generation algorithm (see Methods) starts by creating a random graph which represents the statistical dependencies between expression profiles, and then the expression values are generated according to a set of stochastic difference equation with random coefficients. The network generation algorithm works in such a way that each node can have zero (a "stimulator" node) one or two regulators. In addition to the random coefficients of the stochastic equations, a random Gaussian noise is added to each expression value. The performance are evaluated for each network size, number of time points and amount of noise by averaging the PPV, recall and Fscore over a set of 20 runs with different randomly generated networks. The performance is measured in terms of:

Fscore. Indeed, the overall performance depend both of the positive prediction value and recall as one can improve the former by increasing the detection threshold, but at the expenses of the latter and vice versa. The Fscore measure is the geometric mean of p and r and represents the compromise between them:
Since TimeDelayARACNE always tries to infer edge's direction, so the precisionrecall curves take into account direction. As a matter of fact an edge is considered as a true positive only if the edge exist in reference table and the direction is correct.
TimeDelayARACNE, TSNI, Banjo and ARACNE performance against synthetic data changing network gene numbers.
TimeDelayARACNE  TSNI  Banjo  ARACNE  

Genes  Time Points  PPV  Recall  F score  PPV  Recall  F score  PPV  Recall  F score  PPV  Recall  F score 
genes 10  points 50  0.33  0.60  0.41  0.43  0.11  0.18  <0.19  0.13  0.15  0.78  0.12  0.21 
genes 20  points 50  0.46  0.35  0.39  0.52  <0.1  0.11  <0.1  0.13  0.11  0.55  0.06  0.10 
genes 30  points 50  0.47  0.23  0.29  0.35  <0.1  <0.1  <0.1  <0.1  <0.1  0.68  0.04  0.08 
TimeDelayARACNE, TSNI, Banjo and ARACNE performance against synthetic data changing network data points.
TimeDelayARACNE  TSNI  Banjo  ARACNE  

Genes  Time Points  PPV  Recall  F score  PPV  Recall  F score  PPV  Recall  F score  PPV  Recall  F score 
genes 10  points 10  0.24  0.22  0.22  0.29  0.14  0.19  <0.1  <0.1  <0.1  0.27  0.14  0.17 
genes 10  points 20  0.31  0.34  0.32  0.22  0.10  0.14  0.20  <0.1  0.13  0.25  0.09  0.13 
genes 10  points 30  0.31  0.42  0.34  0.39  0.11  0.17  0.17  0.10  0.12  0.63  0.14  0.22 
genes 10  points 40  0.41  0.55  0.45  0.39  <0.1  0.15  0.22  0.16  0.18  0.79  0.14  0.23 
genes 10  points 50  0.33  0.60  0.41  0.43  0.11  0.18  0.19  0.13  0.15  0.78  0.12  0.21 
Real Expression Profiles
In order to test TimeDelayARACNE performance on expression profiles we selected an eleven genes network from yeast S. cerevisiae cell cycle, more precisely part of the G1 step. Selected genes are: Cln 3, Cdc 28, Mbp 1, Swi 4, Clb 6, Cdc 6, Sic 1, Swi 6, Cln 1, Cln 2, Clb 5. To try to infer the gene network controlling yeast cell cycle regulation, we choose genes whose mRNA levels respond to the induction of Cln3 and Clb2 that are two wellcharacterized cell cycle regulators [33]. Late in G1 phase, the Cln3Cdc28 protein kinase complex activates two transcription factors, MBF (Mbp1 and Swi6) and SBF (Swi4 and Swi6), and these promote the transcription of some genes important for budding and DNA synthesis [34]. Entry into S phase requires the activation of the protein kinase Cdc28p through binding with cyclin Clb5 or Clb6, as well as the destruction of the cyclindependent kinase inhibitor Sic1 [35]. Swi4 associates with Swi6 to form the SCBbinding factor complex that activates G1 cyclin genes CLN1 and CLN2 in late G1. Mbp1, a transcription factor related to Swi4, forms the MCBbinding factor complex with Swi6, which activates DNA synthesis genes and Sphase cyclin genes CLB5 and CLB6 in late G1 [36]. In budding yeast, commitment to DNA replication during the normal cell cycle requires degradation of the cyclindependent kinase (CDK) inhibitor Sic1. The G1 cyclinCDK complexes Cln1Cdk1 and Cln2Cdk1 initiate the process of Sic1 removal by directly catalyzing Sic1 phosphorylation at multiple sites [37]. In Figure 3 we report network graphs reconstructed by the TimeDelayARACNE, TSNI and Banjo. We also report the KEGG pathway of the cellcycle in yeast. We consider this last information as a true table to compare the results of the algorithm with respect to the others. TSNI and Banjo are used with default settings reported in their manuals. TimeDelayARACNE recovers many genegene edges as reported in Table 3. We don't use PPV, recall and Fscore to evaluate the algorithm. Differences between true table and inferred network could be eventually due to the possible incongruence between experimental data and true table. We also tested the proposed algorithm using eight genes by E. coli SOS pathway [31]. In the E. coli after the cell is exposed to DNA damaging agents there is the activation of the SOS pathway. Such DNA damaging involves the induction of about 30 genes [38]. Many of these gene products are involved in DNA damage tolerance and repair (e.g. recA, lexA, umuDC, polB, sulA, and uvrA). The SOS response to DNA damage requires the recA and lexA gene products. Near the promoters of the SOS response genes there is a site (the SOS box) bonded by the repressor protein LexA that interferes with the binding of RNA polymerase [39]. Selected genes are: uvrD, lexA, umuDC, recA, uvrA, uvrY, ruvA, polB as in [40]. In Figure 4 we report the SOS pathway reconstruction by the three algorithms and the relative bibliographic control. In Table 4 there is a detailed description of the eight genes network connections showing that TimeDelayARACNE recovers these network topologies better than other algorithms.
TimeDelayARACNE test on the yeast eleven genes network.
Genes  Kegg  Correct  Wrong 

Cln3  Swi4, Swi6, Mbp1  Swi4, Swi6, Mbp1   
Swi6  Cln1/2, Clb5/6  Cln1, Clb6  Clb5 
Swi4  Cln1/2, CDC28  Cln1  Clb6 
Mbp1  Clb5/6, Cdc28     
Cln1  Sic1  Sic1   
Cln2  Sic1, Swi4/6  Sic1, Swi4   
Clb5  Cdc6    Sic1 
Clb6  Cdc6  Cdc6   
Sic1  Clb5/6, Cdc28     
Cdc6      Sic1 
Cdc28  Swi4/6, Mbp1, Cln1/2, Sic1, Cdc6  Swi6, Mbp1  Clb5 
TimeDelayARACNE test on the E. coli eight genes network.
Genes  SOS True Relations  Correct  Wrong 

recA  lexA     
lexA  uvrD, umuDC, recA, uvrA, uvrY, ruvA, polB.  uvrD, umuDC, recA, uvrA, ruvA, polB.   
uvrY      lexA 
TimeDelayARACNE test on the IRMA in vivo synthetic network.
TimeDelayARACNE  TSNI  Banjo  ARACNE  

PPV  Recall  F score  PPV  Recall  F score  PPV  Recall  F score  PPV  Recall  F score  
Switch ON network  
5 genes network  0.71  0.67  0.69  0.80  0.50  0.61  0.30  0.25  0.27  0.5  0.6  0.54 
simplified network  0.80  0.67  0.73  1.0  0.67  0.80        0.5  0.5  0.5 
Switch OFF network  
5 genes network  0.37  0.60  0.46  0.60  0.38  0.46  0.6  0.38  0.46  0.25  0.33  0.28 
simplified network  0.50  0.75  0.60  0.75  0.5  0.60        0.5  0.6  0.54 
According to this we try a compromise: no threshold was applied but we just apply the DPI pruning. We can observe that TimeDelayARACNE reaches good performance, in terms of recall and Fscore, in all considered cases with respect to the results reported in [32] and to the ARACNE results over the same networks. This means that it reaches a good compromise between PPV and recall.
Conclusions
The goal of TimeDelayARACNE is to recover gene time dependencies from timecourse data producing oriented graph. To do this we introduce time Mutual Information and Influence concepts. First tests on synthetic networks and on yeast cell cycle, SOS pathway data and IRMA give good results but many other tests should be made. Particular attention is to be made to the data normalization step because the lack of a rule. According to the little performance loss linked to the increasing gene numbers shown in this paper, next developmental step will be the extension from littlemedium networks to medium networks.
Methods
Datasets
Simulated Gene Expression Data
We construct some synthetic gene networks in order to compute the functions p, r and Fscore of the method having reference true tables and to compare its performance to other methods. According to the terminology in [41] we consider a gene network to be welldefined if its interactions allow to distinguish between regulator genes and regulated genes, where the first affect the behaviour of the second ones. Given a well defined network, we can have genes with zero regulators (called stimulators, which could represent the external environmental conditions), genes with one regulator, genes with two regulators, and so on. If a gene has at least one regulator (it is not a stimulator) then it owns a regulation function which describes its response to a particular stimulus of its regulator/regulators.
Our synthetic networks contain some stimulator genes with a random behaviour and regulated genes which can eventually be regulators of other genes. The network dynamics are modeled by linear causal relations which are formulated by a set of randomly generated equations. In particular, let us call the expression of gene i at time t as , our synthetic network generation module works as follows,
here the coefficients and are random variables uniformly distributed in [0, 1] and is a random Gaussian noise with zero mean and variance σ^{2}. Moreover the regulators genes p_{ i }and q_{ i }of the ith are randomly selected at the beginning of each simulation run. The network generation algorithm is set in such a way that 75% of genes have one regulator and 25% of genes have two regulators.

each expression profile is then normalized to be within the interval [0, 1]
In our experiments, the PPV, recall and Fscore of the proposed and the other methods is computed as the average over a set of 20 runs over different random networks with the same number of genes, number of time points and noise levels.
Microarray Expression Profiles
The time course profiles for a set of 11 genes, part of the G1 step of yeast cell cycle, are selected from the widely used yeast, Saccharomyces cerevisiae, cell cycle microarray data [30]. These microarray experiments were designed to create a comprehensive list of yeast genes whose transcription levels were expressed periodically within the cell cycle. We select one of this profile in which the gene expressions of cell cycle synchronized yeast cultures were collected over 17 time points taken in 10minute intervals. This time series covers more than two complete cycles of cell division. The first time point, related to the M step, is excluded in order to better recover the time relationships present in the G1 step. The true edges of the underlying network were provided by KEGG yeast's cell cycle reference pathway [42].
Green Fluorescent Protein RealTime Gene Expression Data
The time course profiles for a set of 8 genes, part of the SOS pathway of E. coli[31] are selected. Data are produced by a system for realtime monitoring of the transcriptional activity of operons by means of lowcopy reporter plasmids in which a promoter controls GFP (green fluorescent protein). Even if the data contain 50 time points we use only the first 14 points (excluding the first point of the TS data which is zero) avoiding the misguiding flat tails characterizing such gene profiles (the response to the UV stimulus is quick, so very soon mRNAs came back to prestimulus condition). The expression levels are obtained by averaging the replicates.
IRMA network
Two sets of five genes of time course profiles are provided by realtime PCR from an in vivo yeast synthetic network [32]. One set, called Switch ON data set, is the result of the time measurements, every 20 minutes for 5 hours, of the mRNA concentration after shifting cells from glucose to galactose, for a total of 5 profiles of 16 points. The other one, called Switch OFF data set, is the result of the time measurements, every 10 minutes for 3 hours, of the mRNA concentration after shifting cells from galactose to glucose, for a total of 5 profiles of 21 points. The true edges of the underlying network are provided by the experiment design, and are provided as supplementary information from the paper [32].
Algorithms
ARACNE
The ARACNE algorithm has been proposed in [12, 43]. ARACNE is an informationtheoretic algorithm for the reverse engineering of transcriptional networks from steadystate microarray data. ARACNE, just as many other approaches, is based on the assumptions that the expression level of a given gene can be considered as a random variable, and the mutual relationships between them can be derived by statistical dependencies. It defines an edge as an irreducible statistical dependency between gene expression profiles that cannot be explained as an artifact of other statistical dependencies in the network. It is a two steps algorithm: network construction and network pruning. Within the assumption of a twoway network, all statistical dependencies can be inferred from pairwise marginals, and no higher order analysis is needed. ARACNE identifies candidate interactions by estimating pairwise gene expression profile mutual information, I(g_{ i }, g_{ j }) ≡ I_{ ij }, an informationtheoretic measure of relatedness that is zero iff the joint distribution between the expression level of gene i and gene j satisfies P(g_{ i }, g_{ j }) = P(g_{ i })P(g_{ j }). ARACNE estimates MI using a computationally efficient Gaussian Kernel estimator. Since MI is reparameterization invariant, ARACNE copulatransforms (i.e., rankorder) the profiles before MI estimation; the range of these transformed variables is thus between 0 and 1, and their marginal probability distributions are manifestly uniform. This decreases the influence of arbitrary transformations involved in microarray data preprocessing and removes the need to consider positiondependent kernel widths which might be preferable for nonuniformly distributed data. Secondly the MIs are filtered using an appropriate threshold, I_{0} thus removing the most of indirect candidate interactions using a well known information theoretic property, the data processing inequality (DPI). ARACNE eliminate all edges for which the null hypothesis of mutually independent genes cannot be ruled out. To this extent, ARACNE randomly shuffles the expression of genes across the various microarray profiles and evaluate the MI for such manifestly independent genes. The DPI states that if genes g_{1} and g_{3} interact only through a third gene, g_{2}, (i.e., if the interaction network is g_{1} ↔ ... ↔ g_{2} ↔ ... ↔ g_{3} and no alternative path exists between g_{1} and g_{3}), then I(g_{1,}g_{3}) ≤ min(I(g_{1}, g_{2}); I(g_{2}, g_{3})) [44]. Thus the least of the three MIs can come from indirect interactions only, and so it's pruned.
TimeDelayARACNE
TimeDelayARACNE tries to extend to timecourse data ARACNE retrieving time statistical dependency between gene expression profiles. TimeDelayARACNE is a 3 steps algorithm: it first detects, for all genes, the time point of the initial changes in the expression, secondly there is network construction than network pruning.
Step 1
The thresholds are chosen with . In all the reported experiments we used τ_{ up }= 1.2 and consequently τ_{ down }= 0.83. The quantity IcE(g_{ a }) can be used in order to reduce the unuseful influence relations between genes. Indeed, a gene g_{ a }can eventually influence gene g_{ b }only if IcE(g_{ a }) ≤ IcE(g_{ b }) [7].
Step 2
In particular, if the measure Infl(g_{ a }, g_{ b }) is above the the significance threshold, explained below, for a value of κ > 0, then this means that the activation of gene g_{ a }influences the activation of gene g_{ b }at a later time.
In other terms there is a directed link "from" gene g_{ a }"to" gene g_{ b }, this is the way TimeDelayARACNE can recover directed edges. On the contrary, the ARACNE algorithm does not produce directed edges as it corresponds to the case κ = 0, and the Mutual Information is of course symmetric.
We want to show direct gene interactions so under the condition of the perfect choice of experimental time points the best time delay is one because it allows to better capture direct interactions while other delays ideally should evidence more indirect interactions but usually time points are not sharply calibrated to detect such information, so considering few different time points could help in the task. If you consider a too long time delay you can see a correlation between gene a and gene c losing gene b which is regulated by a and regulates c while short time delay can be not sufficient to evidence the connection between gene a and gene b, so using some few delays we try to overcome the above problem. Other approaches based, for example, of conditional mutual information, such as in [48], could of course be exploited.
Step 3
In the last step TimeDelayARACNE uses the DPI twice. In particular the first DPI is useful to split threenodes cliques (triangles) at a single time delay. Whereas the second is applied after the computation of the time delay between each pair of genes as in (8). Just as in the standard ARACNE algorithm, three genes loops are still possible on the basis of a tolerance parameter. In particular triangles are maintained by the algorithm if the difference between the mutual information of the three connections is below the 15% (this the same tolerance parameter adopted in the ARACNE algorithm).
Computational Issues
The computational performance of the TimeDelayARACNE algorithm is influenced by the number of genes, by the mutual information estimation algorithm and by the adopted scheme of bootstrapping for the estimation of the threshold parameter. In particular if the network has n genes and t samples, we have to compute O(Kn^{2}) estimations of the mutual information between two vectors of samples having t elements or less, K being the maximum value of the parameter κ. We adopt a kernelbased estimator of the density of data used in the computation of Mutual Information; it is based on procedure proposed in [58] and implemented in the R package "GenKern" http://cran.rproject.org/. It performs a smoothing of data and an interpolation on a grid of fixed dimensions; the procedure also performs an automatic selection of the kernel bandwidth, by choosing the bandwidth which (approximately) minimizes good quality estimates of the mean integrated squared error [59]. Indeed, there are also other, more recent and elaborate, approaches for estimating entropy and Mutual Information. In particular approaches such as those proposes in [11, 60] deal with entropy estimation in the cases of a small number of highdimensional samples, where the kernelbased density estimator could be rather inefficient.
Therefore, each inner mutual information estimation just depends on t and on the size of the fixed grid, which in all our experiments we fixed at 100 × 100. The algorithms were developed in R and available at the site http://bioinformatics.biogem.it. To have an idea of the computational time required by each network reconstruction, the estimation of the mutual information on a standard platform (Intel Core 2 2, 4 GHz Duo processor with 2 GB RAM) between two expression profiles of size from 10 to 100 points ranges on the average between 0.07 and 0.13 seconds. The whole procedure, apart from the bootstrapping required to estimate the threshold I_{0}, on a network of 50 genes and 50 time points, requires less than 7 minutes. Therefore the most computational demanding step is the bootstrapping, it is needed to compute the threshold I_{0}. It consists in randomly permuting the dataset (the set of expression profiles row values), and then calculating the average mutual information and standard deviation of these random values. Depending on the number of samples in the bootstrap steps, the computational time changes; in all the reported experiments we used a number of 500 bootstrapping samples, this turns out to produce the reconstructed network of 50 genes and 50 time points in about 47 minutes.
Availability and Requirements
The software was implemented in R and can be downloaded at http://bioinformatics.biogem.it or by contacting the corresponding author.
Declarations
Acknowledgements
We thank the anonymous reviewers for their help in the improvement of the original manuscript. This work was supported by Italian Ministry MIUR (PRIN 20085CH22F).
Authors’ Affiliations
References
 Gardner TS, Faith JJ: ReverseEngineering Transcription Control Networks. Physics of Life Reviews 2005, 2: 65–88. 10.1016/j.plrev.2005.01.001View ArticlePubMedGoogle Scholar
 Hasty J, McMillen D, Isaacs F, Collins J: Computational studies of gene regulatory networks: in numeromolecular biology. Nature Review Genetics 2001, 2: 268–279. 10.1038/35066056View ArticleGoogle Scholar
 Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles. Mol Syst Biol 2007, 3: 78.View ArticlePubMedPubMed CentralGoogle Scholar
 Kim S, Kim J, Cho K: Inferring Gene Regulatory Networks from Temporal Expression Profiles under TimeDelay and Noise. Computational Biology and Chemistry 2007, 31: 239–245. 10.1016/j.compbiolchem.2007.03.013View ArticlePubMedGoogle Scholar
 Neapolitan R: Learning bayesian networks. Prentice Hall Upper Saddle River, NJ; 2003.Google Scholar
 Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian Networks to Analyze Expression Data. Journal of Computational Biology 2000, 7: 601–620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
 Zou M, Conzen SD: A new Dnamic Bayesian Network (DBN) Approach for Identifying Gene Regulatory Networks from Time Course Microarray Data. Bioinformatics 2005, 21: 71–79. 10.1093/bioinformatics/bth463View ArticlePubMedGoogle Scholar
 Schäfer J, Strimmer K: An Empirical Bayes Approach to Inferring LargeScale Gene Association Networks. Bioinformatics 2005, 21(6):754–764. 10.1093/bioinformatics/bti062View ArticlePubMedGoogle Scholar
 Stark E, Drori R, Abeles M: Partial CrossCorrelation Analysis Resolves Ambiguity in the Encoding of Multiple Movement Features. J Neurophysiol 2006, 95(3):1966–1975. 10.1152/jn.00981.2005View ArticlePubMedGoogle Scholar
 Butte AJ, Kohane IS: Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Pacific Symposium on Biocomputing 2000, 5: 415–426.Google Scholar
 Hausser J, Strimmer K: Entropy inference and the JamesStein estimator, with application to nonlinear gene association networks. Journal of Machine Learning Research 2009, 10: 1469–1484.Google Scholar
 Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics 2006, 7(Suppl I):S7. 10.1186/147121057S1S7View ArticlePubMedPubMed CentralGoogle Scholar
 Faith JJ, Hayete B, Thaden TT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: LargeScale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biology 2007, 5: e8+. 10.1371/journal.pbio.0050008View ArticlePubMedPubMed CentralGoogle Scholar
 Meyer PE, Kontos K, Lafitte F, Bontempi G: Information Theoretic Inference of Large Transcriptional Regulatory Network. EURASIP Journal on Bioinformatics and Systems Biology 2007., 2007: 10.1155/2007/79879Google Scholar
 Shmulevich I, Dougherty ER, Kim S, Zhang W: Probabilistic Boolean Networks: a Rulebased Uncertainty Model for Gene Regulatory Networks. Bioinformatics 2002, 19: i255i263.Google Scholar
 Schliep A, Schönhuth A, Steinhoff C: Using Hidden Markov Models to Analyze Gene Expression Time Course Data. Bioinformatics 2003, 18(2):261–274.Google Scholar
 Cui Q, Liu B, Jiang T, Ma S: Characterizing the Dynamic Connectivity Between Genes by Variable Parameter Regression and Kalman Filtering Based on Temporal Gene Expression Data. Bioinformatics 2005, 21(8):1538–1541. 10.1093/bioinformatics/bti197View ArticlePubMedGoogle Scholar
 Bansal M, Gatta G, di Bernardo D: Inference of gene regulatory networks and compound mode of action from time course gene expression. Bioinformatics 2006, 22(7):815–822. 10.1093/bioinformatics/btl003View ArticlePubMedGoogle Scholar
 Chuang C, Jen C, Chen C, Shieh G: A pattern recognition approach to infer timelagged genetic interactions. Bioinformatics 2008, 24(9):1183–1190. 10.1093/bioinformatics/btn098View ArticlePubMedGoogle Scholar
 OpgenRhein R, Strimmer K: Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process. BMC Bioinformatics 2007, 8: S3. 10.1186/147121058S2S3View ArticlePubMedPubMed CentralGoogle Scholar
 Li X, Rao S, Jiang W, Li C, Xiao Y, Guo Z, Zhang Q, Wang L, Du L, Li J, et al.: Discovery of timedelayed gene regulatory networks based on temporal gene expression profiling. BMC bioinformatics 2006, 7: 26. 10.1186/14712105726View ArticlePubMedPubMed CentralGoogle Scholar
 Zhao W, Serpedin E, Dougherty E: Inferring gene regulatory networks from time series data using the minimum description length principle. Bioinformatics 2006, 22(17):2129. 10.1093/bioinformatics/btl364View ArticlePubMedGoogle Scholar
 Waibel A: Modular construction of timedelay neural networks for speech recognition. Neural Computation 1989, 1: 39–46. 10.1162/neco.1989.1.1.39View ArticleGoogle Scholar
 Luktepohl H: New Introduction to Multiple Time Series Analysis. Springer; 2005.Google Scholar
 Ramoni M, Sebastiani P, Kohane I: Cluster analysis of gene expression dynamics. Proceedings of the National Academy of Science 2002, 99(14):9121–9126. 10.1073/pnas.132656399View ArticleGoogle Scholar
 Holter N, Maritan A, Cieplak M, Fedoroff N, Banavar J: Dynamic modeling of gene expression data. Proceedings of the National Academy of Science 2000, 98(4):1693–1698. 10.1073/pnas.98.4.1693View ArticleGoogle Scholar
 Mukhopadhyay ND, Chatterjee S: Causality and pathway search in microarray time series experiment. Bioinformatics 2006, 23(4):442–449. 10.1093/bioinformatics/btl598View ArticlePubMedGoogle Scholar
 Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis AJ: Advances to Bayesian Network Inference for Generating Causal Networks from Observational Biological Data. Bioinformtics 2004, 20(18):3594–3603. 10.1093/bioinformatics/bth448View ArticleGoogle Scholar
 Bansal M, Della Gatta G, Di Bernardo D: Inference of Gene Regulatory Networks and Compound Mode of Action from Time Course Gene Expression Profiles. Bioinformatics 2006, 22(7):815–822. 10.1093/bioinformatics/btl003View ArticlePubMedGoogle Scholar
 Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botsein D, Futcher B: Comprehensive Identification of Cell Cycleregulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Molecular Biology of the Cell 1998, 9(12):3273–3297.View ArticlePubMedPubMed CentralGoogle Scholar
 Ronen M, Rosenberg R, Shraiman BI, Alon U: Assigning Numbers to the Arrows: Parameterizing a Gene Regulation Network by Using Accurate Expression Kinetics. Proc Natl Acad Sci USA 2002, 99(16):10555–10560. 10.1073/pnas.152046799View ArticlePubMedPubMed CentralGoogle Scholar
 Cantone I, Marucci L, Iorio F, Ricci M, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma M: A Yeast Synthetic Network for In Vivo Assessment of ReverseEngineering and Modeling Approaches. Cell 2009, 137: 172–181. 10.1016/j.cell.2009.01.055View ArticlePubMedGoogle Scholar
 Nasmyth K: Control of the yeast cell cycle by the Cdc28 protein kinase. Current Opinion in Cell Biology 1993, 5(2):166–179. 10.1016/09550674(93)90099CView ArticlePubMedGoogle Scholar
 Cross FR: Starting the cell cycle: what's the point? Current Opinion in Cell Biology 1995, 7(6):790–797. 10.1016/09550674(95)80062XView ArticlePubMedGoogle Scholar
 Chun K, Goebl M: Mutational analysis of Cak1p, an essential protein kinase that regulates cell cycle progression. Molecular and General Genetics MGG 1997, 256(4):365–375. 10.1007/s004380050580View ArticlePubMedGoogle Scholar
 Siegmund RF, Nasmyth KA: The Saccharomyces cerevisiae Startspecific transcription factor Swi4 interacts through the ankyrin repeats with the mitotic Clb2/Cdc28 kinase and through its conserved carboxy terminus with Swi6. Molecular and Cellular Biology 1996, 16(6):2647–2655.View ArticlePubMedPubMed CentralGoogle Scholar
 Sawarynski KE, Kaplun A, Tzivion G, Brush GS: Distinct activities of the related protein kinases Cdk1 and Ime2. Biochimica Et Biophysica Acta 2007, 1773(3):450–456.View ArticlePubMedGoogle Scholar
 Henestrosa ARFD, Ogi T, Aoyagi S, Chafin D, Hayes JJ, Ohmori H, Woodgate R: Identification of additional genes belonging to the LexA regulon in Escherichia coli. Molecular Microbiology 2000, 35(6):1560–1572. 10.1046/j.13652958.2000.01826.xView ArticleGoogle Scholar
 Sutton MD, Smith BT, Godoy VG, Walker GC: The SOS response: recent insights into umuDCdependent mutagenesis and DNA damage tolerance. Annual Review of Genetics 2000, 34: 479–497. 10.1146/annurev.genet.34.1.479View ArticlePubMedGoogle Scholar
 Saito S, Aburatani S, Horimoto K: Network Evaluation from the Consistency of the Graph Structure with the Measured Data. BMC Systems Biology 2008, 2(84):1–14.Google Scholar
 GatViks I, Tanay A, Shamir R: Modeling and Analysis of Heterogeneous Regulation in Biological Network. Lecture Notes in Computer Science 2005, 3318: 98–113.View ArticleGoogle Scholar
 Kanehisa M, Goto S: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acid Res 2000, 28: 27–30. 10.1093/nar/28.1.27View ArticlePubMedPubMed CentralGoogle Scholar
 Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla Favera R, Califano A: Reverse Engineering of Regulatory Networks in Human B Cells. Nature Genetics 2005, 37(4):382–390. 10.1038/ng1532View ArticlePubMedGoogle Scholar
 Cover TM, Thomas JA: Elements of Information Theory. WileyInterscience; 1991. full_textView ArticleGoogle Scholar
 Havard R, H L: Gaussian Markov random fields: theory and applications. CRC Press; 2005.Google Scholar
 Chen X, Fan Y: Estimation of copulabased semiparametric time series models. Journal of Econometrics 2006, 130(2):307–335. 10.1016/j.jeconom.2005.03.004View ArticleGoogle Scholar
 Nelsen RB: An Introduction to Copulas. Springer; 2006.Google Scholar
 Zhao W, Serpedi E, Dougherty ER: Inferring Connectivity of Genetic Regulatory Networks Using InformationTheoretic Criteria. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008, 5(2):262–274. 10.1109/TCBB.2007.1067View ArticlePubMedGoogle Scholar
 Lahiri S: Resampling Methods for Dependent Data (Springer Series in Statistics). Springer; 2003.View ArticleGoogle Scholar
 Beran R, Ducharme G: Asymptotic theory for bootstrap methods in statistics. Centre de Recherches Mathematiques 1991.Google Scholar
 Hall P: Resampling a coverage process. Stochastic Process Applications 1985, 19: 259–269. 10.1016/03044149(85)900286View ArticleGoogle Scholar
 Efron B, Tibshirani R: An introduction to the bootstrap. CRC Press; 1993.View ArticleGoogle Scholar
 Davison AC, Hinkley DV: Bootstrap methods and their application. Cambridge University Press; 1997.View ArticleGoogle Scholar
 Wolfgang Hardle JH, peter Kreiss J: Bootstrap Methods for Time Series. International Statistical Review 2003, 71(2):435–459.View ArticleGoogle Scholar
 Carlstein E: The use of subseries methods for estimating the variance of a general statistic from a stationary time series. Annals of Statistics 1985, 14: 1171–1179. 10.1214/aos/1176350057View ArticleGoogle Scholar
 Kunsch HR: The Jackknife and the Bootstrap for General Stationary Observations. The Annals of Statistics 1989, 17(3):1217–1241. 10.1214/aos/1176347265View ArticleGoogle Scholar
 Politis D, Romano J: The stationary bootstrap. Journal of the American Statistical Association 2002, 89: 1303–1313. 10.2307/2290993View ArticleGoogle Scholar
 Lucy D, Aykroyd RG, Pollard AM: Nonparametric Calibration for Age Estimation. Journal of the Royal Statistical Society. Series C (Applied Statistics) 2002, 51(2):183–196. [ArticleType: primary_article/Full publication date: 2002/Copyright 2002 Royal Statistical Society] [ArticleType: primary_article/Full publication date: 2002/Copyright 2002 Royal Statistical Society] 10.1111/14679876.00262View ArticleGoogle Scholar
 Wand MP, Jones MC Kernel smoothing. CRC Press; 1995.Google Scholar
 Nemenman I, Shafee F, Bialek W: Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14. Edited by: Dietterich T, Becker S, Ghahramani Z. MIT Press; 2002:471–478.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.