Unraveling gene regulatory networks from timeresolved gene expression data  a measures comparison study
 Sabrina Hempel^{1, 2, 3}Email author,
 Aneta Koseska^{1},
 Zoran Nikoloski^{4, 5} and
 Jürgen Kurths^{2, 3, 6}
DOI: 10.1186/1471210512292
© Hempel et al; licensee BioMed Central Ltd. 2011
Received: 11 March 2011
Accepted: 19 July 2011
Published: 19 July 2011
Abstract
Background
Inferring regulatory interactions between genes from transcriptomics timeresolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy timeresolved readouts of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications.
Results
Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as informationtheoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study.
Conclusions
Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices.
Background
Recent evidence from fullysequenced genomes suggests that organismal complexity arises more from the elaborate regulation of gene expression than from the genome size itself [1]. It is not surprising that determining the interactions between genes, which gives rise to particular system's function and behavior, represents the grand challenge of systems biology [2]. In addition to structural information about the regulatory interactions, a comprehensive understanding of the dynamic behavior of these interactions requires specification of: (1) the type of regulation (i.e., activation or inhibition) [3], (2) kinetics of interactions [4], and (3) the specificity of the interactions with respect to the investigated tissue and/or stress condition [5]. The elucidation of a complete network of regulatory interactions parameterized with kinetic information leading to a particular gene expression is, at present, still a challenging task even for wellstudied model organisms whose networks have been partially assembled either for few selected processes and conditions or at the genomewide level [6–9].
The everincreasing throughput in experimental manipulation of gene activity coupled with the methods for quantitative assessment of transcriptome, proteome, and metabolome have begun to identify the effects of individual transcription factors, binding ligands, and posttranslational modifications on regulated genes [10]. Moreover, such highthroughput transcriptomics data sets can be used to identify gene regulatory modules and entire networks. Understanding the complex network of gene regulatory interactions from a given transcriptome readout necessitates the design, analysis, and testing of networkinference methods (socalled reverse engineering methods). These methods operate on two types of data sets from: (1) static perturbation experiments whose readout is a pseudo steadystate expression level, and (2) timeresolved experiments yielding time series of gene expression.
Transcriptomics time series data hold the promise of identifying the dynamics of the key genes mapped into putative interactions and kinetic laws; consequently, the temporal information must not be neglected by the applied method for reverse engineering of gene regulatory networks. However, despite the decreasing costs of experiments relying on highthroughput technologies, systems biology studies still produce relatively short time series [11], largely due to the problems with gathering a big enough sample material and designing more complex experiments. In addition, timeresolved biological experiments usually involve sampling at irregular rates in order to capture processes spanning different time scales. These two challenges require a careful assessment of the existing methods for network inference from transcriptomics time series data. Moreover, most of the developed methods have been applied directly on real time series data without a prior assessment of their discerning capacity on a difficult synthetic benchmark [12].
The analysis of short time series is affected by the type of employed data representation. For instance, some approaches transform the discrete time series into continuous representations by different fitting methods; in addition, few studies have already considered transforming real valued time series into dataadaptive representations, including: symbols, strings, and trees (for a review, see [13]). Therefore, the extent to which a chosen data representation may affect the accuracy of the inferred networks should also be examined when assessing the strengths and weaknesses of different reverse engineering methods.
The simplest approach for network inference from time series data relies on applying similarity measures [12, 14–27]. Methods borrowed from Bayesian inference [28–32], regression analysis [33], and econometrics models (e.g., Granger causality [34–37]) have also been applied in this context. Although there are already two valuable reviews of methods for gene regulatory network (GRN) reconstruction from transcriptomics time series data [11, 12], we believe that there is a need for a careful assessment of the existing reverse engineering methods based on similarity measures operating on short time series data.
Measures
MEASURE  "SIMPLE" (PAIRWISE)  CONDITIONAL  PARTIAL 

Euclidean distance  μ_{ EC }, [63]     
L^{ s } Norm (here s = 10)  μ_{ L }, [64] (in literature s = 3)     
Manhattan distance  μ_{ MA }, [64]     
dynamic time warping distance  μ_{ W }, [43]     
Pearson's correlation  μ_{ P }, [65]  , [22] (*)  , [23] 
Spearman's correlation  μ_{ S }, [66]     
Kendall's correlation  μ _{ K }, [67]     
mutual information  μ_{ I }, [66]  , [25]  (new) 
coarsegrained information rate  , [52] (*)  , [52] (*)   
Granger causality index  μ_{ G }, [68]  , [34]  , [34] 
symbol sequence similarity  , [38] (*)     
mutual information of symbol sequence  , [38] (*)     
mean of symbol sequence similarity and ^{ μ }I  , [38] (*)     
conditional entropy of symbol sequence    , [38] (*)   
We study the performance of the relevance network algorithm for GRN reconstruction, applied to synthetic gene expression data sets, and compare the capability of different combinations of 21 measures and 6 scoring schemes to detect/predict true and eliminate false links. A description of the data sets and the general definitions of the methods used in this study are given in detail in the Methods section.
Our contributions include: (1) an extensive systematic review and a comparison study which could serve as a basis for selecting a reverse engineering method based on a combination of a similarity measure and a scoring scheme suitable for a given expression data; in this context, we investigate not only the pairwise similarity measures, but also, where applicable, their respective conditional and partial variants; (2) introduction of approaches that are novel or borrowed from other fields, but have not yet been encountered in the field of network reconstruction; and (3) definition of a novel informationtheoretic measure, the residual mutual information, and evaluation of its performance in unraveling gene regulatory interactions.
Results
We investigate the performance of the relevance network algorithm applied to gene expression time series from a network of 100 genes in E. coli under optimal sampling conditions (noisefree, with an uniform sampling in time). Interpolation has not been applied to the time series at this point. We show and discuss the receiver operating characteristic (ROC) curves deduced for the basic algorithm with identity (ID) scoring scheme in combination with all measures included in our investigation, and for the five additional scoring schemes  CLR, ARACNE, MRNET, TS and AWE  with selected measures. Subsequently, since optimal sampling conditions are never achieved, the performance of the similarity measures and scoring schemes is additionally investigated on noisy data. The role of sampling and interpolation on the performance is discussed as well. Moreover, the influences of network properties (e.g., size and degree) are shown using two additional networks, a yeast network composed of 100 genes from S.cerevisiae and a network of 200 genes from E. coli.
Identity scoring scheme
The basic relevance network algorithm (scoring with unit matrix) is used to compare the performance of all measure's classes.
Measures operating on vectors
Additional file 1, Figure S1 shows the efficiency of the reconstruction of links based on classical distance measures and the dynamic time warping. In general, none of these measures is able to avoid false positives on a larger scale without loosing most of the true interactions. On the other hand, the ROC curves are rather flat for high false positive rates, which implies that these measures could be useful initially to determine connections which are not present in the network. All of the curves shown in Additional file 1, Figure S1 are smooth, meaning that the prediction of links is not very sensitive to the explicit choice of the threshold. From this analysis, we can discriminate that the L^{ s } norm (with s = 10, equating the length of the time series) performs best in reconstructing the network. These results outperform the Euclidean (L^{2} norm) and the Manhattan (L^{1} norm) distance, which can be explained by the fact that the L^{ s } weights large distances more heavily. The dynamic time warping fails for the investigated data, which is most likely a result of the coarse sampling and the complexity of the network.
Measures operating on random variables
Furthermore, the ID scoring scheme is evaluated using several measures which employ time series represented via random variables. In particular, we examine in detail the performance of correlation and informationtheoretic measures.
Even if a basic significance test is included  for example the data is reshuffled 100 times, then the measures for the randomized series are calculated, and the results are compared to those obtained from the original time series  the results do not change significantly (Additional file 1, Figure S2). The partial Pearson correlation, on the other hand, shows better results for low false positive rates, but looses its accuracy when high true positive rates are reached. Additionally, the results obtained from the PPC are less significant (in terms of the reshuffled time series). Removing links which have no significant values of the correlation leads to an almost random prediction from the partial Pearson correlation.
As we cannot infer selfregulation by analyzing the similarity of expression series, the diagonal of the correlation matrix was set to zero in our computations above (by definition it is one). Comparing the reconstruction efficiency of the linear PC with that of the rank correlations (diagonal equals to one), we observe that the ROC curve shown in Figure 2(b) is smoother for the Pearson correlation than the curves obtained from the rank correlations. Hence Pearson's correlation measure is less sensitive to the choice of the threshold, whereas the rank correlations can achieve a slightly better overall performance.
Next, we investigate the efficiency of the ID scoring scheme considering informationtheoretic measures. In general, we observe that the resulting reconstruction strongly depends on the method chosen for the estimation of entropies. Here we present the results obtained using the Rpackage "infotheo" (in particular the MillerMadow asymptotic bias corrected empirical estimator) since, for short time series, it yields better estimates of the entropy than the Rpackage "entropy". Besides the basic pairwise mutual information (MI), we also investigated the conditional mutual information (CMI) and the residual mutual information (RMI) in order to reduce the number of false positive links. All these measures result in ROC curves which are more or less discontinuous. This is a finite size effect, as the time series are very short, and thus the estimation of the MI (entropies) becomes problematic.
We find a quite different behavior of the ROC curves, as shown in Figure 2(c), in specific regions of the ROC space. The simple mutual information results in a flat and comparatively smooth ROC curve for high false positive rates. This means that the measure allows removing about 60% of the false positives, by loosing approximately 10% of the true links. An even better performance in the same ROC space region can be achieved using the residual mutual information, which we proposed as a partial mutual information measure to distinguish indirect from direct (linear) relationships between triplets of genes. In contrast to this, the conditional MI results in a more discontinuous curve for high fpr: here, the ratio of the true and false positive rate is nearly the same as observed for a random prediction. In principle, the CMI is stricter in removing indirect links as it also can detect nonlinear interactions. However, the conditional probabilities cannot be estimated sufficiently well from 10 time points. Hence, the conditional MI fails for (the investigated) short data sets in the region of high false positive rates.
Additionally, when looking at the region of low fpr, we observe that the ROC curve of the simple MI becomes more discontinuous than for the high fpr. The true positive rate decreases significantly for slightly reduced threshold values, in the region around 30% and 15% of the false positives. This is manifested as jumps in the curve due to which this measure is rather sensitive to the choice of the threshold if low false positive rates are to be achieved. In contrast to this, the residual mutual information results in a smoother curve for low false positive rates than the simple MI, indicating that the measure is less sensitive to the choice of threshold, although the curve exhibits smaller jumps as well. In the region of fpr < 10% the performance of the RMI decreases slightly compared to the simple measure. The conditional mutual information on the other hand, achieves only very low false positive rates, which also lead to low true positive rates (up to about 5%). Tuning the threshold to allow for slightly higher values of the fpr the ROC curve of the CMI immediately jumps to 50% of false positives. Hence, the region between about 3% and 50% of false positive links is not achievable using the considered conditional measure.
We also implemented a basic significance test for the mutual information measures by reshuffling the time series 100 times, calculating the measure for the randomized series, and comparing the results to those obtained for the original time series. The associated ROC curves are shown in Additional file 1, Figure S3. With respect to the significance, the reconstruction efficiency of the simple and, in particular, the residual mutual information decreases, since the inferred degree of interaction for most of the gene pairs is not significant in the specified sense. In contrast to this, with the significance test, the quality of the prediction obtained from the CMI increases slightly, but its overall performance is still deficient.
That evaluation leads to the conclusion that (from the MI measures) only the simple and the residual mutual information can provide a sufficient reconstruction efficiency using the IDentity scoring scheme. This holds true only in the case that we do not rely on the simple significance test.
Investigating the performance of the coarsegrained measures on the short gene expression time series, we obtained ROC curves which look almost the same as expected for a complete random linking in the network, as illustrated in Additional file 1, Figure S4. Even though the coarsegrained measures are in principle promising for the inference of interdependency from time series of intermediate length, they are not applicable in our case. The reason for this is the limited number of available time points which makes not only the estimation of the MI, but also the identification of a proper time lag a very challenging task. Interpreting the CCIR as a distance, and not as a similarity measure, leads to an increase of the inferred true positives. However, the predictive power of the measure remains very low.
Modelbased measures
The evaluation of the ID scoring scheme using modelbased measures (Granger causality in this case) leads to an almost random prediction of links (the associated ROC curves are shown in Additional file 1, Figure S5). Hence, the Granger causality (GC) measure is not suitable for the reconstruction of GRN, when only very short gene expression time series are available. This is due to the fact that the results of the GC index depend strongly on the model estimation. An AR model has to be estimated for the given data set, whose order is determined based on the Akaike information criterion. However, this seems to be insufficient, since the AIC usually requires a higher order model (due to the high variability of the data), whereas the expression time series are in general very short.
Measures operating on symbolic dynamics
Next, we use the principle of order patterns to derive symbol sequences from the time series [38]. As already shown in general nonlinear time series analysis, the symbol based measures show a good overall performance in reverse engineering.
The ROC curves (Figure 2(d)) obtained for these measures are rather smooth and flat for false positive rates larger than 30%, which means that only a small portion of links is lost when reducing the false positive rates down to this value. Consequently, the results are robust to the choice of threshold in this particular region of the ROC space. However, the ROC curves become less smooth for lower values of the false positive rates. This implies that false positive rates smaller than 20% are barely possible to achieve. The best overall performance has been found here for the combination of symbol sequence similarity and mutual information of the symbol sequences (SySimMI), as well as for the mutual information of the symbol sequences (SyMI). The latter outperforms the simple MI of the time series themselves, as the length of the series used to estimate the measure is much longer in the case of the symbolic dynamics. Additionally, the conditional entropy of the symbol vectors obtained from pairs of time points shows results similar to the SySimMI and the SyMI in a wide range of the ROC space.
Symmetric scoring schemes  CLR, ARACNE and MRNET
Next, we evaluate the possibility for reconstruction of the underlying E. coli (sub)network based on the three modifications of the relevance network algorithm (Algorithm 1, given in the Methods section) as implemented in the "minet"package, namely the CLR, the ARCANE and the MRNET. All three algorithms represent extensions of the basic relevance network approach, to the effect that they introduce additional scoring rules for the pairwise weighting of the interactions in order to reduce the amount of links that are falsely detected.
In Additional file 1, Figure S6, we present the results, in terms of the ROC curves, which we obtain using the different scoring schemes and in all cases the default weights of the pairwise interactions, namely, the squared Spearman's correlation for every set of pairs. As the algorithms implemented in the "minet" are designed to reduce the number of false positives, high false positive rates (of more than about 50%) do not occur here, unless all interactions are set as links. Moreover, the MRNET and the CLR result in ROC curves which are not smooth, meaning that their capability to reconstruct particular links is limited and strongly dependent on a proper choice of the threshold τ. The ARACNE, on the other hand, is restricted to an almost fixed fprtpr value.
Asymmetric scoring schemes  TS and AWE
As none of the previously described scoring schemes is able to indicate directionality from symmetric measures, we include in this study, and to our knowledge for the first time in GRN reconstruction, an evaluation of the performance of the Time Shift (TS) as a symmetrybreaking scoring scheme. We show the results of this modification of the relevance network algorithm removing the links which are falsely detected by the CLR (measure: μ_{ ρ } ) or the AWE (measure: ). However, unraveling the directionality of interaction (between pairs of genes) using the correlation of the delayed time series has shown to decrease the maximal achievable true positive rates.
The slope of the ROC curves (shown in Figure 2(e)) indeed does not change much in comparison to the results of the CLR and the AWE scoring scheme. Moreover, if combined with the Pearson correlation of the delayed time series, the ROC curve obtained from the CLR becomes considerably smoother (compared to the curve shown in Additional file 1, Figure S6) and hence, the prediction is less sensitive to the choice of a threshold. The same does not hold true for the ROC curve obtained from the application of the TS scoring scheme in addition to the AWE. Instead, this curve becomes flatter and is slightly shifted towards lower false positive rates in comparison to the corresponding curve in Figure 2(f). This implies that, while for low fpr the curve looks basicly the same, in the intermediate range of the ROC space (fpr about 0.15 to 0.45) similar tpr values can be obtained for lower fpr. However, in the range of high fpr, the maximal achievable tpr value is lower. Hence, true positive rates of approximately 80% can be achieved with lower costs, as the according number of false positives is in general smaller when the TS scoring scheme is used. On the other hand, as already mentioned, the quality of the link detection becomes worse for higher false positive rates (more than about 40%) compared to the corresponding results of the AWE itself. The true positive rate in the ROC curve in Figure 2(e) is almost constant in this region of the ROC space.
Similar to the TS, the AWE scoring scheme aims at breaking symmetries and thus allows extraction of information about the directionality of interaction from symmetric measures. However, a detailed comparison of the reconstruction efficiency of the AWE using different symbolic dynamics measures shows that in contrast to the TS scoring scheme, AWE does not decrease the maximal achievable true positive rates. Instead, the ROC curves, as shown in Figure 2(f), become more flat for high false positive rates compared to the curves obtained for the basic algorithm with the ID scoring scheme using symbolic dynamics (Figure 2(d)). Hence true positive rates of more than 80% are achievable by the AWE algorithm with much lower costs than with the ID scoring scheme. On the other hand, the ROC curves obtained from AWE are more steep for low false positive rates. This implies that here true positive rates up to approximately 45% can be achieved with false positive rates of less than 10%. Furthermore, the curves shown in Figure 2(f) are much smoother in comparison to those in Figure 2(d), indicating that the reconstruction is less sensitive to the choice of a particular threshold.
Influence of noise
In general, noisefree expression measurements cannot be achieved in real experiments: In fact, intermediate and high noise level are not rare. Thus, in order to account for stochasticity in the time series, and particularly to establish the robustness of the ranking of the investigated similarity measures, we additionally evaluate the ROC curves for noise intensity of 0.3.
The reconstruction efficiency for the symbol based measures decreases significantly as well, which holds true in particular for the mutual information of symbol sequences (as noise affects the inference of a symbol sequence using order pattern, as well as the binning process for MI calculation). However, apart from that, the ROC curves obtained for the symbol based measures are more continuous for noisy data than those in the noisefree case, which implies that the reconstruction process in this case is less influenced by the choice of a particular threshold.
A similar behavior is observed for the rank correlation coefficients. However, the shape of the curves appears more robust under the influence of noise than it is the case for the symbol based measures. Hence, the rank based measures represent the most suitable similarity measures to study the interrelation among short time series at high noise levels.
Finally, we observe that the CLR and the AWE are the most robust scoring schemes with respect to noise, whereas ARACNE fails for short and noisy time series.
A detailed analysis on the reconstruction efficiency of the topranking measures and scoring schemes under various stochastic conditions is considered in the Discussion section. Additionally, the performance as a function of the length of the time series and the noise intensity can be found in the Additional files.
The role of interpolation and sampling
Due to the fact that timeresolved gene expression data are usually quite coarsely sampled, general assumptions upon what happens between two time points cannot be made. This problem becomes obvious when unequally sampled data are used (Additional file 1, Figure S8).
Although the interpolation at the beginning of the time series (where the time points are rather close) seems to be sufficient, it does not accurately capture the dynamics of the expression time series when the distance between the time points becomes larger. Hence, by interpolating the gene expression data sets, artifacts are introduced, which will be further reflected in the results of the particular measures of interdependency. In order to avoid these artifacts, we renounce the interpolation in this comparison study, even though this leads to less significant results for almost all measures, as they operate far below the limit of their theoretically defined preconditions. However, we have observed that the overall results (ROC analysis) are typically equal or even better when interpolation is not included, especially when nonuniformly sampled time series are considered. Additional file 1, Figure S9 illustrates this effect exemplary for the simple mutual information.
However, some measures, such as the Granger causality used in this study, as well as several scoring schemes (e.g., the Time Shift), are explicitly time dependent. Hence, they require uniformly sampled data, meaning that an interpolation is needed if only nonuniformly sampled data is available. This is in general, the case in GRN reconstruction.
However, most of the well performing reconstruction tools in our study are not explicitly timedependent, which means they do not require a specific time sampling. This implicates that they are not very sensitive concerning the spacing on the time axis. Our results, as shown in Additional file 1, Figure S10, illustrate that a nonuniform sampling for these tools can even improve the quality of the reconstruction, since a larger period of the dynamics is captured.
The role of the network topology
Discussion
The observation of the ROC curves does not always allow conclusions on the overall performance of a measure or scoring scheme. Therefore, we further compare the described modifications of the relevance network algorithm in terms of ROC statistics (we focus on the algorithm as illustrated in Algorithm 1, without consideration of additional statistical significance due to the lack of a suitable null model). As an example, we evaluate the ROC statistics from time series with uniform sampling (without performing an interpolation) for the network of 100 genes of E. coli.
ROC statistics for noisefree data
The minet algorithm
PARAMETER (MINET)  AUC(ROC)  YOUDEN  AUC(PvsR) 

clr, mi.empirical, equalfreq  0.80  0.54  0.05 
clr, mi.empirical, equalwidth  0.76  0.45  0.04 
clr, mi.mm, equalfreq  0.80  0.54  0.05 
clr, mi.mm, equalwidth  0.76  0.48  0.04 
clr, mi.shrink, equalfreq  0.80  0.53  0.05 
clr, mi.shrink, equalwidth  0.74  0.41  0.04 
clr, mi.sg, equalfreq  0.80  0.54  0.05 
clr, mi.sg, equalwidth  0.74  0.42  0.04 
clr, pearson, none  0.78  0.49  0.05 
clr, spearman, none  0.80  0.53  0.05 
clr, kendall, none  0.80  0.53  0.05 
mrnet, mi.empirical, equalfreq  0.82  0.59  0.04 
mrnet, mi.empirical, equalwidth  0.76  0.47  0.05 
mrnet, mi.mm, equalfreq  0.81  0.57  0.04 
mrnet, mi.mm, equalwidth  0.77  0.46  0.05 
mrnet, mi.shrink, equalfreq  0.81  0.57  0.04 
mrnet, mi.shrink, equalwidth  0.73  0.39  0.04 
mrnet, mi.sg, equalfreq  0.81  0.57  0.04 
mrnet, mi.sg, equalwidth  0.77  0.47  0.06 
mrnet, pearson, none  0.78  0.49  0.04 
mrnet, spearman, none  0.82  0.58  0.03 
mrnet, kendall, none  0.81  0.56  0.03 
aracne, mi.empirical, equalfreq  0.76  0.52  0.01 
aracne, mi.empirical, equalwidth  0.54  0.12  0.02 
aracne, mi.mm, equalfreq  0.76  0.52  0.01 
aracne, mi.mm, equalwidth  0.54  0.12  0.02 
aracne, mi.shrink, equalfreq  0.76  0.52  0.01 
aracne, mi.shrink, equalwidth  0.55  0.14  0.02 
aracne, mi.sg, equalfreq  0.76  0.52  0.01 
aracne, mi.sg, equalwidth  0.54  0.12  0.02 
aracne, pearson, none  0.54  0.07  0.03 
aracne, spearman, none  0.76  0.52  0.01 
aracne, kendall, none  0.76  0.52  0.01 

well performing for short expression data sets (evaluated on the synthetic data in this case) if:
AUC(ROC) > 0.8,
YOUDEN > 0.5 and
AUC(PvsR) > 0.05,

sufficiently performing if
0.8 >AUC(ROC) > 0.7,
0.5 >YOUDEN > 0.4 and
0.05 >AUC(PvsR) > 0.03

and deficient otherwise.
The modifications of the relevance network algorithm in the "minet" package having best performance in the reconstruction of GRN from short data sets, are the CLR and the MRNET ("minet" is based on Spearman's correlation in this case). Here the AUC(ROC) indicates almost no change compared to the basic algorithm with identity scoring (measure: Spearman's correlation), while the Y OUDEN index decreases for the CLR and increases for the MRNET. However, the opposite is true for the AUC(PvsR). The overall performance of the CLR (in terms of the considered summary statistics) is slightly better than those of the MRNET (CLR scoring scheme was used to set the benchmarks). Moreover, the measures combined with the TS scoring scheme perform sufficiently well. However, the summary statistics do not change much compared to the results obtained for the same measures using the ID. In contrast, the asymmetric weighting yields a significant increase among all the summary statistics compared to the performance of the same measures using only the identity scoring scheme.
 1.AWE + TS (scoring by μ_{ S } ),
 2.AWE + TS (scoring by μ_{ P } ),
 3.AWE,
 4.AWE,
 5.AWE and
 6.
μ _{ S } CLR
The asymmetric weighting (AWE) significantly improves the prediction at this point, since it breaks the symmetry of a particular measure based on topological consideration and, therefore, reduces the number of false positive links. Hence the AWE (measure: ) clearly shows the best performance when short time series are considered (the results become slightly better if Time Shift is applied in addition).
ROC statistics for noisy data
Under the influence of noise, the quality of the results of the symbol based measures (in particular ) decreases. As noise strongly influences the process of symbol assigning, it can principally enhance or distort the information content. The direction of the influence is not predictable a priori, but in the presence of strong noise, symbols are no longer reliable (if no additional information on the influence of the noise is provided). On the other hand, measures operating on random variables are rather robust against noise (the best results in these cases have been achieved using rank correlations).
The ARACNE has proven to be very sensitive with respect to noise. In contrast to this, the asymmetric weighting (compared to the results of the ID using the same measure) still performs well within the given limits, as it is only based on topological considerations, and it is not influenced by the presence of noise. Furthermore, to investigate how noise influences the reconstruction efficiency, we calculated the area under the ROC curve and the YOUDEN index as a function of the noise intensity for the 5 combinations of similarity measures and scoring schemes which performed best in the noisefree case, namely the symbolic measures and the asymmetric scoring schemes, mentioned in the previous section. Additionally, we compared the results to those obtained for time series of different lengths (i.e., 8 and 20 time points). We conclude that for short time series, the capability of the measures and scoring schemes to detect true and at the same time eliminate false positive links depends both on the number of time points and the noise intensity (Additional file 1, Figure S13). However, this dependence is small compared to the differences in the reconstruction efficiency between the various measures. Moreover, the sensitivity against noise is reduced with increased length of the time series (which corresponds to the usage of order pattern of higher dimension). In general, we observe a decrease in the reconstruction efficiency if the noise levels increase or the length of the time series decreases. For the short time series used in this study, however, these dependencies are not monotone.
Conclusions
By performing an extensive comparison analysis of the reconstruction efficiency of the relevance network algorithm using 6 scoring schemes and 21 different measures, we showed that with a suitable choice of a measure and a scoring scheme, this approach is applicable to short time series to gain knowledge about the underlying gene regulatory networks which differ in various properties. However, most of the currently used measures have highly limited capabilities, as the number of time points of the gene expression data is usually not sufficient to infer the underlying structure of the network. This in turn make the distinction between direct and indirect interactions an even more challenging task.
This study could serve as a basis for the selection of a reverse engineering method for network reconstruction, based on the combination of a similarity measure and a scoring scheme suitable for given data. Our results showed that rank and symbol based measures (which we applied for the first time for GRN reconstruction) have a significantly better performance in inferring interdependencies, whereas most of the standard measures (such as Granger causality and several informationtheoretic measures) fail when short time series are considered. The residual mutual information, which we proposed in this work as a partial mutual information measure, increased the reconstruction efficiency of the relevance network algorithm compared to simple and, in particular, conditional mutual information.
Nevertheless, from the analysis presented here, we conclude that it is necessary to move further from the standard similarity measures based on the time series directly, towards measures rooted in the study of symbolic dynamics or ranks deduced from the time series, in order to increase the efficiency of the relevance network algorithm for GRN reconstruction. Although measures based on symbolic dynamics performed significantly well in the noisefree case, their performance was decreased as the noise level in the system increased, and for high noise intensities it became comparable to that of mutual information. This implied that in the presence of strong noise, rank correlations (in particular Spearman's rank correlation) are most efficient tools for GRN reconstruction, since their performance was not significantly affected as the noise level increased. Additionally, we note that the results obtained for RMI, rank correlations and symbol based measures are robust with respect to the network topology. We also showed that an unequal sampling of the data in general does not pose additional problems if measures are considered where interpolation is not essential (such as the topranked measures in this study).
We point once again that all rank and symbol based measures described here are symmetric. This means that the directionality of the interactions cannot be inferred, unless a symmetrybreaking scoring scheme is considered in addition. In that direction, we showed that a novel scoring scheme, the asymmetric weighting (AWE), which we proposed in this work stands as a valuable approach to overcome the problems of introducing directionality in the reconstruction of the regulatory networks.
It would be interesting to compare in future the observed reconstruction efficiency of the relevance network approach to that of other reverse engineering methods, such as the Bayesian network approach.
Methods
Our work focuses on methods for reverse engineering which operate on timeresolved gene expression experiments (in terms of mRNA concentrations). We define a time series profile for a gene measured over n time points as a sequence of expression values x =<x_{1} , ..., x_{ n } >, where each x_{ i } , 1 ≤ i ≤ n, corresponds to a distinct time point t_{ i } . In addition, let each of the m genes be represented by r timeresolved replicates over n time points. Here, we use the mean of r = 6 replicates, resulting in an m × n data matrix M. Let M_{ i } , 1 ≤ i ≤ m, denote the i^{ th } row of a matrix M, which corresponds to the timeresolved expression profile of the i^{ th } gene.
The general reverse engineering method based on a particular similarity measure μ and a scoring scheme F operating on the data matrix M is given in Algorithm 1.
Input:
M, matrix with m rows (genes) and n columns (time points),
μ, similarity measure,
F, scoring scheme
Output:
m × m adjacency matrix, A, of the reconstructed network G
1 foreach gene i, i ∈ {1,..., m} do
2 foreach gene j, j ∈{1,..., m}, j ≠ i do
3 w_{ ij } ←μ(M_{ i } , M_{ j } ) 
4 end
5 end
6 C : c_{ ij } ← w_{ ij } · f_{ ij } ;
7 chose a threshold τ ;
8 a_{ ij } ← 1 if c_{ ij } >τ;
9 a_{ ij } ← 0 if c_{ ij } ≤ τ;
Algorithm 1: General reverse engineering method based on a similarity measure μ and a scoring scheme F. f_{ ij } ∈ F, c_{ ij } ∈ C, a_{ ij } ∈ A, and w_{ ij } ∈ W, where W is the matrix obtained by applying μ on all pairs of rows of the given data matrix M.
The evaluation of the scoring schemes and measures is generally performed in R[40] using available packages, as noted in the manuscript. Additionally, several C routines were developed in order to improve computational speed.
In what follows, we describe the procedure for generating the synthetic time series data, and present the definitions of the similarity measures and scoring schemes used in the comparative analysis. Furthermore, the details of the ROC analysis are briefly reviewed.
Synthetic data sets
The evaluation of the existing methods for reverse engineering gene regulatory networks often employs real timeresolved expression data. However, these data include the convoluted effects of regulons (genes under regulation by the same regulatory protein) and stimulons (genes under regulation by the same external influence), which renders it challenging to realistically assess the performance of investigated methods. Moreover, not every regulatory subnetwork leads to expression of the participating genes over the measured time period and particular condition of interest. These facts lead to a lack of control when using transcriptomics time series data sets for network inference.
Following the example of the DREAM challenge [2, 14], in this comparison study we use synthetically generated data sets to overcome the described disadvantages. The usage of these synthetic data, in contrast to real measurements, enables us to directly compare the performance of different reconstruction tools, since the topology and dynamic of the underlying network is known a priori. In particular, we use the synthetic network generator, SynTReN [41, 42], which creates synthetic transcriptional regulatory networks by providing the edges of the network, as well as information about the interaction (activating, repressing, or dual). Additionally, the generator produces simulated gene expression data of the associated mRNA concentrations for each gene, based on MichaelisMenten and Hill kinetics, which approximates experimental expression measurements. These expression data sets are uniformly sampled in time.
In SynTReN, the levels for three types of noise are user definable: (1) biological noise, corresponding to biological variability given by the stochastic variations in gene expression, (2) experimental noise, corresponding to the technical variability, and (3) noise on correlated inputs, which accounts for the influence of several activated genes on a regulated gene. Note that different noise levels are included in this study, but we make no distinction between the strength of the three noise types.
Interpolation
Since transcriptomics time series data usually consist of expression at a few, possibly nonuniformly sampled time points, data interpolation is often the first preprocessing step. Different techniques, including linear [43] and nonlinear interpolation methods, such as cubic [33] and bsplines [25] have been applied [11]. Although these methods extend the available data, they also introduce artefacts due to overfitting. For this reasons, we preclude from interpolating the synthetic time series data (whenever possible) for ranking of the used similarity measures. Nevertheless, for reason of comprehensiveness, we consider the effect of interpolation using cubic splines on the investigated similarity measures and scoring schemes of highest rank in absence of noise.
Similarity measures
Reverse engineering of regulatory networks relies on the inference of interrelationships among genes, based on similarity measures. In general, given two time series x and y over n time points, a similarity measure is given by the mapping μ : R^{ n } × R^{ n } → I, where I, I ⊆ R. The sodefined pairwise similarity measure detects (non)linear relationships between two variables, in the considered case, between two gene expression time series. The definition allows for the measure to be symmetric, which is not commonly the case for gene regulatory interactions. Moreover, if two genes are linked indirectly via a third gene, the pairwise measure can not distinguish direct from the indirect relationships and hence additional false positive links will be introduced in the network reconstruction.
Nevertheless, the definition of the similarity measure can be extended to conditional and partial measures, incorporating the possibility to exclude the influence of a third gene. Conditional similarity measures are more general, since they do not rely on specific assumptions on the probability distribution (deduced from the time series associated with a discrete random variable), but estimate the distribution which in turn impedes the computation of the measure from short time series. On the other hand, partial measures can indicated conditional independence reliable only for multivariate Gaussian variables.
To be able to discern the direction of a putative interaction and hopefully eliminate any spurious effects, we consider the conditional and partial variants for several of the measures detailed below. We term the basic pairwise measures as simple, in comparison to their conditional and partial variants. An overview on the measures included in this study is given in Table 1.
Measures operating on vectors
Some of the standard measures used for determining gene regulatory interactions are based on the calculation of the distance between expression time series regarded as vectors. In the following, x and y will denote the vectors <x_{1}, ..., x_{ n } > and <y_{1}, ..., y_{ n } >, respectively. Our study includes:
L ^{ s } norm:
In our study, s = 10, which corresponds to the number of available time points.
Euclidean distance
Manhattan distance
Dynamic time warping (DTW)

symmetric1(4)

symmetric2(5)

and asymmetric(6)
to find an optimal alignment. Here μ_{ EC } denotes the local (Euclidean) distance, and the measure μ_{ W } the cumulative distance (representing the minimum sum of local distances along the alignment paths).
Measures operating on random variables
Despite the representation of the expression time series as vectors, time series x = <x_{1}, ..., x_{ n } > can be associated with a discrete random variable X with probability distribution p(x), x ∈ X that is approximated by the frequency via standard binning arguments. This representation allows to calculate several widely used similarity measures, such as correlation and informationtheoretic measures. Note that the temporal information is lost by this representation of time series data. We first review the most commonly used measures in detail:
Pearson correlation
If the variables are independent, the correlation coefficient is μ_{ p } = 0, but the opposite is not true, as this coefficient is sensitive mainly to linear dependencies. Note that μ_{ P } receives values in the interval [1, 1] and is symmetric.
Conditional Pearson correlation (CPC)
Partial Pearson correlation
Rank correlations can be used as a more general measure of interdependence, not restricted to a linear relationship. Even though they measure a different type of relationship than the product moment correlation coefficient, like the previous correlation measures, these are also defined in the interval [1, 1].
Spearman's rank correlation
where R(x) is the rank of x.
Kendall's rank correlation
with n_{ c } being the number of concordant pairs, and n_{ d } the number of discordant pairs of the rank sets.
It is common to regard the rank correlation coefficients (especially Spearman's rank correlation) as alternatives to Pearson's coefficient, since they could either reduce the amount of calculation or make the coefficient less sensitive to nonnormality of distributions. Nevertheless, they quantify different types of association.
Unlike most of the measures discussed here, the correlation measures do not only provide an information about whether two genes are interacting, but also whether it is an activating or repressing relationship. As the latter information is outside of the interest of the current study, only the absolute value (respectively the square) of the correlation coefficient is taken into account.
Additionally to the correlation, informationtheoretic measures are also defined using random variables as relevant representation for expression time series.
Mutual information
It includes also nonlinear interrelations, but same as the other simple measures, the simple MI cannot be used to distinguish between direct and indirect relations.
Conditional mutual information
The degree of interaction is indicated by the values of μ_{ I } or normalized by the largest value occurring among all pairs of genes.
Mutual/conditional coarsegrained information rate
Finally, a normalization by the largest value occurring among all pairs of genes is performed. These (normalized) coarsegrained information rates are then used to indicate the degree of interaction.
Modelbased measures
Granger causality
where u is the number of parameters in the statistical model, and L is the maximized value of the likelihood function for the estimated model.
With properly selected AR models, the part of the variance in the data which is explained by one model in comparison to the other one, provides an information on the causal relationship. This comparison can be formulated in terms of an index.
where a_{11i}, a_{21i}and a_{22i}are the parameters of the models and u_{1t}, respectively u_{2t}represents white noise.
Conditional/partial Granger causality
for the partial Granger causality, with a_{11i}, a_{12i}, a_{21i}, a_{22i}, a_{23i}, a_{31i}, a_{32i}, a_{41i}, a_{42i}and a_{43i}being the parameter of the models and u_{1t}, u_{2t}, u_{3t}and u_{4t}representing noise terms.
The degree of interaction is indicated by the Granger causality index normalized by the largest value occurring among all pairs of genes.
Measures operating on symbolic dynamics
where l is the time lag. In terms of gene regulatory network reconstruction, we need to choose a certain number of time points and rank them according to their expression value in order to obtain the order pattern. Then, each possible ordering corresponds to a predefined symbol.
is the length of the symbol sequence.
In this work, we usually choose the dimension δ such that the length T becomes maximal, given a time series of length n (i.e., δ = 5 for n = 10). However, if the symbol sequences are calculated for longer time series, this approach is not applicable anymore. This is due to the fact that calculations in R are not possible because of the fast growing length of the symbol vectors. Hence, we use for the time series of 20 time points order pattern of dimension 6 instead of the dimension 10, which would lead to the symbol vector of maximal length.
Symbol sequence similarity
to define the symbol sequence similarity (SySim).
Mutual information of symbol vectors
This is further extended to include symbolic dynamics based on a slope comparison (order patterns for pairs of time points), where we consider

the symbol sequence similarity for pairs ( pairs) as a similarity measure

and the conditional entropies for pairs ( pairs) as a distance measure, with .
A novel measure  the residual mutual information
analogously to the idea of partial correlation (the residuals are calculated in the same way as for the partial correlation in Eq. (16)). The degree of interaction in the complex network is then indicated by the values of , normalized by the largest value occurring among all pairs of genes.
Applied to short data sets, we expect that the residual mutual information performs much better in discriminating indirect links than the conditional MI, as we can abandon the estimation of additional conditional probabilities. Hence, the measure is more robust to effects of small sample size. Furthermore, we expect that the RMI's performance ranges between those of the simple and the conditional mutual information for long time series, since in contrast to the CMI, we eliminate here only the linear influence of the variable Z on X and Y. We postpone the confirmation of this claim for further theoretic analysis.
Scoring schemes
Once a chosen similarity measure has been applied on a given data matrix, there are several possibilities to score the resulting "weights" of putative interactions. In this sense, a scoring scheme F is a matrix of dimensions m × m. Let W denote the matrix obtained by applying μ on all pairs of rows of a given data matrix M, for w_{ ij } ≥ 0, ∀i, j. The scores from a given scoring scheme and similarity measure can then be represented by a matrix C calculated from the Hadamard elementwise product of W and F, such that c_{ ij } = w_{ ij } · f_{ ij } .
To unravel the linkage of genes we apply the relevance network algorithm (Algorithm 1) using different scoring schemes and measures. In principle, all of the measures can be combined with any scoring schemes, but we restrict our investigations to the most commonly used.
IDentity (ID)
The identity scoring scheme corresponds to the basic relevance network [26] approach: Given a specific measure, a particular threshold τ is set in order to account for "true" links between elements in the network. The matrix F is the unit matrix (f_{ ij } = 1) in this case. Therefore, for a symmetric similarity measure, the identity scoring scheme cannot infer directionality of interactions. We test the performance of all measures mentioned above in combination with this scoring scheme.
Context Likelihood of Relatedness (CLR)
where and σ_{ i } (σ_{ j } ) are the mean and standard deviation of the empirical distribution of w_{ ik } (w_{ jk } ), k = 1, ..., m. The links having c_{ ij } <τ (with c_{ ij } = w_{ ij } · f_{ ij } and τ a predefined threshold) are removed for the network reconstruction.
Here, we use the CLR as implemented in the Rpackage "minet"[57, 58], which uses either the simple mutual information or a squared correlation matrix (Pearson's, Spearman's or Kendall's) to measure the strength of interaction among genes. We note that the CLR algorithm cannot infer directionality from symmetric measures.
Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE)
Moreover, the ARACNE removes all edges satisfying c_{ ij } <τ, where τ is a predefined threshold.
As for the CLR algorithm, we rely on the "minet"package, using the simple mutual information or a squared correlation matrix to determine the weights. By default, the two thresholds are set to zero. The ARACNE does not distinguish the direction of a link from a symmetric measure as well.
Maximum Relevance/minimum redundancy NETwork (MRNET)
Finally, all edges whose score c_{ ij } lies below a predefined threshold τ will be removed.
The implementation of the MRNET in the "minet"package, which we used in our study, assigns the weights based on the pairwise simple mutual information or a squared correlation among the time series of two genes (normalized to the largest value occurring among the pairs). Also this algorithm is not able to infer directionality from symmetric measures.
Time Shift (TS)
In case the largest significant value of the measure is obtained for a negative shift, the regulatory direction from the first to the second gene is kept, while the opposite direction is preserved if the largest significant value is obtained for a positive shift. Furthermore, both regulatory directions are kept, if the maximum arises for a shift of zero or multiple opposed shift values or in the case when no significant value exists. The scoring scheme aims at providing a hint on the directionality, because the absolute values of the calculated correlations on the delayed time series are rather biased as the data sets are quite short.
In the next step of Algorithm 1, the information regarding the directionality are combined with the weight of interaction inferred from a particular measure (c_{ ij } = w_{ ij } · f_{ ij } ). Hence, the weights for the unlikely direction are set to zero in order to break symmetries, and thus reduce the number of false positives. Finally, all edges with c_{ ij } <τ are removed, where τ is a particular threshold.
We test that scoring scheme using the absolute value of of the correlation coefficients μ_{ P }(Pearson) and μ_{ S }(Spearman) for pairs of the shifted expression series, where the significance level was set to α = 0.01 and only absolute values of correlation larger 0.9 have been taken into account. The choice of the measure to infer the weights in the first step of Algorithm 1 is independent of that and includes here the mean of sequence similarity and mutual information of symbols, as well as Spearman's and Pearson's correlation. Furthermore, this scoring scheme is applied in addition to (or after) another scoring scheme (e.g., ID, CLR, or AWE). It is important to note that in contrast to the previously described modifications of the algorithm, the scoring scheme we propose here allows to investigate the directionality, also when symmetric measures are considered.
A novel scoring scheme  Asymmetric WEighting (AWE)
Hence, the probability that the j^{ th } gene is regulated sums up to one: . Here, the score indicates that how likely a gene is regulating another one depends not only on the strength of interactions, but also on its amount.
Eventually, if c_{ ij } ≥ τ the edge is introduced, otherwise it is omitted.
We test the asymmetric weighting on the matrix W inferred from the symbolic dynamics measures.
ROC analysis
ROC
true positives  correctly identified true edges  tp 

false positives  spurious edges  fp 
true negatives  correctly identified zero edges  tn 
false negatives  unrecognized true edges  fn 
positives  all true edges  p = tp + fn 
negatives  all zero edges  n = tn + fp 
false positive rate  part of negatives set positive  fpr = fp/n 
true positive rate  part of positives set positive  tpr = tp/p 
false negative rate  part of positives set negative  fnr = fn/p 
true negative rate  part of negatives set negative  tnr = tn/n 
recall (sensitivity)  true positive rate  tpr 
specificity  true negative rate  tnr 
precision  positive predictive value  tp/(tp + fp) 
While discrete classifiers lead to just a single point in the ROC space, classifiers such as the similarity measures studied in this work produce probability values of how likely an instance belongs to a certain class. Here, the classification depends on a predefined threshold. The ROC curve is then produced by continuously tuning this threshold, which on the other hand can be suggestive on the performance of the measures. However, a welldefined rating is not always possible "by eye". Therefore, different summary statistics are common, for example the area under the ROC curve (AUC(ROC)) or the YOUDEN index (YOUDEN = max(tpr  fpr)) [61]. Another standard evaluation plot in the field of the ROC analysis is the precision/recall graph (PvsR), which is based on the comparison between the true edges and the inferred ones. Hence, it highlights the precision of the reconstruction, and does not suffer from the typically large number of false positives in a gene regulatory network reconstruction. We thus give the summary statistic using the area under the precisionrecall curve (AUC(PvsR)) as well as the ROC curve. An efficient implementation of the ROC analysis is provided by the Rpackage "ROCR" [62].
All ROC curves are evaluated with respect to the underlying GRN, which is a directed graph. As several of the scoring schemes/measures do not distinguish whether the regulation is directed from gene i to gene j or vise versa, some of the false positives will follow from the missing information on the directionality. However, since the network under study is a sparse one, this additional false positives barely carry a weight.
Declarations
Acknowledgements
The authors acknowledge the GoFORSYS project funded by the Federal Ministry of Education and Research, Grant No. 0313924.
Authors’ Affiliations
References
 Levine M, Tjian R: Transcription regulation and animal diversity. Nature 2003, 424(6945):147–151. 10.1038/nature01763View ArticlePubMed
 Stolovitzky G, Monroe D, Califano A: Dialogue on ReverseEngineering Assessment and Methods: The DREAM of Highthroughput pathway inference. Annals of the New York Academy of Sciences 2007, 1115: 1–22. 10.1196/annals.1407.021View ArticlePubMed
 Albert R, Othmer HG: The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. Journal of Theoretical Biology 2003, 223: 1–18. 10.1016/S00225193(03)000353View ArticlePubMed
 Ronen M, Rosenberg R, Shraiman BI, Alon U: Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. PNAS 2002, 99(16):10555–10560. 10.1073/pnas.152046799PubMed CentralView ArticlePubMed
 Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 2004, 431(7006):308–312. 10.1038/nature02782View ArticlePubMed
 Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, Yuh CH, Minokawa T, Amore G, Hinman V, ArenasMena C, Otim O, Brown CT, Livi CB, Lee PY, Revilla R, Rust AG, Pan Z, Schilstra MJ, Clarke PJ, Arnone MI, Rowen L, Cameron RA, McClay DR, Hood L, Bolouri H: A genomic regulatory network for development. Science 2002, 295(5560):1669–1678. 10.1126/science.1069883View ArticlePubMed
 Lee TI, Rinaldi NJ, Robert F, Odom DT, BarJoseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799–804. 10.1126/science.1075090View ArticlePubMed
 ShenOrr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 2002, 31: 64–68. 10.1038/ng881View ArticlePubMed
 Zhang Z, Liu C, Skogerb G, Zhu X, Lu H, Chen L, Shi B, Zhang Y, Wang J, Wu T, Chen R: Dynamic Changes in Subgraph Preference Profiles of Crucial Transcription Factors. PLoS Computational Biology 2006, 2(5):e47+.PubMed CentralView ArticlePubMed
 Papp B, Oliver S: Genomewide analysis of contextdependence of regulatory networks. Genome Biology 2005, 6(2):206. 10.1186/gb200562206PubMed CentralView ArticlePubMed
 BarJoseph Z: Analyzing time series gene expression data. Bioinformatics 2004, 20: 2493–2503. 10.1093/bioinformatics/bth283View ArticlePubMed
 Wang X, Wu M, Li Z, Chan C: Short timeseries microarray analysis: Methods and challenges. BMC Systems Biology 2008., 2(58):
 Lin J, Keogh E, Lonardi S, Chiu B: A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the ACM DMKD 2003, 20: 2–11.View Article
 Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G: Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences 2010, 107(14):6286–6291. 10.1073/pnas.0913357107View Article
 D'haeseleer P: How does gene expression clustering work? Nat Biotech 2005, 23(12):1499–1501. 10.1038/nbt12051499View Article
 Soranzo N, Bianconi G, Altafini C: Comparing association network algorithms for reverse engineering of largescale gene regulatory networks: synthetic versus real data. Bioinformatics 2007, 23(13):1640–1647. 10.1093/bioinformatics/btm163View ArticlePubMed
 Toh H, Horimoto K: Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 2002, 18(2):287–297. 10.1093/bioinformatics/18.2.287View ArticlePubMed
 Reverter A, Chan EKF: Combining partial correlation and an information theory approach to the reversed engineering of gene coexpression networks. Bioinformatics 2008, 24(21):2491–2497. 10.1093/bioinformatics/btn482View ArticlePubMed
 Zampieri M, Soranzo N, Altafini C: Discerning static and causal interactions in genomewide reverse engineering problems. Bioinformatics 2008, 24(13):1510–1515. 10.1093/bioinformatics/btn220View ArticlePubMed
 OpgenRhein R, Strimmer K: From correlation to causation networks: a simple approximate learning algorithm and its application to highdimensional plant gene expression data. BMC Systems Biology 2007, 1: 37. 10.1186/17520509137PubMed CentralView ArticlePubMed
 Li W: Mutual information functions versus correlation functions. Journal of Statistical Physics 1990, 60(5):823–837. 10.1007/BF01025996View Article
 Baba K, Shibata R, Sibuya M: Partial correlation and conditional correlation as measures of conditional independence. Australian & New Zealand Journal of Statistics 2004, 46(4):657–664. 10.1111/j.1467842X.2004.00360.xView Article
 Wille A, Zimmermann P, Vranova E, Furholz A, Laule O, Bleuler S, Hennig L, Prelic A, von Rohr P, Thiele L, Zitzler E, Gruissem W, Buhlmann P: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology 2004, 5(11):R92. 10.1186/gb2004511r92PubMed CentralView ArticlePubMed
 Steuer R, Kurths J, Daub CO, Weise J, Selbig J: The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics 2002, 18(suppl 2):S231–240. 10.1093/bioinformatics/18.suppl_2.S231View ArticlePubMed
 Liang KC, Wang X: Gene Regulatory Network Reconstruction Using Conditional Mutual Information. EURASIP Journal on Bioinformatics and Systems Biology 2008, 2008: 253894.PubMed CentralView Article
 Butte AJ, Kohane IS: Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Pacific Symposium on Biocomputing 2000, 5: 415–426.
 Margolin AA, Nemenman I, Wiggins C, Stolovitzky G, Califano A: On The Reconstruction of Interaction Networks with Applications to Transcriptional Regulation. To appear in Proc. NIPS Comp. Bio. Workshop, 2004, special issue of Bioinformatics 2004.
 Smith VA, Jarvis ED, Hartemink AJ: Evaluating functional network inference using simulations of complex biological systems. Bioinformatics 2002, 18(suppl 1):S216–224. 10.1093/bioinformatics/18.suppl_1.S216View ArticlePubMed
 Geier F, Timmer J, Fleck C: Reconstructing generegulatory networks from time series, knockout data, and prior knowledge. BMC Systems Biology 2007, 1: 11. 10.1186/17520509111PubMed CentralView ArticlePubMed
 Werhli AV, Grzegorczyk M, Husmeier D: Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks. Bioinformatics 2006, 22(20):2523–2531. 10.1093/bioinformatics/btl391View ArticlePubMed
 Markowetz F, Spang R: Inferring cellular networks  a review. BMC Bioinformatics 2007, 8(Suppl 6):S5. 10.1186/147121058S6S5PubMed CentralView ArticlePubMed
 Friedman N: Inferring Cellular Networks Using Probabilistic Graphical Models. Science 2004, 303(5659):799–805. 10.1126/science.1094068View ArticlePubMed
 D'haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mRNA expression levels during CNS development and injury. PSB 99 OnLine Proceedings 1999, 41–52.
 Guo S, Seth AK, Kendrick KM, Zhou C, Feng J: Partial Granger causality  Eliminating exogenous inputs and latent variables. Journal of Neuroscience Methods 2008, 172: 79–93. 10.1016/j.jneumeth.2008.04.011View ArticlePubMed
 Ding M, Chen Y, Bressler SL: Granger Causality: Basic Theory and Application to Neuroscience.2006. [http://arxiv.org/abs/qbio/0608035v1]
 Ding M, Chen Y, Bressler SL: Granger Causality: Basic Theory and Application to Neuroscience. In Handbook of Time Series Analysis. Edited by: Björn Schelter JT Matthias Winterhalder. Wiley InterScience; 2007:437–460.
 Lozano AC, Abe N, Liu Y, Rosset S: Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics 2009, 25(12):i110–118. 10.1093/bioinformatics/btp199PubMed CentralView ArticlePubMed
 Marwan N, Romano MC, Thiel M, Kurths J: Recurrence plots for the analysis of complex systems. Physics Reports 2007, 438(5–6):237–329. 10.1016/j.physrep.2006.11.001View Article
 Albert R: Scalefree networks in cell biology. J Cell Sci 2005, 118(21):4947–4957. 10.1242/jcs.02714View ArticlePubMed
 Ihaka R, Gentleman R: R version 2.9.2. 2009.
 Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, De Moor B, Marchal K: SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics 2006, 7: 43. 10.1186/14712105743PubMed CentralView ArticlePubMed
 Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, De Moor B, Marchal K: syntren generator. 2006. Version 1.1.3
 Aach J, Church GM: Aligning gene expression time series with time warping algorithms. Bioinformatics 2001, 17(6):495–508. 10.1093/bioinformatics/17.6.495View ArticlePubMed
 Ferre F, Clote P: BTW: a web server for Boltzmann time warping of gene expression time series. Nucleic Acids Res 2006, (34 Web Server):W482W485.
 Sakoe H, Chiba S: Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 1978, ASSP26(1):43–49.View Article
 Velichko V, Zagoruyko N: Automatic recognition of 200 words. International Journal of ManMachine Studies 1970, 2(3):223–234. 10.1016/S00207373(70)800086View Article
 Caiani E, Porta A, Baselli G, Turiel M, Muzzupappa S, Pagani M, Malliani A, Cerutti S: Analysis of cardiac leftventricular volume based on time warping averaging. Medical and Biological Engineering and Computing 2002, 40(2):225–233. 10.1007/BF02348129View ArticlePubMed
 Giorgino T: Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software 2009, 31(7):1–24.View Article
 Giorgino T: Dynamic Time Warping algorithms cran.rproject. 2010. [Package version 1.14–3] [Package version 1.143]
 Tormene P, Giorgino T, Quaglini S, Stefanelli M: Matching Incomplete Time Series with Dynamic Time Warping: An Algorithm and an Application to PostStroke Rehabilitation. Artificial Intelligence in Medicine 2009, 45: 11–34. 10.1016/j.artmed.2008.11.007View ArticlePubMed
 Frenzel S, Pompe B: Partial Mutual Information for Coupling Analysis of Multivariate Time Series. Phys Rev Lett 2007, 99(20):204101.View ArticlePubMed
 Palus M, Komarek V, Hrncir Z, Sterbova K: Synchronization as adjustment of information rates: Detection from bivariate time series. Phys Rev E 2001, 63(4):046211.View Article
 Pfaff B: VAR, SVAR and SVEC Models: Implementation Within R Package vars. Journal of Statistical Software 2008, 27(4):1–32.View Article
 Pfaff B: Analysis of Integrated and Cointegrated Time Series with R. 2nd edition. New York: Springer; 2008. [ISBN 0–387–27960–1] [ISBN 0387279601]View Article
 Akaike H: A new look at the statistical model identification. Automatic Control, IEEE Transactions on 2003, 19(6):716–723.View Article
 Wessel N, Suhrbier A, Riedl M, Marwan N, Malberg H, Bretthauer G, Penzel T, Kurths J: Detection of timedelayed interactions in biosignals using symbolic coupling traces. EPL (Europhysics Letters) 2009, 87: 10004. 10.1209/02955075/87/10004View Article
 Meyer PE: infotheo: InformationTheoretic Measures cran. 2009. [R package version 1.1.0] [R package version 1.1.0]
 Meyer PE, Lafitte F, Bontempi G: minet: Mutual Information Network Inference cran. 2009. [R package version 2.0.0] [R package version 2.0.0]
 Meyer PE, Kontos K, Lafitte F, Bontempi G: InformationTheoretic Inference of Large Transcriptional Regulatory Networks. EURASIP Journal on Bioinformatics and Systems Biology 2007, 2007: 9.View Article
 Fawcett T: An intoduction to ROC analysis. Pattern Recogn Lett 2006, 27(8):861–874. 10.1016/j.patrec.2005.10.010View Article
 Fluss R, Faraggi D, Reiser B: Estimation of the Youden Index and its Associated Cutoff Point. Biometrical Journal 2005, 47(4):458–472. 10.1002/bimj.200410135View ArticlePubMed
 Sing T, Sander O, Beerenwinkel N, Lengauer T: ROCR: visualizing classifier performance in R. Bioinformatics 2005, 21(20):3940–3941. 10.1093/bioinformatics/bti623View ArticlePubMed
 Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S, Barker JL, Somogyi R: Largescale temporal gene expression mapping of central nervous system development. Proceedings of the National Academy of Sciences of the United States of America 1998, 95: 334–339. 10.1073/pnas.95.1.334PubMed CentralView ArticlePubMed
 Gibbons FD, Roth FP: Judging the Quality of Gene ExpressionBased Clustering Methods Using Gene Annotation. Genome Research 2002, 12(10):1574–1581. 10.1101/gr.397002PubMed CentralView ArticlePubMed
 Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genomewide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMed
 D'haeseleer P, Wen X, Fuhrman S, Somogyi R: Mining the gene expression matrix: inferring gene relationships from large scale gene expression data. Proceedings of the second international workshop on Information processing in cell and tissues, Plenum Press 1998, 203–212.View Article
 Veiga D, Vicente F, Nicolas M, Vasconcelos AT: Predicting transcriptional regulatory interactions with artificial neural networks applied to E. coli multidrug resistance efflux pumps. BMC Microbiology 2008, 8: 101. 10.1186/147121808101PubMed CentralView ArticlePubMed
 Mukhopadhyay ND, Chatterjee S: Causality and pathway search in microarray time series experiment. Bioinformatics 2007, 23(4):442–449. 10.1093/bioinformatics/btl598View ArticlePubMed
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.