Nonparametric identification of regulatory interactions from spatial and temporal gene expression data
 Anil Aswani^{1}Email author,
 Soile VE Keränen^{2},
 James Brown^{3},
 Charless C Fowlkes^{4},
 David W Knowles^{2},
 Mark D Biggin^{2},
 Peter Bickel^{3} and
 Claire J Tomlin^{1}
https://doi.org/10.1186/1471210511413
© Aswani et al; licensee BioMed Central Ltd. 2010
Received: 7 April 2010
Accepted: 4 August 2010
Published: 4 August 2010
Abstract
Background
The correlation between the expression levels of transcription factors and their target genes can be used to infer interactions within animal regulatory networks, but current methods are limited in their ability to make correct predictions.
Results
Here we describe a novel approach which uses nonparametric statistics to generate ordinary differential equation (ODE) models from expression data. Compared to other dynamical methods, our approach requires minimal information about the mathematical structure of the ODE; it does not use qualitative descriptions of interactions within the network; and it employs new statistics to protect against overfitting. It generates spatiotemporal maps of factor activity, highlighting the times and spatial locations at which different regulators might affect target gene expression levels. We identify an ODE model for eve mRNA pattern formation in the Drosophila melanogaster blastoderm and show that this reproduces the experimental patterns well. Compared to a nondynamic, spatialcorrelation model, our ODE gives 59% better agreement to the experimentally measured pattern. Our model suggests that protein factors frequently have the potential to behave as both an activator and inhibitor for the same cisregulatory module depending on the factors' concentration, and implies different modes of activation and repression.
Conclusions
Our method provides an objective quantification of the regulatory potential of transcription factors in a network, is suitable for both low and moderatedimensional gene expression datasets, and includes improvements over existing dynamic and static models.
Keywords
Background
Inferring transcriptional regulatory networks in animals is challenging. For example, the large number of genes, the spatial and temporal complexity of expression patterns, and the presence of many redundant and indirect interactions all make it difficult to learn the network. In the long term, it will be necessary to use multiple data setsincluding gene expression, genome wide in vivo DNA binding, and network perturbation datato accurately represent all interactions. Combining multiple data classes in this way, however, is an open and challenging problem.
An alternative, intermediate approach is to use only gene expression data to infer regulatory networks. Here the relationships between the expression levels of one or more transcription factors and those of many putative target genes are used to predict which genes are the most likely targets of each factor. While much work has been done in this area, it is critical to understand the maximum amount of information that can be obtained about the network using this strategy.
Typical approaches for inferring regulatory networks have been to assume a model formulation and then fit the data to this formulation [1, 2]. Many models have been proposed, including coexpression networks [3–5], informationtheoretic representations [6–8], regression onto dynamical systems [9–14], Bayesian networks [15–17], and other graphical models [18, 19], each of which has advantages and disadvantages. The primary differences between these models lie in the tradeoff between statistical and interpretational issues. Techniques like Bayesian networks, graphical models, and informationtheoretic models have protections against overfitting (i.e., fitting models with many parameters to a small amount of experimental data); however, these techniques do not provide dynamical models which can generate new biological insights. On the other hand, techniques such as nonlinear regression and regression onto dynamical systems provide more biologically interpretable models, but sometimes suffer from inaccurate assumptions or overfitting of the model to the data.
There is disagreement on the necessity of dynamical [9–13, 8, 15–19, 14] as opposed to static [3, 6, 7, 4, 5, 20–23] models. We feel that dynamical models are more philosophically pleasing because regulatory networks contain temporal characteristics: For example, a protein binds to DNA and initiates transcription, which eventually leads to transport of the mature mRNA to the cytoplasm. Yet the argument is often made that static models provide a quasisteadystate interpretation of the network that may provide a sufficient approximation. Rigorous comparison of the two approaches, however, is lacking.
Dynamical modeling of animal regulatory networks has a long history [24, 25, 9, 26, 10, 27, 11, 28]. It is a powerful approach in which researchers hypothesize a set of nonlinear, differential equations to describe the network, but it generally requires significant prior knowledge about the network. If there is insufficient biological knowledge about the network, then the structure of the equations can be incorrectly chosen. And if the model is not carefully chosen, it will have a large number of parameters, possibly leading to weak biological effects being erroneously identified as strong effects. Furthermore, it is sometimes shown that a wide range of different parameter values can reproduce the biological behavior of the network, which could be taken as evidence for either network robustness or overfitting [26].
The purpose of this paper is to describe a novel approach for inferring regulatory networks from expression data, and it provides a new way to trade off statistical issues and model interpretability. We generate a quasigenetic, formal model of regulatory networks using nonparametric ordinary differential equations (ODEs) which are fit using the nonparametric exterior derivative estimator (NEDE) [29, 30]. For these reasons, we call our method and the resulting model the NODE (an amalgamation of NEDE and ODE) model. Our NODE model is similar to qualitative piecewise linear network modeling and identification [13, 12, 14], and we extend these models by using identification techniques that have improved statistical properties and protect against overfitting. The NEDE estimator adds constraints to the identification problem by learning correlations between factors, and these constraints protect the model from overfitting and erroneously identifying weak biological effects as strong effects. Though we focus the discussion in this paper to temporalspatial expression patterns, our NODE method can easily be used with timeseries microarray datasets. It is also scalable to a network sized on the order of hundreds of species.
We focus our modeling effort on the formation of eve mRNA stripes during Stage 5 of Drosophila melanogaster embryogenesis. We apply our technique to this portion of the regulatory network, and compare the performance of our method to that of other more commonly used models. We show that there are significant differences in the regulatory predictions made by the NODE model and other commonly used models, including the fact that our technique predicts that factors frequently have both positive and negative effects on the same targets, depending on the concentration of the factor. We also show that the NODE model performs better than a static, spatialcorrelation model.
Results and Discussion
Our NODE model is a formalization of a quasigenetic model that seeks to capture the total net effect of direct and indirect influence of each factor on a target gene, and it is generated by looking at the correlation between factor concentrations and the change in target mRNA concentration over time. This is done in small windows of neighboring cells on the embryo and at different time intervals during development. By looking at the change in target mRNA over time, we are able to generate a dynamic equation model that describes each factors' influence on each gene in space and time; tuning parameters for our method are selected in a datadriven manner using crossvalidation (see Methods and Models for more details). In general, the model formally predicts repression in all cases where increases in the concentration of a factor leads to a decrease in the rate of change in target mRNA over time. Similarly, it formally predicts activation as all instances where increasing the concentration of a factor leads to an increase in the rate of change in target mRNA over time.
Model fit
We assess the fit of our NODE model to the experimental data both qualitatively and quantitatively. Because we have an ODE model that describes the formation of the eve mRNA stripes, we can run a simulation of the model using only the experimentally measured eve concentration at the first time point of Stage 5 as the initial condition of the ODE. Only transcription factor protein and eve mRNA data from the first two time points was used to derive the NODE model for predicted regulatory interactions. By using this model along with the transcription factor protein expression data from all time points, we can then simulate the eve mRNA pattern for all six time points and then compare this to the experimentally measured eve pattern.
Factor activity plots
Such factor activity plots show the intensity and variation of predicted effects of factors at different locations on the embryo and at different time points. Our model is a formal, quasigenetic ODE model. It is not a mechanistic model, because it cannot capture the various mechanisms involved in the regulation of eve mRNA. This, however, is a strength because of the flexibility gained by not having to make a priori assumptions on the regulatory mechanisms. This comes at the cost of not being able to identify which interactions are direct or indirect.
Comparison to spatialcorrelation model
To aid understanding of our NODE model and help establish its utility, we compared it to a spatialcorrelation model. Such models have also been used for identifying regulatory interactions from quantitative expression data [21–23, 20, 33], and are based on the descriptions of the relationship between transcription factor and target gene expression that have been most widely used by developmental biologists. These models are not dynamic and look at the correlation, at fixed time points, between factor concentrations and target mRNA concentrations. To make the result comparable to our NODE model, we consider a new variant of spatialcorrelation models which looks separately at the correlation between factor levels and target mRNA levels in different, small regions of the embryo and at different stages of development.
This result fits with the idea that the NODE model is intrinsically more biologically realistic than a spatialcorrelation model. As stated earlier, biological networks are marked by temporal effects. For instance, a protein binds to DNA which initiates transcription. This is not an instantaneous process, and there is some delay between when a factor initiates transcription and when the target mRNA is expressed. The spatialcorrelation model does not model this notion of temporal effects, whereas the NODE model does.
Concentrationdependent effects
In many cases it is known that individual gene expression stripes can be controlled via a single cisregulatory module (CRM) and current computational models generally assume that a given factor acts only as an activator or a repressor on a given CRM (e.g. [26, 27, 34–36]). However, both our NODE model and our variant of the spatial correlation model frequently predict concentration dependent effects whereby, on and around the same expression stripe, a factor has both repressing and activating effects (see the yellow and light blue bars above the plots in Figures 6 and 7 and more generally Figures 4 and 5). For example, consistent with previous molecular genetic evidence, KR is predicted as a repressor of posterior eve stripe 2, but is also implied by the model to be as an activator just anterior of this in cells where KR concentrations are lower (Figure 7). This and the many other similar cases could represent spurious correlations, perhaps due to other factors having dominant effects on targets in cells where the factor under study is expressed at lower levels. However, there are a number of cases where factors, including KR, have been shown to switch from activating to repressing the same target as their concentrations increase [37, 38]. Thus, the predictions of both our NODE model and our variant of the spatial correlation model make it more obvious that gene regulation can involve multiple mechanisms of factor action that should be considered hence forth.
In some cases, the NODE model predicts factor activities that are closer to biological expectations than the spatialcorrelation model. Figure 6 indicates that both models predict strong repression by GT in almost the same anterior portion of eve stripe 2 (regions where the blue lines are below 0). On the other hand, Figure 7 indicates that the spatialcorrelation model predicts repression by KR mostly in the interstripe region between stripes 2 and 3, whereas the NODE model predicts repression by KR in the posterior half of stripe 2. Since it has been experimentally observed that the eve stripes narrow over time [31, 20], and the NODE model more accurately indicates narrowing of the stripes, this provides further support for the idea that the NODE model performs better.
Another significant difference between the two models is that the NODE model can distinguish between multiple regions of the embryo where target mRNA either increases or decreases over time, whereas spatialcorrelation models, by definition, cannot. This allows the NODE model to provide more subtle distinctions of factor activity.
We make the following formal definitions (see Methods and Models for the mathematical definitions):

Type I Repression  At current factor concentrations, the target mRNA will decrease in concentration over time. An increase in factor concentration will lead to a faster rate of decrease in target mRNA amounts over time.

Type II Repression  At current factor concentrations, the target mRNA will increase in concentration over time. An increase in factor concentration will lead to a slower rate of increase in target mRNA amounts over time.

Type I Activation  At current factor concentrations, the target mRNA will increase in concentration over time. An increase in factor concentration will lead to a faster rate of increase in target mRNA amounts over time.

Type II Activation  At current factor concentrations, the target mRNA will decrease in concentration over time. An increase in factor concentration will lead to a slower rate of decrease in target mRNA amounts over time.
Comparison to dynamical models
It is also instructive to compare our NODE model to existing dynamical models of spatial pattern regulation in Drosophila embryogenesis. There are dynamic models, some using nonlinear ODEs, that describe the developmental change in the expression of gap genes [26, 27, 35, 36] and the eve stripes [34]. Some of these models only describe the network at the level of protein expression [26, 27] whereas others include more detailed processes such as protein binding [34–36]. Like our model, these models can replicate experimentally measured gene expression patterns.
The models in [26, 27] are similar to our work in some regards in that they concern the network at the expression level. However, they require significant biological knowledge in order to hypothesize the structural forms of their equations, which can be problematic because this limits their ability to provide new biological insights. For instance, an a priori biological assumption made by the models in [26, 27] is that factors do not have concentrationdependent effects. A factor always either represses, activates, or does not affect the target gene. Biological experiments [37, 38] and our models suggest that this is not always true.
The main disadvantage of the models in [34–36] is that they use in vitro data in fitting models for in vivo behavior. These models contain detailed predictions of the regulatory network such as levels of proteinDNA binding in vivo. This is problematic because the parameters of the models were calculated using only gene expression data and in vitro DNA binding data. No comparison was made between the models' inferences and actual measurements of in vivo DNA binding. Work by the BDTNP shows that there is no simple correlation between in vitro affinity and in vivo occupancy, even on highly bound functional targets [39]. This suggests that the models in [34–36] are unlikely to be accurate and that more quantitative data, such as ChIPchip or ChIPseq binding data, needs to be used to calculate the model parameters.
Conclusions
We have described a novel approach for inferring interactions within animal regulatory networks. Our approach uses nonparametric statistics to generate ordinary differential equation (ODE) models from expression data, and it has certain statistical and mathematical advantages over existing approaches. It is able to generate spatiotemporal maps of factor activity, highlighting the times and spatial locations at which different regulators might affect target gene expression levels.
We identified an ODE model for eve mRNA pattern formation in the Drosophila blastoderm, and our model was able to reproduce the experimental patterns well. It gives 59% better agreement to the experimentally measured pattern, as compared to a spatialcorrelation model. Our model suggests that protein factors frequently have the potential to behave as both an activator and inhibitor for the same cisregulatory module depending on the factors' concentration, and implies different modes of activation and repression. This suggests further avenues of research.
Methods
Here, we describe our NODE technique which uses timeseries data to generate a dynamical model. We assume that the ratelimiting species (i.e. transcription factor protein concentrations) which drive the behavior of the network have been measured, and we do not consider actions on faster timescales (e.g. the dynamics of factors binding to target genes). Also, we assume that concentrations are large enough for the rates of interaction to be deterministic.
where x is a vector whose elements are the concentrations of the ratelimiting species. Nonlinear regression techniques [2, 1] start with a function with unknown coefficients, and then they regress the data onto this function. This is problematic because the relationships are highly nonlinear and one risks overfitting the data by starting with a function with many unknown coefficients. In contrast, our NODE method does not make any assumptions on the functional form of f(x). We use nonparametric statistics to make local estimates of the ODE in Equation 1, and our tools can scale to networks with hundreds of species.
We focus our presentation on NODE models which describe the effect of five regulatory transcription factors on target eve mRNA, and we briefly comment on how this technique can be used with general, timeseries data. The data set we use, code for our methods, and the models generated by our methods are publically available and can be downloaded from http://bdtnp.lbl.gov/FlyNet/bioimaging.jsp?w=node.
Experimental data
We apply our technique to experimental data that has been collected and processed by the BDTNP [31, 20], where measurements of protein and mRNA concentrations are taken by analyzing images of many Drosophila embryos to create a virtual embryo. The virtual embryo consists of 6078 cells and is a computational, spatial decomposition which is determined by averaging the geometry and number of cells of different embryos [31, 20]. The virtual embryo has measurements of the concentration (averaged over the different embryos at fixed points in time) of various protein factors and target mRNAs at the cellular level for six different time points during Stage 5 of the Drosophila embryo. We denote the vector of factor concentrations as x[t, e] and the vector of target gene concentrations as y[t, e], where t = 1, ..., 6 is the time of the measurement and e = 1, ..., 6078 is an index which uniquely identifies each cell in the virtual embryo. Notation like x_{ bcd }[t, e] denotes the [bcd] concentration in cell e at time t.
Computational and statistical methods
The NODE technique is summarized in the following algorithm. Any tuning parameters are chosen in a datadriven manner using crossvalidation [40, 29, 30].
Inputs: Factor concentrations x[t, e], target gene concentrations y[t, e]
 1)Presmooth the factor concentrations x[t, e] and then compute time derivatives of the target gene concentrations y[t, e]
 a)For each e = 1, ..., 6078
 i)
Do a leastsquares fit of the polynomial $\stackrel{\wedge}{x}[t,e]={c}_{0}+{c}_{1}t+\mathrm{...}+{c}_{r}{t}^{r}$ (where c_{0},..., c_{ r }are coefficients and r is a tuning parameter) with the data points: x[t, e], for each t = 1, ..., 6
 ii)
Do a leastsquares fit of the polynomial $\stackrel{\wedge}{y}[t,e]={k}_{0}+{k}_{1}t+\mathrm{...}+{k}_{r}{t}^{r}$ (where k_{0}, ..., k_{ r }are coefficients and r is a tuning parameter) with the data points: y[t, e], for each t = 1, ..., 6
 i)
 b)
Presmoothed factor concentration data is given by $\stackrel{\wedge}{x}[t,e]$, and time derivative of target gene data is given by $d\stackrel{\wedge}{y}/dt[t,e]={k}_{1}+{k}_{2}t\mathrm{...}+{k}_{r}{t}^{r1}$
 a)
 2)
Define matrix Y with rows given by $(d\stackrel{\wedge}{y}/dt[t,e])$, for each t = 1, ..., 6 and e = 1, ..., 6078
 3)Calculate the NODE model
 a)For each t = 1, ..., 6 and e = 1, ..., 6078
 i)
Define matrix X_{[t, e]}= [1 Ξ_{[t, e]}], where first column is all one's and Ξ_{[t, e]}is matrix with rows given by $(\stackrel{\wedge}{x}[u,v]\stackrel{\wedge}{x}[t,e])$, for each u = 1, ..., 6 and v = 1, ..., 6078
 ii)Define weighting matrix W_{[t, e]}to be diagonal matrix with entries along diagonal given by$w[u,v]=\{\begin{array}{cc}3\left(1{(n[u,v]/h)}^{2}\right)/4,& \text{if}n[u,v]\le h\\ 0& \text{otherwise}\end{array}$(2)
for each u = 1, ..., 6 and v = 1, ..., 6078, where n[u, v] = x[u, v]x[t, e]_{2} is the Euclidean distance and h is a tunable parameter
 iii)
Define matrix P_{[t, e]}by making its columns be the (p  d) principal components of Ξ_{[t, e]}^{ T }W_{[t, e]}Ξ_{[t, e]}with smallest eigenvalues, where p is number of factors (p = 5 for the NODE model of target eve mRNA) and d is a tuning parameter
 iv)Coefficients of NODE model, for eth cell at tth time point, are given by NEDE estimator$\begin{array}{l}{\left[\begin{array}{cccc}{b}_{[t,e]}& {a}_{bcd,[t,e]}& \mathrm{...}& {a}_{Kr,[t,e]}\end{array}\right]}^{T}=\\ \phantom{\rule{0.5em}{0ex}}\mathrm{arg}\underset{\beta}{\mathrm{min}}{\Vert {{W}_{[t,e]}}^{1/2}\left(Y{X}_{[t,e]}\beta \right)\Vert}_{2}{}^{2}+\lambda {\Vert {P}_{[t,e]}\beta \Vert}_{2}{}^{2}\end{array}$(3)
 i)
 a)
Step 1 involves presmoothing the experimental data and computing its time derivatives. We prefer to do this with local polynomial regression (LPR) [41] because it suffers from fewer transient effects than digital filters [42, 41]. To simplify the presentation, Step 1.a describes polynomial regression (PR). LPR is a variant of PR which protects against oversmoothing the data, and it can be quickly computed by doing a weighted linear regression. More details on LPR can be found in [41].
This step is important because otherwise the NODE model will be statistically biased [43]. However, caution must be used when deciding to presmooth certain data sets in which the measurements are very noisy and taken at a sparse grid of points in time. In such cases, there is a risk of smoothing out biologicallyrelevant, temporal trends in the data because of the sparsity of the temporal grid.
Step 3 computes the NODE model, and we make use of the NEDE estimator: a new statistical tool that protects against overfitting [30]. The computation in Step 3.a.ii determines a window of cells v at time u that have concentrations similar to cell e at time t. The size of this window is selected by the parameter h, and cells with highly (weakly) similar concentrations are weighted highly (weakly) in the estimation of the coefficients of the NODE model. Equation 2 uses the Epanechnikov kernel to do this weighting. Note that weights for cells with very different concentrations can be similar, because the Euclidean distances computed in Step 3.a.ii can be similar. This does not cause problems because the NEDE estimator has been proven to be statistically wellbehaved in the presence of such weighting schemes [41, 29, 30].
Step 3.a.iv uses the NEDE estimator in Equation 3 to compute the coefficients of the NODE model. It protects against overfitting by learning constraints which the data obeys (Step 3.a.iii), and then using these constraints to reduce the degrees of freedom in the regression. In general, the data points x_{[t, e]}form a manifold, and the projection matrix P_{[t, e]}in Equation 3 enforces that the regression coefficients lie close to the manifold. This methodology is motivated by differential geometry which says that the exterior derivative of a function on an embedded submanifold lies in the cotangent space [44, 29, 30]. The NEDE estimator can be calculated quickly on a computer because it is a convex optimization problem. Theoretical properties and a more detailed description of the NEDE estimator can be found in [29, 30].
NODE model interpretation
Instead of using a single ODE model to describe the regulatory network, the NODE model uses a group of ODE models consisting of the first order Taylor expansion (i.e., linearization) of the ODE given in Equation 1. Each equation of the NODE model describes how the behaviour of the regulatory network changes if concentrations of the factors in cell e at time t are changed. It requires fewer assumptions or prior knowledge about the system, because it does not require knowing the mathematical structure of the single ODE model in Equation 1. The disadvantage of this approach is that it is more difficult to interpret a series of models. The full NODE model for formation of target eve mRNA is given by Equation 4, and there is a different equation for each cell e at time t. Though the NEDE estimator protects against overfitting, some might feel that the NODE model overfits. The predictive ability of the NODE model, as discussed in Results and Discussion, gives evidence that it does not overfit. In that test, we used our algorithm on the first two time points of data, and we assumed that the model for cell e at times t = 3, 4, 5 was the same as the model for cell e at time t = 2.
Equation 4 is difficult to interpret because the coefficients vary depending on cell e at time t, due to the fact that each equation is a linearization that is valid for when factor concentrations are close to x[t, e]. The model describes how a change in factor concentrations in the presence of all factors (the right hand side of Equation 4), affects the change in time of eve (the left hand side of Equation 4). If d[eve]/dt is positive (negative), then eve concentration will increase (decrease) by the next instant of time. For example, suppose the concentrations of all species are kept fixed at $\stackrel{\wedge}{x}[t,e]$ except for the concentration of GT which is slightly increased from ${\stackrel{\wedge}{x}}_{gt}[t,e]$ to $[gt]={\stackrel{\wedge}{x}}_{gt}[t,e]+\Delta gt$. In this situation, the change in time of eve concentration will be given by d[eve]/dt = a_{gt,[t, e]}Δgt + b[t, e]. The increase of GT concentration by Δgt leads to a change in the change in time of eve concentration by a_{gt,[t, e]}Δgt amount, and the sign of a_{gt,[t, e]}describes whether this change is positive or negative.
Because this equation describes relationships in the presence of all factors, this can lead to seemingly contradictory results, such as when one species is a putative activator (e.g., BCD protein upregulates eve mRNA), but increasing the concentration of the activator in the presence of the other species can have a slight repressive effect because of interactions between the activator and the other factor species (i.e., the described concentrationdependent effects). Such a situation leads to an odd result: The coefficient of the "activator" will be negative.
and this model looks for the correlation of eve mRNA with protein factor concentrations. Whereas Equation 4 is a dynamical model, the model in Equation 5 is a static model, because it does not describe the temporal changes in eve concentration. A comparison of the fits between these models can be seen in Results and Discussion. Note that the coefficients in Equation 5 are computed with the algorithm for our NODE technique, with the change that Y is a vector of eve concentrations.
Factor activity
The first term is the coefficient from Equation 4, and the second term in parenthesis is a measure of average GT concentration within cells whose factor concentrations are similar to cell e at time t. The second term in parenthesis in Equation 6 is a measure of average concentrations, because it is a measure of the mean difference from the baseline concentration of x[t, e]. To clarify the notation, suppose the ith value of x[t, e] denotes: x_{ gt }[t, e], which is GT concentration. Then the term [Ξ_{[t, e]}^{ T }W_{[t, e]}Ξ_{[t, e]}]_{ gt }denotes the ith value along the diagonal of the matrix Ξ_{[t, e]}^{ T }W_{[t, e]}Ξ_{[t, e]}.
Mathematical definition of factor activity classification in the NODE model
sign(a_{gt,[t, e]})  sign(d[eve]/dt)  

Type I Repression     
Type II Repression    + 
Type I Activation  +  + 
Type II Activation  +   
Window sizes
An example of a window is shown in Figure 11. The NODE method uses Equation 2 to take the similarity of the cells into account when doing the regression procedure. The size of the window is determined by the parameter h which is chosen using crossvalidation, and it changes for each cell e at time t. As explained earlier, the statistical tools are wellbehaved when weighting of cells within this window is computed with Euclidean distance.
To check that window sizes selected in a datadriven manner were not too small and missing important features, we did a check in which we fixed the windows to surround cell e at time t with a circle of radius of three cells at time t. This size was chosen, because the eve stripes are about six cells wide at Stage 5. A circular window with this size would not miss important regulatory features of the network. The [Additional file 1] shows plots of factor activity as generated by our NODE method for both dataselected and fixed windows. A visual comparison of the factor activity plots generated by these two windows shows that the dataselected windows were able to identify the same features as the fixed, circular window.
General timeseries data
where (a) ξ[n] for n = 1, ..., N is a userselected set of linearization points of Equation 1, (b) A_{ξ[n]}= Df(ξ[n]) and b_{ξ[n]}= f(ξ[n]) are the coefficients of the model, and (c) Df is the gradient of f(x). The NODE technique is unchanged except e refers to different experiments (instead of different cells), and Equation 3 is applied columnwise to Y to give columns of the matrix of coefficients: [b_{ξ[n]}^{ T }A_{ξ[n]}^{ T }]^{ T }.
Declarations
Acknowledgements
This work is part of a broader collaboration by the BDTNP. We are grateful for the frequent advice, support, criticisms, and enthusiasm of its members. The gene expression data were funded by the U.S. National Institutes of Health (NIH) under grants GM704403 (to MDB). Computational analyses were funded by National Science Foundation (NSF) award #CCR0225610 (ITR), which supports the CHESS at UC Berkeley (to CT) and NSF awards #DMS0605236 and #DMS0906808 (to PB). Work at Lawrence Berkeley National Laboratory was conducted under Department of Energy contract DEAC0205CH11231.
Authors’ Affiliations
References
 Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles. Molecular Systems Biology 2007., 3(78):Google Scholar
 Markowetz F, Spang R: Inferring cellular networks  a review. BMC Bioinformatics 2007, 8(Suppl 6):S5. [http://www.biomedcentral.com/1471–2105/8/S6/S5] 10.1186/147121058S6S5View ArticlePubMedPubMed CentralGoogle Scholar
 Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genomewide expression patterns. PNAS 1998, 98: 14863–14868. 10.1073/pnas.95.25.14863View ArticleGoogle Scholar
 Stuart J, Segal E, Koller D, Kim S: A genecoexpression network for global discovery of conserved genetic modules. Science 2003, 302(5643):249–255. 10.1126/science.1087447View ArticlePubMedGoogle Scholar
 Bickel D: Probabilities of spurious connections in gene networks: application to expression time series. Bioinformatics 2005, 21(7):1121–1128. 10.1093/bioinformatics/bti140View ArticlePubMedGoogle Scholar
 Butte A, Kohane I: Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pacific Symposium on Biocomputing 2000, 418–429.Google Scholar
 Steuer R, Kurths J, Daub C, Weise J, Selbig J: The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics 2002, 18(Suppl 2):S231S240.View ArticlePubMedGoogle Scholar
 Rao A, Hero A, States D, james Engel: Using Directed Information to Build Biologically Relevant Influence Networks. Journal of Bioinformatics and Computational Biology 2008, 6(3):493–519. 10.1142/S0219720008003515View ArticlePubMedGoogle Scholar
 D'haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing 1999, 41–52.Google Scholar
 Stephen Yeung MK, Tegnér J, Collins J: Reverse engineering gene networks using singular value decomposition and robust regression. PNAS 2002, 99(9):6163–6168. 10.1073/pnas.092576199View ArticleGoogle Scholar
 Bonneau R, Reiss D, Shannon P, Facciotti M, Hood L, Baliga N, Thorsson V: The Inferelator: an algorithm for learning parsimonious regulatory networks from systemsbiology data sets de novo. Genome Biology 2006, 7(5):R36. [http://genomebiology.com/2006/7/5/R36] 10.1186/gb200675r36View ArticlePubMedPubMed CentralGoogle Scholar
 de Jong H, Ropers D: Qualitative approaches towards the analysis of genetic regulatory networks. In System Modeling in Cellular Biology: From Concepts to Nuts and Bolts. Edited by: Szallasi Z, Periwal V, Stelling J. MIT Press; 2006:125–148.View ArticleGoogle Scholar
 Porreca R, Drulhe S, de Jong H, FerrariTrecate G: Structural identification of piecewiselinear models of genetic regulatory networks. Journal of Computational Biology 2008, 15(10):1365–1380. 10.1089/cmb.2008.0109View ArticlePubMedGoogle Scholar
 Cinquemani E, MiliasArgeitis A, Summers S, Lygeros J: Local Identification of Piecewise Deterministic Models of Genetic Networks. Hybrid Systems: Computation and Control 2009, 105–119. full_textView ArticleGoogle Scholar
 Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology 2000, 7(3):601–620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
 Yu J, Smith V, Wang P, Hartemink A, Jarvis E: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004, 20(18):3594–3603. 10.1093/bioinformatics/bth448View ArticlePubMedGoogle Scholar
 Xiang Z, Minter R, Bi X, Woolf P, Hechmer Y: miniTUBA: medical inference by network integration of temporal data using Bayesian analysis. Bioinformatics 2007, 23(18):2423–2432. 10.1093/bioinformatics/btm372View ArticlePubMedGoogle Scholar
 Friedman N: Inferring cellular networks using probabilistic graphical models. Science 2004, 303(5659):799–805. 10.1126/science.1094068View ArticlePubMedGoogle Scholar
 Werhli A, Grzegorczyk M, Husmeier D: Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics 2006, 22(20):2523–2531. 10.1093/bioinformatics/btl391View ArticlePubMedGoogle Scholar
 Fowlkes C, Hendriks CL, Keränen S, Weber G, Rübel O, Huang MY, Chatoor S, Simirenko L, DePace A, Henriquez C, Beaton A, Weiszmann R, Celniker S, Hamann B, Knowles D, Biggin M, Eisen M, Malik J: Constructing a quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Cell 2008, 133(2):364–374. 10.1016/j.cell.2008.01.053View ArticlePubMedGoogle Scholar
 Small S, Blair A, Levine M: Regulation of Two PairRule Stripes by a Single Enhancer in the Drosophila Embryo. Developmental Biology 1996, 175(2):314–324. 10.1006/dbio.1996.0117View ArticlePubMedGoogle Scholar
 Fujioka M, EmiSarker Y, Yusibova G, Goto T, Jaynes J: Analysis of an evenskipped rescue transgene reveals both composite and discrete neuronal and early blastoderm enhancers, and multistripe positioning by gap gene repressor gradients. Development 1999, 126: 2527–2538.PubMedPubMed CentralGoogle Scholar
 Arnosti D, Barolo S, Levine M, Small S: The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 1996, 122: 205–214.PubMedGoogle Scholar
 Turing A: The Chemical Basis of Morphogenesis. Philosophical Transactions of the Royal Society B 1952, 237: 37–72. 10.1098/rstb.1952.0012View ArticleGoogle Scholar
 Baianu I: Computer Models and Automata Theory in Biology and Medicine. Mathematical Modelling 1987, 7: 1513–1577. 10.1016/02700255(86)900898View ArticleGoogle Scholar
 von Dassow G, Meir E, Munro E, Odell G: The segment polarity network is a robust developmental module. Nature 2000, 406: 188–192. 10.1038/35018085View ArticlePubMedGoogle Scholar
 Jaeger J, Blagov M, Kosman D, Kozlov K, Myasnikova E, Surkova S, VanarioAlonso C, Samsonova M, Sharp D, Reinitz J: Dynamical Analysis of Regulatory Interactions in the Gap Gene System of Drosophila melanogaster . Genetics 2004, 167: 1721–1737. 10.1534/genetics.104.027334View ArticlePubMedPubMed CentralGoogle Scholar
 Murray J: Mathematical Biology. Springer. 2007.Google Scholar
 Aswani A, Bickel P, Tomlin C: Statistics for sparse, highdimensional, and nonparametric system identification. Proceedings of the IEEE International Conference on Robotics and Automation 2009, 2133–2138.Google Scholar
 Aswani A, Bickel P, Tomlin C: Regression on Manifolds: Estimation of the Exterior Derivative. Annals of Statistics 2010. [To appear].Google Scholar
 Luengo Hendriks C, Keränen S, Fowlkes C, Simirenko L, Weber G, DePace A, Henriquez C, Kaszuba D, Hamann B, Eisen M, Malik J, Sudar D, Biggin M, Knowles D: Threedimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution I: Data acquisition pipeline. Genome Biology 2006, 7(12):R123. [http://genomebiology.com/2006/7/12/R123] 10.1186/gb2006712r123View ArticlePubMedPubMed CentralGoogle Scholar
 Harding K, Hoey T, Warrior R, Levine M: Autoregulatory and gap gene response elements of the evenskipped promoter of Drosophila . EMBO J 1989, 8(4):1205–1212.PubMedPubMed CentralGoogle Scholar
 Marco A, Konikoff C, Karr T, Kumar S: Relationship between gene coexpression and sharing of transcription factor binding sites in Drosophila melanogaster . Bioinformatics 2009, 25(19):2473–2477. 10.1093/bioinformatics/btp462View ArticlePubMedPubMed CentralGoogle Scholar
 Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J: Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nature Genetics 2006, 38: 1159–1165. 10.1038/ng1886View ArticlePubMedGoogle Scholar
 Segal E, RavehSadka T, Schroeder M, Unnerstall U, Gaul U: Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 2008, 451: 535–540. 10.1038/nature06496View ArticlePubMedGoogle Scholar
 Fakhouri W, Ay A, Sayal R, Dresch J, Dayringer E, Arnosti D: Deciphering a transcriptional regulatory code: modeling shortrange repression in the Drosophila embryo. Molecular Systems Biology 2010., 6(341):Google Scholar
 Ptashne M, Jeffrey A, Johnson A, Maurer R, Meyer B, Pabo C, Roberts T, Sauer T: How the lambda repressor and cro work. Cell 1980, 19: 1–11. 10.1016/00928674(80)903839View ArticlePubMedGoogle Scholar
 Sauer F, Jäckle H: Concentrationdependent transcriptional activation or repression by Krüppel from a single binding site. Nature 1991, 353: 563–566. 10.1038/353563a0View ArticlePubMedGoogle Scholar
 Li Xy, MacArthur S, Bourgon R, Nix D, Pollard DA, Iyer VN, Hechmer A, Simirenko L, Stapleton M, Hendriks CLL, Chu HC, Ogawa N, Inwood W, Sementchenko V, Beaton A, Weiszmann R, Celniker SE, Knowles DW, Gingeras T, Speed TP, Eisen MB, Biggin MD: Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS biology 2008, 6(2):e27. 10.1371/journal.pbio.0060027View ArticlePubMedPubMed CentralGoogle Scholar
 Bickel P, Li B: Local Polynomial Regression on Unknown Manifolds. Complex Datasets And Inverse Problems: Tomography, Networks And Beyond, Volume 54 of IMS Lecture NotesMonograph Series, Institute of Mathematical Studies 2007.Google Scholar
 Ruppert D, Wand M: Multivariate Locally Weighted Least Squares Regression. Annals of Statistics 1994, 22(3):1346–1370. 10.1214/aos/1176325632View ArticleGoogle Scholar
 Savitzky A, Golay MJ: Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry 1964, 36: 1627–1639. 10.1021/ac60214a047View ArticleGoogle Scholar
 Schneeweiß H: Consistent estimation of a regression with errors in variables. Metrika 1976, 23: 101–115. 10.1007/BF01902854View ArticleGoogle Scholar
 Lee J: Introduction to Smooth Manifolds. Springer. 2003.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.