- Research article
- Open Access
Nonparametric identification of regulatory interactions from spatial and temporal gene expression data
© Aswani et al; licensee BioMed Central Ltd. 2010
- Received: 7 April 2010
- Accepted: 4 August 2010
- Published: 4 August 2010
The correlation between the expression levels of transcription factors and their target genes can be used to infer interactions within animal regulatory networks, but current methods are limited in their ability to make correct predictions.
Here we describe a novel approach which uses nonparametric statistics to generate ordinary differential equation (ODE) models from expression data. Compared to other dynamical methods, our approach requires minimal information about the mathematical structure of the ODE; it does not use qualitative descriptions of interactions within the network; and it employs new statistics to protect against over-fitting. It generates spatio-temporal maps of factor activity, highlighting the times and spatial locations at which different regulators might affect target gene expression levels. We identify an ODE model for eve mRNA pattern formation in the Drosophila melanogaster blastoderm and show that this reproduces the experimental patterns well. Compared to a non-dynamic, spatial-correlation model, our ODE gives 59% better agreement to the experimentally measured pattern. Our model suggests that protein factors frequently have the potential to behave as both an activator and inhibitor for the same cis-regulatory module depending on the factors' concentration, and implies different modes of activation and repression.
Our method provides an objective quantification of the regulatory potential of transcription factors in a network, is suitable for both low- and moderate-dimensional gene expression datasets, and includes improvements over existing dynamic and static models.
- Factor Concentration
- Node Model
- Ordinary Differential Equation Model
- Transcription Factor Protein
- Node Method
Inferring transcriptional regulatory networks in animals is challenging. For example, the large number of genes, the spatial and temporal complexity of expression patterns, and the presence of many redundant and indirect interactions all make it difficult to learn the network. In the long term, it will be necessary to use multiple data sets--including gene expression, genome wide in vivo DNA binding, and network perturbation data--to accurately represent all interactions. Combining multiple data classes in this way, however, is an open and challenging problem.
An alternative, intermediate approach is to use only gene expression data to infer regulatory networks. Here the relationships between the expression levels of one or more transcription factors and those of many putative target genes are used to predict which genes are the most likely targets of each factor. While much work has been done in this area, it is critical to understand the maximum amount of information that can be obtained about the network using this strategy.
Typical approaches for inferring regulatory networks have been to assume a model formulation and then fit the data to this formulation [1, 2]. Many models have been proposed, including coexpression networks [3–5], information-theoretic representations [6–8], regression onto dynamical systems [9–14], Bayesian networks [15–17], and other graphical models [18, 19], each of which has advantages and disadvantages. The primary differences between these models lie in the trade-off between statistical and interpretational issues. Techniques like Bayesian networks, graphical models, and information-theoretic models have protections against over-fitting (i.e., fitting models with many parameters to a small amount of experimental data); however, these techniques do not provide dynamical models which can generate new biological insights. On the other hand, techniques such as nonlinear regression and regression onto dynamical systems provide more biologically interpretable models, but sometimes suffer from inaccurate assumptions or over-fitting of the model to the data.
There is disagreement on the necessity of dynamical [9–13, 8, 15–19, 14] as opposed to static [3, 6, 7, 4, 5, 20–23] models. We feel that dynamical models are more philosophically pleasing because regulatory networks contain temporal characteristics: For example, a protein binds to DNA and initiates transcription, which eventually leads to transport of the mature mRNA to the cytoplasm. Yet the argument is often made that static models provide a quasi-steady-state interpretation of the network that may provide a sufficient approximation. Rigorous comparison of the two approaches, however, is lacking.
Dynamical modeling of animal regulatory networks has a long history [24, 25, 9, 26, 10, 27, 11, 28]. It is a powerful approach in which researchers hypothesize a set of nonlinear, differential equations to describe the network, but it generally requires significant prior knowledge about the network. If there is insufficient biological knowledge about the network, then the structure of the equations can be incorrectly chosen. And if the model is not carefully chosen, it will have a large number of parameters, possibly leading to weak biological effects being erroneously identified as strong effects. Furthermore, it is sometimes shown that a wide range of different parameter values can reproduce the biological behavior of the network, which could be taken as evidence for either network robustness or over-fitting .
The purpose of this paper is to describe a novel approach for inferring regulatory networks from expression data, and it provides a new way to trade off statistical issues and model interpretability. We generate a quasi-genetic, formal model of regulatory networks using nonparametric ordinary differential equations (ODEs) which are fit using the nonparametric exterior derivative estimator (NEDE) [29, 30]. For these reasons, we call our method and the resulting model the NODE (an amalgamation of NEDE and ODE) model. Our NODE model is similar to qualitative piece-wise linear network modeling and identification [13, 12, 14], and we extend these models by using identification techniques that have improved statistical properties and protect against over-fitting. The NEDE estimator adds constraints to the identification problem by learning correlations between factors, and these constraints protect the model from over-fitting and erroneously identifying weak biological effects as strong effects. Though we focus the discussion in this paper to temporal-spatial expression patterns, our NODE method can easily be used with time-series micro-array datasets. It is also scalable to a network sized on the order of hundreds of species.
We focus our modeling effort on the formation of eve mRNA stripes during Stage 5 of Drosophila melanogaster embryogenesis. We apply our technique to this portion of the regulatory network, and compare the performance of our method to that of other more commonly used models. We show that there are significant differences in the regulatory predictions made by the NODE model and other commonly used models, including the fact that our technique predicts that factors frequently have both positive and negative effects on the same targets, depending on the concentration of the factor. We also show that the NODE model performs better than a static, spatial-correlation model.
Our NODE model is a formalization of a quasi-genetic model that seeks to capture the total net effect of direct and indirect influence of each factor on a target gene, and it is generated by looking at the correlation between factor concentrations and the change in target mRNA concentration over time. This is done in small windows of neighboring cells on the embryo and at different time intervals during development. By looking at the change in target mRNA over time, we are able to generate a dynamic equation model that describes each factors' influence on each gene in space and time; tuning parameters for our method are selected in a data-driven manner using cross-validation (see Methods and Models for more details). In general, the model formally predicts repression in all cases where increases in the concentration of a factor leads to a decrease in the rate of change in target mRNA over time. Similarly, it formally predicts activation as all instances where increasing the concentration of a factor leads to an increase in the rate of change in target mRNA over time.
We assess the fit of our NODE model to the experimental data both qualitatively and quantitatively. Because we have an ODE model that describes the formation of the eve mRNA stripes, we can run a simulation of the model using only the experimentally measured eve concentration at the first time point of Stage 5 as the initial condition of the ODE. Only transcription factor protein and eve mRNA data from the first two time points was used to derive the NODE model for predicted regulatory interactions. By using this model along with the transcription factor protein expression data from all time points, we can then simulate the eve mRNA pattern for all six time points and then compare this to the experimentally measured eve pattern.
Factor activity plots
Such factor activity plots show the intensity and variation of predicted effects of factors at different locations on the embryo and at different time points. Our model is a formal, quasi-genetic ODE model. It is not a mechanistic model, because it cannot capture the various mechanisms involved in the regulation of eve mRNA. This, however, is a strength because of the flexibility gained by not having to make a priori assumptions on the regulatory mechanisms. This comes at the cost of not being able to identify which interactions are direct or indirect.
Comparison to spatial-correlation model
To aid understanding of our NODE model and help establish its utility, we compared it to a spatial-correlation model. Such models have also been used for identifying regulatory interactions from quantitative expression data [21–23, 20, 33], and are based on the descriptions of the relationship between transcription factor and target gene expression that have been most widely used by developmental biologists. These models are not dynamic and look at the correlation, at fixed time points, between factor concentrations and target mRNA concentrations. To make the result comparable to our NODE model, we consider a new variant of spatial-correlation models which looks separately at the correlation between factor levels and target mRNA levels in different, small regions of the embryo and at different stages of development.
This result fits with the idea that the NODE model is intrinsically more biologically realistic than a spatial-correlation model. As stated earlier, biological networks are marked by temporal effects. For instance, a protein binds to DNA which initiates transcription. This is not an instantaneous process, and there is some delay between when a factor initiates transcription and when the target mRNA is expressed. The spatial-correlation model does not model this notion of temporal effects, whereas the NODE model does.
In many cases it is known that individual gene expression stripes can be controlled via a single cis-regulatory module (CRM) and current computational models generally assume that a given factor acts only as an activator or a repressor on a given CRM (e.g. [26, 27, 34–36]). However, both our NODE model and our variant of the spatial correlation model frequently predict concentration dependent effects whereby, on and around the same expression stripe, a factor has both repressing and activating effects (see the yellow and light blue bars above the plots in Figures 6 and 7 and more generally Figures 4 and 5). For example, consistent with previous molecular genetic evidence, KR is predicted as a repressor of posterior eve stripe 2, but is also implied by the model to be as an activator just anterior of this in cells where KR concentrations are lower (Figure 7). This and the many other similar cases could represent spurious correlations, perhaps due to other factors having dominant effects on targets in cells where the factor under study is expressed at lower levels. However, there are a number of cases where factors, including KR, have been shown to switch from activating to repressing the same target as their concentrations increase [37, 38]. Thus, the predictions of both our NODE model and our variant of the spatial correlation model make it more obvious that gene regulation can involve multiple mechanisms of factor action that should be considered hence forth.
In some cases, the NODE model predicts factor activities that are closer to biological expectations than the spatial-correlation model. Figure 6 indicates that both models predict strong repression by GT in almost the same anterior portion of eve stripe 2 (regions where the blue lines are below 0). On the other hand, Figure 7 indicates that the spatial-correlation model predicts repression by KR mostly in the inter-stripe region between stripes 2 and 3, whereas the NODE model predicts repression by KR in the posterior half of stripe 2. Since it has been experimentally observed that the eve stripes narrow over time [31, 20], and the NODE model more accurately indicates narrowing of the stripes, this provides further support for the idea that the NODE model performs better.
Another significant difference between the two models is that the NODE model can distinguish between multiple regions of the embryo where target mRNA either increases or decreases over time, whereas spatial-correlation models, by definition, cannot. This allows the NODE model to provide more subtle distinctions of factor activity.
We make the following formal definitions (see Methods and Models for the mathematical definitions):
Type I Repression - At current factor concentrations, the target mRNA will decrease in concentration over time. An increase in factor concentration will lead to a faster rate of decrease in target mRNA amounts over time.
Type II Repression - At current factor concentrations, the target mRNA will increase in concentration over time. An increase in factor concentration will lead to a slower rate of increase in target mRNA amounts over time.
Type I Activation - At current factor concentrations, the target mRNA will increase in concentration over time. An increase in factor concentration will lead to a faster rate of increase in target mRNA amounts over time.
Type II Activation - At current factor concentrations, the target mRNA will decrease in concentration over time. An increase in factor concentration will lead to a slower rate of decrease in target mRNA amounts over time.
Comparison to dynamical models
It is also instructive to compare our NODE model to existing dynamical models of spatial pattern regulation in Drosophila embryogenesis. There are dynamic models, some using nonlinear ODEs, that describe the developmental change in the expression of gap genes [26, 27, 35, 36] and the eve stripes . Some of these models only describe the network at the level of protein expression [26, 27] whereas others include more detailed processes such as protein binding [34–36]. Like our model, these models can replicate experimentally measured gene expression patterns.
The models in [26, 27] are similar to our work in some regards in that they concern the network at the expression level. However, they require significant biological knowledge in order to hypothesize the structural forms of their equations, which can be problematic because this limits their ability to provide new biological insights. For instance, an a priori biological assumption made by the models in [26, 27] is that factors do not have concentration-dependent effects. A factor always either represses, activates, or does not affect the target gene. Biological experiments [37, 38] and our models suggest that this is not always true.
The main disadvantage of the models in [34–36] is that they use in vitro data in fitting models for in vivo behavior. These models contain detailed predictions of the regulatory network such as levels of protein-DNA binding in vivo. This is problematic because the parameters of the models were calculated using only gene expression data and in vitro DNA binding data. No comparison was made between the models' inferences and actual measurements of in vivo DNA binding. Work by the BDTNP shows that there is no simple correlation between in vitro affinity and in vivo occupancy, even on highly bound functional targets . This suggests that the models in [34–36] are unlikely to be accurate and that more quantitative data, such as ChIP-chip or ChIP-seq binding data, needs to be used to calculate the model parameters.
We have described a novel approach for inferring interactions within animal regulatory networks. Our approach uses nonparametric statistics to generate ordinary differential equation (ODE) models from expression data, and it has certain statistical and mathematical advantages over existing approaches. It is able to generate spatio-temporal maps of factor activity, highlighting the times and spatial locations at which different regulators might affect target gene expression levels.
We identified an ODE model for eve mRNA pattern formation in the Drosophila blastoderm, and our model was able to reproduce the experimental patterns well. It gives 59% better agreement to the experimentally measured pattern, as compared to a spatial-correlation model. Our model suggests that protein factors frequently have the potential to behave as both an activator and inhibitor for the same cis-regulatory module depending on the factors' concentration, and implies different modes of activation and repression. This suggests further avenues of research.
Here, we describe our NODE technique which uses time-series data to generate a dynamical model. We assume that the rate-limiting species (i.e. transcription factor protein concentrations) which drive the behavior of the network have been measured, and we do not consider actions on faster time-scales (e.g. the dynamics of factors binding to target genes). Also, we assume that concentrations are large enough for the rates of interaction to be deterministic.
where x is a vector whose elements are the concentrations of the rate-limiting species. Nonlinear regression techniques [2, 1] start with a function with unknown coefficients, and then they regress the data onto this function. This is problematic because the relationships are highly nonlinear and one risks over-fitting the data by starting with a function with many unknown coefficients. In contrast, our NODE method does not make any assumptions on the functional form of f(x). We use nonparametric statistics to make local estimates of the ODE in Equation 1, and our tools can scale to networks with hundreds of species.
We focus our presentation on NODE models which describe the effect of five regulatory transcription factors on target eve mRNA, and we briefly comment on how this technique can be used with general, time-series data. The data set we use, code for our methods, and the models generated by our methods are publically available and can be downloaded from http://bdtnp.lbl.gov/Fly-Net/bioimaging.jsp?w=node.
We apply our technique to experimental data that has been collected and processed by the BDTNP [31, 20], where measurements of protein and mRNA concentrations are taken by analyzing images of many Drosophila embryos to create a virtual embryo. The virtual embryo consists of 6078 cells and is a computational, spatial decomposition which is determined by averaging the geometry and number of cells of different embryos [31, 20]. The virtual embryo has measurements of the concentration (averaged over the different embryos at fixed points in time) of various protein factors and target mRNAs at the cellular level for six different time points during Stage 5 of the Drosophila embryo. We denote the vector of factor concentrations as x[t, e] and the vector of target gene concentrations as y[t, e], where t = 1, ..., 6 is the time of the measurement and e = 1, ..., 6078 is an index which uniquely identifies each cell in the virtual embryo. Notation like x bcd [t, e] denotes the [bcd] concentration in cell e at time t.
Computational and statistical methods
Inputs: Factor concentrations x[t, e], target gene concentrations y[t, e]
- 1)Presmooth the factor concentrations x[t, e] and then compute time derivatives of the target gene concentrations y[t, e]
- a)For each e = 1, ..., 6078
Do a least-squares fit of the polynomial (where c0,..., c r are coefficients and r is a tuning parameter) with the data points: x[t, e], for each t = 1, ..., 6
Do a least-squares fit of the polynomial (where k0, ..., k r are coefficients and r is a tuning parameter) with the data points: y[t, e], for each t = 1, ..., 6
Presmoothed factor concentration data is given by , and time derivative of target gene data is given by
Define matrix Y with rows given by , for each t = 1, ..., 6 and e = 1, ..., 6078
- 3)Calculate the NODE model
- a)For each t = 1, ..., 6 and e = 1, ..., 6078
Define matrix X[t, e]= [1 Ξ[t, e]], where first column is all one's and Ξ[t, e]is matrix with rows given by , for each u = 1, ..., 6 and v = 1, ..., 6078
- ii)Define weighting matrix W[t, e]to be diagonal matrix with entries along diagonal given by(2)
for each u = 1, ..., 6 and v = 1, ..., 6078, where n[u, v] = ||x[u, v]-x[t, e]||2 is the Euclidean distance and h is a tunable parameter
Define matrix P[t, e]by making its columns be the (p - d) principal components of Ξ[t, e] T W[t, e]Ξ[t, e]with smallest eigenvalues, where p is number of factors (p = 5 for the NODE model of target eve mRNA) and d is a tuning parameter
- iv)Coefficients of NODE model, for e-th cell at t-th time point, are given by NEDE estimator(3)
Step 1 involves presmoothing the experimental data and computing its time derivatives. We prefer to do this with local polynomial regression (LPR)  because it suffers from fewer transient effects than digital filters [42, 41]. To simplify the presentation, Step 1.a describes polynomial regression (PR). LPR is a variant of PR which protects against over-smoothing the data, and it can be quickly computed by doing a weighted linear regression. More details on LPR can be found in .
This step is important because otherwise the NODE model will be statistically biased . However, caution must be used when deciding to presmooth certain data sets in which the measurements are very noisy and taken at a sparse grid of points in time. In such cases, there is a risk of smoothing out biologically-relevant, temporal trends in the data because of the sparsity of the temporal grid.
Step 3 computes the NODE model, and we make use of the NEDE estimator: a new statistical tool that protects against over-fitting . The computation in Step 3.a.ii determines a window of cells v at time u that have concentrations similar to cell e at time t. The size of this window is selected by the parameter h, and cells with highly (weakly) similar concentrations are weighted highly (weakly) in the estimation of the coefficients of the NODE model. Equation 2 uses the Epanechnikov kernel to do this weighting. Note that weights for cells with very different concentrations can be similar, because the Euclidean distances computed in Step 3.a.ii can be similar. This does not cause problems because the NEDE estimator has been proven to be statistically well-behaved in the presence of such weighting schemes [41, 29, 30].
Step 3.a.iv uses the NEDE estimator in Equation 3 to compute the coefficients of the NODE model. It protects against over-fitting by learning constraints which the data obeys (Step 3.a.iii), and then using these constraints to reduce the degrees of freedom in the regression. In general, the data points x[t, e]form a manifold, and the projection matrix P[t, e]in Equation 3 enforces that the regression coefficients lie close to the manifold. This methodology is motivated by differential geometry which says that the exterior derivative of a function on an embedded submanifold lies in the cotangent space [44, 29, 30]. The NEDE estimator can be calculated quickly on a computer because it is a convex optimization problem. Theoretical properties and a more detailed description of the NEDE estimator can be found in [29, 30].
NODE model interpretation
Instead of using a single ODE model to describe the regulatory network, the NODE model uses a group of ODE models consisting of the first order Taylor expansion (i.e., linearization) of the ODE given in Equation 1. Each equation of the NODE model describes how the behaviour of the regulatory network changes if concentrations of the factors in cell e at time t are changed. It requires fewer assumptions or prior knowledge about the system, because it does not require knowing the mathematical structure of the single ODE model in Equation 1. The disadvantage of this approach is that it is more difficult to interpret a series of models. The full NODE model for formation of target eve mRNA is given by Equation 4, and there is a different equation for each cell e at time t. Though the NEDE estimator protects against over-fitting, some might feel that the NODE model over-fits. The predictive ability of the NODE model, as discussed in Results and Discussion, gives evidence that it does not over-fit. In that test, we used our algorithm on the first two time points of data, and we assumed that the model for cell e at times t = 3, 4, 5 was the same as the model for cell e at time t = 2.
Equation 4 is difficult to interpret because the coefficients vary depending on cell e at time t, due to the fact that each equation is a linearization that is valid for when factor concentrations are close to x[t, e]. The model describes how a change in factor concentrations in the presence of all factors (the right hand side of Equation 4), affects the change in time of eve (the left hand side of Equation 4). If d[eve]/dt is positive (negative), then eve concentration will increase (decrease) by the next instant of time. For example, suppose the concentrations of all species are kept fixed at except for the concentration of GT which is slightly increased from to . In this situation, the change in time of eve concentration will be given by d[eve]/dt = agt,[t, e]Δgt + b[t, e]. The increase of GT concentration by Δgt leads to a change in the change in time of eve concentration by agt,[t, e]Δgt amount, and the sign of agt,[t, e]describes whether this change is positive or negative.
Because this equation describes relationships in the presence of all factors, this can lead to seemingly contradictory results, such as when one species is a putative activator (e.g., BCD protein upregulates eve mRNA), but increasing the concentration of the activator in the presence of the other species can have a slight repressive effect because of interactions between the activator and the other factor species (i.e., the described concentration-dependent effects). Such a situation leads to an odd result: The coefficient of the "activator" will be negative.
and this model looks for the correlation of eve mRNA with protein factor concentrations. Whereas Equation 4 is a dynamical model, the model in Equation 5 is a static model, because it does not describe the temporal changes in eve concentration. A comparison of the fits between these models can be seen in Results and Discussion. Note that the coefficients in Equation 5 are computed with the algorithm for our NODE technique, with the change that Y is a vector of eve concentrations.
The first term is the coefficient from Equation 4, and the second term in parenthesis is a measure of average GT concentration within cells whose factor concentrations are similar to cell e at time t. The second term in parenthesis in Equation 6 is a measure of average concentrations, because it is a measure of the mean difference from the baseline concentration of x[t, e]. To clarify the notation, suppose the i-th value of x[t, e] denotes: x gt [t, e], which is GT concentration. Then the term [Ξ[t, e] T W[t, e]Ξ[t, e]] gt denotes the i-th value along the diagonal of the matrix Ξ[t, e] T W[t, e]Ξ[t, e].
Mathematical definition of factor activity classification in the NODE model
Type I Repression
Type II Repression
Type I Activation
Type II Activation
An example of a window is shown in Figure 11. The NODE method uses Equation 2 to take the similarity of the cells into account when doing the regression procedure. The size of the window is determined by the parameter h which is chosen using cross-validation, and it changes for each cell e at time t. As explained earlier, the statistical tools are well-behaved when weighting of cells within this window is computed with Euclidean distance.
To check that window sizes selected in a data-driven manner were not too small and missing important features, we did a check in which we fixed the windows to surround cell e at time t with a circle of radius of three cells at time t. This size was chosen, because the eve stripes are about six cells wide at Stage 5. A circular window with this size would not miss important regulatory features of the network. The [Additional file 1] shows plots of factor activity as generated by our NODE method for both data-selected and fixed windows. A visual comparison of the factor activity plots generated by these two windows shows that the data-selected windows were able to identify the same features as the fixed, circular window.
General time-series data
where (a) ξ[n] for n = 1, ..., N is a user-selected set of linearization points of Equation 1, (b) Aξ[n]= Df(ξ[n]) and bξ[n]= f(ξ[n]) are the coefficients of the model, and (c) Df is the gradient of f(x). The NODE technique is unchanged except e refers to different experiments (instead of different cells), and Equation 3 is applied column-wise to Y to give columns of the matrix of coefficients: [bξ[n] T Aξ[n] T ] T .
This work is part of a broader collaboration by the BDTNP. We are grateful for the frequent advice, support, criticisms, and enthusiasm of its members. The gene expression data were funded by the U.S. National Institutes of Health (NIH) under grants GM704403 (to MDB). Computational analyses were funded by National Science Foundation (NSF) award #CCR-0225610 (ITR), which supports the CHESS at UC Berkeley (to CT) and NSF awards #DMS-0605236 and #DMS-0906808 (to PB). Work at Lawrence Berkeley National Laboratory was conducted under Department of Energy contract DE-AC02-05CH11231.
- Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D: How to infer gene networks from expression profiles. Molecular Systems Biology 2007., 3(78):Google Scholar
- Markowetz F, Spang R: Inferring cellular networks - a review. BMC Bioinformatics 2007, 8(Suppl 6):S5. [http://www.biomedcentral.com/1471–2105/8/S6/S5] 10.1186/1471-2105-8-S6-S5View ArticlePubMedPubMed CentralGoogle Scholar
- Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genome-wide expression patterns. PNAS 1998, 98: 14863–14868. 10.1073/pnas.95.25.14863View ArticleGoogle Scholar
- Stuart J, Segal E, Koller D, Kim S: A gene-coexpression network for global discovery of conserved genetic modules. Science 2003, 302(5643):249–255. 10.1126/science.1087447View ArticlePubMedGoogle Scholar
- Bickel D: Probabilities of spurious connections in gene networks: application to expression time series. Bioinformatics 2005, 21(7):1121–1128. 10.1093/bioinformatics/bti140View ArticlePubMedGoogle Scholar
- Butte A, Kohane I: Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pacific Symposium on Biocomputing 2000, 418–429.Google Scholar
- Steuer R, Kurths J, Daub C, Weise J, Selbig J: The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics 2002, 18(Suppl 2):S231-S240.View ArticlePubMedGoogle Scholar
- Rao A, Hero A, States D, james Engel: Using Directed Information to Build Biologically Relevant Influence Networks. Journal of Bioinformatics and Computational Biology 2008, 6(3):493–519. 10.1142/S0219720008003515View ArticlePubMedGoogle Scholar
- D'haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing 1999, 41–52.Google Scholar
- Stephen Yeung MK, Tegnér J, Collins J: Reverse engineering gene networks using singular value decomposition and robust regression. PNAS 2002, 99(9):6163–6168. 10.1073/pnas.092576199View ArticleGoogle Scholar
- Bonneau R, Reiss D, Shannon P, Facciotti M, Hood L, Baliga N, Thorsson V: The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology 2006, 7(5):R36. [http://genomebiology.com/2006/7/5/R36] 10.1186/gb-2006-7-5-r36View ArticlePubMedPubMed CentralGoogle Scholar
- de Jong H, Ropers D: Qualitative approaches towards the analysis of genetic regulatory networks. In System Modeling in Cellular Biology: From Concepts to Nuts and Bolts. Edited by: Szallasi Z, Periwal V, Stelling J. MIT Press; 2006:125–148.View ArticleGoogle Scholar
- Porreca R, Drulhe S, de Jong H, Ferrari-Trecate G: Structural identification of piecewise-linear models of genetic regulatory networks. Journal of Computational Biology 2008, 15(10):1365–1380. 10.1089/cmb.2008.0109View ArticlePubMedGoogle Scholar
- Cinquemani E, Milias-Argeitis A, Summers S, Lygeros J: Local Identification of Piecewise Deterministic Models of Genetic Networks. Hybrid Systems: Computation and Control 2009, 105–119. full_textView ArticleGoogle Scholar
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology 2000, 7(3):601–620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
- Yu J, Smith V, Wang P, Hartemink A, Jarvis E: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004, 20(18):3594–3603. 10.1093/bioinformatics/bth448View ArticlePubMedGoogle Scholar
- Xiang Z, Minter R, Bi X, Woolf P, Hechmer Y: miniTUBA: medical inference by network integration of temporal data using Bayesian analysis. Bioinformatics 2007, 23(18):2423–2432. 10.1093/bioinformatics/btm372View ArticlePubMedGoogle Scholar
- Friedman N: Inferring cellular networks using probabilistic graphical models. Science 2004, 303(5659):799–805. 10.1126/science.1094068View ArticlePubMedGoogle Scholar
- Werhli A, Grzegorczyk M, Husmeier D: Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics 2006, 22(20):2523–2531. 10.1093/bioinformatics/btl391View ArticlePubMedGoogle Scholar
- Fowlkes C, Hendriks CL, Keränen S, Weber G, Rübel O, Huang MY, Chatoor S, Simirenko L, DePace A, Henriquez C, Beaton A, Weiszmann R, Celniker S, Hamann B, Knowles D, Biggin M, Eisen M, Malik J: Constructing a quantitative spatio-temporal atlas of gene expression in the Drosophila blastoderm. Cell 2008, 133(2):364–374. 10.1016/j.cell.2008.01.053View ArticlePubMedGoogle Scholar
- Small S, Blair A, Levine M: Regulation of Two Pair-Rule Stripes by a Single Enhancer in the Drosophila Embryo. Developmental Biology 1996, 175(2):314–324. 10.1006/dbio.1996.0117View ArticlePubMedGoogle Scholar
- Fujioka M, Emi-Sarker Y, Yusibova G, Goto T, Jaynes J: Analysis of an even-skipped rescue transgene reveals both composite and discrete neuronal and early blastoderm enhancers, and multi-stripe positioning by gap gene repressor gradients. Development 1999, 126: 2527–2538.PubMedPubMed CentralGoogle Scholar
- Arnosti D, Barolo S, Levine M, Small S: The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 1996, 122: 205–214.PubMedGoogle Scholar
- Turing A: The Chemical Basis of Morphogenesis. Philosophical Transactions of the Royal Society B 1952, 237: 37–72. 10.1098/rstb.1952.0012View ArticleGoogle Scholar
- Baianu I: Computer Models and Automata Theory in Biology and Medicine. Mathematical Modelling 1987, 7: 1513–1577. 10.1016/0270-0255(86)90089-8View ArticleGoogle Scholar
- von Dassow G, Meir E, Munro E, Odell G: The segment polarity network is a robust developmental module. Nature 2000, 406: 188–192. 10.1038/35018085View ArticlePubMedGoogle Scholar
- Jaeger J, Blagov M, Kosman D, Kozlov K, Myasnikova E, Surkova S, Vanario-Alonso C, Samsonova M, Sharp D, Reinitz J: Dynamical Analysis of Regulatory Interactions in the Gap Gene System of Drosophila melanogaster . Genetics 2004, 167: 1721–1737. 10.1534/genetics.104.027334View ArticlePubMedPubMed CentralGoogle Scholar
- Murray J: Mathematical Biology. Springer. 2007.Google Scholar
- Aswani A, Bickel P, Tomlin C: Statistics for sparse, high-dimensional, and nonparametric system identification. Proceedings of the IEEE International Conference on Robotics and Automation 2009, 2133–2138.Google Scholar
- Aswani A, Bickel P, Tomlin C: Regression on Manifolds: Estimation of the Exterior Derivative. Annals of Statistics 2010. [To appear].Google Scholar
- Luengo Hendriks C, Keränen S, Fowlkes C, Simirenko L, Weber G, DePace A, Henriquez C, Kaszuba D, Hamann B, Eisen M, Malik J, Sudar D, Biggin M, Knowles D: Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution I: Data acquisition pipeline. Genome Biology 2006, 7(12):R123. [http://genomebiology.com/2006/7/12/R123] 10.1186/gb-2006-7-12-r123View ArticlePubMedPubMed CentralGoogle Scholar
- Harding K, Hoey T, Warrior R, Levine M: Autoregulatory and gap gene response elements of the even-skipped promoter of Drosophila . EMBO J 1989, 8(4):1205–1212.PubMedPubMed CentralGoogle Scholar
- Marco A, Konikoff C, Karr T, Kumar S: Relationship between gene co-expression and sharing of transcription factor binding sites in Drosophila melanogaster . Bioinformatics 2009, 25(19):2473–2477. 10.1093/bioinformatics/btp462View ArticlePubMedPubMed CentralGoogle Scholar
- Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J: Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nature Genetics 2006, 38: 1159–1165. 10.1038/ng1886View ArticlePubMedGoogle Scholar
- Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U: Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 2008, 451: 535–540. 10.1038/nature06496View ArticlePubMedGoogle Scholar
- Fakhouri W, Ay A, Sayal R, Dresch J, Dayringer E, Arnosti D: Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo. Molecular Systems Biology 2010., 6(341):Google Scholar
- Ptashne M, Jeffrey A, Johnson A, Maurer R, Meyer B, Pabo C, Roberts T, Sauer T: How the lambda repressor and cro work. Cell 1980, 19: 1–11. 10.1016/0092-8674(80)90383-9View ArticlePubMedGoogle Scholar
- Sauer F, Jäckle H: Concentration-dependent transcriptional activation or repression by Krüppel from a single binding site. Nature 1991, 353: 563–566. 10.1038/353563a0View ArticlePubMedGoogle Scholar
- Li Xy, MacArthur S, Bourgon R, Nix D, Pollard DA, Iyer VN, Hechmer A, Simirenko L, Stapleton M, Hendriks CLL, Chu HC, Ogawa N, Inwood W, Sementchenko V, Beaton A, Weiszmann R, Celniker SE, Knowles DW, Gingeras T, Speed TP, Eisen MB, Biggin MD: Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS biology 2008, 6(2):e27. 10.1371/journal.pbio.0060027View ArticlePubMedPubMed CentralGoogle Scholar
- Bickel P, Li B: Local Polynomial Regression on Unknown Manifolds. Complex Datasets And Inverse Problems: Tomography, Networks And Beyond, Volume 54 of IMS Lecture Notes-Monograph Series, Institute of Mathematical Studies 2007.Google Scholar
- Ruppert D, Wand M: Multivariate Locally Weighted Least Squares Regression. Annals of Statistics 1994, 22(3):1346–1370. 10.1214/aos/1176325632View ArticleGoogle Scholar
- Savitzky A, Golay MJ: Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry 1964, 36: 1627–1639. 10.1021/ac60214a047View ArticleGoogle Scholar
- Schneeweiß H: Consistent estimation of a regression with errors in variables. Metrika 1976, 23: 101–115. 10.1007/BF01902854View ArticleGoogle Scholar
- Lee J: Introduction to Smooth Manifolds. Springer. 2003.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.