 Methodology article
 Open Access
A modelbased optimization framework for the inference of regulatory interactions using timecourse DNA microarray expression data
 Reuben Thomas^{1},
 Carlos J Paredes^{4, 5},
 Sanjay Mehrotra^{2},
 Vassily Hatzimanikatis^{3}Email author and
 Eleftherios T Papoutsakis^{4, 6}Email author
https://doi.org/10.1186/147121058228
© Thomas et al; licensee BioMed Central Ltd. 2007
Received: 20 October 2006
Accepted: 29 June 2007
Published: 29 June 2007
Abstract
Background
Proteins are the primary regulatory agents of transcription even though mRNA expression data alone, from systems like DNA microarrays, are widely used. In addition, the regulation process in genetic systems is inherently nonlinear in nature, and most studies employ a timecourse analysis of mRNA expression. These considerations should be taken into account in the development of methods for the inference of regulatory interactions in genetic networks.
Results
We use an Ssystem based model for the transcription and translation process. We propose an optimizationbased regulatory network inference approach that uses timevarying data from DNA microarray analysis. Currently, this seems to be the only modelbased method that can be used for the analysis of timecourse "relative" expressions (expression ratios). We perform an analysis of the dynamic behavior of the system when the number of experimental samples available is varied, when there are different levels of noise in the data and when there are genes that are not considered by the experimenter. Our studies show that the principal factor affecting the ability of a method to infer interactions correctly is the similarity in the time profiles of some or all the genes. The less similar the profiles are to each other the easier it is to infer the interactions. We propose a heuristic method for resolving networks and show that it displays reasonable performance on a synthetic network. Finally, we validate our approach using real experimental data for a chosen subset of genes involved in the sporulation cascade of Bacillus anthracis. We show that the method captures most of the important known interactions between the chosen genes.
Conclusion
The performance of any inference method for regulatory interactions between genes depends on the noise in the data, the existence of unknown genes affecting the network genes, and the similarity in the time profiles of some or all genes. Though subject to these issues, the inference method proposed in this paper would be useful because of its ability to infer important interactions, the fact that it can be used with timecourse DNA microarray data and because it is based on a nonlinear model of the process that explicitly accounts for the regulatory role of proteins.
Keywords
 Expression Ratio
 Mass Balance Equation
 Inference Method
 Bacillus Anthracis
 Synthetic Network
1. Background
Inference of regulatory interactions in a genetic system provides fundamental biological knowledge and significant efforts have been invested for the solution of this problem, [1–23]. The method we propose in this paper improves upon the previous contributions to the solution of this problem: it employs a more realistic model, it reduces the effect of noise on the solution obtained, it avoids the costly step involving numerical integration and, significantly, it explicitly utilizes gene expression ratios, which are typically the primary data of microarraybased gene expression analysis. Here, we use an Ssystem based [24–29] model that explicitly accounts for proteins serving as regulatory agents. It also accounts for the nonlinear dependency of transcription rates in the protein concentrations. We are solely dealing with gene expression data in view of the fact that reasonablycomplete proteomic data are not readily available. We used the same model as in our previous work, [12] for the development of a method for gene regulatory network inference based on steady state gene expression ratio data. In this paper, a heuristic solution for the problem is given, as dictated by the Ssystem based model and timevarying gene expression ratio data. The computational complexity of the method is exponential in the number of genes in the system. However if a subset of the interactions were already known to exist, then the method could be used on networks with a larger number of genes. The impact of noise in the data is reduced by using smoothing splines as approximations to the time profiles of gene expression.
The model used in this paper shares similarity with inference methods based on Ssystem models [11–15]. However, these earlier methods do not consider the effect of proteins (whose concentrations are not measured) in regulating gene expression. Also, every evaluation of the objective function set up in [11] and [13] for optimization required the integration of a set of differential equations. This integration can be costly in terms of computational resources, as was pointed out in [28] and [29].
Related to the methods based on the Ssystem models are methods based on linear differential equations [16–19]. The methods of Refs. [17] and [19] involve a least square fitting approach, but their models do not involve protein concentrations. Dasika et al. [2] used a linear regulatory model but allowed the current gene expressions to depend on the levels of gene expression of the previous time points. This time delay of the action of an mRNA on the transcription rates may capture the delay due to the proteintranslation process and possible protein modification events like glycosylation, phosphorylation, methylation etc. However, the value of the timedelay parameter cannot be mapped easily to the biophysical and biochemical process it represents. The model presented here directly accounts for the protein translation process and thus there is an implicit timedelay in the regulation of gene expression. The model used in Ref. [18] involves both mRNA and protein concentrations. However, the authors assume that all protein concentrations can be measured. The work in Refs. [20–23] are representative of methods which analyze the time course gene expression data using a Bayesian network framework. This framework assumes a linear model between geneexpression levels at multiple time points and hence is similar, conceptually, to the one used in Ref. [16].
Most of the previous modelbased methods (Eg. [11, 13, 16, 19]) assume that the geneexpression data are available as absolute concentrations and they also assume linear, additive action of the regulatory mRNAs on the transcription rates. The method presented here is tailored for the analysis of relative gene expression data, and it can be regarded as a nonlineargeneralization of the previous models. Apart from these models, there are modelbased identification methods that include even broader description of cellular processes by including models for metabolic processes [14]. However, the applicability of such models is restricted to smaller systems because of the complexity involved due to experimental measurements and computational requirements.
Here we describe a modelbased inference approach of the regulatory network of a genetic system using timevarying mRNAexpression ratios obtained from experiments involving DNA microarrays. We employ an Ssystem approach to model the transcription and translation processes and, propose an optimizationbased regulatory network inference method. The method is tested using synthetic data from a model genetic network of genes, and is applied on expression data of a core subset of genes involved in the sporulation cascade of the prokaryote Bacillus anthracis.
2. Results
2.1 Dynamic regulation model and its characteristics
where V_{sm,i}and V_{dm,i}denote the rates of synthesis and degradation rates of the i^{ th }mRNA, V_{sp,i}and V_{dp,i}denote the rates of synthesis and degradation rates of the i^{ th }protein, α_{ i }and γ_{ i }denote the transcription and translation rate constants, and β_{ i }and δ_{ i }are the firstorder decay constants of the mRNA and protein, respectively. The real parameters ε_{ig} quantify the strength of regulatory control exerted by the activity of protein g on the synthesis rate of mRNA i. If ε_{ig} is equal to zero, protein g does not affect the expression of gene i, and if ε_{ig} is positive (negative), then protein g induces (represses) the expression of gene i. A discussion on the ranges of these parameters can be found Appendix 1 of [12] and [29].
2.2 Derivation of the optimization method
The basic goal is to quantify the strengths of regulatory interactions and rate constants that best fit the dynamic model described by Equation (1) to a given set of timecourse, geneexpression data. We consider a network of n genes, which are perturbed at some time before t = 0, from t = 0 onwards there are no external perturbations, and the mRNA and protein concentrations change continuously over time.
where is a reference state for gene i.
Protein concentrations are not directly observable, unless an accurate proteomics technology is used [31, 32], and therefore we employ the following novel methodology that utilizes smoothing cubic splines, [33]. We fit smoothing splines through the gene expression ratios at different time points and use them to predict the protein concentrations. It is analytically possible to do this because of the polynomial forms of the splines. As a result, we can avoid the expensive steps of numerical integration during the parameter estimation stage. The concentration of protein i at time t can be written in the following form:
p_{ i }(t) = p_{ i }(0)f_{ i }(δ_{ i }, t) + γ_{ i }h_{ i }(δ_{ i }, t) (2)
where p_{ i }(0) is the initial concentration of protein i, f_{ i }, and h_{ i }are nonlinear functions of δ_{ i }and time, t, derived using the splines fitted to the geneexpression data in the mass balance equations for the proteins. The initial protein concentrations p_{ i }(0) and the reference states are also unknown parameters. Estimates of the decay constants β_{ i }and δ_{ i }can be obtained from the available halflife of mRNAs and proteins [34, 35] and also see Section A6.2, Additional file. In the following analysis, we will assume that these decay constants are known. The derivation of Equation (2) is given in the Section 4.1.
subject to
DY_{ ij }≤ ε_{ ij }≤ DY_{ ij }, i, j = 1,2,...,n (5)
Y_{ ij }∈ {0,1}, i, j = 1,2,...,n (7)
α_{ i }, γ_{ i }, , P_{ i }(0) ≥ 0, i = 1,2,...,n (8)
where, is the error term which is an approximate restatement of the mass balance equations of all the n genes at N_{ t }discrete points in time (see Section 4.2 for the derivation of this term). represents the vector of regulatory interactions affecting gene i and   represents its Euclidean norm. τ_{ i }is a regularization parameter for each gene i. Regularization [36] of the formulation can be used when the quality of the timeseries data leads to illconditioned systems. Regularization has been also used in the network inference method proposed by Kikuchi et al. [13], and by Gardner et al. [3] in the form of ridge regression. However, if the data do not lead to an ill conditioned system, such regularization is not necessary and the regularization parameters τ_{ i }are set equal to zero. Therefore, the objective in Equation (3) minimizes the sum of the error in fitting the model to the experimental data, and the weighted norm of the strength of the interactions.
Y_{ ij }is a binary variable which is equal to 1 when gene j interacts with gene i or zero otherwise. D in Equation (5) is some positive number that limits the strength of an interaction. This constant can either be assigned a number based on prior biological knowledge (for typical kinetics, it can be set equal to one for MichaelisMenten kinetics, or up to 4 (for the usual tetramerdependent cooperative kinetics) or set equal to an arbitrarily large number. Constraints (5), (6) and (7) enforce the assumption that each gene is regulated by not more than k other genes as has been explained in [16] and [12]. Constraint (8) guarantees the nonnegativity requirements of the other unknowns.
2.3 Coordinate descent based heuristic method
subject to
DY_{ ij }≤ ε_{ ij }≤ DY_{ ij }, j = 1,2,...,n (10)
log(α_{ i }) ≥ A (13)
where A is some large positive number.
2. Fix the values of and log(α_{ i }) determined from Step 1 and solve the following optimization problem in the three sets of parameters, γ_{ i }, , and p_{ i }(0).
subject to,
γ_{ i }, , p_{ i }(0) ≥ 0, i = 1,2,...,n (15)
Our numerical studies suggest that the improvements attained by increasing the number of repetitions of the above two steps are marginal (Figure A.1, Additional file), i.e., a relatively small value for N_{ l }may be good enough.
Different initial guesses would potentially lead to different solutions, and the proposed method does not guarantee finding the globally optimal solutions. A procedure of reporting the best solution considers the network of interactions derived from all the collected solutions, i.e., of similar optimum objective function values, by accepting an interaction to be present if it is inferred to be present in the majority of the solutions. Since the set of optimal solutions can be considered as alternative networks that are consistent with the experimental data, an interaction can be considered physiologically significant if it occurred in the majority of the solutions.
2.4 Parameters and issues affecting the performance of the algorithm
The applicability of any genetic network inference method is affected by a number of factors (see Ref. [39] for a mathematical description of such factors). We used 10gene synthetic networks (see Section A3, Additional file) to generate data that are used to study the performance of the algorithm. The following six factors which are known to have an effect on the performance of inference algorithms were studied: (i) the degree of similarity between the time profiles of the expression of different genes; (ii) the number of experimental samples available; (iii) noise in the data; (iv) interactions involving genes that do not show significant variation; (v) the parameters N_{ l }(the number of iterations), and N_{ t }(the number of discretization time points in the objective function) of the heuristic method; and (vi) missing genes from the analysis.
The timeseries data were obtained by the integration of the Ssystem of differential equations (1) in MATLAB [40], for different values of the parameters. The mixedinteger nonlinear solver of LINDO [41] was used in all the optimization problems. In all the numerical studies in this section only, we assume that the parameters γ_{ i }, , p_{ i }(0) are known. This will allow us to correctly base our conclusions using globally optimal solutions.
The degree of similarity between the different time profiles of mRNA expression is an important determinant of the amount of information present in the data. We studied three different types of networks that we labeled "Low", "Medium" and "High" according to the degree of similarity between the different profiles which we quantified using the condition number of the matrix Φ formed by the logarithm of protein concentrations at each time point:
Φ = [log(p_{ gj })], g = 1,...,n, j = 1,2,...,N_{ t }
Variation of correct identifications with increasing levels of noise in data
% Error  % Correct 

0.01  70.8 ± 0.2 
s1  44.1 ± 0.2 
5  41.4 ± 0.2 
10  38.2 ± 0.2 
20  34.8 ± 0.2 
50  31.9 ± 0.2 
The DNA microarray technology tends to suppress the measured expression ratios [44], and some of the gene expression profiles do not show much variation, i.e., they are more or less constant over the period of observation. Therefore, it is unlikely that the algorithm, due to numerical constraints, will infer interactions involving these genes, since these interactions will be absorbed in the parameter α that quantifies the transcription rate constant. Therefore, in order to make the experimental geneexpression profiles more suitable for the geneticnetwork inference method, we examined the possibility of rescaling all logarithmic expression ratios by a constant factor or raise all expression ratios to a certain power. This way the larger expression ratios (>1) become larger while the smaller expression ratios (<1) become smaller.
If all the interacting genes in a network are not considered for analysis by an inference method, then incorrect interactions are likely to be identified. If we remove genes that contribute to significant regulatory interactions, the number of incorrect identifications would increase (see Table A.7, Additional file). Finally, relatively small errors in the estimates of the halflives of the mRNAs and proteins cause only a modest deterioration in the performance of the method (see Table A.8, Additional file).
2.5 Algorithm testing
2.5.1 Synthetic networks
The algorithm was tested using data from the "Low" network with 1000 "experimental" samples. There was no noise in the data but the parameters γ_{ i }, , p_{ i }(0) were now assumed to be unknown, i.e., the problem was nonconvex. We found 7 solutions using the Coordinate descent heuristic method starting from 7 random initial guesses and all solutions converged to similar (O(1)) objective function values. The best solution was identified as the one whose interactions occurred in the majority of the solutions. The '4 out of 7' solution identified 11/30 interactions correctly while the '5 out of 7' solution identified 10/30 interactions correctly. If the method identifying the interactions were random, and since we have assumed a 10gene network with 3 regulatory inputs for each gene, an interaction will be identified as inducing with probability 15%, repressing with probability 15% and absent with probability 70%. Therefore the average number of correct interactions identified will be 15% (4.5/30), suggesting that the heuristic method we are using is doing better than a random method.
Note that we found that about 30% of the interactions could be identified with a large amount of noise, or when the parameters γ_{ i }, , p_{ i }(0) are unknown. The results here give a similar coverage indicating that these 10/30 interactions are not just robust to noise but are also important in the sense that they are captured in all the solutions found.
To give an estimate of the time required to obtain a solution, it took about 15 hours to obtain 5 solutions, each running in parallel on a P4, 2 GHz, and 1 Gig RAM PC. Since the main emphasis of this study was not to obtain a computationally efficient method, only this estimate of the time taken for a solution is provided. Note that the computational complexity of the heuristic method is of O(n e^{ n }).
2.5.2 A network from the sporulation cascade of Bacillus anthracis
2.5.2.1 Background
Bacillus anthracis is an endosporeforming bacterium (a prokaryote) that is responsible for the anthrax disease. Under environmentalstress conditions, like most bacilli, it commits to sporulation via the bacillus endospore program. Mature spores can survive many extreme conditions, thus assuring species survival. When conditions are suitable, the endospore germinates and the organism then can begin to grow again.
2.5.2.2 Data and choice of genes

kinA had an insufficient number of usable data points.

Genes like kinD, abrB, codY exhibited insufficient variation.

The transcriptional and total protein levels of Spo0A are not relevant; rather, it is its activation (phosphorylated Spo0A, Spo0A~P) that matters, and, in the absence of reliable kinA data, this is better captured by the expression of spo0F. Thus, we will use spo0F expression to represent Spo0A~P and is shown below as spo0F/Spo0A~P.

Genes with similar profiles were not considered. E.g. spoIIAA and spoIIAB had similar profiles to sigF and sigE. spoIVB had a similar profile to sigK.

The variation of spoIIGA did not correspond to what is known about its role in the cascade. From prior biological knowledge [46], one would have expected spoIIGA to have a profile similar to those of sigF and sigE.
The chosen subset of 9 genes consists of spo0F/ Spo0A~P, sigF, sigE, spoIIIJ, sigG, spoIVFB, spoIIID, sigK and gerE. Refer to Section A5.2, Additional file for a discussion of the biological basis of this choice.
2.5.2.3 Inferred interactions
In the experiments [45] to generate the B. anthracis microarray data, the reference state (parameter , in terms of copies of mRNA per cell) for the expression of gene i was taken to be the average of equal amounts of samples drawn at each of the time points over the course of the experiment. If a gene was expressed only for a short period of time, then the expression ratios during this period would be relatively high. But if the expression of a gene changed slightly over the entire course of the experiment, then the expression ratios would show only very small variations around the value of 1. Because of this and because, as stated, DNAmicroarray analysis underestimates the true expression ratios, the logexpression ratio data were scaled by a factor of 2 (see Section 2.4) in order to accentuate the variations within each profile. Smoothing splines were fit to the expression ratio data derived from these scaled logexpression ratio ones. The graphs of these smoothing splines along with the units and the bounds on the different unknown parameters involved in the optimization problem are given in Section A6, Additional file.
Identified interactions among subset of genes in B. anthracis
spo0F/ Spo0A~P  sigF  sigE  spoIIIJ  sigG  spoIVFB  spoIIID  sigK  gerE  

spo0F/Spo0A~P  0  0  0  *  0  0  1  0  1 
SigF  1  0  0  0  0  *  1  0  0 
SigE  0  1  0  *  *  0  *  0  0 
SpoIIIJ  1  0  0  0  *  0  *  *  0 
SigG  1  0  1  0  0  0  0  0  * 
spoIVFB  *  0  1  0  0  0  *  1  0 
SpoIIID  1  1  1  0  0  0  0  0  0 
SigK  *  0  0  *  0  1  0  1  0 
GerE  0  0  0  0  1  1  0  1  0 
Overall, the algorithm was able to identify many important interactions based on this set of experimental data. While we can assess the effect of all the factors discussed in Section 2.4 on the specific set of experimental data, we propose that the missing genes/signals are probably mainly responsible for the incorrect interaction identifications for the 'start' gene (Spo0A) and for those responsible for shutting down the expression of various genes.
3. Conclusion
We have developed a regulatory inference method that can be used on dynamic, timecourse expression data such as those obtained from DNA microarray analysis. The method takes into account the nonlinear regulatory roles of the corresponding proteins in the system. We validated our approach on a synthetic network and on a set of genes that are involved in the sporulation cascade of B. anthracis. We did not consider the impact of external perturbations during the course of the experiment. However, the extension of our approach to include this case would be straightforward if we assume that the external perturbations can be modeled as artificial genes that are not influenced by any of the genes involved in the study.
The ability of the method to generate a set of alternative regulatory networks that are consistent with the experimental data allows a broader analysis of a system when the number of experimental samples is low and the degree of similarity between the time profiles of different genes is high.
4. Methods
4.1 Prediction of protein concentrations
So now we have approximations to M_{ i }(t) and p_{i}(t) for any time t in the interval [0, T]. The error in the approximation of M_{ i }(t) is O(T × N_{s}^{4}) [33] while the error in approximation of p_{i}(t) is the sum O(γ × T × N_{s}^{4}) and the error in the estimation of the initial protein concentration, see Section A2, Additional file.
4.2 Derivation of objective function for optimization problem
Declarations
Acknowledgements
Supported by a grant from the National Institutes of Health (NIH R01GM065476). RT acknowledges communications with Ranjith Nair on the mathematical analysis in the Additional file.
Authors’ Affiliations
References
 Akutsu T, Miyano S, Kuhara S: Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics. 2000, 16: 727734. 10.1093/bioinformatics/16.8.727.View ArticlePubMedGoogle Scholar
 Di Bernardo D, Gardner TS, Collins JJ: Robust identification of large genetic networks. Pac Symp Biocomput. 2004, 9: 486497.Google Scholar
 Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003, 301: 102105. 10.1126/science.1081900.View ArticlePubMedGoogle Scholar
 Ideker TE, Thorsson V, Karp RM: Discovery of regulatory interactions through perturbation: Inference and experimental design. Pac Symp Biocomput. 2000, 5: 302313.Google Scholar
 Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001, 292: 929934. 10.1126/science.292.5518.929.View ArticlePubMedGoogle Scholar
 Noda K, Shinohara A, Takeda M, Matsumoto S, Miyano S, Kuhara S: Finding genetic network from experiments by weighted network model. Genome Inform Ser Workshop Genome Inform. 1998, 9: 141150.PubMedGoogle Scholar
 Moriyama T, Shinohara A, Takeda M, Maruyama O, Goto T, Miyano S, Kuhara S: A system to find genetic networks using weighted network model. Genome Inform Ser Workshop Genome Inform. 1999, 10: 186195.PubMedGoogle Scholar
 Wu FX, Zhang FX, Kusalik AJ: Modeling gene expression from microarray expression data with statespace equations. Pac Symp Biocomput. 2004, 581592.Google Scholar
 Liang S, Fuhrman S, Somogyi R: REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput. 1998, 3: 1829.Google Scholar
 Lin X, Floudas CA, Wang Y, Broach JR: Theoretical and computational studies of the glucose signaling pathways in yeast using global gene expression data. Biotechnol Bioeng. 2003, 84: 864886. 10.1002/bit.10844.View ArticlePubMedGoogle Scholar
 Maki Y, Tominaga D, Okamoto M, Watanabe S, Eguchi Y: Development of a system for the inference of large scale genetic networks. Pac Symp Biocomput. 2001, 446458.Google Scholar
 Thomas R, Mehrotra S, Papoutsakis ET, Hatzimanikatis V: A modelbased optimization framework for the inference of gene regulatory networks from DNA microarray data. Bioinformatics. 2004, 20 (17): 32213235. 10.1093/bioinformatics/bth389.View ArticlePubMedGoogle Scholar
 Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M: Dynamic modeling of genetic networks using genetic algorithm and Ssystem. Bioinformatics. 2003, 19 (5): 643650. 10.1093/bioinformatics/btg027.View ArticlePubMedGoogle Scholar
 Almeida JS, Voit EO: NeuralNetworkBased Parameter Estimation in SSystem Models of Biological Networks. Genome Informatics. 2003, 14: 114123.PubMedGoogle Scholar
 Tsai KY, Wang FS: Evolutionary optimization with data collocation for reverse engineering of biological networks. Bioinformatics. 2005, 21 (7): 11801188. 10.1093/bioinformatics/bti099.View ArticlePubMedGoogle Scholar
 Dasika M, Gupta A, Maranas CD, Varner JD: A mixed integer linear programming (MILP) framework for inferring time delay in gene regulatory networks. Pac Symp Biocomput. 2004, 9: 474485.Google Scholar
 D'Haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mrna expression levels during cns development and injury. Pac SympBiocomput. 1999, 4: 4152.Google Scholar
 Chen T, He HL, Church GM: Modeling gene expression with differential equations. Pac Symp Biocomput. 1999, 4: 2940.Google Scholar
 Bansal M, Gatta GD, Di Bernardo D: Inference of gene regulatory networks and compound modes of action from time course gene expression profiles. Bioinformatics. 2006, 22: 815822. 10.1093/bioinformatics/btl003.View ArticlePubMedGoogle Scholar
 Imoto S, Goto T, Miyano S: Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac Symp Biocomput. 2002, 7: 175186.Google Scholar
 Husmeier D: Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003, 19: 22712282. 10.1093/bioinformatics/btg313.View ArticlePubMedGoogle Scholar
 Yu J, Smith V, Wang P, Hartemink A, Jarvis E: Advances to Bayesian Network Inference for Generating Causal Networks from Observational Biological Data. Bioinformatics. 2004, 20: 35943603. 10.1093/bioinformatics/bth448.View ArticlePubMedGoogle Scholar
 Yamanaka T, Toyoshiba H, Sone H, Parham FM, Portier CJ: The TAOGen Algorithm for Identifying Gene Interaction Networks with Application to SOS repair in E. coli . Toxicogenomics. 2004, 112 (16): 16141621.View ArticleGoogle Scholar
 Savageau MA: Biochemical systems analysis, I. Some mathematicalproperties of the rate law for the component enzymatic reactions. J Theor Biol. 1969, 25: 370379. 10.1016/S00225193(69)800263.View ArticlePubMedGoogle Scholar
 Savageau MA: Biochemical systems analysis, II. The steadystate solutions for an npool system using a powerlaw approximation. J Theor Biol. 1969, 25: 365369. 10.1016/S00225193(69)800263.View ArticlePubMedGoogle Scholar
 Savageau MA: Biochemical Systems Analysis. 1976, Addison Wesley Longman Publishing CoGoogle Scholar
 Savageau MA: Rules for the evolution of gene circuitry. Pac Symp Biocomput. 1998, 3: 5465.Google Scholar
 Voit EO: Canonical Nonlinear Modeling – SSystem Approach to Understanding Complexity. 1991, New York: Van Nostrand ReinholdGoogle Scholar
 Voit EO: Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists. 2000, Cambridge University Press, CambridgeGoogle Scholar
 Voit EO, Almeida JS: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics. 2004, 20 (11): 16701681. 10.1093/bioinformatics/bth140.View ArticlePubMedGoogle Scholar
 Hatzimanikatis V, Choe LH, Lee KH: Proteomics: Theoretical and Experimental Considerations. Biotechnology Progress. 1999, 15 (3): 312318. 10.1021/bp990004b.View ArticlePubMedGoogle Scholar
 Pandey A, Mann M: Proteomics to study genes and genomes. Nature. 2000, 405 (6788): 83746. 10.1038/35015709.View ArticlePubMedGoogle Scholar
 de Boor C: A Practical Guide to Splines. 1978, New York: SpringerVerlagView ArticleGoogle Scholar
 Hambraeus G, von Wachenfeldt C, Hederstedt L: Genomewide survey of mRNA halflives in Bacillus subtilis identifies extremely stable mRNAs. Mol Genet Genomics. 2003, 269 (5): 70614. 10.1007/s0043800308836.View ArticlePubMedGoogle Scholar
 Varshavsky A: The Nend rule: functions, mysteries, uses. Proc Natl Acad Sci USA. 1996, 93 (22): 121429. 10.1073/pnas.93.22.12142.PubMed CentralView ArticlePubMedGoogle Scholar
 Björck A: Numerical Methods for Least Squares Problems. 1996, Philadelphia: SIAMView ArticleGoogle Scholar
 Floudas CF: Deterministic Global Optimization: Theory, Methods and Applications. 2005, MA: Kluwer Academic PublishersGoogle Scholar
 Polisetty PK, Voit EO, Gatzke EP: Identification of metabolic system parameters using global optimization methods. Theoretical Biology and Medical Modelling. 2006, 3: 410.1186/1742468234.PubMed CentralView ArticlePubMedGoogle Scholar
 Wessels LFA, van Someren EP, Reinders MJT: A comparison of genetic network models. Pac Symp Biocomput. 2000, 6: 508519.Google Scholar
 MATLAB, MathWorks, Natick, MA, USA.Google Scholar
 Schrage L: Optimization Modeling with Lindo. 1997, Duxberry PressGoogle Scholar
 Craven P, Wahba G: Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothness by the Method of Generalized Cross Validation. Journal of Numerical Mathematics. 1979, 31: 377403.View ArticleGoogle Scholar
 R development core team: R: A Language and Environment for Statistical Computing. 2006, R Foundation for Statistical Computing. Vienna AustriaGoogle Scholar
 Yang H, Haddad H, Tomas C, Alsaker K, Papoutsakis ET: A segmental nearest neighbor normalization and gene identification method gives superior results for DNAarray analysis. Proc Natl Acad Sci. 2003, 100 (3): 11227. 10.1073/pnas.0237337100.PubMed CentralView ArticlePubMedGoogle Scholar
 Liu H, Bergman NH, Thomason B, Shallom S, Hazen A, Crossno J, Rasko DA, Ravel J, Read TD, Peterson SN, Yates J, Hanna PC: Formation andComposition of the Bacillus Anthracis Endospore. Journal of Bacteriology. 2004, 186 (1): 164178. 10.1128/JB.186.1.164178.2004.PubMed CentralView ArticlePubMedGoogle Scholar
 Paredes CJ, Alsaker KV, Papoutsakis ET: A comparative genomic view of clostridial sporulation and physiology. Nat Rev Microbiol. 2005, 3 (12): 96978. 10.1038/nrmicro1288.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.