Thermodynamically consistent Bayesian analysis of closed biochemical reaction systems
- Garrett Jenkinson^{1},
- Xiaogang Zhong^{2} and
- John Goutsias^{1}Email author
https://doi.org/10.1186/1471-2105-11-547
© Jenkinson et al; licensee BioMed Central Ltd. 2010
Received: 21 April 2010
Accepted: 5 November 2010
Published: 5 November 2010
Abstract
Background
Estimating the rate constants of a biochemical reaction system with known stoichiometry from noisy time series measurements of molecular concentrations is an important step for building predictive models of cellular function. Inference techniques currently available in the literature may produce rate constant values that defy necessary constraints imposed by the fundamental laws of thermodynamics. As a result, these techniques may lead to biochemical reaction systems whose concentration dynamics could not possibly occur in nature. Therefore, development of a thermodynamically consistent approach for estimating the rate constants of a biochemical reaction system is highly desirable.
Results
We introduce a Bayesian analysis approach for computing thermodynamically consistent estimates of the rate constants of a closed biochemical reaction system with known stoichiometry given experimental data. Our method employs an appropriately designed prior probability density function that effectively integrates fundamental biophysical and thermodynamic knowledge into the inference problem. Moreover, it takes into account experimental strategies for collecting informative observations of molecular concentrations through perturbations. The proposed method employs a maximization-expectation-maximization algorithm that provides thermodynamically feasible estimates of the rate constant values and computes appropriate measures of estimation accuracy. We demonstrate various aspects of the proposed method on synthetic data obtained by simulating a subset of a well-known model of the EGF/ERK signaling pathway, and examine its robustness under conditions that violate key assumptions. Software, coded in MATLAB^{®}, which implements all Bayesian analysis techniques discussed in this paper, is available free of charge at http://www.cis.jhu.edu/~goutsias/CSS%20lab/software.html.
Conclusions
Our approach provides an attractive statistical methodology for estimating thermodynamically feasible values for the rate constants of a biochemical reaction system from noisy time series observations of molecular concentrations obtained through perturbations. The proposed technique is theoretically sound and computationally feasible, but restricted to quantitative data obtained from closed biochemical reaction systems. This necessitates development of similar techniques for estimating the rate constants of open biochemical reaction systems, which are more realistic models of cellular function.
Keywords
Background
Biochemical reaction systems are popular models of cellular function. These models are extensively used to represent the inter-connectivity and functional relationships among molecular species in cells and, most often, they provide accurate description of cellular behavior. Inferring a biochemical reaction system from experimental data is an important step towards building mathematical and computational tools for the analysis of cellular systems. This step requires both structure (stoichiometry) identification as well as parameter (rate constant) estimation [1–4]. Due however to the large combinatorial complexity of determining the stoichiometry of a biochemical reaction system, solving this problem requires large amounts of high quality experimental data and substantial computational resources, which are not usually available in practice.
Recently, several approaches have been proposed in the literature for addressing a simpler problem, known as model calibration. The objective of model calibration is to adjust the kinetic parameters of a biochemical reaction system with given stoichiometry in order to obtain a sufficiently good match between simulated and observed dynamics; e.g. see [2, 5–11].
Among known model calibration techniques, the ones based on Bayesian analysis [7, 10, 11] are perhaps the most versatile. Bayesian analysis allows us to effectively incorporate biophysical knowledge into the problem at hand and naturally draw statistical conclusions about the unknown kinetic parameters. This is done by employing a probability density function that encapsulates prior information about the rate constants of a biochemical reaction system and by deriving a posterior probability density function over the kinetic parameters after experimental data have been collected. By taking into account the experimental data and the information contained in the prior, the posterior density summarizes all knowledge available about the unknown kinetic parameters and quantifies uncertainty about their true values [12, 13]. Moreover, the posterior allows us to quantify our confidence about estimation accuracy, compute probabilities over alternative calibrations, and design additional experiments to improve inference.
Most published model calibration techniques do not take into account constraints on the reaction rate constants imposed by the fundamental laws of thermodynamics. If these constraints, known as Wegscheider conditions [14, 15], are not explicitly considered by a model calibration technique, then the method will spend most time examining impossible kinetic parameter sets and will most probably produce a biochemical reaction system that is not physically realistic [16]. This issue has been recently recognized in the literature, and new modeling formalisms have been suggested in an effort to address it [17–20]. The proposed formalisms describe a biochemical reaction system by well-defined thermodynamic parameters whose values always guarantee that the reaction rate constants satisfy the Wegscheider conditions. For example, in [19, 20], a biochemical reaction system is parameterized in terms of molecular capacities and reaction resistances, by using a thermodynamic kinetic modeling (TKM) formalism that enjoys a number of advantages over the ones suggested in [17, 18].
We believe that parameterizing a biochemical reaction system in terms of capacities and resistances is unnecessary and, in certain instances, problematic. It has been pointed out in [19] that different choices for the TKM parameters can lead to the same concentration dynamics. As a consequence, the TKM parameters cannot be determined uniquely from concentration measurements. A way to address this problem is to take the capacities to be the equilibrium concentrations (which is always possible in closed biochemical reaction systems), in which case the capacities are constrained by conservation relationships imposed by the system stoichiometry. Then, parameter estimation in the TKM formalism may be possible by arbitrarily fixing a subset of capacity values and estimating the remaining capacities and resistances. However, this approach can be very cumbersome when dealing with molecular perturbations (as we do in this paper) or when merging estimated TKM models, since, in both cases, the capacities may not refer to compatible equilibrium concentrations. It has been suggested in [19] that a way to merge two models using the TKM formalism is to first convert the capacities and resistance to the rate parameters, merge the two models, and then convert back to the TKM formalism. However, this approach seems to be overly complicated, especially in view of the model calibration methodology presented here.
In this paper, we introduce a thermodynamically consistent Bayesian analysis approach to model calibration that does not require reparametrization. Our approach relies on statistically modeling the reaction rate constants of the forward reactions as well as the equilibrium constants of individual reactions. We restrict our attention to closed systems (or systems that can be approximately considered to be closed), since thermodynamic analysis of such systems is easier to handle than open systems. The proposed approach controls thermodynamic consistency of the reaction rate constants by employing well-defined relationships between the kinetic parameters of a biochemical reaction system, imposed by the Wegscheider conditions. By embedding these relationships within an iterative algorithm that finds the mode of the posterior density, we arrive at a thermodynamically consistent Bayesian estimate for the rate constants.
Bayesian analysis can be appreciably influenced by the choice of the prior probability density functions. This is particularly true in systems biology problems in which only a small number of observations is usually available. It is therefore important to focus our effort on constructing appropriate prior densities for the unknown rate constants of the forward reactions and the equilibrium constants of individual reactions. Although a number of choices may be possible, it is imperative to use fundamental biophysical and thermodynamic principles to derive informative prior densities that effectively encapsulate such principles.
By using the classical Arrhenius formula of chemical kinetics [21], we construct an appropriate prior density for the log-rate constants of the forward reactions. To do so, we assume that the prefactor and activation energy associated with the Arrhenius formula are both random variables following log-normal and exponential distributions, respectively. This approach takes into account unpredictable changes in biochemical conditions affecting the structure of the reactant molecules and the probability of reaction after collision. On the other hand, by exploiting the thermodynamic relationship between rate constants, equilibrium concentrations, and stoichiometric coefficients, we derive an analytical expression for the joint prior density of the logarithms of the equilibrium constants. This expression depends on steady-state concentration measurements and on the stoichiometry of the biochemical reaction system under consideration.
Another important issue associated with the inference problem considered in this paper is the need to collect an informative set of measurements that can lead to sufficiently accurate parameter estimation. It has been increasingly recognized in the literature that a powerful approach to accomplish this goal in problems of systems biology is to selectively perturb key molecular components and measure the effects of these perturbations on the underlying concentrations [22–24]. We follow this strategy here and assume that we can selectively perturb, one at a time, the initial concentrations of a selected number of molecular species in a biochemical reaction system, by increasing or decreasing their values without altering the underlying stoichiometry. This can be achieved by a variety of experimental techniques, such as RNA interference (RNAi), transfection, or molecular injection. Therefore, our approach combines Bayesian analysis with current experimental practices, thus bridging the gap between statistical inference approaches and experimental design.
The Bayesian analysis technique discussed in this paper requires numerical evaluation of a number of statistical summaries of the posterior density. Although several methods are available to deal with this problem (e.g., see [25, 26]), we employ here a maximization-expectation-maximization (MEM) strategy that calculates a thermodynamically consistent estimate of the reaction rate constants as well as Monte Carlo estimates of posterior summaries used to evaluate the quality of inference. This strategy is based on sequentially combining a powerful stochastic optimization technique, known as simultaneous perturbation stochastic approximation (SPSA) [27], with Markov chain Monte Carlo (MCMC) sampling [25]. Our experience with extensive synthetic experiments, based on data obtained by simulating a subset of a well-known model of the EGF/ERK signaling pathway, indicates that the proposed algorithm is robust, producing excellent estimation results even in cases of high measurement errors and limited time measurements.
This paper is structured as follows. In the "Methods" section, we provide a brief overview of biochemical reaction systems, discuss how to model perturbations, and present a standard statistical model for the measurements. We then outline our Bayesian analysis approach to model calibration and present our choices for the prior and posterior densities. By emphasizing the fact that the prior density must assign zero probability over the thermodynamically infeasible region of the parameter space, and by employing an encompassing prior approach to Bayesian analysis, we derive an appropriate posterior density that satisfies this condition. We finally outline our proposed methodology for computing thermodynamically consistent Bayesian estimates of the kinetic parameters and for assessing estimation accuracy.
In the "Results/Discussion" section, we provide simulation results, based on a subset of a well-established model of the EGF/ERK signal transduction pathway. These results illustrate key aspects of the proposed model calibration methodology and show its potential for producing sufficiently accurate thermodynamically consistent estimates of a biochemical reaction system from noisy time-series measurements of molecular concentrations.
Finally, in the "Conclusions" section, we discuss a key statistical advantage of the proposed model calibration methodology, viewed from a bias-variance tradeoff perspective. Moreover, we provide suggestions for further research to address a number of practical issues associated with model calibration, such as estimating the initial concentrations and their perturbations, dealing with partially observed or missing data, and extending the proposed technique to the case of open biochemical reaction systems.
Extensive mathematical and computational details are required to rigorously formulate, derive, and understand various aspects of the proposed approach. We provide these details in three Additional files accompanying this paper. In Additional file 1, we present a detailed exposition of the underlying theory, whereas, in Additional file 2, we carefully discuss computational implementation. Finally, in Additional file 3, we provide all necessary details pertaining the biochemical reaction system we use in our simulations. Well-documented software, coded in MATLAB^{®}, which implements all Bayesian analysis techniques discussed in this paper, is available to interested readers free of charge at http://www.cis.jhu.edu/~goutsias/CSS%20lab/software.html.
Methods
Biochemical reaction systems
s_{ nm } is the net stoichiometry coefficient of the n^{th} molecular species associated with the m^{th} reaction, defined by ${s}_{nm}:={{v}^{\prime}}_{nm}-{v}_{nm}$, and $\mathcal{T}$ := [0, t_{max}] is an observation time window of interest.
Equations (2)-(4) are based on the assumption that we can selectively perturb, one at a time, the concentrations of molecular species in a set $\mathcal{P}$, by increasing or decreasing their values at time t = 0 without altering the underlying stoichiometry. For notational convenience, we include 0 in $\mathcal{P}$ and assign p = 0 to the original unperturbed system. In this case, ${x}_{n}^{(0)}(t)$ is the concentration of the n^{th} molecular species in the unperturbed system at time t, whereas, ${x}_{n}^{(p)}(t)$, for p ≠ 0, is the concentration of the n^{th} molecular species at time t, obtained by perturbing the initial concentration of the p^{th} species. In (3), π_{ p } ≥ -c_{ p } quantifies the perturbation applied on the initial concentration c_{ p } of the p^{th} molecular species at time t = 0. When -c_{ p } ≤ π_{ p }< 0, the initial concentration of the p^{th} molecular species is reduced, a situation that can be achieved by a variety of experimental techniques, such as RNA interference (RNAi). On the other hand, when π_{ p } > 0, the initial concentration of the p^{th} molecular species is increased, a situation that can be achieved by transfection or molecular injection.
Due to the enormous complexity of biological reaction networks, (1) is used to model a limited number of molecular interactions embedded within a larger and more complex system. Mass flow between the biochemical reaction system given by (1) and its surroundings complicates modeling. As a matter of fact, some molecular concentrations in the system may be influenced by unknown reactions, not modeled by (1), or by partially known reactions with reactants regulated by unknown biochemical mechanisms. To address this problem, we will assume that there is no appreciable mass transfer between the biochemical reaction system and its surroundings during the observation time interval $\mathcal{T}$ = [0, t_{max}]. As a consequence, we can assume that (1) characterizes a closed biochemical reaction system within $\mathcal{T}$. Moreover, we will assume that the system reaches quasi-equilibrium at some time t_{*} ≤ t_{max}, after which its thermodynamic properties do not appreciably change for t_{*} < t ≤ t_{max}. Note however that the quasi-equilibrium assumption does not necessarily imply that the biochemical reaction system will be at thermodynamic equilibrium after time t_{max}, since mass transfer may take place at some time t > t_{max}. Although we may be able to satisfy these assumptions by appropriately designed synthetic or in vitro biological experiments, the assumptions are certainly not satisfied in vivo. For this reason, we believe that future research must be focused on extending the approaches and techniques discussed in this paper to the case of open biochemical reaction systems.
Measurements
for q = 1, 2, ..., Q + 1, where ${\u03f5}_{n}^{(p)}({t}_{q})$ is a multiplicative random error factor and ${\eta}_{n}^{(p)}({t}_{q}):=\mathrm{ln}{\u03f5}_{n}^{(p)}({t}_{q})$. The assumption of multiplicative errors is common in most data acquisition procedures, such as DNA microarray-based genomics and mass spectrometry-based proteomics [28–30], whereas, the logarithm is used to obtain a convenient additive error model for the measurements.
In the following, we will assume that the biochemical reaction system, and all its perturbed versions, is sufficiently close to steady-state at time point t_{Q+1}. We can justify this assumption by taking t_{*} ≤ t_{Q+1}≤ t_{max} and by recalling our previous assumption that the biochemical reaction system is at thermodynamic quasi-equilibrium at times t_{*} ≤ t ≤ t_{max}. Our Bayesian analysis approach is based on data y, whereas, we use the steady-state measurements $\overline{\mathit{y}}$ to derive a joint probability density function for the logarithms {ln(k_{2m-1}/k_{2m}), m ∈ ℳ} of the equilibrium constants of the reactions needed for specifying the posterior density.
Finally, we will assume that the error components ${\eta}_{n}^{(p)}({t}_{q})$ are statistically independent zero-mean Gaussian random variables. The Gaussian assumption is quite common in genomic problems and has been experimentally verified in some cases; e.g., see [31]. This assumption is usually justified by the central limit theorem and the premise that the errors are due to a large number of independent multiplicative error sources. We may attempt to justify the independence assumption between measurement errors by arguing that an error occurred in a particular measurement may only be due to the acquisition process used to obtain that measurement and, hence, it may not affect the error values of other measurements. In general, however, this is only a mathematically convenient assumption that may not be realistic. We experimentally demonstrate later that, at least for the example considered in this paper, the proposed estimation methodology is quite effective even in the case of non-Gaussian and correlated measurement errors. For simplicity, we finally assume equal error variances; i.e., we will assume that $\text{var}[{\eta}_{n}^{(p)}({t}_{q})]={\sigma}^{2}$, for every n, p, and q. This assumption is not crucial to our approach and can be relaxed if necessary.
Bayesian model calibration
In this paper, we deal with the following problem: Given noisy concentration measurements y and $\overline{\mathit{y}}$, we want to calculate thermodynamically consistent estimates of the log-rate constants κ:= {κ_{2m-1}:= ln k_{2m-1}, κ_{2m}:= ln k_{2m}, m ∈ ℳ} of a closed biochemical reaction system, such that (2), initialized by (3), produce molecular concentrations ${x}_{n}^{(p)}(t)$ that "best" match (in some well-defined sense) the available measurements.
We should note here that it is convenient to estimate the logarithms of the rate constants instead of the constants themselves. By focusing on the logarithms, we can reduce the dynamic range of rate constant values and make their estimation numerically easier. To simplify our developments, we will assume that the initial concentrations {c_{ n }, n ∈ $\mathcal{N}$} and perturbations {π_{ p }, p ∈ $\mathcal{P}$} are known or have been estimated by an appropriate experimental procedure. When this is not true, these quantities must be treated as unknown parameters and estimated from data, together with the rate constants, provided that a sufficient amount of data is available to allow reliable estimation.
Note that the prior density of the log-rate constants κ depends on z. For this reason, we view z as a set of random hyperparameters (in Bayesian analysis, parameters used to specify prior densities are known as hyperparameters), specified by means of the prior density p(z).
The posterior density p(κ| y) takes into account our prior belief about the rate constant values and the data formation process, summarized by the prior density p(z) of the log-equilibrium constants, the conditional prior density p(κ| z) of the log-rate constants given the log-equilibrium constants, the conditional probability density p(σ^{2} | κ) of the error variance given the log-rate constants, and the likelihood p(y| κ, σ^{2}). However, the posterior density is hard to interpret, especially in high-dimensional problems that involve many parameters, such as the problem we are dealing with here. As a consequence, the main objective of Bayesian analysis is to produce numerical information that can be effectively used to summarize the posterior density and simplify the task of statistical inference to the extent possible. Typical summaries include measures of location and scale of the posterior, which are used to produce estimates for the parameter values and to evaluate the accuracy of such estimates, respectively.
It is clear from (6) that, to evaluate the posterior p(κ| y), we need to compute the "effective" prior density ∫ p(κ| z)p(z) d z as well as the "effective" likelihood ∫ p(y| κ, σ^{2})p(σ^{2} | κ)dσ^{2}. To do so, we must specify the aforementioned densities p(σ^{2} | κ), p(z), p(κ| z), and p(y| κ , σ^{2}). We discuss this problem next.
Prior density of error variance
for two parameters α, b > 0.
The independence assumption between σ^{2} and κ is reasonable, in view of the fact that the errors are mainly due to the experimental methodology used to obtain the measurements, whereas, the rate constants are due to biophysical principles underlying the biochemical reaction system. On the other hand, the choice given by (9) has been well-justified in Bayesian analysis. In fact, the inverse gamma distribution is the conjugate prior for the variance of additive Gaussian errors [13]. Conjugate priors are common in Bayesian analysis, since they often lead to attractive analytical and computational simplifications. Note that E[σ^{2}] = b/(α - 1) and var[σ^{2}] = {E[σ^{2}]}^{2}/(α - 2) = b^{2}/[(α - 1)^{2}(α - 2)], for α > 2. Therefore, the parameters α, b control the location and scale of the inverse gamma distribution given by (9). We illustrate this prior in Figure S-1.3 of Additional file 1. In the following, we treat α and b as hyperparameters with known values. For a practical method to determine these values, the reader is referred to Additional file 1.
Prior density of log-equilibrium constants
Before we consider the problem of specifying a prior density for the log-equilibrium constants z, we first investigate how much information about z can be extracted from measurements.
known as Wegscheider conditions [14, 15], where r_{ m } is the m^{th} element of the M × 1 vector r, $\mathbb{S}$ is the N × M stoichiometry matrix of the biochemical reaction system with elements s_{ nm } , and null($\mathbb{S}$) is the null space of $\mathbb{S}$. As a consequence, for a biochemical reaction system to be physically realizable, it is required that the reaction rates satisfy the thermodynamically imposed Wegscheider conditions.
where ℍ is an M × M matrix with elements ${h}_{m{m}^{\prime}}={\sum}_{n\in \mathcal{N}}{s}_{nm}{s}_{n{m}^{\prime}}$, and α, b are the two hyperparameters associated with the prior density of the measurement variance, given by (9).
The previous result suggests that we may be able to use $p(\mathit{z}|\tilde{\mathit{y}})$ as an informative prior for the log-equilibrium constants z; i.e., we may be able to replace p(z) by $p(\mathit{z}|\tilde{\mathit{y}})$ in (6). At a first glance, this idea may not seem appropriate. However, it perfectly agrees with the fact that, in Bayesian analysis, hyperparameters are often estimated directly from data [13]. Since we have shown here that steady-state measurements can be effectively used to construct the entire posterior probability density function of z, it seems reasonable to use this posterior as a prior density for z. Note however that, by replacing p(z) with $p(\mathit{z}|\tilde{\mathit{y}})$ in (6), we must make sure that $\tilde{\mathit{y}}$ is independent of y(see Additional file 1). Otherwise, our choice for p(z) may not lead to a proper posterior density p(κ| y) (i.e., it may not lead to a density that is finite for all y). Note that the independence of $\tilde{y}$ and y is assured by the independence between the measurement errors $\{{\eta}_{n}^{(p)}({t}_{Q+1}),n\in \mathcal{N},p\in \mathcal{P}\}$ and $\{{\eta}_{n}^{(p)}({t}_{q}),n\in \mathcal{N},p\in \mathcal{P},q\in \mathcal{Q}\}$.
which we can always evaluate, since matrix ${\mathbb{D}}_{0}$ is invertible.
Prior density of log-rate constants
where ${\tau}_{m}:={T}_{m}^{\ast}/T>1$, ${\kappa}_{m}^{0}:=\mathrm{ln}{\alpha}_{m}^{0}-{E}_{m}^{0}/{k}_{B}T$, and erfc[·] is the complementary error function. We illustrate this prior in Figure S-1.1 of Additional file 1.
Equations (16) and (17) provide an analytical form for the prior density of the log-rate constants. To use this expression, we must determine appropriate values for $\mathit{\varphi}:=\{{\kappa}_{m}^{0},{\tau}_{m},{\lambda}_{m},m\in \mathcal{M}\}$, which can be treated as hyperparameters. Although we could treat ϕ as random, we will choose here known values for these parameters. This is motivated by the fact that ϕ determines the location and scale of the prior densities of the forward rate constants; see Figure S-1.1 in Additional file 1. In certain problems of interest, there might be enough information to determine possible ranges for the forward rate constant values. As a consequence, we can use this information, together with an appropriate procedure, to effectively determine values for ϕ. The reader is referred to Additional file 1 for details on how to do so.
Effective likelihood
Note that evaluating φ(κ,y) at given values of κ and y requires integration of the system of ordinary differential equations (2).
Posterior density
where θ _{ mm' } are the elements of matrix ${\mathbb{U}}_{0}{\mathbb{D}}_{0}^{-1}{\mathbb{U}}_{0}^{T}$ obtained from the SVD decomposition of ${\mathbb{S}}^{T}\mathbb{S}$, and ${\tilde{y}}_{m}$ is given by (13).
Note that the posterior density of the log-rate constants is a compromise between the prior and the likelihood. The prior terms ω(κ) and ψ(κ, y) penalize log-rate values that do not fit well with available a-priori information, whereas, the likelihood term φ(κ, y) penalizes log-rate values that produce concentration dynamics which deviate appreciably from measurements. As the number N(P + 1)Q of available measurements increases, this compromise is controlled to a greater extent by the data through the factor φ(κ, y).
A problem arises with the posterior density p(κ| y), given by (21) and (22), since nonzero probabilities may be assigned to thermodynamically infeasible log-rate constants. A Bayesian analyst might argue that we have correctly done our job by formulating the problem as we did and that it is the data which will rule out the possibility that our biochemical reaction system can be characterized by thermodynamically infeasible parameters. However, we choose to trust thermodynamics far more than we would trust noisy data and appropriately modify the posterior density based on our knowledge that the kinetic parameters must satisfy the Wegscheider conditions given by (11).
Clearly, the values of the marginal posterior p_{ W } (κ_{ f }| y) are proportional to the corresponding values of the original posterior density p(κ_{ f }, κ_{ d }| y) over the thermodynamically feasible region of the parameter space, given by the hyperplane ${\mathit{\kappa}}_{d}=\mathbb{W}{\mathit{\kappa}}_{f}$. In the following, we will base our Bayesian analysis approach on p_{ W } (κ_{ f }| y).
Computing the posterior mode
In a Bayesian setting, we use the location of the posterior density over the parameter space to provide an estimate of the unknown parameter values. Typically, two measures of location are employed, namely the mode and the mean of the posterior. The posterior mean minimizes the mean-square error between the estimated and true parameters, whereas, the posterior mode is more likely to produce dynamics that closely resemble the true dynamics (see Additional file 1 for why this is true). We note here that the main objective of parameter estimation in biochemical reaction systems is not necessarily to determine parameter values that are "close" to the true values (e.g., in the mean square sense) but to obtain appropriate values for the rate constants so that the resulting molecular concentration dynamics closely reproduce the dynamics observed in the true system [33]. As a consequence, we choose the posterior mode as our parameter estimator.
The posterior log-density lnp_{ W } (κ_{ f }| y) is usually not concave, especially when a limited amount of highly noisy data y is available. As a consequence, there is no optimization algorithm that can find the posterior mode in a finite number of steps. A method to address this problem would be to randomly sample the parameter space at a predefined (and usually large) number of points and use these points to initialize an optimization algorithm, such as the simultaneous perturbation stochastic approximation (SPSA) algorithm discussed in the Additional file 2. We can then calculate the parameters and the associated values of lnp_{ W } (κ_{ f }| y) obtained by each initialization after a set number of optimization steps, and declare the parameters associated with the highest log-posterior value as being the desired mode estimates.
Unfortunately, SPSA (and as a matter of fact any other appropriate optimization algorithm) is computationally costly, especially in the case of large biochemical reaction systems. Therefore, using SPSA in the previous multi-seed strategy may result in a computationally prohibitive approach for finding the posterior mode. In order to reduce computations, we may choose only a small number of initial points that we believe are sufficiently proximal to the posterior mode. Two such points might be the prior and posterior means. As a matter of fact, as the data sample size tends to infinity, we expect that the posterior mean will coincide with the posterior mode, since, under suitable regularity conditions, the posterior density converges asymptotically to a Gaussian distribution [12, 34]. This simple idea leads to the sequential maximization-expectation-maximization (MEM) algorithm we discuss in the Additional file 2. According to this algorithm, we perform a relatively small number of SPSA iterations, initialized by the prior mode, to obtain a posterior mode estimate ${\widehat{\mathit{\kappa}}}_{f,1}^{\text{mode}}$. We then use an MCMC algorithm, initialized by ${\widehat{\mathit{\kappa}}}_{f,1}^{\text{mode}}$, to obtain an estimate of the posterior mean ${\widehat{\mathit{\kappa}}}_{f}^{\text{mean}}$. Subsequently, we perform another set of SPSA iterations, initialized by ${\widehat{\mathit{\kappa}}}_{f}^{\text{mean}}$, to obtain the posterior mode estimate ${\widehat{\mathit{\kappa}}}_{f,2}^{\text{mode}}$. We finally set ${\widehat{\mathit{\kappa}}}_{f}^{\text{mode}}$ to be the log-rate constants that produce the maximum posterior value during all SPSA and MCMC iterations, and set the optimal estimate $\widehat{\mathit{\kappa}}$ of the log-rate constants κ equal to $\{{\widehat{\mathit{\kappa}}}_{f}^{\text{mode}},\mathbb{W}{\widehat{\mathit{\kappa}}}_{f}^{\text{mode}}\}$.
Estimation accuracy
A small value of ϵ_{RMSE} provides us with confidence that the estimated value of that constant is accurate. On the other hand, the estimate may be perceived as inaccurate if ϵ_{RMSE} is exceedingly large.
Another useful metric for evaluating estimation accuracy is $D:=\mathrm{ln}\mathrm{det}[\mathbb{V}]/(M+{M}_{1})$, where $\mathrm{det}[\mathbb{V}]$ is the determinant of the posterior covariance matrix $\mathbb{V}=\text{E}\left[({\mathit{\kappa}}_{f}-{\widehat{\mathit{\kappa}}}_{f}^{\text{mode}}){({\mathit{\kappa}}_{f}-{\widehat{\mathit{\kappa}}}_{f}^{\text{mode}})}^{T}|\mathit{y}\right]$. Note that D is the average of the log-eigenvalues of $\mathbb{V}$ and is related to the well-known D-optimal criterion used in experimental design [27]. We can use D to quantify the overall accuracy of a model calibration result, with smaller values of D indicating better overall accuracy.
Note that the RMSE's ϵ_{RMSE} can be computed from the diagonal elements of $\mathbb{V}$. It turns out the we can approximate ϵ_{RMSE} and D from an estimate $\widehat{\mathbb{V}}$ of the posterior covariance matrix $\mathbb{V}$ obtained during the second (MCMC) phase of the proposed MEM algorithm (see Additional file 2 for details).
where $\{{x}_{n}^{(p)}(t),t\in \mathcal{T}\}$ and $\{{\widehat{x}}_{n}^{(p)}(t),t\in \mathcal{T}\}$ are the true and estimated dynamics of the n^{th} molecular species under the p^{th} perturbation, produced by the biochemical reaction system with log-rate constants κ^{true} and $\widehat{\mathit{\kappa}}=\{{\widehat{\mathit{\kappa}}}_{f}^{\text{mode}},\mathbb{W}{\widehat{\mathit{\kappa}}}_{f}^{\text{mode}}\}$, respectively. Clearly, ϵ_{MED-AE} and ϵ_{MAX-AE} provide measures of closeness between the estimated molecular responses $\{{\widehat{x}}_{n}^{(p)}(t),t\in \mathcal{T},n\in \mathcal{N},p\in \mathcal{P}\}$ and the true molecular responses $\{{x}_{n}^{(p)}(t),t\in \mathcal{T},n\in \mathcal{N},p\in \mathcal{P}\}$, normalized by the corresponding true integrated responses $\left\{{\displaystyle {\int}_{\mathcal{T}}{x}_{n}^{(p)}(t)dt,n\in \mathcal{N},p\in \mathcal{P}}\right\}$. Normalization is required in order to make sure that no one species dominates the error values more than any other. Finally, note that half of the normalized absolute errors will be between 0 and ϵ_{MED-AE}, whereas, the remaining half will be between ϵ_{MED-AE} and ϵ_{MAX-AE}.
Results/Discussion
In specifying the model depicted in Figure 1, we must provide three sets of physically reasonable values: true rate constant values, initial concentrations, and experimentally feasible perturbations to the initial concentrations. Published values for the reaction rate constants associated with our example are given in Equation (S-3.1) of Additional file 3. However, these values do not correspond to a thermodynamically feasible biochemical reaction system, since they do not satisfy the Wegscheider conditions, given by (11). We should point out here that this is a common problem in systems biology. Reaction rate values are usually amalgamated from various independent sources in the literature, so it is highly unlikely that these values will correspond to a thermodynamically feasible biochemical reaction system. As a consequence, it is desirable to develop a method that uses published values for the reaction rate constants and calculates an appropriate set of thermodynamically feasible values that can be considered as the "true" parameter values. In Additional file 3, we calculate "true" values for the log-rate constants by using a linear least squares approach to project the published values onto the thermodynamically feasible hyperplane. The resulting "true" values are given in Equation (S-3.3) of Additional file 3.
Regarding the initial concentrations, we use the values specified in [35, 36], with two minor modifications. First, molecular species with zero initial concentrations are modified to have a small number of molecules present. We do this to accommodate the fact that, in a real cellular system, these molecular species are constitutively expressed. The second modification comes from the fact that we are no longer modeling the entire EGF/ERK signaling cascade and, therefore, we must account for the upstream EGF stimulus. To take this into account, we increase the initial concentration of the most upstream molecular species in our model, namely (EGF-EGFR*)2-GAP. The initial concentrations used are given by Equation (S-3.4) in Additional file 3.
To specify appropriate perturbations to the initial molecular concentrations, note that molecular complexes, such as dimers, trimers, etc., are far more difficult to perturb than simple monomeric molecular species. For this reason, we focus our perturbation efforts on Shc*, Grb2, and Sos. Since Shc* is commercially available in a purified and quantified form, we will assume that we can increase its initial concentration by a factor of 100 using molecular injection. We will also assume that we can perturb Grb2 and Sos by RNAi, resulting in a decrease in their initial concentrations by a factor of 100. Thus, we set π_{1} = 99c_{1}, π_{2} = -.99c_{2}, and π_{4} = -.99c_{4}.
To avoid specifying different hyperparameter values for the prior densities of the forward log-rate constants, we assume here that all densities share the same known values {κ^{0}, τ, λ}, where κ^{0} = -5.1010, τ = 1.8990, and λ = 0.7409, whereas, we set α = 3 and b = 1 for the hyperparameters of the prior density of the variance σ^{2} of the measurement errors. These choices correspond to the prior densities depicted in Figure S-1.2(a) and Figure S-1.3(a) in Additional file 1. We implement our Bayesian analysis approach using the MEM algorithm described in Additional file 2, with I = 5,000 SPSA iterations in each maximization step and a total of L = 50,000 MCMC iterations in the expectation step. Finally, we observe the biochemical reaction system within a time period of 1 min.
Estimated posterior RMSE values for the case of i.i.d. zero-mean Gaussian errors with standard deviation σ = 0.3. Logarithmic sampling is used with Q = 6.
κ _{1} | κ _{3} | κ _{5} | κ _{7} | κ _{9} | κ _{11} | κ _{13} | κ _{15} | κ _{17} |
---|---|---|---|---|---|---|---|---|
0.2414 | 0.1578 | 0.1838 | 0.2950 | 0.1426 | 0.1683 | 0.0968 | 0.4474 | 0.1484 |
κ _{2} | κ _{4} | κ _{6} | κ _{8} | κ _{10} | κ _{12} | κ _{14} | κ _{16} | κ _{18} |
0.2594 | 0.2095 | 0.1704 | - | 0.2124 | 0.2136 | - | 0.5093 | 0.0494 |
The concentration dynamics produced by the estimated rate constant values match well the dynamics produced by the true values. As a matter of fact, the calculated median and maximum absolute error values imply that half of the relative integrated absolute error values between the estimated and true concentration dynamics (across all molecular species and all applied perturbations) are smaller than 3.03%, whereas, the remaining values are between 3.03% and 16.8%. On the other hand, the estimated posterior RMSE values summarized in Table 1 indicate a high probability that, given the concentration measurements, the log-rate values will lie within a relatively small region around the corresponding posterior mode values.
We expect that, in general, by selecting appropriate perturbations and by increasing the number of concentration data collected during an experiment, we can improve estimation accuracy. However, how can one know if the right perturbations have been applied on the biochemical reaction system and if enough data has been collected in a practical situation? Inspection of RMSE values can provide an answer to these important questions. If the estimated RMSE values of the log-rate constants of many reactions are large, it may be worth collecting additional data by increasing P and Q. Additional data can improve estimation accuracy by shrinking the RMSE values to a size that indicates an acceptable degree of uncertainty. However, if the biochemical reaction system is insensitive to a given kinetic parameter, then the RMSE associated with that reaction may remain large even as the quality of data improves. Therefore, additional data should only be collected when the RMSE values are large and sensitivity analysis indicates that the values of the rate constants associated with these RMSE values appreciably affect the system dynamics.
Estimated values of the D-optimal criterion for uniform and logarithmic sampling schemes.
Q | uniform | logarithmic | % change |
---|---|---|---|
2 | -1.7697 | -2.3500 | - |
3 | -2.0030 | -3.4287 | 45.90% |
4 | -2.3752 | -3.7432 | 9.17% |
5 | -2.6115 | -4.1173 | 9.99% |
6 | -2.3492 | -4.1039 | -0.33% |
Estimated values of the D-optimal criterion for different replications and perturbations.
Perturbation | D |
---|---|
NO: 1 replication | -3.0123 |
NO: 2 replications | -3.4950 |
NO: 3 replications | -3.7544 |
YES: Shc* | -3.1398 |
YES: Grb2 | -3.0747 |
YES: Sos | -3.4531 |
YES: Shc*, Grb2 | -3.9279 |
YES: Shc*, Sos | -3.7716 |
YES: Grb2, Sos | -3.6363 |
YES: Shc*, Grb2, Sos | -4.1039 |
Median and maximum absolute error values under a variety of measurement error conditions.
mean = 0 | i.i.d. Gaussian | i.i.d. Uniform | correlated Gaussian |
---|---|---|---|
σ = 0.1 | 3.98 × 10^{-3} | 8.64 × 10^{-3} | 1.48 × 10^{-2} |
5.56 × 10^{-2} | 4.81 × 10^{-2} | 7.32 × 10^{-2} | |
σ = 0.2 | 1.01 × 10^{-2} | 1.78 × 10^{-2} | 3.09 × 10^{-2} |
8.29 × 10^{-2} | 1.30 × 10^{-1} | 1.89 × 10^{-1} | |
σ = 0.3 | 3.03 × 10^{-2} | 1.78 × 10^{-2} | 3.05 × 10^{-2} |
1.68 × 10^{-1} | 1.30 × 10^{-1} | 2.46 × 10^{-1} | |
σ = 0.4 | 2.19 × 10^{-2} | 2.56 × 10^{-2} | 1.04 × 10^{-1} |
2.27 × 10^{-1} | 1.41 × 10^{-1} | 3.67 × 10^{-1} | |
σ = 0.5 | 2.67 × 10^{-2} | 3.86 × 10^{-2} | 6.43 × 10^{-2} |
2.48 × 10^{-1} | 3.32 × 10^{-1} | 3.10 × 10^{-1} |
Conclusions
In this paper, we have introduced a novel Bayesian analysis technique for estimating the kinetic parameters (rate constants) of a closed biochemical reaction system from time measurements of noisy concentration dynamics. The proposed procedure enjoys a clear advantage over other published estimation techniques: the estimated kinetic parameters satisfy the Wegscheider conditions imposed by the fundamental laws of thermodynamics. As a consequence, it always leads to physically plausible biochemical reaction systems.
From a statistical perspective, there are additional advantages for thermodynamically restricting the kinetic parameters of a biochemical reaction system to satisfy the Wegscheider conditions. This may be seen through the well-known bias-variance tradeoff in estimation [27]. The mean squared error of a given estimator can be decomposed into a bias term and a variance term. In general, imposing constraints on the estimator may increase its bias but decrease its variance (hence the tradeoff). However, if the true parameter values satisfy the constraints, then the variance may decrease without increasing the bias term [27]. Since the true values of the kinetic parameters must lie on the thermodynamically feasible manifold in the parameter space, confining the Bayesian estimator to this manifold (which is of lower dimension than the parameter space itself) may lead to lower mean squared error due to a smaller variance. Since the thermodynamically feasible manifold is of lower dimension than the parameter space, gains in variance (and hence improvements in the mean squared error) are expected to be large. This may be seen through the "curse of dimensionality," which refers to the exponential increase in the volume of the parameter space as its dimension grows, making estimation exponentially harder in higher dimensional spaces (in our example, the unconstrained parameter space has 12.5% more dimensions than the thermodynamically feasible subspace). The Wegscheider conditions reduce the dimensionality of the parameter space to a feasible region in which estimation may be easier. Thus, the proposed Bayesian analysis procedure improves on other estimation techniques by producing a statistically superior, physically meaningful and plausible estimate for the kinetic parameters of a closed biochemical reaction system.
The Bayesian analysis methodology discussed in this paper has been formulated by assuming that all initial concentrations and perturbations are precisely known and that concentration measurements can be obtained by directly sampling all system dynamics. However, current experimental practices in quantitative systems biology restrict the amount and type of data that can be collected from living cells. As a consequence, further research is needed to develop approaches that can accommodate this important issue and make a Bayesian analysis approach to parameter estimation better applicable to systems biology problems.
If the initial concentrations and the perturbations applied on these concentrations are not known, then we may try to estimate them together with the unknown kinetic parameters. Although formulation of this problem is similar to the one considered in this paper, the additional computational burden will be substantial. Moreover, while quantitative biochemical techniques are improving, the vast majority of data available in problems of systems biology are obtained by measuring ratios of molecular concentrations (e.g., by using techniques such as SILAC [37]). Estimation of the rate constants of a biochemical reaction system from concentration measurements available as ratios relative to a reference system requires special consideration and extensive modification of the proposed Bayesian analysis procedure. Finally, it is very important to address the problem of missing observations. This is a common problem in systems biology, since it is not possible to monitor and measure the concentrations of all molecular species present in the system. Although appropriate modifications to the proposed algorithm can lead to a Bayesian analysis approach that can handle missing data, we think that development of a practically effective way to address this problem is challenging. Our future plan is to expand and improve the Bayesian analysis procedure discussed in this paper in order to provide practical solutions to the previous problems.
It is worth noting here that the estimation procedure suggested in this paper applies only to closed biochemical reaction systems (or to approximations of closed systems embedded in a larger open system). However, a cell is an open system, since it effectively interacts with its environment. If we include the cell's environment into our system and monitor the combined system until steady-state (i.e., until cell death), then we would have the necessary closed system. Unfortunately, this is clearly an unrealistic scenario. As a consequence, there is also a need to develop a theoretical and computational approach for dealing with thermodynamically consistent parameter estimation in open biochemical reaction systems.
To conclude, it has been argued in a recent paper [33] that most models of computational systems biology are "sloppy," in the sense that many parameters of such models do not appreciably alter system behavior. A key conclusion of this paper is that collective fitting procedures (such as the Bayesian analysis technique presented in the present paper) are far more desirable than piecewise construction of a biochemical reaction system model from individual parameter estimates (which is how most models are constructed when investigators scour the literature for individual rate constant values). Moreover, it has been pointed out in [33] that using a method to obtain precise parameter values may be difficult, even with an unlimited amount of data, since the behavior of a sloppy model is insensitive to the values of most parameters. As a consequence, the authors suggest that, instead of focusing on the quality of parameter estimation, it will be more wise to focus on the quality of prediction achieved by an estimated model (as we have also argued in this paper).
To a certain extent, our Bayesian analysis approach addresses some of the issues raised in [33]. By imposing the Wegscheider conditions on the kinetic parameters of a biochemical reaction system, we can effectively constrain these parameters to a thermodynamically feasible manifold in the parameter space, thus reducing sloppiness. Moreover, we can effectively use the RMSE values and the D-optimal criterion to determine an appropriate experimental design and distinguish those estimated values that can be trusted from those that cannot. For example, if the RMSE value associated with a kinetic parameter is small, then we may trust these values. On the other hand, a large RMSE value may indicate high uncertainty in the estimated parameter values, which may be untrustworthy. As we mentioned before, if a sensitivity analysis approach, such as the one proposed in [38], indicates that the kinetic parameters associated with large RMSE values are influential parameters, then we must reduce these RMSE values to an acceptable level of uncertainty by adopting a new and more effective experimental design approach. On the other hand, if these parameters correspond to a non-influential reaction, then we can accept the estimated values with no further consideration, since high uncertainty in the exact values of these parameters will not affect the predicted concentration dynamics.
Declarations
Acknowledgements
GJ has been supported by The National Defense Science and Engineering Graduate (NDSEG) Fellowship through the Department of Defense. JG acknowledges the National Science Foundation (NSF) for support of this research. The authors thank the anonymous reviewers for their insightful comments, which resulted in a great improvement of the paper. Special thanks to Ali Afshar for identifying several problems with the original version of this paper and for suggesting appropriate fixes.
Authors’ Affiliations
References
- Crampin EJ, Schnell S, McSharry PE: Mathematical and computational techniques to deduce complex biochemical reaction mechanisms. Prog Biophys Mol Bio 2004, 86: 77–112. 10.1016/j.pbiomolbio.2004.04.002View ArticleGoogle Scholar
- Feng XJ, Rabitz H: Optimal identification of biochemical reaction networks. Biophys J 2004, 86: 1270–1281. 10.1016/S0006-3495(04)74201-0View ArticlePubMedPubMed CentralGoogle Scholar
- Maria G: A review of algorithms and trends in kinetic model identification for chemical and biochemical systems. Chem Biochem Eng Q 2004, 18(3):195–222. [http://www-old.pbf.hr/cabeq/pdf/18_3_2004/CABEQ_18_3_1.pdf]Google Scholar
- Papin JA, Hunter T, Palsson BO, Subramaniam S: Reconstruction of cellular signalling networks and analysis of their properties. Nat Rev Mol Cell Bio 2005, 6: 99–111. 10.1038/nrm1570View ArticleGoogle Scholar
- Ronen M, Rosenberg R, Shraiman BI, Alon U: Assigning numbers to the arrows: Parameterizing a gene regulation network by using accurate expression kinetics. P Natl Acad Sci USA 2002, 99(16):10555–10560. 10.1073/pnas.152046799View ArticleGoogle Scholar
- Rodriguez-Fernandez M, Mendes P, Banga JR: A hybrid approach for efficient and robust parameter estimation in biochemical pathways. BioSystems 2006, 83: 248–265. 10.1016/j.biosystems.2005.06.016View ArticlePubMedGoogle Scholar
- Liebermeister W, Klipp E: Bringing metabolic networks to life: integration of kinetic, metabolic, and proteomic data. Theor Biol Med Model 2006, 3: 42.View ArticlePubMedPubMed CentralGoogle Scholar
- Quach M, Brunel N, d'Alché Buc F: Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference. Bioinformatics 2007, 23(23):3209–3216. 10.1093/bioinformatics/btm510View ArticlePubMedGoogle Scholar
- Balsa-Canto E, Peifer M, Banga JR, Timmer J, Fleck C: Hybrid optimization method with general switching strategy for parameter estimation. BMC Syst Biol 2008, 2: 26.View ArticlePubMedPubMed CentralGoogle Scholar
- Klinke DJ: An empirical Bayesian approach for model-based inference of cellular signaling networks. BMC Bioinformatics 2009, 10: 371.View ArticlePubMedPubMed CentralGoogle Scholar
- Mazur J, Ritter D, Reinelt G, Kaderali L: Reconstructing nonlinear dynamic models of gene regulation using stochastic sampling. BMC Bioinformatics 2009, 10: 448.View ArticlePubMedPubMed CentralGoogle Scholar
- Berger JO: Statistical Decision Theory and Bayesian Analysis. 2nd edition. New York: Springer-Verlag; 1985.View ArticleGoogle Scholar
- Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. 2nd edition. Boca Raton, Florida: Chapman and Hall/CRC; 2004.Google Scholar
- Heinrich R, Schuster S: The Regulation of Cellular Systems. New York: Chapman & Hall; 1996.View ArticleGoogle Scholar
- Vlad MO, Ross J: Thermodynamically based constraints for rate coefficients of large biochemical networks. WIREs Syst Biol Med 2009, 1: 348–358. 10.1002/wsbm.50View ArticleGoogle Scholar
- Alberty RA: Princple of detailed balance in kinetics. J Chem Educ 2004, 81(8):1206–1209. 10.1021/ed081p1206View ArticleGoogle Scholar
- Liebermeister W, Klipp E: Bringing metabolic networks to life: convenience rate law and thermodynamic constraints. Theor Biol Med Model 2006, 3: 41.View ArticlePubMedPubMed CentralGoogle Scholar
- Yang J, Bruno WJ, Hlavacek WS, Pearson JE: On imposing detailed balance in complex reaction mechanisms. Biophys J 2006, 91: 1136–1141. 10.1529/biophysj.105.071852View ArticlePubMedPubMed CentralGoogle Scholar
- Ederer M, Gilles ED: Thermodynamically feasible kinetic models of reaction networks. Biophys J 2007, 92: 1846–1857. 10.1529/biophysj.106.094094View ArticlePubMedPubMed CentralGoogle Scholar
- Ederer M, Gilles ED: Thermodynamic constraints in kinetic modeling: Thermodynamic-kinetic modeling in comparison to other approaches. Eng Life Sci 2008, 8(5):467–476. 10.1002/elsc.200800040View ArticleGoogle Scholar
- Berry RS, Rice SA, Ross J: Physical Chemistry. 2nd edition. New York: Oxford University Press; 2000.Google Scholar
- Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001, 292: 929–934. 10.1126/science.292.5518.929View ArticlePubMedGoogle Scholar
- Tegnér J, Yeung MKS, Hasty J, Collins JJ: Reverse engineering gene networks: Integrating genetic perturbations with dynamical modeling. Proc Nat Acad Sci USA 2003, 100: 5944–5949. 10.1073/pnas.0933416100View ArticlePubMedPubMed CentralGoogle Scholar
- Zak DE, Gonye GE, Schwaber JS, Doyle FJ: Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: Insights from an identifiability analysis of an in silico network. Genome Res 2003, 13: 2396–2405. 10.1101/gr.1198103View ArticlePubMedPubMed CentralGoogle Scholar
- Liu JS: Monte Carlo Strategies in Scientific Computing. New York: Springer-Verlag; 2001.Google Scholar
- Moles CG, Mendes P, Banga JR: Parameter estimation in biochemical pathways: A comparison of global optimization methods. Genome Res 2007, 13: 2467–2474. 10.1101/gr.1262503View ArticleGoogle Scholar
- Spall JC: Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control. New York: Wiley-Interscience; 2003.View ArticleGoogle Scholar
- Šašik R, Calvo E, Corbeil J: Statistical analysis of high-density oligonucleotide arrays: a multiplicative noise model. Bioinf 2002, 18: 1633–1640. 10.1093/bioinformatics/18.12.1633View ArticleGoogle Scholar
- Anderle M, Roy S, Lin H, Becker C, Joho K: Quantifying reproducibility for differential proteomics: noise analysis for protein liquid chromatography-mass spectrometry of human serum. Bioinf 2004, 20: 3575–3582. 10.1093/bioinformatics/bth446View ArticleGoogle Scholar
- Listgarten J, Emili A: Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics 2005, 4: 419–434. 10.1074/mcp.R500005-MCP200View ArticlePubMedGoogle Scholar
- Molina H, Parmigiani G, Pandey A: Assessing reproducibility of a protein dynamics study using in vivo labeling and liquid chromatography tandem mass spectrometry. Anal Chem 2005, 77: 2739–2744. 10.1021/ac048204bView ArticlePubMedGoogle Scholar
- Klugkist I, Hoijtink H: The Bayes factor for inequality and about equality constrained models. Comput Stat Data An 2007, 51: 6367–6379. 10.1016/j.csda.2007.01.024View ArticleGoogle Scholar
- Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP: Universally sloppy parameter sensitivities in systems biology models. PLOS Comp Biol 2007, 3(10):1871–1878. 10.1371/journal.pcbi.0030189View ArticleGoogle Scholar
- Walker AM: On the asymptotic behaviour of posterior distributions. J Roy Stat Soc B Met 1969, 31: 80–88.Google Scholar
- Schoeberl B, Eichler-Jonsson C, Gilles ED, Müller G: Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat Biotechnol 2002, 20: 370–375. 10.1038/nbt0402-370View ArticlePubMedGoogle Scholar
- Li C, Donizelli M, Rodriguez N, Dharuri H, Endler L, Chelliah V, Li L, He E, Henry A, Stefan MI, Snoep JL, Hucka M, Novère NL, Laibe C: BioModels database: An enhanced, curated and annotated resource for published quantitative kinetic models. BMC Syst Biol 2010, 4: 92. 10.1186/1752-0509-4-92View ArticlePubMedPubMed CentralGoogle Scholar
- Ong SE, Mann M: A practical recipe for stable isotope labeling by amino acids in cell culture (SILAC). Nat Protoc 2006, 1: 2650–2660. 10.1038/nprot.2006.427View ArticlePubMedGoogle Scholar
- Zhang HX, Dempsey WP, Goutsias J: Probabilistic sensitivity analysis of biochemical reaction systems. J Chem Phys 2009, 131(094101):1–20.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.