- Methodology article
- Open Access
Bayesian inference of biochemical kinetic parameters using the linear noise approximation
- Michał Komorowski^{1, 2}Email author,
- Bärbel Finkenstädt^{1},
- Claire V Harper^{4} and
- David A Rand^{2, 3}
https://doi.org/10.1186/1471-2105-10-343
© Komorowski et al; licensee BioMed Central Ltd. 2009
- Received: 29 January 2009
- Accepted: 19 October 2009
- Published: 19 October 2009
Abstract
Background
Fluorescent and luminescent gene reporters allow us to dynamically quantify changes in molecular species concentration over time on the single cell level. The mathematical modeling of their interaction through multivariate dynamical models requires the deveopment of effective statistical methods to calibrate such models against available data. Given the prevalence of stochasticity and noise in biochemical systems inference for stochastic models is of special interest. In this paper we present a simple and computationally efficient algorithm for the estimation of biochemical kinetic parameters from gene reporter data.
Results
We use the linear noise approximation to model biochemical reactions through a stochastic dynamic model which essentially approximates a diffusion model by an ordinary differential equation model with an appropriately defined noise process. An explicit formula for the likelihood function can be derived allowing for computationally efficient parameter estimation. The proposed algorithm is embedded in a Bayesian framework and inference is performed using Markov chain Monte Carlo.
Conclusion
The major advantage of the method is that in contrast to the more established diffusion approximation based methods the computationally costly methods of data augmentation are not necessary. Our approach also allows for unobserved variables and measurement error. The application of the method to both simulated and experimental data shows that the proposed methodology provides a useful alternative to diffusion approximation based methods.
Keywords
- Posterior Distribution
- Transition Density
- Diffusion Approximation
- Stochastic Simulation Algorithm
- Ordinary Differential Equation Model
Background
The estimation of parameters in biokinetic models from experimental data is an important problem in Systems Biology. In general the aim is to calibrate the model so as to reproduce experimental results in the best possible way. The solution of this task plays a key role in interpreting experimental data in the context of dynamic mathematical models and hence in understanding the dynamics and control of complex intracellular chemical networks and the construction of synthetic regulatory circuits [1]. Among biochemical kinetic systems, the dynamics of gene expression and of gene regulatory networks are of particular interest. Recent developments of fluorescent microscopy allow us to quantify changes in protein concentration over time in single cells (e.g. [2, 3]) even with single molecule precision (see [4] for review). Therefore an abundance of data is becoming available to estimate parameters of mathematical models in many important cellular systems.
- 1.
The macroscopic rate equation (MRE) approach which describes the thermodynamic limit of the system with ordinary differential equations and does not take into account random fluctuations due to the stochasticity of the reactions.
- 2.
The diffusion approximation (DA) which provides stochastic differential equation (SDE) models where the stochastic perturbation is introduced by a state dependant Gaussian noise.
- 3.
The linear noise approximation (LNA) which can be seen as a combination because it incorporates the deterministic MRE as a model of the macroscopic system and the SDEs to approximatively describe the fluctuations around the deterministic state.
Statistical methods based on the MRE have been most widely studied [8, 13–15]. They require data based on large populations. The main advantages of this method are its conceptual simplicity and the existence of extensive theory for differential equations. However, single cells experiments and studies of noise in small regulatory networks created the need for statistical tools that are capable to extract information from fluctuations in molecular species. Few methods used CME to address this. Algorithm, proposed by [16], approximated the likelihood function, the other, suggested by [17] simulated it using Monte Carlo methods. Recently, also a method based on the exact likelihood [18] has been developed. Although, substantial progress has been made in numerical methods for solving CME, inference algorithms based on the CME are computationally intensive and difficult to apply to problems of realistic size and complexity [19]. Another group of methods is based on the DA [9, 20]. This uses likelihood approximation methods (e.g. [21]) that are computationally intensive and require sampling from high dimensional posterior distributions. Inference about the volatility process becomes difficult for low frequency data that are not directly measured at the molecular level [10, 20]. The aim of this study is to investigate the use of the LNA as a method for inference about kinetic parameters of stochastic biochemical systems. We find that the LNA approximation provides an explicit Gaussian likelihood for models with hidden variables and measurement error and is therefore simpler to use and computationally efficient. To account for prior information on parameters our methodology is embedded in the Bayesian paradigm. The paper is structured as follows: We first provide a description of the LNA based modeling approach and then formulate the relevant statistical framework. We then study its applicability in four examples, based on both simulated and experimental data, that clarify principles of the method. Additional file 1 contains details of mathematical and statistical modeling, particularly comparison of the proposed method with an algorithm based on the DA.
Methods
The chemical master equation (CME) is the primary tool to model the stochastic behaviour of a reacting chemical system. It describes the evolution of the joint probability distribution of the number of different molecular species in a spatially homogeneous, well stirred and thermally equilibrated chemical system [11].
where ϕ_{ i }= lim_{Ω→∞, X→∞}X_{ i }/Ω, φ = (ϕ_{1},..., ϕ_{ N })^{ T }and .
where W(t) denotes R dimensional Wiener process, and f_{ i }= f_{ i }(φ) (see Additional file 1 for derivation).
The rationale behind the expansion in terms of is that for constant average concentrations relative fluctuations will decrease with the inverse of the square root of volume [22]. Therefore the LNA is accurate when fluctuations are sufficiently small in relation to the mean (large Ω). Hence, the natural measure of adequacy of the LNA is the coefficient of variation i.e. ratio of the standard deviation to the mean (see Additional file 1). Validity of this approximation is also discussed in details in [22, 23]. In addition it can be shown that the process describing the deviation from the deterministic state converges weakly to the diffusion (3) as Ω → ∞ [24]. In order to use the LNA in a likelihood based inference method we need to derive transition densities of the process x.
Transition densities
The properties of the normal distribution allow us to construct a convenient inference framework that is reminiscent of the Kalman filtering methodology (see e.g. [27]).
Inference
It is rarely possible to observe the time evolution of all molecular components participating in the system of interest [28]. Therefore, we partition the process x_{ t }into those components y_{ t }that are observed and those z_{ t }that are unobserved.
Let , and denote the time-series that comprise the values of processes x, y and z, respectively, at times t_{0},..., t_{ n }. Here and throughout the paper we use the same letter to denote the stochastic process and its realization.
where the covariance matrix Σ = {Σ^{(i, j)}}_{i, j = 0,..., n}is a sub-matrix of such that and φ_{ y }is the vector consisting of the observed components of φ.
We use the standard Metropolis-Hastings (MH) algorithm [30] to sample from the posterior distribution in (13).
Results and Discussion
In order to study the use of the LNA method for inference we have selected four examples which are related to commonly used quantitative experimental techniques such as measurements based on reporter gene constructs and reporter assays based on Polymerase Chain Reaction (e.g. RT-PCR, Q-PCR). For expository reasons, all case studies consider a model of single gene expression.
Model of single gene expression
For the general case it is assumed that the transcription rate k_{ R }(t) is time-dependent, reflecting changes in the regulatory environment of the gene such as the availability of transcription factors or chromatin structure.
We will refer to the model in (16) and (17) as the simple model of single gene expression.
In order to test our method on a nonlinear system we will also consider the case of an autoregulated network where the transcription rate of the gene is a function of the concentration of the protein that the gene codes for and where the protein is a transcription factor that inhibits the production of its own mRNA. This is parameterized by a Hill function [31] where k_{ R }(t) now describes the maximum rate of transcription, H is a dissociation constant and n_{ H }is a Hill coefficient.
where . We refer to this model as the autoregulatory model of single gene expression. The two models constitute the basis of our inference studies below.
Inference from fluorescent reporter gene data for the simple model of single gene expression
This form of transcription corresponds to an experiment, where transcription increases for t ≤ b_{3} as a result of being induced by an environmental stimulus and for t > b_{3} decreases towards a baseline level b_{4}.
We assume that at time t_{0} (t_{0} <<b_{3}) the system is in a stationary state. Therefore, the initial condition of the MRE is a function of unknown parameters (ϕ_{ R }(t_{0}), ϕ_{ P }(t_{0})) = (b_{4}/γ_{ R }, b_{4}k_{ P }/γ_{ R }γ_{ P }).
using the standard MH algorithm. The distribution P( |Θ) is given by (12).
Inference results for (A) the simple model and (B) autoregulatory model of single gene expression
(A) Param. | Prior | Value | Estimate 1 | Estimate 2 |
---|---|---|---|---|
γ _{ R } | Γ (0.44,10^{-2}) | 0.44 | 0.43 (0.27-0.60) | 0.49 (0.38-0.61) |
γ _{ P } | Γ (0.52,10^{-2}) | 0.52 | 0.51 (0.35-0.67) | 0.49 (0.38-0.61) |
k _{ P } | Exp(100) | 10.00 | 21.09 (3.91-67.17) | 11.41 (7.64-16.00) |
λ | Exp(100) | 1.00 | 1.42 (0.60-2.57) | 1.08 (0.76-1.36) |
b _{0} | Exp(100) | 15.00 | 6.80 (0.97-18.43) | 12.78 (9.80-15.33) |
b _{1} | Exp(1) | 0.40 | 0.79 (0.05-3.02) | 0.29 (0.18-0.43) |
b _{2} | Exp(1) | 0.40 | 0.77 (0.08-2.79) | 0.77 (0.32-1.59) |
b _{3} | Exp(10) | 7.00 | 6.13 (4.41-7.85) | 7.25 (6.79-7.55) |
b _{4} | Exp(100) | 3.00 | 0.94 (0.11-2.88) | 2.87 (2.11-3.52) |
(B) Param. | Prior | Value | Estimate 1 | Estimate 2 |
γ _{ R } | Γ (0.44,10^{-2}) | 0.44 | 0.44 (0.27-0.60) | 0.42 (0.32-0.54) |
γ _{ P } | Γ (0.52,10^{-2}) | 0.52 | 0.49 (0.33-0.65) | 0.49 (0.36-0.61) |
k _{ P } | Exp(100) | 10.00 | 14.86 (3.18-47.97) | 9.35 (5.87-13.15) |
λ | Exp(100) | 1.00 | 1.26 (0.48-2.30) | 1.15 (0.81-1.50) |
b _{0} | Exp(100) | 15.00 | 5.99 (0.20-23.06) | 13.47 (9.24-17.13) |
b _{1} | Exp(1) | 0.40 | 0.59 (0.01-2.75) | 0.27 (0.14-0.53) |
b _{2} | Exp(1) | 0.40 | 0.92 (0.05-2.92) | 0.83 (0.21-3.52) |
b _{3} | Exp(10) | 7.00 | 6.53(0.74-14.69) | 7.27 (6.44-7.79) |
b _{4} | Exp(100) | 3.00 | 2.85 (0.35-7.19) | 2.64 (1.82-3.32) |
Inference for this model required sampling from the 9 dimensional posterior distribution (number of unknown parameters). If instead one used a diffusion approximation based method it would be necessary to sample from a posterior distribution of much higher dimension (see Additional file 1). In addition, incorporation of the measurement error is straightforward here, whereas for other methods it involves a substantial computational cost [20].
Inference from fluorescent reporter gene data for the model of single gene expression with autoregulation
The following example considers the autoregulatory system with only a small number of reacting molecules. Using SSA we generated artificial data from the single gene expression model with autoregulation. The protein time courses were then sampled every 15 minutes at 101 discrete points per trajectory (see Figure 1B). As before we assume that the mRNA time courses are not observed and that the protein data are of the form given in (20), i.e. proportional to the actual amount of protein with additive Gaussian measurement error. As in the previous case study we estimate parameters from two simulated data sets, a single trajectory and an ensemble of 20 independent trajectories. The inference results summarized in Table 1B show that despite the low number of mRNA (0-15 molecules, see Figure 2 in Additional file 1) and protein (10-250 molecules, see Figure B) all parameters can be estimated well with appropriate precision.
Inference for PCR based reporter data
Inference results for PCR based reporter assay simulated data
Parameter | Prior | Value | Estimate 1 | Estimate 2 |
---|---|---|---|---|
γ _{ R } | Exp(1) | 0.44 | 0.45 (0.35-0.60) | 0.46 (0.42-0.50) |
λ | Exp(100) | 1.00 | 1.07 (0.90-1.22) | 1.01 (0.95-1.05) |
b _{0} | Exp(100) | 15.00 | 13.13 (10.20-15.87) | 14.91 (13.86-15.77) |
b _{1} | Exp(1) | 0.40 | 0.29 (0.14-0.51) | 0.43 (0.32-0.54) |
b _{2} | Exp(1) | 0.40 | 0.32 (0.12-0.93) | 0.32 (0.21-0.43) |
b _{3} | Exp(10) | 7.00 | 7.05 (6.39-7.63) | 6.99 (6.76-7.15) |
b _{4} | Exp(100) | 3.00 | 2.97 (2.00-4.18) | 3.10 (2.76-3.43) |
μ _{0} | Exp(100) | 6.76 | 6.90 (5.79-7.69) | 6.55 (6.14-6.85) |
Exp(100) | 6.76 | 3.52 (0.01-8.99) | 7.59 (5.44-9.49) |
Estimation of gfp protein degradation rate from cycloheximide experiment
Conclusion
The aim of this paper is to suggest the LNA as a useful and novel approach to the inference of biochemical kinetics parameters. Its major advantage is that an explicit formula for the likelihood can be derived even for systems with unobserved variables and data with additional measurement error. In contrast to the more established diffusion approximation based methods [9, 20] the computationally costly methods of data augmentation to approximate transition densities and to integrate out unobserved model variables are not necessary. Furthermore, this method can also accommodate measurement error in a straightforward way.
The suggested procedure here is implemented in a Bayesian framework using MCMC simulation to generate posterior distributions. The LNA has previously been studied in the context of approximating Poisson birth and death processes [22–24, 35] and it was shown that for a large class of models the LNA provides an excellent approximation. Furthermore, in [35] it is shown that for the systems with linear reaction rates the first two moments of the transition densities resulting from the CME and the LNA are equal. Here we propose using the LNA directly for inference and provide evidence that the resulting method can give very good results even if the number of reacting molecules is very small. In our previous study [10] we have presented differences between fitting deterministic and stochastic models, where we used diffusion approximation based method. Our experience from that work and from study [20] is that implementation of diffusion approximation based methods is challenging especially for data that are sparsely sampled in time because the need for imputation of unobserved time points leads to a very high dimensionality of the posterior distribution. This usually results in highly autocorrelated traces affecting the speed of convergence of the Markov chain. Our method considerably reduces the dimension of the posterior distribution to the number of unknown parameters of a model only and is independent of the number of unobserved components (see Additional file 1). Nevertheless it can only be applied to the systems with sufficiently large volume, where fluctuations around a deterministic state are relatively close to the mean.
Declarations
Acknowledgements
This research was funded by BBSRC SABR grant BB/F005814/1 and EU BIOSIM Network Contract 005137. DAR is funded by EPSRC Senior Research Fellowship EP/C544587/1 and MK by studentship, Dept of Statistics, University of Warwick. CVH was funded by Wellcome Trust Programme Grant (067252, to JRED and MRHW) and now is recipient of The Prof. John Glover Memorial Postdoctoral Fellowship.
Authors’ Affiliations
References
- Ehrenberg M, Elf J, Aurell E, Sandberg R, Tegner J: Systems Biology Is Taking Off. Genome Res 2003, 13(11):2377–2380. 10.1101/gr.1763203View ArticlePubMedGoogle Scholar
- Elowitz MB, Levine AJ, Siggia ED, Swain PS: Stochastic Gene Expression in a Single Cell. Science 2002, 297(5584):1183–1186. 10.1126/science.1070919View ArticlePubMedGoogle Scholar
- Nelson DE, Ihekwaba AEC, Elliott M, Johnson JR, Gibney CA, et al.: Oscillations in NF-kappaB Signaling Control the Dynamics of Gene Expression. Science 2004, 306(5696):704–708. 10.1126/science.1099962View ArticlePubMedGoogle Scholar
- Xie SX, Choi PJ, Li GW, Lee NK, Lia G: Single-Molecule Approach to Molecular Biology in Living Bacterial Cells. Annual Review of Biophysics 2008, 37: 417–444. 10.1146/annurev.biophys.37.092607.174640View ArticlePubMedGoogle Scholar
- Raser JM, O'Shea EK: Noise in Gene Expression: Origins, Consequences, and Control. Science 2005, 309(5743):2010–2013. 10.1126/science.1105891PubMed CentralView ArticlePubMedGoogle Scholar
- Keizer J: Statistical Thermodynamics of Nonequilibrium Processes. Springer, New York; 1987.View ArticleGoogle Scholar
- Guptasarma P: Does replication-induced transcription regulate synthesis of the myriad low copy number proteins of Escherichia coli? Bioessays 1995, 17(11):987–97. 10.1002/bies.950171112View ArticlePubMedGoogle Scholar
- Moles CG, Mendes P, Banga JR: Parameter Estimation in Biochemical Pathways: A Comparison of Global Optimization Methods. Genome Res 2003, 13(11):2467–2474. 10.1101/gr.1262503PubMed CentralView ArticlePubMedGoogle Scholar
- Golightly A, Wilkinson DJ: Bayesian Inference for Stochastic Kinetic Models Using a Diffusion Approximation. Biometrics 2005, 61(3):781–788. 10.1111/j.1541-0420.2005.00345.xView ArticlePubMedGoogle Scholar
- Finkenstadt B, Heron E, Komorowski M, Edwards K, Tang S, Harper C, Davis J, White M, Millar A, Rand D: Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics 2008, 24(24):2901. 10.1093/bioinformatics/btn562PubMed CentralView ArticlePubMedGoogle Scholar
- Gillespie DT: A Rigorous Derivation of the Chemical Master Equation. Physica A 1992, 188(1–3):404–425. 10.1016/0378-4371(92)90283-VView ArticleGoogle Scholar
- Van Kampen N: Stochastic Processes in Physics and Chemistry. North Holland. 2006.Google Scholar
- Mendes P, Kell D: Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 1998, 14(10):869–883. 10.1093/bioinformatics/14.10.869View ArticlePubMedGoogle Scholar
- Ramsay JO, Hooker G, Campbell D, Cao J: Parameter estimation for differential equations: a generalized smoothing approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2007, 69(5):741–796. 10.1111/j.1467-9868.2007.00610.xView ArticleGoogle Scholar
- Esposito W, Floudas C: Global Optimization for the Parameter Estimation of Differential-Algebraic Systems. Industrial and Engineering Chemistry Research 2000, 39(5):1291–1310. 10.1021/ie990486wView ArticleGoogle Scholar
- Reinker S, Altman R, Timmer J: Parameter estimation in stochastic biochemical reactions. Systems Biology, IEE Proceedings 2006, 153(4):168–178. 10.1049/ip-syb:20050105View ArticleGoogle Scholar
- Tian T, Xu S, Gao J, Burrage K: Simulated maximum likelihood method for estimating kinetic rates in gene expression. Bioinformatics 2007, 23: 84. 10.1093/bioinformatics/btl552View ArticlePubMedGoogle Scholar
- Boys R, Wilkinson D, Kirkwood T: Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing 2008, 18(2):125–135. 10.1007/s11222-007-9043-xView ArticleGoogle Scholar
- Wilkinson D: Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics 2009, 10(2):122–133. 10.1038/nrg2509View ArticlePubMedGoogle Scholar
- Heron EA, Finkenstadt B, Rand DA: Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study. Bioinformatics 2007, 23(19):2596–2603. 10.1093/bioinformatics/btm367View ArticlePubMedGoogle Scholar
- Elerian O, Chib S, Shephard N: Likelihood Inference for Discretely Observed Nonlinear Diffusions. Econometrica 2001, 69(4):959–993. 10.1111/1468-0262.00226View ArticleGoogle Scholar
- Elf J, Ehrenberg M: Fast Evaluation of Fluctuations in Biochemical Networks With the Linear Noise Approximation. Genome Res 2003, 13(11):2475–2484. 10.1101/gr.1196503PubMed CentralView ArticlePubMedGoogle Scholar
- Lars F, Per L, Andreas H: A Hierarchy of Approximations of the Master Equation Scaled by a Size Parameter. Journal of Scientific Computing 2007, 34(2):127–151.Google Scholar
- Kurtz TG: The Relationship between Stochastic and Deterministic Models for Chemical Reactions. The Journal of Chemical Physics 1972, 57(7):2976–2978. 10.1063/1.1678692View ArticleGoogle Scholar
- Arnold L: Stochastic differential equations: theory and applications. Wiley-Interscience; 1974.Google Scholar
- Oksendal B: Stochastic differential equations: an introduction with applications. 3rd edition. Springer; 1992.View ArticleGoogle Scholar
- Brockwell P, Davis R: Introduction to time series and forecasting. Springer New York; 2002.View ArticleGoogle Scholar
- Ronen M, Rosenberg R, Shraiman BI, Alon U: Assigning numbers to the arrows: Parameterizing a gene regulation network by using accurate expression kinetics. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(16):10555–10560. 10.1073/pnas.152046799PubMed CentralView ArticlePubMedGoogle Scholar
- Wu JQ, Pollard TD: Counting Cytokinesis Proteins Globally and Locally in Fission Yeast. Science 2005, 310(5746):310–314. 10.1126/science.1113230View ArticlePubMedGoogle Scholar
- Gamerman D, Lopes HF: Markov Chain Monte Carlo Stochastic Simulation for Bayesian Inference. 2nd edition. Chapman & Hall/CRC; 2006.Google Scholar
- Thattai M, van Oudenaarden A: Intrinsic noise in gene regulatory networks. Proceedings of the National Academy of Sciences 2001. 151588598 151588598Google Scholar
- Chabot JR, Pedraza JM, Luitel P, van Oudenaarden A: Stochastic gene expression out-of-steady-state in the cyanobacterial circadian clock. Nature 2007, 450: 1249–1252. 10.1038/nature06395View ArticlePubMedGoogle Scholar
- Komorowski M, Miekisz J, Kierzek A: Translational Repression Contributes Greater Noise to Gene Expression than Transcriptional Repression. Biophysical Journal 2009., 96(2): 10.1016/j.bpj.2008.09.052Google Scholar
- Gillespie DT: Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 1977, 81(25):2340–2361. 10.1021/j100540a008View ArticleGoogle Scholar
- Ryota T, Hidenori K, J KT, Kazuyuki A: Multivariate analysis of noise in genetic regulatory networks. Journal of Theoretical Biology 2004, 229(4):501–521. 10.1016/j.jtbi.2004.04.034View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.