Approximate Bayesian computation schemes for parameter inference of discrete stochastic models using simulated likelihood density

Background Mathematical modeling is an important tool in systems biology to study the dynamic property of complex biological systems. However, one of the major challenges in systems biology is how to infer unknown parameters in mathematical models based on the experimental data sets, in particular, when the data are sparse and the regulatory network is stochastic. Results To address this issue, this work proposed a new algorithm to estimate parameters in stochastic models using simulated likelihood density in the framework of approximate Bayesian computation. Two stochastic models were used to demonstrate the efficiency and effectiveness of the proposed method. In addition, we designed another algorithm based on a novel objective function to measure the accuracy of stochastic simulations. Conclusions Simulation results suggest that the usage of simulated likelihood density improves the accuracy of estimates substantially. When the error is measured at each observation time point individually, the estimated parameters have better accuracy than those obtained by a published method in which the error is measured using simulations over the entire observation time period.


Background
In recent years, quantitative methods have become increasingly important for studying complex biological systems. To build a mathematical model of a complex system, two main procedures are commonly conducted [1]. The first step is to determine the elements of the network and regulatory relationships between the elements. In the second step, we need to infer the model parameters according to experimental data. Since biological experiments are time-consuming and expensive, normally experimental data are often scarce and incomplete compared with the number of unknown model parameters. In addition, the likelihood surfaces of large models are complex. The calibration of these unknown parameters within a model structure is one of the key issues in systems biology [2]. The analysis of such dynamical systems therefore requires new, effective and sophisticated inference methods.
During the last decade, several approaches have been developed for estimating unknown parameters: namely, optimization methods and Bayesian inference methods. Aiming at minimizing an objective function, optimization methods start with an initial guess, and then search in a directed manner within the parameter space [3,4]. The objective function is usually defined by the discrepancy between the simulated outputs of the model and sets of experimental data. Recently, the objective function has been extended to a continuous approach by considering simulation over the whole time period [5] and a multi-scale approach by including multiple types of experimental information [6]. Several types of optimization methods can be found in the literature, among which two major types are called gradient-based optimization methods and evolutionary-based optimization methods. Based on these two basic approaches, various techniques such as simulated annealing [7]. linear and non-linear least-squares fitting [8], genetic algorithms [9] and evolutionary computation [10,11] have been attempted to build computational biology models. Using optimization methods, the inferred set of parameters produces the best fit between simulations and experimental data [12,13]. which have been successfully applied for biological systems, however, there are still some limitations with these methods such as the problem of high computational cost when significant noise exists in the system. To address these issues, deterministic and stochastic global optimization methods have been explored [14].
When modeling biological systems where molecular species are present in low copy numbers, measurement noise and intrinsic noise play a substantial role [15], which is a major obstacle for modeling. Bayesian inference methods have been used to tackle such difficulties by extracting useful information from noise data [16]. The main advantage of Bayesian inference is that it is able to infer the whole probability distributions of parameters by updating probability estimates using Bayes' Rule, rather than just a point estimate from optimization methods. Also. Bayesian methods are more robust than using other methods when they are applied to estimate stochastic systems, which is not that obvious for modeling of deterministic systems [17]. Developments have taken place during the last 20 years and recent advances in Bayesian computation including Markov chain Monte Carlo (MCMC) techniques and sequential Monte Carlo (SMC) methods have been successfully applied to biological systems [18,19].
For the case of parameter estimation when likelihoods are analytically or computationally intractable, approximate Bayesian computation (ABC) methods have been applied successfully [20,21]. ABC algorithms provide stable parameter estimates and are also relatively computationally efficient, therefore, they have been treated as substantial techniques for solving inference problems of various types of models that were intractable only a few years ago [19]. In ABC. the evaluation of the likelihood is replaced by a simulation-based procedure using the comparison between the observed data and simulated data [22]. Recently, a semi-automatic method has been proposed to construct the summary statistics for ABC [23]. These methods have been applied in a diverse range of fields such as molecular genetics, epidemiology and evolutionary biology etc. [24][25][26].
Despite substantial progress in the application of ABC to deterministic models, the development of inference methods for stochastic models is still at the very early stage. Compared with deterministic models, there are a number of open problems in the inference of stochastic models. For example, recent work proposed ABC to infer unknown parameters in stochastic differential equation models [27]. Our recent computational tests [28] showed the advantages and disadvantages of a published ABC algorithm for stochastic chemical reaction systems in [17]. In this work, we propose two novel algorithms to improve the performance of ABC algorithms using the simulated likelihood density.

Results and discussion
The first test system with four reactions We first examine the accuracy of our proposed methods using a simple model of four chemical reactions [29]. The first reaction is the decay of molecule S 1 . Then two molecules S 1 form a dimer S 2 in the second reaction; and this dimerization process is reversible, which is represented by the third reaction. The last reaction in the system is a conversion reaction from molecule S 2 to its product S 3 . All these four reactions are given by We start with an initial condition with S = (10000, 0,0) and rate constants of c = (0.1,0.002,0.5,0.04), which is termed as the exact rate constants in this test. The stochastic simulation algorithm (SSA) was used to simulate the stochastic system [30]. A single trajectory for this model during a period of T = 30 in a step size of Δt = 3 is presented in Figure 1.
When applying the algorithms in the Method section to estimate model parameters, we assumed the prior distribution for each estimated parameter follows a uniform distribution π(θ)~U(0, A). For rate constants c 1~c4 , the values of A are (0.5, 0.005, 1, 0.1). Figure 2 shows probabilistic distributions of the estimated rate constant of c 1 over iterations (2~5). In this test, we have the step size Δt = 3 and simulation number B k = 10. Figure 2 suggests that the probabilistic distribution starts from nearly a uniform distribution in the second iteration ( Figure 2A) and gradually converges to a normalized-like distribution with a mean value that is close to the exact rate constant.
There are two tolerance values in the proposed algorithms, namely a for the discrepancy in step 2.c and ∈ k for the fitness error in step 2.d. In the following tests, we considered two strategies: the value of a is a constant [31] or its value varies over iterations. To examine the factors that influence the convergence rate of particles over iterations, we calculated the mean count number for each iteration, which is the averaged number of counts for accepting all simulated estimation of parameter sets. The averaged error is defined by the sum of relative errors of each rate constant for each iteration. Table 1 displays the performances of the tests under three schemes which used fixed discrepancy tolerance a = 0.1, 0.05 or varying values of a. In each case, we used the same values of ∈ k for the fitness tolerance. The value of a in the varying a strategy equals the value of ∈ k , namely a k = ∈ k .
In these performances, we used ∈ k = (0.07, 0.06, 0.055, 0.05, 0.045) and (0.05, 0.045, 0.04, 0.035, 0.03) for algorithm 1 with step sizes Δt = 3 and 5, respectively. For algorithm 2, these values are ∈ k = (0.095, 0.08, 065, 0.05, 0.04) and (0.059, 0.055, 0.05, 0.045, 0.04). An interesting observation is that the values of mean count number are very large in the first iteration, then decrease sharply and stay within a value stably. We have a detailed test of using different values of the fitness tolerance ∈ k and found that when using step size of Δt = 3, mean count number stays at one if ∈ k ≥ 0.1; but it starts to increase sharply to a large number if ∈ k < 0.1. The observation numbers using a step size of Δt = 3 is 10 and the maximum error that can incur calculated from step 2.d is 0.1 with one hundred particles. Similarly, this critical ∈ k value is 0.06 for a step size of Δt = 5.
Meanwhile all averaged errors have a decreasing trend over iterations. Looking at different cases with various values of discrepancy tolerance a , it is also observed that using a = 0.1 results in more discrepancies of the estimated parameters on average than the other two cases, in particular, than the case a = 0.05. Thus in our following tests, we just concentrate on the cases of a = 0.05 and varying a. In addition, we observe that by taking a = 0.05 for the case with step size of Δt = 3, it leads to more accurate approximation since a = 0. 05 is less than most values of a in the case of varying values of α. It is consistent with the cases of a step size of Δt = 5 in which little differences can be found comparing strategies using a = 0.05 and a = ∈ k since the values of ∈ k are quite close to 0.05. In the case of varying values of α, a small value of ε 5 leads to a small value of a 5 , which results in a substantial increase in mean count number. However, this large mean count number does not necessary bring more accurate estimated parameters. With these findings, we simulated results using a = 0.05 and a = ∈ only for algorithm 2. Consistent results are obtained using algorithm 2. Moreover, results obtained using algorithm 2 is more accurate than those from algorithm 1.

The second test system with eight reactions
Although numerical results of the first test system are promising regarding the accuracy, that system has only four reactions. Thus the second test system, namely a prokaryotic auto-regulatory gene network, includes more reactions. This network involves both transcriptional and translational processes of a particular gene. In addition, dimers of the protein suppress its own gene transcription by binding to a regulatory region upstream of the gene [32][33][34]. This gene regulatory network consists of eight chemical reactions which are given below: This gene network includes five species, namely DNA, messenger RNA, protein product, dimeric protein, and the compound formed by dimeric protein binding to the DNA promoter site, which are denoted by DNA, mRNA, P, P 2 and DNA · P 2 , respectively. In this network, the first two reactions R 1 and R 2 are reversible reactions for dimeric protein binding to the DNA promoter site. Reactions R 3 and R 7 are transcriptional and translation processes for producing mRNA and protein, respectively. Reactions R 5 and R 6 represent the interchange between protein P and dimeric protein P 2. The system ends up with a degradation process of protein P [32].
In addition, the following reaction rate constants are used as the exact rate constants to generate a simulation for each molecular species during a period of T = 50 in a step size of Δt = 1 and results are presented by Figure 3. This simulated dataset is used as observation data for inferring the rate constants. The prior distribution of each parameter follows a uniform distribution π(θ)~U(0, B). For rate constants c 1~c8 , the values of B are (0.5,2,1,0.1,0.5, 5,1,0.1). The proposed two algorithms were implemented over five iterations and each iteration contains 100 particles. We choose step sizes Δt = 2 or 5 and the number of stochastic simulation B k = 10. Figure 4 gives the probabilistic distribution of the estimated rate constant c 7 over 2nd~5th iterations. The distribution of the first iteration is close to the uniform distribution, and this is not presented. Since the second iteration, the estimated rate constant begins to accumulate around the exact value c 7 = 0.2. At the last iteration, the probability in Figure 4D shows a normalized-like distribution. Compared with the results of system 1 in Figure 2, the convergence rate of the parameter distribution of system 2 is slower. Our numerical results suggested that this convergence rate depends on the strategy of choosing the values of discrepancy tolerance a.
To analyze the factors that influence the convergence property of estimates, the mean count number as well as the averaged error for each iteration k are obtained. Results are presented in Table 2. Using algorithm 1 and 2, we tested for step sizes of Δt = 2 and Δt = 5. Since the errors of estimates obtained using a fixed value of a = 0.1 are always larger than those obtained by a = 0.05, we only tested with the cases of a fixed value a = 0.05 and varying values of a. For algorithm 1, we tested two cases for the varying values of discrepancy tolerance a. In the first test, the values are ∈ k = (0.21,0.2,0.19,0.18,0.175) and a = ∈ k for varying values of a, which is the case "Same ∈ k " in Table 2. The values of ∈ k are also applied to the case of a fixed value a = 0.05. In this case, the averaged count number of varying a is much smaller than that of a fixed value of α. Thus we further decreased the value of a to (0.15,0.125,0.1,0.075,0.07), which is the case "Diff. ∈ k " in Table 2. In this case, the mean count numbers are similar to those using a fixed a. Numerical results suggested that the strategy of using a fixed value of a generates estimates with better accuracy than the strategies of using varying a values, even when the computing time of the varying a strategy is larger than that of the fixed a strategy.
For algorithm 2, we carried out similar tests. In the first case, we set ∈ k =(0.24,0.23,0.22,0.21,0.2), which is applied to the strategy of fixing a = 0.05 and varying a with a= ∈ k that is the case "Same ∈ k " in Table 2. Again, the averaged count numbers of varying α strategy are much smaller than those using a fixed a. Thus we decreased the value to (0.095,0.09,0.085,0.08, 0.075), which is the case "Diff. ∈ k " in Table 2; However, the averaged count numbers in the "Diff. ∈ k " case are similar to those of the previous two strategies, namely a fixed a and "Same ∈ k ". For algorithm 2, Table 2 suggests that the varying a strategy generates estimates that are more accurate than those obtained from the fixed a strategy. However, the best estimates in Table 2 are obtained using algorithm 1 and fixed a strategy.

Conclusions
To uncover the information of biological systems, we proposed two algorithms for the inference of unknown parameters in complex stochastic models for chemical reaction systems. Algorithm 1 is in the framework of ABC SMC and uses transitional density based on the simulations over two consecutive observation time points. Algorithm 2 generates simulations of the whole time interval but differs from the published method in the error finding steps by comparing errors of simulated data to experimental data at each time point. The proposed new algorithms impose stricter criteria to measure the simulation error. Using two chemical reaction systems as the test problems, we examined the accuracy and efficiency of proposed new algorithms. Based on the results of two algorithms for system 1, we discovered that taking smaller values of discrepancy tolerance a will result in more accurate estimates of unknown model parameters. This conclusion is confirmed by the second system that we tested under different conditions. Numerical results suggested that the proposed new algorithms are promising methods to infer parameters in high-dimensional and complex biological system models and have better accuracy compared with the results of the published method [28]. The encouraging result is that new algorithms do not need more computing time to achieve such accuracy. Our computational tests showed that the selection for the value of fitness tolerance is a key step in the success of ABC algorithms. The advantage of the population Monte-Carlo methods is the ability to reduce the fitness tolerance gradually over populations. Generally, a smaller value of fitness tolerance will lead to a larger number of iteration count and consequently larger computing time. For deterministic inference problems, a smaller value of fitness tolerance normally will generate estimates with better accuracy. However, for stochastic models, this conclusion is not always true. In addition to the fitness tolerance, our numerical results suggested that other factors, such as the simulation algorithm for chemical reaction systems and the strategy of discrepancy tolerance, also have influences on the accuracy of estimates. Thus more skilled approaches, such as the adaptive selection process for the fitness tolerance, should be considered to improve the performance of ABC algorithms.
In this work, we used the SSA to simulate chemical reaction systems [30]. This approach may be appropriate when the biological system is not large. In fact, for the two biological systems discussed in this work, the computing time of inference is still very large. To reduce the computing time, more effective methods should be used to simulate the biological systems, such as the τ-leap methods [35] and multi-scale simulation methods [36,37]. Another alternative approach is to use parallel computing to reduce the heavy computing loads. All these issues are potential topics for future research work.

ABC SMC algorithm
ABC algorithms bypass the requirement for evaluating likelihood functions directly in order to obtain the posterior distributions of unknown parameters. Instead, ABC methods simulate the model with given parameters, compare the observed and simulated data, and then accept or reject the particular parameters based on the error of simulation data. Thus there are three key steps in the implementations of ABC algorithms. The first step is the generation of a sample of parameters θ* from the prior distribution of parameters or from other distributions that are determined in ABC algorithms. The second step is to define distance function d(X, Y) between the simulated data X and experimental observation data Y. Finally, a tolerance value is needed as a selection criterion to accept or reject the sampled parameter θ*. Based on the generic form of ABC algorithm [17], a number of methods have been developed including ABC rejection sampler and ABC MCMC [38,39]. The ABC rejection algorithm is one of the basic ABC algorithm that may result in long computing time when a badly prior distribution that is far away from posterior distribution is chosen. ABC MCMC introduces a concept of acceptance probability during the decision making step which saves computing time. However, this may result in getting stuck in the regions of low probability for the chain and we may never be able to get a good approximation. To tackle these challenges, the idea of particle filtering has Three strategies are used to choose the discrepancy tolerance a: a fixed value of a= 0.05; varying avalues; and a= ∈ k denoted as same ∈ k ); varying avalues that are smaller than ∈ k (denoted as diff. ∈ k ).(AE:Averaged Error; MN: Mean count Number).
been introduced. Instead of having one parameter vector at a time, we sample from a pool of parameter sets simultaneously and treat each parameter vector as a particle. The algorithm starts from sampling a pool of N particles for parameter vector θ through prior distribution π(θ).
The sampled particle candidates (θ * 1 , · · · , θ * N ) will be chosen randomly from the pool and we will assign each particle a corresponding weight ω to be considered as the sampling probability. A perturbation and filtering process following through a transition kernel q(·|θ*) finds the particles θ**. Similarly with θ**, data Y can be simulated and compared with experimental data X to further fulfil the requirements for estimating posterior distribution.
3. If k = 1, sample θ * from the proposed prior distribution π(θ). Generate a candidate data set D( b )(θ*) B k times and calculate the value of b k (θ*), where D( b )~p (D|θ) for any fixed parameter θ, and D 0 is the experimental data set. If k > 1, sample θ from the previous population θ i k−1 with weights w k−1 and perturb the particle to obtain θ * using a kernel function K k . If π(θ*) = 0 or b k (θ*) = 0, return to the beginning of step 3.
4. Set θ i k = θ * and determine the weight for each estimated particles θ i k , If i <N, update i = i + 1 and return to step 3. 5. Normalize the weights w (i) k If k <K, update k = k +1 and go back to step 2.
A number of algorithms have been developed using the particle filtering technique, such as the partial rejection control, population Monte-Carlo and SMC. Each of them differs in the formation of weight w and the transition kernels.
ABC using simulated likelihood density ABC SMC method uses the simulation over the entire time period to measure the fitness to experimental data, which is consistent to the approaches used for deterministic models [17]. For stochastic models, the widely used approach is treating transitional density as the likelihood function [40,41]. Based on a sequence of n + 1 observations X = [X 0 , X 1; · · ·, X n ] at time points [t 0 , t 1; · · ·, t n ], for a given parameter set θ the joint transitional density is defined as (2) where f 0 [·] is the density of initial state, and is the transitional density starting from (t i−1 , X i −1 ) and evolving to (t i , X i ). When the process X is Markov, the density (3) is simplified as In the simulated likelihood density (SLD) methods, this transitional density is approximated by that obtained from a large number of simulations.
Based on the discrete nature of biochemical reactions with low molecular numbers, it was proposed to use the frequency distribution of simulated molecular numbers to calculate the transitional density [31]. The frequency distribution is evaluated by using B k simulations with the simulated state X ml . Here the function δ(x) is defined by where d(x, y) is a distance measure between x and y.
Here we propose a new algorithm that uses the simulated transitional density function as the objective function. Unlike ABC SMC algorithm [17], the new method considers the transitional density function from t i −1 to t i only at each step. Based on the framework of ABC SMC, the new algorithm using transitional density is proposed as follows.
(c) For m = 1, ·· ·, B k , calculate the value of discrepancy and test for where a is a defined constant.
If it is true, let b ml (θ*) = 0, otherwise it is one. Then determine If ∈ <∈ k , update θ k i = θ * and move to the next particle i = i +1. An alternative approach is to generate simulations over the observation time period but compare the error to experimental data at each time point. The approach locates somewhere between ABC SMC algorithm [17] and the proposed Algorithm 1. which is presented below. For simplicity we do not give a detailed algorithm, but just provide the key steps 2.b)~2.d) that are different from those in Algorithm 1.
ABC SLD algorithm 2 2.b) Generate data Y B k times using θ*.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions TT conceived and designed the study. QW and TT developed algorithms and carried out research. QW, KS and TT analyzed the data, interpreted the results and wrote the paper. All authors edited and approved the final version of the manuscript.