A comparison of approximation techniques for variance-based sensitivity analysis of biochemical reaction systems

Background Sensitivity analysis is an indispensable tool for the analysis of complex systems. In a recent paper, we have introduced a thermodynamically consistent variance-based sensitivity analysis approach for studying the robustness and fragility properties of biochemical reaction systems under uncertainty in the standard chemical potentials of the activated complexes of the reactions and the standard chemical potentials of the molecular species. In that approach, key sensitivity indices were estimated by Monte Carlo sampling, which is computationally very demanding and impractical for large biochemical reaction systems. Computationally efficient algorithms are needed to make variance-based sensitivity analysis applicable to realistic cellular networks, modeled by biochemical reaction systems that consist of a large number of reactions and molecular species. Results We present four techniques, derivative approximation (DA), polynomial approximation (PA), Gauss-Hermite integration (GHI), and orthonormal Hermite approximation (OHA), for analytically approximating the variance-based sensitivity indices associated with a biochemical reaction system. By using a well-known model of the mitogen-activated protein kinase signaling cascade as a case study, we numerically compare the approximation quality of these techniques against traditional Monte Carlo sampling. Our results indicate that, although DA is computationally the most attractive technique, special care should be exercised when using it for sensitivity analysis, since it may only be accurate at low levels of uncertainty. On the other hand, PA, GHI, and OHA are computationally more demanding than DA but can work well at high levels of uncertainty. GHI results in a slightly better accuracy than PA, but it is more difficult to implement. OHA produces the most accurate approximation results and can be implemented in a straightforward manner. It turns out that the computational cost of the four approximation techniques considered in this paper is orders of magnitude smaller than traditional Monte Carlo estimation. Software, coded in MATLAB®, which implements all sensitivity analysis techniques discussed in this paper, is available free of charge. Conclusions Estimating variance-based sensitivity indices of a large biochemical reaction system is a computationally challenging task that can only be addressed via approximations. Among the methods presented in this paper, a technique based on orthonormal Hermite polynomials seems to be an acceptable candidate for the job, producing very good approximation results for a wide range of uncertainty levels in a fraction of the time required by traditional Monte Carlo sampling.


Background
Sensitivity analysis is an indispensable tool for the analysis of complex systems [1,2]. It is routinely used to investigate how uncertainty in input variables affects uncertainty in system response and to quantify the rela-tive importance of the input variables in influencing the response. In addition to many other areas of science and engineering, sensitivity analysis is used in systems biology to investigate the robustness and fragility properties of cellular systems, such as signaling, gene regulation, and metabolic networks [3][4][5][6][7][8][9][10][11], as well as in systems pharmacology [12], for designing novel pharmacological intervention strategies and for understanding drug action [13,14].
To study the sensitivity properties of a biochemical reaction system, such as a signaling network, we must construct a mathematical model that relates uncertainty in key biochemical factors of interest to a biologically relevant system response, and develop techniques for determining how factor uncertainty affects the system response. Since biochemical reaction systems are subject to physical laws, an important requirement is that sensitivity analysis must satisfy important thermodynamic constraints, such as the principle of detailed balance [15]. Bearing these in mind, we have proposed in [16] a probabilistic sensitivity analysis approach for biochemical reaction systems that uses the standard chemical potentials of the activated complexes of the underlying reactions and molecular species as the biochemical factors of interest and propagates factor uncertainty to a given system response in a thermodynamically consistent manner. Moreover, we have adopted a formal statistical approach to sensitivity analysis, known as variance-based sensitivity analysis [2,[17][18][19], which uses a set of indices to quantify the contribution of individual biochemical factors to the variance of the system response.
Unfortunately, it is not in general possible to analytically evaluate variance-based sensitivity indices. As a consequence, these indices are estimated by Monte Carlo sampling [2,16,18,20], which requires evaluation of the system response at each sample. A major drawback of this approach is its slow rate of convergence. As a matter of fact, the error produced by a naive Monte Carlo estimation approach decreases with an error rate of O (1/ ), where L is the number of Monte Carlo samples used [21]. Hence, accurate estimation of the sensitivity indices requires a large number of Monte Carlo samples and, therefore, a large number of system response evaluations. This makes Monte Carlo estimation of variance-based sensitivity indices computationally very expensive, especially in the case of biochemical reaction systems comprised of a large number of reactions and molecular species.
To reduce the computational burden of Monte Carlo estimation, it is imperative that we develop techniques which can produce sufficiently accurate estimates of the sensitivity indices in a fraction of the time required by Monte Carlo sampling. In this paper, we present four such techniques and apply them to a well-known biochemical reaction model of the mitogen-activated protein kinase (MAPK) signalling cascade. The first technique is based on a second-order Taylor series expansion of the response function and is an extension of the first-order derivative-based approach for variance-based sensitivity analysis discussed in [2,18,19,22] by including secondorder derivative terms. The other approximation techniques are based on the high-dimensional model repre-sentation (HDMR) schemes developed by H. Rabitz and his coworkers [23][24][25]. We use analytical derivations, provided in the Additional file 1 accompanying this paper, and sensitivity analysis results generated by the four methods, to clarify the relative merits of each approximation technique and produce useful insights on when these techniques can be used for sensitivity analysis of biochemical reaction systems. We have coded the sensitivity analysis techniques discussed in this paper using MAT-LAB ® . Interested readers can request a copy of the software, and the entire set of data obtained with this software, by contacting the corresponding author.
We should mention here that, in systems biology, the most commonly used sensitivity analysis techniques are based on derivatives of molecular concentrations or other system responses, known as control coefficients [3]. These differential methods are based on a Taylor series approximation of the response function and, as such, are subject to several drawbacks that must be carefully considered before applying them to problems of systems biology. For example, derivative-based sensitivity indices assess the sensitivity properties of a biochemical reaction system around a set of reference input values. Their performance usually depends on the particular choice of these values, due to the nonlinear nature of the response function. For the results to be relevant, the reference values must be the true values, which are usually not known in practice. As a consequence, derivative-based sensitivity analysis techniques are limited by the quality of the underlying Taylor series approximation. Moreover, and due to our difficulty in accurately evaluating high-order derivatives, differential sensitivity analysis techniques are usually limited in practice to assessing the effect of one input factor on the system response, by keeping all other factors fixed to their reference values. This is usually not adequate, since we are most often interested in the effects of multiple biochemical factors on the system response. Finally, traditional differential analysis cannot cope with probabilistic uncertainty in biochemical factor values, unless it is combined with variance-based sensitivity analysis (as it is done by the first approximation technique considered in this paper). It turns out that variance-based sensitivity analysis does not depend on the additivity or linearity of the system model and can be naturally used to quantify the simultaneous effect of probabilistic biochemical factor uncertainty on the system response [2,18]. For this reason, it provides a very attractive and powerful approach for sensitivity analysis of biochemical reaction systems.
We should finally mention that a number of alternative approximation techniques for variance-based sensitivity analysis have been proposed in the literature [26][27][28][29]. In these techniques, the original response function is approximated by a surrogate function and the sensitivity L indices are then estimated by Monte Carlo sampling based on that function. Reduction in computations is achieved by the fact that the time required for computing the system response at each Monte Carlo iteration using the surrogate function is much smaller than computing the response using the original function (whose evaluation requires solving a system of ordinary differential equations). However, the computations associated with these techniques are still substantial, since they must employ a large number of samples to sufficiently reduce the Monte Carlo estimation error. By contrast, the techniques discussed in this paper are based on surrogate functions that lead to analytical formulas for the sensitivity indices, thus avoiding Monte Carlo estimation. As a matter of fact, the computational cost for calculating the variance-based sensitivity indices using the techniques discussed in this paper is mainly associated with the problem of estimating the underlying parameters of the surrogate function used, which leads to appreciable computational savings over the techniques proposed in [26][27][28][29].

Biochemical reaction systems
In this paper, we consider a well-stirred (homogeneous) biochemical reaction system at constant temperature and volume that consists of M coupled reactions of the form: where κ 2m-1 , κ 2m ≥ 0 are the normalized rate constants of the forward and reverse reactions (measured in s -1 ) and ν nm , ≥ 0 are the stoichiometry coefficients of the reactants and products. We assume that the system consists of N molecular species X 1 , X 2 , ..., X N , with concentrations (measured in molecules/cell) at time t ≥ 0 given by q 1 (t), q 2 (t), ..., q N (t), respectively. We characterize the dynamic evolution of molecular concentrations by the following chemical kinetic equations: where is the stoichiometry coefficient of the n th molecular species associated with the m th reaction and is the flux of the m th reaction at time t. The sensitivity analysis approach we consider here is based on quantifying the influence of a reaction or molecular species on an appropriately chosen response characteristic R of a biochemical reaction system. We employ a well-known model of the MAPK signaling cascade (see Figure 1 and Additional file 2 for details on this model) and consider three response characteristics with established biological significance, namely the duration D, integrated response I, and strength S of the doubly phosphorylated extracellular signal-regulated kinase (ERK-PP), defined by where q(t) is the concentration profile of ERK-PP and t 0 is the time at which q(t) converges to zero. If convergence to zero does not occur within the observation time interval [0, t max ], then we set t 0 = t max . We choose to work with the duration, integrated response, and strength of ERK-PP activity, since it has been experimentally observed that differences in duration and strength may cause distinct biological outcomes, such as cell differentiation, proliferation, and apoptosis [30][31][32][33][34], whereas, the integrated response directly correlates with DNA synthesis [35,36]. We take the system response R to be the logarithm of the duration, integrated response, or strength; that is, we take R to be lnD, lnI, or lnS. This reduces the effect of outliers and increases the efficiency of numerically evaluating the indices associated with variance-based sensitivity analysis [16].

Variance-based sensitivity analysis
We employ the variance-based sensitivity analysis approach for biochemical reaction systems we recently introduced in [16]. This method is based on a biophysically-derived probabilistic model for the rate constants of a biochemical reaction system. According to this model, we treat the rate constants κ 2m-1 and κ 2m as random variables K 2m-1 and K 2m , given by the Eyring-Polanyi equations [37]  where k B is the Boltzmann constant (k B = 1.3806504 × 10 -23 JK -1 ), T is the system temperature, h is the Planck constant (h = 6.62606885 × 10 -34 Js), is the (random) capacity of the activated complex associated with the m th reaction, and C n is the (random) capacity of the n th molecular species.   [16]. 13 14 , 0 k k = 15 16 , 17 18 , 19 20 , k k 21 22 , 23 24 , k k Pho2 12 ( ) x 25 26 ,
Our variance-based sensitivity analysis technique assesses how uncertainty in the rate constants of a biochemical reaction system affect the system response. As a consequence of (3), (4), and (5), we have that where are the nominal values of the rate constants, with Equation (6) suggests that uncertainty in the forward and reverse reaction rates occurs due to uncertainty in the standard chemical potentials of the activated complexes associated with the reactions and the standard chemical potentials of the reactants.
As a consequence of the previous model, we investigate the sensitivity properties of a biochemical reaction system due to the uncertainty in the standard chemical potentials. To simplify notation, we use W = {W 1 , W 2 , ..., W J } to denote the random variables Y ‡ and Y. We consider two cases, namely J = M and W j = , for j = 1, 2, ..., M, as well as J = N and W j = Y j , j = 1, 2, ..., N. In the first case, the standard chemical potentials of the molecular species are assumed to be fixed, whereas, the standard chemical potentials of the activated complexes are perturbed randomly. Our objective is to investigate the importance of reactions in influencing the system response and, for this reason, we refer to this case as reaction-oriented sensitivity analysis (ROSA) [16]. In the second case, the standard chemical potentials of the activated complexes are assumed to be fixed, whereas, the standard for chemical potentials of the molecular species are perturbed randomly. In this case, our objective is to investigate the importance of molecular species in influencing the system response. For this reason, we refer to this case as species-oriented sensitivity analysis (SOSA) [16]. Given the response R of a biochemical reaction system with random factors W, its total variance V tot := Var[R(W )] satisfies [17,38,39]: where with similar expressions for the remaining terms. If the biochemical factors W are statistically independent (which we assume here to be true), then each term on the right-hand-side of (7) is nonnegative. This equation provides a decomposition of the total system response variance V tot into individual terms V 1 ,V 2 , ..., V 12 , .... It turns out that V j quantifies the average reduction in total response variance, obtained by keeping the j th biochemical factor fixed. As a consequence, we use V j to measure the singular influence of the j th biochemical factor W j on the system response. Moreover, the term V jj' quantifies the average reduction in the total response variance due to jointly fixing the two biochemical factors W j and W j' , not accounted for by summing the average reductions obtained by separately fixing these factors. Therefore, we use V jj' to measure the joint influence of the biochemical factors W j and W j' on the system response. Finally, higherorder terms in (7) quantify the joint influence of three or more biochemical factors on the system response.
In most practical situations, it is difficult to evaluate the high-order terms (≥ 3) in the response variance decomposition scheme given by (7). Although these terms are usually negligible at low to moderate levels of biochemical factor uncertainty, they may take substantial values at high levels [16]. Unfortunately, it is difficult to deal in practice with high-order variance terms. For this reason, it is quite convenient to base our sensitivity analysis effort only on the first-and second-order terms V j and V jj' . Then, instead of using the total system response variance V tot , we base our sensitivity analysis on its second-order portion V, given by : By using the probabilistic model given by (6) and the variance decomposition scheme in (9), we can develop a powerful (second-order) methodology for sensitivity analysis of biochemical reaction systems, similar to the one discussed in [16] that was based on the total response variance V tot . The method requires evaluation of two indices, namely the (second-order) single-effect sensitivity index (SESI) σ j , defined by and the (second-order) joint-effect sensitivity index (JESI) η j , defined by where Clearly, σ j quantifies the fractional singular contribution of the j th biochemical factor to the second-order portion V of the total response variance, whereas, η j quantifies the fractional contribution of the j th biochemical factor to V jointly with another factor. It turns out that, if σ j = η j = 0, then we can conclude that factor j does not influence the system response singularly or jointly with another factor (although, it may influence the system response jointly with two or more factors). On the other hand, if σ j > 0 and η j = 0, then we can conclude that factor j influences the system response singularly but not jointly with another factor. Moreover, if σ j = 0 and η j > 0, we can conclude that factor j does not influence the system response singularly but it does so jointly with some other factor, whereas, if σ j > 0 and η j > 0, we can conclude that factor j influences the system response both singularly and jointly with some other factor. In practice, we can set a small threshold θ to determine whether σ j and η j are sufficiently larger than zero.
Unfortunately, we cannot evaluate the exact values of the sensitivity indices σ j and η j . For this reason, we must resort to approximations. In this paper, we consider the possibility of employing one of five methods to accomplish this goal. We discuss these methods next and refer the reader to [16] and the accompanying Additional file 1 for details pertaining to their development and numerical implementation.

Monte Carlo estimation
A straightforward technique for approximating the SESI and JESI values is based on a Monte Carlo Latin hypercube sampling approach, whose details can be found in [16] (see also [2,20]). This approach can be used to provide estimates and of the second-order SESI's and JESI's by using 2L(J + 1) system evaluations [by integrating the system of N ordinary differential equations given by (1) and (2)], where L is the number of Latin hypercube samples used and J is the number of biochemical factors considered in the analysis. We refer to and as the (second-order) SESI's and JESI's obtained by Monte Carlo (MC) estimation. This method is computationally expensive, since a large number L of Latin hypercube samples is required to obtain sufficiently accurate estimates of the sensitivity indices.

Derivative approximation
A method for deriving approximations and of the sensitivity indices σ j and η j is to replace the response function R(w) by its second-order Taylor series approximation about w = 0, given by where are the first-and second-order partial derivatives of R at w = 0, and set Equation (13) and the statistical independence and zero-mean Gaussianity of the biochemical factors W j imply that where λ j is the standard deviation of W j , for j = 1, 2, ..., J. As a consequence, we obtain an analytical expression for the sensitivity indices and , which requires evaluation of the first-and second-order partial derivatives of the response function R(w), with respect to the biochemical factors, at w = 0. Although many techniques have been developed to compute response derivatives [40], for reasons we explain in Additional file 1, we choose to approximate the partial derivatives by symmetric finite differences. We refer to and given by (14), (15), and (16), as the (secondorder) SESI's and JESI's obtained by Derivative Approximation (DA). The resulting method requires 2J(J + 1) + 1 system integrations, which is quadratic in terms of the number J of the biochemical factors and is much smaller than the number 2L(J + 1) of system integrations required by MC, since J « L.

Polynomial approximation
Another way to approximate the sensitivity indices σ j and η j is to replace the response function R(w) by where the α's are parameters whose values must be appropriately determined so that (w) sufficiently approximates the response function R(w) in an appropriately chosen neighborhood around 0. Note that provides a polynomial approximation of the response function R(w). If is sufficiently close to R(w) in a neighborhood around 0, then the parameters α jj' coincide with the partial derivatives , 1 ≤ κ 1 , κ 2 ≤ 2, of R at w = 0. By using (17) and the statistical independence and zeromean Gaussianity of the biochemical factors W j , we can show that, in this case, and are given by (14) and (15), with As a consequence, we obtain again an analytical expression for the sensitivity indices and , which requires evaluation of the α parameters. This can be done by the polynomial regression approach we discuss in Additional file 1. We refer to and , given by (14), (15), and (18), as the (second-order) SESI's and JESI's obtained by Polynomial Approximation (PA). The resulting method is based on the approach proposed in [41] and requires J(J -1)S 2 /2 + JS + 1 system integrations, which is quadratic both in terms of the number J of biochemical factors and the number S of the samples per factor used in the regression. Note that J(J -1)S 2 /2 + JS + 1 . 2J 2 (S/2) 2 , for sufficiently large J. This number is much smaller than the number 2L(J + 1) . 2LJ of system integrations required by MC, since L Ŭ J(S/2) 2 , but larger than the number 2J(J + 1) + 1 . 2J 2 of system integrations required by DA, since S > 2.

Gauss-Hermite integration
We can obtain a more accurate approximation of the response function R(w) than the one given by (13) 0 by removing all terms that involve partial derivatives with respect to more than two factors [note that the approximation given by (13) is obtained from the Taylor series expansion by truncating all terms that involve partial derivatives of order greater than two]. In this case, we can show that where as we explain in Additional file 1. The approximations and are now given by (14) and (15), with where e 0 , e j , and e jj' are given by Note that evaluation of and requires only oneand two-dimensional integrations, which can be numeri-cally done by a standard Gauss-Hermite integration approach. For this reason, we refer to and , given by (14), (15), (19), (20), and (21), as the (second-order) SESI's and JESI's obtained by Gauss Hermite Integration (GHI). The resulting method is based on the approach proposed in [42,43] and requires 2J(J -1)NQ/2Q 2 + 2JNQ/2Q + 1 system integrations, which is quadratic both in terms of the number J of biochemical factors and the order Q of Gauss-Hermite integration used. Note that, if the number S of the samples per factor used in the regression associated with the PA is even, and Q = S or Q = S + 1, then GHI requires the same number of system integrations as PA.

Orthonormal Hermite approximation
The last method we consider for approximating the sensitivity indices σ j and η j is based on replacing the response function R(w) by where the α's and are parameters whose values must be appropriately determined so that sufficiently approximates the response function R(w) over the entire space of biochemical factor values. Note that provides a polynomial approximation of the response function R(w), similar to the one given by (17). However, the polynomials used in the approximation given by (22) are orthonormal Hermite polynomials, as opposed to the polynomials used in the approximation given by (17), which are standard second-and fourth-order polynomials. Note also that the approximation given by (22) is "global," in the sense that it is based on approximating the system response function R(w) over the entire factor space, whereas, the approximation given by (17) is "local," in the sense that it approximates the system response function R(w) in a neighborhood around w = 0.
By using (22), the orthonormality of the Hermite polynomials, and the statistical independence and zero-mean Gaussianity of the biochemical factors W j , we can show that and are given by (14) and (15), with As a consequence, we obtain again an analytical expression for the sensitivity indices and , which requires evaluation of the α parameters. This can be done by polynomial regression based on the Monte Carlo Latin hypercube sampling approach we discuss in Additional file 1.
We refer to and , given by (14), (15), and (23), as the (second-order) SESI's and JESI's obtained by Orthonormal Hermite Approximation (OHA). The resulting method is based on the approach suggested in [44][45][46][47] and requires L system integrations, where L is the number of regression points obtained by Latin hypercube sampling. We here take the number of regression points used to be the same as the number of Latin hypercube samples employed by MC, although these two numbers can be different in general. As a consequence, the number of system integrations performed by OHA is smaller than the number 2L(J + 1) of system integrations used in MC by a factor of 2(J + 1), but it could be larger than the number of system integrations required by DA, PA, or GHI.

Results
We now employ the previously discussed techniques to estimate the variance-based sensitivity indices σ j and η j associated with the duration, integrated response, and strength of ERK-PP activity. We do this by considering the dynamic behavior, within a time frame of 6 hours, of the MAPK signaling cascade model depicted in Figure 1 (see Additional file 2 for more details on this model). As we have explained in the previous section, we consider two cases: ROSA and SOSA. In each case, we need to set values for the standard deviations { , m = 1,2, ..., M} of the standard chemical potentials of the activated complexes of the reactions and the standard deviations {λ n , n = 1,2, ..., N} of the standard chemical potentials of the molecular species. Due to difficulties in obtaining these values in practice, we assume here that = λ ‡ , for m = 1,2, ..., M, and λ n = λ, for n = 1,2, ..., N, and consider λ ‡ , λ as two "user-defined" parameters that quantify the perturbation levels in biochemical factor values. By following our previous work in [16], we perform sensitivity analysis with λ ‡ , λ = 0.1, 0.2, 0.3, 0.4. Finally, we employ L = 6,000 Latin hypercube samples in MC and OHA, S = 4 regression samples per factor in PA, and a Gauss-Hermite integration of order Q = 5 in GHI.
In our simulations, we use S = 4 regression points per biochemical factor, located at -2w, -w, w, and 2w, where w = λ ‡ for ROSA and w = λ for SOSA (i.e., we use regression points located at ± one and two standard deviations from 0). Note also that, as a consequence of equation (6) In Table 1, we summarize the number of system integrations and the equations used by each method. For ROSA-based sensitivity analysis (J = 21), the number of system integrations required by DA, PA, GHI, and OHA, are respectively only 0.35%, 1.30%, 1.30%, and 2.27% of that required by MC. For SOSA-based sensitivity analysis (J = 23), the number of system integrations required by DA, PA, GHI, and OHA, are respectively only 0.38%, 1.44%, 1.44%, and 2.08% of that required by MC.
We list the ROSA results in Tables 2, 3, and 4, whereas, we list the SOSA results in Tables S-3.1, S-3.2, and S-3.3 of Additional file 3. The results are given in percentages and have been truncated to the nearest integers. To reduce the size of the tables, we depict only the results associated with reactions whose truncated SESI or JESI values, estimated by MC, are at least 5%. We consider the SESI and JESI values obtained by MC as the "true" values. By following our previous work in [16], we classify reactions and molecular species into one of four categories of interest: singularly influential, jointly influential, singularly/jointly influential, and noninfluential. We do this by comparing their SESI and JESI values to a 10% threshold. Bold reaction numbers indicate SESI or JESI values, obtained by MC, that are above the 10% threshold. Note that a reaction is singularly influential if the corresponding SESI value is at least 10% and the JESI value is smaller than 10%, jointly influential if the JESI value is at least 10% and the SESI value is smaller than 10%, singularly/ jointly influential if both the SESI and JESI values are at least 10%, and noninfluential if both the SESI and JESI values are smaller than 10%.
In the remaining of this section, we discuss the ROSA results separately for each response characteristic. A similar discussion applies for the SOSA results presented in Additional file 3.

Duration
Estimation, by MC, of the ROSA-based sensitivity indices associated with the duration of ERK-PP activity produces values that change little with the size λ ‡ of the underlying perturbations; see Table 2. Moreover, the estimated SESI and JESI values indicate that the duration is primarily affected by reactions 4, 6, and 13 (refer to Figure 1 and Additional file 2 for identifying these reactions), which exert their influence only singularly (since the SESI values are larger than 10%, whereas the corresponding JESI values are less than 10%). As a matter of fact, all JESI values are negligible, which indicates that the log-duration may be approximately additive, at least within the range of the applied perturbations. Note that a multivariate response function is called additive if it can be decomposed into a sum of one-dimensional functions of one variable. Additive response functions do not produce high-order (≥ 2) joint effects and result in zero JESI values [2]. Although a linear response function is additive, the inverse is not necessarily true. It turns out that the SESI's associated with an additive response function can be well estimated by all previous approximation techniques.
From the results depicted in Table 2 (and Table S-3.1 in Additional file 3), it is clear that, as compared to MC, the DA, PA, GHI, and OHA consistently provide good approximations to the SESI and JESI values at all perturbation levels. Moreover, all methods can be used to correctly classify reactions 4, 6, and 13 as being singularly influential.

Integrated response
Estimation, by MC, of the ROSA-based sensitivity indices associated with the integrated response of ERK-PP activity produces the SESI and JESI values depicted in Table 3. These values indicate that the integrated response is primary influenced by reactions 4 and 6 (refer to Figure 1 and Additional file 2 for identifying these reactions). For small to moderate perturbations (i.e., for λ ‡ = 0.1, 0.2), reactions 4 and 6 influence the integrated response only singularly. However, for large perturbations (i.e., for λ ‡ = 0.3, 0.4), reaction 4 influences the integrated response both singularly and jointly (since both SESI and JESI values are at least 10%), whereas, reaction 6 still influences the integrated response only singularly.
It is clear from the results depicted in Table 3 (and  Table S-3.2 in Additional file 3) that all approximation techniques work relatively well for small to moderate perturbation levels (i.e., for λ ‡ = 0.1, 0.2), providing accurate SESI and JESI values, as compared to the values obtained Table 1   matter of fact, the DA is not capable of capturing secondorder joint effects and the resulting JESI values are very small. If we use the DA results to classify the reactions, then we will erroneously conclude that reaction 4 influences the integrated response only singularly, when λ ‡ = 0.3, 0.4. From the results depicted in Table 3 (and Table S-3.2 in Additional file 3), it is clear that, for large perturbations, GHI and OHA provide good approximations to the sensitivity indices. Moreover, the results indicate that OHA may be a better approximation technique than GHI (e.g., compare the SESI results obtained by GHI and OHA for reaction 4). On the other hand, the results obtained by PA are much better than the results obtained by DA. However, the performance of PA may deteriorate at high perturbation levels and may become inferior to GHI and OHA (e.g., compare the results obtained by PA, GHI, and OHA for reaction 4). Finally, it is clear that the sensitivity results obtained by GHI and OHA can be used to correctly classify all reactions.

Strength
Estimation, by MC, of the ROSA-based sensitivity indices associated with the strength of ERK-PP activity produces   Finally, the largest differences between the values obtained by GHI and MC are 5%, 7% and 5% for λ ‡ = 0.2, 0.3, 0.4, respectively. Once more, OHA consistently provides good results, which can be used to correctly classify the reactions at all perturbation levels.

Discussion
The previous numerical results demonstrate that, in terms of estimation accuracy, OHA is the best method and DA is the worst, whereas, PA and GHI are in between, with GHI slightly better than PA. To explain why this is so, we must investigate the sources of error introduced by each technique, which we summarize in Table 1.
The estimation error produced by the MC approach is mainly due to the finite number L of samples used and decreases slowly as L increases, regardless of the number J of biochemical factors used, at least theoretically. Note, however, that to achieve a certain level of accuracy in practice, we may also need to increase L as the number J of biochemical factors increases, due to the exponential growth in the volume of the biochemical factor space when adding extra dimensions ("curse of dimensionality").
There are two sources of error associated with DA. First, substantial errors may be introduced due to the fact that DA locally approximates the response function by a Taylor series expansion that includes only first-and second-order partial derivatives. Consequently, DA may not produce good estimates of the sensitivity indices under large perturbations, since a second-order Taylor series approximation of the response function may not be sufficiently accurate over the range of factor values generated by such perturbations. This is especially true when the response function is nonadditive (as it is the case with the log integrated response and the log strength of ERK-PP in the MAPK example). In such cases, large factor variations may produce substantial joint effects, which cannot be captured by a local second-order Taylor series approximation. This is evident by the fact that, under large perturbations, the JESI values obtained by DA, associated with the integrated response and strength, are significantly different than the ones produced by MC.
A second source of error associated with DA is the approximation of the first-and second-order derivatives of the response function by finite-differences. In our simulations, we approximate the first-and second-order partial derivatives of the response function by using equations (S-1.35) and (S-1.36) in Additional file 1, with Δ = 0.1. It has been pointed out in [1] that the resulting approximations must be carefully used, since it is difficult to theoretically predict, control, and numerically evaluate their accuracy. Although a number of techniques have been developed to deal with this problem [40], exact evaluation of the response derivatives usually requires simultaneous integration of a set of "sensitivity equations," together with the differential equations governing the underlying molecular concentration dynamics, which turns out to be a very difficult task due to stiffness of the resulting system of differential equations [1].
PA attempts to improve the accuracy of DA by adding high-order derivative terms in the Taylor series expansion of the response function. In addition to the first-and second-order partial derivatives used by the DA, the Taylor series expansion now includes third-and fourth-order partial derivatives that involve only two biochemical factors. Moreover, instead of approximating the derivatives by finite differences, the method avoids such computations by expanding the response function using FD-HDMR, by truncating all components of order ≥ 3, by respectively approximating the first-and second-order FD-HDMR components with second-and fourth-order polynomials, and by estimating the coefficients of these polynomials using regression (see Additional file 1 for details). Errors are introduced by truncating the FD-HDMR and locally approximating the resulting response function by a fourth-order polynomial including only single biochemical factors and pairs of factors. As a consequence, PA may not be able to accurately estimate some SESI and JESI values under large perturbations, since the underlying truncation and polynomial approximation of the response function may not be sufficiently accurate over the range of factor values generated by such perturbations. Note also that errors can be introduced due to estimating the polynomial coefficients by regression, a situation that cannot be evaluated and controlled easily. As a matter of fact, and counter to intuition, we cannot necessarily increase accuracy of estimation by using more samples per biochemical factor, especially when dealing with polynomial regression [48,49]. GHI attempts to improve the accuracy of estimating the sensitivity indices by employing the exact first-and second-order FD-HDMR components, and numerically calculating the required expectations and variances using Gauss-Hermite integrations (see Additional file 1 for details). Errors are introduced when truncating the FD-HDMR and evaluating the expectations and variances by one-and two-dimensional Gauss-Hermite integrations. Evaluating and controlling these errors is practically impossible. Note that higher-order Gauss-Hermite integrations do not necessarily produce higher accuracy. This is true only when the integrands are sufficiently smooth, in the sense that can be well-approximated by polynomials [49]. Truncation of the FD-HDMR essentially corresponds to a local approximation of the response function, although this approximation is expected to be more accurate than the Taylor series and polynomial approximations used by DA and PA, respectively. As a consequence, GHI may not be able to accurately estimate some SESI and JESI values under large perturbations, since the underlying FD-HDMR truncation may not be sufficiently accurate over the range of factor values generated by such perturbations.
Finally, the errors introduced by OHA are due to approximating the ANOVA-HDMR expansion of the response function by first-and second-order ANOVA-HDMR components, approximating these components with first-and second-order orthonormal Hermite polynomials, and estimating the coefficients of these polynomials using regression (see Additional file 1 for details). Here, the truncation of high-order ANOVA-HDMR components does not correspond to a local approximation of the response function, which is why this approximation is more accurate than truncating the FD-HDMR components, as in GHI. In fact, if we consider perturbation levels at which the higher-order (≥ 3) terms in the variance decomposition scheme given by (7) are negligible, then the higher-order (≥ 3) terms in the ANOVA-HDMR decomposition of the response function will be negligible as well [see equation (S-1.30) in Additional file 1]. This is not necessarily true for the higher-order terms in the FD-HDMR decomposition. Therefore, truncating the ANOVA-HDMR decomposition of the response function, as opposed to the FD-HDMR decomposition, is well justified for perturbation levels at which the response variance is not appreciably influenced by high-order joint effects. Under very large perturbations, OHA may not accurately estimate the sensitivity indices, since the underlying truncation of ANOVA-HDMR may not be accurate enough due to appreciable high-order (≥ 3) joint effects in the response variance. However, the global nature of the approximation methodology employed by OHA, the direct relationship between ANOVA-HDMR and the response variance decomposition scheme given by (7), and the orthonormality properties of the Hermite polynomials, make OHA the most desirable technique for approximating the sensitivity indices, among the techniques considered in this paper.
Although we have also obtained simulation results for other biochemical reaction systems, due to lack of space, we have limited our presentation in this paper to the results obtained for the MAPK model depicted in Figure  1. To illustrate various aspects of the approximation techniques and their relative merits, we have chosen the response functions to represent three types of highdimensional system responses: the log duration, lnD, is approximately additive for the levels of biochemical factor uncertainty considered in this paper, the log integrated response, lnI, is moderately nonadditive, whereas, the log strength, lnS, is highly nonadditive. Based on our experience so far, all our simulation results are consistent with each other and perfectly agree with the theoretical analysis presented in this paper. We therefore believe that the conclusions based on the MAPK model are general and can be applied to other biochemical reaction systems as well.
It is very important to keep in mind that the four approximation techniques considered in this paper are based on the assumption that, for most biochemical reaction systems of interest, perturbations of input biochemical factors will produce only single and second-order joint effects at the output. As a consequence, truncating the HDMR of the response function to a second-order is a natural thing to do. Note that this assumption depends on the particular choice of the biochemical factors used, on how the system response relates to these factors, and on the perturbation levels used for sensitivity analysis. In general, the approximation methods discussed in this paper are expected to fail in the presence of high-order ≥ 3 joint effects among biochemical factors. Therefore, it may be necessary in these cases to consider truncated HDMR's that include higher-order basis functions. Extension to this case is straightforward but computationally demanding, since higher-order cases require evaluation of a large number of variance terms in the decomposition scheme given by (7), which can be a tedious thing to do for large biochemical reaction systems.
We should point out here that GHI is based on the methodology proposed in [42,43], which has been effectively used to calculate statistical moments of the responses of high-dimensional mechanical systems subject to randomly fluctuating loads. In this paper, we have reformulated this method to fit the framework of variance-based sensitivity analysis and have applied it to biochemical reaction systems. On the other hand, OHA is based on the methodology proposed in [25,44,45,50] for approximating ANOVA-HDMR's using orthonormal basis functions. OHA can also be viewed as a special case of the polynomial chaos expansion (PCE) approach to sensitivity analysis discussed in [46,47,51], and has been recently employed in [52] for estimating variance-based sensitivity indices in order to learn the topology of a functional network of interactions from given data. To our knowledge, this is the first time that the four approximation techniques presented in this paper are systematically compared to each other and used to study the sensitivity properties of biochemical reaction systems.
To conclude, we would like to stress the fact that the approximation techniques presented in this paper have been derived by assuming that the biochemical factors used for sensitivity analysis are statistically independent and that each factor follows a Gaussian distribution. The assumption of statistical independence between the random variables { , m = 1,2, ..., M} and {Y n , n = 1, 2, ..., N} has been justified in [16]. However, justifying mutual independence within the sets { , m = 1, 2, ..., M} and {Y n , n = 1,2, ..., N} is a very difficult thing to do. We simply view this assumption as a convenient approximation that allows us to proceed with the sensitivity analysis approaches discussed in this paper. Developing variancebased sensitivity analysis for correlated biochemical factors is a challenging problem that needs careful investigation [2,53]. On the other hand, if the biochemical factors follow non-Gaussian distributions, such as uniform, gamma, binomial, etc., the approximation techniques must be appropriately modified to accommodate these distributions. For example, if each biochemical factor follows a uniform distribution, then we must replace the Gauss-Hermite integration step in GHI by Gauss-Legendre integration [49]. Moreover, if the biochemical factors follow gamma distributions, then we must replace the orthonormal Hermite polynomials in OHA by orthonormal Laguerre polynomials [47,51].

Conclusions
In this paper, we discussed four methods that one can use to analytically approximate the second-order sensitivity indices associated with a previously introduced variancebased sensitivity analysis methodology for biochemical reaction systems. The need for developing such methods stems from an effort to remedy the large computational burden associated with Monte Carlo estimation. We highlighted important theoretical, numerical, and computational aspects of each method, in an attempt to provide a comprehensive understanding of the advantages and disadvantages of each technique. Our simulation results, based on a mathematical model for the MAPK signalling cascade, clearly demonstrate the inferiority of second-order derivative-based sensitivity analysis at moderate to high levels of uncertainty. It also shows the superiority of OHA, which is constructed by truncating the ANOVA-HDMR of the response function of a biochemical reaction system and approximating the firstand second-order ANOVA-HDMR component functions with orthonormal Hermite polynomials.