A method for estimation of elasticities in metabolic networks using steady state and dynamic metabolomics data and linlog kinetics

Background Dynamic modeling of metabolic reaction networks under in vivo conditions is a crucial step in order to obtain a better understanding of the (dis)functioning of living cells. So far dynamic metabolic models generally have been based on mechanistic rate equations which often contain so many parameters that their identifiability from experimental data forms a serious problem. Recently, approximative rate equations, based on the linear logarithmic (linlog) format have been proposed as a suitable alternative with fewer parameters. Results In this paper we present a method for estimation of the kinetic model parameters, which are equal to the elasticities defined in Metabolic Control Analysis, from metabolite data obtained from dynamic as well as steady state perturbations, using the linlog kinetic format. Additionally, we address the question of parameter identifiability from dynamic perturbation data in the presence of noise. The method is illustrated using metabolite data generated with a dynamic model of the glycolytic pathway of Saccharomyces cerevisiae based on mechanistic rate equations. Elasticities are estimated from the generated data, which define the complete linlog kinetic model of the glycolysis. The effect of data noise on the accuracy of the estimated elasticities is presented. Finally, identifiable subset of parameters is determined using information on the standard deviations of the estimated elasticities through Monte Carlo (MC) simulations. Conclusion The parameter estimation within the linlog kinetic framework as presented here allows the determination of the elasticities directly from experimental data from typical dynamic and/or steady state experiments. These elasticities allow the reconstruction of the full kinetic model of Saccharomyces cerevisiae, and the determination of the control coefficients. MC simulations revealed that certain elasticities are potentially unidentifiable from dynamic data only. Addition of steady state perturbation of enzyme activities solved this problem.

ulated via the use of recombinant DNA technology [1]. Within the purpose of metabolic engineering, i.e. rational redesign of the metabolic systems, the highly relevant question of which (combination of) perturbation should be applied in order to increase the productivity of the microorganism is addressed. The answer to this question requires information on both the regulatory level and the metabolic level. Ter Kuile and Westerhoff showed by their Hierarchical Control Analysis that the pathway flux is rarely controlled solely by gene expression, but that metabolite levels are also relevant [2].
In this context, we focus on the metabolic reaction network level. Modeling metabolic reaction systems is usually based on stoichiometric sometimes followed by kinetic modeling.
One of the initial steps in the modeling of metabolic reaction networks is to determine the structure and steady state characteristics of a given network using stoichiometric information alone. Steady state models describe time invariant fluxes, gathered from steady state experiments; hence they reflect the structural characteristics of the system. Metabolic Flux Analysis (MFA) and Metabolic Network Analysis (MNA) were developed as powerful tools to analyze such flux data. At steady state, the mass balances over the metabolites in the metabolic network yield a set of linear relations between the metabolic fluxes which can be expressed as: Where S is the (m × r) stoichiometric matrix and v is the (r × 1) vector of metabolic fluxes, where m is the number of balanced metabolites and r is the number of fluxes. Here, the system is in most of the cases underdetermined so that there are an infinite number of possible solutions. The realized solution depends on the kinetic properties of the involved reactions; this information is seldom known. To bypass the need of information on kinetics of individual reactions, alternative mathematical approaches have been proposed in the past to obtain a unique solution. An example is the constraint based optimization approach which is based on assumed optimality criteria, e.g. maximum growth, given biochemical, thermodynamic and irreversibility constraints and maximal reaction rates [3][4][5]. Later, Segre et al. proposed the optimality constraint that requires maximization of biomass formation while minimization of metabolic adjustment (MOMA) in order to obtain a unique flux distribution of a mutant strain. In their approach, they defend their optimality criterion that a knock-out mutant strain would optimize its biomass production rate by changing minimally its metabolic fluxes from the wild type strain [6]. An alternative metabolic modeling framework which uses a fitness function is the cybernetic approach. This approach assumes that an organism is an optimal strategist in utilizing all available sources with maximum efficiency. The expression and activity of the enzymes that catalyze network functionality are regulated by cybernetic control variables obtained from the solution of a constrained optimization problem [7][8][9].
Despite a number of successful applications especially in mixed substrate and prediction of knock-out lethality, all stoichiometric modeling approaches have their limitations, e.g. they can not predict time courses of the cellular processes, are based on "assumed goals" of the cell, and do not give insight in molecular events occurring in the cells, since information about the kinetic properties of the individual enzymes are not required. In the light of the above arguments, it is apparent that, to advance our understanding of the (dis)functioning of living cells a systems biology approach is needed, whereby the use of dynamic mathematical models of metabolic reaction networks to describe the complex kinetic behavior and interactions (allosterical, feedback and feed forward effects, cofactor coupling, compartmentation, intracellular transport, etc.) is becoming increasingly relevant.
The kinetic behavior of many important enzymes occurring in metabolic networks have been studied extensively, however these studies have generally been performed under non-physiological conditions in test tubes (in vitro), and therefore the applicability of these results to the in vivo metabolism is doubtful [10][11][12][13]. Teusink et al. showed that discrepancies exist between the in vivo measured changes of the concentrations of the glycolytic metabolites and their estimates using models based on mechanistic rate equations and in vitro determined parameters [12]. This basic problem invalidates detailed models of metabolism containing kinetic parameters which have been determined in vitro. Therefore, it is preferred to base the kinetic analysis of metabolic networks on in vivo studies of intact cellular networks. These in vivo studies are based on steady state and/or dynamic perturbations of a metabolic network starting from a reference steady state that is defined by its fluxes, enzyme activities, metabolite levels, extracellular concentrations, and kinetic parameters.
It is also important to notice that the number of parameters that is typically involved in the traditional mechanistic equations is very large which causes identifiability problems due to parameter insensitivity. Despite the information richness of data obtained from dynamic perturbations in in vivo experiments, there is a limit for identification of the parameters. For these reasons, it seems justified not to limit ourselves to the available complex mechanistic enzyme kinetics which must also be consid-ered as approximations of the true in vivo behavior [14,15]. Approximative enzyme kinetic formats, which contain much fewer kinetic parameters are therefore of general interest. An overview of different approximative kinetic formats (linear, power law, loglin and linlog kinetics) used in metabolic network modeling is given by Heijnen [16].
One of the proposed formats is, linlog kinetics, which has been introduced for modeling of in vivo kinetics and for metabolic redesign, and shown to have a good approximation quality, standardized format and relatively few parameters [16,17]. In linlog kinetics, all the rate equations have the same mathematical structure in which the relation between rates and enzyme levels is proportional, while for metabolite levels, a linear sum of logarithmic concentration terms is proposed. All variables are considered relative to a reference steady state (Eq. (2)). The linlog approximation is valid in the neighborhood of the reference state (defined by J 0 , x 0 and c 0 in equation (2)), but quite large changes of metabolite concentrations, enzyme levels and fluxes are allowed [17]. The parameters ( and ) in the kinetic equations are the same scaled elasticities that are used in Metabolic Control Analysis (MCA). It is important to note that the elasticity parameters appear in the model in a linear fashion.
When the elasticities are known, a full dynamic model of the whole metabolic network can be set up using linlog kinetics. Such a model allows in principle the calculation of control coefficients also under dynamic conditions. In linlog kinetics, the elasticities are the kinetic parameters represented in the elasticity matrix. From these, and a given network structure specified in the form of a stoichiometric matrix, the control coefficients for a reference condition can be calculated from the summation and connectivity relations developed in the framework of MCA. Also the change in control coefficients upon large changes in enzyme levels can be calculated [18]. Moreover, the linlog formulation enables the analytical solution of the mass balances for steady state metabolite and flux levels in the metabolic network, providing the solution of the metabolic redesign problem, i.e. determination of the optimal enzyme levels that maximize a certain flux while the total amount of enzyme and the metabolite levels are con-strained. Visser et al., reported a successful application of linlog kinetics in an in silico study that aims to solve this metabolic redesign problem [19].
In order to determine the kinetic parameters of a model of a given in vivo metabolic system, in vivo perturbations of the complete metabolic network have to be performed. There are two main types of perturbations that can be imposed on the system: steady state and dynamic perturbations. In steady state perturbations, usually the enzyme activity of one (sometimes more) of the reactions is changed by adding specific inhibitors or activators or by genetically changing the enzyme activity, resulting in a new steady state. In steady state perturbations an important problem has been addressed by Kacser and Burns, which is the determination of the set of reactions that has to be perturbed in steady state fashion, in order to be able to determine all elasticities for a given metabolic network which resulted in their "double modulation" [20]. They showed that for a simple linear chain of reaction, perturbation of the activities of the first and last enzymes allows determination of the elasticities of all enzymes of the chain under the condition that each enzyme is only responsive to its substrate and product, which rules out the fact that feedback loops are present. The theoretical basis is presented in later studies which showed that determination of the elasticities for any enzyme in such a simple chain requires two perturbations, one upstream and one downstream of the enzyme concerned [21]. Giersch and Cornish-Bowden extended the double perturbation approach to more complex pathways containing branch points, regulatory loops, and conserved moieties and they provided guidelines to list the possible reactions to be modulated in order to determine the elasticities for arbitrary metabolic networks [22]. From the obtained list a minimum set of steady state perturbations, to be imposed on a specific network, can be chosen. As an alternative to the analysis of Giersch and Cornish-Bowden, Hofmeyr and Cornish-Bowden offered co-response analysis, to identify the mono-functional units that respond together to any perturbation applied [23]. These mono-functional units have to be dissected in order to determine the elasticities belonging to these groups.
Linlog kinetics has been successfully applied to estimate the elasticity parameters for a linear pathway from sets of steady state metabolite concentrations and enzyme activities [18]. Additionally, Heijnen et al. proposed a method to obtain flux control coefficients around a branch point, from large enzyme perturbation leading to large steady state flux perturbations, using the linlog kinetic format [24]. This approach allows obtaining explicit solutions for steady state flux and metabolite levels as a function of large changes in enzyme activities.
An alternative to steady state perturbations are dynamic perturbations, in which the system, being initially at the reference state, is disturbed to create time dependent data of transient metabolite levels. The dynamic metabolite profiles are typically obtained as a series of snapshots in time. A recent example of these are in vivo measurements of a number of metabolites in rather dense time sequences of a few seconds or minutes, using 'rapid sampling' methods with subsequent high precision metabolite measurement techniques [25][26][27][28]. Transient data are rich in information and allow determination of the time hierarchy of the different elements of the metabolic network and the causal relationships between the network elements. It is possible to exploit these data to estimate the parameters of a traditional full kinetic model [13,29]. Such a model can subsequently be used to calculate the values of the elasticities at a given reference steady state. Alternatively, when the linlog kinetic format is used the elasticities can be directly estimated from the transient data. This was demonstrated by Kresnowati et al. who estimated the elasticities of a small example network from transient concentration data, assuming that the dynamic fluxes are unknown [30].
An important difference between steady state and dynamic perturbations is that in the former both the steady state enzyme levels (e i ) and the metabolite concentrations (x i ) have to be measured, whereas in rapid dynamic perturbation experiments only x i is required because the enzyme activities can be considered constant within a sufficiently short time window following the perturbation. Note that flux data are required in both methods. However, fluxes are not independent variables as they follow from the measured intra-and extracellular metabolite concentrations and the proper mass balances.
In this work, we present a method to estimate kinetic parameters (elasticities), using linlog kinetics, using metabolome data obtained from steady state and dynamic perturbation experiments. As such, this can be considered as an extension of the work by Visser and Heijnen [17], Wu et al., [18] and Kresnowati et al., [30] who have provided the theoretical framework and a small practical application of the linlog kinetic format. We first generalize the notation to make it applicable to networks of arbitrary size and complexity. Additionally, we address further the question of parameter identification from experimental data. We monitor the propagation of error throughout the proposed parameter estimation procedure and we determine the subset of identifiable parameters.
To illustrate the proof of principle, we apply the presented theory on in silico generated data, with realistic experimental settings, to be able to compare the obtained results with the "known truth". This data is generated using a pre-viously published glycolytic pathway model [31] of Saccharomyces cerevisiae.

In silico pulse experiments
In these experiments, the steady state was perturbed by increasing the extracellular glucose concentration. When the external glucose concentration is increased, the intracellular glucose concentration also increases as the sugar is transported through facilitated diffusion. From Figure 1a it can be seen that the G6P concentration increases around 6% in the first five seconds and then starts to decrease, as the ATP concentration drops due to the phosphorylation of the glucose in the hexokinase reaction. ATP starts to recover after about 15 seconds due to increased increase ATP production downstream of glycolysis. This is in agreement with the literature on the glucose uptake dynamics following a sudden increase in the external sugar concentration [32]. Following this ATP fluctuation, the PEP concentration also decreases and then recovers. In the first five minutes, the concentrations of the external metabolites increase because of their increased production.
After this initial period of 2 min, in which the external glucose concentration is nearly constant, the concentration of external glucose slowly decreases due to continuous consumption and wash-out from the reactor. The cells follow this glucose drop by dropping the levels of intracellular metabolites. For the extracellular products (polysaccharides, glycerol, ethanol) the washout from the reactor is larger than their production and hence their concentration also drops. After about 200 minutes, the original steady state is restored (Figure 1b). Such a pulse experiment therefore always delivers a highly dynamic dataset, followed by a pseudo steady state dataset.

Parameter estimation
The dynamic data obtained during the first five minutes of the glucose pulse experiment ( Figure 1a) were used to estimate the elasticities via the linlog parameter estimation procedure outlined in the Methods section (see section Determining elasticities from dynamic perturbation data). In Figure 1a, the simulation of the same dynamic perturbation using the estimated elasticities is also given. In Figure 2, the estimated elasticities are compared with the theoretically calculated elasticities, derived from mechanistic rate equations at the reference state.
Although the linlog simulation results do not differ much from the noise-free experimental data generated with the mechanistic model, some estimated elasticities are far from the expected theoretical values. The difference is in some cases in magnitude, for other cases there is a sign contradiction. Specifically, 6 out of 16 elasticities (E3, E5, E6, E7, E8, E16) are predicted in very good agreement with the theoretical ones (difference being around 10%), 6 estimated elasticities (E2, E9, E10, E12, E14, E15) agree in sign, but the magnitudes differ by more than 50% (in some cases up to 200%), and 4 elasticities (E1, E4, E11, E13) are estimated with a sign contradiction. Notice that in the case of E12, the 200% deviation is already expected. This elasticity explains the effect of G6P on the polysaccharide formation rate and in the mechanistic model this effect is explained with a Hill type kinetic equation with a Hill coefficient of 8.25, hence this rate is very sensitive to the changes in G6P. To mimic this behavior, E12 is estimated to be much higher than its theoretical value.
The poor identification of the elasticities has two possible reasons: either the experimental design is poor or there is a structural problem resulting that some interactions in the network can not be resolved from the available information. From an experimentalist point of view, it can be seen from Figure 1a that the changes in the intracellular metabolites during the first 300 seconds are between 25-30% which can be easily detected with the current measurement techniques [33]. The extracellular glucose and polysaccharides also change to a detectable extent, but the changes in the ethanol (and therefore glycerol) are only 2-3% which is hard to detect. To check if the poor experimental design causes problems in the parameter identification, we have altered the experimental design to create larger changes in these concentrations. First the biomass concentration was increased to 15 gDW L -1 . Also at the same moment that the fermentor was pulsed with the glucose, the inflow and outflow of the fermentor was stopped so the operation was effectively switched from though-flow mode to batch mode, and the extracellular metabolites were not washed out anymore. This allowed more rapid accumulation of the secreted products resulting in much larger changes. Also, the change in the extracellular glucose concentration was more pronounced because there was no further addition of glucose after the pulse. These changes did not result in considerable changes in the intracellular metabolite profiles; they follow the slight change in the external glucose profile. The results of the new experimental design are depicted in Figure 3 which represents the new experimental data where only the significantly changed external metabolite profiles are depicted, together with the linlog simulation using the estimated elasticities. However, no significant improvement was achieved in the identification of the elasticities compared to the initial experimental design (data not shown). Hence it can be concluded that the problem is not due to poor experimental design.
A closer look at the estimated elasticities reveals that the elasticities belonging to the part of the metabolic network, between V HK and V GAPD are correctly estimated, whereas the estimation of the elasticities belonging to the lower part of glycolysis and the uptake reaction is poor. This is mainly due to the fact that the information content of the metabolite data is insufficient to resolve the complex interactions of metabolites and enzymes. Specifically, in the V PK reaction rate, the feed forward activation of FdP and the mass action effect of PEP are assumed. The perturbation of the system via an increase in the external glucose concentration results in an increase in the V PK flux. However, this does not allow a separate determination as to what extent the change in V PK is due to mass action effect Response of the network to an increase in external glucose concentration Figure 1 Response of the network to an increase in external glucose concentration. a) First five minutes, a "rapid sampling" experiment (black) and the simulation of the same perturbation with linlog kinetics (blue) using the elasticities estimated from the experimental data from dynamic perturbation. b) Long term response of the cells to the glucose pulse. The time is presented in minutes; intracellular metabolites are given in μmol gDW -1 ; extracellular metabolites are given in mM. of PEP or due to feed forward effect of FdP. This results in identifiability problems for the elasticity parameters that describe the ethanol production (E9, E10, and E11). The same argument is valid for the elasticities belonging to the glycerol producing reaction, since in the in silico network the glycerol production is assumed to be proportional with the ethanol production; hence the same combined effect is seen in the glycerol production, resulting in an identifiability problem for the latter triplet of elasticities (E13, E14, E15) In order to analyze the information content of the data, the Fisher Information Matrix (FIM) is calculated as described in [30]. FIM is an indicator of the information content with respect to the parameters and due to the parameter linearity in linlog kinetics FIM is independent of the values of the kinetic parameters. It can be calculated as Y T Y, Y being the design matrix appearing in equation (9). The singular values of the FIM hold information on the number of linear dependencies between the columns of the data matrix. In our system, we checked the singular values of FIM and concluded that not all the singular values are equally significant, which shows that there are some colinearities within the data matrix. This is also evident from the condition number, which equals the ratio of the highest to the lowest eigenvalues of the matrix and is ideally 1 in a completely uncorrelated case. In this case the condition number was calculated to be 5.0 × 10 6 . This fact supports the previous argument that using only one single set of dynamic data is not sufficient to resolve all the elasticity values.
In order to obtain better estimates of the elasticities, steady state perturbation data are introduced. The steady state enzyme perturbation data are generated as described in the Methods section (see section Steady state perturba-Results of the estimation of the elasticities using dynamic data only, comparison of the theoretical elasticities (black), with the linlog elasticities (white) Figure 2 Results of the estimation of the elasticities using dynamic data only, comparison of the theoretical elasticities (black), with the linlog elasticities (white). tions): in order to resolve the feed forward effect of FdP on the V PK reaction, the enzyme amount of the V GAPD reaction is perturbed in addition to the V IN and V ATPase . Using the steady state perturbation data in Table 1, and the calculation procedure described in the Methods section (see section Determining elasticities from steady state perturbation data), we have estimated the elasticities for the in silico network under study. After the estimation of the elasticities using steady state data only, we have also simulated the dynamic perturbation of Figure 1a using linlog kinetics, using the estimated elasticities. The results are presented in Figure 4. Figure 4a shows the experimental data and the linlog simulation of the same pulse, and Figure 4b compares the elasticities estimated using the steady state data with the theoretical elasticities. As can be seen from Figure 4b, using the steady state data, the elasticities are estimated in good agreement with the theoret-ical elasticities. However, as can be observed from Figure  4a, the linlog model deviates significantly from the experimental data, so the estimated elasticities have to be further refined.
Having a good initial estimate for the linlog elasticities from the steady state perturbation data only, we used in the second step both the dynamic and steady state data for the parameter estimation procedure. The nonlinear regression procedure allows combining both available steady state and dynamic experimental data, so that all parameters can be accurately estimated. After the non-linear fit to the data, we have obtained the final estimation of the elasticities. The simulation of the pulse experiment with these final estimates, and the comparison of the elasticities are given in Figure 5a and Figure 5b respectively, The results of the new experimental design Figure 3 The results of the new experimental design (The inlet and outlet of the reactor is blocked just after the glucose is increased. The biomass concentration is 15 gDW L -1 , see text for further details in the experimental design) (black) and simulation of the same perturbation with linlog kinetics (blue). EtOH which shows that most of the elasticities are correctly estimated and that the experimental data are well described.

Analysis of parameter identifiability and model reduction
In the previous sections, we have seen that we could not determine all the parameters from dynamic perturbation data only and we have introduced the steady state perturbation data. After determination of all of the elasticities, using both types of perturbation data, we analyzed back the identifiability of each parameter under noise. In order to carry the identifiability and error propagation analysis, MC simulation was used. As described in the Methods section (see section Error propagation analysis), 10% relative error was added to the noise free data represented in Figure 1a (rapid sampling experiment), and the non-linear estimation procedure was implemented using the elasticities estimated from the steady state data, as the initial guesses. After repeating this scheme 50 times, we obtained a distribution for each of the elasticities from which we calculated the relative error for each elasticity as the standard deviation per mean of the corresponding elasticity ( Table 2).
The adjustable parameter λ in equation (6) determines the relative importance of the different sources of data. In our case, these are dynamic and steady state data. Notice that during the MC simulations, noise is added only to dynamic data. For steady state perturbations, the number of data points that can be obtained is theoretically infinite. This implies that, considering the central limit theorem, the steady state data can, in principle, be considered as noise free. However, the aim of the MC simulations was to elucidate the potentially unidentifiable elasticities using dynamic perturbation data, and therefore we chose λ to be equal to zero. The other extreme (λ = 1) would suppress the effect of noise and wouldn't lead to detection of poorly identifiable elasticities. The values in between, are up to the choice of the modeler, depending on how many data points have been obtained from a dynamic pulse experiment, the standard deviations of the measurements, etc. We have implemented different values for the value of λ, but the outcomes of the MC simulation, i.e.
which elasticities can hardly be identified, did not change qualitatively.
From the distribution of each elasticity, the p-value, indicating the probability that the actual elasticity is zero and that the estimated value was caused by the random error only, is calculated via t-test and the results are given in Table 2 (second column). The elasticities E1, E4, E9, and E15 have a relative standard deviation higher than 200%, so these elasticities are not identifiable under the 10% noise present. Therefore we conclude that these elasticities are clearly candidates for model reduction by assigning them to be zero (E1, E4, E9, E15 = 0).
A closer look reveals that E9 represents the feed-forward effect of FdP on V PK . E4 and E15 represent the effect of ATP on V HK and V Gol respectively and E1 is the feedback inhibition of G6P on V IN . Notice that the elasticities of the reactions of V IN , V PK and V Gol were already badly identi-fied, when we estimated the elasticities using dynamic data only (Figure 2).
Using the arguments stated above, we set the values of these four (E1, E4, E9, E15) elasticities to zero, implemented the non-linear regression step, with the new sparser elasticity matrix, and estimated the elasticities using steady state and dynamic data. The results are presented in Figure 6a, where the experimental data and the linlog simulation using the new set of elasticities are given Results of the final estimation. a) Experimental data (black) and the linlog simulation using final elasticities (blue). Units are the same as Figure 1. b) Comparison of the theoretical elasticities (black), initial estimates from steady state perturbation data (grey) and the final estimates (white).  Figure 6b where the comparison of the estimated elasticities with the theoretical ones is presented. It is clear that with the new set of 12 elasticities (replacing the original 16) the dynamic pulse can also be simulated in good agreement with the experimental data. The value of the objective function increased only by 0.04%, when the degrees of freedom increased by 21% (degrees of freedom increased from 19 (= 35 data points -16 parameters), to 23 (= 35 data points -12 parameters)) showing that elimination of these four elasticities in fact improved the quality of the fit.

Cross validation of the obtained model
In order to validate the estimated elasticities, we have performed a cross validation study, for which we have generated an independent dataset, consisting of steady state perturbation data, and we have predicted the effect of that perturbation with our estimated parameters, and compared the results. Since we already perturbed V IN , V GAPD and V ATPase and included the resulting data in the identification dataset (Table 1), we had to select another enzyme in the pathway as the target for independent steady state perturbation. Among the remaining enzymes, we chose to inhibit the activity of V PFK by 50%, because this enzyme assumes many interactions and is known to be a complex enzyme, which would be a challenge to cross validate for our network. The comparison between the model prediction and the experimental results is given in Figure 7. The level of extracellular and intracellular glucose did not change, whereas the levels of G6P and polysaccharides increased considerably and the levels of PEP, FdP, ethanol and glycerol decreased. The linlog model predicted this steady state perturbation with maximal deviation of only 8.5% in G6P. We conclude that the linlog model with the estimated parameters performed satisfactorily for the test case considered.

Calculation of the systemic properties
After the estimation of the elasticities, we have calculated the flux control coefficients (C J0 ) using equation (11). The comparison of these with the theoretical control coefficients of the main flux (J PK ) is presented in Figure 8. The flux is controlled mainly by three enzymes: the hexose transporter (e IN ), and to a lesser extent by the phosphofructokinase (e PFK ), and the ATPase (e ATPase ). These findings are in qualitative agreement with the literature [31]. It is noticeable that the elimination of the four elasticities did not change these results significantly; the same three reactions still have the main control of the glycolytic flux.
The concentration control coefficients (C x0 ) are also calculated using equation (10). Figure 9 gives the comparison of the C x0 s of the two branch point metabolites, G6P and FdP. The levels of both metabolites can be increased by increasing the activity of the hexose transporter. Increasing the activity of e ATPase will decrease the level of G6P, whereas it increases the level of FdP. Increasing the activity of e PFK will decrease and increase the levels of G6P and FdP respectively. Lastly, an increase in the activity of e GAPD will result in a slight increase in G6P level and a decrease in FdP level. These results are also in agreement with the previous studies. As expected, the model reduction did not

Discussion
Large mathematical models are needed for finding the targets for engineering realistic metabolic systems. Currently, the available large models of several organisms e.g. Saccharomyces cerevisiae are mostly stoichiometric models. Although these models are highly relevant for knockout studies and determining the capabilities of the network considered, they have some limitations, e.g., they cannot predict the time courses, do not give insights in molecular events and are based on the "assumed goals" of the cells. Hence, for driving engineering interventions, kinetic models are needed. Here, we have presented a method to estimate the elasticities using the linlog kinetic format, from a combination of steady state and dynamic pulse experiments. In the past, there have been efforts to extract MCA parameters from transient metabolite data. Liao and Delgado presented a method to obtain the control coefficients from such data [34]. Later, it was shown by Elde and Zacchi through MC simulations that this method is highly sensitive to noise [35]. Moreover a full dynamic kinetic model is not obtained. Here, we have described an estimation procedure which directly yields the elasticities as parameters instead of control coefficients, hence allowing the construction of a full kinetic model. Subsequently, the control coefficients can be obtained from the summation and connectivity theorems. Such a full kinetic model can also be used to simulate the effect of different perturbations on metabolism of a microorganism in a fermentor.
The method presented here assumes the availability of dynamic perturbation data. Although the list of metabolites is growing, not all of the metabolites in the cell can be measured. In a recent study, Wang and coworkers described an extension of the MCA, under uncertainty [36][37][38] in which the authors described a framework to calculate the control coefficients when either there are no measurements on metabolites, or the available measurements are subject to high uncertainty. Their proposed framework can be considered as complementary to the method presented in this paper. In a case where accurate measurements are available, the present paper provides a mathematical approach to obtain the elasticities as kinetic parameters, from which one can calculate the control coefficients.
When compared to previous studies where linlog kinetics have been applied in combination with highly idealized linear pathways or small networks with branch points, the model in this work represents a more realistic case. The model used in this study represents an intermediate size system with branch points and conserved moieties. The proposed method can easily be extended to larger networks that are needed not only to understand the functioning of living cells, but also to infer engineering applications of e.g. microorganisms.
In the example considered in this paper, we have assumed that the zero entries of the elasticity matrix are known. This is a reasonable assumption, since numerous enzymes in the primary metabolism of many organisms are extensively studied and there is a dedicated public compen-Results of the estimation of the elasticities, after model reduction Figure 6 Results of the estimation of the elasticities, after model reduction. a) Experimental data (black) and the linlog simulation with the reduced elasticity matrix (blue). Units are the same as Figure 1. b) Comparison of the theoretical elasticities (black), with the linlog elasticities (white). dium for such information [39], allowing the use of a priori knowledge on elasticities which are zero.
In the absence of any a priori knowledge on the zero elasticities, the elasticities have to be estimated from the full elasticity matrix, for which vastly more experimental data are needed. In the MCA literature, there are several attempts to determine the different perturbations needed in order to determine all elasticities [22,23]. However, one needs to consider that an enzyme will never be affected by all metabolites; there is a physical limit for e.g. the maximum number of binding sites of each enzyme in a biological network.
From the results presented here, it has been shown that, despite the rich information content of the data obtained from dynamic experiments, not every elasticity of the network could be correctly estimated. This problem of parameter identifiability would be much more pronounced in a case where all (possible) interactions are taken into account, caused by the combinatorial explosion of number of parameters to be estimated. In the current work, in order to resolve some of the interactions which could not be resolved from the dynamic data, steady state enzyme perturbations have been introduced. In order to get full kinetic models of microorganisms, accurate measurements of metabolite concentrations resulting from independent perturbations are needed, such as presented in [28] and [40]. At this point it is important to state that the property of the linlog kinetic format, that the rate equations are linear in the kinetic parameters, allows simultaneous use of alternative datasets by concatenating them in one parameter estimation scheme, i.e. it is straightforward to extend the data matrix Y in equation (9) and the vector χ sim in equation (6), with the data from alternative dynamic and steady state pertur- Figure 7 Results of the cross validation study. An independent steady state perturbation is introduced, in which the activity of V PFK is decreased by half (see text). A comparison of the normalized in silico experimental results (black) with normalized model prediction (white) are given. The concentrations are normalized with respect to the reference conditions, given in Table 3. bation experiments. In addition, the decoupling of the parameters is also immediate i.e. we can isolate poorly identifiable elasticities and estimate the remaining ones accurately and thereby reduce the problem.

Results of the cross validation study
It is worth mentioning that the obtained parameters (elasticities) are true kinetic parameters that reflect the properties of the enzymes with respect to the corresponding metabolites. They can be assumed to remain invariant as long as the enzyme keeps its properties in response to the changes in the environmental conditions. Additionally, since the linlog kinetics provides an approximation to the actual rate of the corresponding reaction over certain interval, the elasticity parameter that performs best may be different than the theoretical value of the corresponding elasticity. This should be kept in mind when comparing some of the less well determined elasticity values to the theoretical ones.

Conclusion
Constructing dynamic models of metabolic reaction networks under in vivo conditions using data obtained from perturbation experiments remains still a challenging problem in the area of systems biology. In this contribution, we presented a method which allows the determination of the elasticities directly from experimental data from typical dynamic and/or steady state perturbation experiments. These elasticities allow the reconstruction of the full kinetic model of the glycolysis of Saccharomyces cerevisiae, and the determination of the control coefficients. We further show by a posteriori parameter identifiability analysis that a subset of elasticities could not be identified using dynamic perturbation data only. Introduction of additional experimental information, i.e. steady state experiments, solved this parameter identification problem. Considering the description of the dynamics, results of the cross validation studies and the final values of the elasticities, we conclude that linlog kinetics, although being an approximate kinetic format, performs very satisfactorily for estimating elasticities from data obtained from dynamic simulation of a mechanistic model, that was realistic in terms of biochemical complexity (glycolysis), noise added and sampling frequency.

Methods
In linlog kinetics, all rate equations have the same mathematical structure: proportionality to the enzyme level and

Determining elasticities from steady state perturbation data
Given that for steady state perturbation experiments, we obtain information on fluxes (J i 's) using the mass balances and the measured metabolite concentrations (x j ), we can directly use the equation (3) for the estimation of the elasticities. When it is rearranged, the equation (3) can be presented in the following standard linear model: where a is the (r × 1) measurement vector that contains the measured normalized fluxes and normalized enzyme levels (a = (J 0 e') -1 v -i), b the (p × 1) vector that contains the non-zero elasticities of the original elasticity matrices E x and E c , and Y is the (r × p) design matrix of which each i th row contains as nonzero elements ln(x j / ) and ln(c k / ) at positions corresponding with the non-zero elasticities and in vector b. The equation (4) can be solved to obtain the elasticities using linear regression according to: This shows that estimation of elasticities from steady state perturbations requires data on metabolite levels (ln(x j / ) and ln(c k / ) presented in matrix Y) enzyme activities (presented in matrix e') and steady state fluxes (presented in the vector v and the matrix J 0 ).

Determining elasticities from dynamic perturbation data
Since in a dynamic perturbation experiment, the rate information can not be obtained directly, we will follow different procedure here and we will only make use of the time profiles of the measured metabolites (x i 's) in treating dynamic perturbation data. The elasticities are, in this case, estimated via non-linear parameter estimation procedure, in which the objective function to be minimized is the weighed squared error between the experimental response and simulation results. The general form of the objective function is given in equation (6): the limits of the summations, m, q and r are the number of metabolites, measured time points and fluxes respectively. In equation (6), χ exp is the experimentally measured, χ sim is the simulated (using linlog kinetics) metabolite matrix containing intracellular and extracellular metabolite concentrations. Two additional parameters are introduced here: γ to weigh the effect of uncertainty in different metabolites and λ to weigh the effect of two different sources of data, namely steady state and dynamic perturbation data. In the second term of the equation (6), J' sim and J' exp represent the experimental and simulated steady state fluxes. The equation (6) contains the rate information implicitly, since the χ sim results from the integration of the set of ode's representing the dynamics of the system. Note also that although we already explained the treatment of the steady state perturbation data in the previous section, we explicitly included the second term in equation (6) again; to state clearly that this form of the objective function allows also the integration of data from different types of experiments.
Two main classes of optimization algorithms are available to minimize the objective function in equation (6): greedy algorithms and evolutionary algorithms. Moles et al. presented a comparison of global optimization methods used for parameter estimation in biochemical pathways [41]. In that review, they discussed various global optimization methods and concluded that the algorithm that uses evolutionary strategy using stochastic ranking performed best. On the other hand, they also pointed out that, generally, evolutionary algorithms require high computational effort. The alternative, greedy algorithms, are fast, but in turn require an initial estimate close to the optimal solution. A good initial estimate is necessary not only to evade local minima and improve the solution performance, (i.e. convergence time, finding a global optimum) but also to prevent highly stiff systems, which increase the computation time. In this work, we chose to use a greedy (simplex) algorithm, mainly because the linlog kinetic format has the advantage to provide a good initial estimate that can be obtained directly from the experimental data via linear regression (see below) so that the method presented in this paper does not require the robustness of the evolutionary algorithms towards the initial estimate. It is noteworthy that alternative approximative kinetic formats such as the two proposed formats of BST (i.e. GMA or S-system forms) lack this advantage of providing a good initial guess to the non-linear regression step. With these formats, zero is generally assumed as the initial guess for the non-linear parameter estimation problem [42].
To obtain initial estimates, we start with the general dynamic model of a metabolic system in a typical rapid pulse experiment in a chemostat which is given by the mass balances for the m x intracellular (x) and m c extracellular (c) metabolites: in the feed expressed in μmol L -1 , D is the dilution rate (hr -1 ), μ is the biomass specific growth rate (hr -1 ) and c X is the biomass concentration in gDW L -1 .
Substitution of v by linlog kinetic rate equation (3) in the mass balance equation (7) and assuming that there is no change in the enzyme levels (e' = I r × r ) yields: After rearrangement (and taking into account that due to steady state of the reference, S·J 0 ·i = 0), the set of equations are integrated for each metabolite, from t i to t i+1 , This can be presented in the following standard linear model: Where and Y is a ((q -1)·(m x + m c )) × p matrix, combining Y x and Y c , containing the time integrals of the logarithm of the normalized metabolite concentrations, and S, S c , J 0 . Here q is the number of measured time points. b is a (p × 1) vector containing the unknown elasticity coefficients, which is estimated by linear regression similarly as above in eq. (5). To obtain the integrals of equation (9), a linear interpolation of concentration between the measurements at t i and t i+1 is used.
Although both equations ( (6) and (5)) are based on least square principle, the use of equation (6) (where b of equation (5) is used as an initial estimate) has advantages, namely, further improvement of the quality of the estimated parameters, because initial linear regression assumes that errors are only present in the dependent variables (a in Eq. (5)), whereas errors in the measured metabolites in fact also affect the independent variables (matrix Y in Eq.(4) and Eq. (9)). Furthermore, the non-linear optimization allows the incorporation of additional degrees of freedom for the correction of errors in the metabolite levels at the first data point (t 0 ) used for the model simulation (integration of Eq. (8)). Moreover, during the non-linear parameter estimation, linear interpolation between the logarithms of the measured metabolites is not needed anymore. An additional ease of the non-linear regression is in the introduction of the adjustable parameter γ to weigh the effect of one (or more) metabolite(s) on the fit. This is useful, when the measurement precision is poor for a certain metabolite or when the relative change in one metabolite is small compared to the others. The obvious choice for γ would be the inverse of the standard deviation of the corresponding metabolite.
Although the weighing parameter γ can be introduced in the linear regression step as well, introducing the factor γ in the non-linear regression step is much more straightforward.

Calculation of the systemic properties
Having estimated the elasticities, we can calculate scaled flux and concentration control coefficients , using classical summation and connectivity theorems [34]: (11) Here L x is the metabolite link matrix, which links the dependent metabolites to the independent ones and J 0 , S and E x0 are steady state flux matrix, stoichiometric matrix and the elasticity matrix as represented in the previous sections. The scaled response coefficient (R J0 ) for the external metabolites can also be calculated [43]:

Metabolic model
The glycolytic pathway of Saccharomyces cerevisiae has been extensively studied in terms of enzyme kinetics and metabolite levels. Over the last 30 years, modeling of the glycolysis of yeast cells has been applied for several reasons, such as simulation of physiology to understand regulation under dynamic conditions, MCA to amplify and/ or redirect metabolic flux [29,31,[44][45][46][47], and to investigate how changing environmental conditions change metabolism.
In order to test the proposed method for estimation of elasticities from steady state and dynamic perturbations and the applicability of linlog kinetics, noise-free data were generated using the modified version of the mechanistic model of Galazzo and Bailey describing the glycolysis of yeast cells, which contains 8 reactions and 9 metabolites ( Figure 10). The set of mechanistic kinetic equations are highly non-linear and contain 41 parameters. The parameters used were taken from Galazzo and Bailey [31] using the experimental settings for pH ext = 5.5 with suspended cells in the original study. The glucose uptake rate of the original model was modified in order to be able to mimic realistic fermentation conditions. The details of the metabolic model, the list of mechanistic rate equations and the mass balances describing the kinetics of the metabolic network of Figure 10 are given in the Appendix.

Reactor model
The chosen glycolytic model was not designed to describe growth, so in order to prevent washout of the cells and to keep a constant amount of biomass within the fermentor, the organism was assumed to be cultivated in a chemostat with complete biomass retention. The glucose concentration in the feed was set at a low value resulting in low steady state ethanol and glycerol concentrations, thus allowing measurable changes in those metabolites when a perturbation was applied. Furthermore the biomass concentration was set at 5 gDW L -1 and the dilution rate was set at 1.5 hr -1 to provide a reasonable substrate load (q s ) and glycolytic flux. The resulting reference steady state conditions (fluxes, metabolite levels) are represented in Table 3.

Linlog kinetic model
The number of non-zero entries of the elasticity matrix ([E x E c ] in equation (3)) defines the number of elasticities to be estimated. Specifically, the elasticity matrix has 72 entries (8 reactions × 9 metabolites) and it followed that 16 of these were nonzero for the Galazzo and Bailey model which was extended with the alterations on the uptake reaction, mentioned in the Appendix. For the present network, the structure of the elasticity matrix is presented in Figure 11. The linlog model contains 16 elasticities and 8 reference rates as parameters whereas the mechanistic model has 41 parameters.

Dynamic perturbations
To obtain transient data, the reference steady state (Table  3) was perturbed by increasing the extracellular glucose concentration (G ext ) at t = 0 from 2666 μmol L -1 to 8000 μmol L -1 , where after the relaxation of the system was simulated. The levels of the metabolites were recorded until the system returned to the reference state.
In choosing the number of data points, i.e. the sampling frequency, special attention had been paid to use a realistic number of data points that an experimenter can obtain from a typical rapid sampling experiment (for a short time period) or from a dynamic pulse experiment (for a longer time period). The data were obtained as follows: in the first 120 seconds, where the initial dynamics after the glucose uptake are important, 20 samples uniformly distributed over time were taken, in the next 120 seconds, 10 samples and in the last minute 5 samples were taken. For long term experiments, from the 5 th minute until the 10 th minute, 5 samples were taken, and from the 10 th minute until the end (400 min) 30 uniformly distributed samples were taken. Metabolites for which concentrations were assumed to be available at these times are intracellular glucose, G6P, FdP, PEP, ATP and extracellular metabolites glucose, ethanol, glycerol and polysaccharides.

Steady state perturbations
For the steady state perturbations, the enzyme levels were modulated and the system was allowed to reach a new steady state. The new steady state conditions (fluxes, metabolite levels) were measured.
The minimum number of independent steady state perturbations needed in order to identify all the elasticities for a reaction equals the number of non-zero elasticities for that rate expression. From Figure 11, where the elasticity matrix is shown, it can be inferred that the minimum number of perturbations to be applied per reaction is 3, due to the V PK and V Gol reactions that have three non-zero elasticities (E9-E10-E11 and E13-E14-E15 respectively).
As the reaction to be perturbed, V IN was chosen because it is at the beginning of the glycolysis and therefore affects the entire pathway. As the second reaction to be perturbed, V ATPase was chosen because it directly affects the level of ATP which is the effector of 6 out of 8 reactions. Lastly, V GAPD was chosen for two reasons: 1) to resolve the branch point relations around G6P and 2) to resolve the feed forward effect of FdP on V PK . Note that since the production rate of glycerol (V Gol ) was coupled (proportional) Glycolytic Pathway of Saccharomyces cerevisiae (Adapted from Galazzo and Bailey, 1990) Figure 10 Glycolytic Pathway of Saccharomyces cerevisiae (Adapted from Galazzo and Bailey, 1990). to the production of ethanol (V PK ), the steady state branch point relations around FdP can be reformulated as follows: At steady state, J PFK = J GAPD + J Gol and J GAPD = J PK /2. Additionally, as described above, J Gol = k·J PK , hence J PFK = (1 + 2·k)·J GAPD where the value of k is given in the Appendix. Therefore, there is no need to introduce additional steady state perturbations to resolve the branch point relations around FdP. Table 1 summarizes the steady state perturbations used as data in this study. For example, in the first column, the activity of the glucose uptake rate was increased by 20% and the system is allowed to reach the new steady state, the fluxes and the metabolite levels relative to the reference state are presented in the corresponding column. Similarly, the second column represents the inhibition of V ATPase by 10% and the third represents the data after the inhibition of V GAPD by 20%.

Error propagation analysis
In order to inspect whether the estimated elasticities are robust against experimental noise in metabolite concentrations, we performed Monte Carlo (MC) simulations. To this end, random errors of 10% are repeatedly added to form 50 sets of "noisy data", which were used in the non-linear parameter estimation procedure. This yields a distribution of estimated elasticities which allows calculating the relative standard deviation for each elasticity as the ratio of standard deviation to the mean of that elasticity distribution.

Parameter identifiability and model reduction
The elasticities that have a high relative standard deviation are suspected to be poorly identifiable in the network. Potentially unidentifiable elasticities were recognized by calculating a p-value for testing the null hypothesis that the true mean of a given elasticity distribution is zero c feed Glucose through the statistic μ/(σ/ ) that has a t-distribution with N degrees of freedom, where N is the number of MC simulations. The elasticities that were found to be potentially unidentifiable were set to zero, thus reducing the number of non-zero entries in the elasticity matrix. Subsequently, for the reduced model, the remaining elasticities were re-estimated, using the same steady state and dynamic perturbation data and the parameter estimation procedure explained in the previous sections.

Software used
MATLAB software (version 6.5, Stat-Ease Inc., Minneapolis, USA) was used for generating the in silico data and analyses of the data and the parameters obtained. The non-linear minimization problem of parameter estimation was solved using the Nelder-Mead simplex search method. the hexokinase reaction is described by a two substrate single displacement mechanism, the phosphofructokinase and pyruvate kinase reactions are described by allosteric kinetics according to Monod-Wyman-Changeux, polysaccharide formation is explained by Hill type kinetics and the glyceraldehyde 3-phosphate dehydrogenase reaction is described by a kinetic expression which takes into account the crossed product inhibition by G3P and NAD + and competitive inhibition by AMP, ADP and ATP. The ATP consumption is assumed to have first order kinetics.
One property of the original model is that the glycerol production is assumed to be stoichiometrically coupled to ethanol formation, i.e. there is no separate dynamics for the glycerol formation. Additionally, the NAD + and NADH levels are assumed to be constant, as given below.
As it was originally published, the glucose influx was a linear function of the glucose-6-phosphate concentration only; hence, it was insensitive to the changes in extracellular glucose concentration. To mimic realistic fermentation conditions, where one can perturb the organism by adding a specific compound such as glucose to a chemostat, we have modified the glucose uptake reaction and modeled the glucose uptake by a symmetrical carrier model, suggesting that the transport of glucose across the cell membrane occurs via facilitated diffusion, as proposed by Teusink et al., [4]. The list of mechanistic rate equations used is given in the following section.

Mechanistic rate equations used to construct the in silico network. (Adapted from Galazzo and Bailey, 1990)
The rate equations of the reactions in Figure

Mass balances
The concentrations of the extracellular metabolites are described by the following mass balances: